Re: [PATCH] [x86] Fix ICE [PR target/98833]

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 27, 2021 at 03:22:45PM +0800, Hongtao Liu wrote:
> Hi:
>   As described in the PR, this also removes the now-unneeded expanders
> and builtins; the user can directly use == and >, without calling the
> builtin functions.
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> 
> gcc/ChangeLog:
> 
> PR target/98833
> * config/i386/i386-builtin.def (BDESC): Delete
> __builtin_ia32_pcmpeqb128, __builtin_ia32_pcmpeqw128,
> __builtin_ia32_pcmpeqd128, __builtin_ia32_pcmpgtb128,
> __builtin_ia32_pcmpgtw128, __builtin_ia32_pcmpgtd128,
> __builtin_ia32_pcmpeqb256, __builtin_ia32_pcmpeqw256,
> __builtin_ia32_pcmpeqd256, __builtin_ia32_pcmpeqq256,
> __builtin_ia32_pcmpgtb256, __builtin_ia32_pcmpgtw256,
> __builtin_ia32_pcmpgtd256, __builtin_ia32_pcmpgtq256.
> * config/i386/sse.md (avx2_eq<mode>3): Deleted.
> (sse2_eq<mode>3): Ditto.
> (sse2_gt<mode>3): Renamed to ...
> (*sse2_gt<mode>3): And drop !TARGET_XOP in condition.
> (*sse2_eq<mode>3): Drop !TARGET_XOP in condition.
> 
> gcc/testsuite/ChangeLog:
> 
> PR target/98833
> * gcc.target/i386/pr98833.c: New test.
> 
> libcpp/
> 
> PR target/98833
> * lex.c (search_line_sse2): Replace builtins with == operator.

Oops, I wasn't aware of the libcpp use.  I'm afraid that means we should
reconsider removing the builtins, because e.g. GCC 10 would then no longer
build with GCC 11 as the system compiler, people bisecting GCC changes
would run into trouble, etc.
And a codesearch seems to show that other projects do use these builtins
(even the AVX2 ones) too :(.
I think the libcpp/ change is ok, as we've bumped minimum GCC version
for building GCC to GCC 4.8 in GCC 11.  Note that GCC 4.7 in C++
doesn't handle the == operators:
error: invalid operands of types ‘v16qi {aka __vector(16) char}’ and ‘v16qi 
{aka __vector(16) char}’ to binary ‘operator==’
but 4.8 seems to work fine already.

But, can we perhaps keep the builtins, but fold them immediately in
ix86_gimple_fold_builtin so that we don't need the named patterns?

Though, I guess that we could defer those changes to GCC 12.

So can you just drop the !TARGET_XOP from the conditions, add the testcase
and change libcpp/ for GCC 11, and do the rest for GCC 12?

Thanks.

Jakub



[arm/testsuite]: Skip pr97969.c if -mthumb is not compatible [PR target/97969]

2021-01-27 Thread Christophe Lyon via Gcc-patches
Depending on how the toolchain is configured or how the testsuite is
executed, -mthumb may not be compatible. Like for other tests, skip
pr97969.c in this case.

For instance arm-linux-gnueabihf and -march=armv5t in RUNTESTFLAGS.

2021-01-27  Christophe Lyon  

gcc/testsuite/
PR target/97969
* gcc.target/arm/pr97969.c: Skip if thumb mode is not available.

diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c b/gcc/testsuite/gcc.target/arm/pr97969.c
index 714a1d1..0b5d07f 100644
--- a/gcc/testsuite/gcc.target/arm/pr97969.c
+++ b/gcc/testsuite/gcc.target/arm/pr97969.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w -Os" } */

 typedef a[23];

[committed] Rename PROP_trees to PROP_gimple

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 26, 2021 at 12:25:16PM +0100, Richard Biener wrote:
> On Tue, 26 Jan 2021, Jakub Jelinek wrote:
> 
> > On Tue, Jan 26, 2021 at 12:16:14PM +0100, Richard Biener wrote:
> > > > +  /* Unless this is called during FE folding.  */
> > > > +  if (cfun
> > > > + && (cfun->curr_properties & (PROP_trees | PROP_rtl)) == 0
> > > 
> > > don't you want && (cfun->curr_properties & PROP_trees) != 0?
> > 
> > No, PROP_trees is misnamed and it actually means GIMPLE.
> 
> Doh.  Patch doing s/PROP_trees/PROP_gimple/ pre-approved ;)

Here is what I've committed after bootstrapping/regtesting it on
x86_64-linux and i686-linux:

2021-01-27  Jakub Jelinek  

* tree-pass.h (PROP_trees): Rename to ...
(PROP_gimple): ... this.
* cfgexpand.c (pass_data_expand): Replace PROP_trees with PROP_gimple.
* passes.c (execute_function_dump, execute_function_todo,
execute_one_ipa_transform_pass, execute_one_pass): Likewise.
* varpool.c (ctor_for_folding): Likewise.

--- gcc/tree-pass.h.jj  2021-01-04 10:25:37.702246631 +0100
+++ gcc/tree-pass.h 2021-01-26 12:35:54.336577674 +0100
@@ -225,7 +225,7 @@ protected:
 #define PROP_gimple_lomp_dev   (1 << 16)   /* done omp_device_lower */
 #define PROP_rtl_split_insns   (1 << 17)   /* RTL has insns split.  */
 
-#define PROP_trees \
+#define PROP_gimple \
   (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
 
 /* To-do flags.  */
--- gcc/cfgexpand.c.jj  2021-01-05 19:13:20.573245780 +0100
+++ gcc/cfgexpand.c 2021-01-26 12:34:37.033460746 +0100
@@ -6503,7 +6503,7 @@ const pass_data pass_data_expand =
 | PROP_gimple_lvec
 | PROP_gimple_lva), /* properties_required */
   PROP_rtl, /* properties_provided */
-  ( PROP_ssa | PROP_trees ), /* properties_destroyed */
+  ( PROP_ssa | PROP_gimple ), /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
 };
--- gcc/passes.c.jj 2021-01-04 10:25:37.289251307 +0100
+++ gcc/passes.c2021-01-26 12:35:04.595145895 +0100
@@ -1793,7 +1793,7 @@ execute_function_dump (function *fn, voi
 {
   push_cfun (fn);
 
-  if (fn->curr_properties & PROP_trees)
+  if (fn->curr_properties & PROP_gimple)
 dump_function_to_file (fn->decl, dump_file, dump_flags);
   else
print_rtl_with_bb (dump_file, get_insns (), dump_flags);
@@ -2034,7 +2034,7 @@ execute_function_todo (function *fn, voi
 
   if (flags & TODO_verify_il)
{
- if (cfun->curr_properties & PROP_trees)
+ if (cfun->curr_properties & PROP_gimple)
{
  if (cfun->curr_properties & PROP_cfg)
/* IPA passes leave stmts to be fixed up, so make sure to
@@ -2272,7 +2272,7 @@ execute_one_ipa_transform_pass (struct c
 
   /* Note that the folders should only create gimple expressions.
  This is a hack until the new folder is ready.  */
-  in_gimple_form = (cfun && (cfun->curr_properties & PROP_trees)) != 0;
+  in_gimple_form = (cfun && (cfun->curr_properties & PROP_gimple)) != 0;
 
   pass_init_dump_file (pass);
 
@@ -2545,7 +2545,7 @@ execute_one_pass (opt_pass *pass)
 
   /* Note that the folders should only create gimple expressions.
  This is a hack until the new folder is ready.  */
-  in_gimple_form = (cfun && (cfun->curr_properties & PROP_trees)) != 0;
+  in_gimple_form = (cfun && (cfun->curr_properties & PROP_gimple)) != 0;
 
   pass_init_dump_file (pass);
 
@@ -2628,7 +2628,7 @@ execute_one_pass (opt_pass *pass)
   pass_fini_dump_file (pass);
 
   if (pass->type != SIMPLE_IPA_PASS && pass->type != IPA_PASS)
-gcc_assert (!(cfun->curr_properties & PROP_trees)
+gcc_assert (!(cfun->curr_properties & PROP_gimple)
|| pass->type != RTL_PASS);
 
   current_pass = NULL;
--- gcc/varpool.c.jj2021-01-26 11:36:46.081295478 +0100
+++ gcc/varpool.c   2021-01-26 12:36:13.989353173 +0100
@@ -415,7 +415,7 @@ ctor_for_folding (tree decl)
   gcc_assert (!TREE_PUBLIC (decl));
   /* Unless this is called during FE folding.  */
   if (cfun
- && (cfun->curr_properties & (PROP_trees | PROP_rtl)) == 0
+ && (cfun->curr_properties & (PROP_gimple | PROP_rtl)) == 0
  && TREE_READONLY (decl)
  && !TREE_SIDE_EFFECTS (decl)
  && DECL_INITIAL (decl))


Jakub



RE: arm: Adjust cost of vector of constant zero

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 26 January 2021 18:03
> To: gcc Patches 
> Subject: arm: Adjust cost of vector of constant zero
> 
> Neon vector comparisons have a dedicated form when comparing with
> constant zero, which means such a comparison is free.
> 
> Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only,
> since MVE does not support this.

I guess the other way to do this would be in the comparison code handling in
this function, where we could check for a const_vector of zeroes and a Neon
mode and avoid recursing into the operands.
That would avoid the extra switch statement in your patch.
WDYT?
Thanks,
Kyrill

> 
> 2021-01-26  Christophe Lyon  
> 
> gcc/
> PR target/98730
> * config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector
> of constant zero for comparisons.
> 
> gcc/testsuite/
> PR target/98730
> * gcc.target/arm/simd/vceqzq_p64.c: Update expected result.
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 4a5f265..9c5c0df 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -11544,7 +11544,28 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
> code, enum rtx_code outer_code,
>   && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE
> (mode)))
>  || TARGET_HAVE_MVE)
> && simd_immediate_valid_for_move (x, mode, NULL, NULL))
> - *cost = COSTS_N_INSNS (1);
> + {
> +   *cost = COSTS_N_INSNS (1);
> +
> +   /* Neon has special instructions when comparing with 0 (vceq, vcge,
> +  vcgt, vcle and vclt). */
> +   if (TARGET_NEON && (x == CONST0_RTX (mode)))
> + {
> +   switch (outer_code)
> + {
> + case EQ:
> + case GE:
> + case GT:
> + case LE:
> + case LT:
> +   *cost = COSTS_N_INSNS (0);
> +   break;
> +
> + default:
> +   break;
> + }
> + }
> + }
>else
>   *cost = COSTS_N_INSNS (4);
>return true;
> diff --git a/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c b/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> index 640754c..a99bb8a 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> @@ -15,4 +15,4 @@ void func()
>result2 = vceqzq_p64 (v2);
>  }
> 
> -/* { dg-final { scan-assembler-times "vceq\.i32\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+\n" 2 } } */
> +/* { dg-final { scan-assembler-times "vceq\.i32\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, #0\n" 2 } } */


[PATCH] i386: Add peephole2 for __atomic_sub_fetch (x, y, z) == 0 [PR98737]

2021-01-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds a peephole2 for the optimization requested in the PR,
namely that we emit awful code for __atomic_sub_fetch (x, y, z) == 0
or __atomic_sub_fetch (x, y, z) != 0 when y is not constant.
This can't be done in the combiner which punts on combining UNSPEC_VOLATILE
into other insns.

For other ops we'd need different peephole2s; this one is specific in the
comparison instruction and negation that need to be matched.

Bootstrapped/regtested on x86_64-linux and i686-linux.  Is this ok for trunk
(as exception), or for GCC 12?

2021-01-27  Jakub Jelinek  

PR target/98737
* config/i386/sync.md (neg; mov; lock xadd; add peephole2): New
define_peephole2.
(*atomic_fetch_sub_cmp<mode>): New define_insn.

* gcc.target/i386/pr98737.c: New test.

--- gcc/config/i386/sync.md.jj  2021-01-04 10:25:45.392159555 +0100
+++ gcc/config/i386/sync.md 2021-01-26 16:03:13.911100510 +0100
@@ -777,6 +777,63 @@ (define_insn "*atomic_fetch_add_cmp<mode>"
   return "lock{%;} %K3add{<imodesuffix>}\t{%1, %0|%0, %1}";
 })
 
+;; Similarly, peephole for __sync_sub_fetch (x, b) == 0 into just
+;; lock sub followed by testing of flags instead of lock xadd, negation and
+;; comparison.
+(define_peephole2
+  [(parallel [(set (match_operand 0 "register_operand")
+  (neg (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SWI 1 "register_operand")
+   (match_operand:SWI 2 "register_operand"))
+   (parallel [(set (match_operand:SWI 3 "register_operand")
+  (unspec_volatile:SWI
+[(match_operand:SWI 4 "memory_operand")
+ (match_operand:SI 5 "const_int_operand")]
+UNSPECV_XCHG))
+ (set (match_dup 4)
+  (plus:SWI (match_dup 4)
+(match_dup 3)))
+ (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (reg:CCZ FLAGS_REG)
+  (compare:CCZ (neg:SWI
+ (match_operand:SWI 6 "register_operand"))
+   (match_dup 3)))
+ (clobber (match_dup 3))])]
+  "(GET_MODE (operands[0]) == mode
+|| GET_MODE (operands[0]) == mode)
+   && reg_or_subregno (operands[0]) == reg_or_subregno (operands[2])
+   && (rtx_equal_p (operands[2], operands[3])
+   ? rtx_equal_p (operands[1], operands[6])
+   : (rtx_equal_p (operands[2], operands[6])
+ && rtx_equal_p (operands[1], operands[3])))
+   && peep2_reg_dead_p (4, operands[6])
+   && peep2_reg_dead_p (4, operands[3])
+   && !reg_overlap_mentioned_p (operands[1], operands[4])
+   && !reg_overlap_mentioned_p (operands[2], operands[4])"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+  (compare:CCZ
+(unspec_volatile:SWI [(match_dup 4) (match_dup 5)]
+ UNSPECV_XCHG)
+(match_dup 2)))
+ (set (match_dup 4)
+  (minus:SWI (match_dup 4)
+ (match_dup 2)))])])
+
+(define_insn "*atomic_fetch_sub_cmp<mode>"
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ
+ (unspec_volatile:SWI
+   [(match_operand:SWI 0 "memory_operand" "+m")
+(match_operand:SI 2 "const_int_operand")]  ;; model
+   UNSPECV_XCHG)
+ (match_operand:SWI 1 "register_operand" "r")))
+   (set (match_dup 0)
+   (minus:SWI (match_dup 0)
+  (match_dup 1)))]
+  ""
+  "lock{%;} %K2sub{<imodesuffix>}\t{%1, %0|%0, %1}")
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 ;; In addition, it is always a full barrier, so we can ignore the memory model.
 (define_insn "atomic_exchange"
--- gcc/testsuite/gcc.target/i386/pr98737.c.jj  2021-01-26 15:59:24.640620178 +0100
+++ gcc/testsuite/gcc.target/i386/pr98737.c 2021-01-26 16:00:02.898205888 +0100
@@ -0,0 +1,38 @@
+/* PR target/98737 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=att" } */
+/* { dg-additional-options "-march=i686" { target ia32 } } */
+/* { dg-final { scan-assembler "lock\[^\n\r]\*subq\t" { target lp64 } } } */
+/* { dg-final { scan-assembler "lock\[^\n\r]\*subl\t" } } */
+/* { dg-final { scan-assembler "lock\[^\n\r]\*subw\t" } } */
+/* { dg-final { scan-assembler "lock\[^\n\r]\*subb\t" } } */
+/* { dg-final { scan-assembler-not "lock\[^\n\r]\*xadd" } } */
+
+long a;
+int b;
+short c;
+char d;
+
+int
+foo (long x)
+{
+  return __atomic_sub_fetch (&a, x, __ATOMIC_RELEASE) == 0;
+}
+
+int
+bar (int x)
+{
+  return __atomic_sub_fetch (&b, x, __ATOMIC_RELEASE) == 0;
+}
+
+int
+baz (short x)
+{
+  return __atomic_sub_fetch (&c, x, __ATOMIC_RELEASE) == 0;
+}
+
+int
+qux (char x)
+{
+  return __atomic_sub_fetch (&d, x, __ATOMIC_RELEASE) == 0;
+}

Jakub



[Patch, fortran] PR98472 - internal compiler error: in gfc_conv_expr_descriptor, at fortran/trans-array.c:7352

2021-01-27 Thread Paul Richard Thomas via Gcc-patches
I have applied another obvious patch to fix this PR. It was tempting to
remove both gcc-asserts but I have erred on the side of caution this time.

Commit r11-6924-g003f0414291d595d2126e6d2e24b281f38f3448f

Again, it is sufficiently safe and obvious that I am tempted to put it on
my list of backports.

Paul

Fortran: Fix ICE due to elemental procedure pointers [PR98472].

2021-01-27  Paul Thomas  

gcc/fortran
PR fortran/98472
* trans-array.c (gfc_conv_expr_descriptor): Include elemental
procedure pointers in the assert under the comment 'elemental
function' and eliminate the second, spurious assert.

gcc/testsuite/
PR fortran/98472
* gfortran.dg/elemental_function_5.f90 : New test.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 4bd4db877bd..c346183e129 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7477,9 +7477,9 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
 			 && expr->value.function.esym->attr.elemental)
 			|| (expr->value.function.isym != NULL
 			&& expr->value.function.isym->elemental)
+			|| (gfc_expr_attr (expr).proc_pointer
+			&& gfc_expr_attr (expr).elemental)
 			|| gfc_inline_intrinsic_function_p (expr));
-	  else
-	gcc_assert (ss_type == GFC_SS_INTRINSIC);
 
 	  need_tmp = 1;
 	  if (expr->ts.type == BT_CHARACTER
! { dg-do compile }
!
! Test the fix for PR98472.
!
! Contributed by Rui Coelho  
!
module a
type, abstract :: base
contains
procedure(elem_func), deferred, nopass :: add
end type base

type, extends(base) :: derived
contains
procedure, nopass :: add => add_derived
end type derived

abstract interface
elemental function elem_func(x, y) result(out)
integer, intent(in) :: x, y
integer :: out
end function elem_func
end interface

contains
elemental function add_derived(x, y) result(out)
integer, intent(in) :: x, y
integer :: out
out = x + y
end function add_derived
end module a

program main
use a
call foo
contains
subroutine foo
   integer, dimension(:), allocatable :: vec
   class(base), allocatable :: instance
   allocate(derived :: instance)
   allocate(vec, source=instance%add([1, 2], [1, 2])) ! ICE here
   if (any (vec .ne. [2, 4])) stop 1
end
end program main




[PATCH] libgcc, i386: Add .note.GNU-stack sections to the ms sse/avx sav/res

2021-01-27 Thread Jakub Jelinek via Gcc-patches
Hi!

On Linux, GCC emits .note.GNU-stack sections when compiling code, to mark
the code as needing or not needing executable stack; a missing section means
unknown.  But assembly files need to be marked manually.  We already
mark various *.S files in libgcc manually, but the
avx_resms64f.o
avx_resms64fx.o
avx_resms64.o
avx_resms64x.o
avx_savms64f.o
avx_savms64.o
sse_resms64f.o
sse_resms64fx.o
sse_resms64.o
sse_resms64x.o
sse_savms64f.o
sse_savms64.o
files aren't marked, so when something links them in, it will require
executable stack.  Nothing in the assembly requires executable stack though.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2021-01-27  Jakub Jelinek  

* config/i386/savms64.h: Add .note.GNU-stack section on Linux.
* config/i386/savms64f.h: Likewise.
* config/i386/resms64.h: Likewise.
* config/i386/resms64f.h: Likewise.
* config/i386/resms64x.h: Likewise.
* config/i386/resms64fx.h: Likewise.

--- libgcc/config/i386/savms64.h.jj 2021-01-04 10:25:53.912063080 +0100
+++ libgcc/config/i386/savms64.h2021-01-26 19:22:12.371719078 +0100
@@ -57,3 +57,8 @@ MS2SYSV_STUB_END(savms64_17)
 MS2SYSV_STUB_END(savms64_18)
 
 #endif /* __x86_64__ */
+
+#if defined(__ELF__) && defined(__linux__)
+   .section .note.GNU-stack,"",@progbits
+   .previous
+#endif
--- libgcc/config/i386/savms64f.h.jj2021-01-04 10:25:53.906063148 +0100
+++ libgcc/config/i386/savms64f.h   2021-01-26 19:22:18.064654821 +0100
@@ -55,3 +55,8 @@ MS2SYSV_STUB_END(savms64f_16)
 MS2SYSV_STUB_END(savms64f_17)
 
 #endif /* __x86_64__ */
+
+#if defined(__ELF__) && defined(__linux__)
+   .section .note.GNU-stack,"",@progbits
+   .previous
+#endif
--- libgcc/config/i386/resms64.h.jj 2021-01-04 10:25:53.904063171 +0100
+++ libgcc/config/i386/resms64.h2021-01-26 19:21:45.486022557 +0100
@@ -57,3 +57,8 @@ MS2SYSV_STUB_END(resms64_17)
 MS2SYSV_STUB_END(resms64_18)
 
 #endif /* __x86_64__ */
+
+#if defined(__ELF__) && defined(__linux__)
+   .section .note.GNU-stack,"",@progbits
+   .previous
+#endif
--- libgcc/config/i386/resms64f.h.jj2021-01-04 10:25:53.910063103 +0100
+++ libgcc/config/i386/resms64f.h   2021-01-26 19:21:54.024926172 +0100
@@ -55,3 +55,8 @@ MS2SYSV_STUB_END(resms64f_16)
 MS2SYSV_STUB_END(resms64f_17)
 
 #endif /* __x86_64__ */
+
+#if defined(__ELF__) && defined(__linux__)
+   .section .note.GNU-stack,"",@progbits
+   .previous
+#endif
--- libgcc/config/i386/resms64x.h.jj2021-01-04 10:25:53.901063205 +0100
+++ libgcc/config/i386/resms64x.h   2021-01-26 19:22:07.017779514 +0100
@@ -63,3 +63,8 @@ MS2SYSV_STUB_END(resms64x_17)
 MS2SYSV_STUB_END(resms64x_18)
 
 #endif /* __x86_64__ */
+
+#if defined(__ELF__) && defined(__linux__)
+   .section .note.GNU-stack,"",@progbits
+   .previous
+#endif
--- libgcc/config/i386/resms64fx.h.jj   2021-01-04 10:25:53.901063205 +0100
+++ libgcc/config/i386/resms64fx.h  2021-01-26 19:22:00.943848074 +0100
@@ -62,3 +62,8 @@ MS2SYSV_STUB_END(resms64fx_16)
 MS2SYSV_STUB_END(resms64fx_17)
 
 #endif /* __x86_64__ */
+
+#if defined(__ELF__) && defined(__linux__)
+   .section .note.GNU-stack,"",@progbits
+   .previous
+#endif

Jakub



Re: [PATCH 1/4] unroll: Add middle-end unroll factor estimation

2021-01-27 Thread Kewen.Lin via Gcc-patches
on 2021/1/26 6:53 PM, Richard Biener wrote:
> On Tue, 26 Jan 2021, Kewen.Lin wrote:
> 
>> Hi Segher/Richard B./Richard S.,
>>
>> Many thanks for your all helps and comments on this!
>>
>> on 2021/1/25 3:56 PM, Richard Biener wrote:
>>> On Fri, 22 Jan 2021, Segher Boessenkool wrote:
>>>
 On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
> On Thu, 21 Jan 2021, Segher Boessenkool wrote:
>> What is holding up this patch still?  Ke Wen has pinged it every month
>> since May, and there has still not been a review.

 Richard Sandiford wrote:
> FAOD (since I'm on cc:), I don't feel qualified to review this.
> Tree-level loop stuff isn't really my area.

 And Richard Biener wrote:
> I don't like it, it feels wrong but I don't have a good suggestion
> that had positive feedback.  Since a reviewer / approver is indirectly
> responsible for at least the design I do not want to ack this patch.
> Bin made forward progress on the other parts of the series but clearly
> there's somebody missing with the appropriate privileges who feels
> positive about the patch and its general direction.
>
> Sorry to be of no help here.

 How unfortunate :-(

 So, first off, this will then have to work for next stage 1 to make any
 progress.  Rats.

 But what could have been done differently that would have helped?  Of
 course Ke Wen could have written a better patch (aka one that is more
 acceptable); either of you could have made your current replies earlier,
 so that it is clear help needs to be sought elsewhere; and I could have
 pushed people earlier, too.  No one really did anything wrong, I'm not
 seeking who to blame, I'm just trying to find out how to prevent
 deadlocks like this in the future (where one party waits for replies
 that will never come).

 Is it just that we have a big gaping hole in reviewers with experience
 in such loop optimisations?
>>>
>>> May be.  But what I think is the biggest problem is that we do not
>>> have a good way to achieve what the patch tries (if you review the
>>> communications you'll see many ideas tossed around) first and foremost
>>> because IV selection is happening early on GIMPLE and unrolling
>>> happens late on RTL.  Both need a quite accurate estimate of costs
>>> but unrolling has an ever harder time than IV selection where we've
>>> got along with throwing dummy RTL at costing functions.
>>>
>>
>> Yeah, exactly.
>>
>>> IMHO the patch is the wrong "start" to try fixing the issue and my
>>> fear is that wiring this kind of "features" into the current
>>> (fundamentally broken) state will make it much harder to rework
>>> that state without introducing regressions on said features (I'm
>>> there with trying to turn the vectorizer upside down - for three
>>> years now, struggling to not regress any of the "features" we've
>>> accumulated for various targets where most of them feel a
>>> "bolted-on" rather than well-designed ;/).
>>>
>>
>> OK, understandable.
>>
>>> I think IV selection and unrolling (and scheduling FWIW) need to move
>>> closer together.  I do not have a good idea how that can work out
>>> though but I very much believe that this "most wanted" GIMPLE unroller
>>> will not be a good way of progressing here.  Maybe taking the bullet
>>> and moving IV selection back to RTL is the answer.
>>>
>>
>> I haven't looked into loop-iv.c, but IVOPTS on GIMPLE can leverage
>> SCEV analysis for IV detection; if we moved it to RTL, wouldn't it be
>> much more expensive to detect the full IV set there?
>>
>>> For a "short term" solution I still think that trying to perform
>>> unrolling and IV selection (for the D-form case you're targeting)
>>> at the same time is a better design, even if it means complicating
>>> the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
>>> any good idea about scheduling).  There are currently 20+ GIMPLE
>>> optimization passes and 10+ RTL optimization passes between
>>> IV selection and unrolling, the idea that you can have transform
>>> decision and transform apply this far apart looks scary.
>>>
>>
>> I have some questions in mind for this part, for "perform unrolling
>> and IV selection at the same time", it can be interpreted to two
>> different implementation ways to me:
>>
>> 1) Run one gimple unrolling pass just before IVOPTS, probably using
>>the same gate for IVOPTS.  The unrolling factor is computed by
>>the same method as that of RTL unrolling.  But this sounds very
>>like "most wanted gimple unrolling" which is what we want to avoid.
>>
>>The positive aspect here is what IVOPTS faces is already one unrolled
>>loop, it can see the whole picture and get the optimal IV set.  The
>>downside/question is how we position these gimple unrolling and RTL
>>unrolling passes, whether we still need RTL unrolling.  If no, it's
>>doubtable that one gimple unrolling can fully rep

Re: [PATCH] libgcc, i386: Add .note.GNU-stack sections to the ms sse/avx sav/res

2021-01-27 Thread Uros Bizjak via Gcc-patches
On Wed, Jan 27, 2021 at 10:26 AM Jakub Jelinek  wrote:
>
> Hi!
>
> On Linux, GCC emits .note.GNU-stack sections when compiling code to mark
> the code as not needing or needing executable stack, missing section means
> unknown.  But assembly files need to be marked manually.  We already
> mark various *.S files in libgcc manually, but the
> avx_resms64f.o
> avx_resms64fx.o
> avx_resms64.o
> avx_resms64x.o
> avx_savms64f.o
> avx_savms64.o
> sse_resms64f.o
> sse_resms64fx.o
> sse_resms64.o
> sse_resms64x.o
> sse_savms64f.o
> sse_savms64.o
> files aren't marked, so when something links it in, it will require
> executable stack.  Nothing in the assembly requires executable stack though.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2021-01-27  Jakub Jelinek  
>
> * config/i386/savms64.h: Add .note.GNU-stack section on Linux.
> * config/i386/savms64f.h: Likewise.
> * config/i386/resms64.h: Likewise.
> * config/i386/resms64f.h: Likewise.
> * config/i386/resms64x.h: Likewise.
> * config/i386/resms64fx.h: Likewise.

LGTM.

Thanks,
Uros.

>
> --- libgcc/config/i386/savms64.h.jj 2021-01-04 10:25:53.912063080 +0100
> +++ libgcc/config/i386/savms64.h2021-01-26 19:22:12.371719078 +0100
> @@ -57,3 +57,8 @@ MS2SYSV_STUB_END(savms64_17)
>  MS2SYSV_STUB_END(savms64_18)
>
>  #endif /* __x86_64__ */
> +
> +#if defined(__ELF__) && defined(__linux__)
> +   .section .note.GNU-stack,"",@progbits
> +   .previous
> +#endif
> --- libgcc/config/i386/savms64f.h.jj2021-01-04 10:25:53.906063148 +0100
> +++ libgcc/config/i386/savms64f.h   2021-01-26 19:22:18.064654821 +0100
> @@ -55,3 +55,8 @@ MS2SYSV_STUB_END(savms64f_16)
>  MS2SYSV_STUB_END(savms64f_17)
>
>  #endif /* __x86_64__ */
> +
> +#if defined(__ELF__) && defined(__linux__)
> +   .section .note.GNU-stack,"",@progbits
> +   .previous
> +#endif
> --- libgcc/config/i386/resms64.h.jj 2021-01-04 10:25:53.904063171 +0100
> +++ libgcc/config/i386/resms64.h2021-01-26 19:21:45.486022557 +0100
> @@ -57,3 +57,8 @@ MS2SYSV_STUB_END(resms64_17)
>  MS2SYSV_STUB_END(resms64_18)
>
>  #endif /* __x86_64__ */
> +
> +#if defined(__ELF__) && defined(__linux__)
> +   .section .note.GNU-stack,"",@progbits
> +   .previous
> +#endif
> --- libgcc/config/i386/resms64f.h.jj2021-01-04 10:25:53.910063103 +0100
> +++ libgcc/config/i386/resms64f.h   2021-01-26 19:21:54.024926172 +0100
> @@ -55,3 +55,8 @@ MS2SYSV_STUB_END(resms64f_16)
>  MS2SYSV_STUB_END(resms64f_17)
>
>  #endif /* __x86_64__ */
> +
> +#if defined(__ELF__) && defined(__linux__)
> +   .section .note.GNU-stack,"",@progbits
> +   .previous
> +#endif
> --- libgcc/config/i386/resms64x.h.jj2021-01-04 10:25:53.901063205 +0100
> +++ libgcc/config/i386/resms64x.h   2021-01-26 19:22:07.017779514 +0100
> @@ -63,3 +63,8 @@ MS2SYSV_STUB_END(resms64x_17)
>  MS2SYSV_STUB_END(resms64x_18)
>
>  #endif /* __x86_64__ */
> +
> +#if defined(__ELF__) && defined(__linux__)
> +   .section .note.GNU-stack,"",@progbits
> +   .previous
> +#endif
> --- libgcc/config/i386/resms64fx.h.jj   2021-01-04 10:25:53.901063205 +0100
> +++ libgcc/config/i386/resms64fx.h  2021-01-26 19:22:00.943848074 +0100
> @@ -62,3 +62,8 @@ MS2SYSV_STUB_END(resms64fx_16)
>  MS2SYSV_STUB_END(resms64fx_17)
>
>  #endif /* __x86_64__ */
> +
> +#if defined(__ELF__) && defined(__linux__)
> +   .section .note.GNU-stack,"",@progbits
> +   .previous
> +#endif
>
> Jakub
>


Re: [PATCH] i386: Add peephole2 for __atomic_sub_fetch (x, y, z) == 0 [PR98737]

2021-01-27 Thread Uros Bizjak via Gcc-patches
On Wed, Jan 27, 2021 at 10:20 AM Jakub Jelinek  wrote:
>
> Hi!
>
> This patch adds a peephole2 for the optimization requested in the PR,
> namely that we emit awful code for __atomic_sub_fetch (x, y, z) == 0
> or __atomic_sub_fetch (x, y, z) != 0 when y is not constant.
> This can't be done in the combiner which punts on combining UNSPEC_VOLATILE
> into other insns.
>
> For other ops we'd need different peephole2s, this one is specific with its
> comparison instruction and negation that need to be matched.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux.  Is this ok for trunk
> (as exception), or for GCC 12?

If there is no urgent need, I'd rather we obey stage-4 and wait
for gcc-12.  There is the PR98375 meta bug to track gcc-12 pending patches.

> 2021-01-27  Jakub Jelinek  
>
> PR target/98737
> * config/i386/sync.md (neg; mov; lock xadd; add peephole2): New
> define_peephole2.
> (*atomic_fetch_sub_cmp): New define_insn.
>
> * gcc.target/i386/pr98737.c: New test.

OK, although this peephole is quite complex and the matched sequence is
easily perturbed.  Please note that the reg-reg move is emitted by the RA
to satisfy a register constraint; if the value is already in the right
register, then the sequence won't match.  Do we need an additional pattern
with the reg-reg move omitted?

In the PR, Ulrich suggested also handling other arith/logic
operations, but matching these would be even harder, as they are
emitted using a cmpxchg loop.  Maybe the middle-end could emit a special
"boolean" version of the atomic insn when only the flags are needed?

Uros.

> --- gcc/config/i386/sync.md.jj  2021-01-04 10:25:45.392159555 +0100
> +++ gcc/config/i386/sync.md 2021-01-26 16:03:13.911100510 +0100
> @@ -777,6 +777,63 @@ (define_insn "*atomic_fetch_add_cmp<mode>"
>return "lock{%;} %K3add{<imodesuffix>}\t{%1, %0|%0, %1}";
>  })
>
> +;; Similarly, peephole for __sync_sub_fetch (x, b) == 0 into just
> +;; lock sub followed by testing of flags instead of lock xadd, negation and
> +;; comparison.
> +(define_peephole2
> +  [(parallel [(set (match_operand 0 "register_operand")
> +  (neg (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (set (match_operand:SWI 1 "register_operand")
> +   (match_operand:SWI 2 "register_operand"))
> +   (parallel [(set (match_operand:SWI 3 "register_operand")
> +  (unspec_volatile:SWI
> +[(match_operand:SWI 4 "memory_operand")
> + (match_operand:SI 5 "const_int_operand")]
> +UNSPECV_XCHG))
> + (set (match_dup 4)
> +  (plus:SWI (match_dup 4)
> +(match_dup 3)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (reg:CCZ FLAGS_REG)
> +  (compare:CCZ (neg:SWI
> + (match_operand:SWI 6 "register_operand"))
> +   (match_dup 3)))
> + (clobber (match_dup 3))])]
> +  "(GET_MODE (operands[0]) == mode
> +|| GET_MODE (operands[0]) == mode)
> +   && reg_or_subregno (operands[0]) == reg_or_subregno (operands[2])
> +   && (rtx_equal_p (operands[2], operands[3])
> +   ? rtx_equal_p (operands[1], operands[6])
> +   : (rtx_equal_p (operands[2], operands[6])
> + && rtx_equal_p (operands[1], operands[3])))
> +   && peep2_reg_dead_p (4, operands[6])
> +   && peep2_reg_dead_p (4, operands[3])
> +   && !reg_overlap_mentioned_p (operands[1], operands[4])
> +   && !reg_overlap_mentioned_p (operands[2], operands[4])"
> +  [(parallel [(set (reg:CCZ FLAGS_REG)
> +  (compare:CCZ
> +(unspec_volatile:SWI [(match_dup 4) (match_dup 5)]
> + UNSPECV_XCHG)
> +(match_dup 2)))
> + (set (match_dup 4)
> +  (minus:SWI (match_dup 4)
> + (match_dup 2)))])])
> +
> +(define_insn "*atomic_fetch_sub_cmp<mode>"
> +  [(set (reg:CCZ FLAGS_REG)
> +   (compare:CCZ
> + (unspec_volatile:SWI
> +   [(match_operand:SWI 0 "memory_operand" "+m")
> +(match_operand:SI 2 "const_int_operand")]  ;; model
> +   UNSPECV_XCHG)
> + (match_operand:SWI 1 "register_operand" "r")))
> +   (set (match_dup 0)
> +   (minus:SWI (match_dup 0)
> +  (match_dup 1)))]
> +  ""
> +  "lock{%;} %K2sub{<imodesuffix>}\t{%1, %0|%0, %1}")
> +
>  ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
>  ;; In addition, it is always a full barrier, so we can ignore the memory 
> model.
>  (define_insn "atomic_exchange<mode>"
> --- gcc/testsuite/gcc.target/i386/pr98737.c.jj  2021-01-26 15:59:24.640620178 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr98737.c 2021-01-26 16:00:02.898205888 
> +0100
> @@ -0,0 +1,38 @@
> +/* PR target/98737 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -masm=att" } */
> +/* { dg-additional-options "-march=i686" { target ia32 } } */
> +/

Re: [PATCH] i386: Add peephole2 for __atomic_sub_fetch (x, y, z) == 0 [PR98737]

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 27, 2021 at 11:22:57AM +0100, Uros Bizjak wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux.  Is this ok for trunk
> > (as exception), or for GCC 12?
> 
> If there is no urgent need, I'd rather see to obey stage-4 and wait
> for gcc-12. There is PR98375 meta bug to track gcc-12 pending patches.

Okay.

> > 2021-01-27  Jakub Jelinek  
> >
> > PR target/98737
> > * config/i386/sync.md (neg; mov; lock xadd; add peephole2): New
> > define_peephole2.
> > (*atomic_fetch_sub_cmp<mode>): New define_insn.
> >
> > * gcc.target/i386/pr98737.c: New test.
> 
> OK, although this peephole is quite complex and the matched sequence is
> easily perturbed. Please note that the reg-reg move is there due to the
> RA satisfying a register constraint; if the value is already in the
> right register, then the sequence won't match. Do we need an additional
> pattern with the reg-reg move omitted?

If there is no reg-reg move, then it is impossible to prove that it is a
negation.  The use of lock xadd forces addition instead of subtraction,
and additionally modifies its result, so for the comparison one needs
another register that initially holds the same value as the xadd
operand.  And we need to prove it is a negation.

> In the PR, Ulrich suggested to also handle other arith/logic
> operations, but matching these would be even harder, as they are
> emitted using cmpxchg loop. Maybe middle-end could emit a special
> version of the "boolean" atomic insn, if only flags are needed?

I guess we could add new optabs for the atomic builtins whose result
(with the *_fetch operation rather than fetch_*) is ==/!= compared against 0;
not sure if we could do anything else easily, because what exact kind of
comparison it is then is heavily machine dependent and the backend would
then need to emit everything including branches (like e.g. the addv<mode>4
etc. expanders).
Would equality comparison against 0 handle the most common cases?

The user can write it as
__atomic_sub_fetch (x, y, z) == 0
or
__atomic_fetch_sub (x, y, z) - y == 0
though, so the expansion code would need to be able to cope with both.
And the latter form is where all kinds of interfering optimizations pop up,
e.g. for the subtraction it will be actually optimized into
__atomic_fetch_sub (x, y, z) == y
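The two equivalent user-level spellings discussed above can be illustrated with a minimal C++ sketch using GCC's __atomic builtins (the function names here are made up for the example):

```cpp
#include <cassert>  // for the usage checks

// Form 1: the *_fetch builtin returns the new value, so the flags set by
// the lock-prefixed subtraction could be reused directly.
bool sub_then_test (long *x, long y)
{
  return __atomic_sub_fetch (x, y, __ATOMIC_SEQ_CST) == 0;
}

// Form 2: fetch_* returns the old value; GCC optimizes the comparison
// below into __atomic_fetch_sub (x, y, ...) == y, which the expansion
// code would also have to recognize.
bool fetch_then_sub (long *x, long y)
{
  return __atomic_fetch_sub (x, y, __ATOMIC_SEQ_CST) - y == 0;
}
```

Both predicates are true exactly when the subtraction drives the stored value to zero.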

Jakub



Re: [PATCH] [x86] Fix ICE [PR target/98833]

2021-01-27 Thread Hongtao Liu via Gcc-patches
On Wed, Jan 27, 2021 at 5:03 PM Jakub Jelinek  wrote:
>
> On Wed, Jan 27, 2021 at 03:22:45PM +0800, Hongtao Liu wrote:
> > Hi:
> >   As described in PR, also remove the relevant and useless expanders
> > and builtins, the user can
> > directly use == and >, without calling the builtin function.
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >
> > gcc/ChangeLog:
> >
> > PR target/98833
> > * config/i386/i386-builtin.def (BDESC): Delete
> > __builtin_ia32_pcmpeqb128, __builtin_ia32_pcmpeqw128,
> > __builtin_ia32_pcmpeqd128, __builtin_ia32_pcmpgtb128,
> > __builtin_ia32_pcmpgtw128, __builtin_ia32_pcmpgtd128,
> > __builtin_ia32_pcmpeqb256, __builtin_ia32_pcmpeqw256,
> > __builtin_ia32_pcmpeqd256, __builtin_ia32_pcmpeqq256,
> > __builtin_ia32_pcmpgtb256, __builtin_ia32_pcmpgtw256,
> > __builtin_ia32_pcmpgtd256, __builtin_ia32_pcmpgtq256.
> > * config/i386/sse.md (avx2_eq<mode>3): Deleted.
> > (sse2_eq<mode>3): Ditto.
> > (sse2_gt<mode>3): Renamed to ..
> > (*sse2_gt<mode>3): And drop !TARGET_XOP in condition.
> > (*sse2_eq<mode>3): Drop !TARGET_XOP in condition.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/98833
> > * gcc.target/i386/pr98833.c: New test.
> >
> > libcpp/
> >
> > PR target/98833
> > * lex.c (search_line_sse2): Replace builtins with == operator.
>
> Oops, I wasn't aware of the libcpp use, I'm afraid that means we should
> reconsider removing the builtins, because that means that e.g.
> GCC 10 will not build with GCC 11 as system compiler anymore, people
> bisecting GCC changes will have troubles etc.
> And a codesearch seems to show that other projects do use these builtins
> (even the AVX2 ones) too :(.
> I think the libcpp/ change is ok, as we've bumped minimum GCC version
> for building GCC to GCC 4.8 in GCC 11.  Note that GCC 4.7 in C++
> doesn't handle the == operators:
> error: invalid operands of types ‘v16qi {aka __vector(16) char}’ and ‘v16qi 
> {aka __vector(16) char}’ to binary ‘operator==’
> but 4.8 seems to work fine already.
>
> But, can we perhaps keep the builtins, but fold them immediately in
> ix86_gimple_fold_builtin so that we don't need the named patterns?
>
> Though, I guess that we could defer those changes to GCC 12.
>
> So can you just drop the !TARGET_XOP from conditions, add testcase and
> change libcpp/ for GCC 11 and do the rest for GCC 12?
>
Yes, and update patch.
> Thanks.
>
> Jakub
>


-- 
BR,
Hongtao


0001-Fix-ICE-for-PR-target-98833_v2.patch
Description: Binary data


Re: [PATCH] [x86] Fix ICE [PR target/98833]

2021-01-27 Thread Hongtao Liu via Gcc-patches
On Wed, Jan 27, 2021 at 6:38 PM Hongtao Liu  wrote:
>
> On Wed, Jan 27, 2021 at 5:03 PM Jakub Jelinek  wrote:
> >
> > On Wed, Jan 27, 2021 at 03:22:45PM +0800, Hongtao Liu wrote:
> > > Hi:
> > >   As described in PR, also remove the relevant and useless expanders
> > > and builtins, the user can
> > > directly use == and >, without calling the builtin function.
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/98833
> > > * config/i386/i386-builtin.def (BDESC): Delete
> > > __builtin_ia32_pcmpeqb128, __builtin_ia32_pcmpeqw128,
> > > __builtin_ia32_pcmpeqd128, __builtin_ia32_pcmpgtb128,
> > > __builtin_ia32_pcmpgtw128, __builtin_ia32_pcmpgtd128,
> > > __builtin_ia32_pcmpeqb256, __builtin_ia32_pcmpeqw256,
> > > __builtin_ia32_pcmpeqd256, __builtin_ia32_pcmpeqq256,
> > > __builtin_ia32_pcmpgtb256, __builtin_ia32_pcmpgtw256,
> > > __builtin_ia32_pcmpgtd256, __builtin_ia32_pcmpgtq256.
> > > * config/i386/sse.md (avx2_eq<mode>3): Deleted.
> > > (sse2_eq<mode>3): Ditto.
> > > (sse2_gt<mode>3): Renamed to ..
> > > (*sse2_gt<mode>3): And drop !TARGET_XOP in condition.
> > > (*sse2_eq<mode>3): Drop !TARGET_XOP in condition.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/98833
> > > * gcc.target/i386/pr98833.c: New test.
> > >
> > > libcpp/
> > >
> > > PR target/98833
> > > * lex.c (search_line_sse2): Replace builtins with == operator.
> >
> > Oops, I wasn't aware of the libcpp use, I'm afraid that means we should
> > reconsider removing the builtins, because that means that e.g.
> > GCC 10 will not build with GCC 11 as system compiler anymore, people
> > bisecting GCC changes will have troubles etc.
> > And a codesearch seems to show that other projects do use these builtins
> > (even the AVX2 ones) too :(.
> > I think the libcpp/ change is ok, as we've bumped minimum GCC version
> > for building GCC to GCC 4.8 in GCC 11.  Note that GCC 4.7 in C++
> > doesn't handle the == operators:
> > error: invalid operands of types ‘v16qi {aka __vector(16) char}’ and ‘v16qi 
> > {aka __vector(16) char}’ to binary ‘operator==’
> > but 4.8 seems to work fine already.
> >
> > But, can we perhaps keep the builtins, but fold them immediately in
> > ix86_gimple_fold_builtin so that we don't need the named patterns?
> >
> > Though, I guess that we could defer those changes to GCC 12.
> >
> > So can you just drop the !TARGET_XOP from conditions, add testcase and
> > change libcpp/ for GCC 11 and do the rest for GCC 12?
> >
> Yes, and update patch.
Corrected gcc ChangeLog:

gcc/ChangeLog:

PR target/98833
* config/i386/sse.md (sse2_gt<mode>3): Drop !TARGET_XOP in condition.
(*sse2_eq<mode>3): Ditto.

> > Thanks.
> >
> > Jakub
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
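For reference, the replacement the patch relies on — using == and > directly on vector types instead of the removed builtins — looks like this as a minimal sketch (requires GCC 4.8 or later, or clang, per the discussion above):

```cpp
#include <cassert>

// The == and > operators on generic vector types are what libcpp's
// search_line_sse2 now uses instead of __builtin_ia32_pcmpeqb128 etc.
typedef signed char v16qi __attribute__ ((__vector_size__ (16)));

v16qi eq (v16qi a, v16qi b)
{
  return a == b;  // lanes are -1 where equal, 0 otherwise (pcmpeqb)
}

v16qi gt (v16qi a, v16qi b)
{
  return a > b;   // signed per-lane greater-than (pcmpgtb)
}
```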


Re: [PATCH] [x86] Fix ICE [PR target/98833]

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 27, 2021 at 06:38:23PM +0800, Hongtao Liu wrote:
> Yes, and update patch.

Ok, thanks.

Jakub



[PATCH] Add SSA coalescing verification (disabled)

2021-01-27 Thread Richard Biener
This adds a helper to allow verifying of abnormal coalescing
at pass boundaries.  It helps debugging issues like PR98845
since it's not always obvious where invalid overlapping life
ranges of abnormals were introduced.  The verifier is expensive
so I've added it in a #if 0 block in the usual places we do
IL verification.

Bootstrapped and tested (with the checker enabled) on 
x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

2021-01-27  Richard Biener  

* tree-ssa-coalesce.h (verify_ssa_coalescing): Declare.
* tree-ssa-coalesce.c (verify_ssa_coalescing): New.
(create_coalesce_list_for_region): Do not assert we have
a default def for RESULT_DECLs.
(coalesce_with_default): Handle not existing default defs.
* passes.c (execute_function_todo): Call verify_ssa_coalescing,
commented out.
---
 gcc/passes.c| 14 +++---
 gcc/tree-ssa-coalesce.c | 15 ---
 gcc/tree-ssa-coalesce.h |  1 +
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/gcc/passes.c b/gcc/passes.c
index 4fb1be99ce4..3c90dc9db99 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h" /* for fnotice */
 #include "stringpool.h"
 #include "attribs.h"
+#include "tree-ssa-coalesce.h"
 
 using namespace gcc;
 
@@ -2044,9 +2045,16 @@ execute_function_todo (function *fn, void *data)
verify_gimple_in_seq (gimple_body (cfun->decl));
}
  if (cfun->curr_properties & PROP_ssa)
-   /* IPA passes leave stmts to be fixed up, so make sure to
-  not verify SSA operands whose verifier will choke on that.  */
-   verify_ssa (true, !from_ipa_pass);
+   {
+ /* IPA passes leave stmts to be fixed up, so make sure to
+not verify SSA operands whose verifier will choke on that.  */
+ verify_ssa (true, !from_ipa_pass);
+#if 0
+ /* If you want to debug a SSA coalescing issue, uncomment.  */
+ if (!from_ipa_pass)
+   verify_ssa_coalescing ();
+#endif
+   }
  /* IPA passes leave basic-blocks unsplit, so make sure to
 not trip on that.  */
  if ((cfun->curr_properties & PROP_cfg)
diff --git a/gcc/tree-ssa-coalesce.c b/gcc/tree-ssa-coalesce.c
index 77ccd6dd618..277136afd45 100644
--- a/gcc/tree-ssa-coalesce.c
+++ b/gcc/tree-ssa-coalesce.c
@@ -1017,7 +1017,7 @@ coalesce_with_default (tree var, coalesce_list *cl, bitmap used_in_copy)
 return;
 
   tree ssa = ssa_default_def (cfun, SSA_NAME_VAR (var));
-  if (!has_zero_uses (ssa))
+  if (!ssa || !has_zero_uses (ssa))
 return;
 
   add_cost_one_coalesce (cl, SSA_NAME_VERSION (ssa), SSA_NAME_VERSION (var));
@@ -1124,8 +1124,8 @@ create_coalesce_list_for_region (var_map map, bitmap used_in_copy)
if (!rhs1)
  break;
tree lhs = ssa_default_def (cfun, res);
-   gcc_assert (lhs);
-   if (TREE_CODE (rhs1) == SSA_NAME
+   if (lhs
+   && TREE_CODE (rhs1) == SSA_NAME
&& gimple_can_coalesce_p (lhs, rhs1))
  {
v1 = SSA_NAME_VERSION (lhs);
@@ -1759,3 +1759,12 @@ coalesce_ssa_name (var_map map)
   ssa_conflicts_delete (graph);
 }
 
+/* Verify that we can coalesce SSA names we must coalesce.  */
+
+DEBUG_FUNCTION void
+verify_ssa_coalescing (void)
+{
+  var_map map = init_var_map (num_ssa_names);
+  coalesce_ssa_name (map);
+  delete_var_map (map);
+}
diff --git a/gcc/tree-ssa-coalesce.h b/gcc/tree-ssa-coalesce.h
index 7e1447bed09..213922433a1 100644
--- a/gcc/tree-ssa-coalesce.h
+++ b/gcc/tree-ssa-coalesce.h
@@ -22,5 +22,6 @@ along with GCC; see the file COPYING3.  If not see
 
 extern void coalesce_ssa_name (var_map);
 extern bool gimple_can_coalesce_p (tree, tree);
+extern void verify_ssa_coalescing (void);
 
 #endif /* GCC_TREE_SSA_COALESCE_H */
-- 
2.26.2


Re: [PATCH] i386: Add peephole2 for __atomic_sub_fetch (x, y, z) == 0 [PR98737]

2021-01-27 Thread Ulrich Drepper via Gcc-patches
On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> Would equality comparison against 0 handle the most common cases.
> 
> The user can write it as
> __atomic_sub_fetch (x, y, z) == 0
> or
> __atomic_fetch_sub (x, y, z) - y == 0
> thouch, so the expansion code would need to be able to cope with both.

Please also keep !=0, <0, <=0, >0, and >=0 in mind.  They all can be
useful and can be handled with the flags.





[Patch, fortran] PR93924/5 - [OOP] ICE with procedure pointer

2021-01-27 Thread Paul Richard Thomas via Gcc-patches
This patch fixes PRs 93924/5. It is another 'obvious' patch, whose
consequences are very limited.

I am trying to slip in as many small ready-to-go patches as I can before we
go too far into stage 4. It would be nice to have the patch for PR98573
(posted 23rd Jan) OK'd before the end of the week.

Cheers

Paul

Fortran: Fix ICE due to elemental procedure pointers [PR93924/5].

2021-01-27  Paul Thomas  

gcc/fortran
PR fortran/93924
PR fortran/93925
* trans-expr.c (gfc_conv_procedure_call): Suppress the call to
gfc_conv_intrinsic_to_class for unlimited polymorphic procedure
pointers.
(gfc_trans_assignment_1): Similarly suppress class assignment
for class valued procedure pointers.

gcc/testsuite/
PR fortran/93924
PR fortran/93925
* gfortran.dg/proc_ptr_52.f90 : New test.
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 7150e48bc93..b0c8d577ca5 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -5772,7 +5772,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  CLASS_DATA (fsym)->attr.class_pointer
  || CLASS_DATA (fsym)->attr.allocatable);
 	}
-  else if (UNLIMITED_POLY (fsym) && e->ts.type != BT_CLASS)
+  else if (UNLIMITED_POLY (fsym) && e->ts.type != BT_CLASS
+	   && gfc_expr_attr (e).flavor != FL_PROCEDURE)
 	{
 	  /* The intrinsic type needs to be converted to a temporary
 	 CLASS object for the unlimited polymorphic formal.  */
@@ -11068,7 +11069,8 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 		   || gfc_is_class_array_ref (expr1, NULL)
 		   || gfc_is_class_scalar_expr (expr1)
 		   || gfc_is_class_array_ref (expr2, NULL)
-		   || gfc_is_class_scalar_expr (expr2));
+		   || gfc_is_class_scalar_expr (expr2))
+		   && lhs_attr.flavor != FL_PROCEDURE;

   realloc_flag = flag_realloc_lhs
 		 && gfc_is_reallocatable_lhs (expr1)
! { dg-do run }
!
! Test the fix for PRs93924 & 93925.
!
! Contributed by Martin Stein  
!
module cs

implicit none

integer, target :: integer_target

abstract interface
   function classStar_map_ifc(x) result(y)
  class(*), pointer:: y
  class(*), target, intent(in) :: x
   end function classStar_map_ifc
end interface

contains

   function fun(x) result(y)
  class(*), pointer:: y
  class(*), target, intent(in) :: x
  select type (x)
  type is (integer)
 integer_target = x  ! Deals with dangling target.
 y => integer_target
  class default
 y => null()
  end select
   end function fun

   function apply(f, x) result(y)
  procedure(classStar_map_ifc) :: f
  integer, intent(in) :: x
  integer :: y
  class(*), pointer :: p
  y = 0  ! Get rid of 'y' undefined warning
  p => f (x)
  select type (p)
  type is (integer)
 y = p
  end select
   end function apply

   function selector() result(f)
  procedure(classStar_map_ifc), pointer :: f
  f => fun
   end function selector

end module cs


program classStar_map

use cs
implicit none

integer :: x, y
procedure(classStar_map_ifc), pointer :: f

x = 123654
f => selector ()   ! Fixed by second chunk in patch
y = apply (f, x)   ! Fixed by first chunk in patch
if (x .ne. y) stop 1

x = 2 * x
y = apply (fun, x) ! PR93925; fixed as above
if (x .ne. y) stop 2

end program classStar_map


RE: [PATCH] aarch64: Use RTL builtins for integer mla_n intrinsics

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 26 January 2021 11:43
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH] aarch64: Use RTL builtins for integer mla_n intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites integer mla_n Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better scheduling
> and optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-01-15  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add mla_n builtin
> generator macro.
> * config/aarch64/aarch64-simd.md (*aarch64_mla_elt_merge<mode>):
> Rename to...
> (aarch64_mla_n<mode>): This.
> * config/aarch64/arm_neon.h (vmla_n_s16): Use RTL builtin
> instead of asm.
> (vmla_n_s32): Likewise.
> (vmla_n_u16): Likewise.
> (vmla_n_u32): Likewise.
> (vmlaq_n_s16): Likewise.
> (vmlaq_n_s32): Likewise.
> (vmlaq_n_u16): Likewise.
> (vmlaq_n_u32): Likewise.
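As context, the lanewise operation behind these intrinsics is multiply-accumulate with a scalar; a plain scalar model of vmla_n_s16 (illustrative only, not the NEON implementation):

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of vmla_n_s16: acc[i] += b[i] * c for each lane.  Moving
// the intrinsic from inline asm to an RTL builtin lets GCC schedule and
// optimize the real mla instruction rather than treating it as opaque.
void mla_n_s16 (std::int16_t *acc, const std::int16_t *b, std::int16_t c,
                std::size_t lanes)
{
  for (std::size_t i = 0; i < lanes; ++i)
    acc[i] = static_cast<std::int16_t> (acc[i] + b[i] * c);
}
```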



Re: [Patch, fortran] PR98573 - Dynamic type lost on assignment

2021-01-27 Thread Thomas Koenig via Gcc-patches

Hi Paul,


This is a relatively obvious patch. The chunk in trans-array.c is not part
of the fix for the PR but does suppress some of the bad dtype's that arise
from allocation of class objects. The part in trans-stmt.c provides vptrs
for all class allocations if the expression3 is available.

Regtests on FC33/x86_64


OK.

Thanks for the patch!

Best regards

Thomas


Re: follow SSA defs for asan base

2021-01-27 Thread Alexandre Oliva
On Jan 26, 2021, Richard Biener  wrote:

> So while I think it's safe let's look at if we can improve tree-nested.c,
> like I see (probably not the correct place):

*nod*, it's just not the *only* place.

> seeing how we adjust current_function_decl around the
> recompute_tree_invariant_for_addr_expr call but not the
> gsi_gimplify_val one (we already pass it a nesting_info,
> not sure if wi->info is the same as the 'info' used above though),
> so eventually we can fix it in one place?

There are pieces of nested function lowering for which we set cfun and
current_function_decl while walking each function, and there are other
pieces that just don't bother, and we only set up current_function_decl
temporarily for ADDR_EXPR handling.

This patch adjusts both of the ADDR_EXPR handlers that override
current_function_decl, so that the temporary overriding remains in
effect during the re-gimplification.  That is enough to avoid the
problem.  But I'm not very happy with this temporary overriding, it
seems fishy.  I'd rather we set things up for the entire duration of the
walking of each function.

But that's only relevant because we rely on current_function_decl for
address handling.  It's not clear to me that we should, as the other
patch demonstrated.  With it, we could probably even do away with these
overriders.

But, for this stage, this is probably as conservative a change as we
could possibly hope for.  I've regstrapped it on x86_64-linux-gnu, and
also bootstrapped it with asan and ubsan.  Ok to install?


restore current_function_decl after re-gimplifying nested ADDR_EXPRs

From: Alexandre Oliva 

Ada makes extensive use of nested functions, which turn all automatic
variables of the enclosing function that are used in nested ones into
members of an artificial FRAME record type.

The address of a local variable is usually passed to asan marking
functions without using a temporary.  asan_expand_mark_ifn will reject
an ADDR_EXPR if it's split out from the call into an SSA_NAME.

Taking the address of a member of FRAME within a nested function was
not regarded as a gimple val: while introducing FRAME variables,
current_function_decl pointed to the outermost function, even while
processing a nested function, so decl_address_invariant_p, checking
that the context of the variable is current_function_decl, returned
false for such ADDR_EXPRs.

decl_address_invariant_p, called when determining whether an
expression is a legitimate gimple value, compares the context of
automatic variables with current_function_decl.  Some of the
tree-nested function processing doesn't set current_function_decl, but
ADDR_EXPR-processing bits temporarily override it.  However, they
restore it before re-gimplifying, which causes even ADDR_EXPRs
referencing automatic variables in the FRAME struct of a nested
function to not be regarded as address-invariant.

This patch moves the restores of current_function_decl in the
ADDR_EXPR-handling bits after the re-gimplification, so that the
correct current_function_decl is used when testing for address
invariance.


for  gcc/ChangeLog

* tree-nested.c (convert_nonlocal_reference_op): Move
current_function_decl restore after re-gimplification.
(convert_local_reference_op): Likewise.

for  gcc/testsuite/ChangeLog

* gcc.dg/asan/nested-1.c: New.
---
 gcc/testsuite/gcc.dg/asan/nested-1.c |   24 
 gcc/tree-nested.c|4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c

diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c 
b/gcc/testsuite/gcc.dg/asan/nested-1.c
new file mode 100644
index 0..87e842098077c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/nested-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=address" } */
+
+int f(int i) {
+  auto int h() {
+int r;
+int *p;
+
+{
+  int x[3];
+
+  auto int g() {
+   return x[i];
+  }
+
+  p = &r;
+  *p = g();
+}
+
+return *p;
+  }
+
+  return h();
+}
diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 1b52669b622aa..addd6eef9aba6 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -1214,7 +1214,6 @@ convert_nonlocal_reference_op (tree *tp, int 
*walk_subtrees, void *data)
save_context = current_function_decl;
current_function_decl = info->context;
recompute_tree_invariant_for_addr_expr (t);
-   current_function_decl = save_context;
 
/* If the callback converted the address argument in a context
   where we only accept variables (and min_invariant, presumably),
@@ -1222,6 +1221,7 @@ convert_nonlocal_reference_op (tree *tp, int 
*walk_subtrees, void *data)
if (save_val_only)
  *tp = gsi_gimplify_val ((struct nesting_info *) wi->info,
  t, &wi->gsi);
+   current_function_decl = save_context;
   

[RFC] test builtin ratio for loop distribution

2021-01-27 Thread Alexandre Oliva


This patch attempts to fix a libgcc codegen regression introduced in
gcc-10, as -ftree-loop-distribute-patterns was enabled at -O2.


The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.

Though short constant-length memsets are usually dealt with
efficiently, for non-constant-length ones, the options are a setmemM
pattern or a function call.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.

This patch adds to the loop distribution pass some cost analysis based
on preexisting *_RATIO macros, so that we won't transform loops with
trip counts as low as the ratios we'd rather expand inline.
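A hypothetical example of the shape of loop at issue — a short word-zeroing tail of the kind found at the end of the TFmode emulation routines, which ldist currently replaces with a memset call even though the trip count never exceeds a few iterations:

```cpp
#include <cstddef>

// With -O2 (which enables -ftree-loop-distribute-patterns), this loop is
// recognized as a memset pattern; on targets without a setmemM expander
// (such as RISC-V) that becomes a library call, discarding the known
// alignment of p and the fact that n is tiny.
void zero_tail (unsigned long *p, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    p[i] = 0;
}
```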


This patch is not finished; it needs adjustments to the testsuite, to
make up for the behavior changes it brings about.  Specifically, on a
x86_64-linux-gnu regstrap, it regresses:

> FAIL: gcc.dg/pr53265.c  (test for warnings, line 40)
> FAIL: gcc.dg/pr53265.c  (test for warnings, line 42)
> FAIL: gcc.dg/tree-ssa/ldist-38.c scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++14  scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++17  scan-tree-dump ldist "split to 
> 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++2a  scan-tree-dump ldist "split to 
> 0 loops and 1 library calls"

I suppose just lengthening the loops will take care of ldist-38 and
pr78847, but the loss of the warnings in pr53265 is more concerning, and
will require investigation.

Nevertheless, I seek feedback on whether this is an acceptable approach,
or whether we should use alternate tuning parameters for ldist, or
something entirely different.  Thanks in advance,


for  gcc/ChangeLog

* tree-loop-distribution.c (maybe_normalize_partition): New.
(loop_distribution::distribute_loop): Call it.

[requires testsuite adjustments and investigation of a warning regression]
---
 gcc/tree-loop-distribution.c |   54 ++
 1 file changed, 54 insertions(+)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index bb15fd3723fb6..b5198652817ee 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2848,6 +2848,52 @@ fuse_memset_builtins (vec<struct partition *> *partitions)
 }
 }
 
+/* Return false if it's profitable to turn the LOOP PARTITION into a builtin
+   call, and true if it wasn't, changing the PARTITION to PKIND_NORMAL.  */
+
+static bool
+maybe_normalize_partition (class loop *loop, struct partition *partition)
+{
+  unsigned HOST_WIDE_INT ratio;
+
+  switch (partition->kind)
+{
+case PKIND_NORMAL:
+case PKIND_PARTIAL_MEMSET:
+  return false;
+
+case PKIND_MEMSET:
+  if (integer_zerop (gimple_assign_rhs1 (DR_STMT
+					 (partition->builtin->dst_dr))))
+   ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop));
+  else
+   ratio = SET_RATIO (optimize_loop_for_speed_p (loop));
+  break;
+
+case PKIND_MEMCPY:
+case PKIND_MEMMOVE:
+  ratio = MOVE_RATIO (optimize_loop_for_speed_p (loop));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+return false;
+
+  wide_int minit, maxit;
+  value_range_kind vrk = determine_value_range (niters, &minit, &maxit);
+  if (vrk == VR_RANGE && wi::ltu_p (maxit, ratio))
+{
+  partition->kind = PKIND_NORMAL;
+  return true;
+}
+
+  return false;
+}
+
 void
 loop_distribution::finalize_partitions (class loop *loop,
vec *partitions,
@@ -3087,6 +3133,14 @@ loop_distribution::distribute_loop (class loop *loop, vec<gimple *> stmts,
 }
 
   finalize_partitions (loop, &partitions, &alias_ddrs);
+  {
+bool any_changes_p = false;
+for (i = 0; partitions.iterate (i, &partition); ++i)
+  if (maybe_normalize_partition (loop, partition))
+   any_changes_p = true;
+if (any_changes_p)
+  finalize_partitions (loop, &partitions, &alias_ddrs);
+  }
 
   /* If there is a reduction in all partitions make sure the last one
  is not classified for builtin code generation.  */

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH v2] libstdc++: C++23, implement WG21 P1679R3

2021-01-27 Thread Jonathan Wakely via Gcc-patches

On 15/01/21 01:23 +, Paul Fee via Libstdc++ wrote:

Add contains member function to basic_string_view and basic_string.

The new method is enabled for -std=gnu++20, gnu++2b and c++2b.  This allows
users to access the method as a GNU extension to C++20.  The conditional
test may be reduced to "__cplusplus > 202011L" once GCC has a c++2b switch.

Changes since v1 (13th Jan 2021)
* New:
 Test __cplusplus >= 202011L, rather than __cplusplus > 202011L.

* As suggested by Jonathan Wakely:
 Adjust formatting.
 Test feature-test macro is defined by <string> and <string_view>.
 Correct copyright dates on new files.
 Fix comment typo.

libstdc++-v3/

   Add contains member function to basic_string_view.
   Likewise to basic_string, both with and without _GLIBCXX_USE_CXX11_ABI.
   Enabled with -std=gnu++20, gnu++2b and c++2b.
   * include/bits/basic_string.h (basic_string::contains): New.
   * libstdc++-v3/include/std/string_view (basic_string_view::contains): New.
   * testsuite/21_strings/basic_string/operations/contains/char/1.cc: New test.
   * testsuite/21_strings/basic_string/operations/contains/wchar_t/1.cc:
New test.
   * testsuite/21_strings/basic_string_view/operations/contains/char/1.cc:
New test.
   * testsuite/21_strings/basic_string_view/operations/contains/wchar_t/1.cc:
New test.


I've committed this to master now.

Now that -std=gnu++23 is supported I changed the preprocessor
condition to just check __cplusplus > 202002L (and not enable it for
-std=gnu++20) and changed the tests to use 23 not 2b. I've attached
what I committed.

Thanks for your patch!
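The observable behaviour of the new members is a thin wrapper over find; a free-function model of the semantics (illustrative only, works with any -std):

```cpp
#include <string>

// Model of basic_string::contains / basic_string_view::contains from
// WG21 P1679R3: true iff needle occurs as a substring.
bool contains (const std::string &s, const std::string &needle)
{
  return s.find (needle) != std::string::npos;
}
```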


commit f004d6d9fab9fe732b94f0e7d254700795a37f30
Author: Paul Fee 
Date:   Wed Jan 27 12:11:28 2021

libstdc++: Add string contains member functions for C++2b

This implements WG21 P1679R3, adding contains member functions to
basic_string_view and basic_string.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (basic_string::contains): New
member functions.
* include/std/string_view (basic_string_view::contains):
Likewise.
* include/std/version (__cpp_lib_string_contains): Define.
* testsuite/21_strings/basic_string/operations/starts_with/char/1.cc:
Remove trailing whitespace.
* testsuite/21_strings/basic_string/operations/starts_with/wchar_t/1.cc:
Likewise.
* testsuite/21_strings/basic_string/operations/contains/char/1.cc: New test.
* testsuite/21_strings/basic_string/operations/contains/wchar_t/1.cc: New test.
* testsuite/21_strings/basic_string_view/operations/contains/char/1.cc: New test.
* testsuite/21_strings/basic_string_view/operations/contains/char/2.cc: New test.
* testsuite/21_strings/basic_string_view/operations/contains/wchar_t/1.cc: New test.

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index e272d332934..bfc97644bd0 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3073,6 +3073,20 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   { return __sv_type(this->data(), this->size()).ends_with(__x); }
 #endif // C++20
 
+#if __cplusplus > 202002L
+  bool
+  contains(basic_string_view<_CharT, _Traits> __x) const noexcept
+  { return __sv_type(this->data(), this->size()).contains(__x); }
+
+  bool
+  contains(_CharT __x) const noexcept
+  { return __sv_type(this->data(), this->size()).contains(__x); }
+
+  bool
+  contains(const _CharT* __x) const noexcept
+  { return __sv_type(this->data(), this->size()).contains(__x); }
+#endif // C++23
+
   // Allow basic_stringbuf::__xfer_bufptrs to call _M_length:
   template friend class basic_stringbuf;
 };
@@ -5998,6 +6012,21 @@ _GLIBCXX_END_NAMESPACE_CXX11
   { return __sv_type(this->data(), this->size()).ends_with(__x); }
 #endif // C++20
 
+#if __cplusplus >= 202011L \
+  || (__cplusplus == 202002L && !defined __STRICT_ANSI__)
+  bool
+  contains(basic_string_view<_CharT, _Traits> __x) const noexcept
+  { return __sv_type(this->data(), this->size()).contains(__x); }
+
+  bool
+  contains(_CharT __x) const noexcept
+  { return __sv_type(this->data(), this->size()).contains(__x); }
+
+  bool
+  contains(const _CharT* __x) const noexcept
+  { return __sv_type(this->data(), this->size()).contains(__x); }
+#endif // C++23
+
 # ifdef _GLIBCXX_TM_TS_INTERNAL
   friend void
   ::_txnal_cow_string_C1_for_exceptions(void* that, const char* s,
diff --git a/libstdc++-v3/include/std/string_view b/libstdc++-v3/include/std/string_view
index e33e1bc4b79..dba757fad6b 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -352,6 +352,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return this->ends_with(basic_string_view(__x)); }
 #endif // C++20
 
+#if __cplusplus > 202002L
+#define __cpp_lib_string_c

Re: [PATCH v2] libstdc++: C++23, implement WG21 P1679R3

2021-01-27 Thread Jonathan Wakely via Gcc-patches

On 27/01/21 12:40 +, Jonathan Wakely wrote:

On 15/01/21 01:23 +, Paul Fee via Libstdc++ wrote:

Add contains member function to basic_string_view and basic_string.

The new method is enabled for -std=gnu++20, gnu++2b and c++2b.  This allows
users to access the method as a GNU extension to C++20.  The conditional
test may be reduced to "__cplusplus > 202011L" once GCC has a c++2b switch.

Changes since v1 (13th Jan 2021)
* New:
Test __cplusplus >= 202011L, rather than __cplusplus > 202011L.

* As suggested by Jonathan Wakely:
Adjust formatting.
Test that the feature-test macro is defined by <string> and <string_view>.
Correct copyright dates on new files.
Fix comment typo.

libstdc++-v3/

  Add contains member function to basic_string_view.
  Likewise to basic_string, both with and without _GLIBCXX_USE_CXX11_ABI.
  Enabled with -std=gnu++20, gnu++2b and c++2b.
  * include/bits/basic_string.h (basic_string::contains): New.
  * libstdc++-v3/include/std/string_view (basic_string_view::contains): New.
  * testsuite/21_strings/basic_string/operations/contains/char/1.cc: New test.
  * testsuite/21_strings/basic_string/operations/contains/wchar_t/1.cc:
New test.
  * testsuite/21_strings/basic_string_view/operations/contains/char/1.cc:
New test.
  * testsuite/21_strings/basic_string_view/operations/contains/wchar_t/1.cc:
New test.


I've committed this to master now.

Now that -std=gnu++23 is supported I changed the preprocessor
condition to just check __cplusplus > 202002L (and not enable it for
-std=gnu++20) and changed the tests to use 23 not 2b. I've attached
what I committed.

Thanks for your patch!


And here's the patch for the release notes, pushed to wwwdocs.


commit 12bb55fed649ece4d29536438cdd9853bbd85b1d
Author: Jonathan Wakely 
Date:   Wed Jan 27 12:42:58 2021 +

Document C++23 std::string::contains support

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 9d2e4f29..9b86e6c8 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -310,7 +310,7 @@ a work-in-progress.
   std::from_chars for floating-point types.
 
   
-  Improved experimental C++2a support, including:
+  Improved experimental C++20 support, including:
 
   Calendar additions to .
   std::bit_cast
@@ -322,6 +322,13 @@ a work-in-progress.
   Efficient access to basic_stringbuf's buffer.
 
   
+  Experimental C++23 support, including:
+
+  contains member functions for strings,
+thanks to Paul Fee.
+  
+
+  
   Faster std::uniform_int_distribution,
   thanks to Daniel Lemire.
   


[RFC] mask out mult expr ctz bits from nonzero bits

2021-01-27 Thread Alexandre Oliva


While looking into the possibility of introducing setmemM patterns on
RISC-V to undo the transformation from loops of word writes into
memset, I was disappointed to find out that get_nonzero_bits would
take into account the range of the length passed to memset, but not
the trivially-available observation that this length was a multiple of
the word size.  This knowledge, if passed on to setmemM, could enable
setmemM to output more efficient code.

In the end, I did not introduce a setmemM pattern, nor the machinery
to pass the ctz of the length on to it along with other useful
information, but I figured this small improvement to nonzero_bits
could still improve code generation elsewhere.
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564341.html


Regstrapped on x86_64-linux-gnu.  No analysis of codegen impact yet.
Does this seem worth pursuing, presumably for stage1?


for  gcc/ChangeLog

* tree-ssanames.c (get_nonzero_bits): Zero out low bits of
integral types, when a MULT_EXPR INTEGER_CST operand ensures
the result will be a multiple of a power of two.
---
 gcc/tree-ssanames.c |   23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 51a26d2fce1c2..c4b5bf2a4999a 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -546,10 +546,29 @@ get_nonzero_bits (const_tree name)
 }
 
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
+  wide_int ret;
   if (!ri)
-return wi::shwi (-1, precision);
+ret = wi::shwi (-1, precision);
+  else
+ret = ri->get_nonzero_bits ();
+
+  /* If NAME is defined as a multiple of a constant C, we know the ctz(C) low
+ bits are zero.  ??? Should we handle LSHIFT_EXPR too?  Non-constants,
+ e.g. the minimum shift count, and ctz from both MULT_EXPR operands?  That
+ could make for deep recursion.  */
+  if (INTEGRAL_TYPE_P (TREE_TYPE (name))
+  && SSA_NAME_DEF_STMT (name)
+  && is_gimple_assign (SSA_NAME_DEF_STMT (name))
+  && gimple_assign_rhs_code (SSA_NAME_DEF_STMT (name)) == MULT_EXPR
+  && TREE_CODE (gimple_assign_rhs2 (SSA_NAME_DEF_STMT (name))) == INTEGER_CST)
+{
+  unsigned HOST_WIDE_INT bits
+   = tree_ctz (gimple_assign_rhs2 (SSA_NAME_DEF_STMT (name)));
+  wide_int mask = wi::shwi (-1, precision) << bits;
+  ret &= mask;
+}
 
-  return ri->get_nonzero_bits ();
+  return ret;
 }
 
 /* Return TRUE is OP, an SSA_NAME has a range of values [0..1], false


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-27 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-01-27 at 08:58 +0100, Andreas Krebbel wrote:
> On 1/18/21 10:54 PM, Ilya Leoshkevich wrote:
> ...
> 
> > +static rtx_insn *
> > +s390_md_asm_adjust (vec &outputs, vec &inputs,
> > +   vec &input_modes,
> > +   vec &constraints, vec &
> > /*clobbers*/,
> > +   HARD_REG_SET & /*clobbered_regs*/)
> > +{
> > +  if (!TARGET_VXE)
> > +/* Long doubles are stored in FPR pairs - nothing to do.  */
> > +return NULL;
> > +
> > +  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
> > +
> > +  unsigned ninputs = inputs.length ();
> > +  unsigned noutputs = outputs.length ();
> > +  for (unsigned i = 0; i < noutputs; i++)
> > +{
> > +  if (GET_MODE (outputs[i]) != TFmode)
> > +   /* Not a long double - nothing to do.  */
> > +   continue;
> > +  const char *constraint = constraints[i];
> > +  bool allows_mem, allows_reg, is_inout;
> > +  bool ok = parse_output_constraint (&constraint, i, ninputs,
> > noutputs,
> > +&allows_mem, &allows_reg,
> > &is_inout);
> > +  gcc_assert (ok);
> > +  if (strcmp (constraint, "=f") != 0)
> > +   /* Long double with a constraint other than "=f" - nothing to
> > do.  */
> > +   continue;
> 
> What about other constraint modifiers like & and %? Don't we need to
> handle matching constraints as
> well here?

Oh, right - we need to account for %?!*&# and maybe some others.  I'll
just copy the code from parse_output_constraint() that skips over all of
them, because I don't think they need any special handling - we just
need to make sure they don't mess up the recognition of "=f".

I don't think we need to explicitly support matching constraints,
because parse_input_constraint() will resolve them for us.  I'll add
a test for this just in case.

Do we make use of multi-alternative constraints on s390?  I think not,
because our instructions are fairly rigid, but maybe I'm missing
something?

...

> > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> > index 0e3c31f5d4f..1332a65a1d1 100644
> > --- a/gcc/config/s390/vector.md
> > +++ b/gcc/config/s390/vector.md
> > @@ -616,12 +616,23 @@ (define_insn "*vec_tf_to_v1tf_vr"
> > vlvgp\t%v0,%1,%N1"
> >[(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
> >  
> > -(define_insn "*fprx2_to_tf"
> > -  [(set (match_operand:TF   0 "nonimmediate_operand"
> > "=v")
> > -   (subreg:TF (match_operand:FPRX2 1 "general_operand"   "f")
> > 0))]
> > +(define_insn_and_split "fprx2_to_tf"
> > +  [(set (match_operand:TF   0 "nonimmediate_operand"
> > "=v,R")
> > +   (subreg:TF (match_operand:FPRX2 1
> > "general_operand"   "f,f") 0))]
> >"TARGET_VXE"
> > -  "vmrhg\t%v0,%1,%N1"
> > -  [(set_attr "op_type" "VRR")])
> > +  "@
> > +   vmrhg\t%v0,%1,%N1
> > +   #"
> > +  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
> > +  [(set (match_dup 2) (match_dup 3))
> > +   (set (match_dup 4) (match_dup 5))]
> > +{
> > +  operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode,
> > 0);
> > +  operands[3] = simplify_gen_subreg (DFmode, operands[1],
> > FPRX2mode, 0);
> > +  operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode,
> > 8);
> > +  operands[5] = simplify_gen_subreg (DFmode, operands[1],
> > FPRX2mode, 8);
> > +}
> > +  [(set_attr "op_type" "VRR,*")])
> 
> Splitting an address like this might cause the displacement to
> overflow in the second part. This
> would require an additional reg to make the address valid again.
> Which in turn will be a problem
> after reload. You can use the 'AR' constraint for the memory
> alternative. That way reload will make
> sure the address is offsetable.

Ok, thanks for the hint!



Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-27 Thread CHIGOT, CLEMENT via Gcc-patches
Hi everyone, 

Here is a better version of the patch.
All tests on Linux are passing. A few have been disabled as
they only work with the GNU model.
For AIX, a few failures remain. I haven't XFAILed them yet, as I
want to know whether they are AIX-only or related to the model itself.

A few parts still need to be improved (dg-require-localmodel,
std::locale::global, FreeBSD-specific #ifdef).
But at least it can be tested on most of the platforms as is.

Note that I'll stop working on it until gcc12, mostly because gcc
is in freeze, but also because I have more urgent stuff to do right now.
Of course any feedback is welcome! But I might not send a
new patch if it requires too much time (at least not right now).

Thanks anyway Rainer and Jonathan for your help! I hope this
version suits you better!

Clément



0001-libstdc-implement-locale-support-for-XPG7.patch
Description: 0001-libstdc-implement-locale-support-for-XPG7.patch


Re: [PATCH] aarch64: Use GCC vector extensions for integer mls intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
I have re-written this to use RTL builtins - regression tested and bootstrapped 
on aarch64-none-linux-gnu with no issues:

aarch64: Use RTL builtins for integer mls intrinsics

Rewrite integer mls Neon intrinsics to use RTL builtins rather than
inline assembly code, allowing for better scheduling and
optimization.

gcc/ChangeLog:

2021-01-11  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add mls builtin
generator macro.
* config/aarch64/arm_neon.h (vmls_s8): Use RTL builtin rather
than asm.
(vmls_s16): Likewise.
(vmls_s32): Likewise.
(vmls_u8): Likewise.
(vmls_u16): Likewise.
(vmls_u32): Likewise.
(vmlsq_s8): Likewise.
(vmlsq_s16): Likewise.
(vmlsq_s32): Likewise.
(vmlsq_u8): Likewise.
(vmlsq_u16): Likewise.
(vmlsq_u32): Likewise.


From: Richard Sandiford 
Sent: 19 January 2021 17:43
To: Jonathan Wright 
Cc: gcc-patches@gcc.gnu.org ; Richard Earnshaw 
; Kyrylo Tkachov 
Subject: Re: [PATCH] aarch64: Use GCC vector extensions for integer mls 
intrinsics

Jonathan Wright  writes:
> Hi,
>
> As subject, this patch rewrites integer mls Neon intrinsics to use
> a - b * c rather than inline assembly code, allowing for better
> scheduling and optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> If ok, please commit to master (I don't have commit rights.)

Thanks for doing this.  The patch looks good from a functional
point of view.  I guess my only concern is that things like:

a = vmla_u8 (vmulq_u8 (b, c), d, e);

would become:

a = b * c + d * e;

and I don't think anything guarantees that the user's original
choice of instruction selection will be preserved.  We might end
up with the equivalent of:

a = vmla_u8 (vmulq_u8 (d, e), b, c);

giving different latencies.

If we added built-in functions instead, we could lower them to
IFN_FMA and IFN_FNMA, which support integers as well as floats,
and which stand a better chance of preserving the original grouping.

There again, the unfused floating-point MLAs already decompose
into separate multiplies and adds (although they can't of course
use IFN_FMA).

Any thoughts on doing it that way instead?

I'm not saying the patch shouldn't go in though, just thought it
was worth asking.

Thanks,
Richard

>
> Thanks,
> Jonathan
>
> ---
>
> gcc/Changelog:
>
> 2021-01-14  Jonathan Wright  
>
> * config/aarch64/arm_neon.h (vmls_s8): Use C rather than asm.
> (vmls_s16): Likewise.
> (vmls_s32): Likewise.
> (vmls_u8): Likewise.
> (vmls_u16): Likewise.
> (vmls_u32): Likewise.
> (vmlsq_s8): Likewise.
> (vmlsq_s16): Likewise.
> (vmlsq_s32): Likewise.
> (vmlsq_u8): Likewise.
> (vmlsq_u16): Likewise.
> (vmlsq_u32): Likewise.
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 608e582d25820062a409310e7f3fc872660f8041..ad04eab1e753aa86f20a8f6cc2717368b1840ef7
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -7968,72 +7968,45 @@ __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmls_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
>  {
> -  int8x8_t __result;
> -  __asm__ ("mls %0.8b,%2.8b,%3.8b"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  uint8x8_t __result = (uint8x8_t) __a - (uint8x8_t) __b * (uint8x8_t) __c;
> +  return (int8x8_t) __result;
>  }
>
>  __extension__ extern __inline int16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmls_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
>  {
> -  int16x4_t __result;
> -  __asm__ ("mls %0.4h,%2.4h,%3.4h"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  uint16x4_t __result = (uint16x4_t) __a - (uint16x4_t) __b * (uint16x4_t) 
> __c;
> +  return (int16x4_t) __result;
>  }
>
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmls_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
>  {
> -  int32x2_t __result;
> -  __asm__ ("mls %0.2s,%2.2s,%3.2s"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  uint32x2_t __result = (uint32x2_t) __a - (uint32x2_t) __b * (uint32x2_t) 
> __c;
> +  return (int32x2_t) __result;
>  }
>
>  __extension__ extern __inline uint8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmls_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
>  {
> -  uint8x8_t __result;
> -  __asm__ ("mls %0.8b,%2.8b,%3.8b"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  return __a - __b * 

Re: arm: Adjust cost of vector of constant zero

2021-01-27 Thread Christophe Lyon via Gcc-patches
On Wed, 27 Jan 2021 at 10:15, Kyrylo Tkachov  wrote:
>
> Hi Christophe,
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 26 January 2021 18:03
> > To: gcc Patches 
> > Subject: arm: Adjust cost of vector of constant zero
> >
> > Neon vector comparisons have a dedicated version when comparing with
> > constant zero: it means its cost is free.
> >
> > Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only,
> > since MVE does not support this.
>
> I guess the other way to do this would be in the comparison code handling in 
> this function where we could check for a const_vector of zeroes and a Neon 
> mode and avoid recursing into the operands.
> That would avoid the extra switch statement in your patch.
> WDYT?

Do you mean like so:
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4a5f265..542c15e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -11316,6 +11316,28 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
code, enum rtx_code outer_code,
*cost = 0;
return true;
  }
+  /* Neon has special instructions when comparing with 0 (vceq, vcge, vcgt,
+ vcle and vclt). */
+  else if (TARGET_NEON
+&& TARGET_HARD_FLOAT
+&& (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
+&& (XEXP (x, 1) == CONST0_RTX (mode)))
+ {
+   switch (code)
+ {
+ case EQ:
+ case GE:
+ case GT:
+ case LE:
+ case LT:
+   *cost = 0;
+   return true;
+
+ default:
+   break;
+ }
+ }
+
   return false;

I'm not sure I can remove the switch, since the other comparisons are
not supported by Neon anyway.

Thanks,

Christophe


> Thanks,
> Kyrill
>
> >
> > 2021-01-26  Christophe Lyon  
> >
> > gcc/
> > PR target/98730
> > * config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector
> > of constant zero for comparisons.
> >
> > gcc/testsuite/
> > PR target/98730
> > * gcc.target/arm/simd/vceqzq_p64.c: Update expected result.
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 4a5f265..9c5c0df 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -11544,7 +11544,28 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
> > code, enum rtx_code outer_code,
> >   && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE
> > (mode)))
> >  || TARGET_HAVE_MVE)
> > && simd_immediate_valid_for_move (x, mode, NULL, NULL))
> > - *cost = COSTS_N_INSNS (1);
> > + {
> > +   *cost = COSTS_N_INSNS (1);
> > +
> > +   /* Neon has special instructions when comparing with 0 (vceq, vcge,
> > +  vcgt, vcle and vclt). */
> > +   if (TARGET_NEON && (x == CONST0_RTX (mode)))
> > + {
> > +   switch (outer_code)
> > + {
> > + case EQ:
> > + case GE:
> > + case GT:
> > + case LE:
> > + case LT:
> > +   *cost = COSTS_N_INSNS (0);
> > +   break;
> > +
> > + default:
> > +   break;
> > + }
> > + }
> > + }
> >else
> >   *cost = COSTS_N_INSNS (4);
> >return true;
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > b/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > index 640754c..a99bb8a 100644
> > --- a/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > +++ b/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > @@ -15,4 +15,4 @@ void func()
> >result2 = vceqzq_p64 (v2);
> >  }
> >
> > -/* { dg-final { scan-assembler-times "vceq\.i32\[
> > \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+\n" 2 } } */
> > +/* { dg-final { scan-assembler-times "vceq\.i32\[
> > \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, #0\n" 2 } } */


arm-rtx-cost-vceq.patch3
Description: Binary data


Re: c++: cross-module __cxa_atexit use [PR 98531]

2021-01-27 Thread Rainer Orth
Hi Nathan,

> Solaris tickled this bug as it has some mutex/sync/something primitive with
> a destructor, hence wanted to generate a __cxa_atexit call inside an
> inline/template function.  But the problem is not solaris-specific.
>
> I tested this bootstrapping both x86_64-linux and aarch64-linux.  I'll
> commit in a couple of days if there are no further comments.

I've tried the patch on Solaris last night, with mixed results (some
regressions, remaining failures).  I've reported the details in the PR.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


RE: arm: Adjust cost of vector of constant zero

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: 27 January 2021 13:12
> To: Kyrylo Tkachov 
> Cc: Kyrylo Tkachov via Gcc-patches 
> Subject: Re: arm: Adjust cost of vector of constant zero
> 
> On Wed, 27 Jan 2021 at 10:15, Kyrylo Tkachov 
> wrote:
> >
> > Hi Christophe,
> >
> > > -Original Message-
> > > From: Gcc-patches  On Behalf Of
> > > Christophe Lyon via Gcc-patches
> > > Sent: 26 January 2021 18:03
> > > To: gcc Patches 
> > > Subject: arm: Adjust cost of vector of constant zero
> > >
> > > Neon vector comparisons have a dedicated version when comparing with
> > > constant zero: it means its cost is free.
> > >
> > > Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only,
> > > since MVE does not support this.
> >
> > I guess the other way to do this would be in the comparison code handling
> in this function where we could check for a const_vector of zeroes and a
> Neon mode and avoid recursing into the operands.
> > That would avoid the extra switch statement in your patch.
> > WDYT?
> 
> Do you mean like so:
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 4a5f265..542c15e 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -11316,6 +11316,28 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
> code, enum rtx_code outer_code,
> *cost = 0;
> return true;
>   }
> +  /* Neon has special instructions when comparing with 0 (vceq, vcge,
> vcgt,
> + vcle and vclt). */
> +  else if (TARGET_NEON
> +&& TARGET_HARD_FLOAT
> +&& (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE
> (mode))
> +&& (XEXP (x, 1) == CONST0_RTX (mode)))
> + {
> +   switch (code)
> + {
> + case EQ:
> + case GE:
> + case GT:
> + case LE:
> + case LT:
> +   *cost = 0;
> +   return true;
> +
> + default:
> +   break;
> + }
> + }
> +
>return false;
> 
> I'm not sure I can remove the switch, since the other comparisons are
> not supported by Neon anyway.
> 

No, I mean where:
case EQ:
case NE:
case LT:
case LE:
case GT:
case GE:
case LTU:
case LEU:
case GEU:
case GTU:
case ORDERED:
case UNORDERED:
case UNEQ:
case UNLE:
case UNLT:
case UNGE:
case UNGT:
case LTGT:
  if (outer_code == SET)
{
  /* Is it a store-flag operation?  */
  if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) == CC_REGNUM

you reorder the codes that are relevant to NEON, handle them for a vector zero 
argument (and the right target checks), and fall through to the rest if not.

Kyrill

> Thanks,
> 
> Christophe
> 
> 
> > Thanks,
> > Kyrill
> >
> > >
> > > 2021-01-26  Christophe Lyon  
> > >
> > > gcc/
> > > PR target/98730
> > > * config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector
> > > of constant zero for comparisons.
> > >
> > > gcc/testsuite/
> > > PR target/98730
> > > * gcc.target/arm/simd/vceqzq_p64.c: Update expected result.
> > >
> > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > index 4a5f265..9c5c0df 100644
> > > --- a/gcc/config/arm/arm.c
> > > +++ b/gcc/config/arm/arm.c
> > > @@ -11544,7 +11544,28 @@ arm_rtx_costs_internal (rtx x, enum
> rtx_code
> > > code, enum rtx_code outer_code,
> > >   && (VALID_NEON_DREG_MODE (mode) ||
> VALID_NEON_QREG_MODE
> > > (mode)))
> > >  || TARGET_HAVE_MVE)
> > > && simd_immediate_valid_for_move (x, mode, NULL, NULL))
> > > - *cost = COSTS_N_INSNS (1);
> > > + {
> > > +   *cost = COSTS_N_INSNS (1);
> > > +
> > > +   /* Neon has special instructions when comparing with 0 (vceq, vcge,
> > > +  vcgt, vcle and vclt). */
> > > +   if (TARGET_NEON && (x == CONST0_RTX (mode)))
> > > + {
> > > +   switch (outer_code)
> > > + {
> > > + case EQ:
> > > + case GE:
> > > + case GT:
> > > + case LE:
> > > + case LT:
> > > +   *cost = COSTS_N_INSNS (0);
> > > +   break;
> > > +
> > > + default:
> > > +   break;
> > > + }
> > > + }
> > > + }
> > >else
> > >   *cost = COSTS_N_INSNS (4);
> > >return true;
> > > diff --git a/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > > b/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > > index 640754c..a99bb8a 100644
> > > --- a/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/vceqzq_p64.c
> > > @@ -15,4 +15,4 @@ void func()
> > >result2 = vceqzq_p64 (v2);
> > >  }
> > >
> > > -/* { dg-final { scan-assembler-times "vceq\.i32\[
> > > \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+\n" 2 } } */
> > > +/* { dg-final { scan-assembler-times "vceq\.i32\[
> > > \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, #0\n" 2 } } */


[committed] libstdc++: Optimize std::string_view::find [PR 66414]

2021-01-27 Thread Jonathan Wakely via Gcc-patches
This reuses the code from std::string::find, which was improved by
r244225, but string_view was not changed to match.

libstdc++-v3/ChangeLog:

PR libstdc++/66414
* include/bits/string_view.tcc
(basic_string_view::find(const CharT*, size_type, size_type)):
Optimize.

Tested x86_64-linux and powerpc64le-linux. Committed to trunk.

This makes string_view::find as fast as string::find, rather than 2-10
times slower! We should probably consider whether to define the find
algorithm once and reuse it in string and string_view, rather than
duplicating the code.

In stage 1 I'll look at using memmem here for the specializations
using std::char_traits.



commit a199da782fc165fd45f42a15cc9020994efd455d
Author: Jonathan Wakely 
Date:   Wed Jan 27 13:21:52 2021

libstdc++: Optimize std::string_view::find [PR 66414]

This reuses the code from std::string::find, which was improved by
r244225, but string_view was not changed to match.

libstdc++-v3/ChangeLog:

PR libstdc++/66414
* include/bits/string_view.tcc
(basic_string_view::find(const CharT*, size_type, size_type)):
Optimize.

diff --git a/libstdc++-v3/include/bits/string_view.tcc b/libstdc++-v3/include/bits/string_view.tcc
index bcd8fc1339e..efb0edee26a 100644
--- a/libstdc++-v3/include/bits/string_view.tcc
+++ b/libstdc++-v3/include/bits/string_view.tcc
@@ -50,15 +50,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_requires_string_len(__str, __n);
 
   if (__n == 0)
-   return __pos <= this->_M_len ? __pos : npos;
+   return __pos <= _M_len ? __pos : npos;
+  if (__pos >= _M_len)
+   return npos;
 
-  if (__n <= this->_M_len)
+  const _CharT __elem0 = __str[0];
+  const _CharT* __first = _M_str + __pos;
+  const _CharT* const __last = _M_str + _M_len;
+  size_type __len = _M_len - __pos;
+
+  while (__len >= __n)
{
- for (; __pos <= this->_M_len - __n; ++__pos)
-   if (traits_type::eq(this->_M_str[__pos], __str[0])
-   && traits_type::compare(this->_M_str + __pos + 1,
-   __str + 1, __n - 1) == 0)
- return __pos;
+ // Find the first occurrence of __elem0:
+ __first = traits_type::find(__first, __len - __n + 1, __elem0);
+ if (!__first)
+   return npos;
+ // Compare the full strings from the first occurrence of __elem0.
+ // We already know that __first[0] == __s[0] but compare them again
+ // anyway because __s is probably aligned, which helps memcmp.
+ if (traits_type::compare(__first, __str, __n) == 0)
+   return __first - _M_str;
+ __len = __last - ++__first;
}
   return npos;
 }


Re: arm: Adjust cost of vector of constant zero

2021-01-27 Thread Christophe Lyon via Gcc-patches
On Wed, 27 Jan 2021 at 14:44, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Christophe Lyon 
> > Sent: 27 January 2021 13:12
> > To: Kyrylo Tkachov 
> > Cc: Kyrylo Tkachov via Gcc-patches 
> > Subject: Re: arm: Adjust cost of vector of constant zero
> >
> > On Wed, 27 Jan 2021 at 10:15, Kyrylo Tkachov 
> > wrote:
> > >
> > > Hi Christophe,
> > >
> > > > -Original Message-
> > > > From: Gcc-patches  On Behalf Of
> > > > Christophe Lyon via Gcc-patches
> > > > Sent: 26 January 2021 18:03
> > > > To: gcc Patches 
> > > > Subject: arm: Adjust cost of vector of constant zero
> > > >
> > > > Neon vector comparisons have a dedicated version when comparing with
> > > > constant zero: it means its cost is free.
> > > >
> > > > Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only,
> > > > since MVE does not support this.
> > >
> > > I guess the other way to do this would be in the comparison code handling
> > in this function where we could check for a const_vector of zeroes and a
> > Neon mode and avoid recursing into the operands.
> > > That would avoid the extra switch statement in your patch.
> > > WDYT?
> >
> > Do you mean like so:
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 4a5f265..542c15e 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -11316,6 +11316,28 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
> > code, enum rtx_code outer_code,
> > *cost = 0;
> > return true;
> >   }
> > +  /* Neon has special instructions when comparing with 0 (vceq, vcge,
> > vcgt,
> > + vcle and vclt). */
> > +  else if (TARGET_NEON
> > +&& TARGET_HARD_FLOAT
> > +&& (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE
> > (mode))
> > +&& (XEXP (x, 1) == CONST0_RTX (mode)))
> > + {
> > +   switch (code)
> > + {
> > + case EQ:
> > + case GE:
> > + case GT:
> > + case LE:
> > + case LT:
> > +   *cost = 0;
> > +   return true;
> > +
> > + default:
> > +   break;
> > + }
> > + }
> > +
> >return false;
> >
> > I'm not sure I can remove the switch, since the other comparisons are
> > not supported by Neon anyway.
> >
>
> No, I mean where:
> case EQ:
> case NE:
> case LT:
> case LE:
> case GT:
> case GE:
> case LTU:
> case LEU:
> case GEU:
> case GTU:
> case ORDERED:
> case UNORDERED:
> case UNEQ:
> case UNLE:
> case UNLT:
> case UNGE:
> case UNGT:
> case LTGT:
>   if (outer_code == SET)
> {
>   /* Is it a store-flag operation?  */
>   if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) == CC_REGNUM
>
> you reorder the codes that are relevant to NEON, handle them for a vector 
> zero argument (and the right target checks), and fall through to the rest if 
> not.
>

OK, I didn't find reordering this appealing :-)

Like so, then?

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4a5f265..88e398d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -11211,11 +11211,23 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
code, enum rtx_code outer_code,
   return true;

 case EQ:
-case NE:
-case LT:
-case LE:
-case GT:
 case GE:
+case GT:
+case LE:
+case LT:
+  /* Neon has special instructions when comparing with 0 (vceq, vcge, vcgt,
+ vcle and vclt). */
+  if (TARGET_NEON
+   && TARGET_HARD_FLOAT
+   && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
+   && (XEXP (x, 1) == CONST0_RTX (mode)))
+ {
+   *cost = 0;
+   return true;
+ }
+
+  /* Fall through.  */
+case NE:
 case LTU:
 case LEU:
 case GEU:


Thanks,

Christophe

> Kyrill
>
> > Thanks,
> >
> > Christophe
> >
> >
> > > Thanks,
> > > Kyrill
> > >
> > > >
> > > > 2021-01-26  Christophe Lyon  
> > > >
> > > > gcc/
> > > > PR target/98730
> > > > * config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector
> > > > of constant zero for comparisons.
> > > >
> > > > gcc/testsuite/
> > > > PR target/98730
> > > > * gcc.target/arm/simd/vceqzq_p64.c: Update expected result.
> > > >
> > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > > index 4a5f265..9c5c0df 100644
> > > > --- a/gcc/config/arm/arm.c
> > > > +++ b/gcc/config/arm/arm.c
> > > > @@ -11544,7 +11544,28 @@ arm_rtx_costs_internal (rtx x, enum
> > rtx_code
> > > > code, enum rtx_code outer_code,
> > > >   && (VALID_NEON_DREG_MODE (mode) ||
> > VALID_NEON_QREG_MODE
> > > > (mode)))
> > > >  || TARGET_HAVE_MVE)
> > > > && simd_immediate_valid_for_move (x, mode, NULL, NULL))
> > > > - *cost = COSTS_N_INSNS (1);
> > > > + {
> > > > +   *cost = COSTS_N_INSNS (1);
> > > > +
> > > > +   /* Neon has special instructions when comparing with 0 (vceq, vcge,
> > > > +  vcgt, vcle and vclt). */
> > > > +   if (TARGET_NEON && (x == CONST0_RTX (mode)))
> > > > + {
> > > > +   switch (outer_code)
> > > > + {
> > > > + ca

RE: arm: Adjust cost of vector of constant zero

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: 27 January 2021 13:56
> To: Kyrylo Tkachov 
> Cc: Kyrylo Tkachov via Gcc-patches 
> Subject: Re: arm: Adjust cost of vector of constant zero
> 
> On Wed, 27 Jan 2021 at 14:44, Kyrylo Tkachov 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Christophe Lyon 
> > > Sent: 27 January 2021 13:12
> > > To: Kyrylo Tkachov 
> > > Cc: Kyrylo Tkachov via Gcc-patches 
> > > Subject: Re: arm: Adjust cost of vector of constant zero
> > >
> > > On Wed, 27 Jan 2021 at 10:15, Kyrylo Tkachov
> 
> > > wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > > -Original Message-
> > > > > From: Gcc-patches  On Behalf
> Of
> > > > > Christophe Lyon via Gcc-patches
> > > > > Sent: 26 January 2021 18:03
> > > > > To: gcc Patches 
> > > > > Subject: arm: Adjust cost of vector of constant zero
> > > > >
> > > > > Neon vector comparisons have a dedicated version when comparing
> with
> > > > > constant zero: it means its cost is free.
> > > > >
> > > > > Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only,
> > > > > since MVE does not support this.
> > > >
> > > > I guess the other way to do this would be in the comparison code
> handling
> > > in this function where we could check for a const_vector of zeroes and a
> > > Neon mode and avoid recursing into the operands.
> > > > That would avoid the extra switch statement in your patch.
> > > > WDYT?
> > >
> > > Do you mean like so:
> > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > index 4a5f265..542c15e 100644
> > > --- a/gcc/config/arm/arm.c
> > > +++ b/gcc/config/arm/arm.c
> > > @@ -11316,6 +11316,28 @@ arm_rtx_costs_internal (rtx x, enum
> rtx_code
> > > code, enum rtx_code outer_code,
> > > *cost = 0;
> > > return true;
> > >   }
> > > +  /* Neon has special instructions when comparing with 0 (vceq, vcge,
> > > vcgt,
> > > + vcle and vclt). */
> > > +  else if (TARGET_NEON
> > > +&& TARGET_HARD_FLOAT
> > > +&& (VALID_NEON_DREG_MODE (mode) ||
> VALID_NEON_QREG_MODE
> > > (mode))
> > > +&& (XEXP (x, 1) == CONST0_RTX (mode)))
> > > + {
> > > +   switch (code)
> > > + {
> > > + case EQ:
> > > + case GE:
> > > + case GT:
> > > + case LE:
> > > + case LT:
> > > +   *cost = 0;
> > > +   return true;
> > > +
> > > + default:
> > > +   break;
> > > + }
> > > + }
> > > +
> > >return false;
> > >
> > > I'm not sure I can remove the switch, since the other comparisons are
> > > not supported by Neon anyway.
> > >
> >
> > No, I mean where:
> > case EQ:
> > case NE:
> > case LT:
> > case LE:
> > case GT:
> > case GE:
> > case LTU:
> > case LEU:
> > case GEU:
> > case GTU:
> > case ORDERED:
> > case UNORDERED:
> > case UNEQ:
> > case UNLE:
> > case UNLT:
> > case UNGE:
> > case UNGT:
> > case LTGT:
> >   if (outer_code == SET)
> > {
> >   /* Is it a store-flag operation?  */
> >   if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) == CC_REGNUM
> >
> > you reorder the codes that are relevant to NEON, handle them for a vector
> zero argument (and the right target checks), and fall through to the rest if
> not.
> >
> 
> OK, I didn't find reordering this appealing :-)
> 
> Like so, then?
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 4a5f265..88e398d 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -11211,11 +11211,23 @@ arm_rtx_costs_internal (rtx x, enum rtx_code
> code, enum rtx_code outer_code,
>return true;
> 
>  case EQ:
> -case NE:
> -case LT:
> -case LE:
> -case GT:
>  case GE:
> +case GT:
> +case LE:
> +case LT:
> +  /* Neon has special instructions when comparing with 0 (vceq, vcge,
> vcgt,
> + vcle and vclt). */
> +  if (TARGET_NEON
> +   && TARGET_HARD_FLOAT
> +   && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE
> (mode))
> +   && (XEXP (x, 1) == CONST0_RTX (mode)))
> + {
> +   *cost = 0;
> +   return true;
> + }
> +
> +  /* Fall through.  */
> +case NE:
>  case LTU:
>  case LEU:
>  case GEU:
> 

I find it much cleaner, but I guess it's subjective 😊
Ok if it passes bootstrap and testing.
Thanks,
Kyrill

> 
> Thanks,
> 
> Christophe
> 
> > Kyrill
> >
> > > Thanks,
> > >
> > > Christophe
> > >
> > >
> > > > Thanks,
> > > > Kyrill
> > > >
> > > > >
> > > > > 2021-01-26  Christophe Lyon  
> > > > >
> > > > > gcc/
> > > > > PR target/98730
> > > > > * config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector
> > > > > of constant zero for comparisons.
> > > > >
> > > > > gcc/testsuite/
> > > > > PR target/98730
> > > > > * gcc.target/arm/simd/vceqzq_p64.c: Update expected result.
> > > > >
> > > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > > > index 4a5f265..9c5c0df 100644
> > > > > --- a/gcc/config/arm/arm.c
> > > > > +++ b/gcc/config

Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-27 Thread Rainer Orth
Hi Clement,

> Here is a better version of the patch. 
> All tests on Linux are passing. A few have been disabled as
> they work only with the GNU model.
> For AIX, a few failures remain. I haven't XFAILed them yet, as I
> want to know whether they are AIX-only or related to the model itself.
>
> A few parts still need to be improved (dg-require-localmodel,
> std::locale::global, FreeBSD-specific #ifdef).
> But at least it can be tested on most of the platforms as is.
>
> Note that I'll stop working on it until GCC 12, mostly because GCC
> is in freeze but also because I have more urgent things to do right now.
> Of course any feedback is welcome! But I might not send a
> new patch if it requires too much time (at least not right now).
>
> Thanks anyway, Rainer and Jonathan, for your help! I hope this
> version suits you better!

very much so, thanks a lot for your work!  I've just looked over it to
determine what changes to config/os/solaris are necessary and found a
few nits:

* There are minor formatting issues:

  Should the linebreak in the extern inline definitions of strtof_l be
  after the return type, not before, matching GNU coding standards?  It
  may well be that the C++ style is different, though.

  Unrelated whitespace changes in xpg7/ctype_members.cc

* The changes in the copyright ranges need to be undone, given that this
  is just a renamed/augmented version of the previous dragonfly code.

* Seeing the __DragonFly__ || __FreeBSD__ again, I had a quick look at
  the FreeBSD 12.2 headers and found that localeconv_l does take a
  locale_t arg, just like uselocale.  DragonFlyBSD seems to be the same
  according to their online manuals.  I expect to give the code a try in
  a FreeBSD 12.2 VM at some point to check.

* While you now define _GLIBCXX_C_LOCALE_XPG7 in
  config/locale/xpg7/c_locale.h, config/os/aix/ctype_configure_char.cc
  still tests the previous _GLIBCXX_C_LOCALE_IEEE_2008.

Nothing tested yet, just wanted to point those out ASAP.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE

2021-01-27 Thread Richard Biener
The following avoids repeatedly turning VALUE RTXen into
sth useful and re-applying a constant offset through get_addr
via DSE check_mem_read_rtx.  Instead perform this once for
all stores to be visited in check_mem_read_rtx.  This avoids
allocating 1.6GB of garbage PLUS RTXen on the PR80960
testcase, fixing the memory usage regression from old GCC.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-01-27  Richard Biener  

PR rtl-optimization/80960
* dse.c (check_mem_read_rtx): Call get_addr on the
offsetted address.
---
 gcc/dse.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/dse.c b/gcc/dse.c
index c88587e7d94..da0df54a2dd 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
 }
   if (maybe_ne (offset, 0))
 mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
+  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
+ which will over and over re-create proper RTL and re-apply the
+ offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
+ RTXen that way.  */
+  mem_addr = get_addr (mem_addr);
 
   if (group_id >= 0)
 {
-- 
2.26.2


Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-27 Thread CHIGOT, CLEMENT via Gcc-patches
> * There are minor formatting issues:
> 
>   Should the linebreak in the extern inline definitions of strtof_l be
>   after the return type, not before, matching GNU coding standards?  It
>   may well be that the C++ style is different, though.
> 
>  Unrelated whitespace changes in xpg7/ctype_members.cc

I haven't yet checked all the formatting issues, especially because the
previous code was already wrongly formatted in some places. I'll do that
when releasing a "final" version of the patch.

> * The changes in the copyright ranges need to be undone, given that this
>  is just a renamed/augmented version of the previous dragonfly code.

Ok, I wasn't sure about that. I'll revert. 

> * Seeing the __DragonFly__ || __FreeBSD__ again, I had a quick look at
>   the FreeBSD 12.2 headers and found that localeconv_l does take a
>   locale_t arg, just like uselocale.  DragonFlyBSD seems to be the same
>   according to their online manuals.  I expect to give the code a try in
>   a FreeBSD 12.2 VM at some point to check.

Actually, it seems that the problem isn't the input argument, but
the output, which should be "int *".

> * While you now define _GLIBCXX_C_LOCALE_XPG7 in
>  config/locale/xpg7/c_locale.h, config/os/aix/ctype_configure_char.cc
>   still tests the previous _GLIBCXX_C_LOCALE_IEEE_2008.

Argh, I've missed that. It might not be the last patch then.
(I've made so many versions of it while trying to backport it to our
old releases. It seems that I've mixed things up...)


[PATCH] tree-optimization/98854 - avoid some PHI BB vectorization

2021-01-27 Thread Richard Biener
This avoids cases of PHI node vectorization that just causes us
to insert vector CTORs inside loops for values only required
outside of the loop.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2021-01-27  Richard Biener  

PR tree-optimization/98854
* tree-vect-slp.c (vect_build_slp_tree_2): Also build
PHIs from scalars when the number of CTORs matches the
number of children.

* gcc.dg/vect/bb-slp-pr98854.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr98854.c | 24 ++
 gcc/tree-vect-slp.c|  5 -
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr98854.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr98854.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr98854.c
new file mode 100644
index 000..0c8141e1d17
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr98854.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+double a[1024];
+
+int bar();
+void foo (int n)
+{
+  double x = 0, y = 0;
+  int i = 1023;
+  do
+{
+  x += a[i] + a[i+1];
+  y += a[i] / a[i+1];
+  if (bar ())
+break;
+}
+  while (--i);
+  /* We want to avoid vectorizing the LC PHI and insert vector CTORs
+ inside of the loop where it is only needed here.  */
+  a[0] = x;
+  a[1] = y;
+}
+
+/* { dg-final { scan-tree-dump-not "vectorizing SLP node starting from: ._\[0-9\]+ = PHI" "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 4465cf7494e..10b876ff5ed 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1896,7 +1896,10 @@ fail:
n_vector_builds++;
}
}
-  if (all_uniform_p || n_vector_builds > 1)
+  if (all_uniform_p
+ || n_vector_builds > 1
+ || (n_vector_builds == children.length ()
+ && is_a <gphi *> (stmt_info->stmt)))
{
  /* Roll back.  */
  matches[0] = false;
-- 
2.26.2


Re: [PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote:
> The following avoids repeatedly turning VALUE RTXen into
> sth useful and re-applying a constant offset through get_addr
> via DSE check_mem_read_rtx.  Instead perform this once for
> all stores to be visited in check_mem_read_rtx.  This avoids
> allocating 1.6GB of garbage PLUS RTXen on the PR80960
> testcase, fixing the memory usage regression from old GCC.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
> 2021-01-27  Richard Biener  
> 
>   PR rtl-optimization/80960
>   * dse.c (check_mem_read_rtx): Call get_addr on the
>   offsetted address.
> ---
>  gcc/dse.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/dse.c b/gcc/dse.c
> index c88587e7d94..da0df54a2dd 100644
> --- a/gcc/dse.c
> +++ b/gcc/dse.c
> @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
>  }
>if (maybe_ne (offset, 0))
>  mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
> +  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
> + which will over and over re-create proper RTL and re-apply the
> + offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
> + RTXen that way.  */
> +  mem_addr = get_addr (mem_addr);
>  
>if (group_id >= 0)
>  {

Does that result in any changes in how much DSE optimizes?
I mean, if you do 2 bootstraps/regtests, one with this patch and another one
without it, and at the end of rest_of_handle_dse dump
locally_deleted, globally_deleted
for each CU/function, do you get the same counts except perhaps for dse.c?

Jakub



RE: [PATCH] aarch64: Use GCC vector extensions for integer mls intrinsics

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches


From: Jonathan Wright  
Sent: 27 January 2021 12:57
To: Richard Sandiford ; Kyrylo Tkachov 

Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw 
Subject: Re: [PATCH] aarch64: Use GCC vector extensions for integer mls 
intrinsics

I have re-written this to use RTL builtins - regression tested and bootstrapped 
on aarch64-none-linux-gnu with no issues:

aarch64: Use RTL builtins for integer mls intrinsics
Rewrite integer mls Neon intrinsics to use RTL builtins rather than
inline assembly code, allowing for better scheduling and
optimization.

gcc/ChangeLog:

2021-01-11  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add mls builtin
generator macro.
* config/aarch64/arm_neon.h (vmls_s8): Use RTL builtin rather
than asm.
(vmls_s16): Likewise.
(vmls_s32): Likewise.
(vmls_u8): Likewise.
(vmls_u16): Likewise.
(vmls_u32): Likewise.
(vmlsq_s8): Likewise.
(vmlsq_s16): Likewise.
(vmlsq_s32): Likewise.
(vmlsq_u8): Likewise.
(vmlsq_u16): Likewise.
(vmlsq_u32): Likewise.

Ok.
Thanks,
Kyrill


From: Richard Sandiford 
Sent: 19 January 2021 17:43
To: Jonathan Wright 
Cc: gcc-patches@gcc.gnu.org; Richard 
Earnshaw ; Kyrylo Tkachov 

Subject: Re: [PATCH] aarch64: Use GCC vector extensions for integer mls 
intrinsics 
 
Jonathan Wright  writes:
> Hi,
>
> As subject, this patch rewrites integer mls Neon intrinsics to use
> a - b * c rather than inline assembly code, allowing for better
> scheduling and optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> If ok, please commit to master (I don't have commit rights.)

Thanks for doing this.  The patch looks good from a functional
point of view.  I guess my only concern is that things like:

    a = vmla_u8 (vmulq_u8 (b, c), d, e);

would become:

    a = b * c + d * e;

and I don't think anything guarantees that the user's original
choice of instruction selection will be preserved.  We might end
up with the equivalent of:

    a = vmla_u8 (vmulq_u8 (d, e), b, c);

giving different latencies.

If we added built-in functions instead, we could lower them to
IFN_FMA and IFN_FNMA, which support integers as well as floats,
and which stand a better chance of preserving the original grouping.

There again, the unfused floating-point MLAs already decompose
into separate multiplies and adds (although they can't of course
use IFN_FMA).

Any thoughts on doing it that way instead?

I'm not saying the patch shouldn't go in though, just thought it
was worth asking.

Thanks,
Richard

>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-01-14  Jonathan Wright  
>
> * config/aarch64/arm_neon.h (vmls_s8): Use C rather than asm.
> (vmls_s16): Likewise.
> (vmls_s32): Likewise.
> (vmls_u8): Likewise.
> (vmls_u16): Likewise.
> (vmls_u32): Likewise.
> (vmlsq_s8): Likewise.
> (vmlsq_s16): Likewise.
> (vmlsq_s32): Likewise.
> (vmlsq_u8): Likewise.
> (vmlsq_u16): Likewise.
> (vmlsq_u32): Likewise.
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 608e582d25820062a409310e7f3fc872660f8041..ad04eab1e753aa86f20a8f6cc2717368b1840ef7
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -7968,72 +7968,45 @@ __extension__ extern __inline int8x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmls_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
>  {
> -  int8x8_t __result;
> -  __asm__ ("mls %0.8b,%2.8b,%3.8b"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  uint8x8_t __result = (uint8x8_t) __a - (uint8x8_t) __b * (uint8x8_t) __c;
> +  return (int8x8_t) __result;
>  }
>  
>  __extension__ extern __inline int16x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmls_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
>  {
> -  int16x4_t __result;
> -  __asm__ ("mls %0.4h,%2.4h,%3.4h"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result;
> +  uint16x4_t __result = (uint16x4_t) __a - (uint16x4_t) __b * (uint16x4_t) __c;
> +  return (int16x4_t) __result;
>  }
>  
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vmls_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
>  {
> -  int32x2_t __result;
> -  __asm__ ("mls %0.2s,%2.2s,%3.2s"
> -   : "=w"(__result)
> -   : "0"(__a), "w"(__b), "w"(__c)
> -   : /* No clobbers */);
> -  return __result

Re: follow SSA defs for asan base

2021-01-27 Thread Richard Biener via Gcc-patches
On Wed, Jan 27, 2021 at 1:29 PM Alexandre Oliva  wrote:
>
> On Jan 26, 2021, Richard Biener  wrote:
>
> > So while I think it's safe let's look at if we can improve tree-nested.c,
> > like I see (probably not the correct place):
>
> *nod*, it's just not the *only* place.
>
> > seeing how we adjust current_function_decl around the
> > recompute_tree_invariant_for_addr_expr call but not the
> > gsi_gimplify_val one (we already pass it a nesting_info,
> > not sure if wi->info is the same as the 'info' used above though),
> > so eventually we can fix it in one place?
>
> There are pieces of nested function lowering for which we set cfun and
> current_function_decl while walking each function, and there are other
> pieces that just don't bother, and we only set up current_function_decl
> temporarily for ADDR_EXPR handling.
>
> This patch adjusts both of the ADDR_EXPR handlers that override
> current_function_decl, so that the temporary overriding remains in
> effect during the re-gimplification.  That is enough to avoid the
> problem.  But I'm not very happy with this temporary overriding, it
> seems fishy.  I'd rather we set things up for the entire duration of the
> walking of each function.
>
> But that's only relevant because we rely on current_function_decl for
> address handling.  It's not clear to me that we should, as the other
> patch demonstrated.  With it, we could probably even do away with these
> overriders.

True, but I guess at least the documentation of the predicates needs to be
more precise.  I've also considered turning the current_function_decl
accesses into function parameters instead.  We do have
decl_address_ip_invariant_p on which decl_address_invariant_p
could build on by simply doing

 decl_address_ip_invariant_p (op) || auto_var_p (op)

(if there were not the strange STRING_CST handling in the _ip_ variant)

> But, for this stage, this is probably as conservative a change as we
> could possibly hope for.  I've regstrapped it on x86_64-linux-gnu, and
> also bootstrapped it with asan and ubsan.  Ok to install?

Yes, OK.

Thanks,
Richard.

>
> restore current_function_decl after re-gimplifying nested ADDR_EXPRs
>
> From: Alexandre Oliva 
>
> Ada makes extensive use of nested functions, which turn all automatic
> variables of the enclosing function that are used in nested ones into
> members of an artificial FRAME record type.
>
> The address of a local variable is usually passed to asan marking
> functions without using a temporary.  asan_expand_mark_ifn will reject
> an ADDR_EXPRs if it's split out from the call into an SSA_NAMEs.
>
> Taking the address of a member of FRAME within a nested function was
> not regarded as a gimple val: while introducing FRAME variables,
> current_function_decl pointed to the outermost function, even while
> processing a nested function, so decl_address_invariant_p, checking
> that the context of the variable is current_function_decl, returned
> false for such ADDR_EXPRs.
>
> decl_address_invariant_p, called when determining whether an
> expression is a legitimate gimple value, compares the context of
> automatic variables with current_function_decl.  Some of the
> tree-nested function processing doesn't set current_function_decl, but
> ADDR_EXPR-processing bits temporarily override it.  However, they
> restore it before re-gimplifying, which causes even ADDR_EXPRs
> referencing automatic variables in the FRAME struct of a nested
> function to not be regarded as address-invariant.
>
> This patch moves the restores of current_function_decl in the
> ADDR_EXPR-handling bits after the re-gimplification, so that the
> correct current_function_decl is used when testing for address
> invariance.
>
>
> for  gcc/ChangeLog
>
> * tree-nested.c (convert_nonlocal_reference_op): Move
> current_function_decl restore after re-gimplification.
> (convert_local_reference_op): Likewise.
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/asan/nested-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/asan/nested-1.c |   24 
>  gcc/tree-nested.c|4 ++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/asan/nested-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/asan/nested-1.c 
> b/gcc/testsuite/gcc.dg/asan/nested-1.c
> new file mode 100644
> index 0..87e842098077c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/asan/nested-1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=address" } */
> +
> +int f(int i) {
> +  auto int h() {
> +int r;
> +int *p;
> +
> +{
> +  int x[3];
> +
> +  auto int g() {
> +   return x[i];
> +  }
> +
> +  p = &r;
> +  *p = g();
> +}
> +
> +return *p;
> +  }
> +
> +  return h();
> +}
> diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
> index 1b52669b622aa..addd6eef9aba6 100644
> --- a/gcc/tree-nested.c
> +++ b/gcc/tree-nested.c
> @@ -1214,7 +1214,6 @@ convert_nonlocal_reference_o

aarch64: Use RTL builtins for integer mls_n intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites integer mls_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-01-15  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add mls_n builtin
generator macro.
* config/aarch64/aarch64-simd.md (*aarch64_mls_elt_merge):
Rename to...
(aarch64_mls_n): This.
* config/aarch64/arm_neon.h (vmls_n_s16): Use RTL builtin
instead of asm.
(vmls_n_s32): Likewise.
(vmls_n_u16): Likewise.
(vmls_n_u32): Likewise.
(vmlsq_n_s16): Likewise.
(vmlsq_n_s32): Likewise.
(vmlsq_n_u16): Likewise.
(vmlsq_n_u32): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 93a087987bb7f039b2f85a6e1d2e05eb95fa0058..32aee6024a89e6ca1f423717463fe67d011afd8b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -185,6 +185,8 @@
 
  /* Implemented by aarch64_mls<mode>.  */
   BUILTIN_VDQ_BHSI (TERNOP, mls, 0, NONE)
+  /* Implemented by aarch64_mls_n<mode>.  */
+  BUILTIN_VDQHS (TERNOP, mls_n, 0, NONE)
 
   /* Implemented by aarch64_mlsl.  */
   BUILTIN_VD_BHSI (TERNOP, smlsl, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 693a61871051cb5030811e772b21bd0429c0fddb..544bac7dc9b62a9d5387465ec26d0e3204be6601 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1443,15 +1443,16 @@
  [(set_attr "type" "neon_mla_<Vetype>_scalar")]
)
 
-(define_insn "*aarch64_mls_elt_merge<mode>"
+(define_insn "aarch64_mls_n<mode>"
  [(set (match_operand:VDQHS 0 "register_operand" "=w")
	(minus:VDQHS
	  (match_operand:VDQHS 1 "register_operand" "0")
-	  (mult:VDQHS (vec_duplicate:VDQHS
-		  (match_operand:<VEL> 2 "register_operand" "<h_con>"))
-		(match_operand:VDQHS 3 "register_operand" "w"))))]
+	  (mult:VDQHS
+	    (vec_duplicate:VDQHS
+	      (match_operand:<VEL> 3 "register_operand" "<h_con>"))
+	    (match_operand:VDQHS 2 "register_operand" "w"))))]
  "TARGET_SIMD"
-  "mls\t%0.<Vtype>, %3.<Vtype>, %2.<Vetype>[0]"
+  "mls\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[0]"
  [(set_attr "type" "neon_mla_<Vetype>_scalar")]
)
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 45b3c125babae2e3d32d6cd3b36ce09c502c04d8..d891067f021a0bcc24af79dfbe2d9dd5889b23bc 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7840,48 +7840,32 @@ __extension__ extern __inline int16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmls_n_s16 (int16x4_t __a, int16x4_t __b, int16_t __c)
 {
-  int16x4_t __result;
-  __asm__ ("mls %0.4h, %2.4h, %3.h[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "x"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_mls_nv4hi (__a, __b, __c);
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmls_n_s32 (int32x2_t __a, int32x2_t __b, int32_t __c)
 {
-  int32x2_t __result;
-  __asm__ ("mls %0.2s, %2.2s, %3.s[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_mls_nv2si (__a, __b, __c);
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmls_n_u16 (uint16x4_t __a, uint16x4_t __b, uint16_t __c)
 {
-  uint16x4_t __result;
-  __asm__ ("mls %0.4h, %2.4h, %3.h[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "x"(__c)
-   : /* No clobbers */);
-  return __result;
+  return (uint16x4_t) __builtin_aarch64_mls_nv4hi ((int16x4_t) __a,
+   (int16x4_t) __b,
+   (int16_t) __c);
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmls_n_u32 (uint32x2_t __a, uint32x2_t __b, uint32_t __c)
 {
-  uint32x2_t __result;
-  __asm__ ("mls %0.2s, %2.2s, %3.s[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return (uint32x2_t) __builtin_aarch64_mls_nv2si ((int32x2_t) __a,
+   (int32x2_t) __b,
+   (int32_t) __c);
 }
 
 __extension__ extern __inline int8x8_t
@@ -8353,48 +8337,32 @@ __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlsq_n_s16 (int16x8_t __a, int16x8_t __b, int16_t __c)
 {
-  int16x8_t __result;
-  __asm__ ("mls %0.8h, %2.8h, %3.h[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "x"(__c)
-  

RE: aarch64: Use RTL builtins for integer mls_n intrinsics

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Jonathan Wright 
> Sent: 27 January 2021 15:08
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: aarch64: Use RTL builtins for integer mls_n intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites integer mls_n Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better scheduling
> and optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-01-15  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add mls_n builtin
> generator macro.
> * config/aarch64/aarch64-simd.md (*aarch64_mls_elt_merge):
> Rename to...
> (aarch64_mls_n): This.
> * config/aarch64/arm_neon.h (vmls_n_s16): Use RTL builtin
> instead of asm.
> (vmls_n_s32): Likewise.
> (vmls_n_u16): Likewise.
> (vmls_n_u32): Likewise.
> (vmlsq_n_s16): Likewise.
> (vmlsq_n_s32): Likewise.
> (vmlsq_n_u16): Likewise.
> (vmlsq_n_u32): Likewise.


Re: [RFC] test builtin ratio for loop distribution

2021-01-27 Thread Richard Biener via Gcc-patches
On Wed, Jan 27, 2021 at 2:18 PM Alexandre Oliva  wrote:
>
>
> This patch attempts to fix a libgcc codegen regression introduced in
> gcc-10, as -ftree-loop-distribute-patterns was enabled at -O2.
>
>
> The ldist pass turns even very short loops into memset calls.  E.g.,
> the TFmode emulation calls end with a loop of up to 3 iterations, to
> zero out trailing words, and the loop distribution pass turns them
> into calls of the memset builtin.
>
> Though short constant-length memsets are usually dealt with
> efficiently, for non-constant-length ones, the options are a setmemM
> pattern or a function call.

There's also folding which is expected to turn small memcpy/memset
into simple stmts again.

> RISC-V doesn't have any setmemM pattern, so the loops above end up
> "optimized" into memset calls, incurring not only the overhead of an
> explicit call, but also discarding the information the compiler has
> about the alignment of the destination, and that the length is a
> multiple of the word alignment.

But yes, this particular issue has come up before.  We do have some
on-the-side info for alignment and size estimates from profiling, notably
memset expansion will call

  if (currently_expanding_gimple_stmt)
stringop_block_profile (currently_expanding_gimple_stmt,
&expected_align, &expected_size);

I'm not sure whether those would be a suitable vehicle to record
known alignments.  The alternative is to create builtin variants
that encode this info and are usable for the middle-end here.

That said, rather than not transforming the loop as you do I'd
say we want to re-inline small copies more forcefully during
loop distribution code-gen so we turn a loop that sets
3 'short int' to zero into a 'int' store and a 'short' store for example
(we have more code-size leeway here because we formerly had
a loop).

Since you don't add a testcase I can't see whether the actual
case would be fixed by setting SSA pointer alignment on the
memset arguments (that is, whether they are invariant and thus
no SSA name is involved or not).

> This patch adds to the loop distribution pass some cost analysis based
> on preexisting *_RATIO macros, so that we won't transform loops with
> trip counts as low as the ratios we'd rather expand inline.
>
>
> This patch is not finished; it needs adjustments to the testsuite, to
> make up for the behavior changes it brings about.  Specifically, on a
> x86_64-linux-gnu regstrap, it regresses:
>
> > FAIL: gcc.dg/pr53265.c  (test for warnings, line 40)
> > FAIL: gcc.dg/pr53265.c  (test for warnings, line 42)
> > FAIL: gcc.dg/tree-ssa/ldist-38.c scan-tree-dump ldist "split to 0 loops and 1 library calls"
> > FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++14  scan-tree-dump ldist "split to 0 loops and 1 library calls"
> > FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++17  scan-tree-dump ldist "split to 0 loops and 1 library calls"
> > FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++2a  scan-tree-dump ldist "split to 0 loops and 1 library calls"
>
> I suppose just lengthening the loops will take care of ldist-38 and
> pr78847, but the loss of the warnings in pr53265 is more concerning, and
> will require investigation.
>
> Nevertheless, I seek feedback on whether this is an acceptable approach,
> or whether we should use alternate tuning parameters for ldist, or
> something entirely different.  Thanks in advance,
>
>
> for  gcc/ChangeLog
>
> * tree-loop-distribution.c (maybe_normalize_partition): New.
> (loop_distribution::distribute_loop): Call it.
>
> [requires testsuite adjustments and investigation of a warning regression]
> ---
> >  gcc/tree-loop-distribution.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
>
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index bb15fd3723fb6..b5198652817ee 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -2848,6 +2848,52 @@ fuse_memset_builtins (vec 
> *partitions)
>  }
>  }
>
> +/* Return false if it's profitable to turn the LOOP PARTITION into a builtin
> +   call, and true if it wasn't, changing the PARTITION to PKIND_NORMAL.  */
> +
> +static bool
> +maybe_normalize_partition (class loop *loop, struct partition *partition)
> +{
> +  unsigned HOST_WIDE_INT ratio;
> +
> > +  switch (partition->kind)
> > +    {
> > +    case PKIND_NORMAL:
> > +    case PKIND_PARTIAL_MEMSET:
> > +      return false;
> > +
> > +    case PKIND_MEMSET:
> > +      if (integer_zerop (gimple_assign_rhs1 (DR_STMT
> > +					     (partition->builtin->dst_dr))))
> > +	ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop));
> > +      else
> > +	ratio = SET_RATIO (optimize_loop_for_speed_p (loop));
> > +      break;
> > +
> > +    case PKIND_MEMCPY:
> > +    case PKIND_MEMMOVE:
> > +      ratio = MOVE_RATIO (optimize_loop_for_speed_p (loop));
> > +      break;
> > +
> > +    default:
> > +      gcc_unreachable ();
> > +    }
> +
> +  t

Re: [PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE

2021-01-27 Thread Richard Biener
On Wed, 27 Jan 2021, Jakub Jelinek wrote:

> On Wed, Jan 27, 2021 at 03:40:38PM +0100, Richard Biener wrote:
> > The following avoids repeatedly turning VALUE RTXen into
> > sth useful and re-applying a constant offset through get_addr
> > via DSE check_mem_read_rtx.  Instead perform this once for
> > all stores to be visited in check_mem_read_rtx.  This avoids
> > allocating 1.6GB of garbage PLUS RTXen on the PR80960
> > testcase, fixing the memory usage regression from old GCC.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> > 
> > Thanks,
> > Richard.
> > 
> > 2021-01-27  Richard Biener  
> > 
> > PR rtl-optimization/80960
> > * dse.c (check_mem_read_rtx): Call get_addr on the
> > offsetted address.
> > ---
> >  gcc/dse.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/gcc/dse.c b/gcc/dse.c
> > index c88587e7d94..da0df54a2dd 100644
> > --- a/gcc/dse.c
> > +++ b/gcc/dse.c
> > @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
> >      }
> >    if (maybe_ne (offset, 0))
> >      mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
> > +  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
> > +     which will over and over re-create proper RTL and re-apply the
> > +     offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
> > +     RTXen that way.  */
> > +  mem_addr = get_addr (mem_addr);
> >  
> >    if (group_id >= 0)
> >      {
> 
> Does that result in any changes on how much does DSE optimize?
> I mean, if you do 2 bootstraps/regtests, one with this patch and another one
> without it, and at the end of rest_of_handle_dse dump
> locally_deleted, globally_deleted
> for each CU/function, do you get the same counts except perhaps for dse.c?

I can check but all immediate first uses of mem_addr are in
true_dependece_1 which does x_addr = get_addr (x_addr); as the
first thing on it.  So the concern would be that
get_addr (get_addr (x_addr)) != get_addr (x_addr) which I think
shouldn't be the case (no semantic difference at least).

I'll see to do the test tomorrow.

Thanks,
Richard.


Re: [PATCH] rs6000: Fix vec insert ilp32 ICE and test failures [PR98799]

2021-01-27 Thread David Edelsohn via Gcc-patches
On Tue, Jan 26, 2021 at 10:56 PM Xionghu Luo  wrote:
>
> Hi,
>
> On 2021/1/27 03:00, David Edelsohn wrote:
> > On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo  wrote:
> >>
> >> From: "luo...@cn.ibm.com" 
> >>
> >> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
> >> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
> >> variable vector insert.  Remove rs6000_expand_vector_set_var helper
> >> function, adjust the p8 and p9 definitions position and make them
> >> static.
> >>
> >> The previous commit r11-6858 missed the -m32 check.  This patch was
> >> tested and passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with
> >> RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'" for BE targets.
> >
> > Hi, Xionghu
> >
> > Thanks for addressing these failures and the cleanups.
> >
> > This patch addresses most of the failures.
> >
> > pr79251-run.c continues to fail.  The directives are not complete.
> > I'm not certain if your intention is to run the testcase on all
> > targets or only on Power7 and above.  The testcase relies on vector
> > "long long", which only is available with -mvsx, but the testcase only
> > enables -maltivec.  I believe that the testcase happens to pass on the
> > Linux platforms you tested because GCC defaulted to Power7 or Power8
> > ISA and the ABI specifies VSX.  The testcase probably needs to be
> > restricted to only run on some level of VSX enabled processor (VSX?
> > Power8? Power9?) and also needs some additional compiler options when
> > compiling the testcase instead of relying upon the default
> > configuration of the compiler.
>
>
> P8BE: gcc/testsuite/gcc/gcc.sum (it didn't run before due to no 'dg-do run'):
>
> Running target unix/-m32
> Running /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
> PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
> PASS: gcc.target/powerpc/pr79251-run.c execution test
> === gcc Summary for unix/-m32 ===
>
> # of expected passes            2
> Running target unix/-m64
> Running /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
> PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
> PASS: gcc.target/powerpc/pr79251-run.c execution test
> === gcc Summary for unix/-m64 ===
>
> # of expected passes            2
>
>
> How did you get the failure of pr79251-run.c, please?  In my testing it
> passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE on Linux.  This case is
> just verifying the *functionality* of "u = vec_insert (254, v, k)" by
> comparing whether u[k] is changed to 254; it must work on all platforms,
> with or without optimization, otherwise there is a functional
> error.  As to "long long", add target vsx_hw and powerpc like below?

AIX is not Linux.  Linux ABI requires VSX, at least when it defaults
to Power7 and above.  AIX does not.

I do not know if you have access to AIX systems and I don't want to
overly-complicate the testing by adding AIX.  The issue is that your
testcase assumes VSX, but nothing in the directives to the
testcase ensured that VSX was enabled.

> (Also change -maltivec to -mvsx for pr79251.p8.c/pr79251.p9.c.)
>
> --- a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
> @@ -1,4 +1,6 @@
> -/* { dg-options "-O2 -maltivec" } */
> +/* { dg-do run { target powerpc*-*-* } } */

Do not add target powerpc*-*-*, the testcase already is in the
gcc.target/powerpc directory.

> +/* { dg-require-effective-target vsx_hw { target powerpc*-*-* } } */

Similarly.  And dg-require-effective-target should not have a "target"
specifier.

> +/* { dg-options "-O2 -mvsx" } */

I will test with this change.

>
>
> Any other options necessary to limit the testcases? :)

Again, it's a question of what, exactly, you are trying to test.  This
will test with the default code generation for the target.  In other
words, it may not test that the new P8 and P9 code generation paths
run correctly.  You would need separate "run p8" and "run p9"
testcases that explicitly compile the code with -mdejagnu-cpu=power8
and -mdejagnu-cpu=power9.  Your "compile" testcases check that the
appropriate instructions appear in the assembly, but the "run"
testcase doesn't verify that the p8 or p9 code runs correctly.
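
A hypothetical "run p8" companion test along those lines might look like this (file name, target selector and body are illustrative only, not part of the patch under review):

```c
/* Hypothetical gcc.target/powerpc/pr79251-run-p8.c sketch.  */
/* { dg-do run { target p8vector_hw } } */
/* { dg-options "-O2 -mvsx -mdejagnu-cpu=power8" } */

/* Same body as pr79251-run.c: verify that u = vec_insert (254, v, k)
   really changes u[k] when compiled down the Power8 code path.  */
```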

Thanks, David

>
> >
> > Also, part of the change seems to be
> >
> >> -  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
> >> -rs6000_expand_vector_set_var_p9 (target, val, idx);
> >> + if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8)
> >> +   {
> >> + rs6000_expand_vector_set_var_p9 (target, val, elt_rtx);
> >> + return;
> >> +   }
> >
> > Does the P9 case need TARGET_POWERPC64?  This optimization seemed to
> > be functioning on P9 in 32 bit mode prior to this fix.  It would be a
> > shame to unnecessarily disable this optimization in 32 bit mode.  Or
> > maybe it generated a functioning sequence b

Re: [PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 27, 2021 at 04:16:22PM +0100, Richard Biener wrote:
> I can check but all immediate first uses of mem_addr are in
> true_dependece_1 which does x_addr = get_addr (x_addr); as the
> first thing on it.  So the concern would be that
> get_addr (get_addr (x_addr)) != get_addr (x_addr) which I think
> shouldn't be the case (no semantic difference at least).

I guess you're right, indeed I can't find other uses of mem_addr
and it isn't saved, but just used in that one function, though
perhaps against more than one addr.
And in get_addr we most likely get into the:

  if (GET_CODE (x) != VALUE)
    {
      if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS)
          && GET_CODE (XEXP (x, 0)) == VALUE
          && CONST_SCALAR_INT_P (XEXP (x, 1)))
        {
          rtx op0 = get_addr (XEXP (x, 0));
          if (op0 != XEXP (x, 0))
            {
              poly_int64 c;
              if (GET_CODE (x) == PLUS
                  && poly_int_rtx_p (XEXP (x, 1), &c))
                return plus_constant (GET_MODE (x), op0, c);
              return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
                                          op0, XEXP (x, 1));
            }
        }
      return x;
    }

and if what get_addr returns is already the VALUE it should be, then it
will not create any further PLUS/MINUS.

One question is how often we do that
  if (maybe_ne (offset, 0))
    mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
and with your patch also
  mem_addr = get_addr (mem_addr);
if there are no active local stores (or just no canon_true_dependence
calls for that read).
If it would happen often, we could further optimize compile time and memory
by changing mem_addr above this plus_constant into mem_addr_base,
add mem_addr var initially set to NULL and before each of those 3
canon_true_dependence calls do:
  if (mem_addr == NULL_RTX)
    {
      mem_addr = mem_addr_base;
      if (maybe_ne (offset, 0))
        mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
      mem_addr = get_addr (mem_addr);
    }

Jakub



[PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites floating-point mla_n/mls_n intrinsics to use
a + b * c and a - b * c expressions rather than inline assembly code,
allowing for better
scheduling and optimization.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-01-18  Jonathan Wright  

* config/aarch64/arm_neon.h (vmla_n_f32): Use C rather than asm.
(vmlaq_n_f32): Likewise.
(vmls_n_f32): Likewise.
(vmlsq_n_f32): Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index d891067f021a0bcc24af79dfbe2d9dd5889b23bc..d1ab3b7d54cd5b965f91e685139677864fcfe3e1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7233,13 +7233,7 @@ __extension__ extern __inline float32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmla_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
 {
-  float32x2_t __result;
-  float32x2_t __t1;
-  __asm__ ("fmul %1.2s, %3.2s, %4.s[0]; fadd %0.2s, %0.2s, %1.2s"
-   : "=w"(__result), "=w"(__t1)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline int16x4_t
@@ -7734,13 +7728,7 @@ __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlaq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
 {
-  float32x4_t __result;
-  float32x4_t __t1;
-  __asm__ ("fmul %1.4s, %3.4s, %4.s[0]; fadd %0.4s, %0.4s, %1.4s"
-   : "=w"(__result), "=w"(__t1)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a + __b * __c;
 }
 
 __extension__ extern __inline int16x8_t
@@ -7827,13 +7815,7 @@ __extension__ extern __inline float32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmls_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
 {
-  float32x2_t __result;
-  float32x2_t __t1;
-  __asm__ ("fmul %1.2s, %3.2s, %4.s[0]; fsub %0.2s, %0.2s, %1.2s"
-   : "=w"(__result), "=w"(__t1)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a - __b * __c;
 }
 
 __extension__ extern __inline int16x4_t
@@ -8324,13 +8306,7 @@ __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlsq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
 {
-  float32x4_t __result;
-  float32x4_t __t1;
-  __asm__ ("fmul %1.4s, %3.4s, %4.s[0]; fsub %0.4s, %0.4s, %1.4s"
-   : "=w"(__result), "=w"(__t1)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __a - __b * __c;
 }
 
 __extension__ extern __inline int16x8_t


RE: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n intrinsics

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches
Hi Jonathan,

> -Original Message-
> From: Jonathan Wright 
> Sent: 27 January 2021 16:03
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n
> intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites floating-point mla_n/mls_n intrinsics to use
> a + b * c and a - b * c expressions rather than inline assembly code,
> allowing for better
> scheduling and optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

I'm quite keen to remove that ugly inline asm, but I'm a bit concerned about 
the floating-point semantics now being affected by things like FP contractions.
The intrinsics are supposed to preserve the semantics of the instructions as 
much as possible.
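
A self-contained illustration of the concern (plain ISO C++, not NEON-specific; the helper names are mine): with contraction, a + b * c rounds once instead of twice, which can change the result:

```cpp
#include <cassert>
#include <cmath>

// Two roundings: the product is rounded to float, then the add rounds
// again.  (volatile blocks the compiler from contracting this to an fma.)
float mla_separate (float a, float b, float c)
{
  volatile float p = b * c;
  return a + p;
}

// One rounding of the exact b * c + a, as a contracted fmla would behave.
float mla_fused (float a, float b, float c)
{
  return std::fmaf (b, c, a);
}
```

With b = 1 + 2^-23 and c = 1 - 2^-23, the product 1 - 2^-46 rounds to exactly 1.0f, so adding a = -1 gives 0 in the two-rounding form but -2^-46 in the fused form.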
Richard, does this mean we'll want to implement this using RTL builtins, like 
for the integer ones?
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-01-18  Jonathan Wright  
> 
> * config/aarch64/arm_neon.h (vmla_n_f32): Use C rather than asm.
> (vmlaq_n_f32): Likewise.
> (vmls_n_f32): Likewise.
> (vmlsq_n_f32): Likewise.



[PATCH] aarch64: Use RTL builtins for [su]mlal intrinsics

2021-01-27 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites [su]mlal Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better
scheduling and optimization.
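
As background (my per-lane paraphrase of the instruction's semantics, not code from the patch): [su]mlal widens, multiplies and accumulates, e.g. for the signed 8-bit variant:

```cpp
#include <cassert>
#include <cstdint>

// One lane of vmlal_s8 / smlal: widen the 8-bit inputs to 16 bits,
// multiply, and accumulate into the 16-bit accumulator lane.
int16_t smlal_lane (int16_t acc, int8_t b, int8_t c)
{
  return (int16_t) (acc + (int16_t) b * (int16_t) c);
}
```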

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-01-26  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add [su]mlal
builtin generator macros.
* config/aarch64/aarch64-simd.md (*aarch64_<su>mlal<mode>):
Rename to...
(aarch64_<su>mlal<mode>): This.
* config/aarch64/arm_neon.h (vmlal_s8): Use RTL builtin
instead of inline asm.
(vmlal_s16): Likewise.
(vmlal_s32): Likewise.
(vmlal_u8): Likewise.
(vmlal_u16): Likewise.
(vmlal_u32): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 32aee6024a89e6ca1f423717463fe67d011afd8b..a71ae4d724136c8b626d397bf6187e8b595a2b8a 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -192,6 +192,10 @@
   BUILTIN_VD_BHSI (TERNOP, smlsl, 0, NONE)
   BUILTIN_VD_BHSI (TERNOPU, umlsl, 0, NONE)
 
+  /* Implemented by aarch64_<su>mlal<mode>.  */
+  BUILTIN_VD_BHSI (TERNOP, smlal, 0, NONE)
+  BUILTIN_VD_BHSI (TERNOPU, umlal, 0, NONE)
+
   /* Implemented by aarch64_mlsl_hi.  */
   BUILTIN_VQW (TERNOP, smlsl_hi, 0, NONE)
   BUILTIN_VQW (TERNOPU, umlsl_hi, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 544bac7dc9b62a9d5387465ec26d0e3204be6601..db56b61baf2093c88d8757b25580b3032f00a355 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1825,17 +1825,17 @@
 }
 )
 
-(define_insn "*aarch64_<su>mlal<mode>"
+(define_insn "aarch64_<su>mlal<mode>"
   [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
         (plus:<VWIDE>
           (mult:<VWIDE>
               (ANY_EXTEND:<VWIDE>
-                (match_operand:VD_BHSI 1 "register_operand" "w"))
+                (match_operand:VD_BHSI 2 "register_operand" "w"))
               (ANY_EXTEND:<VWIDE>
-                (match_operand:VD_BHSI 2 "register_operand" "w")))
-          (match_operand:<VWIDE> 3 "register_operand" "0")))]
+                (match_operand:VD_BHSI 3 "register_operand" "w")))
+          (match_operand:<VWIDE> 1 "register_operand" "0")))]
   "TARGET_SIMD"
-  "<su>mlal\t%0.<Vwtype>, %1.<Vtype>, %2.<Vtype>"
+  "<su>mlal\t%0.<Vwtype>, %2.<Vtype>, %3.<Vtype>"
   [(set_attr "type" "neon_mla_<Vetype>_long")]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index d1ab3b7d54cd5b965f91e685139677864fcfe3e1..674ccc63b69ca1945dc684d2b06c1e31f52bfdb3 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7656,72 +7656,42 @@ __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_s8 (int16x8_t __a, int8x8_t __b, int8x8_t __c)
 {
-  int16x8_t __result;
-  __asm__ ("smlal %0.8h,%2.8b,%3.8b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlalv8qi (__a, __b, __c);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
 {
-  int32x4_t __result;
-  __asm__ ("smlal %0.4s,%2.4h,%3.4h"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlalv4hi (__a, __b, __c);
 }
 
 __extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
 {
-  int64x2_t __result;
-  __asm__ ("smlal %0.2d,%2.2s,%3.2s"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlalv2si (__a, __b, __c);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_u8 (uint16x8_t __a, uint8x8_t __b, uint8x8_t __c)
 {
-  uint16x8_t __result;
-  __asm__ ("umlal %0.8h,%2.8b,%3.8b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_umlalv8qi_uuuu (__a, __b, __c);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c)
 {
-  uint32x4_t __result;
-  __asm__ ("umlal %0.4s,%2.4h,%3.4h"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_umlalv4hi_uuuu (__a, __b, __c);
 }
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c)
 {
-  uint64x2_t __result;

Re: [PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE

2021-01-27 Thread Richard Biener
On Wed, 27 Jan 2021, Jakub Jelinek wrote:

> On Wed, Jan 27, 2021 at 04:16:22PM +0100, Richard Biener wrote:
> > I can check but all immediate first uses of mem_addr are in
> > true_dependece_1 which does x_addr = get_addr (x_addr); as the
> > first thing on it.  So the concern would be that
> > get_addr (get_addr (x_addr)) != get_addr (x_addr) which I think
> > shouldn't be the case (no semantic difference at least).
> 
> I guess you're right, indeed I can't find other uses of mem_addr
> and it isn't saved, but just used in that one function, though
> perhaps against more than one addr.
> And in get_addr we most likely get into the:
> 
>   if (GET_CODE (x) != VALUE)
>     {
>       if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS)
>           && GET_CODE (XEXP (x, 0)) == VALUE
>           && CONST_SCALAR_INT_P (XEXP (x, 1)))
>         {
>           rtx op0 = get_addr (XEXP (x, 0));
>           if (op0 != XEXP (x, 0))
>             {
>               poly_int64 c;
>               if (GET_CODE (x) == PLUS
>                   && poly_int_rtx_p (XEXP (x, 1), &c))
>                 return plus_constant (GET_MODE (x), op0, c);
>               return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
>                                           op0, XEXP (x, 1));
>             }
>         }
>       return x;
>     }
> 
> and if what get_addr returns is already the VALUE it should be, then it
> will not create any further PLUS/MINUS.
> 
> One question is how often we do that
>   if (maybe_ne (offset, 0))
>     mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
> and with your patch also
>   mem_addr = get_addr (mem_addr);
> if there are no active local stores (or just no canon_true_dependence
> calls for that read).
> If it would happen often, we could further optimize compile time and memory
> by changing mem_addr above this plus_constant into mem_addr_base,
> add mem_addr var initially set to NULL and before each of those 3
> canon_true_dependence calls do:
>   if (mem_addr == NULL_RTX)
>     {
>       mem_addr = mem_addr_base;
>       if (maybe_ne (offset, 0))
>         mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
>       mem_addr = get_addr (mem_addr);
>     }

Sure, more micro-optimizing is possible, including passing a flag
to canon_true_dependence whether the addr RTX already had get_addr
called on it.  And pass in the offset as poly-rtx-int and make
get_addr apply it if not zero.  But I've mostly tried to address
the non-linearity here, after the patch the number of get_addr
and plus_constant calls should be linear in the number of loads
rather than O (#loads * #stores).

I've also tried to find the most minimalistic change at this point
(so it could be eventually backported).
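
The shape of the fix, reduced to a toy model (all names invented; nothing here is dse.c code): hoist the per-load canonicalization out of the per-store loop.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the get_addr/plus_constant canonicalization.
static int canon_calls = 0;
static int canonicalize (int addr) { ++canon_calls; return addr; }

// Before: canonicalize once per (load, store) pair -- O(L * S) calls.
static void scan_quadratic (const std::vector<int> &loads,
                            const std::vector<int> &stores)
{
  for (int l : loads)
    for (int s : stores)
      (void) (canonicalize (l) == s);
}

// After: canonicalize once per load -- O(L) calls.
static void scan_linear (const std::vector<int> &loads,
                         const std::vector<int> &stores)
{
  for (int l : loads)
    {
      int cl = canonicalize (l);
      for (int s : stores)
        (void) (cl == s);
    }
}
```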

Richard.


Re: [PATCH 1/2] Add std::experimental::simd from the Parallelism TS 2

2021-01-27 Thread Jonathan Wakely via Gcc-patches

On 18/12/20 16:49 +0100, Matthias Kretz wrote:

Resending this patch with proper commit message and rebased on master.

From: Matthias Kretz 

Adds <experimental/simd>.

This implements the simd and simd_mask class templates via
[[gnu::vector_size(N)]] data members. It implements overloads for all of
<cmath> for simd. Explicit vectorization of the <cmath> functions is not
finished.
The majority of functions are marked as [[gnu::always_inline]] to enable
quasi-ODR-conforming linking of TUs with different -m flags.
Performance optimization was done for x86_64. ARM, Aarch64, and POWER
rely on the compiler to recognize reduction, conversion, and shuffle
patterns.
Besides verification using many different machine flags, the code was
also verified with different fast-math flags.
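
For readers unfamiliar with the underlying mechanism, a minimal sketch of the [[gnu::vector_size]] building block the implementation is based on (GCC/Clang vector extension; independent of the new headers):

```cpp
#include <cassert>

// A vector_size type gets element-wise operators from the compiler;
// simd<T> wraps such a data member.
typedef float v4sf __attribute__ ((vector_size (16)));

v4sf add (v4sf a, v4sf b)
{
  return a + b;  // element-wise addition of four floats
}
```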

libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2017.xml: Add implementation status
of the Parallelism TS 2. Document implementation-defined types
and behavior.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/experimental/simd: New file. New header for
Parallelism TS 2.
* include/experimental/bits/numeric_traits.h: New file.
Implementation of P1841R1 using internal naming. Addition of
missing IEC559 functionality query.
* include/experimental/bits/simd.h: New file. Definition of the
public simd interfaces and general implementation helpers.
* include/experimental/bits/simd_builtin.h: New file.
Implementation of the _VecBuiltin simd_abi.
* include/experimental/bits/simd_converter.h: New file. Generic
simd conversions.
* include/experimental/bits/simd_detail.h: New file. Internal
macros for the simd implementation.
* include/experimental/bits/simd_fixed_size.h: New file. Simd
fixed_size ABI specific implementations.
* include/experimental/bits/simd_math.h: New file. Math
overloads for simd.
* include/experimental/bits/simd_neon.h: New file. Simd NEON
specific implementations.
* include/experimental/bits/simd_ppc.h: New file. Implement bit
shifts to avoid invalid results for integral types smaller than
int.
* include/experimental/bits/simd_scalar.h: New file. Simd scalar
ABI specific implementations.
* include/experimental/bits/simd_x86.h: New file. Simd x86
specific implementations.
* include/experimental/bits/simd_x86_conversions.h: New file.
x86 specific conversion optimizations. The conversion patterns
work around missing conversion patterns in the compiler and
should be removed as soon as PR85048 is resolved.
* testsuite/experimental/simd/standard_abi_usable.cc: New file.
Test that all (not all fixed_size, though) standard simd and
simd_mask types are usable.
* testsuite/experimental/simd/standard_abi_usable_2.cc: New
file. As above but with -ffast-math.
* testsuite/libstdc++-dg/conformance.exp: Don't build simd tests
from the standard test loop. Instead use
check_vect_support_and_set_flags to build simd tests with the
relevant machine flags.
---
.../doc/xml/manual/status_cxx2017.xml |  216 +
libstdc++-v3/include/Makefile.am  |   13 +
libstdc++-v3/include/Makefile.in  |   13 +
.../experimental/bits/numeric_traits.h|  567 ++
libstdc++-v3/include/experimental/bits/simd.h | 5051 
.../include/experimental/bits/simd_builtin.h  | 2949 ++
.../experimental/bits/simd_converter.h|  354 ++
.../include/experimental/bits/simd_detail.h   |  306 +
.../experimental/bits/simd_fixed_size.h   | 2066 +++
.../include/experimental/bits/simd_math.h | 1500 +
.../include/experimental/bits/simd_neon.h |  519 ++
.../include/experimental/bits/simd_ppc.h  |  123 +
.../include/experimental/bits/simd_scalar.h   |  772 +++
.../include/experimental/bits/simd_x86.h  | 5169 +
.../experimental/bits/simd_x86_conversions.h  | 2029 +++
libstdc++-v3/include/experimental/simd|   70 +
.../experimental/simd/standard_abi_usable.cc  |   64 +
.../simd/standard_abi_usable_2.cc |4 +
.../testsuite/libstdc++-dg/conformance.exp|   18 +-
19 files changed, 21802 insertions(+), 1 deletion(-)
create mode 100644 libstdc++-v3/include/experimental/bits/numeric_traits.h
create mode 100644 libstdc++-v3/include/experimental/bits/simd.h
create mode 100644 libstdc++-v3/include/experimental/bits/simd_builtin.h
create mode 100644 libstdc++-v3/include/experimental/bits/simd_converter.h
create mode 100644 libstdc++-v3/include/experimental/bits/simd_detail.h
create mode 100644 libstdc++-v3/include/experimental/bits/simd_fixed_size.h
create mode 100644 libstdc++-v3/include/experimental/bits/simd_math.h
create mode 100644 libstdc++-v3/include/experimental/bits/simd_neon.h
create mode 100644 libstdc++-v3/

Re: [PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 27, 2021 at 05:37:54PM +0100, Richard Biener wrote:
> Sure, more micro-optimizing is possible, including passing a flag
> to canon_true_dependence whether the addr RTX already had get_addr
> called on it.  And pass in the offset as poly-rtx-int and make
> get_addr apply it if not zero.  But I've mostly tried to address
> the non-linearity here, after the patch the number of get_addr
> and plus_constant calls should be linear in the number of loads
> rather than O (#loads * #stores).
> 
> I've also tried to find the most minimalistic change at this point
> (so it could be eventually backported).

Ok.  I'll gather stats incrementally and see if it is worth to do something
further later.

Jakub



Re: [PATCH 2/2] Add simd testsuite

2021-01-27 Thread Jonathan Wakely via Gcc-patches

On 18/12/20 16:49 +0100, Matthias Kretz wrote:

Resending squashed patch after addressing Jonathan's comments.

From: Matthias Kretz 

Add a new check-simd target to the testsuite. The new target creates a
subdirectory, generates the necessary Makefiles, and spawns submakes to
build and run the tests. Running this testsuite with defaults on my
machine takes half of the time the dejagnu testsuite required to only
determine whether to run tests. Since the simd testsuite integrated in
dejagnu increased the time of the whole libstdc++ testsuite by ~100%,
this approach is a compromise for speed while not sacrificing coverage
too much. Since the test driver is invoked individually per test
executable from a Makefile, make's jobserver (-j) trivially parallelizes
testing.

Testing different flags and with simulator (or remote execution) is
possible. E.g. `make check-simd DRIVEROPTS=-q
target_list="unix{-m64,-m32}{-march=sandybridge,-march=skylake-avx512}{,-ffast-math}"`
runs the testsuite 8 times in different subdirectories, using 8
different combinations of compiler flags, only outputs failing tests
(-q), and prints all summaries at the end. It skips most ABI tags by
default unless --run-expensive is passed to DRIVEROPTS or
GCC_TEST_RUN_EXPENSIVE is not empty.

To use a simulator, the CHECK_SIMD_CONFIG variable needs to point to a
shell script which calls `define_target <name> <flags> <simulator>` and
sets target_list as needed. E.g.:
case "$target_triplet" in
x86_64-*)
 target_list="unix{-march=sandybridge,-march=skylake-avx512}"
 ;;
powerpc64le-*)
 define_target power8 "-static -mcpu=power8" "/usr/bin/qemu-ppc64le -cpu power8"
 define_target power9 -mcpu=power9 "$HOME/bin/run_on_gcc135"
 target_list="power8 power9{,-ffast-math}"
 ;;
esac

libstdc++-v3/ChangeLog:
* scripts/check_simd: New file. This script is called from the
the check-simd target. It determines a set of compiler flags and
simulator setups for calling generate_makefile.sh and passes the
information back to the check-simd target, which recurses to the
generated Makefiles.
* scripts/create_testsuite_files: Remove files below simd/tests/
from testsuite_files and place them in testsuite_files_simd.
* testsuite/Makefile.am: Add testsuite_files_simd. Add
check-simd target.
* testsuite/Makefile.in: Regenerate.
* testsuite/experimental/simd/driver.sh: New file. This script
compiles and runs a given simd test, logging its output and
status. It uses the timeout command to implement compile and
test timeouts.
* testsuite/experimental/simd/generate_makefile.sh: New file.
This script generates a Makefile which uses driver.sh to compile
and run the tests and collect the logs into a single log file.
* testsuite/experimental/simd/tests/abs.cc: New file. Tests
abs(simd).
* testsuite/experimental/simd/tests/algorithms.cc: New file.
Tests min/max(simd, simd).
* testsuite/experimental/simd/tests/bits/conversions.h: New
file. Contains functions to support tests involving conversions.
* testsuite/experimental/simd/tests/bits/make_vec.h: New file.
Support functions make_mask and make_vec.
* testsuite/experimental/simd/tests/bits/mathreference.h: New
file. Support functions to supply precomputed math function
reference data.
* testsuite/experimental/simd/tests/bits/metahelpers.h: New
file. Support code for SFINAE testing.
* testsuite/experimental/simd/tests/bits/simd_view.h: New file.
* testsuite/experimental/simd/tests/bits/test_values.h: New
file. Test functions to easily drive a test with simd objects
initialized from a given list of values and a range of random
values.
* testsuite/experimental/simd/tests/bits/ulp.h: New file.
Support code to determine the ULP distance of simd objects.
* testsuite/experimental/simd/tests/bits/verify.h: New file.
Test framework for COMPARE'ing simd objects and instantiating
the test templates with value_type and ABI tag.
* testsuite/experimental/simd/tests/broadcast.cc: New file. Test
simd broadcasts.
* testsuite/experimental/simd/tests/casts.cc: New file. Test
simd casts.
* testsuite/experimental/simd/tests/fpclassify.cc: New file.
Test floating-point classification functions.
* testsuite/experimental/simd/tests/frexp.cc: New file. Test
frexp(simd).
* testsuite/experimental/simd/tests/generator.cc: New file. Test
simd generator constructor.
* testsuite/experimental/simd/tests/hypot3_fma.cc: New file.
Test 3-arg hypot(simd,simd,simd) and fma(simd,simd,simd).
* testsuite/experimental/simd/tests/integer_operators.cc: New
file. Test integer operators.
* testsuite/experimental/simd/tests/ldexp_scalbn_scalbln_modf.cc:
New file. Test ldexp, scalbn, scalbln and modf for simd.

RE: [PATCH] aarch64: Use RTL builtins for [su]mlal intrinsics

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 27 January 2021 16:28
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Use RTL builtins for [su]mlal intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites [su]mlal Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better
> scheduling and optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?
> 

Ok.
Thanks,
Kyrill

> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-01-26  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add [su]mlal
> builtin generator macros.
> * config/aarch64/aarch64-simd.md (*aarch64_mlal):
> Rename to...
> (aarch64_mlal): This.
> * config/aarch64/arm_neon.h (vmlal_s8): Use RTL builtin
> instead of inline asm.
> (vmlal_s16): Likewise.
> (vmlal_s32): Likewise.
> (vmlal_u8): Likewise.
> (vmlal_u16): Likewise.
> (vmlal_u32): Likewise.



Re: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n intrinsics

2021-01-27 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov  writes:
> Hi Jonathan,
>
>> -Original Message-
>> From: Jonathan Wright 
>> Sent: 27 January 2021 16:03
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov 
>> Subject: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n
>> intrinsics
>>
>> Hi,
>>
>> As subject, this patch rewrites floating-point mla_n/mls_n intrinsics to use
>> a + b * c and a - b * c rather than inline assembly code, allowing for better
>> scheduling and optimization.
>>
>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>> issues.
>>
>> Ok for master?
>
> I'm quite keen to remove that ugly inline asm, but I'm a bit concerned about 
> the floating-point semantics now being affected by things like FP 
> contractions.
> The intrinsics are supposed to preserve the semantics of the instructions as 
> much as possible.
> Richard, does this mean we'll want to implement this using RTL builtins, like 
> for the integer ones?

It seems like a grey area in the spec.  E.g. vmlaq_f32 is described as:

result[i] = a[i] + (b[i] * c[i]) for i = 0 to 3

which could be taken to mean that it behaves in the same way as the
C arithmetic would, and so should be subject to -ffp-contract.

At the moment, a separate vmulq_f32 and vaddq_f32 could be fused,
but that's arguably a bug, since the spec says that they should
behave like FMUL and FADD respectively.  So:

* At the moment, vmla_* is the only way of forcibly disabling fusing.

* -ffp-contract has different defaults between Clang and GCC,
  and the default GCC behaviour would be to contract the vmlas.

* It would be a change in behaviour from previous releases.

So I agree we should probably use builtins.

We'd need to be careful that we don't grow define_insns or RTL
optimisations that do their own fusing of separate MULTs and ADDs.
I think we should have new tests to make sure that we generate
separate FMULs and FADDs, if we don't already.

Thanks,
Richard


Re: [PATCH 2/2] Add simd testsuite

2021-01-27 Thread Jonathan Wakely via Gcc-patches

On 27/01/21 16:45 +, Jonathan Wakely wrote:

I'll regen the docs [...]


Done. Regenerating the docs needed the attached fix.


commit 3670dbe49059ab1746ac2e3b77940160c05db6c2
Author: Jonathan Wakely 
Date:   Wed Jan 27 17:52:27 2021

libstdc++: Regenerate libstdc++ HTML docs

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2017.xml: Replace invalid entity.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index bc740f8e1ba..f97fc060fa0 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -3113,7 +3113,7 @@ since C++14 and the implementation is complete.
 is supported if __ALTIVEC__ is defined and sizeof(T)
 < 8. Additionally, double is supported if
 __VSX__ is defined, and any T with 
-sizeof(T) ≤ 8 is supported if __POWER8_VECTOR__
+sizeof(T) <= 8 is supported if __POWER8_VECTOR__
 is defined.
 
 On x86, given an extended ABI tag Abi,


Re: [PATCH 2/2] Add simd testsuite

2021-01-27 Thread Jonathan Wakely via Gcc-patches

On 27/01/21 17:54 +, Jonathan Wakely wrote:

and add something to the release notes too.


Also done. Pushed to wwwdocs.


commit f948177c3d01d09cbc8035a75583d425a4dca46e
Author: Jonathan Wakely 
Date:   Wed Jan 27 18:30:00 2021 +

Document simd additions to libstdc++

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 9b86e6c8..efbf3341 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -329,6 +329,9 @@ a work-in-progress.
   
 
   
+  Experimental support for Data-Parallel Types (simd)
+from the Parallelism 2 TS, thanks to Matthias Kretz.
+  
   Faster std::uniform_int_distribution,
   thanks to Daniel Lemire.
   


Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.

2021-01-27 Thread will schmidt via Gcc-patches
On Thu, 2021-01-14 at 11:59 -0500, Michael Meissner via Gcc-patches wrote:
> From 78435dee177447080434cdc08fc76b1029c7f576 Mon Sep 17 00:00:00 2001
> From: Michael Meissner 
> Date: Wed, 13 Jan 2021 21:47:03 -0500
> Subject: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.
> 
> This patch replaces patches previously submitted:
> 
> September 24th, 2020:
> Message-ID: <20200924203159.ga31...@ibm-toto.the-meissners.org>
> 
> October 9th, 2020:
> Message-ID: <20201009043543.ga11...@ibm-toto.the-meissners.org>
> 
> October 24th, 2020:
> Message-ID: <2020100346.ga8...@ibm-toto.the-meissners.org>
> 
> November 19th, 2020:
> Message-ID: <20201119235814.ga...@ibm-toto.the-meissners.org>


Subject and date should be sufficient _if_ having the old versions
of the patches is necessary to review the latest version of the
patch, which ideally is not the case.


> 
> This patch maps the built-in functions that take or return long double
> arguments on systems where long double is IEEE 128-bit.
> 
> If long double is IEEE 128-bit, this patch goes through the built-in functions
> and changes the name of the math, scanf, and printf built-in functions to use
> the functions that GLIBC provides when long double uses the IEEE 128-bit
> representation.

ok.

> 
> In addition, changing the name in GCC allows the Fortran compiler to
> automatically use the correct name.

Does the fortran compiler currently use the wrong name? (pr?)

> 
> To map the math functions, typically this patch changes l to
> __ieee128.  However there are some exceptions that are handled with this
> patch.

This appears to be the rs6000_mangle_decl_assembler_name() function, which
also maps l_r to ieee128_r, and it looks like there is some additional
special handling for printf and scanf.


> To map the printf functions,  is mapped to __ieee128.
> 
> To map the scanf functions,  is mapped to __isoc99_ieee128.


> 
> I have tested this patch by doing builds, bootstraps, and make check with 3
> builds on a power9 little endian server:
> 
> * Build one used the default long double being IBM 128-bit;
> * Build two set the long double default to IEEE 128-bit; (and)
> * Build three set the long double default to 64-bit.
> 

ok

> The compilers built fine providing I recompiled gmp, mpc, and mpfr with the
> appropriate long double options.

Presumably the build is otherwise broken... 
Does that mean more than invoking download_prerequisites as part of the
build?   If there are specific options required during configure/build of
those packages, they should be called out.

> There were a few differences in the test
> suite runs that will be addressed in later patches, but over all it works
> well.

Presumably minimal. :-)


>   This patch is required to be able to build a toolchain where the default
> long double is IEEE 128-bit. 

Ok.  Could lead the patch description with this.  I imagine this is
just one of several patches that are still required towards that goal.



>  Can I check this patch into the master branch for
> GCC 11?





> 
> gcc/
> 2021-01-14  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (ieee128_builtin_name): New function.
>   (built_in_uses_long_double): New function.
>   (identifier_ends_in_suffix): New function.
>   (rs6000_mangle_decl_assembler_name): Update support for mapping built-in
>   function names for long double built-in functions if long double is
>   IEEE 128-bit to catch all of the built-in functions that take or
>   return long double arguments.
> 
> gcc/testsuite/
> 2021-01-14  Michael Meissner  
> 
>   * gcc.target/powerpc/float128-longdouble-math.c: New test.
>   * gcc.target/powerpc/float128-longdouble-stdio.c: New test.
>   * gcc.target/powerpc/float128-math.c: Adjust test for new name
>   being generated.  Add support for running test on power10.  Add
>   support for running if long double defaults to 64-bits.
> ---
>  gcc/config/rs6000/rs6000.c| 239 --
>  .../powerpc/float128-longdouble-math.c| 442 ++
>  .../powerpc/float128-longdouble-stdio.c   |  36 ++
>  .../gcc.target/powerpc/float128-math.c|  16 +-
>  4 files changed, 694 insertions(+), 39 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/float128-longdouble-math.c
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/float128-longdouble-stdio.c
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 6f48dd6566d..282703b9715 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -27100,6 +27100,172 @@ rs6000_globalize_decl_name (FILE * stream, tree 
> decl)
>  #endif
> 
>  
> +/* If long double uses the IEEE 128-bit representation, return the name used
> +   within GLIBC for the IEEE 128-bit long double built-in, instead of the
> +   default IBM 128-bit long double built-in.  Or return NULL if the built-in
> +   function does not use long double.  */

[PATCH] aarch64: Fix up *aarch64_bfxilsi_uxtw [PR98853]

2021-01-27 Thread Jakub Jelinek via Gcc-patches
Hi!

The https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html
patch that introduced this pattern claimed:
Would generate:

combine_balanced_int:
bfxil   w0, w1, 0, 16
uxtw    x0, w0
ret

But with this patch generates:

combine_balanced_int:
bfxil   w0, w1, 0, 16
ret
and that is indeed what it should generate, but it doesn't do that:
it emits bfxil   x0, x1, 0, 16
instead, which doesn't zero-extend from 32 to 64 bits but preserves
the high bits of the destination register.

The following patch fixes that, bootstrapped/regtested on aarch64-linux,
ok for trunk (and later backports)?

2021-01-27  Jakub Jelinek  

PR target/98853
* config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Use
%w0, %w1 and %2 instead of %0, %1 and %2.

* gcc.c-torture/execute/pr98853-1.c: New test.
* gcc.c-torture/execute/pr98853-2.c: New test.

--- gcc/config/aarch64/aarch64.md.jj2021-01-04 10:25:46.435147744 +0100
+++ gcc/config/aarch64/aarch64.md   2021-01-27 15:13:13.993275204 +0100
@@ -5724,10 +5724,10 @@ (define_insn "*aarch64_bfxilsi_uxtw"
 {
   case 0:
operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
-   return "bfxil\\t%0, %1, 0, %3";
+   return "bfxil\\t%w0, %w1, 0, %3";
   case 1:
operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
-   return "bfxil\\t%0, %2, 0, %3";
+   return "bfxil\\t%w0, %w2, 0, %3";
   default:
gcc_unreachable ();
 }
--- gcc/testsuite/gcc.c-torture/execute/pr98853-1.c.jj  2021-01-27 
15:26:15.544335342 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr98853-1.c 2021-01-27 
15:28:37.877710203 +0100
@@ -0,0 +1,21 @@
+/* PR target/98853 */
+
+#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 && __BYTE_ORDER__ == 
__ORDER_LITTLE_ENDIAN__
+__attribute__((__noipa__)) unsigned long long
+foo (unsigned x, unsigned long long y, unsigned long long z)
+{
+  __builtin_memcpy (2 + (char *) &x, 2 + (char *) &y, 2);
+  return x + z;
+}
+#endif
+
+int
+main ()
+{
+#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 && __BYTE_ORDER__ == 
__ORDER_LITTLE_ENDIAN__
+  if (foo (0xU, 0xULL, 0xULL)
+  != 0xULL)
+__builtin_abort ();
+#endif
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr98853-2.c.jj  2021-01-27 
19:35:52.312351623 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr98853-2.c 2021-01-27 
19:37:51.369515183 +0100
@@ -0,0 +1,19 @@
+/* PR target/98853 */
+
+#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
+__attribute__((noipa)) unsigned long long
+foo (unsigned long long x, unsigned int y)
+{
+  return ((unsigned) x & 0xfffeU) | (y & 0x1);
+}
+#endif
+
+int
+main ()
+{
+#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
+  if (foo (0xdeadbeefcaf2babeULL, 0xdeaffeedU) != 0xcaf3feedULL)
+__builtin_abort ();
+#endif
+  return 0;
+}

Jakub



[PATCH v2] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-27 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

v1 -> v2: Handle constraint modifiers, use AR constraint instead of R,
add testcases for & and %.




After switching the s390 backend to store long doubles in vector
registers, "f" constraint broke when used with the former: long doubles
correspond to TFmode, which in combination with "f" corresponds to
hard regs %v0-%v15, however, asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390.c (f_constraint_p): New function.
(s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST.
(TARGET_MD_ASM_ADJUST): Likewise.
* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
* gcc.target/s390/vector/long-double-asm-commutative.c: New
test.
* gcc.target/s390/vector/long-double-asm-earlyclobber.c: New
test.
* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
* gcc.target/s390/vector/long-double-asm-inout.c: New test.
* gcc.target/s390/vector/long-double-volatile-from-i64.c: New
test.
---
 gcc/config/s390/s390.c| 88 +++
 gcc/config/s390/vector.md | 36 ++--
 .../s390/vector/long-double-asm-abi.c | 26 ++
 .../s390/vector/long-double-asm-commutative.c | 16 
 .../vector/long-double-asm-earlyclobber.c | 17 
 .../s390/vector/long-double-asm-in-out.c  | 14 +++
 .../s390/vector/long-double-asm-inout.c   | 14 +++
 .../s390/vector/long-double-asm-matching.c| 13 +++
 .../vector/long-double-volatile-from-i64.c| 22 +
 9 files changed, 241 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9d2cee950d0..d4b098325e8 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16688,6 +16688,91 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional
+   modifiers.  */
+
+static bool
+f_constraint_p (const char *constraint)
+{
+  for (size_t i = 0, c_len = strlen (constraint); i < c_len;
+   i += CONSTRAINT_LEN (constraint[i], constraint + i))
+{
+  if (constraint[i] == 'f')
+   return true;
+}
+  return false;
+}
+
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec &outputs, vec &inputs,
+   vec &input_modes,
+   vec &constraints, vec & /*clobbers*/,
+   HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+/* Long doubles are stored in FPR pairs - nothing to do.  */
+return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+{
+  if (GET_MODE (outputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[i];
+  bool allows_mem, allows_reg, is_inout;
+  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+&allows_mem, &allows_reg, &is_inout);
+  gcc_assert (ok);
+  if (!f_constraint_p (constraint + 1))
+   /* Long double with a constraint other than "=f" - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!allows_mem);
+  gcc_assert (!is_inout);
+  /* Copy output value from a FPR pair into a vector register.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  push_to_sequence2 (after_md_seq, after_md_end);
+  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  after_md_seq = get_insns ();
+  after_md_end = get_last_insn ();
+  end_sequence ();
+  outputs[i] = fprx2;
+}
+
+  for (unsigned i = 0; i < ninputs; i++)
+{
+  if (GET_MODE (inputs[i]) != TFmode)
+   /* Not a long double - not

RE: [PATCH] aarch64: Fix up *aarch64_bfxilsi_uxtw [PR98853]

2021-01-27 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jakub Jelinek 
> Sent: 27 January 2021 19:11
> To: Richard Earnshaw ; Richard Sandiford
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] aarch64: Fix up *aarch64_bfxilsi_uxtw [PR98853]
> 
> Hi!
> 
> The https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html
> patch that introduced this pattern claimed:
> Would generate:
> 
> combine_balanced_int:
> bfxil   w0, w1, 0, 16
> uxtw    x0, w0
> ret
> 
> But with this patch generates:
> 
> combine_balanced_int:
> bfxil   w0, w1, 0, 16
> ret
> and that is indeed what it should generate, but it doesn't do that:
> it emits bfxil   x0, x1, 0, 16
> instead, which doesn't zero-extend from 32 to 64 bits but preserves
> the high bits of the destination register.
> 
> The following patch fixes that, bootstrapped/regtested on aarch64-linux,
> ok for trunk (and later backports)?

Ok.
Thanks,
Kyrill

> 
> 2021-01-27  Jakub Jelinek  
> 
>   PR target/98853
>   * config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Use
>   %w0, %w1 and %2 instead of %0, %1 and %2.
> 
>   * gcc.c-torture/execute/pr98853-1.c: New test.
>   * gcc.c-torture/execute/pr98853-2.c: New test.
> 
> --- gcc/config/aarch64/aarch64.md.jj  2021-01-04 10:25:46.435147744
> +0100
> +++ gcc/config/aarch64/aarch64.md 2021-01-27 15:13:13.993275204
> +0100
> @@ -5724,10 +5724,10 @@ (define_insn "*aarch64_bfxilsi_uxtw"
>  {
>case 0:
>   operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
> - return "bfxil\\t%0, %1, 0, %3";
> + return "bfxil\\t%w0, %w1, 0, %3";
>case 1:
>   operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
> - return "bfxil\\t%0, %2, 0, %3";
> + return "bfxil\\t%w0, %w2, 0, %3";
>default:
>   gcc_unreachable ();
>  }
> --- gcc/testsuite/gcc.c-torture/execute/pr98853-1.c.jj2021-01-27
> 15:26:15.544335342 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr98853-1.c   2021-01-27
> 15:28:37.877710203 +0100
> @@ -0,0 +1,21 @@
> +/* PR target/98853 */
> +
> +#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 &&
> __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +__attribute__((__noipa__)) unsigned long long
> +foo (unsigned x, unsigned long long y, unsigned long long z)
> +{
> +  __builtin_memcpy (2 + (char *) &x, 2 + (char *) &y, 2);
> +  return x + z;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8 &&
> __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +  if (foo (0xU, 0xULL,
> 0xULL)
> +  != 0xULL)
> +__builtin_abort ();
> +#endif
> +  return 0;
> +}
> --- gcc/testsuite/gcc.c-torture/execute/pr98853-2.c.jj2021-01-27
> 19:35:52.312351623 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr98853-2.c   2021-01-27
> 19:37:51.369515183 +0100
> @@ -0,0 +1,19 @@
> +/* PR target/98853 */
> +
> +#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> +__attribute__((noipa)) unsigned long long
> +foo (unsigned long long x, unsigned int y)
> +{
> +  return ((unsigned) x & 0xfffeU) | (y & 0x1);
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ == 8
> +  if (foo (0xdeadbeefcaf2babeULL, 0xdeaffeedU) !=
> 0xcaf3feedULL)
> +__builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
>   Jakub



Re: PR fortran/93524 - rank >= 3 array stride incorrectly set in CFI_establish

2021-01-27 Thread Harris Snyder
Hi all,

Now that my copyright assignment is complete, I'm submitting this fix.
Test cases are included.
OK for master? I do not have write access, so someone will need to
commit this for me.

Regards,
Harris

libgfortran/ChangeLog:

* runtime/ISO_Fortran_binding.c (CFI_establish): Fix strides
for rank > 2 arrays.

gcc/testsuite/ChangeLog:

* gfortran.dg/ISO_Fortran_binding_18.c: New test.
* gfortran.dg/ISO_Fortran_binding_18.f90: New test.




On Wed, Jan 13, 2021 at 2:10 PM Harris Snyder  wrote:
>
> Hi Tobias / all,
>
> Further related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93524
> `sm` is being incorrectly computed in CFI_establish. Take a look at
> the diff below - we are currently only using the extent of the
> previous rank to assign `sm`, instead of all previous ranks. Have I
> got this right, or am I missing something / does this need to be
> handled differently? I can offer some test cases and submit a proper
> patch if we think this solution is OK...
>
> Thanks,
> Harris
>
> diff --git a/libgfortran/runtime/ISO_Fortran_binding.c
> b/libgfortran/runtime/ISO_Fortran_binding.c
> index 3746ec1c681..20833ad2025 100644
> --- a/libgfortran/runtime/ISO_Fortran_binding.c
> +++ b/libgfortran/runtime/ISO_Fortran_binding.c
> @@ -391,7 +391,12 @@ int CFI_establish (CFI_cdesc_t *dv, void
> *base_addr, CFI_attribute_t attribute,
>   if (i == 0)
> dv->dim[i].sm = dv->elem_len;
>   else
> -   dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents[i - 1]);
> +   {
> + CFI_index_t extents_product = 1;
> + for (int j = 0; j < i; j++)
> +   extents_product *= extents[j];
> + dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents_product);
> +   }
> }
>  }
commit 451bd40aca006ebdba52553de2392fcb5b1ff42f
Author: Harris M. Snyder 
Date:   Tue Jan 26 23:29:24 2021 -0500

Partial fix for PR fortran/93524

diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c
new file mode 100644
index 000..4d1c4ecbd72
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c
@@ -0,0 +1,29 @@
+#include 
+
+#include 
+#include 
+
+
+
+extern int do_loop(CFI_cdesc_t* array);
+
+int main(int argc, char ** argv)
+{
+	int nx = 9;
+	int ny = 10;
+	int nz = 2;
+
+	int arr[nx*ny*nz];
+	memset(arr,0,sizeof(int)*nx*ny*nz);
+	CFI_index_t shape[3];
+	shape[0] = nz;
+	shape[1] = ny;
+	shape[2] = nx;
+
+	CFI_CDESC_T(3) farr;
+	int rc = CFI_establish((CFI_cdesc_t*)&farr, arr, CFI_attribute_other, CFI_type_int, 0, (CFI_rank_t)3, (const CFI_index_t *)shape);
+	if (rc != CFI_SUCCESS) abort();
+	int result = do_loop((CFI_cdesc_t*)&farr);
+	if (result != nx*ny*nz) abort();
+	return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.f90 b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.f90
new file mode 100644
index 000..76be51d22fb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+! { dg-additional-sources ISO_Fortran_binding_18.c }
+
+module fortran_binding_test_18
+use iso_c_binding
+implicit none
+contains
+
+subroutine test(array)
+integer(c_int) :: array(:)
+array = 1
+end subroutine
+
+function do_loop(array) result(the_sum) bind(c)
+integer(c_int), intent(in out) :: array(:,:,:)
+integer(c_int) :: the_sum, i, j
+
+the_sum = 0  
+array = 0
+do i=1,size(array,3)
+do j=1,size(array,2)
+call test(array(:,j,i))
+end do
+end do
+the_sum = sum(array)
+end function
+
+end module
diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 3746ec1c681..20833ad2025 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -391,7 +391,12 @@ int CFI_establish (CFI_cdesc_t *dv, void *base_addr, CFI_attribute_t attribute,
 	  if (i == 0)
 	dv->dim[i].sm = dv->elem_len;
 	  else
-	dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents[i - 1]);
+	{
+	  CFI_index_t extents_product = 1;
+	  for (int j = 0; j < i; j++)
+		extents_product *= extents[j];
+	  dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents_product);
+	}
 	}
 }
 


Re: [PATCH, rs6000] improve vec_ctf invalid parameter handling. (pr91903)

2021-01-27 Thread will schmidt via Gcc-patches


Ping!  

Thanks
-Will


On Mon, 2021-01-04 at 18:03 -0600, will schmidt via Gcc-patches wrote:
> On Mon, 2020-10-26 at 16:22 -0500, will schmidt wrote:
> > [PATCH, rs6000] improve vec_ctf invalid parameter handling.
> > 
> > Hi,
> >   Per PR91903, GCC ICEs when we attempt to pass a variable
> > (or out of range value) into the vec_ctf() builtin.  Per
> > investigation, the parameter checking exists for this
> > builtin with the int types, but was missing for
> > the long long types.
> > 
> > This patch adds the missing CODE_FOR_* entries to the
> > rs6000_expand_binup_builtin to cover that scenario.
> > This patch also updates some existing tests to remove
> > calls to vec_ctf() and vec_cts() that contain negative
> > values.
> > 
> > Regtested clean on power7, power8, power9 Linux targets.
> > 
> > OK for trunk?
> 
> 
> I've reviewed the list archives in case my local inbox lost a
> response.  I don't think this one was reviewed,
> so..
> 
> ping!  
> 
> :-) 
> 
> thanks
> -Will
> 
> 
> > 
> > THanks,
> > -Will
> > 
> > PR target/91903
> > 
> > 2020-10-26  Will Schmidt  
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-call.c (rs6000_expand_binop_builtin): Add
> > clauses for CODE_FOR_vsx_xvcvuxddp_scale and
> > CODE_FOR_vsx_xvcvsxddp_scale to the parameter checking code.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.target/powerpc/pr91903.c: New test.
> > * gcc.target/powerpc/builtins-1.fold.h: Update.
> > * gcc.target/powerpc/builtins-2.c: Update.
> > 
> > diff --git a/gcc/config/rs6000/rs6000-call.c
> > b/gcc/config/rs6000/rs6000-call.c
> > index b044778a7ae4..eb7e007e68d3 100644
> > --- a/gcc/config/rs6000/rs6000-call.c
> > +++ b/gcc/config/rs6000/rs6000-call.c
> > @@ -9447,11 +9447,13 @@ rs6000_expand_binop_builtin (enum insn_code
> > icode, tree exp, rtx target)
> > }
> >  }
> >else if (icode == CODE_FOR_altivec_vcfux
> >|| icode == CODE_FOR_altivec_vcfsx
> >|| icode == CODE_FOR_altivec_vctsxs
> > -  || icode == CODE_FOR_altivec_vctuxs)
> > +  || icode == CODE_FOR_altivec_vctuxs
> > +  || icode == CODE_FOR_vsx_xvcvuxddp_scale
> > +  || icode == CODE_FOR_vsx_xvcvsxddp_scale)
> >  {
> >/* Only allow 5-bit unsigned literals.  */
> >STRIP_NOPS (arg1);
> >if (TREE_CODE (arg1) != INTEGER_CST
> >   || TREE_INT_CST_LOW (arg1) & ~0x1f)
> > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> > b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> > index 8bc5f5e43366..42d552295e3e 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> > @@ -212,14 +212,14 @@ int main ()
> >extern vector unsigned long long u9; u9 = vec_mergeo (u3, u4);
> >  
> >extern vector long long l8; l8 = vec_mul (l3, l4);
> >extern vector unsigned long long u6; u6 = vec_mul (u3, u4);
> >  
> > -  extern vector double dh; dh = vec_ctf (la, -2);
> > +  extern vector double dh; dh = vec_ctf (la, 2);
> >extern vector double di; di = vec_ctf (ua, 2);
> >extern vector int sz; sz = vec_cts (fa, 0x1F);
> > -  extern vector long long l9; l9 = vec_cts (dh, -2);
> > +  extern vector long long l9; l9 = vec_cts (dh, 2);
> >extern vector unsigned long long u7; u7 = vec_ctu (di, 2);
> >extern vector unsigned int usz; usz = vec_ctu (fa, 0x1F);
> >  
> >extern vector float f1; f1 = vec_mergee (fa, fb);
> >extern vector float f2; f2 = vec_mergeo (fa, fb);
> > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-2.c
> > b/gcc/testsuite/gcc.target/powerpc/builtins-2.c
> > index 2aa23a377992..30acae47faff 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/builtins-2.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-2.c
> > @@ -40,16 +40,16 @@ int main ()
> >  
> >if (se[0] != 27L || se[1] != 27L || sf[0] != -14L || sf[1] !=
> > -14L
> >|| ue[0] != 27L || ue[1] != 27L || uf[0] != 14L || uf[1] !=
> > 14L)
> >  abort ();
> >  
> > -  vector double da = vec_ctf (sa, -2);
> > +  vector double da = vec_ctf (sa, 2);
> >vector double db = vec_ctf (ua, 2);
> > -  vector long long sg = vec_cts (da, -2);
> > +  vector long long sg = vec_cts (da, 2);
> >vector unsigned long long ug = vec_ctu (db, 2);
> >  
> > -  if (da[0] != 108.0 || da[1] != -56.0 || db[0] != 6.75 || db[1]
> > != 3.5
> > +  if (da[0] != 6.75 || da[1] != -3.5 || db[0] != 6.75 || db[1] !=
> > 3.5
> >|| sg[0] != 27L || sg[1] != -14L || ug[0] != 27L || ug[1] !=
> > 14L)
> >  abort ();
> >  
> >vector float fa = vec_ctf (inta, 5);
> >if (fa[0] != 0.843750 || fa[1] != -0.031250 || fa[2] != 0.125000
> > || fa[3] != 0.281250)
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr91903.c
> > b/gcc/testsuite/gcc.target/powerpc/pr91903.c
> > new file mode 100644
> > index ..f0792117a88f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr91903.c
> > @@ -0,0 +1,74 @@
> > +/* { dg-do 

[pushed] c++: Dependent using enum [PR97874]

2021-01-27 Thread Jason Merrill via Gcc-patches
The handling of dependent scopes and unsuitable scopes in lookup_using_decl
was a bit convoluted; I tweaked it for a while and then eventually
reorganized much of the function to hopefully be clearer.  Along the way I
noticed a couple of ways we were mishandling inherited constructors.

The local binding for a dependent using is the USING_DECL.

Implement instantiation of a dependent USING_DECL at function scope.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/97874
* name-lookup.c (lookup_using_decl): Clean up handling
of dependency and inherited constructors.
(finish_nonmember_using_decl): Handle DECL_DEPENDENT_P.
* pt.c (tsubst_expr): Handle DECL_DEPENDENT_P.

gcc/testsuite/ChangeLog:

PR c++/97874
* g++.dg/lookup/using4.C: No error in C++20.
* g++.dg/cpp0x/decltype37.C: Adjust message.
* g++.dg/template/crash75.C: Adjust message.
* g++.dg/template/crash76.C: Adjust message.
* g++.dg/cpp0x/inh-ctor36.C: New test.
* g++.dg/cpp1z/inh-ctor39.C: New test.
* g++.dg/cpp2a/using-enum-7.C: New test.
---
 gcc/cp/name-lookup.c  | 144 +++---
 gcc/cp/pt.c   |  41 +++---
 gcc/testsuite/g++.dg/cpp0x/decltype37.C   |   2 +-
 gcc/testsuite/g++.dg/cpp0x/inh-ctor36.C   |  10 ++
 gcc/testsuite/g++.dg/cpp1z/inh-ctor39.C   |  12 ++
 gcc/testsuite/g++.dg/cpp2a/using-enum-7.C |  27 
 gcc/testsuite/g++.dg/lookup/using4.C  |   2 +-
 gcc/testsuite/g++.dg/template/crash75.C   |   4 +-
 gcc/testsuite/g++.dg/template/crash76.C   |   2 +-
 9 files changed, 154 insertions(+), 90 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/inh-ctor36.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/inh-ctor39.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-7.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 0fb0036c4f3..52e4a630e25 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -5729,6 +5729,16 @@ lookup_using_decl (tree scope, name_lookup &lookup)
   /* Naming a class member.  This is awkward in C++20, because we
 might be naming an enumerator of an unrelated class.  */
 
+  tree npscope = scope;
+  if (PACK_EXPANSION_P (scope))
+   npscope = PACK_EXPANSION_PATTERN (scope);
+
+  if (!MAYBE_CLASS_TYPE_P (npscope))
+   {
+ error ("%qT is not a class, namespace, or enumeration", npscope);
+ return NULL_TREE;
+   }
+
   /* You cannot using-decl a destructor.  */
   if (TREE_CODE (lookup.name) == BIT_NOT_EXPR)
{
@@ -5737,14 +5747,13 @@ lookup_using_decl (tree scope, name_lookup &lookup)
}
 
   /* Using T::T declares inheriting ctors, even if T is a typedef.  */
-  if (MAYBE_CLASS_TYPE_P (scope)
- && (lookup.name == TYPE_IDENTIFIER (scope)
- || constructor_name_p (lookup.name, scope)))
+  if (lookup.name == TYPE_IDENTIFIER (npscope)
+ || constructor_name_p (lookup.name, npscope))
{
  if (!TYPE_P (current))
{
  error ("non-member using-declaration names constructor of %qT",
-scope);
+npscope);
  return NULL_TREE;
}
  maybe_warn_cpp0x (CPP0X_INHERITING_CTORS);
@@ -5752,88 +5761,79 @@ lookup_using_decl (tree scope, name_lookup &lookup)
  CLASSTYPE_NON_AGGREGATE (current) = true;
}
 
-  if (!MAYBE_CLASS_TYPE_P (scope))
-   ;
+  if (!TYPE_P (current) && cxx_dialect < cxx20)
+   {
+ error ("using-declaration for member at non-class scope");
+ return NULL_TREE;
+   }
+
+  bool depscope = dependent_scope_p (scope);
+
+  if (depscope)
+   /* Leave binfo null.  */;
   else if (TYPE_P (current))
{
- dependent_p = dependent_scope_p (scope);
- if (!dependent_p)
-   {
- binfo = lookup_base (current, scope, ba_any, &b_kind, tf_none);
- gcc_checking_assert (b_kind >= bk_not_base);
+ binfo = lookup_base (current, scope, ba_any, &b_kind, tf_none);
+ gcc_checking_assert (b_kind >= bk_not_base);
 
- if (lookup.name == ctor_identifier)
+ if (b_kind == bk_not_base && any_dependent_bases_p ())
+   /* Treat as-if dependent.  */
+   depscope = true;
+ else if (lookup.name == ctor_identifier
+  && (b_kind < bk_proper_base || !binfo_direct_p (binfo)))
+   {
+ if (any_dependent_bases_p ())
+   depscope = true;
+ else
{
- /* Even if there are dependent bases, SCOPE will not
-be direct base, no matter.  */
- if (b_kind < bk_proper_base || !binfo_direct_p (binfo))
-   {
- error ("%qT is not a direct base of %qT", scope, current);
- return NULL

Re: [[C++ PATCH]] Implement C++2a P0330R2 - Literal Suffixes for ptrdiff_t and size_t

2021-01-27 Thread Jakub Jelinek via Gcc-patches
On Sun, Oct 21, 2018 at 04:39:30PM -0400, Ed Smith-Rowland wrote:
> This patch implements C++2a proposal P0330R2 Literal Suffixes for ptrdiff_t
> and size_t*.  It's not official yet but looks very likely to pass.  It is
> incomplete because I'm looking for some opinions.  (We also might wait till
> it actually passes).
> 
> This paper takes the direction of a language change rather than a library
> change through C++11 literal operators.  This was after feedback on that
> paper after a few iterations.
> 
> As coded in this patch, integer suffixes involving 'z' are errors in C and
> warnings for C++ <= 17 (in addition to the usual warning about
> implementation suffixes shadowing user-defined ones).
> 
> OTOH, the 'z' suffix is not currently legal, so it can't break
> currently-correct code in any C/C++ dialect.  Furthermore, I suspect the
> language direction was chosen to accommodate a similar addition to C20.
> 
> I'm thinking of making this feature available as an extension to all of
> C/C++ perhaps with appropriate pedwarn.

GCC now supports -std=c++2b and -std=gnu++2b, are you going to update your
patch against it (and change for z/Z standing for ssize_t rather than
ptrdiff_t), plus incorporate the feedback from Joseph and Jason?

Jakub



[PATCH 00/16] stdx::simd fixes and testsuite improvements

2021-01-27 Thread Matthias Kretz
As promised on IRC ...

Matthias Kretz (15):
  Support skip, only, expensive, and xfail markers
  Fix NEON intrinsic types usage
  Support -mlong-double-64 on PPC
  Fix simd_mask on POWER w/o POWER8
  Fix several check-simd interaction issues
  Fix DRIVEROPTS and TESTFLAGS processing
  Fix incorrect display of old test summaries
  Immediate feedback with -v
  Fix mask reduction of simd_mask on POWER7
  Skip testing hypot3 for long double on PPC
  Abort test after 1000 lines of output
  Support timeout and timeout-factor options
  Improve test codegen for interpreting assembly
  Implement hmin and hmax
  Work around test failures using -mno-tree-vrp

yaozhongxiao (1):
  Improve "find_first/last_set" for NEON

 libstdc++-v3/include/experimental/bits/simd.h | 170 ++-
 .../include/experimental/bits/simd_builtin.h  |   6 +-
 .../include/experimental/bits/simd_neon.h |  17 +-
 .../include/experimental/bits/simd_ppc.h  |  35 ++-
 .../include/experimental/bits/simd_scalar.h   |   2 +-
 libstdc++-v3/testsuite/Makefile.am|   5 +-
 libstdc++-v3/testsuite/Makefile.in|   5 +-
 .../testsuite/experimental/simd/driver.sh | 263 ++
 .../experimental/simd/generate_makefile.sh| 201 +++--
 .../testsuite/experimental/simd/tests/abs.cc  |   1 +
 .../experimental/simd/tests/algorithms.cc |   1 +
 .../experimental/simd/tests/bits/verify.h |  44 +--
 .../experimental/simd/tests/broadcast.cc  |   1 +
 .../experimental/simd/tests/casts.cc  |   1 +
 .../experimental/simd/tests/fpclassify.cc |   3 +-
 .../experimental/simd/tests/frexp.cc  |   3 +-
 .../experimental/simd/tests/generator.cc  |   1 +
 .../experimental/simd/tests/hypot3_fma.cc |   4 +-
 .../simd/tests/integer_operators.cc   |   1 +
 .../simd/tests/ldexp_scalbn_scalbln_modf.cc   |   3 +-
 .../experimental/simd/tests/loadstore.cc  |   2 +
 .../experimental/simd/tests/logarithm.cc  |   3 +-
 .../experimental/simd/tests/mask_broadcast.cc |   1 +
 .../simd/tests/mask_conversions.cc|   1 +
 .../simd/tests/mask_implicit_cvt.cc   |   1 +
 .../experimental/simd/tests/mask_loadstore.cc |   1 +
 .../simd/tests/mask_operator_cvt.cc   |   1 +
 .../experimental/simd/tests/mask_operators.cc |   1 +
 .../simd/tests/mask_reductions.cc |   1 +
 .../experimental/simd/tests/math_1arg.cc  |   3 +-
 .../experimental/simd/tests/math_2arg.cc  |   3 +-
 .../experimental/simd/tests/operator_cvt.cc   |   1 +
 .../experimental/simd/tests/operators.cc  |   1 +
 .../experimental/simd/tests/reductions.cc |  22 ++
 .../experimental/simd/tests/remqo.cc  |   3 +-
 .../testsuite/experimental/simd/tests/simd.cc |   1 +
 .../experimental/simd/tests/sincos.cc |   4 +-
 .../experimental/simd/tests/split_concat.cc   |   1 +
 .../experimental/simd/tests/splits.cc |   1 +
 .../experimental/simd/tests/trigonometric.cc  |   3 +-
 .../simd/tests/trunc_ceil_floor.cc|   3 +-
 .../experimental/simd/tests/where.cc  |   1 +
 42 files changed, 635 insertions(+), 191 deletions(-)

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──






[PATCH 01/16] Support skip, only, expensive, and xfail markers

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/driver.sh: Implement skip, only,
expensive, and xfail markers. They can select on type, ABI tag
subset number, target-triplet, and compiler flags.
* testsuite/experimental/simd/generate_makefile.sh: The summary
now includes lines for unexpected passes and expected failures.
If the skip or only markers are only conditional on the type, do
not generate rules for those types.
* testsuite/experimental/simd/tests/abs.cc: Mark test expensive
for ABI tag subsets 1-9.
* testsuite/experimental/simd/tests/algorithms.cc: Ditto.
* testsuite/experimental/simd/tests/broadcast.cc: Ditto.
* testsuite/experimental/simd/tests/casts.cc: Ditto.
* testsuite/experimental/simd/tests/generator.cc: Ditto.
* testsuite/experimental/simd/tests/integer_operators.cc: Ditto.
* testsuite/experimental/simd/tests/loadstore.cc: Ditto.
* testsuite/experimental/simd/tests/mask_broadcast.cc: Ditto.
* testsuite/experimental/simd/tests/mask_conversions.cc: Ditto.
* testsuite/experimental/simd/tests/mask_implicit_cvt.cc: Ditto.
* testsuite/experimental/simd/tests/mask_loadstore.cc: Ditto.
* testsuite/experimental/simd/tests/mask_operator_cvt.cc: Ditto.
* testsuite/experimental/simd/tests/mask_operators.cc: Ditto.
* testsuite/experimental/simd/tests/mask_reductions.cc: Ditto.
* testsuite/experimental/simd/tests/operator_cvt.cc: Ditto.
* testsuite/experimental/simd/tests/operators.cc: Ditto.
* testsuite/experimental/simd/tests/reductions.cc: Ditto.
* testsuite/experimental/simd/tests/simd.cc: Ditto.
* testsuite/experimental/simd/tests/split_concat.cc: Ditto.
* testsuite/experimental/simd/tests/splits.cc: Ditto.
* testsuite/experimental/simd/tests/where.cc: Ditto.
* testsuite/experimental/simd/tests/fpclassify.cc: Ditto. In
addition replace "test only floattypes" marker by unconditional
"float|double|ldouble" only marker.
* testsuite/experimental/simd/tests/frexp.cc: Ditto.
* testsuite/experimental/simd/tests/hypot3_fma.cc: Ditto.
* testsuite/experimental/simd/tests/ldexp_scalbn_scalbln_modf.cc:
Ditto.
* testsuite/experimental/simd/tests/logarithm.cc: Ditto.
* testsuite/experimental/simd/tests/math_1arg.cc: Ditto.
* testsuite/experimental/simd/tests/math_2arg.cc: Ditto.
* testsuite/experimental/simd/tests/remqo.cc: Ditto.
* testsuite/experimental/simd/tests/trigonometric.cc: Ditto.
* testsuite/experimental/simd/tests/trunc_ceil_floor.cc: Ditto.
* testsuite/experimental/simd/tests/sincos.cc: Ditto. In
addition, xfail on run because the reference data is missing.
---
 .../testsuite/experimental/simd/driver.sh | 114 +---
 .../experimental/simd/generate_makefile.sh| 122 --
 .../testsuite/experimental/simd/tests/abs.cc  |   1 +
 .../experimental/simd/tests/algorithms.cc |   1 +
 .../experimental/simd/tests/broadcast.cc  |   1 +
 .../experimental/simd/tests/casts.cc  |   1 +
 .../experimental/simd/tests/fpclassify.cc |   3 +-
 .../experimental/simd/tests/frexp.cc  |   3 +-
 .../experimental/simd/tests/generator.cc  |   1 +
 .../experimental/simd/tests/hypot3_fma.cc |   3 +-
 .../simd/tests/integer_operators.cc   |   1 +
 .../simd/tests/ldexp_scalbn_scalbln_modf.cc   |   3 +-
 .../experimental/simd/tests/loadstore.cc  |   1 +
 .../experimental/simd/tests/logarithm.cc  |   3 +-
 .../experimental/simd/tests/mask_broadcast.cc |   1 +
 .../simd/tests/mask_conversions.cc|   1 +
 .../simd/tests/mask_implicit_cvt.cc   |   1 +
 .../experimental/simd/tests/mask_loadstore.cc |   1 +
 .../simd/tests/mask_operator_cvt.cc   |   1 +
 .../experimental/simd/tests/mask_operators.cc |   1 +
 .../simd/tests/mask_reductions.cc |   1 +
 .../experimental/simd/tests/math_1arg.cc  |   3 +-
 .../experimental/simd/tests/math_2arg.cc  |   3 +-
 .../experimental/simd/tests/operator_cvt.cc   |   1 +
 .../experimental/simd/tests/operators.cc  |   1 +
 .../experimental/simd/tests/reductions.cc |   1 +
 .../experimental/simd/tests/remqo.cc  |   3 +-
 .../testsuite/experimental/simd/tests/simd.cc |   1 +
 .../experimental/simd/tests/sincos.cc |   4 +-
 .../experimental/simd/tests/split_concat.cc   |   1 +
 .../experimental/simd/tests/splits.cc |   1 +
 .../experimental/simd/tests/trigonometric.cc  |   3 +-
 .../simd/tests/trunc_ceil_floor.cc|   3 +-
 .../experimental/simd/tests/where.cc  |   1 +
 34 files changed, 225 insertions(+), 66 deletions(-)



[PATCH 02/16] Fix NEON intrinsic types usage

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

Intrinsic types for NEON differ from gnu::vector_size types now. This
requires explicit specializations for __intrinsic_type and a new
__is_intrinsic_type trait.

libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h (__is_intrinsic_type): New
internal type trait. Alias for __is_vector_type on x86.
(_VectorTraitsImpl): Enable for __intrinsic_type in addition to
__vector_type.
(__intrin_bitcast): Allow casting to & from vector & intrinsic
types.
(__intrinsic_type): Explicitly specialize for NEON intrinsic
vector types.
---
 libstdc++-v3/include/experimental/bits/simd.h | 70 +--
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 00eec50d64f..d56176210df 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1379,13 +1379,35 @@ template 
 template <typename _Tp>
   inline constexpr bool __is_vector_type_v = __is_vector_type<_Tp>::value;
 
+// }}}
+// __is_intrinsic_type {{{
+#if _GLIBCXX_SIMD_HAVE_SSE_ABI
+template <typename _Tp>
+  using __is_intrinsic_type = __is_vector_type<_Tp>;
+#else // not SSE (x86)
+template <typename _Tp, typename = void_t<>>
+  struct __is_intrinsic_type : false_type {};
+
+template <typename _Tp>
+  struct __is_intrinsic_type<
+    _Tp, void_t<typename __intrinsic_type<
+	   remove_reference_t<decltype(declval<_Tp>()[0])>, sizeof(_Tp)>::type>>
+  : is_same<_Tp, typename __intrinsic_type<
+	      remove_reference_t<decltype(declval<_Tp>()[0])>,
+	      sizeof(_Tp)>::type> {};
+#endif
+
+template <typename _Tp>
+  inline constexpr bool __is_intrinsic_type_v
+    = __is_intrinsic_type<_Tp>::value;
+
 // }}}
 // _VectorTraits{{{
 template <typename _Tp, typename = void_t<>>
   struct _VectorTraitsImpl;
 
 template <typename _Tp>
-  struct _VectorTraitsImpl<_Tp, enable_if_t<__is_vector_type_v<_Tp>>>
+  struct _VectorTraitsImpl<_Tp, enable_if_t<__is_vector_type_v<_Tp>
+ || __is_intrinsic_type_v<_Tp>>>
   {
 using type = _Tp;
    using value_type = remove_reference_t<decltype(declval<_Tp>()[0])>;
@@ -1457,7 +1479,8 @@ template 
   _GLIBCXX_SIMD_INTRINSIC constexpr _To
   __intrin_bitcast(_From __v)
   {
-static_assert(__is_vector_type_v<_From> && __is_vector_type_v<_To>);
+static_assert((__is_vector_type_v<_From> || __is_intrinsic_type_v<_From>)
+   && (__is_vector_type_v<_To> || __is_intrinsic_type_v<_To>));
 if constexpr (sizeof(_To) == sizeof(_From))
   return reinterpret_cast<_To>(__v);
 else if constexpr (sizeof(_From) > sizeof(_To))
@@ -2183,16 +2206,55 @@ template 
 #endif // _GLIBCXX_SIMD_HAVE_SSE_ABI
 // __intrinsic_type (ARM){{{
 #if _GLIBCXX_SIMD_HAVE_NEON
+template <>
+  struct __intrinsic_type<float, 8, void>
+  { using type = float32x2_t; };
+
+template <>
+  struct __intrinsic_type<float, 16, void>
+  { using type = float32x4_t; };
+
+#if _GLIBCXX_SIMD_HAVE_NEON_A64
+template <>
+  struct __intrinsic_type<double, 8, void>
+  { using type = float64x1_t; };
+
+template <>
+  struct __intrinsic_type<double, 16, void>
+  { using type = float64x2_t; };
+#endif
+
+#define _GLIBCXX_SIMD_ARM_INTRIN(_Bits, _Np)                                  \
+template <>                                                                   \
+  struct __intrinsic_type<__int_with_sizeof_t<_Bits / 8>,                     \
+			  _Np * _Bits / 8, void>                               \
+  { using type = int##_Bits##x##_Np##_t; };                                   \
+template <>                                                                   \
+  struct __intrinsic_type<make_unsigned_t<__int_with_sizeof_t<_Bits / 8>>,    \
+			  _Np * _Bits / 8, void>                               \
+  { using type = uint##_Bits##x##_Np##_t; }
+_GLIBCXX_SIMD_ARM_INTRIN(8, 8);
+_GLIBCXX_SIMD_ARM_INTRIN(8, 16);
+_GLIBCXX_SIMD_ARM_INTRIN(16, 4);
+_GLIBCXX_SIMD_ARM_INTRIN(16, 8);
+_GLIBCXX_SIMD_ARM_INTRIN(32, 2);
+_GLIBCXX_SIMD_ARM_INTRIN(32, 4);
+_GLIBCXX_SIMD_ARM_INTRIN(64, 1);
+_GLIBCXX_SIMD_ARM_INTRIN(64, 2);
+#undef _GLIBCXX_SIMD_ARM_INTRIN
+
 template <typename _Tp, size_t _Bytes>
   struct __intrinsic_type<_Tp, _Bytes,
  enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 16>>
   {
-static constexpr int _S_VBytes = _Bytes <= 8 ? 8 : 16;
+static constexpr int _SVecBytes = _Bytes <= 8 ? 8 : 16;
 using _Ip = __int_for_sizeof_t<_Tp>;
 using _Up = conditional_t<
   is_floating_point_v<_Tp>, _Tp,
      conditional_t<is_unsigned_v<_Tp>, make_unsigned_t<_Ip>, _Ip>>;
-using type [[__gnu__::__vector_size__(_S_VBytes)]] = _Up;
+static_assert(!is_same_v<_Tp, _Up> || _SVecBytes != _Bytes,
+ "should use explicit specialization above");
+using type = typename __intrinsic_type<_Up, _SVecBytes>::type;
   };
 #endif // _GLIBCXX_SIMD_HAVE_NEON
 

[PATCH 03/16] Support -mlong-double-64 on PPC

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Let __intrinsic_type be valid if sizeof(long double) == sizeof(double) and
use a __vector double as member type.
---
 libstdc++-v3/include/experimental/bits/simd.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index d56176210df..64cf8d32328 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -2285,7 +2285,9 @@ template 
   struct __intrinsic_type<_Tp, _Bytes,
  enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 16>>
   {
-static_assert(!is_same_v<_Tp, long double>,
+static constexpr bool _S_is_ldouble = is_same_v<_Tp, long double>;
+// allow _Tp == long double with -mlong-double-64
+static_assert(!(_S_is_ldouble && sizeof(long double) > sizeof(double)),
  "no __intrinsic_type support for long double on PPC");
 #ifndef __VSX__
 static_assert(!is_same_v<_Tp, double>,
@@ -2297,8 +2299,11 @@ template 
   "no __intrinsic_type support for integers larger than 4 Bytes "
   "on PPC w/o POWER8 vectors");
 #endif
-using type = typename __intrinsic_type_impl<
-  conditional_t<is_floating_point_v<_Tp>, _Tp, __int_for_sizeof_t<_Tp>>>::type;
+using type =
+  typename __intrinsic_type_impl<
+    conditional_t<is_floating_point_v<_Tp>,
+      conditional_t<_S_is_ldouble, double, _Tp>,
+      __int_for_sizeof_t<_Tp>>>::type;
   };
 #endif // __ALTIVEC__
 



[PATCH 06/16] Fix DRIVEROPTS and TESTFLAGS processing

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/generate_makefile.sh: Use
different variables internally than documented for user
overrides. This makes internal append/prepend work as intended.
---
 .../testsuite/experimental/simd/generate_makefile.sh  | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh b/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh
index 8d642a2941a..4fb710c7767 100755
--- a/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh
+++ b/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh
@@ -85,19 +85,20 @@ CXX="$1"
 shift
 
 echo "TESTFLAGS ?=" > "$dst"
-[ -n "$testflags" ] && echo "TESTFLAGS := $testflags \$(TESTFLAGS)" >> "$dst"
-echo CXXFLAGS = "$@" "\$(TESTFLAGS)" >> "$dst"
+echo "test_flags := $testflags \$(TESTFLAGS)" >> "$dst"
+echo CXXFLAGS = "$@" "\$(test_flags)" >> "$dst"
 [ -n "$sim" ] && echo "export GCC_TEST_SIMULATOR = $sim" >> "$dst"
 cat >> "$dst" 

[PATCH 04/16] Fix simd_mask on POWER w/o POWER8

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Remove unnecessary static
assertion. Allow sizeof(8) integer __intrinsic_type to enable
the necessary mask type.
---
 libstdc++-v3/include/experimental/bits/simd.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 64cf8d32328..9685df0be9e 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -2292,12 +2292,6 @@ template 
 #ifndef __VSX__
 static_assert(!is_same_v<_Tp, double>,
  "no __intrinsic_type support for double on PPC w/o VSX");
-#endif
-#ifndef __POWER8_VECTOR__
-static_assert(
-  !(is_integral_v<_Tp> && sizeof(_Tp) > 4),
-  "no __intrinsic_type support for integers larger than 4 Bytes "
-  "on PPC w/o POWER8 vectors");
 #endif
 using type =
   typename __intrinsic_type_impl<



[PATCH 05/16] Fix several check-simd interaction issues

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/driver.sh (verify_test): Print
test output on run xfail. Do not repeat lines from the log that
were already printed on stdout.
(test_selector): Make the compiler flags pattern usable as a
substring selector.
(toplevel): Trap on SIGINT and remove the log and sum files.
Call timeout with --foreground to quickly terminate on SIGINT.
* testsuite/experimental/simd/generate_makefile.sh: Simplify run
targets via target patterns. Default DRIVEROPTS to -v for run
targets. Remove log and sum files after completion of the run
target (so that it's always recompiled).
Place help text into text file for reasonable 'make help'
performance.
---
 .../testsuite/experimental/simd/driver.sh | 16 +++--
 .../experimental/simd/generate_makefile.sh| 70 +--
 2 files changed, 44 insertions(+), 42 deletions(-)


diff --git a/libstdc++-v3/testsuite/experimental/simd/driver.sh b/libstdc++-v3/testsuite/experimental/simd/driver.sh
index 84f3829c2d4..cf07ff9ad85 100755
--- a/libstdc++-v3/testsuite/experimental/simd/driver.sh
+++ b/libstdc++-v3/testsuite/experimental/simd/driver.sh
@@ -224,16 +224,17 @@ verify_test() {
   fail "timeout: execution test"
 elif [ "$xfail" = "run" ]; then
   xfail "execution test"
-  exit 0
 else
   fail "execution test"
 fi
 if $verbose; then
-  if [ $(cat "$log"|wc -l) -gt 1000 ]; then
+  lines=$(wc -l < "$log")
+  lines=$((lines-3))
+  if [ $lines -gt 1000 ]; then
 echo "[...]"
 tail -n1000 "$log"
   else
-cat "$log"
+tail -n$lines "$log"
   fi
 elif ! $quiet; then
   grep -i fail "$log" | head -n5
@@ -267,7 +268,7 @@ test_selector() {
   [ -z "$target_triplet" ] && target_triplet=$($CXX -dumpmachine)
   if matches "$target_triplet" "$pat_triplet"; then
 pat_flags="${string#* }"
-if matches "$CXXFLAGS" "$pat_flags"; then
+if matches "$CXXFLAGS" "*$pat_flags*"; then
   return 0
 fi
   fi
@@ -276,6 +277,7 @@ test_selector() {
   return 1
 }
 
+trap "rm -f '$log' '$sum'; exit" INT
 rm -f "$log" "$sum"
 touch "$log" "$sum"
 
@@ -316,15 +318,15 @@ if [ -n "$xfail" ]; then
 fi
 
 write_log_and_verbose "$CXX $src $@ -D_GLIBCXX_SIMD_TESTTYPE=$type $abiflag -o $exe"
-timeout $timeout "$CXX" "$src" "$@" "-D_GLIBCXX_SIMD_TESTTYPE=$type" $abiflag -o "$exe" >> "$log" 2>&1
+timeout --foreground $timeout "$CXX" "$src" "$@" "-D_GLIBCXX_SIMD_TESTTYPE=$type" $abiflag -o "$exe" >> "$log" 2>&1
 verify_compilation $?
 if [ -n "$sim" ]; then
   write_log_and_verbose "$sim ./$exe"
-  timeout $timeout $sim "./$exe" >> "$log" 2>&1 <&-
+  timeout --foreground $timeout $sim "./$exe" >> "$log" 2>&1 <&-
 else
   write_log_and_verbose "./$exe"
   timeout=$(awk "BEGIN { print int($timeout / 2) }")
-  timeout $timeout "./$exe" >> "$log" 2>&1 <&-
+  timeout --foreground $timeout "./$exe" >> "$log" 2>&1 <&-
 fi
 verify_test $?
 
diff --git a/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh b/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh
index 553bc98f60b..8d642a2941a 100755
--- a/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh
+++ b/libstdc++-v3/testsuite/experimental/simd/generate_makefile.sh
@@ -240,7 +240,7 @@ EOF
 %-$type.log: %-$type-0.log %-$type-1.log %-$type-2.log %-$type-3.log \
 %-$type-4.log %-$type-5.log %-$type-6.log %-$type-7.log \
 %-$type-8.log %-$type-9.log
-	@cat $^ > \$@
+	@cat \$^ > \$@
 	@cat \$(^:log=sum) > \$(@:log=sum)${rmline}
 
 EOF
@@ -252,47 +252,47 @@ EOF
 EOF
 done
   done
-  echo 'run-%: export GCC_TEST_RUN_EXPENSIVE=yes'
-  all_tests | while read file && read name; do
-echo "run-$name: $name.log"
-all_types "$file" | while read t && read type; do
-  echo "run-$name-$type: $name-$type.log"
-  for i in $(seq 0 9); do
-echo "run-$name-$type-$i: $name-$type-$i.log"
-  done
-done
-echo
-  done
   cat < to pass the following options:\n"\\
-	"-q, --quiet Only print failures.\n"\\
-	"-v, --verbose   Print compiler and test output on failure.\n"\\
-	"-k, --keep-failed   Keep executables of failed tests.\n"\\
-	"--sim   Path to an executable that is prepended to the test\n"\\
-	"execution binary (default: the value of\n"\\
-	"GCC_TEST_SIMULATOR).\n"\\
-	"--timeout-factor \n"\\
-	"Multiply the default timeout with x.\n"\\
-	"--run-expensive Compile an

[PATCH 07/16] Fix incorrect display of old test summaries

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* testsuite/Makefile.am: Ensure .simd.summary is empty before
collecting a new summary.
* testsuite/Makefile.in: Regenerate.
---
 libstdc++-v3/testsuite/Makefile.am | 1 +
 libstdc++-v3/testsuite/Makefile.in | 1 +
 2 files changed, 2 insertions(+)

diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index 5dd109b40c9..2d3ad481dba 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -191,6 +191,7 @@ check-simd: $(srcdir)/experimental/simd/generate_makefile.sh \
${glibcxx_srcdir}/scripts/check_simd \
testsuite_files_simd \
${glibcxx_builddir}/scripts/testsuite_flags
+   @rm -f .simd.summary
	${glibcxx_srcdir}/scripts/check_simd "${glibcxx_srcdir}" "${glibcxx_builddir}" "$(CXXFLAGS)" | \
  while read subdir; do \
$(MAKE) -C "$${subdir}"; \
diff --git a/libstdc++-v3/testsuite/Makefile.in b/libstdc++-v3/testsuite/Makefile.in
index 3900d6d87b4..ac6207ae75c 100644
--- a/libstdc++-v3/testsuite/Makefile.in
+++ b/libstdc++-v3/testsuite/Makefile.in
@@ -716,6 +716,7 @@ check-simd: $(srcdir)/experimental/simd/generate_makefile.sh \
${glibcxx_srcdir}/scripts/check_simd \
testsuite_files_simd \
${glibcxx_builddir}/scripts/testsuite_flags
+   @rm -f .simd.summary
	${glibcxx_srcdir}/scripts/check_simd "${glibcxx_srcdir}" "${glibcxx_builddir}" "$(CXXFLAGS)" | \
  while read subdir; do \
$(MAKE) -C "$${subdir}"; \



[PATCH 08/16] Immediate feedback with -v

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/driver.sh: Remove executable on
SIGINT. Process compiler and test executable output: In verbose
mode print messages immediately, limited to 1000 lines and
breaking long lines to below $COLUMNS (or 1024 if not set).
Communicating the exit status of the compiler / test with the
necessary pipe is done via a message through stdout/stdin.
---
 .../testsuite/experimental/simd/driver.sh | 194 +++---
 1 file changed, 116 insertions(+), 78 deletions(-)


diff --git a/libstdc++-v3/testsuite/experimental/simd/driver.sh b/libstdc++-v3/testsuite/experimental/simd/driver.sh
index cf07ff9ad85..314c6a16f86 100755
--- a/libstdc++-v3/testsuite/experimental/simd/driver.sh
+++ b/libstdc++-v3/testsuite/experimental/simd/driver.sh
@@ -172,81 +172,14 @@ unsupported() {
   echo "UNSUPPORTED: $src $type $abiflag ($*)" >> "$log"
 }
 
-verify_compilation() {
-  failed=$1
-  if [ $failed -eq 0 ]; then
-warnings=$(grep -ic 'warning:' "$log")
-if [ $warnings -gt 0 ]; then
-  fail "excess warnings:" $warnings
-  if $verbose; then
-cat "$log"
-  elif ! $quiet; then
-grep -i 'warning:' "$log" | head -n5
-  fi
-elif [ "$xfail" = "compile" ]; then
-  xpass "test for excess errors"
-else
-  pass "test for excess errors"
-fi
-  else
-if [ $failed -eq 124 ]; then
-  fail "timeout: test for excess errors"
-else
-  errors=$(grep -ic 'error:' "$log")
-  if [ "$xfail" = "compile" ]; then
-xfail "excess errors:" $errors
-exit 0
-  else
-fail "excess errors:" $errors
-  fi
-fi
-if $verbose; then
-  cat "$log"
-elif ! $quiet; then
-  grep -i 'error:' "$log" | head -n5
-fi
-exit 0
-  fi
-}
-
-verify_test() {
-  failed=$1
-  if [ $failed -eq 0 ]; then
-rm "$exe"
-if [ "$xfail" = "run" ]; then
-  xpass "execution test"
-else
-  pass "execution test"
-fi
-  else
-$keep_failed || rm "$exe"
-if [ $failed -eq 124 ]; then
-  fail "timeout: execution test"
-elif [ "$xfail" = "run" ]; then
-  xfail "execution test"
-else
-  fail "execution test"
-fi
-if $verbose; then
-  lines=$(wc -l < "$log")
-  lines=$((lines-3))
-  if [ $lines -gt 1000 ]; then
-echo "[...]"
-tail -n1000 "$log"
-  else
-tail -n$lines "$log"
-  fi
-elif ! $quiet; then
-  grep -i fail "$log" | head -n5
-fi
-exit 0
-  fi
-}
-
 write_log_and_verbose() {
   echo "$*" >> "$log"
   if $verbose; then
-echo "$*"
+if [ -z "$COLUMNS" ] || ! type fmt>/dev/null; then
+  echo "$*"
+else
+  echo "$*" | fmt -w $COLUMNS -s - || cat
+fi
   fi
 }
 
@@ -277,7 +210,7 @@ test_selector() {
   return 1
 }
 
-trap "rm -f '$log' '$sum'; exit" INT
+trap "rm -f '$log' '$sum' $exe; exit" INT
 rm -f "$log" "$sum"
 touch "$log" "$sum"
 
@@ -317,17 +250,122 @@ if [ -n "$xfail" ]; then
   fi
 fi
 
+log_output() {
+  if $verbose; then
+maxcol=${1:-1024}
+awk "
+BEGIN { count = 0 }
+/^###exitstatus### [0-9]+$/ { exit \$2 }
+{
+  print >> \"$log\"
+  if (count >= 1000) next
+  ++count
+  if (length(\$0) > $maxcol) {
+i = 1
+while (i + $maxcol <= length(\$0)) {
+  len = $maxcol
+  line = substr(\$0, i, len)
+  len = match(line, / [^ ]*$/)
+  if (len <= 0) {
+len = match(substr(\$0, i), / [^ ]/)
+if (len <= 0) len = $maxcol
+  }
+  print substr(\$0, i, len)
+  i += len
+}
+print substr(\$0, i)
+  } else {
+print
+  }
+}
+END { close(\"$log\") }
+"
+  else
+awk "
+/^###exitstatus### [0-9]+$/ { exit \$2 }
+{ print >> \"$log\" }
+END { close(\"$log\") }
+"
+  fi
+}
+
+verify_compilation() {
+  log_output $COLUMNS
+  exitstatus=$?
+  if [ $exitstatus -eq 0 ]; then
+warnings=$(grep -ic 'warning:' "$log")
+if [ $warnings -gt 0 ]; then
+  fail "excess warnings:" $warnings
+  if ! $verbose && ! $quiet; then
+grep -i 'warning:' "$log" | head -n5
+  fi
+elif [ "$xfail" = "compile" ]; then
+  xpass "test for excess errors"
+else
+  pass "test for excess errors"
+fi
+return 0
+  else
+if [ $exitstatus -eq 124 ]; then
+  fail "timeout: test for excess errors"
+else
+  errors=$(grep -ic 'error:' "$log")
+  if [ "$xfail" = "compile" ]; then
+xfail "excess errors:" $errors
+exit 0
+  else
+fail "excess errors:" $errors
+  fi
+fi
+if ! $verbose && ! $quiet; then
+  grep -i '

[PATCH 09/16] Fix mask reduction of simd_mask on POWER7

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

POWER7 does not support __vector long long reductions, making the
generic _S_popcount implementation ill-formed. Specializing _S_popcount
for PPC allows optimization and avoids the issue.

libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Add __have_power10vec
conditional on _ARCH_PWR10.
* include/experimental/bits/simd_builtin.h: Forward declare
_MaskImplPpc and use it as _MaskImpl when __ALTIVEC__ is
defined.
(_MaskImplBuiltin::_S_some_of): Call _S_popcount from the
_SuperImpl for optimizations and correctness.
* include/experimental/bits/simd_ppc.h: Add _MaskImplPpc.
(_MaskImplPpc::_S_popcount): Implement via vec_cntm for POWER10.
Otherwise, for >=int use -vec_sums divided by a sizeof factor.
For <int use vec_sum4s to reduce the mask to int entries before
-vec_sums.
 template <typename _Abi> struct _MaskImplX86;
 template <typename _Abi> struct _SimdImplNeon;
 template <typename _Abi> struct _MaskImplNeon;
 template <typename _Abi> struct _SimdImplPpc;
+template <typename _Abi> struct _MaskImplPpc;
 
 // simd_abi::_VecBuiltin {{{
 template 
@@ -959,10 +960,11 @@ template 
 using _CommonImpl = _CommonImplBuiltin;
 #ifdef __ALTIVEC__
 using _SimdImpl = _SimdImplPpc<_VecBuiltin<_UsedBytes>>;
+using _MaskImpl = _MaskImplPpc<_VecBuiltin<_UsedBytes>>;
 #else
 using _SimdImpl = _SimdImplBuiltin<_VecBuiltin<_UsedBytes>>;
-#endif
 using _MaskImpl = _MaskImplBuiltin<_VecBuiltin<_UsedBytes>>;
+#endif
 #endif
 
 // }}}
@@ -2899,7 +2901,7 @@ template 
   _GLIBCXX_SIMD_INTRINSIC static bool
   _S_some_of(simd_mask<_Tp, _Abi> __k)
   {
-   const int __n_true = _S_popcount(__k);
+   const int __n_true = _SuperImpl::_S_popcount(__k);
return __n_true > 0 && __n_true < int(_S_size<_Tp>);
   }
 
diff --git a/libstdc++-v3/include/experimental/bits/simd_ppc.h b/libstdc++-v3/
include/experimental/bits/simd_ppc.h
index c00d2323ac6..1d649931eb9 100644
--- a/libstdc++-v3/include/experimental/bits/simd_ppc.h
+++ b/libstdc++-v3/include/experimental/bits/simd_ppc.h
@@ -30,6 +30,7 @@
 #ifndef __ALTIVEC__
 #error "simd_ppc.h may only be included when AltiVec/VMX is available"
 #endif
+#include 
 
 _GLIBCXX_SIMD_BEGIN_NAMESPACE
 
@@ -114,10 +115,42 @@ template 
 // }}}
   };
 
+// }}}
+// _MaskImplPpc {{{
+template <typename _Abi>
+  struct _MaskImplPpc : _MaskImplBuiltin<_Abi>
+  {
+using _Base = _MaskImplBuiltin<_Abi>;
+
+// _S_popcount {{{
+template <typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC static int _S_popcount(simd_mask<_Tp, _Abi> __k)
+  {
+   const auto __kv = __as_vector(__k);
+   if constexpr (__have_power10vec)
+ {
+   return vec_cntm(__to_intrin(__kv), 1);
+ }
+   else if constexpr (sizeof(_Tp) >= sizeof(int))
+ {
+   using _Intrin = __intrinsic_type16_t<int>;
+   const int __sum = -vec_sums(__intrin_bitcast<_Intrin>(__kv), _Intrin())[3];
+   return __sum / (sizeof(_Tp) / sizeof(int));
+ }
+   else
+ {
+   const auto __summed_to_int = vec_sum4s(__to_intrin(__kv), __intrinsic_type16_t<int>());
+   return -vec_sums(__summed_to_int, __intrinsic_type16_t<int>())[3];
+ }
+  }
+
+// }}}
+  };
+
 // }}}
 
 _GLIBCXX_SIMD_END_NAMESPACE
 #endif // __cplusplus >= 201703L
 #endif // _GLIBCXX_EXPERIMENTAL_SIMD_PPC_H_
 
-// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
+// vim: foldmethod=marker foldmarker={{{,}}} sw=2 noet ts=8 sts=2 tw=100
-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──






[PATCH 10/16] Skip testing hypot3 for long double on PPC

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

std::hypot(a, b, c) is imprecise and makes this test fail even though
the failure is unrelated to simd.

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/tests/hypot3_fma.cc: Add skip:
markup for long double on powerpc64*.
---
 libstdc++-v3/testsuite/experimental/simd/tests/hypot3_fma.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/hypot3_fma.cc b/
libstdc++-v3/testsuite/experimental/simd/tests/hypot3_fma.cc
index 689a90c10a5..94d267fccfb 100644
--- a/libstdc++-v3/testsuite/experimental/simd/tests/hypot3_fma.cc
+++ b/libstdc++-v3/testsuite/experimental/simd/tests/hypot3_fma.cc
@@ -16,6 +16,7 @@
 // .
 
 // only: float|double|ldouble * * *
+// skip: ldouble * powerpc64* *
 // expensive: * [1-9] * *
 #include "bits/verify.h"
 #include "bits/metahelpers.h"
-- 



[PATCH 11/16] Abort test after 1000 lines of output

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

Handle overly large output by aborting the log and thus the test. This
is a similar condition to a timeout.

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/driver.sh: When handling the pipe
to log (and, in verbose mode, to stdout) count the lines. If the
count exceeds 1000, log the issue and exit 125, which is then
handled as a failure.
---
 .../testsuite/experimental/simd/driver.sh   | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/experimental/simd/driver.sh b/libstdc++-
v3/testsuite/experimental/simd/driver.sh
index 314c6a16f86..719e4db8e68 100755
--- a/libstdc++-v3/testsuite/experimental/simd/driver.sh
+++ b/libstdc++-v3/testsuite/experimental/simd/driver.sh
@@ -258,7 +258,11 @@ BEGIN { count = 0 }
 /^###exitstatus### [0-9]+$/ { exit \$2 }
 {
   print >> \"$log\"
-  if (count >= 1000) next
+  if (count >= 1000) {
+print \"Aborting: too much output\" >> \"$log\"
+print \"Aborting: too much output\"
+exit 125
+  }
   ++count
   if (length(\$0) > $maxcol) {
 i = 1
@@ -282,8 +286,17 @@ END { close(\"$log\") }
 "
   else
 awk "
+BEGIN { count = 0 }
 /^###exitstatus### [0-9]+$/ { exit \$2 }
-{ print >> \"$log\" }
+{
+  print >> \"$log\"
+  if (count >= 1000) {
+print \"Aborting: too much output\" >> \"$log\"
+print \"Aborting: too much output\"
+exit 125
+  }
+  ++count
+}
 END { close(\"$log\") }
 "
   fi
-- 



[PATCH 12/16] Support timeout and timeout-factor options

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/driver.sh: Abstract reading test
options into read_src_option function. Read skip, only,
expensive, and xfail via read_src_option. Add timeout and
timeout-factor options and adjust timeout variable accordingly.
* testsuite/experimental/simd/tests/loadstore.cc: Set
timeout-factor 2.
---
 .../testsuite/experimental/simd/driver.sh | 38 +--
 .../experimental/simd/tests/loadstore.cc  |  1 +
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/testsuite/experimental/simd/driver.sh b/libstdc++-
v3/testsuite/experimental/simd/driver.sh
index 719e4db8e68..71e0c7d5ee8 100755
--- a/libstdc++-v3/testsuite/experimental/simd/driver.sh
+++ b/libstdc++-v3/testsuite/experimental/simd/driver.sh
@@ -214,35 +214,43 @@ trap "rm -f '$log' '$sum' $exe; exit" INT
 rm -f "$log" "$sum"
 touch "$log" "$sum"
 
-skip="$(head -n25 "$src" | grep '^//\s*skip: ')"
-if [ -n "$skip" ]; then
-  skip="$(echo "$skip" | sed -e 's/^.*:\s*//' -e 's/ \+/ /g')"
+read_src_option() {
+  local key tmp var
+  key="$1"
+  var="$2"
+  [ -z "$var" ] && var="$1"
+  local tmp="$(head -n25 "$src" | grep "^//\\s*${key}: ")"
+  if [ -n "$tmp" ]; then
+tmp="$(echo "${tmp#//*${key}: }" | sed -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//')"
+eval "$var=\"$tmp\""
+  else
+return 1
+  fi
+}
+
+if read_src_option skip; then
   if test_selector "$skip"; then
 # silently skip this test
 exit 0
   fi
 fi
-only="$(head -n25 "$src" | grep '^//\s*only: ')"
-if [ -n "$only" ]; then
-  only="$(echo "$only" | sed -e 's/^.*:\s*//' -e 's/ \+/ /g')"
+if read_src_option only; then
   if ! test_selector "$only"; then
 # silently skip this test
 exit 0
   fi
 fi
+
 if ! $run_expensive; then
-  expensive="$(head -n25 "$src" | grep '^//\s*expensive: ')"
-  if [ -n "$expensive" ]; then
-expensive="$(echo "$expensive" | sed -e 's/^.*:\s*//' -e 's/ \+/ /g')"
+  if read_src_option expensive; then
 if test_selector "$expensive"; then
   unsupported "skip expensive tests"
   exit 0
 fi
   fi
 fi
-xfail="$(head -n25 "$src" | grep '^//\s*xfail: ')"
-if [ -n "$xfail" ]; then
-  xfail="$(echo "$xfail" | sed -e 's/^.*:\s*//' -e 's/ \+/ /g')"
+
+if read_src_option xfail; then
   if test_selector "${xfail#* }"; then
 xfail="${xfail%% *}"
   else
@@ -250,6 +258,12 @@ if [ -n "$xfail" ]; then
   fi
 fi
 
+read_src_option timeout
+
+if read_src_option timeout-factor factor; then
+  timeout=$(awk "BEGIN { print int($timeout * $factor) }")
+fi
+
 log_output() {
   if $verbose; then
 maxcol=${1:-1024}
diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/loadstore.cc b/
libstdc++-v3/testsuite/experimental/simd/tests/loadstore.cc
index dd7d6c30e8c..cd27c3a7426 100644
--- a/libstdc++-v3/testsuite/experimental/simd/tests/loadstore.cc
+++ b/libstdc++-v3/testsuite/experimental/simd/tests/loadstore.cc
@@ -16,6 +16,7 @@
 // .
 
 // expensive: * [1-9] * *
+// timeout-factor: 2
 #include "bits/verify.h"
 #include "bits/make_vec.h"
 #include "bits/conversions.h"
-- 






[PATCH 13/16] Improve test codegen for interpreting assembly

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

In many failure cases it is helpful to inspect the instructions leading
up to the test failure. After this change the location is easier to find
and the branch after failure is easier to find.

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/tests/bits/verify.h (verify): Add
instruction pointer data member. Ensure that the `if (m_failed)`
branch is always inlined into the calling code. The body of the
conditional can still be a function call. Move the get_ip call
into the verify ctor to simplify the ctor calls.
(COMPARE): Don't mention the use of all_of for reduction of a
simd_mask. It only distracts from the real issue.
---
 .../experimental/simd/tests/bits/verify.h | 44 +--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h b/
libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h
index 5da47b35536..17bda71b77e 100644
--- a/libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h
+++ b/libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h
@@ -60,6 +60,7 @@ template 
 class verify
 {
   const bool m_failed = false;
+  size_t m_ip = 0;
 
   template ()
@@ -129,20 +130,21 @@ class verify
 
 public:
   template 
-verify(bool ok, size_t ip, const char* file, const int line,
+[[gnu::always_inline]]
+verify(bool ok, const char* file, const int line,
   const char* func, const char* cond, const Ts&... extra_info)
-: m_failed(!ok)
+: m_failed(!ok), m_ip(get_ip())
 {
   if (m_failed)
-   {
+   [&] {
  __builtin_fprintf(stderr, "%s:%d: (%s):\nInstruction Pointer: %x\n"
"Assertion '%s' failed.\n",
-   file, line, func, ip, cond);
+   file, line, func, m_ip, cond);
  (print(extra_info, int()), ...);
-   }
+   }();
 }
 
-  ~verify()
+  [[gnu::always_inline]] ~verify()
   {
 if (m_failed)
   {
@@ -152,26 +154,27 @@ public:
   }
 
   template 
+[[gnu::always_inline]]
 const verify&
 operator<<(const T& x) const
 {
   if (m_failed)
-   {
- print(x, int());
-   }
+   print(x, int());
   return *this;
 }
 
   template 
+[[gnu::always_inline]]
 const verify&
 on_failure(const Ts&... xs) const
 {
   if (m_failed)
-   (print(xs, int()), ...);
+   [&] { (print(xs, int()), ...); }();
   return *this;
 }
 
-  [[gnu::always_inline]] static inline size_t
+  [[gnu::always_inline]] static inline
+  size_t
   get_ip()
   {
 size_t _ip = 0;
@@ -220,24 +223,21 @@ template 
 
 #define COMPARE(_a, _b)                                                        \
   [&](auto&& _aa, auto&& _bb) {                                                \
-    return verify(std::experimental::all_of(_aa == _bb), verify::get_ip(),     \
-                 __FILE__, __LINE__, __PRETTY_FUNCTION__,                      \
-                 "all_of(" #_a " == " #_b ")", #_a " = ", _aa,                 \
+    return verify(std::experimental::all_of(_aa == _bb), __FILE__, __LINE__,   \
+                 __PRETTY_FUNCTION__, #_a " == " #_b, #_a " = ", _aa,          \
                  "\n" #_b " = ", _bb);                                         \
   }(force_fp_truncation(_a), force_fp_truncation(_b))
 #else
 #define COMPARE(_a, _b)                                                        \
   [&](auto&& _aa, auto&& _bb) {                                                \
-    return verify(std::experimental::all_of(_aa == _bb), verify::get_ip(),     \
-                 __FILE__, __LINE__, __PRETTY_FUNCTION__,                      \
-                 "all_of(" #_a " == " #_b ")", #_a " = ", _aa,                 \
+    return verify(std::experimental::all_of(_aa == _bb), __FILE__, __LINE__,   \
+                 __PRETTY_FUNCTION__, #_a " == " #_b, #_a " = ", _aa,          \
                  "\n" #_b " = ", _bb);                                         \
   }((_a), (_b))
 #endif
 
 #define VERIFY(_test)                                                          \
-  verify(_test, verify::get_ip(), __FILE__, __LINE__, __PRETTY_FUNCTION__,     \
-        #_test)
+  verify(_test, __FILE__, __LINE__, __PRETTY_FUNCTION__, #_test)
 
  // ulp_distance_signed can raise FP exceptions and thus must be conditionally
  // executed
@@ -245,9 +245,9 @@ template 
   [&](auto&& _aa, auto&& _bb) {
\
 const bool success = std::experimental::all_of(
\
   vir::test::ulp_distance(_aa, _bb) <= (_allowed_distance));   
\
-return verify(success, verify::get_ip(), __FILE__, __LINE__,   
\
- __PRETTY_FUNCTION__, "all_of(" #_a " ~~ " #_b ")",   \
- #_a " = ", _aa, "\n" #_b " = ", _bb, "\ndistance = ", 

[PATCH 14/16] Implement hmin and hmax

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

hmin and hmax are specified in 9.7.4 of Parallelism TS 2. For some reason
I overlooked these two functions. Implement them via a call to _S_reduce.

libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Add __detail::_Minimum and
__detail::_Maximum to use them as _BinaryOperation to _S_reduce.
Add hmin and hmax overloads for simd and const_where_expression.
* include/experimental/bits/simd_scalar.h
(_SimdImplScalar::_S_reduce): Make unused _BinaryOperation
parameter const-ref to allow calling _S_reduce with an rvalue.
* testsuite/experimental/simd/tests/reductions.cc: Add tests for
hmin and hmax. Since the compiler statically determined that all
tests pass, repeat the test after a call to make_value_unknown.
---
 libstdc++-v3/include/experimental/bits/simd.h | 78 ++-
 .../include/experimental/bits/simd_scalar.h   |  2 +-
 .../experimental/simd/tests/reductions.cc | 21 +
 3 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/
include/experimental/bits/simd.h
index 14179491f9d..f08ef4c027d 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -204,6 +204,27 @@ template 
 template <size_t _Xp>
   using _SizeConstant = integral_constant<size_t, _Xp>;
 
+namespace __detail {
+  struct _Minimum {
+    template <typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC constexpr
+  _Tp
+  operator()(_Tp __a, _Tp __b) const {
+   using std::min;
+   return min(__a, __b);
+  }
+  };
+  struct _Maximum {
+    template <typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC constexpr
+  _Tp
+  operator()(_Tp __a, _Tp __b) const {
+   using std::max;
+   return max(__a, __b);
+  }
+  };
+} // namespace __detail
+
 // unrolled/pack execution helpers
 // __execute_n_times{{{
 template 
@@ -3408,7 +3429,7 @@ template 
 
 // }}}1
 // reductions [simd.reductions] {{{1
-  template <typename _Tp, typename _Abi, typename _BinaryOperation = plus<>>
+template <typename _Tp, typename _Abi, typename _BinaryOperation = plus<>>
   _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR _Tp
   reduce(const simd<_Tp, _Abi>& __v,
 _BinaryOperation __binary_op = _BinaryOperation())
@@ -3454,6 +3475,61 @@ template 
   reduce(const const_where_expression<_M, _V>& __x, bit_xor<> __binary_op)
   { return reduce(__x, 0, __binary_op); }
 
+template <typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR _Tp
+  hmin(const simd<_Tp, _Abi>& __v) noexcept
+  {
+return _Abi::_SimdImpl::_S_reduce(__v, __detail::_Minimum());
+  }
+
+template <typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR _Tp
+  hmax(const simd<_Tp, _Abi>& __v) noexcept
+  {
+return _Abi::_SimdImpl::_S_reduce(__v, __detail::_Maximum());
+  }
+
+template <typename _M, typename _V>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  typename _V::value_type
+  hmin(const const_where_expression<_M, _V>& __x) noexcept
+  {
+using _Tp = typename _V::value_type;
+constexpr _Tp __id_elem =
+#ifdef __FINITE_MATH_ONLY__
+  __finite_max_v<_Tp>;
+#else
+  __value_or<__infinity, _Tp>(__finite_max_v<_Tp>);
+#endif
+_V __tmp = __id_elem;
+_V::_Impl::_S_masked_assign(__data(__get_mask(__x)), __data(__tmp),
+   __data(__get_lvalue(__x)));
+return _V::abi_type::_SimdImpl::_S_reduce(__tmp, __detail::_Minimum());
+  }
+
+template <typename _M, typename _V>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  typename _V::value_type
+  hmax(const const_where_expression<_M, _V>& __x) noexcept
+  {
+using _Tp = typename _V::value_type;
+constexpr _Tp __id_elem =
+#ifdef __FINITE_MATH_ONLY__
+  __finite_min_v<_Tp>;
+#else
+  [] {
+   if constexpr (__value_exists_v<__infinity, _Tp>)
+ return -__infinity_v<_Tp>;
+   else
+ return __finite_min_v<_Tp>;
+  }();
+#endif
+_V __tmp = __id_elem;
+_V::_Impl::_S_masked_assign(__data(__get_mask(__x)), __data(__tmp),
+   __data(__get_lvalue(__x)));
+return _V::abi_type::_SimdImpl::_S_reduce(__tmp, __detail::_Maximum());
+  }
+
 // }}}1
 // algorithms [simd.alg] {{{
 template 
diff --git a/libstdc++-v3/include/experimental/bits/simd_scalar.h b/libstdc++-
v3/include/experimental/bits/simd_scalar.h
index 7680bc39c30..7e480ecdb37 100644
--- a/libstdc++-v3/include/experimental/bits/simd_scalar.h
+++ b/libstdc++-v3/include/experimental/bits/simd_scalar.h
@@ -182,7 +182,7 @@ struct _SimdImplScalar
   // _S_reduce {{{2
    template <typename _Tp, typename _BinaryOperation>
 static constexpr inline _Tp
-_S_reduce(const simd<_Tp, simd_abi::scalar>& __x, _BinaryOperation&)
+_S_reduce(const simd<_Tp, simd_abi::scalar>& __x, const 
_BinaryOperation&)
 { return __x._M_data; }
 
   // _S_min, _S_max {{{2
diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/reductions.cc b/
libstdc++-v3/testsuite/experimental/simd/tests/reductions.cc
index 9d897d5ccd6..02df68fafbc 100644
--- a/libstdc++-v3/testsuite/experimental/simd/tests/reductions.cc
+++ b/libstdc++-v3/testsuite/experimental/simd/tests/reductions.cc
@@ -57,6 +57,8 @@ template 
 }
 
   

[PATCH 15/16] Work around test failures using -mno-tree-vrp

2021-01-27 Thread Matthias Kretz
From: Matthias Kretz 

This is necessary to avoid failures resulting from PR98834.

libstdc++-v3/ChangeLog:
* testsuite/Makefile.am: Warn about the workaround. Add
-fno-tree-vrp to CXXFLAGS passed to the check_simd script.
Improve initial user feedback from make check-simd.
* testsuite/Makefile.in: Regenerated.
---
 libstdc++-v3/testsuite/Makefile.am | 4 +++-
 libstdc++-v3/testsuite/Makefile.in | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/
Makefile.am
index 2d3ad481dba..ba5023a8b54 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -191,8 +191,10 @@ check-simd: $(srcdir)/experimental/simd/generate_makefile.sh \
 		${glibcxx_srcdir}/scripts/check_simd \
 		testsuite_files_simd \
 		${glibcxx_builddir}/scripts/testsuite_flags
+	@echo "WARNING: Adding -fno-tree-vrp to CXXFLAGS to work around PR98834."
 	@rm -f .simd.summary
-	${glibcxx_srcdir}/scripts/check_simd "${glibcxx_srcdir}" "${glibcxx_builddir}" "$(CXXFLAGS)" | \
+	@echo "Generating simd testsuite subdirs and Makefiles ..."
+	@${glibcxx_srcdir}/scripts/check_simd "${glibcxx_srcdir}" "${glibcxx_builddir}" "$(CXXFLAGS) -fno-tree-vrp" | \
  while read subdir; do \
$(MAKE) -C "$${subdir}"; \
tail -n20 $${subdir}/simd_testsuite.sum | \
diff --git a/libstdc++-v3/testsuite/Makefile.in b/libstdc++-v3/testsuite/
Makefile.in
index ac6207ae75c..c9dd7f5da61 100644
--- a/libstdc++-v3/testsuite/Makefile.in
+++ b/libstdc++-v3/testsuite/Makefile.in
@@ -716,8 +716,10 @@ check-simd: $(srcdir)/experimental/simd/generate_makefile.sh \
 		${glibcxx_srcdir}/scripts/check_simd \
 		testsuite_files_simd \
 		${glibcxx_builddir}/scripts/testsuite_flags
+	@echo "WARNING: Adding -fno-tree-vrp to CXXFLAGS to work around PR98834."
 	@rm -f .simd.summary
-	${glibcxx_srcdir}/scripts/check_simd "${glibcxx_srcdir}" "${glibcxx_builddir}" "$(CXXFLAGS)" | \
+	@echo "Generating simd testsuite subdirs and Makefiles ..."
+	@${glibcxx_srcdir}/scripts/check_simd "${glibcxx_srcdir}" "${glibcxx_builddir}" "$(CXXFLAGS) -fno-tree-vrp" | \
  while read subdir; do \
$(MAKE) -C "$${subdir}"; \
tail -n20 $${subdir}/simd_testsuite.sum | \
-- 



[PATCH 16/16] Improve "find_first/last_set" for NEON

2021-01-27 Thread Matthias Kretz
From: yaozhongxiao 

The find_first_set and find_last_set methods are not optimal for NEON.
They can be improved by synthesizing the reduction with a horizontal add
(vaddv), which reduces the generated assembly code. In the following
cases, vaddvq_s16 generates 2 instructions while the vpadd_s16 chain
generates 4 instructions:
```
 # vaddvq_s16
vaddvq_s16(__asint);
//  addv h0, v1.8h
//  smov w1, v0.h[0]
 # vpadd_s16
vpaddq_s16(vpaddq_s16(vpaddq_s16(__asint, __zero), __zero), __zero)[0]
// addp v1.8h,v1.8h,v2.8h
// addp v1.8h,v1.8h,v2.8h
// addp v1.8h,v1.8h,v2.8h
// smov w1, v1.h[0]
 #
```

libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_neon.h: Replace repeated vpadd
calls with a single vaddv for aarch64.
---
 .../include/experimental/bits/simd_neon.h   | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/experimental/bits/simd_neon.h b/libstdc++-
v3/include/experimental/bits/simd_neon.h
index a3a8ffe165f..0b8ccc17513 100644
--- a/libstdc++-v3/include/experimental/bits/simd_neon.h
+++ b/libstdc++-v3/include/experimental/bits/simd_neon.h
@@ -311,8 +311,7 @@ struct _MaskImplNeonMixin
  });
  __asint &= __bitsel;
 #ifdef __aarch64__
-	  return vpaddq_s16(vpaddq_s16(vpaddq_s16(__asint, __zero), __zero),
-			    __zero)[0];
+ return vaddvq_s16(__asint);
 #else
  return vpadd_s16(
vpadd_s16(vpadd_s16(__lo64(__asint), __hi64(__asint)), __zero),
@@ -328,7 +327,7 @@ struct _MaskImplNeonMixin
  });
  __asint &= __bitsel;
 #ifdef __aarch64__
- return vpaddq_s32(vpaddq_s32(__asint, __zero), __zero)[0];
+ return vaddvq_s32(__asint);
 #else
  return vpadd_s32(vpadd_s32(__lo64(__asint), __hi64(__asint)),
   __zero)[0];
@@ -351,8 +350,12 @@ struct _MaskImplNeonMixin
return static_cast<_I>(__i < _Np ? 1 << __i : 0);
  });
  __asint &= __bitsel;
+#ifdef __aarch64__
+ return vaddv_s8(__asint);
+#else
  return vpadd_s8(vpadd_s8(vpadd_s8(__asint, __zero), __zero),
  __zero)[0];
+#endif
}
  else if constexpr (sizeof(_Tp) == 2)
{
@@ -362,12 +365,20 @@ struct _MaskImplNeonMixin
return static_cast<_I>(__i < _Np ? 1 << __i : 0);
  });
  __asint &= __bitsel;
+#ifdef __aarch64__
+ return vaddv_s16(__asint);
+#else
  return vpadd_s16(vpadd_s16(__asint, __zero), __zero)[0];
+#endif
}
  else if constexpr (sizeof(_Tp) == 4)
{
  __asint &= __make_vector<_I>(0x1, 0x2);
+#ifdef __aarch64__
+ return vaddv_s32(__asint);
+#else
  return vpadd_s32(__asint, __zero)[0];
+#endif
}
  else
__assert_unreachable<_Tp>();
-- 






Re: [PATCH] rs6000: Fix vec insert ilp32 ICE and test failures [PR98799]

2021-01-27 Thread David Edelsohn via Gcc-patches
This patch is okay with the removal of

{ target powerpc*-*-* }

from the pr79251-run.c testcase directives.

As I explained in the earlier email, I still believe that the testcase
is not testing what you intend, but this patch is a definite
improvement and removes the failures.  We can correct the testcase in
a follow-up patch.

Thanks for the clarification about P9 support.  32 bit doesn't have a
fast mechanism to move SImode to SFmode.

Thanks, David

On Tue, Jan 26, 2021 at 10:56 PM Xionghu Luo  wrote:
>
> Hi,
>
> On 2021/1/27 03:00, David Edelsohn wrote:
> > On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo  wrote:
> >>
> >> From: "luo...@cn.ibm.com" 
> >>
> >> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
> >> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
> >> variable vector insert.  Remove rs6000_expand_vector_set_var helper
> >> function, adjust the p8 and p9 definitions position and make them
> >> static.
> >>
> >> The previous commit r11-6858 missed checking -m32.  This patch was
> >> tested and passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with
> >> RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'" for BE targets.
> >
> > Hi, Xionghu
> >
> > Thanks for addressing these failures and the cleanups.
> >
> > This patch addresses most of the failures.
> >
> > pr79251-run.c continues to fail.  The directives are not complete.
> > I'm not certain if your intention is to run the testcase on all
> > targets or only on Power7 and above.  The testcase relies on vector
> > "long long", which only is available with -mvsx, but the testcase only
> > enables -maltivec.  I believe that the testcase happens to pass on the
> > Linux platforms you tested because GCC defaulted to Power7 or Power8
> > ISA and the ABI specifies VSX.  The testcase probably needs to be
> > restricted to only run on some level of VSX enabled processor (VSX?
> > Power8? Power9?) and also needs some additional compiler options when
> > compiling the testcase instead of relying upon the default
> > configuration of the compiler.
>
>
> P8BE: gcc/testsuite/gcc/gcc.sum(it didn't run before due to no 'dg-do run'):
>
> Running target unix/-m32
> Running 
> /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
> PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
> PASS: gcc.target/powerpc/pr79251-run.c execution test
> === gcc Summary for unix/-m32 ===
>
> # of expected passes            2
> Running target unix/-m64
> Running 
> /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
> PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
> PASS: gcc.target/powerpc/pr79251-run.c execution test
> === gcc Summary for unix/-m64 ===
>
> # of expected passes            2
>
>
> How did you get the failure of pr79251-run.c, please?  I tested it all
> passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE of Linux.  This case is
> just verifying the *functionality* of "u = vec_insert (254, v, k)" and
> compare whether u[k] is changed to 254, it must work on all platforms,
> no matter with the optimization or not, otherwise there is a functional
> error.  As to "long long", add target vsx_hw and powerpc like below?
> (Also change the -maltivec to -mvsx for pr79251.p8.c/pr79251.p9.c.)
>
> --- a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
> @@ -1,4 +1,6 @@
> -/* { dg-options "-O2 -maltivec" } */
> +/* { dg-do run { target powerpc*-*-* } } */
> +/* { dg-require-effective-target vsx_hw { target powerpc*-*-* } } */
> +/* { dg-options "-O2 -mvsx" } */
>
>
> Any other options necessary to limit the testcases? :)
>
> >
> > Also, part of the change seems to be
> >
> >> -  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
> >> -rs6000_expand_vector_set_var_p9 (target, val, idx);
> >> + if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8)
> >> +   {
> >> + rs6000_expand_vector_set_var_p9 (target, val, elt_rtx);
> >> + return;
> >> +   }
> >
> > Does the P9 case need TARGET_POWERPC64?  This optimization seemed to
> > be functioning on P9 in 32 bit mode prior to this fix.  It would be a
> > shame to unnecessarily disable this optimization in 32 bit mode.  Or
> > maybe it generated a functioning sequence but didn't utilize the
> > optimization.  Would you please check / clarify?
>
>
> >> -  if (TARGET_P8_VECTOR)
> >> +  if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT)
> >>  {
> >>stmt = build_array_ref (loc, stmt, arg2);
> >>stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
>
>
> This change in rs6000-c.c causes it not generating 
> VIEW_CONVERT_EXPR(ARRAY_REF)
> gimple code again for P9-32bit, then the IFN VEC_SET won't be matched,
> so rs6000.c:rs6000_expand_vector_set_var_p9 won't be called to produce
> optimized "lvsl+xxperm+lvsr" for P9-32bit again.  It's a pity, but without
> this, it IC

Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.

2021-01-27 Thread Michael Meissner via Gcc-patches
On Wed, Jan 27, 2021 at 01:06:46PM -0600, will schmidt wrote:
> On Thu, 2021-01-14 at 11:59 -0500, Michael Meissner via Gcc-patches wrote:
> > From 78435dee177447080434cdc08fc76b1029c7f576 Mon Sep 17 00:00:00 2001
> > From: Michael Meissner 
> > Date: Wed, 13 Jan 2021 21:47:03 -0500
> > Subject: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.
> > 
> > This patch replaces patches previously submitted:
> > 
> > September 24th, 2020:
> > Message-ID: <20200924203159.ga31...@ibm-toto.the-meissners.org>
> > 
> > October 9th, 2020:
> > Message-ID: <20201009043543.ga11...@ibm-toto.the-meissners.org>
> > 
> > October 24th, 2020:
> > Message-ID: <2020100346.ga8...@ibm-toto.the-meissners.org>
> > 
> > November 19th, 2020:
> > Message-ID: <20201119235814.ga...@ibm-toto.the-meissners.org>
> 
> 
> Subject and date should be sufficient _if_ having the old versions
> of the patches is necessary to review the latest version of the
> patch.  Which ideally is not the case.
> 
> 
> > 
> > This patch maps the built-in functions that take or return long double
> > arguments on systems where long double is IEEE 128-bit.
> > 
> > If long double is IEEE 128-bit, this patch goes through the built-in 
> > functions
> > and changes the name of the math, scanf, and printf built-in functions to 
> > use
> > the functions that GLIBC provides when long double uses the IEEE 128-bit
> > representation.
> 
> ok.
> 
> > 
> > In addition, changing the name in GCC allows the Fortran compiler to
> > automatically use the correct name.
> 
> Does the fortran compiler currently use the wrong name? (pr?)

Yes.  If the compiler is configured for IBM 128-bit long double, the Fortran
compiler calls 'sinl' for real*16.  If the compiler is configured for IEEE
128-bit long double, the compiler needs to call __sinieee128 instead of sinl.

Similarly if a C or C++ user calls __builtin_sinl directly without including
math.h, the wrong name would be used.

Hence what this code does is change the names of all of the built-in functions
that can use long double to be the names appropriate for IEEE 128-bit.

> > 
> > To map the math functions, typically this patch changes <name>l to
> > __<name>ieee128.  However there are some exceptions that are handled
> > with this patch.
> 
> This appears to be the rs6000_mangle_decl_assembler_name() function,
> which also maps <name>l_r to <name>ieee128_r, and looks like some
> additional special handling for printf and scanf.

Yes, the rs6000_mangle_decl_assembler_name was not complete in the mapping.  In
particular, it did not handle *printf, *scanf, or *l_r calls.  There are also a
few names that need to have a different mapping.

> 
> > To map the printf functions, <name> is mapped to __<name>ieee128.
> > 
> > To map the scanf functions, <name> is mapped to __isoc99_<name>ieee128.
> 
> 
> > 
> > I have tested this patch by doing builds, bootstraps, and make check with 3
> > builds on a power9 little endian server:
> > 
> > *   Build one used the default long double being IBM 128-bit;
> > *   Build two set the long double default to IEEE 128-bit; (and)
> > *   Build three set the long double default to 64-bit.
> > 
> 
> ok
> 
> > The compilers built fine providing I recompiled gmp, mpc, and mpfr with the
> > appropriate long double options.
> 
> Presumably the build is otherwise broken... 
> Does that mean more than invoking download_preqrequisites as part of the
> build?   If there are specific options required during configure/build of
> those packages, they should be called out.
> 
> > There were a few differences in the test
> > suite runs that will be addressed in later patches, but over all it works
> > well.
> 
> Presumably minimal. :-)

It depends on what you mean by minimal.

* There are 5 C tests that fail (2 Decimal/IEEE, 3 NaN related)
* 2 C tests that need some changes to be able to run
* There are 2 C++ tests that fail (Decimal/IEEE, same as the C tests)
* There are 31 C++ modules tests that fail (PR 98645)
* There are 3 Fortran tests that used to fail that now pass

I have patches for the Decimal/IEEE tests

> 
> >   This patch is required to be able to build a toolchain where the
> > default long double is IEEE 128-bit.
> 
> Ok.  Could lead the patch description with this.  I imagine this is
> just one of several patches that are still required towards that goal.

In terms of 'need', this patch and the Decimal patch next are the two patches
that absolutely need to be installed.  The others fix some things and tests,
but are not required.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[committed] [PR97684] IRA: Recalculate pseudo classes if we added new pseudos since last calculation before updating equiv regs

2021-01-27 Thread Vladimir Makarov via Gcc-patches

The patch solves the following problem:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97684

The patch was successfully bootstrapped and tested on x86-64.

commit 238ea13cca75ad499f227b60a95c40174c6caf78
Author: Vladimir N. Makarov 
Date:   Wed Jan 27 14:53:28 2021 -0500

[PR97684] IRA: Recalculate pseudo classes if we added new pseudos since last calculation before updating equiv regs

update_equiv_regs can use reg classes of pseudos and they are set up in
register pressure sensitive scheduling and loop invariant motion and in
live range shrinking.  This info can become obsolete if we add new pseudos
since the last set up.  Recalculate it if new pseudos were added.

gcc/ChangeLog:

PR rtl-optimization/97684
* ira.c (ira): Call ira_set_pseudo_classes before
update_equiv_regs when it is necessary.

gcc/testsuite/ChangeLog:

PR rtl-optimization/97684
* gcc.target/i386/pr97684.c: New.

diff --git a/gcc/ira.c b/gcc/ira.c
index f0bdbc8cf56..c32ecf814fd 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5566,6 +5566,15 @@ ira (FILE *f)
   if (warn_clobbered)
 generate_setjmp_warnings ();
 
+  /* update_equiv_regs can use reg classes of pseudos and they are set up in
+ register pressure sensitive scheduling and loop invariant motion and in
+ live range shrinking.  This info can become obsolete if we add new pseudos
+ since the last set up.  Recalculate it again if the new pseudos were
+ added.  */
+  if (resize_reg_info () && (flag_sched_pressure || flag_live_range_shrinkage
+			 || flag_ira_loop_pressure))
+ira_set_pseudo_classes (true, ira_dump_file);
+
   init_alias_analysis ();
   loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
   reg_equiv = XCNEWVEC (struct equivalence, max_reg_num ());
@@ -5610,9 +5619,6 @@ ira (FILE *f)
   regstat_recompute_for_max_regno ();
 }
 
-  if (resize_reg_info () && flag_ira_loop_pressure)
-ira_set_pseudo_classes (true, ira_dump_file);
-
   setup_reg_equiv ();
   grow_reg_equivs ();
   setup_reg_equiv_init ();
diff --git a/gcc/testsuite/gcc.target/i386/pr97684.c b/gcc/testsuite/gcc.target/i386/pr97684.c
new file mode 100644
index 000..983bf535ad8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97684.c
@@ -0,0 +1,24 @@
+/* PR rtl-optimization/97684 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -flive-range-shrinkage -fschedule-insns -fselective-scheduling -funroll-all-loops -fno-web" } */
+
+void
+c5 (double);
+
+void
+g4 (int *n4)
+{
+  double lp = 0.0;
+  int fn;
+
+  for (fn = 0; fn < 18; ++fn)
+{
+  int as;
+
+  as = __builtin_abs (n4[fn]);
+  if (as > lp)
+lp = as;
+}
+
+  c5 (lp);
+}


Re: [PATCH, revised] PowerPC: Add float128/Decimal conversions.

2021-01-27 Thread Michael Meissner via Gcc-patches
[PATCH, revised] PowerPC: Add float128/Decimal conversions.

This patch revises the patch from January 14th.  The only change compared
to the previous patch is the format string used for converting IEEE 128-bit
to string.  This allows the c-c++-common/dfp/convert-bfp-6.c test to pass.

This patch replaces the following three patches:

September 24th, 2020:
Message-ID: <20200924203545.gd31...@ibm-toto.the-meissners.org>

October 22nd, 2020:
Message-ID: <2020100603.ga11...@ibm-toto.the-meissners.org>

January 14th, 2021:
Message-ID: <20210114170936.ga3...@ibm-toto.the-meissners.org>
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563498.html

This patch rewrites those patches.  In order to run with older GLIBCs, this
patch uses weak references to the IEEE 128-bit conversions to/from string that
are found in GLIBC 2.32.

If the user uses GLIBC 2.32 or later, the Decimal <-> Float128 conversions will
call the functions in that library.  This isn't ideal, as IEEE 128-bit has more
exponent range than IBM 128-bit.

If an older library is used, these patches will convert IEEE 128-bit to IBM
128-bit and do the conversion with IBM 128-bit.  I have tested this with a
compiler configured to use an older library, and it worked for the conversion
if the number could be represented in the IBM 128-bit format.

While most of the Decimal <-> Long double tests now pass when long doubles are
IEEE 128-bit, there is one test that fails:

*   c-c++-common/dfp/convert-bfp-11.c

I have patches for the bfp-11 test (which requires that long double be IBM
128-bit).

I have tested this patch by doing builds, bootstraps, and make check with 3
builds on a power9 little endian server:

*   Build one used the default long double being IBM 128-bit;
*   Build two set the long double default to IEEE 128-bit; (and)
*   Build three set the long double default to 64-bit.

I have also built and tested this patch on a big endian Power8 system with
both 64 and 32-bit targets.  There were no regressions.

The compilers built fine providing I recompiled gmp, mpc, and mpfr with the
appropriate long double options.  There were a few differences in the test
suite runs that will be addressed in later patches, but over all it works
well.  This patch is required to be able to build a toolchain where the default
long double is IEEE 128-bit.  Can I check this patch into the master branch for
GCC 11?

libgcc/
2021-01-27  Michael Meissner  

* config/rs6000/_dd_to_kf.c: New file.
* config/rs6000/_kf_to_dd.c: New file.
* config/rs6000/_kf_to_sd.c: New file.
* config/rs6000/_kf_to_td.c: New file.
* config/rs6000/_sd_to_kf.c: New file.
* config/rs6000/_sprintfkf.c: New file.
* config/rs6000/_sprintfkf.h: New file.
* config/rs6000/_strtokf.h: New file.
* config/rs6000/_strtokf.c: New file.
* config/rs6000/_td_to_kf.c: New file.
* config/rs6000/quad-float128.h: Add new declarations.
* config/rs6000/t-float128 (fp128_dec_funcs): New macro.
(fp128_decstr_funcs): New macro.
(ibm128_dec_funcs): New macro.
(fp128_ppc_funcs): Add the new conversions.
(fp128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(fp128_decstr_objs): Force __float128 <-> string conversions to be
compiled with -mabi=ibmlongdouble.
(ibm128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(FP128_CFLAGS_DECIMAL): New macro.
(IBM128_CFLAGS_DECIMAL): New macro.
* dfp-bit.c (DFP_TO_BFP): Add PowerPC _Float128 support.
(BFP_TO_DFP): Add PowerPC _Float128 support.
* dfp-bit.h (BFP_KIND): Add new binary floating point kind for
IEEE 128-bit floating point.
(DFP_TO_BFP): Add PowerPC _Float128 support.
(BFP_TO_DFP): Add PowerPC _Float128 support.
(BFP_SPRINTF): New macro.
---
 libgcc/config/rs6000/_dd_to_kf.c | 37 ++
 libgcc/config/rs6000/_kf_to_dd.c | 37 ++
 libgcc/config/rs6000/_kf_to_sd.c | 37 ++
 libgcc/config/rs6000/_kf_to_td.c | 37 ++
 libgcc/config/rs6000/_sd_to_kf.c | 37 ++
 libgcc/config/rs6000/_sprintfkf.c| 57 
 libgcc/config/rs6000/_sprintfkf.h| 28 ++
 libgcc/config/rs6000/_strtokf.c  | 56 +++
 libgcc/config/rs6000/_strtokf.h  | 27 +
 libgcc/config/rs6000/_td_to_kf.c | 37 ++
 libgcc/config/rs6000/quad-float128.h |  8 
 libgcc/config/rs6000/t-float128  | 37 +-
 libgcc/dfp-bit.c | 12 +-
 libgcc/dfp-bit.h | 26 +
 14 files changed, 470 insertions(+), 3 deletions(-)
 create mode 100644 libgcc/config/rs6000/_dd_to_kf.c
 create mode 100644 libgcc

[PATCH, revised, #2] PowerPC: Add float128/Decimal conversions.

2021-01-27 Thread Michael Meissner via Gcc-patches
From 02b04aed77130f2ec9156d2f7ff89d4cc6b5a78b Mon Sep 17 00:00:00 2001
From: Michael Meissner 
Date: Thu, 21 Jan 2021 12:58:56 -0500
Subject: [PATCH, revised] PowerPC: Add float128/Decimal conversions.


Unfortunately, the revision I just posted had the old patch, not the new
one.  This patch actually has BFP_FMT set to "%.36Le", which gives enough
accuracy to allow the c-c++-common/dfp/convert-bfp-6.c test to pass.
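The effect of the wider format can be sketched in plain C++ (a hypothetical stand-in for the libgcc code, not the patch itself): printing a binary floating-point value with 36 fractional digits guarantees the decimal string carries enough precision to recover the original value exactly.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical sketch: round-trip a long double through a decimal string
// using a "%.36Le"-style format, analogous to the BFP_FMT the patch uses.
// 36 fractional digits exceed the 21 (Intel 80-bit) or 36 (IEEE binary128)
// significant decimal digits required for an exact round trip.
static long double
roundtrip_via_string (long double x)
{
  char buf[64];
  std::snprintf (buf, sizeof buf, "%.36Le", x);
  return std::strtold (buf, nullptr);
}
```

With a narrower format (too few digits) the parse would return a nearby but unequal value, which is exactly the kind of inaccuracy the convert-bfp-6.c test catches.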

This patch replaces the following three patches:

September 24th, 2020:
Message-ID: <20200924203545.gd31...@ibm-toto.the-meissners.org>

October 22nd, 2020:
Message-ID: <2020100603.ga11...@ibm-toto.the-meissners.org>

January 14th, 2021:
Message-ID: <20210114170936.ga3...@ibm-toto.the-meissners.org>
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563498.html

This patch rewrites those patches.  In order to run with older GLIBCs, this
patch uses weak references to the IEEE 128-bit conversions to/from string that
are found in GLIBC 2.32.

If the user uses GLIBC 2.32 or later, the Decimal <-> Float128 conversions will
call the functions in that library.  This isn't ideal, as IEEE 128-bit has more
exponent range than IBM 128-bit.

If an older library is used, these patches will convert IEEE 128-bit to IBM
128-bit and do the conversion with IBM 128-bit.  I have tested this with a
compiler configured to use an older library, and it worked for the conversion
if the number could be represented in the IBM 128-bit format.

While most of the Decimal <-> Long double tests now pass when long doubles are
IEEE 128-bit, there is one test that fails:

*   c-c++-common/dfp/convert-bfp-11.c

I have patches for the bfp-11 test (which requires that long double be IBM
128-bit).

Compared to the patch on January 14th, this patch fixes the format string
for converting IEEE 128-bit floating point to string.  This in turn allows
the c-c++-common/dfp/convert-bfp-6.c test to pass.

I have tested this patch by doing builds, bootstraps, and make check with 3
builds on a power9 little endian server:

*   Build one used the default long double being IBM 128-bit;
*   Build two set the long double default to IEEE 128-bit; (and)
*   Build three set the long double default to 64-bit.

I have also built and tested this patch on a big endian Power8 system with
both 64 and 32-bit targets.  There were no regressions.

The compilers built fine providing I recompiled gmp, mpc, and mpfr with the
appropriate long double options.  There were a few differences in the test
suite runs that will be addressed in later patches, but over all it works
well.  This patch is required to be able to build a toolchain where the default
long double is IEEE 128-bit.  Can I check this patch into the master branch for
GCC 11?

libgcc/
2021-01-27  Michael Meissner  

* config/rs6000/_dd_to_kf.c: New file.
* config/rs6000/_kf_to_dd.c: New file.
* config/rs6000/_kf_to_sd.c: New file.
* config/rs6000/_kf_to_td.c: New file.
* config/rs6000/_sd_to_kf.c: New file.
* config/rs6000/_sprintfkf.c: New file.
* config/rs6000/_sprintfkf.h: New file.
* config/rs6000/_strtokf.h: New file.
* config/rs6000/_strtokf.c: New file.
* config/rs6000/_td_to_kf.c: New file.
* config/rs6000/quad-float128.h: Add new declarations.
* config/rs6000/t-float128 (fp128_dec_funcs): New macro.
(fp128_decstr_funcs): New macro.
(ibm128_dec_funcs): New macro.
(fp128_ppc_funcs): Add the new conversions.
(fp128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(fp128_decstr_objs): Force __float128 <-> string conversions to be
compiled with -mabi=ibmlongdouble.
(ibm128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(FP128_CFLAGS_DECIMAL): New macro.
(IBM128_CFLAGS_DECIMAL): New macro.
* dfp-bit.c (DFP_TO_BFP): Add PowerPC _Float128 support.
(BFP_TO_DFP): Add PowerPC _Float128 support.
* dfp-bit.h (BFP_KIND): Add new binary floating point kind for
IEEE 128-bit floating point.
(DFP_TO_BFP): Add PowerPC _Float128 support.
(BFP_TO_DFP): Add PowerPC _Float128 support.
(BFP_SPRINTF): New macro.
---
 libgcc/config/rs6000/_dd_to_kf.c | 37 ++
 libgcc/config/rs6000/_kf_to_dd.c | 37 ++
 libgcc/config/rs6000/_kf_to_sd.c | 37 ++
 libgcc/config/rs6000/_kf_to_td.c | 37 ++
 libgcc/config/rs6000/_sd_to_kf.c | 37 ++
 libgcc/config/rs6000/_sprintfkf.c| 57 
 libgcc/config/rs6000/_sprintfkf.h| 28 ++
 libgcc/config/rs6000/_strtokf.c  | 56 +++
 libgcc/config/rs6000/_strtokf.h  | 27 +
 libgcc/config/rs6000/_td_to_kf.c  

[PATCH Fortran] Re: PR fortran/93524 - rank >= 3 array stride incorrectly set in CFI_establish

2021-01-27 Thread Harris Snyder
(re-sending with subject line tags)

Hi all,

Now that my copyright assignment is complete, I'm submitting this fix.
Test cases are included.
OK for master? I do not have write access, so someone will need to
commit this for me.

Regards,
Harris

libgfortran/ChangeLog:

* runtime/ISO_Fortran_binding.c (CFI_establish): Fix strides
for rank > 2 arrays.

gcc/testsuite/ChangeLog:

* gfortran.dg/ISO_Fortran_binding_18.c: New test.
* gfortran.dg/ISO_Fortran_binding_18.f90: New test.

> On Wed, Jan 13, 2021 at 2:10 PM Harris Snyder  wrote:
> >
> > Hi Tobias / all,
> >
> > Further related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93524
> > `sm` is being incorrectly computed in CFI_establish. Take a look at
> > the diff below - we are currently only using the extent of the
> > previous rank to assign `sm`, instead of all previous ranks. Have I
> > got this right, or am I missing something / does this need to be
> > handled differently? I can offer some test cases and submit a proper
> > patch if we think this solution is OK...
> >
> > Thanks,
> > Harris
> >
> > diff --git a/libgfortran/runtime/ISO_Fortran_binding.c
> > b/libgfortran/runtime/ISO_Fortran_binding.c
> > index 3746ec1c681..20833ad2025 100644
> > --- a/libgfortran/runtime/ISO_Fortran_binding.c
> > +++ b/libgfortran/runtime/ISO_Fortran_binding.c
> > @@ -391,7 +391,12 @@ int CFI_establish (CFI_cdesc_t *dv, void
> > *base_addr, CFI_attribute_t attribute,
> >   if (i == 0)
> > dv->dim[i].sm = dv->elem_len;
> >   else
> > -   dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents[i - 1]);
> > +   {
> > + CFI_index_t extents_product = 1;
> > + for (int j = 0; j < i; j++)
> > +   extents_product *= extents[j];
> > + dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents_product);
> > +   }
> > }
> >  }
commit 451bd40aca006ebdba52553de2392fcb5b1ff42f
Author: Harris M. Snyder 
Date:   Tue Jan 26 23:29:24 2021 -0500

Partial fix for PR fortran/93524

diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c
new file mode 100644
index 000..4d1c4ecbd72
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.c
@@ -0,0 +1,29 @@
+#include <ISO_Fortran_binding.h>
+
+#include <string.h>
+#include <stdlib.h>
+
+
+
+extern int do_loop(CFI_cdesc_t* array);
+
+int main(int argc, char ** argv)
+{
+	int nx = 9;
+	int ny = 10;
+	int nz = 2;
+
+	int arr[nx*ny*nz];
+	memset(arr,0,sizeof(int)*nx*ny*nz);
+	CFI_index_t shape[3];
+	shape[0] = nz;
+	shape[1] = ny;
+	shape[2] = nx;
+
+	CFI_CDESC_T(3) farr;
+	int rc = CFI_establish((CFI_cdesc_t*)&farr, arr, CFI_attribute_other, CFI_type_int, 0, (CFI_rank_t)3, (const CFI_index_t *)shape);
+	if (rc != CFI_SUCCESS) abort();
+	int result = do_loop((CFI_cdesc_t*)&farr);
+	if (result != nx*ny*nz) abort();
+	return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.f90 b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.f90
new file mode 100644
index 000..76be51d22fb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_18.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+! { dg-additional-sources ISO_Fortran_binding_18.c }
+
+module fortran_binding_test_18
+use iso_c_binding
+implicit none
+contains
+
+subroutine test(array)
+integer(c_int) :: array(:)
+array = 1
+end subroutine
+
+function do_loop(array) result(the_sum) bind(c)
+integer(c_int), intent(in out) :: array(:,:,:)
+integer(c_int) :: the_sum, i, j
+
+the_sum = 0  
+array = 0
+do i=1,size(array,3)
+do j=1,size(array,2)
+call test(array(:,j,i))
+end do
+end do
+the_sum = sum(array)
+end function
+
+end module
diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 3746ec1c681..20833ad2025 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -391,7 +391,12 @@ int CFI_establish (CFI_cdesc_t *dv, void *base_addr, CFI_attribute_t attribute,
 	  if (i == 0)
 	dv->dim[i].sm = dv->elem_len;
 	  else
-	dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents[i - 1]);
+	{
+	  CFI_index_t extents_product = 1;
+	  for (int j = 0; j < i; j++)
+		extents_product *= extents[j];
+	  dv->dim[i].sm = (CFI_index_t)(dv->elem_len * extents_product);
+	}
 	}
 }
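The corrected stride rule can be illustrated standalone (hypothetical names, not the libgfortran code): the byte stride `sm` of dimension `i` must be `elem_len` times the product of the extents of *all* lower dimensions, where the old code multiplied by `extents[i - 1]` alone.

```cpp
#include <cstddef>

// Hypothetical sketch of the fixed CFI_establish stride computation.
// For the 2 x 10 x 9 int array from the new test, the old code gave
// sm[2] = 4 * extents[1] = 40, while the correct value is
// sm[2] = 4 * (2 * 10) = 80.
static void
set_strides (std::ptrdiff_t *sm, const std::ptrdiff_t *extents,
             int rank, std::ptrdiff_t elem_len)
{
  for (int i = 0; i < rank; i++)
    {
      std::ptrdiff_t product = 1;   // product of extents[0 .. i-1]
      for (int j = 0; j < i; j++)
        product *= extents[j];
      sm[i] = elem_len * product;   // sm[0] == elem_len
    }
}
```

For rank <= 2 the two formulas coincide, which is why the bug only shows up for rank >= 3 arrays.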
 


[PATCH] [8/9/10/11 Regression] [OOP] PR fortran/86470 - ICE with OpenMP

2021-01-27 Thread Harald Anlauf via Gcc-patches
Dear all,

the fix for this ICE is obvious: make gfc_call_malloc behave as documented.
Apparently the special case in question was not exercised in the testsuite.

Regtested on x86_64-pc-linux-gnu.

OK for master / backports?

Should the testcase be moved to the gomp/ subdirectory?

Thanks,
Harald


PR fortran/86470 - ICE with OpenMP, class(*) allocatable

gfc_call_malloc should malloc an area of size 1 if no size is given.

gcc/fortran/ChangeLog:

PR fortran/86470
* trans.c (gfc_call_malloc): Allocate area of size 1 if passed
size is NULL (as documented).

gcc/testsuite/ChangeLog:

PR fortran/86470
* gfortran.dg/pr86470.f90: New test.

diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index a2376917635..ab53fc5f441 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -689,6 +689,9 @@ gfc_call_malloc (stmtblock_t * block, tree type, tree size)
   /* Call malloc.  */
   gfc_start_block (&block2);

+  if (size == NULL_TREE)
+size = build_int_cst (size_type_node, 1);
+
   size = fold_convert (size_type_node, size);
   size = fold_build2_loc (input_location, MAX_EXPR, size_type_node, size,
 			  build_int_cst (size_type_node, 1));
diff --git a/gcc/testsuite/gfortran.dg/pr86470.f90 b/gcc/testsuite/gfortran.dg/pr86470.f90
new file mode 100644
index 000..4021e5d655c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr86470.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! { dg-options "-fopenmp" }
+! PR fortran/86470 - ICE with OpenMP, class(*)
+
+program p
+  implicit none
+  class(*), allocatable :: val
+!$OMP PARALLEL private(val)
+  allocate(integer::val)
+  val = 1
+  deallocate(val)
+!$OMP END PARALLEL
+end
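The documented behavior the patch restores can be sketched as a plain C++ stand-in (a hypothetical model, not the gfortran tree-building code): a null size means "allocate one byte", and any given size is clamped to at least 1, mirroring the MAX_EXPR in the trans.c hunk.

```cpp
#include <cstdlib>

// Hypothetical model of gfc_call_malloc's size handling: a missing
// (null) size defaults to 1, and zero is clamped up to 1, matching
// the MAX_EXPR (size, 1) applied to the generated call.
static void *
call_malloc_model (const std::size_t *size)
{
  std::size_t n = size ? *size : 1;  // NULL size => allocate 1 byte
  if (n < 1)
    n = 1;                           // MAX_EXPR (size, 1)
  return std::malloc (n);
}
```

Before the fix, the null-size case dereferenced the missing size expression, which is what produced the ICE in the OpenMP class(*) test.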


Re: [PATCH Fortran] Re: PR fortran/93524 - rank >= 3 array stride incorrectly set in CFI_establish

2021-01-27 Thread Thomas Koenig via Gcc-patches



Hi Harris!


OK for master? I do not have write access, so someone will need to
commit this for me.


Reviewed, regression-tested and committed as

https://gcc.gnu.org/g:1cdca4261e88f4dc9c3293c6b3c2fff3071ca32b

Thanks for your patch, and welcome aboard!

Best regards

Thomas


[PATCH v3] clear VLA bounds in attribute access (PR 97172)

2021-01-27 Thread Martin Sebor via Gcc-patches

Attached is another attempt to fix the problem caused by allowing
front-end trees representing nontrivial VLA bound expressions to
stay in attribute access attached to functions.  Since removing
these trees seems to be everyone's preference this patch does that
by extending the free_lang_data pass to look for and zero out these
trees.

Because free_lang_data only frees anything when LTO is enabled and
we want these trees cleared regardless to keep them from getting
clobbered during gimplification, this change also modifies the pass
to do the clearing even when the pass is otherwise inactive.

Tested on x86_64-linux.

Martin
PR middle-end/97172 - ICE: tree code 'ssa_name' is not supported in LTO streams

gcc/ChangeLog:

	PR middle-end/97172
	* attribs.c (attr_access::free_lang_data): Define new function.
	* attribs.h (attr_access::free_lang_data): Declare new function.
	* tree.c (free_lang_data_in_type): Call attr_access::free_lang_data.
	(array_bound_from_maxval): Define new function.
	* tree.h (array_bound_from_maxval): Declare new function.

gcc/c-family/ChangeLog:

	PR middle-end/97172
	* c-pretty-print.c (c_pretty_printer::direct_abstract_declarator):
	Call array_bound_from_maxval.

gcc/c/ChangeLog:

	PR middle-end/97172
	* c-decl.c (get_parm_array_spec): Call array_bound_from_maxval.

gcc/testsuite/ChangeLog:

	PR middle-end/97172
	* gcc.dg/pr97172.c: New test.

diff --git a/gcc/attribs.c b/gcc/attribs.c
index 94991fbbeab..81322d40f1d 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -2238,6 +2238,38 @@ attr_access::vla_bounds (unsigned *nunspec) const
   return list_length (size);
 }
 
+/* Reset front end-specific attribute access data from ATTRS.
+   Called from the free_lang_data pass.  */
+
+/* static */ void
+attr_access::free_lang_data (tree attrs)
+{
+  for (tree acs = attrs; (acs = lookup_attribute ("access", acs));
+   acs = TREE_CHAIN (acs))
+{
+  tree vblist = TREE_VALUE (acs);
+  vblist = TREE_CHAIN (vblist);
+  if (!vblist)
+	continue;
+
+  vblist = TREE_VALUE (vblist);
+  if (!vblist)
+	continue;
+
+  for (vblist = TREE_VALUE (vblist); vblist; vblist = TREE_CHAIN (vblist))
+	{
+	  tree *pvbnd = &TREE_VALUE (vblist);
+	  if (!*pvbnd || DECL_P (*pvbnd))
+	continue;
+
+	  /* VLA bounds that are expressions as opposed to DECLs are
+	 only used in the front end.  Reset them to keep front end
+	 trees from leaking into the middle end (see pr97172) and to
+	 free up memory.  */
+	  *pvbnd = NULL_TREE;
+	}
+}
+}
 
 /* Defined in attr_access.  */
 constexpr char attr_access::mode_chars[];
diff --git a/gcc/attribs.h b/gcc/attribs.h
index 21d28a47f39..898e73db3e4 100644
--- a/gcc/attribs.h
+++ b/gcc/attribs.h
@@ -274,6 +274,9 @@ struct attr_access
   /* Return the access mode corresponding to the character code.  */
   static access_mode from_mode_char (char);
 
+  /* Reset front end-specific attribute access data from attributes.  */
+  static void free_lang_data (tree);
+
   /* The character codes corresponding to all the access modes.  */
   static constexpr char mode_chars[5] = { '-', 'r', 'w', 'x', '^' };
 
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index 2095d4badf7..c6e8a45afd5 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -635,22 +635,7 @@ c_pretty_printer::direct_abstract_declarator (tree t)
 		  /* Strip the expressions from around a VLA bound added
 		 internally to make it fit the domain mold, including
 		 any casts.  */
-		  if (TREE_CODE (maxval) == NOP_EXPR)
-		maxval = TREE_OPERAND (maxval, 0);
-		  if (TREE_CODE (maxval) == PLUS_EXPR
-		  && integer_all_onesp (TREE_OPERAND (maxval, 1)))
-		{
-		  maxval = TREE_OPERAND (maxval, 0);
-		  if (TREE_CODE (maxval) == NOP_EXPR)
-			maxval = TREE_OPERAND (maxval, 0);
-		}
-		  if (TREE_CODE (maxval) == SAVE_EXPR)
-		{
-		  maxval = TREE_OPERAND (maxval, 0);
-		  if (TREE_CODE (maxval) == NOP_EXPR)
-			maxval = TREE_OPERAND (maxval, 0);
-		}
-
+		  maxval = array_bound_from_maxval (maxval);
 		  expression (maxval);
 		}
 	}
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 4ba9477f5d1..9dcad5e362d 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -5781,7 +5781,8 @@ get_parm_array_spec (const struct c_parm *parm, tree attrs)
 		{
 		  /* Each variable VLA bound is represented by the dollar
 		 sign.  */
-		  spec += "$";
+		  spec += '$';
+		  nelts = array_bound_from_maxval (nelts);
 		  tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
 		}
 	}
@@ -5835,7 +5836,8 @@ get_parm_array_spec (const struct c_parm *parm, tree attrs)
 	}
 
   /* Each variable VLA bound is represented by a dollar sign.  */
-  spec += "$";
+  spec += '$';
+  nelts = array_bound_from_maxval (nelts);
   vbchain = tree_cons (NULL_TREE, nelts, vbchain);
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr97172.c b/gcc/testsuite/gcc.dg/pr97172.c
new file mode 100644
index 000.

Re: [[C++ PATCH]] Implement C++2a P0330R2 - Literal Suffixes for ptrdiff_t and size_t

2021-01-27 Thread Ed Smith-Rowland via Gcc-patches

On 1/27/21 3:32 PM, Jakub Jelinek wrote:

On Sun, Oct 21, 2018 at 04:39:30PM -0400, Ed Smith-Rowland wrote:

This patch implements C++2a proposal P0330R2, Literal Suffixes for ptrdiff_t
and size_t.  It's not official yet but looks very likely to pass.  It is
incomplete because I'm looking for some opinions.  (We might also wait until
it actually passes.)

This paper takes the direction of a language change rather than a library
change through C++11 literal operators.  This was after feedback on that
paper after a few iterations.

As coded in this patch, integer suffixes involving 'z' are errors in C and
warnings for C++ <= 17 (in addition to the usual warning about
implementation suffixes shadowing user-defined ones).

OTOH, the 'z' suffix is not currently legal - it can't break
currently-correct code in any C/C++ dialect.  Furthermore, I suspect the
language direction was chosen to accommodate a similar addition to C20.

I'm thinking of making this feature available as an extension to all of
C/C++ perhaps with appropriate pedwarn.

GCC now supports -std=c++2b and -std=gnu++2b, are you going to update your
patch against it (and change for z/Z standing for ssize_t rather than
ptrdiff_t), plus incorporate the feedback from Joseph and Jason?

Jakub


I'm actually working on it now!
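For contrast, the C++11 library route that P0330 moved away from can be sketched as a user-defined literal (hypothetical operator name); the core-language `z`/`uz` suffix makes this per-project boilerplate unnecessary:

```cpp
#include <cstddef>

// Hypothetical user-defined literal approximating what a core-language
// size_t suffix would provide; this is the library-based approach the
// paper ultimately rejected in favor of a language change.
constexpr std::size_t operator""_uz (unsigned long long v)
{
  return static_cast<std::size_t> (v);
}

// 42_uz deduces std::size_t rather than int, avoiding signed/unsigned
// surprises in, e.g., comparisons against container sizes.
static_assert (42_uz == 42u, "literal yields the expected value");
```

A UDL only works if every translation unit pulls in the operator, which is part of why the paper argued for a suffix in the language itself.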




Re: [PATCH, rs6000] improve vec_ctf invalid parameter handling. (pr91903)

2021-01-27 Thread Segher Boessenkool
Hi!

On Mon, Oct 26, 2020 at 04:22:32PM -0500, will schmidt wrote:
>   Per PR91903, GCC ICEs when we attempt to pass a variable
> (or out of range value) into the vec_ctf() builtin.  Per
> investigation, the parameter checking exists for this
> builtin with the int types, but was missing for
> the long long types.
> 
> This patch adds the missing CODE_FOR_* entries to the
> rs6000_expand_binup_builtin to cover that scenario.
> This patch also updates some existing tests to remove
> calls to vec_ctf() and vec_cts() that contain negative
> values.

> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.fold.h
> @@ -212,14 +212,14 @@ int main ()
>extern vector unsigned long long u9; u9 = vec_mergeo (u3, u4);
>  
>extern vector long long l8; l8 = vec_mul (l3, l4);
>extern vector unsigned long long u6; u6 = vec_mul (u3, u4);
>  
> -  extern vector double dh; dh = vec_ctf (la, -2);
> +  extern vector double dh; dh = vec_ctf (la, 2);
>extern vector double di; di = vec_ctf (ua, 2);
>extern vector int sz; sz = vec_cts (fa, 0x1F);
> -  extern vector long long l9; l9 = vec_cts (dh, -2);
> +  extern vector long long l9; l9 = vec_cts (dh, 2);

I think removing the negative inputs here reduces test coverage?  Why
did you change them?  It isn't immediately clear to me.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr91903.c
> @@ -0,0 +1,74 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target p8vector_hw } */

Compile tests should use p8vector_ok, instead.  (We do not care what
kind of hardware the system under test is: we can run this on a cross-
compiler just fine, after all!)

> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */

Please skip this line.  If the test does not work for Darwin Iain can
easily disable it, but if you do, no one will find out if it does work.

Okay for trunk with those things fixed, and the -2 thing looked at.
Thanks!


Segher


Re: [PATCH] document BLOCK_ABSTRACT_ORIGIN et al.

2021-01-27 Thread Martin Sebor via Gcc-patches

Attached is an updated patch for both tree.h and the internals manual
documenting the most important BLOCK_ macros and what they represent.

On 1/21/21 2:52 PM, Martin Sebor wrote:

On 1/18/21 6:25 AM, Richard Biener wrote:

PS Here are my notes on the macros and the two related functions:

BLOCK: Denotes a lexical scope.  Contains BLOCK_VARS of variables
declared in it, BLOCK_SUBBLOCKS of scopes nested in it, and
BLOCK_CHAIN pointing to the next BLOCK.  Its BLOCK_SUPERCONTEXT
points to the BLOCK of the enclosing scope.  May have
a BLOCK_ABSTRACT_ORIGIN and a BLOCK_SOURCE_LOCATION.

BLOCK_SUPERCONTEXT: The scope of the enclosing block, or FUNCTION_DECL
for the "outermost" function scope.  Inlined functions are chained by
this so that given expression E and its TREE_BLOCK(E) B,
BLOCK_SUPERCONTEXT(B) is the scope (BLOCK) in which E has been made
or into which E has been inlined.  In the latter case,

BLOCK_ORIGIN(B) evaluates either to the enclosing BLOCK or to
the enclosing function DECL.  It's never null.

BLOCK_ABSTRACT_ORIGIN(B) is the FUNCTION_DECL of the function into
which it has been inlined, or null if B is not inlined.


It's the BLOCK or FUNCTION it was inlined _from_, not where it was inlined
to.  It's the "ultimate" source, thus the abstract copy of the block or
function decl (for the outermost scope, aka inlined_function_outer_scope_p).
It corresponds to what you'd expect for the DWARF abstract origin.


Thanks for the correction!  It's just the "innermost" block that
points to the "ultimate" destination into which it's been inlined.



BLOCK_ABSTRACT_ORIGIN can be NULL (in case it isn't an inline instance).


BLOCK_ABSTRACT_ORIGIN: A BLOCK, or FUNCTION_DECL of the function
into which a block has been inlined.  In a BLOCK immediately enclosing
an inlined leaf expression points to the outermost BLOCK into which it
has been inlined (thus bypassing all intermediate BLOCK_SUPERCONTEXTs).

BLOCK_FRAGMENT_ORIGIN: ???
BLOCK_FRAGMENT_CHAIN: ???


That's for scope blocks split by hot/cold partitioning, and is only
temporarily populated.


Thanks, I now see these documented in detail in tree.h.




bool inlined_function_outer_scope_p(BLOCK)   [tree.h]
    Returns true if a BLOCK has a source location.
    True for all but the innermost (no SUBBLOCKs?) and outermost blocks
    into which an expression has been inlined. (Is this always true?)

tree block_ultimate_origin(BLOCK)   [tree.c]
    Returns BLOCK_ABSTRACT_ORIGIN(BLOCK), AO, after asserting that
    (DECL_P(AO) && DECL_ORIGIN(AO) == AO) || BLOCK_ORIGIN(AO) == AO.


The attached diff adds the comments above to tree.h.

I looked for a good place in the manual to add the same text but I'm
not sure.  Would the Blocks @subsection in generic.texi be appropriate?

Martin



Document various BLOCK macros.

gcc/ChangeLog:

	* doc/generic.texi (Function Basics): Mention BLOCK_SUBBLOCKS,
	BLOCK_VARS, BLOCK_SUPERCONTEXT, and BLOCK_ABSTRACT_ORIGIN.
	* doc/gimple.texi (GIMPLE): Update.  Mention free_lang_data pass.
	* tree.h (BLOCK_VARS): Add comment.
	(BLOCK_SUBBLOCKS): Same.
	(BLOCK_SUPERCONTEXT): Same.
	(BLOCK_ABSTRACT_ORIGIN): Same.
	(inlined_function_outer_scope_p): Same.

diff --git a/gcc/tree.h b/gcc/tree.h
index 02b03d1f68e..0dd2196008b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1912,18 +1912,29 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_OPERAND(NODE, I)\
 	OMP_CLAUSE_ELT_CHECK (NODE, I)
 
-/* In a BLOCK node.  */
+/* In a BLOCK (scope) node:
+   Variables declared in the scope NODE.  */
 #define BLOCK_VARS(NODE) (BLOCK_CHECK (NODE)->block.vars)
 #define BLOCK_NONLOCALIZED_VARS(NODE) \
   (BLOCK_CHECK (NODE)->block.nonlocalized_vars)
 #define BLOCK_NUM_NONLOCALIZED_VARS(NODE) \
   vec_safe_length (BLOCK_NONLOCALIZED_VARS (NODE))
 #define BLOCK_NONLOCALIZED_VAR(NODE,N) (*BLOCK_NONLOCALIZED_VARS (NODE))[N]
+/* A chain of BLOCKs (scopes) nested within the scope NODE.  */
 #define BLOCK_SUBBLOCKS(NODE) (BLOCK_CHECK (NODE)->block.subblocks)
+/* The scope enclosing the scope NODE, or FUNCTION_DECL for the "outermost"
+   function scope.  Inlined functions are chained by this so that given
+   expression E and its TREE_BLOCK(E) B, BLOCK_SUPERCONTEXT(B) is the scope
+   in which E has been made or into which E has been inlined.   */
 #define BLOCK_SUPERCONTEXT(NODE) (BLOCK_CHECK (NODE)->block.supercontext)
+/* Points to the next scope at the same level of nesting as scope NODE.  */
 #define BLOCK_CHAIN(NODE) (BLOCK_CHECK (NODE)->block.chain)
+/* A BLOCK, or FUNCTION_DECL of the function from which a block has been
+   inlined.  In a scope immediately enclosing an inlined leaf expression,
+   points to the outermost scope into which it has been inlined (thus
+   bypassing all intermediate BLOCK_SUPERCONTEXTs). */
 #define BLOCK_ABSTRACT_ORIGIN(NODE) (BLOCK_CHECK (NODE)->block.abstract_origin)
-#define BLOCK_ORIGIN(NODE) \
+#define BLOCK_ORIGIN(NODE)		\
   (BLOCK_ABSTRACT_ORIGIN(NODE) ? BLOCK_ABSTRACT_ORIGIN(NODE) : (NODE))
 #define B

Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.

2021-01-27 Thread Segher Boessenkool
On Tue, Jan 19, 2021 at 12:24:51PM -0500, Michael Meissner wrote:
> On Fri, Jan 15, 2021 at 03:43:13PM -0600, Segher Boessenkool wrote:
> > Hi!
> > 
> > On Thu, Jan 14, 2021 at 11:59:19AM -0500, Michael Meissner wrote:
> > > >From 78435dee177447080434cdc08fc76b1029c7f576 Mon Sep 17 00:00:00 2001
> > > From: Michael Meissner 
> > > Date: Wed, 13 Jan 2021 21:47:03 -0500
> > > Subject: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.
> > > 
> > > This patch replaces patches previously submitted:
> > 
> > What did you change after I approved it?
> 
> You grumbled about the way I converted the names from the current name to the
> IEEE 128-bit name as being unclear.
> 
> 1) I moved the table of known mappings from within a function to a separate
> function, and I populated the switch statement with all of the current names.
> 
> 2) I moved the code that looks at a built-in function's arguments and returns
> whether it uses long double to a separate function rather than being buried
> within a larger function.
> 
> 3) I changed the code for the case where we didn't provide a name (i.e. new
> built-ins) to hopefully make the conversion clearer.

Don't Do That.

Commit what was approved (unless it actually does not work, then explain
that clearly).  You can send incremental patches after that.


I am not going to review this whole patch once again.


If you change things in a series, the 0/N message is a good free-form
place to explain that (and start with a summary, and a summary of what
is different from the previous version, for example).  Some people
keep a changelog of what changed in all versions (newest on top of
course).

If there is only one patch, or you need to comment on something in just
one patch, you can do that after the "---" line.  Everything before that
line then is the exact commit message you will use (or anyone else can
do it as well, with a simple "git am").


The goal of a patch submission is for it to be reviewed.  Your
submission should be optimised for that, not for anything else.


So please send an incremental patch if you want more changes; or, if the
previous version was actually very much broken, explain what was broken.


Segher


Re: [Ping] PowerPC: Map IEEE 128-bit long double built-ins.

2021-01-27 Thread Segher Boessenkool
On Tue, Jan 26, 2021 at 06:39:22PM -0500, Michael Meissner wrote:
> Ping https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563496.html
> 
> | Date: Thu, 14 Jan 2021 11:59:19 -0500
> | Subject: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.
> | Message-ID: <20210114165919.ga1...@ibm-toto.the-meissners.org>
> 
> As I've said in the past, this is the most important patch of the IEEE 128-bit
> patches.  What do I need to do to be able to commit this patch ASAP?  Or what
> changes do I need to make?

https://patchwork.ozlabs.org/project/gcc/patch/20201119235814.ga...@ibm-toto.the-meissners.org/

  I cannot understand this code, and it does seem far from obviously
  correct.  But, okay for trunk if you handle all fallout (and I mean all,
  not just "all you consider important").

(And that is for *that* patch, not including later changes.  Send those
separately, don't make me do much more work than needed).


Segher


Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.

2021-01-27 Thread Segher Boessenkool
On Wed, Jan 27, 2021 at 01:06:46PM -0600, will schmidt wrote:
> On Thu, 2021-01-14 at 11:59 -0500, Michael Meissner via Gcc-patches wrote:
> > November 19th, 2020:
> > Message-ID: <20201119235814.ga...@ibm-toto.the-meissners.org>
> 
> Subject and date should be sufficient

Only if people pick good subjects, and do not send ten patches with a
similar subject line on the same day.  I asked for the message ID,
which works pretty much everywhere.

> _if_ having the old versions
> of the patches is necessary to review the latest version of the
> patch.  Which ideally is not the case.

Stronger than that: I need to know what changed!  So please just explain
what changed, in just a short sentence or two, or more if that is needed
(but not if it is not needed).


Segher


[PATCH PR97627]Avoid computing niters info for fake edges

2021-01-27 Thread bin.cheng via Gcc-patches
Hi,
As described in the commit message, we need to avoid computing niters info for
fake edges.  This simple patch does so with two changes.

Bootstrapped and tested on x86_64; is it OK?

Thanks,
bin

pr97627-20210128.patch
Description: Binary data


Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.

2021-01-27 Thread Michael Meissner via Gcc-patches
On Wed, Jan 27, 2021 at 07:43:56PM -0600, Segher Boessenkool wrote:
> On Wed, Jan 27, 2021 at 01:06:46PM -0600, will schmidt wrote:
> > On Thu, 2021-01-14 at 11:59 -0500, Michael Meissner via Gcc-patches wrote:
> > > November 19th, 2020:
> > > Message-ID: <20201119235814.ga...@ibm-toto.the-meissners.org>
> > 
> > Subject and date should be sufficient
> 
> Only if people pick good subjects, and do not send ten patches with a
> similar subject line on the same day.  I asked for the message ID,
> which works pretty much everywhere.
> 
> > _if_ having the old versions
> > of the patches is necessary to review the latest version of the
> > patch.  Which ideally is not the case.
> 
> Stronger than that: I need to know what changed!  So please just explain
> what changed, in just a short sentence or two, or more if that is needed
> (but not if it is not needed).

In the past you complained that the patch would abort if the user did not link
against GLIBC 2.32 (because there is an #ifdef in the code to do the abort if
gcc was configured against an older GLIBC).

In addition, it used some pre-processor magic so that I didn't have to modify
the dfp-bit.{c,h} functions to add new functions.  In particular, the new
functions pretended they were the TF functions, and used #define to change the
names.

The new code modifies dfp-bit.{c,h} to have support for the KF functions as
separate #ifdef's.  It eliminates the preprocessor trickery, since I did modify
the dfp-bit.{c,h} support.

In order to deal with older GLIBC's, I used a different function for the KF
library (__sprintfkf instead of sprintf, and __strtokf instead of strtold).
This function uses weak references to see if we had the GLIBC symbols
(__sprintfieee128 and __strtoieee128 that are in GLIBC 2.32).  If those
functions exist, we call those functions directly.

If those functions do not exist, I converted the _Float128 type to or from
__ibm128, and I did the normal long double conversions.  Given that IEEE
128-bit has a much larger exponent range than IBM 128-bit, it means there are
some numbers that can't be converted.  But at least the majority of the values
are converted.

Note all of the other binary/decimal conversions use the GLIBC functions
(either sprintf or strto).  The GLIBC people have the expertise to do the
conversion, whereas I do not.  But until GLIBC 2.32, there was not enough
support in GLIBC to handle IEEE 128-bit conversions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] PowerPC: Map IEEE 128-bit long double built-ins.

2021-01-27 Thread Michael Meissner via Gcc-patches
Whoops, I thought I was replying to the second patch about Decimal and IEEE
128-bit conversion, not about built-in support.

On Wed, Jan 27, 2021 at 10:01:38PM -0500, Michael Meissner wrote:
> On Wed, Jan 27, 2021 at 07:43:56PM -0600, Segher Boessenkool wrote:
> > On Wed, Jan 27, 2021 at 01:06:46PM -0600, will schmidt wrote:
> > > On Thu, 2021-01-14 at 11:59 -0500, Michael Meissner via Gcc-patches wrote:
> > > > November 19th, 2020:
> > > > Message-ID: <20201119235814.ga...@ibm-toto.the-meissners.org>
> > > 
> > > Subject and date should be sufficient
> > 
> > Only if people pick good subjects, and do not send ten patches with a
> > similar subject line on the same day.  I asked for the message ID,
> > which works pretty much everywhere.
> > 
> > > _if_ having the old versions
> > > of the patches is necessary to review the latest version of the
> > > patch.  Which ideally is not the case.
> > 
> > Stronger than that: I need to know what changed!  So please just explain
> > what changed, in just a short sentence or two, or more if that is needed
> > (but not if it is not needed).
> 
> In the past you complained that the patch would abort if the user did not link
> against GLIBC 2.32 (because there is an #ifdef in the code to do the abort if
> gcc was configured against an older GLIBC).
> 
> In addition, it used some pre-processor magic so that I didn't have to modify
> the dfp-bit.{c,h} functions to add new functions.  In particular, the new
> functions pretended they were the TF functions, and used #define to change
> the names.
> 
> The new code modifies dfp-bit.{c,h} to have support for the KF functions as
> separate #ifdef's.  It eliminates the preprocessor trickery, since I did
> modify the dfp-bit.{c,h} support.
> 
> In order to deal with older GLIBC's, I used a different function for the KF
> library (__sprintfkf instead of sprintf, and __strtokf instead of strtold).
> This function uses weak references to see if we had the GLIBC symbols
> (__sprintfieee128 and __strtoieee128 that are in GLIBC 2.32).  If those
> functions exist, we call those functions directly.
> 
> If those functions do not exist, I converted the _Float128 type to or from
> __ibm128, and I did the normal long double conversions.  Given that IEEE
> 128-bit has a much larger exponent range than IBM 128-bit, it means there are
> some numbers that can't be converted.  But at least the majority of the values
> are converted.
> 
> Note all of the other binary/decimal conversions use the GLIBC functions
> (either sprintf or strto).  The GLIBC people have the expertise to do the
> conversion, whereas I do not.  But until GLIBC 2.32, there was not enough
> support in GLIBC to handle IEEE 128-bit conversions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

