date:20210721

[PATCH V2 00/10] Initial support for AVX512FP16

2021-07-21 Thread liuhongt via Gcc-patches

Hi:
  As discussed in [1], this patch series supports _Float16 for target SSE2
and above. Without AVX512FP16, _Float16 is a storage-only type: all
operations are emulated with soft-fp and float instructions. By default,
soft-fp keeps the intermediate result of each operation at 32-bit
precision. This may make results differ from those of the AVX512FP16
instructions. The option -fexcess-precision=standard forces rounding back
to _Float16 after every operation.
 
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574112.html
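The two evaluation strategies described above can give different answers, and the difference shows up even without compiler support for _Float16. The sketch below models binary16 rounding on top of float. The helper names round_to_half, sum3_round_each and sum3_round_once are hypothetical. It assumes IEEE binary32 float, finite inputs, and results that are either binary16 normals or overflow. Binary16 subnormals are deliberately not modelled.

```c
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest IEEE binary16 value (ties to even) and
   return the result widened back to float.  Sketch assumptions: the
   input is finite and the result is either a binary16 normal number or
   overflows; binary16 subnormals are NOT modelled.  */
static float
round_to_half (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  uint32_t sign = u & 0x80000000u;
  uint32_t mag = u & 0x7fffffffu;

  /* binary32 stores 23 fraction bits and binary16 stores 10: drop 13.  */
  uint32_t lsb = (mag >> 13) & 1u;
  mag += 0x0fffu + lsb;           /* round to nearest, ties to even */
  mag &= ~0x1fffu;
  if (mag > 0x477fe000u)          /* larger than 65504, the binary16 maximum */
    mag = 0x7f800000u;            /* overflow to infinity */

  u = sign | mag;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* a + b + c with every intermediate rounded back to binary16, which is
   what -fexcess-precision=standard is described as enforcing.  */
static float
sum3_round_each (float a, float b, float c)
{
  float t = round_to_half (a + b);
  return round_to_half (t + c);
}

/* a + b + c kept at 32-bit precision and rounded to binary16 only once,
   the default soft-fp behaviour described above.  */
static float
sum3_round_once (float a, float b, float c)
{
  return round_to_half (a + b + c);
}
```

Take a = 1.0f and b = c = 0x1p-11f, which is half a binary16 ulp at 1.0. sum3_round_each returns 1.0f because each tie rounds to even. sum3_round_once returns 1.0009765625f, which is 1 + 2^-10.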

There are 10 patches in this series:

1) Update hf soft-fp from glibc.
2) [i386] Enable _Float16 type for TARGET_SSE2 and above.
3) [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.
4) AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
5) AVX512FP16: Support vector init/broadcast/set/extract for FP16.
6) AVX512FP16: Add testcase for vector init and broadcast intrinsics.
7) AVX512FP16: Add tests for vector passing in variable arguments.
8) AVX512FP16: Add ABI tests for xmm.
9) AVX512FP16: Add ABI test for ymm.
10) AVX512FP16: Add abi test for zmm

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,} on CLX.
  Bootstrapped and regtested on x86_64-linux-gnu{-m32\ -march=native,\ -march=native} on SPR.
  Passes 300+ new tests under gcc.dg/torture/*float16*.

  On SPR, there are regressions in pr69225-[1234567].c related to
FLT_EVAL_METHOD, since TARGET_AVX512FP16 sets FLT_EVAL_METHOD to
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.

 gcc/common/config/i386/cpuinfo.h  |2 +
 gcc/common/config/i386/i386-common.c  |   26 +-
 gcc/common/config/i386/i386-cpuinfo.h |1 +
 gcc/common/config/i386/i386-isas.h|1 +
 gcc/config.gcc|2 +-
 gcc/config/i386/avx512fp16intrin.h|  225 
 gcc/config/i386/cpuid.h   |1 +
 gcc/config/i386/i386-builtin-types.def|7 +-
 gcc/config/i386/i386-builtins.c   |   23 +
 gcc/config/i386/i386-c.c  |2 +
 gcc/config/i386/i386-expand.c |  129 +-
 gcc/config/i386/i386-isa.def  |1 +
 gcc/config/i386/i386-modes.def|   13 +-
 gcc/config/i386/i386-options.c|4 +-
 gcc/config/i386/i386.c|  238 +++-
 gcc/config/i386/i386.h|   28 +-
 gcc/config/i386/i386.md   |  304 -
 gcc/config/i386/i386.opt  |4 +
 gcc/config/i386/immintrin.h   |4 +
 gcc/config/i386/sse.md|  395 --
 gcc/doc/extend.texi   |   16 +
 gcc/doc/invoke.texi   |   10 +-
 gcc/lto/lto-lang.c|3 +
 gcc/optabs-query.c|   10 +-
 gcc/testsuite/g++.dg/other/i386-2.C   |2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |2 +-
 gcc/testsuite/g++.target/i386/float16-1.C |8 +
 gcc/testsuite/g++.target/i386/float16-2.C |   14 +
 gcc/testsuite/g++.target/i386/float16-3.C |   10 +
 gcc/testsuite/gcc.target/i386/avx-1.c |2 +-
 gcc/testsuite/gcc.target/i386/avx-2.c |2 +-
 gcc/testsuite/gcc.target/i386/avx512-check.h  |3 +
 .../gcc.target/i386/avx512fp16-10a.c  |   14 +
 .../gcc.target/i386/avx512fp16-10b.c  |   25 +
 .../gcc.target/i386/avx512fp16-12a.c  |   21 +
 .../gcc.target/i386/avx512fp16-12b.c  |   27 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |   24 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |   32 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   26 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |   33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |   30 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |   28 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |   33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |   36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |   36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |   35 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |   40 +
 gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |   31 +
 gcc/testsuite/gcc.target/i386/avx512fp16-5.c  |  133 ++
 gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |   57 +
 gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |   86 ++
 gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |   53 +
 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |   27 +
 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |   49 +
 .../gcc.target/i386/avx512fp16-vararg-1.c |  122 ++
 .../gcc.target/i386/avx512fp16-vararg-2.c |  107 ++
 .../gcc.target/i386/avx512fp16-vararg-3.c |  114 ++
 .../gcc.target/i386/avx512fp16-vararg-4.c |  115 ++
 .../gcc.target/i386/avx512fp16-vec_set_var.c  |   30 +
 gcc/testsuite/gcc.target/i386/float16-3a.c|   10 +
 gcc/testsuite/gcc.target/i386/float16-3b.c|   10 +
 gcc/testsuite/gcc.

[PATCH 01/10] Update hf soft-fp from glibc.

2021-07-21 Thread liuhongt via Gcc-patches

libgcc/ChangeLog

* soft-fp/eqhf2.c: New file.
* soft-fp/extendhfdf2.c: New file.
* soft-fp/extendhfsf2.c: New file.
* soft-fp/extendhfxf2.c: New file.
* soft-fp/half.h (FP_CMP_EQ_H): New macro.
* soft-fp/truncdfhf2.c: New file.
* soft-fp/truncsfhf2.c: New file.
* soft-fp/truncxfhf2.c: New file.
---
 libgcc/soft-fp/eqhf2.c   | 49 +
 libgcc/soft-fp/extendhfdf2.c | 53 
 libgcc/soft-fp/extendhfsf2.c | 49 +
 libgcc/soft-fp/half.h|  1 +
 libgcc/soft-fp/truncdfhf2.c  | 52 +++
 libgcc/soft-fp/truncsfhf2.c  | 48 
 6 files changed, 252 insertions(+)
 create mode 100644 libgcc/soft-fp/eqhf2.c
 create mode 100644 libgcc/soft-fp/extendhfdf2.c
 create mode 100644 libgcc/soft-fp/extendhfsf2.c
 create mode 100644 libgcc/soft-fp/truncdfhf2.c
 create mode 100644 libgcc/soft-fp/truncsfhf2.c

diff --git a/libgcc/soft-fp/eqhf2.c b/libgcc/soft-fp/eqhf2.c
new file mode 100644
index 000..6d6634e5c54
--- /dev/null
+++ b/libgcc/soft-fp/eqhf2.c
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return 0 iff a == b, 1 otherwise
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+
+CMPtype
+__eqhf2 (HFtype a, HFtype b)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_H (B);
+  CMPtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+  FP_UNPACK_RAW_H (B, b);
+  FP_CMP_EQ_H (r, A, B, 1);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
+
+strong_alias (__eqhf2, __nehf2);
diff --git a/libgcc/soft-fp/extendhfdf2.c b/libgcc/soft-fp/extendhfdf2.c
new file mode 100644
index 000..337ba791d48
--- /dev/null
+++ b/libgcc/soft-fp/extendhfdf2.c
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+   Return an IEEE half converted to IEEE double
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "half.h"
+#include "double.h"
+
+DFtype
+__extendhfdf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_D (R);
+  DFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_H (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_EXTEND (D, H, 2, 1, R, A);
+#else
+  FP_EXTEND (D, H, 1, 1, R, A);
+#endif
+  FP_PACK_RAW_D (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
diff --git a/
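Patch 01 imports the half-precision conversion and comparison routines. The sketch below is a standalone, bit-level illustration of the results __extendhfsf2 and __eqhf2 are documented to produce. It is not the soft-fp macro implementation. The names half_bits_to_float and half_bits_eq2 are hypothetical. IEEE binary32 float is assumed, and floating-point exception flags (which the real routines raise via FP_HANDLE_EXCEPTIONS) are ignored.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 bit pattern to float.  Every binary16 value,
   subnormals included, is exactly representable in binary32, so no
   rounding is needed: only exponent rebiasing and normalization.  */
static float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t frac = h & 0x3ffu;
  uint32_t bits;

  if (exp == 0x1fu)                 /* Inf or NaN: max exponent, keep payload */
    bits = sign | 0x7f800000u | (frac << 13);
  else if (exp != 0)                /* normal: rebias exponent from 15 to 127 */
    bits = sign | ((exp + 112u) << 23) | (frac << 13);
  else if (frac == 0)               /* signed zero */
    bits = sign;
  else
    {
      /* Subnormal: shift until the leading 1 reaches the implicit-bit
         position, counting the extra shifts in e.  */
      int e = -1;
      do
        {
          frac <<= 1;
          e++;
        }
      while ((frac & 0x400u) == 0);
      frac &= 0x3ffu;
      bits = sign | ((uint32_t) (112 - e) << 23) | (frac << 13);
    }

  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Return 0 iff a == b under IEEE rules and nonzero otherwise, matching
   the "Return 0 iff a == b, 1 otherwise" contract of __eqhf2/__nehf2.
   NaNs compare unequal to everything and +0 equals -0.  */
static int
half_bits_eq2 (uint16_t a, uint16_t b)
{
  int a_nan = (a & 0x7c00u) == 0x7c00u && (a & 0x3ffu) != 0;
  int b_nan = (b & 0x7c00u) == 0x7c00u && (b & 0x3ffu) != 0;

  if (a_nan || b_nan)
    return 1;
  if (((a | b) & 0x7fffu) == 0)     /* both are zeros, whatever the sign */
    return 0;
  return a != b;
}
```

For example, half_bits_to_float (0x7bff) yields 65504.0f, the largest finite binary16 value. half_bits_eq2 (0x0000, 0x8000) returns 0 because +0 and -0 compare equal.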

[PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-07-21 Thread liuhongt via Gcc-patches

gcc/ChangeLog:

* config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
* config/i386/i386.c (enum x86_64_reg_class): Add
X86_64_SSEHF_CLASS.
(merge_classes): Handle X86_64_SSEHF_CLASS.
(examine_argument): Ditto.
(construct_container): Ditto.
(classify_argument): Ditto, and set HFmode/HCmode to
X86_64_SSEHF_CLASS.
(function_value_32): Return _Float16/Complex Float16 in
%xmm0/%xmm1.
(function_value_64): Return _Float16/Complex Float16 by SSE
register.
(ix86_print_operand): Handle CONST_DOUBLE HFmode.
(ix86_secondary_reload): Require gpr as intermediate register
to store _Float16 from sse register when sse4 is not
available.
(ix86_hard_regno_mode_ok): Put HFmode in sse register and gpr.
(ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
sse2.
(ix86_scalar_mode_supported_p): Ditto.
(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
(ix86_get_excess_precision): Return
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
* config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
* config/i386/i386.md (*pushhf_rex64): New define_insn.
(*pushhf): Ditto.
(*movhf_internal): Ditto.
* doc/extend.texi (Half-Precision Floating Point): Document
_Float16 for x86.

gcc/lto/ChangeLog:

* lto-lang.c (lto_type_for_mode): Return float16_type_node
when mode == TYPE_MODE (float16_type_node).

gcc/testsuite/ChangeLog

* gcc.target/i386/sse2-float16-1.c: New test.
* gcc.target/i386/sse2-float16-2.c: Ditto.
* gcc.target/i386/sse2-float16-3.c: Ditto.
---
 gcc/config/i386/i386-modes.def|   1 +
 gcc/config/i386/i386.c|  99 ++-
 gcc/config/i386/i386.h|   2 +-
 gcc/config/i386/i386.md   | 118 +-
 gcc/doc/extend.texi   |  16 +++
 gcc/lto/lto-lang.c|   3 +
 .../gcc.target/i386/sse2-float16-1.c  |   8 ++
 .../gcc.target/i386/sse2-float16-2.c  |  16 +++
 .../gcc.target/i386/sse2-float16-3.c  |  12 ++
 9 files changed, 265 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c

diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 4e7014be034..9232f59a925 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
+FLOAT_MODE (HF, 2, ieee_half_format);
 
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
In LP64 mode, XFmode has size and alignment 16.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ff96134fb37..02628d838fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -387,6 +387,7 @@ enum x86_64_reg_class
 X86_64_INTEGER_CLASS,
 X86_64_INTEGERSI_CLASS,
 X86_64_SSE_CLASS,
+X86_64_SSEHF_CLASS,
 X86_64_SSESF_CLASS,
 X86_64_SSEDF_CLASS,
 X86_64_SSEUP_CLASS,
@@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
 return X86_64_MEMORY_CLASS;
 
   /* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
-  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
-  || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
+  if ((class1 == X86_64_INTEGERSI_CLASS
+   && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
+  || (class2 == X86_64_INTEGERSI_CLASS
+ && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
 return X86_64_INTEGERSI_CLASS;
   if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
   || class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
@@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
/* The partial classes are now full classes.  */
if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
  subclasses[0] = X86_64_SSE_CLASS;
+   if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
+ subclasses[0] = X86_64_SSE_CLASS;
if (subclasses[0] == X86_64_INTEGERSI_CLASS
&& !((bit_offset % 64) == 0 && bytes == 4))
  subclasses[0] = X86_64_INTEGER_CLASS;
@@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
   gcc_unreachable ();
 case E_CTImode:
   return 0;
+case E_HFmode:
+  if (!(bit_offset % 64))
+   classes[0] = X86_64_SSEHF_CLASS;
+  else
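The merge_classes hunk above extends rule #4 so that a 16-bit float can share an eightbyte with a 32-bit integer. The standalone model below shows how eightbyte classes combine. The enum and function names are hypothetical, and the x87 classes (rule #5) are omitted for brevity. It is a simplified sketch, not GCC's code.

```c
/* Hypothetical, simplified eightbyte classes; x87 classes omitted.  */
enum cls
{
  CLS_NO, CLS_INTEGER, CLS_INTEGERSI, CLS_SSE,
  CLS_SSEHF, CLS_SSESF, CLS_SSEDF, CLS_MEMORY
};

/* Merge the classes of two fields that share one eightbyte.  */
static enum cls
merge_cls (enum cls c1, enum cls c2)
{
  if (c1 == c2)                                   /* rule #1 */
    return c1;
  if (c1 == CLS_NO)                               /* rule #2 */
    return c2;
  if (c2 == CLS_NO)
    return c1;
  if (c1 == CLS_MEMORY || c2 == CLS_MEMORY)       /* rule #3 */
    return CLS_MEMORY;
  /* Rule #4, including the SSEHF case added by this patch: a 32-bit
     integer mixed with a small SSE scalar stays INTEGERSI.  */
  if ((c1 == CLS_INTEGERSI && (c2 == CLS_SSESF || c2 == CLS_SSEHF))
      || (c2 == CLS_INTEGERSI && (c1 == CLS_SSESF || c1 == CLS_SSEHF)))
    return CLS_INTEGERSI;
  if (c1 == CLS_INTEGER || c1 == CLS_INTEGERSI
      || c2 == CLS_INTEGER || c2 == CLS_INTEGERSI)
    return CLS_INTEGER;
  return CLS_SSE;                                 /* rule #6 */
}
```

In this model, merging an SSEHF field with an INTEGERSI field gives CLS_INTEGERSI, so the eightbyte goes to a general-purpose register. Two different SSE classes, such as SSEHF and SSESF, fall through to plain CLS_SSE.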
+

[PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.

2021-07-21 Thread liuhongt via Gcc-patches

gcc/ChangeLog:

* optabs-query.c (get_best_extraction_insn): Use word_mode for
HF field.

libgcc/ChangeLog:

* config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
* config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
* config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
* config/i386/t-softfp: Add hf soft-fp.
* config.host: Add i386/64/t-softfp.
* config/i386/64/t-softfp: New file.
---
 gcc/optabs-query.c  | 10 +-
 libgcc/config.host  |  5 +
 libgcc/config/i386/32/sfp-machine.h |  1 +
 libgcc/config/i386/64/sfp-machine.h |  1 +
 libgcc/config/i386/64/t-softfp  |  1 +
 libgcc/config/i386/sfp-machine.h|  1 +
 libgcc/config/i386/t-softfp |  5 +
 7 files changed, 19 insertions(+), 5 deletions(-)
 create mode 100644 libgcc/config/i386/64/t-softfp

diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 05ee5f517da..0438e451474 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -205,7 +205,15 @@ get_best_extraction_insn (extraction_insn *insn,
  machine_mode field_mode)
 {
   opt_scalar_int_mode mode_iter;
-  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits))
+  scalar_int_mode smallest_int_mode;
+  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
+  if (FLOAT_MODE_P (field_mode)
+  && known_eq (GET_MODE_SIZE (field_mode), 2))
+smallest_int_mode = word_mode;
+  else
+smallest_int_mode = smallest_int_mode_for_size (struct_bits);
+
+  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode)
 {
   scalar_int_mode mode = mode_iter.require ();
   if (get_extraction_insn (insn, pattern, type, mode))
diff --git a/libgcc/config.host b/libgcc/config.host
index 50f00062232..96da9ef1cce 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
;;
 i[34567]86-*-* | x86_64-*-*)
tmake_file="${tmake_file} t-softfp-tf"
-   if test "${host_address}" = 32; then
-   tmake_file="${tmake_file} i386/${host_address}/t-softfp"
-   fi
-   tmake_file="${tmake_file} i386/t-softfp t-softfp"
+   tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp"
;;
 esac
 
diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
index 1fa282d7afe..e24cbc8d180 100644
--- a/libgcc/config/i386/32/sfp-machine.h
+++ b/libgcc/config/i386/32/sfp-machine.h
@@ -86,6 +86,7 @@
 #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_H  _FP_QNANBIT_H
 #define _FP_NANFRAC_S  _FP_QNANBIT_S
 #define _FP_NANFRAC_D  _FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
index 1ff94c23ea4..e1c616699bb 100644
--- a/libgcc/config/i386/64/sfp-machine.h
+++ b/libgcc/config/i386/64/sfp-machine.h
@@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
+#define _FP_NANFRAC_H  _FP_QNANBIT_H
 #define _FP_NANFRAC_S  _FP_QNANBIT_S
 #define _FP_NANFRAC_D  _FP_QNANBIT_D
 #define _FP_NANFRAC_E  _FP_QNANBIT_E, 0
diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
new file mode 100644
index 000..d812bb120bd
--- /dev/null
+++ b/libgcc/config/i386/64/t-softfp
@@ -0,0 +1 @@
+softfp_extras := fixhfti fixunshfti floattihf floatuntihf
\ No newline at end of file
diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
index 8319f0550bc..f15d29d3755 100644
--- a/libgcc/config/i386/sfp-machine.h
+++ b/libgcc/config/i386/sfp-machine.h
@@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
 #define _FP_KEEPNANFRACP   1
 #define _FP_QNANNEGATEDP 0
 
+#define _FP_NANSIGN_H  1
 #define _FP_NANSIGN_S  1
 #define _FP_NANSIGN_D  1
 #define _FP_NANSIGN_E  1
diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
index 685d9cf8502..4ac214eb0ce 100644
--- a/libgcc/config/i386/t-softfp
+++ b/libgcc/config/i386/t-softfp
@@ -1 +1,6 @@
 LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
+
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+
+softfp_extras += eqhf2
\ No newline at end of file
-- 
2.18.1
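The sfp-machine.h hunks set _FP_NANSIGN_H to 1 and _FP_NANFRAC_H to the quiet bit alone. With those settings, the NaN soft-fp generates for an invalid operation on x86 has its sign bit set and only the top stored fraction bit set. The sketch below computes that bit pattern for a generic IEEE interchange format. The helper name default_qnan_bits is hypothetical.

```c
#include <stdint.h>

/* Default quiet NaN bit pattern for an IEEE interchange format with
   exp_bits exponent bits and frac_bits stored fraction bits, assuming
   the x86 settings above: sign bit set (_FP_NANSIGN_* == 1) and a
   fraction equal to the quiet bit (_FP_NANFRAC_* == _FP_QNANBIT_*).  */
static uint32_t
default_qnan_bits (unsigned exp_bits, unsigned frac_bits)
{
  uint32_t sign = 1u << (exp_bits + frac_bits);          /* top bit */
  uint32_t exp = ((1u << exp_bits) - 1u) << frac_bits;   /* all ones */
  uint32_t quiet = 1u << (frac_bits - 1u);               /* MSB of fraction */
  return sign | exp | quiet;
}
```

For binary16 (5 exponent bits, 10 fraction bits) this gives 0xfe00. For binary32 (8 and 23 bits) it gives 0xffc00000, the familiar x86 default NaN.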

[PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.

2021-07-21 Thread liuhongt via Gcc-patches

From: "Guo, Xuepeng" 

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect FEATURE_AVX512FP16.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX512FP16_SET,
OPTION_MASK_ISA_AVX512FP16_UNSET,
OPTION_MASK_ISA2_AVX512FP16_SET,
OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
(OPTION_MASK_ISA2_AVX512BW_UNSET,
OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
(ix86_handle_option): Handle -mavx512fp16.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX512FP16.
* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
* config.gcc: Add avx512fp16intrin.h.
* config/i386/avx512fp16intrin.h: New intrinsic header.
* config/i386/cpuid.h: Add bit_AVX512FP16.
* config/i386/i386-builtin-types.def (FLOAT16): New primitive type.
* config/i386/i386-builtins.c: Support _Float16 type for i386
backend.
(ix86_init_float16_builtins): New function.
(ix86_float16_type_node): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512FP16__.
* config/i386/i386-expand.c (ix86_expand_branch): Support
HFmode.
(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_expand_fp_movcc): Ditto.
* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
* config/i386/i386.c (ix86_get_ssemov): Use
vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
(ix86_get_excess_precision): Use
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
is enabled.
(output_387_binary_op): Update instruction suffix for HFmode.
(sse_store_index): Use SFmode cost for HFmode cost.
(inline_memory_move_cost): Add HFmode, and prefer SSE cost over
GPR cost for HFmode.
(ix86_hard_regno_mode_ok): Allow HImode in sse register.
(ix86_mangle_type): Add mangling for _Float16 type.
(inline_secondary_memory_needed): No memory is needed for
16bit movement between gpr and sse reg under
TARGET_AVX512FP16.
(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
(ix86_division_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_add_stmt_cost): Ditto.
(ix86_optab_supported_p): Ditto.
* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
(SSE_FLOAT_MODE_P): Add HFmode.
(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
* config/i386/i386.md (mode): Add HFmode.
(MODE_SIZE): Add HFmode.
(MODEFH): Likewise.
(ssemodesuffix): Add sh suffix for HFmode.
(cbranch4): Use MODEFH.
(3): Likewise.
(mul3): Likewise.
(div3): Likewise.
(*ieee_s3): Likewise.
(*cmpihf): New define_insn for HFmode.
(*movhf_internal): Adjust for avx512fp16 instruction.
(extendhf2): Likewise.
(trunchf2): Likewise.
(*fop_hf_comm): Likewise.
(*fop_hf_1): Likewise.
(floathf2): Likewise.
(movcc): Likewise.
* config/i386/i386.opt: Add mavx512fp16.
* config/i386/immintrin.h: Include avx512fp16intrin.h.
* doc/invoke.texi: Add mavx512fp16.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
* gcc.target/i386/sse-13.c: Add -mavx512fp16.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp (check_effective_target_avx512fp16): New.
* g++.target/i386/float16-1.C: New test.
* g++.target/i386/float16-2.C: Ditto.
* g++.target/i386/float16-3.C: Ditto.
* gcc.target/i386/avx512fp16-12a.c: Ditto.
* gcc.target/i386/avx512fp16-12b.c: Ditto.
* gcc.target/i386/float16-3a.c: Ditto.
* gcc.target/i386/float16-3b.c: Ditto.
* gcc.target/i386/float16-4a.c: Ditto.
* gcc.target/i386/float16-4b.c: Ditto.
* gcc.target/i386/pr54855-12.c: Ditto.
* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.

Co-Authored-By: Guo, Xuepeng 
Co-Authored-By: H.J. Lu 
Co-Authored-By: Liu, Hongtao 
Co-Authored-By: Wang, Hongyu 
Co-Authored-By: Xu, Dianhong 
---
 gcc/common/config/i386/cpuinfo.h  |   2 +
 gcc/common/config/i386/i386-common.c  |  26 ++-
 gcc/common/config/

[PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16.

2021-07-21 Thread liuhongt via Gcc-patches

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
(_mm256_set_ph): Likewise.
(_mm512_set_ph): Likewise.
(_mm_setr_ph): Likewise.
(_mm256_setr_ph): Likewise.
(_mm512_setr_ph): Likewise.
(_mm_set1_ph): Likewise.
(_mm256_set1_ph): Likewise.
(_mm512_set1_ph): Likewise.
(_mm_setzero_ph): Likewise.
(_mm256_setzero_ph): Likewise.
(_mm512_setzero_ph): Likewise.
(_mm_set_sh): Likewise.
(_mm_load_sh): Likewise.
(_mm_store_sh): Likewise.
* config/i386/i386-builtin-types.def (V8HF): New type.
(DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type.
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Support vector HFmodes.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_interleave): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
(ix86_expand_vector_init_concat): Likewise.
(ix86_expand_sse_movcc): Handle vector HFmodes.
(ix86_expand_vector_set_var): Ditto.
* config/i386/i386-modes.def: Add HF vector modes in comment.
* config/i386/i386.c (classify_argument): Add HF vector modes.
(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
(ix86_vector_mode_supported_p): Likewise.
(ix86_set_reg_reg_cost): Handle vector HFmode.
(ix86_get_ssemov): Handle vector HFmode.
(function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
on the stack.
* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
(VALID_AVX256_REG_OR_OI_MODE): Rename to ..
(VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
(VALID_SSE2_REG_VHF_MODE): New.
(VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
(SSE_REG_MODE_P): Add vector HFmode.
* config/i386/i386.md (mode): Add HF vector modes.
(MODE_SIZE): Likewise.
(ssemodesuffix): Add ph suffix for HF vector modes.
* config/i386/sse.md (VFH_128): New mode iterator.
(VMOVE): Adjust for HF vector modes.
(V): Likewise.
(V_256_512): Likewise.
(avx512): Likewise.
(avx512fmaskmode): Likewise.
(shuffletype): Likewise.
(sseinsnmode): Likewise.
(ssedoublevecmode): Likewise.
(ssehalfvecmode): Likewise.
(ssehalfvecmodelower): Likewise.
(ssePScmode): Likewise.
(ssescalarmode): Likewise.
(ssescalarmodelower): Likewise.
(sseintprefix): Likewise.
(i128): Likewise.
(bcstscalarsuff): Likewise.
(xtg_mode): Likewise.
(VI12HF_AVX512VL): New mode_iterator.
(VF_AVX512FP16): Likewise.
(VIHF): Likewise.
(VIHF_256): Likewise.
(VIHF_AVX512BW): Likewise.
(V16_256): Likewise.
(V32_512): Likewise.
(sseintmodesuffix): New mode_attr.
(sse): Add scalar and vector HFmodes.
(ssescalarmode): Add vector HFmode mapping.
(ssescalarmodesuffix): Add sh suffix for HFmode.
(*_vm3): Use VFH_128.
(*_vm3): Likewise.
(*ieee_3): Likewise.
(_blendm): New define_insn.
(vec_setv8hf): New define_expand.
(vec_set_0): New define_insn for HF vector set.
(*avx512fp16_movsh): Likewise.
(avx512fp16_movsh): Likewise.
(vec_extract_lo_v32hi): Rename to ...
(vec_extract_lo_): ... this, and adjust to allow HF
vector modes.
(vec_extract_hi_v32hi): Likewise.
(vec_extract_hi_): Likewise.
(vec_extract_lo_v16hi): Likewise.
(vec_extract_lo_): Likewise.
(vec_extract_hi_v16hi): Likewise.
(vec_extract_hi_): Likewise.
(vec_set_hi_v16hi): Likewise.
(vec_set_hi_): Likewise.
(vec_set_lo_v16hi): Likewise.
(vec_set_lo_): Likewise.
(*vec_extract_0): New define_insn_and_split for HF
vector extract.
(*vec_extracthf): New define_insn.
(VEC_EXTRACT_MODE): Add HF vector modes.
(PINSR_MODE): Add V8HF.
(sse2p4_1): Likewise.
(pinsr_evex_isa): Likewise.
(_pinsr): Adjust to support
insert for V8HFmode.
(pbroadcast_evex_isa): Add HF vector modes.
(AVX2_VEC_DUP_MODE): Likewise.
(VEC_INIT_MODE): Likewise.
(VEC_INIT_HALF_MODE): Likewise.
(avx2_pbroadcast): Adjust to support HF vector mode
broadcast.
(avx2_pbroadcast_1): Likewise.
(_vec_dup_1): Likewise.
(_vec_dup): Likewise.
(_vec_dup_gpr):
Likewise.
---
 gcc/config/i386/avx512fp16intrin.h | 172 +++
 gcc/config/i386/i386-builtin-types.def |   6 +-
 gcc/config/i386/i386-expand.c  | 124 +++-
 gcc/config/i386/i386-mode
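Patch 05 adds the _mm_set_ph, _mm_setr_ph and _mm_set_sh intrinsics. They follow the usual Intel convention: _mm_set_* lists elements from highest to lowest, _mm_setr_* lists them in memory order, and _mm_set_sh puts its argument in element 0 and zeros the rest. The portable model below illustrates only that ordering. The type v8h_model and the model_* functions are hypothetical, and float stands in for _Float16 so the sketch builds without AVX512FP16 support.

```c
/* Hypothetical model of an 8 x _Float16 vector (__m128h); float is a
   stand-in element type and e[0] is the lowest element.  */
typedef struct
{
  float e[8];
} v8h_model;

/* _mm_set_ph convention: arguments run from the highest element (e7)
   down to the lowest (e0).  */
static v8h_model
model_set_ph (float e7, float e6, float e5, float e4,
              float e3, float e2, float e1, float e0)
{
  v8h_model v = { { e0, e1, e2, e3, e4, e5, e6, e7 } };
  return v;
}

/* _mm_setr_ph convention: arguments are given in element order.  */
static v8h_model
model_setr_ph (float e0, float e1, float e2, float e3,
               float e4, float e5, float e6, float e7)
{
  v8h_model v = { { e0, e1, e2, e3, e4, e5, e6, e7 } };
  return v;
}

/* _mm_set_sh convention: element 0 receives x, all others are zero.  */
static v8h_model
model_set_sh (float x)
{
  v8h_model v = { { x } };   /* remaining elements are zero-initialized */
  return v;
}
```

So model_set_ph (7, 6, 5, 4, 3, 2, 1, 0) and model_setr_ph (0, 1, 2, 3, 4, 5, 6, 7) produce the same vector, with element i holding i.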

[PATCH 06/10] AVX512FP16: Add testcase for vector init and broadcast intrinsics.

2021-07-21 Thread liuhongt via Gcc-patches

gcc/testsuite/ChangeLog:

* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
* gcc.target/i386/avx512fp16-10a.c: New test.
* gcc.target/i386/avx512fp16-10b.c: Ditto.
* gcc.target/i386/avx512fp16-1a.c: Ditto.
* gcc.target/i386/avx512fp16-1b.c: Ditto.
* gcc.target/i386/avx512fp16-1c.c: Ditto.
* gcc.target/i386/avx512fp16-1d.c: Ditto.
* gcc.target/i386/avx512fp16-1e.c: Ditto.
* gcc.target/i386/avx512fp16-2a.c: Ditto.
* gcc.target/i386/avx512fp16-2b.c: Ditto.
* gcc.target/i386/avx512fp16-2c.c: Ditto.
* gcc.target/i386/avx512fp16-3a.c: Ditto.
* gcc.target/i386/avx512fp16-3b.c: Ditto.
* gcc.target/i386/avx512fp16-3c.c: Ditto.
* gcc.target/i386/avx512fp16-4.c: Ditto.
* gcc.target/i386/avx512fp16-5.c: Ditto.
* gcc.target/i386/avx512fp16-6.c: Ditto.
* gcc.target/i386/avx512fp16-7.c: Ditto.
* gcc.target/i386/avx512fp16-8.c: Ditto.
* gcc.target/i386/avx512fp16-9a.c: Ditto.
* gcc.target/i386/avx512fp16-9b.c: Ditto.
* gcc.target/i386/pr54855-13.c: Ditto.
* gcc.target/i386/avx512fp16-vec_set_var.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-10a.c  |  14 ++
 .../gcc.target/i386/avx512fp16-10b.c  |  25 
 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |  24 
 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |  32 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |  26 
 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |  33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |  30 
 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |  28 
 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |  33 +
 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |  36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |  36 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |  35 +
 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |  40 ++
 gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |  31 
 gcc/testsuite/gcc.target/i386/avx512fp16-5.c  | 133 ++
 gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |  57 
 gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |  86 +++
 gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |  53 +++
 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |  27 
 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |  49 +++
 .../gcc.target/i386/avx512fp16-vec_set_var.c  |  30 
 gcc/testsuite/gcc.target/i386/m512-check.h|  38 -
 gcc/testsuite/gcc.target/i386/pr54855-13.c|  14 ++
 23 files changed, 909 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1d.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-1e.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-9b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vec_set_var.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr54855-13.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
new file mode 100644
index 000..f06a822
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10a.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include <immintrin.h>
+
+__m128h
+__attribute__ ((noinline, noclone))
+set_128 (_Float16 x)
+{
+  return _mm_set_sh (x);
+}
+
+/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 1 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vmovw\[ \t]\+\[^\n\r]*xmm0" 2 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
new file mode 100644
index 000..055edd7aaf5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-10b.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target avx512fp16 }

[PATCH 07/10] AVX512FP16: Add tests for vector passing in variable arguments.

2021-07-21 Thread liuhongt via Gcc-patches

From: "H.J. Lu" 

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vararg-1.c: New test.
* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.
---
 .../gcc.target/i386/avx512fp16-vararg-1.c | 122 ++
 .../gcc.target/i386/avx512fp16-vararg-2.c | 107 +++
 .../gcc.target/i386/avx512fp16-vararg-3.c | 114 
 .../gcc.target/i386/avx512fp16-vararg-4.c | 115 +
 4 files changed, 458 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vararg-4.c

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
new file mode 100644
index 000..9bd366838b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vararg-1.c
@@ -0,0 +1,122 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512fp16 } */
+/* { dg-options "-mavx512fp16" } */
+
+#include 
+#include 
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+#include "avx512-check.h"
+
+struct m256h
+{
+  __m256h  v;
+};
+
+__m128 n1 = { -283.3, -23.3, 213.4, 1119.03 };
+struct m256h n2 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+                      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128h n3 = { 11.5f16, -31.80f16, 242.3f16, 136.4f16, 42.8f16, -22.8f16, 343.8f16, 215.4f16 } ;
+_Float16 n4 = 32.4f16;
+double n5 = 103.3;
+__m128h n6 = { -12.3f16, 2.0f16, 245.9f16, -432.1f16, 53.5f16, -13.4f16, 432.5f16, 482.4f16 };
+__m128d n7 = { -91.387, -8193.518 };
+struct m256h n8 = { { -93.83f16, 893.318f16, 3994.3f16, -39484.0f16, 213.4f16, 323.4f16, 42.5f16, -43.4f16,
+                      234.4f16, 93.9f16, 34.5f16, -14.5f16, -34.9f16, -421.0f16, 234.5f16, 214.5f16 } };
+__m128 n9 = { -123.3, 2.3, 3.4, -10.03 };
+__m128h n10 = { 123.3f16, -100.0f16, 246.9f16, 13.4f16, -134.4f16, 35.4f16, 156.5f16, 953.1f16 };
+_Float16 n11 = 40.7f16;
+double n12 = 304.9;
+__m128h n13 = { 23.3f16, -11.0f16, 24.5f16, -24.5f16, 535.4f16, 35.4f16, -13.4f16, 14.5f16 };
+__m256h n14 = { -123.3f16, 23.9f16, 34.4f16, -100.3f16, 284.4f16, 352.5f16, 131.5f16, -13.2f16,
+                131.4f16, 382.5f16, 38.5f16, 99.6f16, 423.2f16, -12.44f16, 43.2f16, -34.45f16 };
+__m512h n15 = { -39.3f16, -180.9f16, 13.4f16, 35.4f16, -41.1f16, -14.4f16, 24.5f16, 53.54f16,
+                238.4f16, -134.8f16, 24.5f16, 35.6f16, -346.7f16, -43.4f16, -535.3f16, 324.7f16,
+                82.5f16, 21.4f16, 24.4f16, 53.4f16, 23.5f16, -24.4f16, -34.5f16, -32.5f16,
+                23.6f16, -13.4f16, 24.5f16, 35.5f16, -34.4f16, -24.5f16, -34.5f16, 13.5f16 };
+__m128d n16 = { 73.0, 63.18 };
+__m256 n17 = { -183.3, -22.3, 13.9, -119.3, 483.1, 122.3, -33.4, -9.37 };
+__m128 n18 = { -183.3, 22.3, 13.4, -19.03 };
+
+__m128 e1;
+struct m256h e2;
+__m128h e3;
+_Float16 e4;
+double e5;
+__m128h e6;
+__m128d e7;
+struct m256h e8;
+__m128 e9;
+__m128h e10;
+_Float16 e11;
+double e12;
+__m128h e13;
+__m256h e14;
+__m512h e15;
+__m128d e16;
+__m256 e17;
+__m128 e18;
+
+static void
+__attribute__((noinline))
+foo (va_list va_arglist)
+{
+  e4 = va_arg (va_arglist, _Float16);
+  e5 = va_arg (va_arglist, double);
+  e6 = va_arg (va_arglist, __m128h);
+  e7 = va_arg (va_arglist, __m128d);
+  e8 = va_arg (va_arglist, struct m256h);
+  e9 = va_arg (va_arglist, __m128);
+  e10 = va_arg (va_arglist, __m128h);
+  e11 = va_arg (va_arglist, _Float16);
+  e12 = va_arg (va_arglist, double);
+  e13 = va_arg (va_arglist, __m128h);
+  e14 = va_arg (va_arglist, __m256h);
+  e15 = va_arg (va_arglist, __m512h);
+  e16 = va_arg (va_arglist, __m128d);
+  e17 = va_arg (va_arglist, __m256);
+  e18 = va_arg (va_arglist, __m128);
+  va_end (va_arglist);
+}
+
+static void
+__attribute__((noinline))
+test (__m128 a1, struct m256h a2, __m128h a3, ...)
+{
+  va_list va_arglist;
+
+  e1 = a1;
+  e2 = a2;
+  e3 = a3;
+  va_start (va_arglist, a3);
+  foo (va_arglist);
+  va_end (va_arglist);
+}
+
+static void
+do_test (void)
+{
+  test (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12,
+   n13, n14, n15, n16, n17, n18);
+  assert (__builtin_memcmp (&e1, &n1, sizeof (e1)) == 0);
+  assert (__builtin_memcmp (&e2, &n2, sizeof (e2)) == 0);
+  assert (__builtin_memcmp (&e3, &n3, sizeof (e3)) == 0);
+  assert (n4 == e4);
+  assert (n5 == e5);
+  assert (__builtin_memcmp (&e6, &n6, sizeof (e6)) == 0);
+  assert (__builtin_memcmp (&e7, &n7, sizeof (e7)) == 0);
+  assert (__builtin_memcmp (&e8, &n8, sizeof (e8)) == 0);
+  assert (__builtin_memcmp (&e9, &n9, sizeof (e9)) == 0);
+  assert (__builtin_memcmp (&e10, &n10, sizeo

[PATCH 09/10] AVX512FP16: Add ABI test for ymm.

2021-07-21 Thread liuhongt via Gcc-patches

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
New exp file.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
New test.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c: Likewise.
---
 .../avx512fp16/m256h/abi-avx512fp16-ymm.exp   |  45 +++
 .../x86_64/abi/avx512fp16/m256h/args.h| 182 +
 .../x86_64/abi/avx512fp16/m256h/asm-support.S |  81 
 .../avx512fp16/m256h/avx512fp16-ymm-check.h   |   3 +
 .../avx512fp16/m256h/test_m256_returning.c|  54 +++
 .../abi/avx512fp16/m256h/test_passing_m256.c  | 370 ++
 .../avx512fp16/m256h/test_passing_structs.c   | 113 ++
 .../avx512fp16/m256h/test_passing_unions.c| 337 
 .../abi/avx512fp16/m256h/test_varargs-m256.c  | 160 
 9 files changed, 1345 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
new file mode 100644
index 000..ecf673bf796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp
@@ -0,0 +1,45 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+ || [is-effective-target ia32]
+ || [gcc_target_object_format] != "elf"
+ || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+
+torture-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+if {[runtest_file_p $runtests $src]} {
+   c-torture-execute [list $src \
+   $srcdir/$subdir/asm-support.S] \
+   $additional_flags
+}
+}
+
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
new file mode 100644
index 000..136db48c144
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m256h/args.h
@@ -0,0 +1,182 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include 
+#include 
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 ymm0
+#define F1 ymm1
+#define F2 ymm2
+#define F3 ymm3
+#define F4 ymm4
+#define F5 ymm5
+#define F6 ymm6
+#define F7 ymm7
+
+typedef union {
+

[PATCH 10/10] AVX512FP16: Add abi test for zmm

2021-07-21 Thread liuhongt via Gcc-patches

gcc/testsuite/ChangeLog:

* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
New file.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c:
Likewise.
* gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c:
Likewise.
---
 .../avx512fp16/m512h/abi-avx512fp16-zmm.exp   |  48 ++
 .../x86_64/abi/avx512fp16/m512h/args.h| 186 
 .../x86_64/abi/avx512fp16/m512h/asm-support.S |  97 
 .../avx512fp16/m512h/avx512fp16-zmm-check.h   |   4 +
 .../avx512fp16/m512h/test_m512_returning.c|  62 +++
 .../abi/avx512fp16/m512h/test_passing_m512.c  | 380 
 .../avx512fp16/m512h/test_passing_structs.c   | 123 ++
 .../avx512fp16/m512h/test_passing_unions.c| 415 ++
 .../abi/avx512fp16/m512h/test_varargs-m512.c  | 164 +++
 9 files changed, 1479 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
new file mode 100644
index 000..33d24762788
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp
@@ -0,0 +1,48 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# The x86-64 ABI testsuite needs one additional assembler file for most
+# testcases.  For simplicity we will just link it into each test.
+
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+load_lib clearcap.exp
+load_lib file-format.exp
+
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+ || [is-effective-target ia32]
+ || [gcc_target_object_format] != "elf"
+ || ![is-effective-target avx512fp16] } then {
+  return
+}
+
+
+torture-init
+clearcap-init
+set-torture-options $C_TORTURE_OPTIONS
+set additional_flags "-W -Wall -Wno-abi -mavx512fp16"
+
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/test_*.c]] {
+if {[runtest_file_p $runtests $src]} {
+   c-torture-execute [list $src \
+   $srcdir/$subdir/asm-support.S] \
+   $additional_flags
+}
+}
+
+clearcap-finish
+torture-finish
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
new file mode 100644
index 000..ec89fae4597
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512fp16/m512h/args.h
@@ -0,0 +1,186 @@
+#ifndef INCLUDED_ARGS_H
+#define INCLUDED_ARGS_H
+
+#include 
+#include 
+
+/* Assertion macro.  */
+#define assert(test) if (!(test)) abort()
+
+#ifdef __GNUC__
+#define ATTRIBUTE_UNUSED __attribute__((__unused__))
+#else
+#define ATTRIBUTE_UNUSED
+#endif
+
+/* This defines the calling sequences for integers and floats.  */
+#define I0 rdi
+#define I1 rsi
+#define I2 rdx
+#define I3 rcx
+#define I4 r8
+#define I5 r9
+#define F0 zmm0
+#define F1 zmm1
+#define F2 zmm2
+#define F3 zmm3
+#define F4 zmm4
+#defi

[committed] c++: Ensure OpenMP reduction with reference type references complete type [PR101516]

2021-07-21 Thread Jakub Jelinek via Gcc-patches

Hi!

The following testcase ICEs because, when the reduction decl has
reference type, we never verified that the TREE_TYPE of the reference
is a complete type; require_complete_type on the decl doesn't ensure that.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-07-21  Jakub Jelinek  

PR c++/101516
* semantics.c (finish_omp_reduction_clause): Also call
complete_type_or_else and return true if it fails.

* g++.dg/gomp/pr101516.C: New test.

--- gcc/cp/semantics.c.jj   2021-07-15 10:16:12.972581906 +0200
+++ gcc/cp/semantics.c  2021-07-20 13:31:04.972039268 +0200
@@ -6070,7 +6070,8 @@ finish_omp_reduction_clause (tree c, boo
   if (!processing_template_decl)
{
  t = require_complete_type (t);
- if (t == error_mark_node)
+ if (t == error_mark_node
+ || !complete_type_or_else (oatype, NULL_TREE))
return true;
  tree size = size_binop (EXACT_DIV_EXPR, TYPE_SIZE_UNIT (oatype),
  TYPE_SIZE_UNIT (type));
--- gcc/testsuite/g++.dg/gomp/pr101516.C.jj	2021-07-20 13:51:41.690789542 +0200
+++ gcc/testsuite/g++.dg/gomp/pr101516.C	2021-07-20 13:51:16.463141545 +0200
@@ -0,0 +1,8 @@
+// PR c++/101516
+
+void
+foo (int (&v) [])
+{
+  #pragma omp parallel reduction (+:v) // { dg-error "invalid use of array with unspecified bounds" }
+  ;
+}

Jakub

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-21 Thread Hongtao Liu via Gcc-patches

On Tue, Jul 20, 2021 at 3:38 PM Richard Biener  wrote:
>
> On Tue, 20 Jul 2021, Hongtao Liu wrote:
>
> > On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  wrote:
> > >
> > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > >
> > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > >
> > > > > OK, guess I was more looking at
> > > > >
> > > > > #define N 32
> > > > > int foo (unsigned long *a, unsigned long * __restrict b,
> > > > >  unsigned int *c, unsigned int * __restrict d,
> > > > >  int n)
> > > > > {
> > > > >   unsigned sum = 1;
> > > > >   for (int i = 0; i < n; ++i)
> > > > > {
> > > > >   b[i] += a[i];
> > > > >   d[i] += c[i];
> > > > > }
> > > > >   return sum;
> > > > > }
> > > > >
> > > > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > > > mask for V16SI.  With SVE I see
> > > > >
> > > > > punpklo p1.h, p0.b
> > > > > punpkhi p2.h, p0.b
> > > > >
> > > > > that's sth I expected to see for AVX512 as well, using the V16SI
> > > > > mask and unpacking that to two V8DI ones.  But I see
> > > > >
> > > > > vpbroadcastd %eax, %ymm0
> > > > > vpaddd  %ymm12, %ymm0, %ymm0
> > > > > vpcmpud $6, %ymm0, %ymm11, %k3
> > > > > vpbroadcastd %eax, %xmm0
> > > > > vpaddd  %xmm10, %xmm0, %xmm0
> > > > > vpcmpud $1, %xmm7, %xmm0, %k1
> > > > > vpcmpud $6, %xmm0, %xmm8, %k2
> > > > > kortestb %k1, %k1
> > > > > jne .L3
> > > > >
> > > > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > > > the magic for SVE and why that doesn't trigger for x86 here.
> > > >
> > > > So answer myself, vect_maybe_permute_loop_masks looks for
> > > > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > > > QImode so that doesn't play well here.  Not sure if there
> > > > are proper mask instructions to use (I guess there's a shift
> > > > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> > Yes, for 16bit and more, we have KUNPCKBW/D/Q. but for 8bit
> > unpack_lo/hi, only shift.
> > > > conversion.  Not sure how to better ask the target here - again
> > > > VnBImode might have been easier here.
> > >
> > > So I've managed to "emulate" the unpack_lo/hi for the case of
> > > !VECTOR_MODE_P masks by using sub-vector select (we're asking
> > > to turn vector(8)  into two
> > > vector(4) ) via BIT_FIELD_REF.  That then
> > > produces the desired single mask producer and
> > >
> > >   loop_mask_38 = VIEW_CONVERT_EXPR > > >(loop_mask_54);
> > >   loop_mask_37 = BIT_FIELD_REF ;
> > >
> > > note for the lowpart we can just view-convert away the excess bits,
> > > fully re-using the mask.  We generate surprisingly "good" code:
> > >
> > > kmovb   %k1, %edi
> > > shrb    $4, %dil
> > > kmovb   %edi, %k2
> > >
> > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > a mask register alternative for
> > Yes, we can do it similar as kor/kand/kxor.
> > >
> > > (insn 22 20 25 4 (parallel [
> > > (set (reg:QI 94 [ loop_mask_37 ])
> > > (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> > > (const_int 4 [0x4])))
> > > (clobber (reg:CC 17 flags))
> > > ]) 724 {*lshrqi3_1}
> > >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > (nil)))
> > >
> > > and so we reload.  For the above cited loop the AVX512 vectorization
> > > with --param vect-partial-vector-usage=1 does look quite sensible
> > > to me.  Instead of a SSE vectorized epilogue plus a scalar
> > > epilogue we get a single fully masked AVX512 "iteration" for both.
> > > I suppose it's still mostly a code-size optimization (384 bytes
> > > with the masked epilogue vs. 474 bytes with trunk) since it will
> > > be likely slower for very low iteration counts but it's good
> > > for icache usage then and good for less branch predictor usage.
> > >
> > > That said, I have to set up SPEC on an AVX512 machine to do
> > Has the patch landed in trunk already? I can test it on CLX.
>
> I'm still experimenting a bit right now but hope to get something
> trunk ready at the end of this or beginning next week.  Since it's
> disabled by default we can work on improving it during stage1 then.
>
> I'm mostly struggling with the GIMPLE IL to be used for the
> mask unpacking since we currently reject both the BIT_FIELD_REF
> and the VIEW_CONVERT we generate (why do AVX512 masks not all have
> SImode but sometimes QImode and sometimes HImode ...).  Unfortunately
> we've dropped whole-vector shifts in favor of VEC_PERM but that
> doesn't work well either for integer mode vectors.  So I'm still
> playing with my options here and looking for something that doesn't
> require too much surgery on the RTL side to recover good mask
> register code ...
>
> Another part missing is expanders for the various cond_* patterns
>
> OPTAB_D (cond_add_
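
The QI:8 to two QI:4 mask split discussed in this thread reduces to plain bit arithmetic once a mask lives in an integer-mode register. A minimal sketch, assuming lane i of an 8-lane mask is held in bit i of a byte; the helper names are hypothetical and are neither GCC internals nor intrinsics:

```c
#include <stdint.h>

/* Hypothetical model: an 8-lane AVX512 mask held in an 8-bit
   (QImode-like) integer, one bit per lane, lane 0 in bit 0.  */

/* Low 4-lane half: the same bits with the upper lanes ignored, which
   is why the thread notes the lowpart can simply re-use the mask.  */
static uint8_t
mask_lo4 (uint8_t mask8)
{
  return mask8 & 0x0f;            /* lanes 0..3 */
}

/* High 4-lane half: a logical right shift by 4, the value the quoted
   shrb computes (and a kshiftrb on a mask register would compute).  */
static uint8_t
mask_hi4 (uint8_t mask8)
{
  return (uint8_t) (mask8 >> 4);  /* lanes 4..7 moved to bits 0..3 */
}
```

For example, the mask 0xA5 (lanes 0, 2, 5 and 7 active) splits into 0x05 for the low half and 0x0A for the high half; no unpack instruction is needed, only a shift.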

[committed] openmp: Fix up omp_check_private [PR101535]

2021-07-21 Thread Jakub Jelinek via Gcc-patches

Hi!

The target data construct shouldn't affect omp_check_private, unless
the decl there is privatized (use_device_* clauses).  The routine
had some code for that, but it just did continue; inside a do-while
loop whose condition only held for 4 selected region types, so it
effectively resulted in return false; instead of looping again.  And
not diagnosing lastprivate (or reduction etc.) on a variable that is
private to the containing parallel results in ICEs later on, as there
is no original list item in which to store the last result.
The target construct is unclear, as it has an implicit parallel region
and it is not obvious whether the data privatization clauses on the
construct shall be treated as data privatization on the implicit parallel
or just on the target.  For now treat those as privatization on the
implicit parallel, but treat map clauses as shared on the implicit parallel.
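
The bug described above comes down to a C rule: `continue` inside a `do ... while (cond)` loop jumps to the evaluation of `cond`, not unconditionally back to the top of the body. A minimal sketch of the old and new loop shapes, using hypothetical region kinds rather than GCC's real ORT_* values and leaving out the splay-tree lookups entirely:

```c
/* Hypothetical region kinds for illustration; these are not GCC's
   ORT_* enumerators, and the functions model only the loop shape.  */
enum region { R_WORKSHARE, R_TARGET_DATA, R_PARALLEL };

/* Old shape: `continue` goes to the controlling expression, which only
   holds for R_WORKSHARE, so an R_TARGET_DATA context ends the walk.
   Returns the index of the context where the walk stopped, or -1 if
   it ran past the outermost context.  */
static int
walk_old (const enum region *outer, int n)
{
  int i = -1;
  do
    {
      i++;
      if (i == n)
        return -1;
      if (outer[i] == R_TARGET_DATA)
        continue;               /* evaluates the while condition below */
    }
  while (outer[i] == R_WORKSHARE);
  return i;
}

/* New shape: an unconditional do-while with explicit continue/break,
   so skipping a context really moves on to the next one.  */
static int
walk_new (const enum region *outer, int n)
{
  int i = -1;
  do
    {
      i++;
      if (i == n)
        return -1;
      if (outer[i] == R_TARGET_DATA || outer[i] == R_WORKSHARE)
        continue;               /* loops again */
      break;
    }
  while (1);
  return i;
}
```

With a target data region directly inside a parallel region, walk_old stops at index 0 (the target data context itself), while walk_new moves on to index 1, the enclosing parallel, which is the context the privatization check actually needs to inspect.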

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-07-21  Jakub Jelinek  

PR middle-end/101535
* gimplify.c (omp_check_private): Properly skip ORT_TARGET_DATA
contexts in which decl isn't privatized and for ORT_TARGET return
false if decl is mapped.

* c-c++-common/gomp/pr101535-1.c: New test.
* c-c++-common/gomp/pr101535-2.c: New test.

--- gcc/gimplify.c.jj   2021-07-15 10:17:00.0 +0200
+++ gcc/gimplify.c  2021-07-20 20:00:26.881050967 +0200
@@ -7798,7 +7798,13 @@ omp_check_private (struct gimplify_omp_c
 
   if ((ctx->region_type & (ORT_TARGET | ORT_TARGET_DATA)) != 0
  && (n == NULL || (n->value & GOVD_DATA_SHARE_CLASS) == 0))
-   continue;
+   {
+ if ((ctx->region_type & ORT_TARGET_DATA) != 0
+ || n == NULL
+ || (n->value & GOVD_MAP) == 0)
+   continue;
+ return false;
+   }
 
   if (n != NULL)
{
@@ -7807,11 +7813,16 @@ omp_check_private (struct gimplify_omp_c
return false;
  return (n->value & GOVD_SHARED) == 0;
}
+
+  if (ctx->region_type == ORT_WORKSHARE
+ || ctx->region_type == ORT_TASKGROUP
+ || ctx->region_type == ORT_SIMD
+ || ctx->region_type == ORT_ACC)
+   continue;
+
+  break;
 }
-  while (ctx->region_type == ORT_WORKSHARE
-|| ctx->region_type == ORT_TASKGROUP
-|| ctx->region_type == ORT_SIMD
-|| ctx->region_type == ORT_ACC);
+  while (1);
   return false;
 }
 
--- gcc/testsuite/c-c++-common/gomp/pr101535-1.c.jj	2021-07-20 20:03:58.686095021 +0200
+++ gcc/testsuite/c-c++-common/gomp/pr101535-1.c	2021-07-20 20:03:03.507865101 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/101535 */
+
+void
+foo (void)
+{
+  int a = 1, i;
+  #pragma omp target data map(to:a)
+  #pragma omp for lastprivate(i)   /* { dg-error "lastprivate variable 'i' is private in outer context" } */
+  for (i = 1; i < 2; i++)
+;
+}
+
+void
+bar (void)
+{
+  int a = 1, i;
+  #pragma omp target private(i)
+  #pragma omp for lastprivate(i)   /* { dg-error "lastprivate variable 'i' is private in outer context" } */
+  for (i = 1; i < 2; i++)
+;
+}
+
+void
+baz (void)
+{
+  int a = 1, i;
+  #pragma omp target firstprivate(i)
+  #pragma omp for lastprivate(i)   /* { dg-error "lastprivate variable 'i' is private in outer context" } */
+  for (i = 1; i < 2; i++)
+;
+}
--- gcc/testsuite/c-c++-common/gomp/pr101535-2.c.jj	2021-07-20 20:04:01.749052273 +0200
+++ gcc/testsuite/c-c++-common/gomp/pr101535-2.c	2021-07-20 20:09:33.086428032 +0200
@@ -0,0 +1,11 @@
+/* PR middle-end/101535 */
+
+void
+foo (void)
+{
+  int a = 1, i;
+  #pragma omp target map(tofrom:i)
+  #pragma omp for lastprivate(i)
+  for (i = 1; i < 2; i++)
+;
+}

Jakub

[PATCH] c/101512 - fix missing address-taking in c_common_mark_addressable_vec

2021-07-21 Thread Richard Biener

c_common_mark_addressable_vec fails to look through C_MAYBE_CONST_EXPR
in the case it isn't at the toplevel.
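
The patch below replaces an order-sensitive two-step peel (strip one wrapper at the top, then strip components) with a single loop that strips either kind of node in any interleaving. A minimal model of that difference, using a hypothetical node struct rather than GCC's real tree API (in GCC, C_MAYBE_CONST_EXPR_EXPR and TREE_OPERAND (t, 0) are distinct accessors; the model collapses both into one operand field):

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for GCC trees.  */
enum kind { MAYBE_CONST, COMPONENT, BASE };

struct node
{
  enum kind kind;
  const struct node *operand;   /* inner expression; NULL for BASE */
};

/* Old shape: a MAYBE_CONST wrapper is stripped only at the top level.  */
static const struct node *
base_old (const struct node *t)
{
  if (t->kind == MAYBE_CONST)
    t = t->operand;
  while (t->kind == COMPONENT)
    t = t->operand;
  return t;
}

/* New shape: wrappers and components are stripped in any order.  */
static const struct node *
base_new (const struct node *t)
{
  while (t->kind == COMPONENT || t->kind == MAYBE_CONST)
    t = t->operand;
  return t;
}
```

For a component reference wrapped around a C_MAYBE_CONST_EXPR-like node that wraps the base object (the VIEW_CONVERT_EXPR situation discussed later in this thread), base_old stops at the inner wrapper and never reaches the object that must be marked addressable, while base_new does.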

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-07-21  Richard Biener  

PR c/101512
gcc/c-family/
* c-common.c (c_common_mark_addressable_vec): Look through
C_MAYBE_CONST_EXPR even if not at the toplevel.

* gcc.dg/torture/pr101512.c: New testcase.
---
 gcc/c-family/c-common.c | 11 +++
 gcc/testsuite/gcc.dg/torture/pr101512.c | 11 +++
 2 files changed, 18 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101512.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index aacdfb46a02..21da679cd3c 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6894,10 +6894,13 @@ complete_flexible_array_elts (tree init)
 void 
 c_common_mark_addressable_vec (tree t)
 {   
-  if (TREE_CODE (t) == C_MAYBE_CONST_EXPR)
-t = C_MAYBE_CONST_EXPR_EXPR (t);
-  while (handled_component_p (t))
-t = TREE_OPERAND (t, 0);
+  while (handled_component_p (t) || TREE_CODE (t) == C_MAYBE_CONST_EXPR)
+{
+  if (TREE_CODE (t) == C_MAYBE_CONST_EXPR)
+   t = C_MAYBE_CONST_EXPR_EXPR (t);
+  else
+   t = TREE_OPERAND (t, 0);
+}
   if (!VAR_P (t)
   && TREE_CODE (t) != PARM_DECL
   && TREE_CODE (t) != COMPOUND_LITERAL_EXPR)
diff --git a/gcc/testsuite/gcc.dg/torture/pr101512.c 
b/gcc/testsuite/gcc.dg/torture/pr101512.c
new file mode 100644
index 000..a25da2aa0b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101512.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+int n();
+typedef unsigned long V __attribute__ ((vector_size (64)));
+V
+foo (int i, V v)
+{
+  i = ((V)(V){n()})[n()];
+  return v + i;
+}
-- 
2.26.2

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-21 Thread Richard Biener

On Wed, 21 Jul 2021, Hongtao Liu wrote:

> On Tue, Jul 20, 2021 at 3:38 PM Richard Biener  wrote:
> >
> > On Tue, 20 Jul 2021, Hongtao Liu wrote:
> >
> > > On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  wrote:
> > > >
> > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > >
> > > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > > >
> > > > > > OK, guess I was more looking at
> > > > > >
> > > > > > #define N 32
> > > > > > int foo (unsigned long *a, unsigned long * __restrict b,
> > > > > >  unsigned int *c, unsigned int * __restrict d,
> > > > > >  int n)
> > > > > > {
> > > > > >   unsigned sum = 1;
> > > > > >   for (int i = 0; i < n; ++i)
> > > > > > {
> > > > > >   b[i] += a[i];
> > > > > >   d[i] += c[i];
> > > > > > }
> > > > > >   return sum;
> > > > > > }
> > > > > >
> > > > > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > > > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > > > > mask for V16SI.  With SVE I see
> > > > > >
> > > > > > punpklo p1.h, p0.b
> > > > > > punpkhi p2.h, p0.b
> > > > > >
> > > > > > that's sth I expected to see for AVX512 as well, using the V16SI
> > > > > > mask and unpacking that to two V8DI ones.  But I see
> > > > > >
> > > > > > vpbroadcastd %eax, %ymm0
> > > > > > vpaddd  %ymm12, %ymm0, %ymm0
> > > > > > vpcmpud $6, %ymm0, %ymm11, %k3
> > > > > > vpbroadcastd %eax, %xmm0
> > > > > > vpaddd  %xmm10, %xmm0, %xmm0
> > > > > > vpcmpud $1, %xmm7, %xmm0, %k1
> > > > > > vpcmpud $6, %xmm0, %xmm8, %k2
> > > > > > kortestb %k1, %k1
> > > > > > jne .L3
> > > > > >
> > > > > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > > > > the magic for SVE and why that doesn't trigger for x86 here.
> > > > >
> > > > > So answer myself, vect_maybe_permute_loop_masks looks for
> > > > > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > > > > QImode so that doesn't play well here.  Not sure if there
> > > > > are proper mask instructions to use (I guess there's a shift
> > > > > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> > > Yes, for 16bit and more, we have KUNPCKBW/D/Q. but for 8bit
> > > unpack_lo/hi, only shift.
> > > > > conversion.  Not sure how to better ask the target here - again
> > > > > VnBImode might have been easier here.
> > > >
> > > > So I've managed to "emulate" the unpack_lo/hi for the case of
> > > > !VECTOR_MODE_P masks by using sub-vector select (we're asking
> > > > to turn vector(8)  into two
> > > > vector(4) ) via BIT_FIELD_REF.  That then
> > > > produces the desired single mask producer and
> > > >
> > > >   loop_mask_38 = VIEW_CONVERT_EXPR > > > >(loop_mask_54);
> > > >   loop_mask_37 = BIT_FIELD_REF ;
> > > >
> > > > note for the lowpart we can just view-convert away the excess bits,
> > > > fully re-using the mask.  We generate surprisingly "good" code:
> > > >
> > > > kmovb   %k1, %edi
> > > > shrb    $4, %dil
> > > > kmovb   %edi, %k2
> > > >
> > > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > > a mask register alternative for
> > > Yes, we can do it similar as kor/kand/kxor.
> > > >
> > > > (insn 22 20 25 4 (parallel [
> > > > (set (reg:QI 94 [ loop_mask_37 ])
> > > > (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> > > > (const_int 4 [0x4])))
> > > > (clobber (reg:CC 17 flags))
> > > > ]) 724 {*lshrqi3_1}
> > > >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > > (nil)))
> > > >
> > > > and so we reload.  For the above cited loop the AVX512 vectorization
> > > > with --param vect-partial-vector-usage=1 does look quite sensible
> > > > to me.  Instead of a SSE vectorized epilogue plus a scalar
> > > > epilogue we get a single fully masked AVX512 "iteration" for both.
> > > > I suppose it's still mostly a code-size optimization (384 bytes
> > > > with the masked epilogue vs. 474 bytes with trunk) since it will
> > > > be likely slower for very low iteration counts but it's good
> > > > for icache usage then and good for less branch predictor usage.
> > > >
> > > > That said, I have to set up SPEC on an AVX512 machine to do
> > > Has the patch landed in trunk already? I can test it on CLX.
> >
> > I'm still experimenting a bit right now but hope to get something
> > trunk ready at the end of this or beginning next week.  Since it's
> > disabled by default we can work on improving it during stage1 then.
> >
> > I'm mostly struggling with the GIMPLE IL to be used for the
> > mask unpacking since we currently reject both the BIT_FIELD_REF
> > and the VIEW_CONVERT we generate (why do AVX512 masks not all have
> > SImode but sometimes QImode and sometimes HImode ...).  Unfortunately
> > we've dropped whole-vector shifts in favor of VEC_PERM but that
> > doesn't work well either for integer mode vector

Re: [PATCH] c/101512 - fix missing address-taking in c_common_mark_addressable_vec

2021-07-21 Thread Jakub Jelinek via Gcc-patches

On Wed, Jul 21, 2021 at 10:06:51AM +0200, Richard Biener wrote:
> c_common_mark_addressable_vec fails to look through C_MAYBE_CONST_EXPR
> in the case it isn't at the toplevel.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
> 2021-07-21  Richard Biener  
> 
>   PR c/101512
> gcc/c-family/
>   * c-common.c (c_common_mark_addressable_vec): Look through
>   C_MAYBE_CONST_EXPR even if not at the toplevel.
> 
>   * gcc.dg/torture/pr101512.c: New testcase.

I wonder if instead when trying to wrap
C_MAYBE_CONST_EXPR into a VIEW_CONVERT_EXPR we shouldn't be
removing that C_MAYBE_CONST_EXPR and perhaps adding it around the
VIEW_CONVERT_EXPR.  E.g. various routines in c/c-typeck.c like
build_unary_op remember int_operands, remove_c_maybe_const_expr
and at the end note_integer_operands.

If Joseph thinks it is ok to have C_MAYBE_CONST_EXPR inside of
VCE, then the patch looks good to me.

Jakub

GCC 11.1.1 Status Report (2021-07-21), branch frozen for release

2021-07-21 Thread Richard Biener



Status
======

The GCC 11 branch is now frozen for the upcoming GCC 11.2 release.
All changes require release manager approval now.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1
P2              260   -  12
P3               96   +   2
P4              206   -   4
P5               24
--------        ---   -----------------------
Total P1-P3     356   -  10
Total           586   -  14


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2021-July/236674.html

Re: [PATCH 2/4 REVIEW] libtool.m4: fix nm BSD flag detection

2021-07-21 Thread Alan Modra via Gcc-patches

On Wed, Jul 07, 2021 at 08:03:45PM +0100, Nick Alcock via Gcc-patches wrote:
> On 7 Jul 2021, Nick Clifton told this:
> 
> > Hi Nick,
> >
> >> Ping?
> >
> > Oops.
> 
> I sent a bunch of pings out at the same time, to a bunch of different
> projects. You are the only person to respond, so thank you!
> 
> >>>   PR libctf/27482
> >>>   * libtool.m4 (LT_PATH_NM): Try BSDization flags with a user-provided
> >
> > Changes to libtool need to be posted to the libtool project:
> >
> >   https://www.gnu.org/software/libtool/
> 
> I considered this, but there is *serious* divergence between the
> libtool.m4 in our tree and upstream. Fixing this divergence looks to be
> a fairly major project in and of itself :( The last real sync appears
> to have been all the way back in 2008.

Yes, I looked at doing a sync myself a few years ago.
I'll OK the two libtool changes for binutils.

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-21 Thread Uros Bizjak via Gcc-patches

On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu  wrote:
>
> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak  wrote:
> >
> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt  wrote:
> > >
> > > Hi:
> > >   As mentioned in
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
> > >
> > > ---cut start---
> > > > note for the lowpart we can just view-convert away the excess bits,
> > > > fully re-using the mask.  We generate surprisingly "good" code:
> > > >
> > > > kmovb   %k1, %edi
> > > > shrb    $4, %dil
> > > > kmovb   %edi, %k2
> > > >
> > > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > > a mask register alternative for
> > > Yes, we can do it similarly to kor/kand/kxor.
> > > ---cut end---
> > >
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >   Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/constraints.md (Wb): New constraint.
> > > (Ww): Ditto.
> > > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> > > shift.
> > > (*ashlqi3_1): Ditto.
> > > (*3_1): Ditto.
> > > (*3_1): Ditto.
> > > * config/i386/sse.md (k): New define_split after
> > > it to convert generic shift pattern to mask shift ones.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/mask-shift.c: New test.


+(define_insn "*lshr3_1"
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m, ?k")
+(lshiftrt:SWI12
+  (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
+  (match_operand:QI 2 "nonmemory_operand" "c, ")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (LSHIFTRT, mode, operands)"

Also split this one into separate QImode and HImode patterns to avoid conditions in the isa attribute.

OK with this change.

Thanks,
Uros.
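As a rough illustration of the bit-level semantics discussed in this thread (not GCC code, and not the RTL or GIMPLE involved): reusing the lowpart of a mask only drops the excess high bits, while the high part needs a logical right shift by half the mask width. That shift is what kshiftrb/kshiftrw do directly on mask registers, and what the quoted kmovb/shrb/kmovb sequence does via a general-purpose register. The helper name `split_mask` is hypothetical.

```python
def split_mask(mask, nbits):
    """Split an nbits-wide lane mask into its (low, high) halves.

    Illustrative only: models the effect on the mask bits, not how GCC
    represents or generates it.
    """
    half = nbits // 2
    half_bits = (1 << half) - 1
    lo = mask & half_bits             # lowpart: drop the excess high bits
    hi = (mask >> half) & half_bits   # highpart: logical shift right by half the width
    return lo, hi


# An 8-bit mask split into two 4-lane masks, as in the quoted shrb $4 example.
lo, hi = split_mask(0b10110010, 8)
print(bin(lo), bin(hi))
```

The same split applied to a 16-bit mask yields the two 8-bit masks needed when one V16SI mask has to drive two V8DI operations.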

Re: [committed] RISC-V: Detect python and pick best one for calling multilib-generator

2021-07-21 Thread Kito Cheng via Gcc-patches

> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 93e2b3219b9..3df9b52cf25 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -4730,9 +4730,10 @@ case "${target}" in
> >   echo "--with-multilib-list= can't used with --with-multilib-generator= at same time" 1>&2
> >   exit 1
> >   fi
> > + PYTHON=`which python || which python3 || which python2`
>
> which is a non-standard utility.

Hmm, good point; let me revert this for now.

> Additionally, you will get extra
> output on stderr when one of the commands is not found.

I tried it: PYTHON gets an empty string if none of those are found,
and there is no stderr output from the which command.

Re: [OG11] Merge GCC 11 into branch, cherry picks from mainline

2021-07-21 Thread Tobias Burnus


OG11 = devel/omp/gcc-11, a branch with some OpenMP/OpenACC/offload patches
which are not yet on mainline. Additionally, patches in this area are
cherry-picked from mainline.

Commits since my last email on 15 June 2021, which ended with commit adda89fd071.

My commits are all cherry-picks plus GCC 11 merges and some fallout
commits. Thus, a rather boring list with nothing sophisticated.

But for completeness, those commits are hereby documented.
[I am not quite sure who actually is interested in this list.]

The list includes all commits in this span, including those
by others.

Cherry pick from mainline by me:
2021-06-15  35b3fbf5d52  Fortran/OpenMP: Extend defaultmap clause for OpenMP 5 [PR92568]

Interlude: ChungLin's commit (no cherry pick):
2021-06-17  dbf5d72f4c0  Fixes for lambda in offload regions


Cherry picks by Marcel (first one), Andrew (amdgcn one) and me (rest)
from mainline – plus git merge from the GCC 11 branch:

2021-06-22  9cb373f4439  gcc/configure.ac: fix register issue for global_load assembler functions
2021-06-23  235d6eda48d  openmp: Fix up *_reduction clause handling with UDRs on PARM_DECLs [PR101167]
2021-06-23  7cadfa1e4c8  Merge remote-tracking branch 'origin/releases/gcc-11' into devel/omp/gcc-11
2021-06-28  4c7c00c362e  fortran/dump-parse-tree.c: Use proper enum type
2021-06-28  a82a305d19c  Merge remote-tracking branch 'origin/releases/gcc-11' into devel/omp/gcc-11
2021-06-29  5536f1065fe  libgomp.fortran/defaultmap-8.f90: Fix non-shared memory handling
2021-06-29  33ef3d64e4d  doc/invoke.texi: Sort flags in 'C Language Options'
2021-06-29  1e42bad6b96  Add 'default' to -foffload=; document that flag [PR67300]
2021-06-29  6f08285014b  gcc.c: Silence warning in check_offload_target_name
2021-07-19  e054a4f7784  gcc/ChangeLog.omp: Update for last commit
2021-07-19  e897bb0c27d  openmp: Reject #pragma omp atomic update, [PR101297]
2021-07-19  36de16fd74f  openmp: Initial support for OpenMP directives expressed as C++11 attributes
2021-07-20  f2f97e59bd2  Merge remote-tracking branch 'origin/releases/gcc-11' into devel/omp/gcc-11
2021-07-20  d47128d328f  amdgcn: Add -mxnack and -msram-ecc [PR 100208]
2021-07-21  1201a27fba3  Fortran: Fix bind(C) character length checks
2021-07-21  a0b34d73e34  c++: Ensure OpenMP reduction with reference type references complete type [PR101516]
2021-07-21  858d20e2945  openmp: Fix up omp_check_private [PR101535]

Cheers,

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

[PATCH] RISC-V: Allow multi-lib build with different code model

2021-07-21 Thread Kito Cheng

--with-multilib-generator only supported different ISA/ABI
combinations; however, the code model affects code generation a lot,
so it should be handled by the multilib mechanism as well.

Add a `--cmodel=` option to `--with-multilib-generator` to generate
multilib combinations with different code models.

E.g.
--with-multilib-generator="rv64ima-lp64--;--cmodel=medlow,medany"
will generate 3 multi-lib sets:
1) rv64ima with lp64
2) rv64ima with lp64 and medlow code model
3) rv64ima with lp64 and medany code model
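A minimal sketch of how the example above expands into MULTILIB_REQUIRED entries, mirroring the loop structure of the patch below (the default code model `None` first, then each `--cmodel` value, skipping `compact` for rv32). The helper name `required_multilibs` is hypothetical, and arch canonicalization, extra arches, and reuse rules from the real script are deliberately left out.

```python
def required_multilibs(cfgs, cmodel_arg=None):
    """Return MULTILIB_REQUIRED-style entries for each config and code model.

    cfgs are "<arch>-<abi>-<extra>-<ext>" strings; cmodel_arg is the
    comma-separated value of --cmodel (or None).
    """
    cmodels = [None] + (cmodel_arg.split(",") if cmodel_arg else [])
    required = []
    for cmodel in cmodels:
        for cfg in cfgs:
            arch, abi, _extra, _ext = cfg.split("-")
            # The patch skips the compact code model for rv32 configurations.
            if cmodel == "compact" and arch.startswith("rv32"):
                continue
            if cmodel:
                required.append("march=%s/mabi=%s/mcmodel=%s" % (arch, abi, cmodel))
            else:
                required.append("march=%s/mabi=%s" % (arch, abi))
    return required


for entry in required_multilibs(["rv64ima-lp64--"], "medlow,medany"):
    print(entry)
```

Each entry corresponds to one of the three multi-lib sets listed above: the default code model plus one set per `--cmodel` value.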

gcc/

* config/riscv/multilib-generator: Support code model option for
multi-lib.
* doc/install.texi: Document new option for
--with-multilib-generator.
---
 gcc/config/riscv/multilib-generator | 86 +++--
 gcc/doc/install.texi| 17 ++
 2 files changed, 73 insertions(+), 30 deletions(-)

diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
index fe115b3184f..1164d1c5c8e 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -40,6 +40,7 @@ import collections
 import itertools
 from functools import reduce
 import subprocess
+import argparse
 
 #
 # TODO: Add test for this script.
@@ -127,44 +128,69 @@ def expand_combination(ext):
 
   return ext
 
-for cfg in sys.argv[1:]:
-  try:
-    (arch, abi, extra, ext) = cfg.split('-')
-  except:
-    print ("Invalid configure string %s, ---\n"
-           " and  can be empty, "
-           "e.g. rv32imafd-ilp32--" % cfg)
-    sys.exit(1)
-
-  arch = arch_canonicalize (arch)
-  arches[arch] = 1
-  abis[abi] = 1
-  extra = list(filter(None, extra.split(',')))
-  ext_combs = expand_combination(ext)
-  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
-  alts = list(map(arch_canonicalize, alts))
+multilib_cfgs = filter(lambda x:not x.startswith("--"), sys.argv[1:])
+options = filter(lambda x:x.startswith("--"), sys.argv[1:])
+
+parser = argparse.ArgumentParser()
+parser.add_argument("--cmodel", type=str)
+parser.add_argument("cfgs", type=str, nargs='*')
+args = parser.parse_args()
+
+if args.cmodel:
+  cmodels = [None] + args.cmodel.split(",")
+else:
+  cmodels = [None]
+
+cmodel_options = '/'.join(['mcmodel=%s' % x for x in cmodels[1:]])
+cmodel_dirnames = ' \\\n'.join(cmodels[1:])
+
+for cmodel in cmodels:
+  for cfg in args.cfgs:
+    try:
+      (arch, abi, extra, ext) = cfg.split('-')
+    except:
+      print ("Invalid configure string %s, ---\n"
+             " and  can be empty, "
+             "e.g. rv32imafd-ilp32--" % cfg)
+      sys.exit(1)
+
+    # Compact code model only support rv64.
+    if cmodel == "compact" and arch.startswith("rv32"):
+      continue
 
-  # Drop duplicated entry.
-  alts = unique(alts)
+    arch = arch_canonicalize (arch)
+    arches[arch] = 1
+    abis[abi] = 1
+    extra = list(filter(None, extra.split(',')))
+    ext_combs = expand_combination(ext)
+    alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
+    alts = list(map(arch_canonicalize, alts))
 
-  for alt in alts:
-    if alt == arch:
-      continue
-    arches[alt] = 1
-    reuse.append('march.%s/mabi.%s=march.%s/mabi.%s' % (arch, abi, alt, abi))
-  required.append('march=%s/mabi=%s' % (arch, abi))
+    # Drop duplicated entry.
+    alts = unique(alts)
+
+    for alt in alts[1:]:
+      if alt == arch:
+        continue
+      arches[alt] = 1
+      reuse.append('march.%s/mabi.%s=march.%s/mabi.%s' % (arch, abi, alt, abi))
+
+    if cmodel:
+      required.append('march=%s/mabi=%s/mcmodel=%s' % (arch, abi, cmodel))
+    else:
+      required.append('march=%s/mabi=%s' % (arch, abi))
 
-arch_options = '/'.join(['march=%s' % x for x in arches.keys()])
-arch_dirnames = ' \\\n'.join(arches.keys())
+  arch_options = '/'.join(['march=%s' % x for x in arches.keys()])
+  arch_dirnames = ' \\\n'.join(arches.keys())
 
-abi_options = '/'.join(['mabi=%s' % x for x in abis.keys()])
-abi_dirnames = ' \\\n'.join(abis.keys())
+  abi_options = '/'.join(['mabi=%s' % x for x in abis.keys()])
+  abi_dirnames = ' \\\n'.join(abis.keys())
 
 prog = sys.argv[0].split('/')[-1]
 print('# This file was generated by %s with the command:' % prog)
 print('#  %s' % ' '.join(sys.argv))
 
-print('MULTILIB_OPTIONS = %s %s' % (arch_options, abi_options))
-print('MULTILIB_DIRNAMES = %s %s' % (arch_dirnames, abi_dirnames))
+print('MULTILIB_OPTIONS = %s %s %s' % (arch_options, abi_options, cmodel_options))
+print('MULTILIB_DIRNAMES = %s %s %s' % (arch_dirnames, abi_dirnames, cmodel_dirnames))
 print('MULTILIB_REQUIRED = %s' % ' \\\n'.join(required))
 print('MULTILIB_REUSE = %s' % ' \\\n'.join(reuse))
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 6eee1bb43d4..8e974d2952e 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1328,6 +1328,23 @@ rv64imac with lp64 and rv64imafc with lp64 will reuse this multi-lib set.
 rv64ima-lp64--f,c,fc
 @end smallexample
 
+@option{--with-multil

Re: [committed] RISC-V: Detect python and pick best one for calling multilib-generator

2021-07-21 Thread Andreas Schwab

On Jul 21 2021, Kito Cheng wrote:

>> Additionally, you will get extra
>> output on stderr when one of the commands is not found.
>
> I tried it: PYTHON gets an empty string if none of those are found,
> and there is no stderr output from the which command.

$ PATH=/usr/bin which foo >/dev/null
which: no foo in (/usr/bin)

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: Pushing XFAILed test cases

2021-07-21 Thread Tobias Burnus


Hi all, hi Thomas (2x), hi Sandra,

On 16.07.21 09:52, Thomas Koenig via Fortran wrote:

> > The part of the patch to add tests for this goes on top of my base
> > TS29113 testsuite patch, which hasn't been reviewed or committed yet.
>
> It is my understanding that it is not gcc policy to add xfailed test
> cases for things that do not yet work. Rather, xfail is for tests that
> later turn out not to work, especially on certain architectures.


...

On 17.07.21 09:25, Thomas Koenig via Fortran wrote:

> Is it or is it not gcc policy to push a large number of test cases
> that currently do not work and XFAIL them?


In my opinion, it is bad to add testcases which _only_ consist of
xfails for 'target *-*-*'; however, for an extensive set of test
cases, I think it is better to xfail missing parts than to comment
them out or not to have them at all. That permits better
test coverage once the features have been implemented.

For the TS29113 patch, which Sandra has posted on July 7, I count:

* 77 'dg-do run' tests - of which 27 are xfailed (35%)
* 28 compile-time tests
* 291 dg-error - of which 59 are xfailed (20%)
* 29 dg-bogus - of which 25 are xfailed (86%)
(And of course, those lines which are valid do not have
a dg-error - and usually also no dg-bogus.)

And in total:
* 1 '.exp' file
* 105 '.f90' files (with 8232 lines in total including comment lines)
* 53 '.c' files (5281 lines)
* 1 '.h' file (12 lines)

Hence, this sounds to me like a rather reasonable amount of xfails,
especially given that several pending patches do or will reduce
the number of xfails by fixing issues exposed by the testsuite
(which has been posted but so far not reviewed).

Of course, in an ideal world, xfail would not exist :-)

On 07.07.21 05:40, Sandra Loosemore wrote:

> There was a question in one of the issues about why this testsuite
> references TS29113 instead of the 2018 standard.  Well, that is what
> our customer is interested in: finding out what parts of the TS29113
> functionality remain unimplemented or broken, and fixing them, so that
> gfortran can say that it implements that specification.


I believe the only real difference between TS29113 and
Fortran 2018's interoperability support is that
'select rank' was added in Fortran 2018.

The testsuite also tests 'select rank'; in that sense,
it is also for Fortran 2018. Thus, ts29113 + ts29113.exp
or 'f2018-c-interop' + 'f2018-c-interop.exp' are both
fine to me. — 'ts29113' is shorter while the other is
clearer to those who did not follow the Fortran standards
and missed that there was a technical specification (TS)
between F2008 and F2018, incorporated (with tiny modifications)
in F2018.

Tobias


Re: *Ping**2 [Patch] Fortran: Fix bind(C) character length checks

2021-07-21 Thread Tobias Burnus


On 16.07.21 14:55, Jerry D wrote:

Good to go Tobias.


Thanks for the review!

I have committed it as
r12-2431-gb3d4011ba10275fbd5d6ec5a16d5aaebbdfb5d3c (+ cherry picked it
to OG11).

The attached and committed version has split the 'Allocatable and
pointer' in two separate gfc_error and removed the hyphen in the array
types as spotted and remarked by Sandra.

Tobias

commit b3d4011ba10275fbd5d6ec5a16d5aaebbdfb5d3c
Author: Tobias Burnus 
Date:   Wed Jul 21 09:36:48 2021 +0200

Fortran: Fix bind(C) character length checks

gcc/fortran/ChangeLog:

* decl.c (gfc_verify_c_interop_param): Update for F2008 + F2018
changes; reject unsupported bits with 'Error: Sorry,'.
* trans-expr.c (gfc_conv_procedure_call): Fix condition for
using CFI descriptor with characters.

gcc/testsuite/ChangeLog:

* gfortran.dg/iso_c_binding_char_1.f90: Update dg-error.
* gfortran.dg/pr32599.f03: Use -std=f2003 + update comment.
* gfortran.dg/bind_c_char_10.f90: New test.
* gfortran.dg/bind_c_char_6.f90: New test.
* gfortran.dg/bind_c_char_7.f90: New test.
* gfortran.dg/bind_c_char_8.f90: New test.
* gfortran.dg/bind_c_char_9.f90: New test.
---
 gcc/fortran/decl.c | 113 -
 gcc/fortran/trans-expr.c   |  18 +-
 gcc/testsuite/gfortran.dg/bind_c_char_10.f90   | 480 +
 gcc/testsuite/gfortran.dg/bind_c_char_6.f90| 262 +++
 gcc/testsuite/gfortran.dg/bind_c_char_7.f90| 261 +++
 gcc/testsuite/gfortran.dg/bind_c_char_8.f90| 249 +++
 gcc/testsuite/gfortran.dg/bind_c_char_9.f90| 188 
 gcc/testsuite/gfortran.dg/iso_c_binding_char_1.f90 |   2 +-
 gcc/testsuite/gfortran.dg/pr32599.f03  |   8 +-
 9 files changed, 1557 insertions(+), 24 deletions(-)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 413c7a75e0c..05081c40f1e 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -1552,20 +1552,115 @@ gfc_verify_c_interop_param (gfc_symbol *sym)
 	}
 
   /* Character strings are only C interoperable if they have a
- length of 1.  */
-  if (sym->ts.type == BT_CHARACTER && !sym->attr.dimension)
+	 length of 1.  However, as argument they are either iteroperable
+	 when passed as descriptor (which requires len=: or len=*) or
+	 when having a constant length or are always passed by
+	 descriptor.  */
+	  if (sym->ts.type == BT_CHARACTER)
 	{
 	  gfc_charlen *cl = sym->ts.u.cl;
-	  if (!cl || !cl->length || cl->length->expr_type != EXPR_CONSTANT
-  || mpz_cmp_si (cl->length->value.integer, 1) != 0)
+
+	  if (sym->attr.allocatable || sym->attr.pointer)
 		{
-		  gfc_error ("Character argument %qs at %L "
-			 "must be length 1 because "
-			 "procedure %qs is BIND(C)",
-			 sym->name, &sym->declared_at,
-			 sym->ns->proc_name->name);
+		  /* F2018, 18.3.6 (6).  */
+		  if (!sym->ts.deferred)
+		{
+		  if (sym->attr.allocatable)
+			gfc_error ("Allocatable character dummy argument %qs "
+   "at %L must have deferred length as "
+   "procedure %qs is BIND(C)", sym->name,
+   &sym->declared_at, sym->ns->proc_name->name);
+		  else
+			gfc_error ("Pointer character dummy argument %qs at %L "
+   "must have deferred length as procedure %qs "
+   "is BIND(C)", sym->name, &sym->declared_at,
+   sym->ns->proc_name->name);
+		  retval = false;
+		}
+		  else if (!gfc_notify_std (GFC_STD_F2018,
+	"Deferred-length character dummy "
+	"argument %qs at %L of procedure "
+	"%qs with BIND(C) attribute",
+	sym->name, &sym->declared_at,
+	sym->ns->proc_name->name))
+		retval = false;
+		  else if (!sym->attr.dimension)
+		{
+		  /* FIXME: Use CFI array descriptor for scalars.  */
+		  gfc_error ("Sorry, deferred-length scalar character dummy "
+ "argument %qs at %L of procedure %qs with "
+ "BIND(C) not yet supported", sym->name,
+ &sym->declared_at, sym->ns->proc_name->name);
+		  retval = false;
+		}
+		}
+	  else if (sym->attr.value
+		   && (!cl || !cl->length
+			   || cl->length->expr_type != EXPR_CONSTANT
+			   || mpz_cmp_si (cl->length->value.integer, 1) != 0))
+		{
+		  gfc_error ("Character dummy argument %qs at %L must be "
+			 "of length 1 as it has the VALUE attribute",
+			 sym->name, &sym->declared_at);
 		  retval = false;
 		}
+	  else if (!cl || !cl->length)
+		{
+		  /* Assumed length; F2018, 18.3.6 (5)(2).
+		 Uses the CFI array

[PATCH 0/2] New target hook TARGET_COMPUTE_MULTILIB and implementation for RISC-V

2021-07-21 Thread Kito Cheng

This patch set allows a target to use a customized multi-lib mechanism
rather than the built-in multi-lib mechanism.

The motivation for this patch is that RISC-V might have very complicated
multi-lib re-use rules*, which are hard to maintain with the current
multi-lib scripts; we even hit the "argument list too long" error when
we tried to add more multi-lib reuse rules.

* Here is an example of RISC-V multi-lib rules:
https://gist.github.com/kito-cheng/0289cd42d9a756382e5afeb77b42b73b

V2 Changes:
- No changes to the first patch (TARGET_COMPUTE_MULTILIB part) since the first version.
- Handle options other than -march and -mabi in riscv_compute_multilib.

[PATCH 1/2] Add TARGET_COMPUTE_MULTILIB hook to override multi-lib result.

2021-07-21 Thread Kito Cheng

Create a new hook to let the target override the multi-lib result.
The motivation is that RISC-V might have very complicated multi-lib re-use
rules*, which are hard to maintain with the current multi-lib scripts;
we even hit the "argument list too long" error when we tried to add more
multi-lib reuse rules.

So I think it would be great to have a target-specific way to determine
the multi-lib re-use rules; then we could write those rules in C instead
of expanding every possible case in MULTILIB_REUSE.

* Here is an example of RISC-V multi-lib rules:
https://gist.github.com/kito-cheng/0289cd42d9a756382e5afeb77b42b73b

gcc/ChangeLog:

* common/common-target.def (compute_multilib): New.
* common/common-targhooks.c (default_compute_multilib): New.
* doc/tm.texi.in (TARGET_COMPUTE_MULTILIB): New.
* doc/tm.texi: Regen.
* gcc.c: Include common/common-target.h.
(set_multilib_dir): Call targetm_common.compute_multilib.
(SWITCH_LIVE): Move to opts.h.
(SWITCH_FALSE): Ditto.
(SWITCH_IGNORE): Ditto.
(SWITCH_IGNORE_PERMANENTLY): Ditto.
(SWITCH_KEEP_FOR_GCC): Ditto.
(struct switchstr): Ditto.
* opts.h (SWITCH_LIVE): Move from gcc.c.
(SWITCH_FALSE): Ditto.
(SWITCH_IGNORE): Ditto.
(SWITCH_IGNORE_PERMANENTLY): Ditto.
(SWITCH_KEEP_FOR_GCC): Ditto.
(struct switchstr): Ditto.
---
 gcc/common/common-target.def  | 25 ++
 gcc/common/common-targhooks.c | 15 +++
 gcc/doc/tm.texi   |  5 
 gcc/doc/tm.texi.in|  3 +++
 gcc/gcc.c | 48 +--
 gcc/opts.h| 36 ++
 6 files changed, 96 insertions(+), 36 deletions(-)

diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
index f54590a2a54..a720ecbea98 100644
--- a/gcc/common/common-target.def
+++ b/gcc/common/common-target.def
@@ -84,6 +84,31 @@ The result will be pruned to cases with PREFIX if not NULL.",
  vec, (int option_code, const char *prefix),
  default_get_valid_option_values)
 
+DEFHOOK
+(compute_multilib,
+ "Some target like RISC-V might have complicated multilib reuse rule which is\
+  hard to implemented on current multilib scheme, this hook allow target to\
+  override the result from built-in multilib mechanism.\
+  @var{switches} is the raw option list with @var{n_switches} items;\
+  @var{multilib_dir} is the multi-lib result which compute by the built-in\
+  multi-lib mechanism;\
+  @var{multilib_defaults} is the default options list for multi-lib; \
+  @var{multilib_select} is the string contain the list of supported multi-lib, \
+  and the option checking list. \
+  @var{multilib_matches}, @var{multilib_exclusions}, and @var{multilib_reuse} \
+  are corresponding to @var{MULTILIB_MATCHES}, @var{MULTILIB_EXCLUSIONS} \
+  @var{MULTILIB_REUSE}. \
+  The default definition does nothing but return @var{multilib_dir} directly.",
+ const char *, (const struct switchstr *switches,
+   int n_switches,
+   const char *multilib_dir,
+   const char *multilib_defaults,
+   const char *multilib_select,
+   const char *multilib_matches,
+   const char *multilib_exclusions,
+   const char *multilib_reuse),
+ default_compute_multilib)
+
 /* Leave the boolean fields at the end.  */
 
 /* True if unwinding tables should be generated by default.  */
diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
index 325f199bff3..1477aeeb536 100644
--- a/gcc/common/common-targhooks.c
+++ b/gcc/common/common-targhooks.c
@@ -90,3 +90,18 @@ const struct default_options empty_optimization_table[] =
   {
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
+
+/* Default version of TARGET_COMPUTE_MULTILIB.  */
+const char *
+default_compute_multilib(
+  const struct switchstr *,
+  int,
+  const char *multilib,
+  const char *,
+  const char *,
+  const char *,
+  const char *,
+  const char *)
+{
+  return multilib;
+}
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8f4abe3e41..0268ea77996 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -778,6 +778,11 @@ options are changed via @code{#pragma GCC optimize} or by using the
 Set target-dependent initial values of fields in @var{opts}.
 @end deftypefn
 
+@deftypefn {Common Target Hook} {const char *} TARGET_COMPUTE_MULTILIB (const struct switchstr *@var{switches}, int @var{n_switches}, const char *@var{multilib_dir}, const char *@var{multilib_defaults}, const char *@var{multilib_select}, const char *@var{multilib_matches}, const char *@var{multilib_exclusions}, const char *@var{multilib_reuse})
+Some target like RISC-V might have complicated multilib reuse rule which is  hard to implemented on current multilib scheme, this hook allow target to  override the result from built-in multilib mechanism.  @var{switches} is the raw option list with @var{n_switche

[PATCH 2/2] RISC-V: Implement TARGET_COMPUTE_MULTILIB

2021-07-21 Thread Kito Cheng

Use TARGET_COMPUTE_MULTILIB to search for multi-lib reuse for riscv*-*-elf*,
according to the following rules:

 1. Check that the ABI is the same.
 2. Check that both have the atomic extension, or both don't have it.
    - Because mixing soft and hard atomic operations doesn't make sense and
      won't work as expected.
 3. Check that the current arch is a superset of the target multi-lib arch.
    - It might result in slower performance or larger code size, but it is
      safe to run.
 4. Pick the best-matching multi-lib set if more than one multi-lib passes
    the above checks.

Example of how a multi-lib is selected:
  We build code with -march=rv32imaf and -mabi=ilp32, and we have the
  following 5 multi-lib sets:

1. rv32ia/ilp32
2. rv32ima/ilp32
3. rv32imf/ilp32
4. rv32imaf/ilp32f
5. rv32imafd/ilp32

  The first and second multi-libs are safe to use; the 3rd multi-lib can't
  be reused because it doesn't have the atomic extension, which is a mismatch
  according to rule 2; the 4th multi-lib can't be reused either, due to the
  ABI mismatch; and the last multi-lib can't be used since the current arch
  is not a superset of the multi-lib's arch.
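The four rules above can be sketched in Python roughly as follows. This is a simplified model, not the patch's `riscv_compute_multilib`: it assumes plain single-letter extension strings (no version numbers, no multi-letter extensions, no `g` expansion), and the rule-4 scoring (count of the multi-lib's extensions) is an assumed stand-in for the patch's `match_score`. The names `parse_arch` and `select_multilib` are hypothetical.

```python
def parse_arch(arch):
    """Return (xlen, set of single-letter extensions) for e.g. 'rv32imaf'.

    Assumption: only simple arch strings are handled.
    """
    return int(arch[2:4]), set(arch[4:])


def select_multilib(arch, abi, multilibs):
    """Pick a reusable (arch, abi) multilib pair, or None if none qualifies."""
    xlen, exts = parse_arch(arch)
    best, best_score = None, -1
    for ml_arch, ml_abi in multilibs:
        ml_xlen, ml_exts = parse_arch(ml_arch)
        # Rule 1: same ABI (and, implicitly, same XLEN).
        if ml_abi != abi or ml_xlen != xlen:
            continue
        # Rule 2: both have the A (atomic) extension, or neither has it.
        if ("a" in exts) != ("a" in ml_exts):
            continue
        # Rule 3: the requested arch must be a superset of the multilib's arch.
        if not ml_exts <= exts:
            continue
        # Rule 4 (assumed scoring): prefer the multilib with the most extensions.
        score = len(ml_exts)
        if score > best_score:
            best, best_score = (ml_arch, ml_abi), score
    return best


candidates = [("rv32ia", "ilp32"), ("rv32ima", "ilp32"), ("rv32imf", "ilp32"),
              ("rv32imaf", "ilp32f"), ("rv32imafd", "ilp32")]
print(select_multilib("rv32imaf", "ilp32", candidates))
```

Under this scoring, the first and second sets survive rules 1 to 3 and rv32ima/ilp32 is chosen; returning `None` corresponds to the "can't found suitable multilib set" error case shown next.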

An error is emitted if no suitable multi-lib set is found; the error message
is only emitted when linking with the standard libraries.

Example of when the error will be emitted:

  $ riscv64-unknown-elf-gcc -print-multi-lib
  .;
  rv32i/ilp32;@march=rv32i@mabi=ilp32
  rv32im/ilp32;@march=rv32im@mabi=ilp32
  rv32iac/ilp32;@march=rv32iac@mabi=ilp32
  rv32imac/ilp32;@march=rv32imac@mabi=ilp32
  rv32imafc/ilp32f;@march=rv32imafc@mabi=ilp32f
  rv64imac/lp64;@march=rv64imac@mabi=lp64

  // No actual linking, so no error emitted.
  $ riscv64-unknown-elf-gcc -print-multi-directory -march=rv32ia -mabi=ilp32
  .

  // Links against the default libc and libgcc, so the multi-lib is checked,
  // and an error is emitted because no suitable multilib is found.
  $ riscv64-unknown-elf-gcc -march=rv32ia -mabi=ilp32 ~/hello.c
  riscv64-unknown-elf-gcc: fatal error: can't found suitable multilib set for '-march=rv32ia'/'-mabi=ilp32'
  compilation terminated.

  // No error emitted, because we don't link against the standard libraries.
  $ riscv64-unknown-elf-gcc -march=rv32ia -mabi=ilp32 ~/hello.c -nostdlib

  // No error emitted, because compile only.
  $ riscv64-unknown-elf-gcc -march=rv32ia -mabi=ilp32 ~/hello.c -c

gcc/ChangeLog:

* common/config/riscv/riscv-common.c: Include .
(struct riscv_multi_lib_info_t): New.
(riscv_subset_list::match_score): Ditto.
(find_last_appear_switch): Ditto.
(struct multi_lib_info_t): Ditto.
(riscv_current_arch_str): Ditto.
(riscv_current_abi_str): Ditto.
(riscv_multi_lib_info_t::parse): Ditto.
(riscv_check_cond): Ditto.
(riscv_check_other_cond): Ditto.
(riscv_compute_multilib): Ditto.
(TARGET_COMPUTE_MULTILIB): Defined.
* config/riscv/elf.h (LIB_SPEC): Call riscv_multi_lib_check if
doing link.
(RISCV_USE_CUSTOMISED_MULTI_LIB): New.
* config/riscv/riscv.h (riscv_multi_lib_check): New.
(EXTRA_SPEC_FUNCTIONS): Add riscv_multi_lib_check.
---
 gcc/common/config/riscv/riscv-common.c | 409 +
 gcc/config/riscv/elf.h |   6 +-
 gcc/config/riscv/riscv-subset.h|   2 +
 gcc/config/riscv/riscv.h   |   4 +-
 4 files changed, 419 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 10868fd417d..83819a7de6a 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -18,6 +18,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include 
+#include 
 
 #define INCLUDE_STRING
 #include "config.h"
@@ -122,6 +123,26 @@ const riscv_subset_list *riscv_current_subset_list ()
   return current_subset_list;
 }
 
+/* struct for recording multi-lib info.  */
+struct riscv_multi_lib_info_t {
+  std::string path;
+  std::string arch_str;
+  std::string abi_str;
+  std::string other_cond;
+  riscv_subset_list *subset_list;
+
+  static bool parse (struct riscv_multi_lib_info_t *,
+const std::string &,
+const std::string &);
+};
+
+/* Flag for checking if there is no suitable multi-lib found.  */
+static bool riscv_no_matched_multi_lib;
+
+/* Used for record value of -march and -mabi.  */
+static std::string riscv_current_arch_str;
+static std::string riscv_current_abi_str;
+
 riscv_subset_t::riscv_subset_t ()
   : name (), major_version (0), minor_version (0), next (NULL),
 explicit_version_p (false), implied_p (false)
@@ -147,6 +168,42 @@ riscv_subset_list::~riscv_subset_list ()
 }
 }
 
+/* Compute the match score of two arch string, return 0 if incompatible.  */
+int
+riscv_subset_list::match_score (riscv_subset_list *list) const
+{
+  riscv_subset_t *s;
+  int score = 0;
+  bool has_a_ext, list_has_a_ext;
+
+  /* Impossible to match if XLEN is different.  */
+  if (list->m_xlen != this->m_xlen)
+return

Re: sync up new type indices for body adjustments

2021-07-21 Thread Alexandre Oliva

On Jul 19, 2021, Martin Jambor  wrote:

> So I would first check how come that you request IPA_PARAM_OP_COPY of
> something that does not seem to have a corresponding type but there is
> a DECL

The corresponding type is there all right, it was just stored in a
different vector entry, because some IPA optimization, applied after my
copying-and-wrapping pass, dropped several of the parms that came before
the NEW parms added by my pass.

This caused the types of the retained NEW parms to be pushed into lower
indices in the type array, but then accessed as if all of the dropped
parms were still there.  That can't be right.

I was actually lucky that enough parms were dropped as to make the
vector access out of range, flagged by checking.  If that wasn't the
case, we might have silently accessed an unrelated parm type.

Does this scenario make sense to you?

I can try to get you some code for a custom pass to trigger the problem
if you'd like to look more closely.
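A hypothetical sketch of the indexing hazard described above, using plain Python lists rather than GCC's actual parameter-adjustment data structures (every name here is invented for illustration). Once some parameters are dropped, the surviving types are stored densely; looking them up with the original positions either runs off the end of the vector or silently returns another parameter's type, while translating the original position first gives the intended entry.

```python
# Hypothetical parameter list: two original parms are dropped later, and
# the surviving types are stored densely in a new vector.
original_params = ["p0", "p1", "p2", "new0"]
param_types = {"p0": "int", "p1": "long", "p2": "char *", "new0": "int &"}
dropped = {"p0", "p1"}

kept = [p for p in original_params if p not in dropped]
types_vec = [param_types[p] for p in kept]  # ['char *', 'int &']


def type_by_original_index(idx):
    """Buggy lookup: indexes the compacted vector with a pre-drop index."""
    return types_vec[idx]


def type_by_remapped_index(idx):
    """Translate the original position to the compacted position first."""
    return types_vec[kept.index(original_params[idx])]


print(type_by_remapped_index(3))   # the NEW parm's type
print(type_by_original_index(1))   # returns new0's type instead of p1's
try:
    type_by_original_index(3)
except IndexError:
    print("out of range")
```

The out-of-range case is the one that checking happens to catch; the second lookup shows the silent mismatch that occurs when fewer parms are dropped.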

> If you believe that what you're doing is correct

I don't really know that it is.  IIRC back when I ran into this problem,
the logic to change some of the parameters in the wrapped function to
reference types was using NEW parameters.  Now I'm using COPY, save for
actual NEW parms, and changing the type of the clone after
create_version_clone_with_body.

Now, what puzzles me is why we even care about that parm mapping
afterwards.  The clone is created and materialized very early on, before
any preexisting ipa transformations, and there were not any edges
modified to use this clone.  As far as I'm concerned, it should be
entirely independent from the function it was cloned from, and it makes
no sense to me for IPA transformations applied to this clone to even
care what the function it was originally cloned from was: the clone is
already fully materialized, so argument back-mappings might as well stop
at it.

But I can't say I understand why it does that.  I haven't looked very
much into its internals, I'm mostly just trying to use
create_version_clone_with_body to clone a function, make some changes to
it, and turn the original function into a wrapper.

I'm not actually introducing IPA deferred transformations, and this is
all done before any relevant IPA transformations.  I can't even say I'm
using IPA proper, the reason I made it an IPA pass was because that has
enabled multiple passes over functions, which was convenient for some
purposes.  Then, I ended up iterating over aliases and undefined
functions, and relying on the call graph instead of iterating over
gimple bodies for some purposes, so now it *has* to be an IPA pass, but
not a typical one in that it doesn't queue up IPA transformations to be
applied at a later materialization.

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-21 Thread Hongtao Liu via Gcc-patches

On Wed, Jul 21, 2021 at 4:16 PM Richard Biener  wrote:
>
> On Wed, 21 Jul 2021, Hongtao Liu wrote:
>
> > On Tue, Jul 20, 2021 at 3:38 PM Richard Biener  wrote:
> > >
> > > On Tue, 20 Jul 2021, Hongtao Liu wrote:
> > >
> > > > On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  
> > > > wrote:
> > > > >
> > > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > > >
> > > > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > > > >
> > > > > > > OK, guess I was more looking at
> > > > > > >
> > > > > > > #define N 32
> > > > > > > int foo (unsigned long *a, unsigned long * __restrict b,
> > > > > > >  unsigned int *c, unsigned int * __restrict d,
> > > > > > >  int n)
> > > > > > > {
> > > > > > >   unsigned sum = 1;
> > > > > > >   for (int i = 0; i < n; ++i)
> > > > > > > {
> > > > > > >   b[i] += a[i];
> > > > > > >   d[i] += c[i];
> > > > > > > }
> > > > > > >   return sum;
> > > > > > > }
> > > > > > >
> > > > > > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > > > > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > > > > > mask for V16SI.  With SVE I see
> > > > > > >
> > > > > > > punpklo p1.h, p0.b
> > > > > > > punpkhi p2.h, p0.b
> > > > > > >
> > > > > > > that's sth I expected to see for AVX512 as well, using the V16SI
> > > > > > > mask and unpacking that to two V8DI ones.  But I see
> > > > > > >
> > > > > > > vpbroadcastd %eax, %ymm0
> > > > > > > vpaddd  %ymm12, %ymm0, %ymm0
> > > > > > > vpcmpud $6, %ymm0, %ymm11, %k3
> > > > > > > vpbroadcastd %eax, %xmm0
> > > > > > > vpaddd  %xmm10, %xmm0, %xmm0
> > > > > > > vpcmpud $1, %xmm7, %xmm0, %k1
> > > > > > > vpcmpud $6, %xmm0, %xmm8, %k2
> > > > > > > kortestb %k1, %k1
> > > > > > > jne .L3
> > > > > > >
> > > > > > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > > > > > the magic for SVE and why that doesn't trigger for x86 here.
> > > > > >
> > > > > > So answer myself, vect_maybe_permute_loop_masks looks for
> > > > > > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > > > > > QImode so that doesn't play well here.  Not sure if there
> > > > > > are proper mask instructions to use (I guess there's a shift
> > > > > > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> > > > Yes, for 16bit and more, we have KUNPCKBW/D/Q. but for 8bit
> > > > unpack_lo/hi, only shift.
> > > > > > conversion.  Not sure how to better ask the target here - again
> > > > > > VnBImode might have been easier here.
> > > > >
> > > > > So I've managed to "emulate" the unpack_lo/hi for the case of
> > > > > !VECTOR_MODE_P masks by using sub-vector select (we're asking
> > > > > to turn vector(8)  into two
> > > > > vector(4) ) via BIT_FIELD_REF.  That then
> > > > > produces the desired single mask producer and
> > > > >
> > > > >   loop_mask_38 = VIEW_CONVERT_EXPR > > > > >(loop_mask_54);
> > > > >   loop_mask_37 = BIT_FIELD_REF ;
> > > > >
> > > > > note for the lowpart we can just view-convert away the excess bits,
> > > > > fully re-using the mask.  We generate surprisingly "good" code:
> > > > >
> > > > > kmovb   %k1, %edi
> > > > > shrb    $4, %dil
> > > > > kmovb   %edi, %k2
> > > > >
> > > > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > > > a mask register alternative for
> > > > Yes, we can do it similar as kor/kand/kxor.
> > > > >
> > > > > (insn 22 20 25 4 (parallel [
> > > > > (set (reg:QI 94 [ loop_mask_37 ])
> > > > > (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> > > > > (const_int 4 [0x4])))
> > > > > (clobber (reg:CC 17 flags))
> > > > > ]) 724 {*lshrqi3_1}
> > > > >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > > > (nil)))
> > > > >
> > > > > and so we reload.  For the above cited loop the AVX512 vectorization
> > > > > with --param vect-partial-vector-usage=1 does look quite sensible
> > > > > to me.  Instead of a SSE vectorized epilogue plus a scalar
> > > > > epilogue we get a single fully masked AVX512 "iteration" for both.
> > > > > I suppose it's still mostly a code-size optimization (384 bytes
> > > > > with the masked epilogue vs. 474 bytes with trunk) since it will
> > > > > be likely slower for very low iteration counts but it's good
> > > > > for icache usage then and good for less branch predictor usage.
> > > > >
> > > > > That said, I have to set up SPEC on a AVX512 machine to do
> > > > Has the patch landed in trunk already? I can run a test on CLX.
> > >
> > > I'm still experimenting a bit right now but hope to get something
> > > trunk ready at the end of this or beginning next week.  Since it's
> > > disabled by default we can work on improving it during stage1 then.
> > >
> > > I'm mostly struggling with the GIMPLE IL to be used for the
> > > mask unpacking since we currently reject both
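The mask unpacking discussed in this thread can be modelled with plain integers holding one bit per lane, lane 0 in bit 0. This is a hypothetical scalar sketch, not vectorizer code: the low half needs no instruction at all (the excess bits are simply ignored), while the high half is a logical right shift, which kshiftrw/kshiftrb could do directly on a %k register instead of the kmovb/shrb/kmovb round trip shown above.

```c
#include <stdint.h>

/* Hypothetical helpers: AVX-512 masks modelled as integers, one bit per
   lane, lane 0 in bit 0.  */

/* V16SI-style mask split into two 8-lane masks.  The low half is just a
   truncation, i.e. the original mask is reused as-is.  */
static uint8_t
mask16_unpack_lo (uint16_t m)
{
  return (uint8_t) m;
}

/* The high half is a logical right shift by half the lane count.  */
static uint8_t
mask16_unpack_hi (uint16_t m)
{
  return (uint8_t) (m >> 8);
}

/* 8-lane QImode mask split into two 4-lane masks.  Clearing the upper
   bits here only makes the unused lanes explicit; a view-convert would
   leave them in place and ignore them.  */
static uint8_t
mask8_unpack_lo (uint8_t m)
{
  return m & 0x0F;
}

static uint8_t
mask8_unpack_hi (uint8_t m)
{
  return (uint8_t) (m >> 4);
}
```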

Re: [PATCH 2/3] [PR libfortran/101305] Bind(C): Correct sizes of some types in CFI_establish

2021-07-21 Thread Tobias Burnus


On 13.07.21 23:28, Sandra Loosemore wrote:

CFI_establish was failing to set the default elem_len correctly for
CFI_type_cptr, CFI_type_cfunptr, CFI_type_long_double, and
CFI_type_long_double_Complex.


LGTM – thanks for the patch!

Tobias


2021-07-13  Sandra Loosemore  

libgfortran/
  PR libfortran/101305
  * runtime/ISO_Fortran_binding.c (CFI_establish): Special-case
  CFI_type_cptr and CFI_type_cfunptr.  Correct size of long double
  on targets where it has kind 10.
---
  libgfortran/runtime/ISO_Fortran_binding.c | 19 ++-
  1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 28fa9f5..6b5f26c 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -341,9 +341,13 @@ int CFI_establish (CFI_cdesc_t *dv, void *base_addr, CFI_attribute_t attribute,

dv->base_addr = base_addr;

-  if (type == CFI_type_char || type == CFI_type_ucs4_char ||
-  type == CFI_type_struct || type == CFI_type_other)
+  if (type == CFI_type_char || type == CFI_type_ucs4_char
+  || type == CFI_type_struct || type == CFI_type_other)
  dv->elem_len = elem_len;
+  else if (type == CFI_type_cptr)
+dv->elem_len = sizeof (void *);
+  else if (type == CFI_type_cfunptr)
+dv->elem_len = sizeof (void (*)(void));
else
  {
/* base_type describes the intrinsic type with kind parameter. */
@@ -351,16 +355,13 @@ int CFI_establish (CFI_cdesc_t *dv, void *base_addr, CFI_attribute_t attribute,
/* base_type_size is the size in bytes of the variable as given by its
 * kind parameter. */
size_t base_type_size = (type - base_type) >> CFI_type_kind_shift;
-  /* Kind types 10 have a size of 64 bytes. */
+  /* Kind type 10 maps onto the 80-bit long double encoding on x86.
+  Note that this has different storage size for -m32 than -m64.  */
if (base_type_size == 10)
- {
-   base_type_size = 64;
- }
+ base_type_size = sizeof (long double);
/* Complex numbers are twice the size of their real counterparts. */
if (base_type == CFI_type_Complex)
- {
-   base_type_size *= 2;
- }
+ base_type_size *= 2;
dv->elem_len = base_type_size;
  }
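A standalone sketch of the elem_len logic after this patch follows. The type-code layout (the SK_* macros, the shift of 8 and the numeric codes) is hypothetical and only loosely modelled on ISO_Fortran_binding.h; what it is meant to show is the order of the checks: pointer types first, then kind 10 mapped to sizeof (long double), then doubling for complex.

```c
#include <stddef.h>

/* Hypothetical layout: low bits hold the base type, the bits above
   SK_KIND_SHIFT hold the kind (a byte size).  Illustrative values only,
   NOT those of ISO_Fortran_binding.h.  */
#define SK_KIND_SHIFT 8
#define SK_TYPE_MASK ((1 << SK_KIND_SHIFT) - 1)
#define SK_TYPE_REAL 3
#define SK_TYPE_COMPLEX 4
#define SK_TYPE_CPTR 7     /* hypothetical code for a C data pointer */
#define SK_TYPE_CFUNPTR 8  /* hypothetical code for a C function pointer */

static size_t
sketch_elem_len (int type)
{
  /* Pointer types carry no kind; their size comes from the C type.  */
  if (type == SK_TYPE_CPTR)
    return sizeof (void *);
  if (type == SK_TYPE_CFUNPTR)
    return sizeof (void (*) (void));

  int base_type = type & SK_TYPE_MASK;
  size_t size = (size_t) ((type - base_type) >> SK_KIND_SHIFT);

  /* Kind 10 names the x87 80-bit format, whose storage size differs
     between -m32 and -m64, so ask the compiler rather than hard-code.  */
  if (size == 10)
    size = sizeof (long double);

  /* Complex numbers are twice the size of their real counterparts.  */
  if (base_type == SK_TYPE_COMPLEX)
    size *= 2;
  return size;
}
```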


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [PATCH 1/3] [PR libfortran/101305] Bind(C): Fix type encodings in ISO_Fortran_binding.h

2021-07-21 Thread Tobias Burnus


On 13.07.21 23:28, Sandra Loosemore wrote:


ISO_Fortran_binding.h had many incorrect hardwired kind encodings in
the definitions of the CFI_type_* macros.  Additionally, not all
targets support all the defined type encodings, and the Fortran
standard requires those macros to have a negative value.

This patch changes ISO_Fortran_binding.h to use sizeof instead of
hard-coded sizes, and assembles it from fragments that reflect the
set of types supported by the target.
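The "use sizeof, and give unsupported types a negative value" approach might look like the following preprocessor-only sketch. The HYP_* names, the shift of 8 and the value -2 are invented for illustration; the patch itself builds the real header from template fragments and mk-kinds-h.sh rather than using #if like this.

```c
/* Hypothetical type-code macros: the kind is taken from sizeof instead of
   a hard-wired number.  Names and numbers are illustrative only.  */
#define HYP_type_kind_shift 8
#define HYP_type_Integer 1
#define HYP_type_Real 3

#define HYP_type_long \
  (HYP_type_Integer + ((int) sizeof (long) << HYP_type_kind_shift))
#define HYP_type_double \
  (HYP_type_Real + ((int) sizeof (double) << HYP_type_kind_shift))

/* GCC predefines __SIZEOF_FLOAT128__ on targets that provide __float128
   (x86-64, for example).  */
#ifdef __SIZEOF_FLOAT128__
#define HYP_type_float128 \
  (HYP_type_Real + (__SIZEOF_FLOAT128__ << HYP_type_kind_shift))
#else
#define HYP_type_float128 (-2)  /* not supported here: negative value */
#endif
```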

2021-07-13  Sandra Loosemore
  Tobias Burnus

libgfortran/
  PR libfortran/101305
  * ISO_Fortran_binding.h: Fix hard-coded sizes and split into...
  * ISO_Fortran_binding-1-tmpl.h: New file.
  * ISO_Fortran_binding-2-tmpl.h: New file.
  * ISO_Fortran_binding-3-tmpl.h: New file.
  * Makefile.am: Add rule for generating ISO_Fortran_binding.h.
  Adjust pathnames to that file.
  * Makefile.in: Regenerated.
  * mk-kinds-h.sh: New file.
  * runtime/ISO_Fortran_binding.c: Fix include path.

LGTM – except for the following remark regarding a preexisting comment.


--- /dev/null
+++ b/libgfortran/ISO_Fortran_binding-1-tmpl.h
+/* Error codes.
+   CFI_INVALID_STRIDE should be defined in the standard because they are useful to the implementation of the functions.
+ */


The standard permits: "Error conditions other than those listed in this
subclause should be indicated by error codes different from the values
of the macros named in this subclause."

I personally do not like the current (preexisting) wording in the
comment, and CFI_FAILURE, which is also not part of the Fortran
standard, is not listed either.  I think wording along the following
lines would be more appropriate:
"Note that CFI_FAILURE and CFI_INVALID_STRIDE are specific to GCC and
not part of the Fortran standard."



+#define CFI_SUCCESS 0
+#define CFI_FAILURE 1
+#define CFI_ERROR_BASE_ADDR_NULL 2
+#define CFI_ERROR_BASE_ADDR_NOT_NULL 3
+#define CFI_INVALID_ELEM_LEN 4
+#define CFI_INVALID_RANK 5
+#define CFI_INVALID_TYPE 6
+#define CFI_INVALID_ATTRIBUTE 7
+#define CFI_INVALID_EXTENT 8
+#define CFI_INVALID_STRIDE 9
+#define CFI_INVALID_DESCRIPTOR 10
+#define CFI_ERROR_MEM_ALLOCATION 11
+#define CFI_ERROR_OUT_OF_BOUNDS 12
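As an illustration of how a caller might report these codes, including the two that the review comment above describes as GCC-specific, here is a hypothetical helper; it is not a libgfortran function. The values are the ones added in the hunk, and the guard lets the block compile whether or not ISO_Fortran_binding.h has been included.

```c
/* Values from the hunk above; skipped if the real header is included.  */
#ifndef CFI_SUCCESS
#define CFI_SUCCESS 0
#define CFI_FAILURE 1
#define CFI_ERROR_BASE_ADDR_NULL 2
#define CFI_ERROR_BASE_ADDR_NOT_NULL 3
#define CFI_INVALID_ELEM_LEN 4
#define CFI_INVALID_RANK 5
#define CFI_INVALID_TYPE 6
#define CFI_INVALID_ATTRIBUTE 7
#define CFI_INVALID_EXTENT 8
#define CFI_INVALID_STRIDE 9
#define CFI_INVALID_DESCRIPTOR 10
#define CFI_ERROR_MEM_ALLOCATION 11
#define CFI_ERROR_OUT_OF_BOUNDS 12
#endif

/* Hypothetical: map a CFI_* status code to its macro name.  */
static const char *
cfi_status_name (int code)
{
  switch (code)
    {
    case CFI_SUCCESS: return "CFI_SUCCESS";
    case CFI_FAILURE: return "CFI_FAILURE";
    case CFI_ERROR_BASE_ADDR_NULL: return "CFI_ERROR_BASE_ADDR_NULL";
    case CFI_ERROR_BASE_ADDR_NOT_NULL: return "CFI_ERROR_BASE_ADDR_NOT_NULL";
    case CFI_INVALID_ELEM_LEN: return "CFI_INVALID_ELEM_LEN";
    case CFI_INVALID_RANK: return "CFI_INVALID_RANK";
    case CFI_INVALID_TYPE: return "CFI_INVALID_TYPE";
    case CFI_INVALID_ATTRIBUTE: return "CFI_INVALID_ATTRIBUTE";
    case CFI_INVALID_EXTENT: return "CFI_INVALID_EXTENT";
    case CFI_INVALID_STRIDE: return "CFI_INVALID_STRIDE";
    case CFI_INVALID_DESCRIPTOR: return "CFI_INVALID_DESCRIPTOR";
    case CFI_ERROR_MEM_ALLOCATION: return "CFI_ERROR_MEM_ALLOCATION";
    case CFI_ERROR_OUT_OF_BOUNDS: return "CFI_ERROR_OUT_OF_BOUNDS";
    default: return "unknown CFI status";
    }
}

/* Per the review comment, these two codes are GCC extensions rather than
   codes defined by the Fortran standard.  */
static int
cfi_status_is_gcc_specific (int code)
{
  return code == CFI_FAILURE || code == CFI_INVALID_STRIDE;
}
```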


Tobias


Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-21 Thread Richard Biener

On Wed, 21 Jul 2021, Hongtao Liu wrote:

> On Wed, Jul 21, 2021 at 4:16 PM Richard Biener  wrote:
> >
> > On Wed, 21 Jul 2021, Hongtao Liu wrote:
> >
> > > On Tue, Jul 20, 2021 at 3:38 PM Richard Biener  wrote:
> > > >
> > > > On Tue, 20 Jul 2021, Hongtao Liu wrote:
> > > >
> > > > > On Fri, Jul 16, 2021 at 5:11 PM Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > > > >
> > > > > > > On Thu, 15 Jul 2021, Richard Biener wrote:
> > > > > > >
> > > > > > > > OK, guess I was more looking at
> > > > > > > >
> > > > > > > > #define N 32
> > > > > > > > int foo (unsigned long *a, unsigned long * __restrict b,
> > > > > > > >  unsigned int *c, unsigned int * __restrict d,
> > > > > > > >  int n)
> > > > > > > > {
> > > > > > > >   unsigned sum = 1;
> > > > > > > >   for (int i = 0; i < n; ++i)
> > > > > > > > {
> > > > > > > >   b[i] += a[i];
> > > > > > > >   d[i] += c[i];
> > > > > > > > }
> > > > > > > >   return sum;
> > > > > > > > }
> > > > > > > >
> > > > > > > > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > > > > > > > generate two masks for the two copies of V8DI (VF is 16) and one
> > > > > > > > mask for V16SI.  With SVE I see
> > > > > > > >
> > > > > > > > punpklo p1.h, p0.b
> > > > > > > > punpkhi p2.h, p0.b
> > > > > > > >
> > > > > > > > that's sth I expected to see for AVX512 as well, using the V16SI
> > > > > > > > mask and unpacking that to two V8DI ones.  But I see
> > > > > > > >
> > > > > > > > vpbroadcastd %eax, %ymm0
> > > > > > > > vpaddd  %ymm12, %ymm0, %ymm0
> > > > > > > > vpcmpud $6, %ymm0, %ymm11, %k3
> > > > > > > > vpbroadcastd %eax, %xmm0
> > > > > > > > vpaddd  %xmm10, %xmm0, %xmm0
> > > > > > > > vpcmpud $1, %xmm7, %xmm0, %k1
> > > > > > > > vpcmpud $6, %xmm0, %xmm8, %k2
> > > > > > > > kortestb %k1, %k1
> > > > > > > > jne .L3
> > > > > > > >
> > > > > > > > so three %k masks generated by vpcmpud.  I'll have to look what's
> > > > > > > > the magic for SVE and why that doesn't trigger for x86 here.
> > > > > > >
> > > > > > > So answer myself, vect_maybe_permute_loop_masks looks for
> > > > > > > vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> > > > > > > QImode so that doesn't play well here.  Not sure if there
> > > > > > > are proper mask instructions to use (I guess there's a shift
> > > > > > > and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> > > > > Yes, for 16bit and more, we have KUNPCKBW/D/Q. but for 8bit
> > > > > unpack_lo/hi, only shift.
> > > > > > > conversion.  Not sure how to better ask the target here - again
> > > > > > > VnBImode might have been easier here.
> > > > > >
> > > > > > So I've managed to "emulate" the unpack_lo/hi for the case of
> > > > > > !VECTOR_MODE_P masks by using sub-vector select (we're asking
> > > > > > to turn vector(8)  into two
> > > > > > vector(4) ) via BIT_FIELD_REF.  That then
> > > > > > produces the desired single mask producer and
> > > > > >
> > > > > >   loop_mask_38 = VIEW_CONVERT_EXPR > > > > > >(loop_mask_54);
> > > > > >   loop_mask_37 = BIT_FIELD_REF ;
> > > > > >
> > > > > > note for the lowpart we can just view-convert away the excess bits,
> > > > > > fully re-using the mask.  We generate surprisingly "good" code:
> > > > > >
> > > > > > kmovb   %k1, %edi
> > > > > > shrb    $4, %dil
> > > > > > kmovb   %edi, %k2
> > > > > >
> > > > > > besides the lack of using kshiftrb.  I guess we're just lacking
> > > > > > a mask register alternative for
> > > > > Yes, we can do it similar as kor/kand/kxor.
> > > > > >
> > > > > > (insn 22 20 25 4 (parallel [
> > > > > > (set (reg:QI 94 [ loop_mask_37 ])
> > > > > > (lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
> > > > > > (const_int 4 [0x4])))
> > > > > > (clobber (reg:CC 17 flags))
> > > > > > ]) 724 {*lshrqi3_1}
> > > > > >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > > > > > (nil)))
> > > > > >
> > > > > > and so we reload.  For the above cited loop the AVX512 vectorization
> > > > > > with --param vect-partial-vector-usage=1 does look quite sensible
> > > > > > to me.  Instead of a SSE vectorized epilogue plus a scalar
> > > > > > epilogue we get a single fully masked AVX512 "iteration" for both.
> > > > > > I suppose it's still mostly a code-size optimization (384 bytes
> > > > > > with the masked epilogue vs. 474 bytes with trunk) since it will
> > > > > > be likely slower for very low iteration counts but it's good
> > > > > > for icache usage then and good for less branch predictor usage.
> > > > > >
> > > > > > That said, I have to set up SPEC on a AVX512 machine to do
> > > > > Has the patch landed in trunk already? I can run a test on CLX.
> > > >
> > > > I'm still experimenting a bit right now but hope to get something
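The fully masked iteration discussed in this thread can be modelled in scalar C: lane j of a vector iteration starting at scalar index i is active iff i + j < n, which is what the broadcast/add/compare sequence (vpbroadcastd, vpaddd, vpcmpud) in the quoted assembly computes. This sketch covers only the d[i] += c[i] half of foo with an assumed VF of 16, and the function names are hypothetical; it illustrates the semantics, not the code GCC generates.

```c
#include <stdint.h>

/* Lane J of the iteration starting at I is active iff I + J < N.  */
static uint32_t
loop_mask (unsigned i, unsigned n, unsigned vf)
{
  uint32_t m = 0;
  for (unsigned j = 0; j < vf && j < 32; ++j)
    if (i + j < n)
      m |= UINT32_C (1) << j;
  return m;
}

/* Scalar model of d[i] += c[i] vectorized with VF = 16, where the final
   partial iteration is handled by the mask instead of a separate vector
   epilogue plus scalar epilogue.  Inactive lanes touch no memory.  */
static void
masked_add_u32 (unsigned int *restrict d, const unsigned int *c, int n)
{
  const unsigned vf = 16;
  if (n <= 0)
    return;
  for (unsigned i = 0; i < (unsigned) n; i += vf)
    {
      uint32_t m = loop_mask (i, (unsigned) n, vf);
      for (unsigned j = 0; j < vf; ++j)
        if ((m >> j) & 1)
          d[i + j] += c[i + j];
    }
}
```

With n = 19 the loop runs two masked iterations: the first has all 16 lanes active, the second only lanes 0 to 2.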

Re: [PATCH 3/3] [PR libfortran/101305] Fix ISO_Fortran_binding.h paths in gfortran testsuite

2021-07-21 Thread Tobias Burnus


On 13.07.21 23:28, Sandra Loosemore wrote:

ISO_Fortran_binding.h is now generated in the libgfortran build
directory where it is on the default include path.  Adjust includes in
the gfortran testsuite not to include an explicit path pointing at the
source directory.

...

-#include "../../../libgfortran/ISO_Fortran_binding.h"
+#include "ISO_Fortran_binding.h"


Unfortunately, that does not help.

When running the testsuite in the build directory (cd $BUILD/gcc),
I get:

testsuite/gfortran.dg/pr93524.c:5:10: fatal error: ISO_Fortran_binding.h: No such file or directory

I wonder whether we need to do the same as with libgomp and libstdc++,
namely add a libgfortran/testsuite/ directory to handle this.

In any case, compiling with '-v' shows that all currently
searched include paths are the same for -m32 and -m64,
so the wrong header will be picked up for -m32.  I tried it
using the command line for running in-tree:
  make check-fortran RUNTESTFLAGS="dg.exp=pr93524.f90 --target_board=unix\{-m32,\}"

Tobias


Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-07-21 Thread Uros Bizjak via Gcc-patches

On Wed, Jul 21, 2021 at 9:43 AM liuhongt  wrote:
>
> gcc/ChangeLog:
>
> * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> * config/i386/i386.c (enum x86_64_reg_class): Add
> X86_64_SSEHF_CLASS.
> (merge_classes): Handle X86_64_SSEHF_CLASS.
> (examine_argument): Ditto.
> (construct_container): Ditto.
> (classify_argument): Ditto, and set HFmode/HCmode to
> X86_64_SSEHF_CLASS.
> (function_value_32): Return _Float16/Complex Float16 by
> %xmm0/%xmm1.
> (function_value_64): Return _Float16/Complex Float16 by SSE
> register.
> (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> (ix86_secondary_reload): Require gpr as intermediate register
> to store _Float16 from sse register when sse4 is not
> available.
> (ix86_hard_regno_mode_ok): Put HFmode in sse register and gpr.
> (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> sse2.
> (ix86_scalar_mode_supported_p): Ditto.
> (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> (ix86_get_excess_precision): Return
> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
> * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> * config/i386/i386.md (*pushhf_rex64): New define_insn.
> (*pushhf): Ditto.
> (*movhf_internal): Ditto.
> * doc/extend.texi (Half-Precision Floating Point): Document
> _Float16 for x86.
>
> gcc/lto/ChangeLog:
>
> * lto-lang.c (lto_type_for_mode): Return float16_type_node
> when mode == TYPE_MODE (float16_type_node).
>
> gcc/testsuite/ChangeLog
>
> * gcc.target/i386/sse2-float16-1.c: New test.
> * gcc.target/i386/sse2-float16-2.c: Ditto.
> * gcc.target/i386/sse2-float16-3.c: Ditto.

OK for the x86 part with some small changes inline.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-modes.def|   1 +
>  gcc/config/i386/i386.c|  99 ++-
>  gcc/config/i386/i386.h|   2 +-
>  gcc/config/i386/i386.md   | 118 +-
>  gcc/doc/extend.texi   |  16 +++
>  gcc/lto/lto-lang.c|   3 +
>  .../gcc.target/i386/sse2-float16-1.c  |   8 ++
>  .../gcc.target/i386/sse2-float16-2.c  |  16 +++
>  .../gcc.target/i386/sse2-float16-3.c  |  12 ++
>  9 files changed, 265 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
>
> diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> index 4e7014be034..9232f59a925 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
> +FLOAT_MODE (HF, 2, ieee_half_format);
>
>  /* In ILP32 mode, XFmode has size 12 and alignment 4.
> In LP64 mode, XFmode has size and alignment 16.  */
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ff96134fb37..02628d838fc 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -387,6 +387,7 @@ enum x86_64_reg_class
>  X86_64_INTEGER_CLASS,
>  X86_64_INTEGERSI_CLASS,
>  X86_64_SSE_CLASS,
> +X86_64_SSEHF_CLASS,
>  X86_64_SSESF_CLASS,
>  X86_64_SSEDF_CLASS,
>  X86_64_SSEUP_CLASS,
> @@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
>  return X86_64_MEMORY_CLASS;
>
>/* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
> -  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
> -  || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
> +  if ((class1 == X86_64_INTEGERSI_CLASS
> +   && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
> +  || (class2 == X86_64_INTEGERSI_CLASS
> + && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
>  return X86_64_INTEGERSI_CLASS;
>if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
>|| class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
> @@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
> /* The partial classes are now full classes.  */
> if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
>   subclasses[0] = X86_64_SSE_CLASS;
> +   if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
> + subclasses[0] = X86_64_SSE_CLASS;
> if (subclasses[0] == X86_64_INTEGERSI_CLASS
> && !((bit_offset % 64) == 0 && byte
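The effect of the merge_classes change can be illustrated with a simplified standalone version of the merging rules. The enum below mirrors only a subset of enum x86_64_reg_class under hypothetical SK_ names, and the X87 rules are omitted, so this is a sketch of the logic rather than a drop-in replacement.

```c
/* Subset of the classes, renamed to avoid clashing with GCC's enum.  */
enum sk_class
{
  SK_NO_CLASS,
  SK_INTEGER_CLASS,
  SK_INTEGERSI_CLASS,
  SK_SSE_CLASS,
  SK_SSEHF_CLASS,
  SK_SSESF_CLASS,
  SK_SSEDF_CLASS,
  SK_MEMORY_CLASS
};

static enum sk_class
sk_merge_classes (enum sk_class c1, enum sk_class c2)
{
  /* Rule #1: equal classes merge to themselves.  */
  if (c1 == c2)
    return c1;
  /* Rule #2: NO_CLASS yields the other class.  */
  if (c1 == SK_NO_CLASS)
    return c2;
  if (c2 == SK_NO_CLASS)
    return c1;
  /* Rule #3: MEMORY wins.  */
  if (c1 == SK_MEMORY_CLASS || c2 == SK_MEMORY_CLASS)
    return SK_MEMORY_CLASS;
  /* Rule #4, as extended by the patch: INTEGERSI merged with SSESF or
     SSEHF stays INTEGERSI.  */
  if ((c1 == SK_INTEGERSI_CLASS
       && (c2 == SK_SSESF_CLASS || c2 == SK_SSEHF_CLASS))
      || (c2 == SK_INTEGERSI_CLASS
          && (c1 == SK_SSESF_CLASS || c1 == SK_SSEHF_CLASS)))
    return SK_INTEGERSI_CLASS;
  /* Otherwise any integer class makes the result INTEGER.  */
  if (c1 == SK_INTEGER_CLASS || c1 == SK_INTEGERSI_CLASS
      || c2 == SK_INTEGER_CLASS || c2 == SK_INTEGERSI_CLASS)
    return SK_INTEGER_CLASS;
  /* Rule #6: everything else is SSE.  */
  return SK_SSE_CLASS;
}
```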

Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.

2021-07-21 Thread Uros Bizjak via Gcc-patches

On Wed, Jul 21, 2021 at 9:43 AM liuhongt  wrote:
>
> gcc/ChangeLog:
>
> * optabs-query.c (get_best_extraction_insn): Use word_mode for
> HF field.
>
> libgcc/ChangeLog:
>
> * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
> * config/i386/64/sfp-machine.h (_FP_NANFRAC_H): Ditto.
> * config/i386/sfp-machine.h (_FP_NANSIGN_H): Ditto.
> * config/i386/t-softfp: Add hf soft-fp.
> * config.host: Add i386/64/t-softfp.
> * config/i386/64/t-softfp: New file.

OK for the x86 part, but please take care of newline at the end of
files to avoid:

> \ No newline at end of file

Thanks,
Uros.

> ---
>  gcc/optabs-query.c  | 10 +-
>  libgcc/config.host  |  5 +
>  libgcc/config/i386/32/sfp-machine.h |  1 +
>  libgcc/config/i386/64/sfp-machine.h |  1 +
>  libgcc/config/i386/64/t-softfp  |  1 +
>  libgcc/config/i386/sfp-machine.h|  1 +
>  libgcc/config/i386/t-softfp |  5 +
>  7 files changed, 19 insertions(+), 5 deletions(-)
>  create mode 100644 libgcc/config/i386/64/t-softfp
>
> diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> index 05ee5f517da..0438e451474 100644
> --- a/gcc/optabs-query.c
> +++ b/gcc/optabs-query.c
> @@ -205,7 +205,15 @@ get_best_extraction_insn (extraction_insn *insn,
>   machine_mode field_mode)
>  {
>opt_scalar_int_mode mode_iter;
> -  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode_for_size (struct_bits))
> +  scalar_int_mode smallest_int_mode;
> +  /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
> +  if (FLOAT_MODE_P (field_mode)
> +  && known_eq (GET_MODE_SIZE (field_mode), 2))
> +smallest_int_mode = word_mode;
> +  else
> +smallest_int_mode = smallest_int_mode_for_size (struct_bits);
> +
> +  FOR_EACH_MODE_FROM (mode_iter, smallest_int_mode)
>  {
>scalar_int_mode mode = mode_iter.require ();
>if (get_extraction_insn (insn, pattern, type, mode))
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 50f00062232..96da9ef1cce 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1540,10 +1540,7 @@ i[34567]86-*-elfiamcu | i[34567]86-*-rtems*)
> ;;
>  i[34567]86-*-* | x86_64-*-*)
> tmake_file="${tmake_file} t-softfp-tf"
> -   if test "${host_address}" = 32; then
> -   tmake_file="${tmake_file} i386/${host_address}/t-softfp"
> -   fi
> -   tmake_file="${tmake_file} i386/t-softfp t-softfp"
> +   tmake_file="${tmake_file} i386/${host_address}/t-softfp i386/t-softfp t-softfp"
> ;;
>  esac
>
> diff --git a/libgcc/config/i386/32/sfp-machine.h b/libgcc/config/i386/32/sfp-machine.h
> index 1fa282d7afe..e24cbc8d180 100644
> --- a/libgcc/config/i386/32/sfp-machine.h
> +++ b/libgcc/config/i386/32/sfp-machine.h
> @@ -86,6 +86,7 @@
>  #define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H  _FP_QNANBIT_H
>  #define _FP_NANFRAC_S  _FP_QNANBIT_S
>  #define _FP_NANFRAC_D  _FP_QNANBIT_D, 0
>  /* Even if XFmode is 12byte,  we have to pad it to
> diff --git a/libgcc/config/i386/64/sfp-machine.h b/libgcc/config/i386/64/sfp-machine.h
> index 1ff94c23ea4..e1c616699bb 100644
> --- a/libgcc/config/i386/64/sfp-machine.h
> +++ b/libgcc/config/i386/64/sfp-machine.h
> @@ -13,6 +13,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
>
>  #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
>
> +#define _FP_NANFRAC_H  _FP_QNANBIT_H
>  #define _FP_NANFRAC_S  _FP_QNANBIT_S
>  #define _FP_NANFRAC_D  _FP_QNANBIT_D
>  #define _FP_NANFRAC_E  _FP_QNANBIT_E, 0
> diff --git a/libgcc/config/i386/64/t-softfp b/libgcc/config/i386/64/t-softfp
> new file mode 100644
> index 000..d812bb120bd
> --- /dev/null
> +++ b/libgcc/config/i386/64/t-softfp
> @@ -0,0 +1 @@
> +softfp_extras := fixhfti fixunshfti floattihf floatuntihf
> \ No newline at end of file
> diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
> index 8319f0550bc..f15d29d3755 100644
> --- a/libgcc/config/i386/sfp-machine.h
> +++ b/libgcc/config/i386/sfp-machine.h
> @@ -17,6 +17,7 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
>  #define _FP_KEEPNANFRACP   1
>  #define _FP_QNANNEGATEDP 0
>
> +#define _FP_NANSIGN_H  1
>  #define _FP_NANSIGN_S  1
>  #define _FP_NANSIGN_D  1
>  #define _FP_NANSIGN_E  1
> diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
> index 685d9cf8502..4ac214eb0ce 100644
> --- a/libgcc/config/i386/t-softfp
> +++ b/libgcc/config/i386/t-softfp
> @@ -1 +1,6 @@
>  LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c
> +
> +softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
> +softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
> +
> +softfp_extras += eqhf2
>
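The softfp_extensions list above includes hfsf, i.e. _Float16 to float. What that conversion computes can be sketched at the bit level as follows. This is an independent illustration, not the soft-fp implementation (which also handles exception flags and NaN conventions through the _FP_* macros above), and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Widen IEEE binary16 bits H to a float, exactly (every half value is
   representable in single precision).  */
static float
sketch_extendhfsf (uint16_t h)
{
  uint32_t sign = (uint32_t) (h >> 15) << 31;
  uint32_t exp = (h >> 10) & 0x1F;
  uint32_t frac = h & 0x3FF;
  uint32_t bits;

  if (exp == 0x1F)
    /* Inf or NaN: all-ones exponent; the fraction (and with it the quiet
       bit) moves up by 13 bits.  */
    bits = sign | 0x7F800000u | (frac << 13);
  else if (exp != 0)
    /* Normal: rebias the exponent from 15 to 127.  */
    bits = sign | ((exp + 112) << 23) | (frac << 13);
  else if (frac == 0)
    /* Signed zero.  */
    bits = sign;
  else
    {
      /* Half subnormal: normalize, since it is a normal float.  */
      int e = -1;
      do
        {
          e++;
          frac <<= 1;
        }
      while ((frac & 0x400) == 0);
      frac &= 0x3FF;
      bits = sign | ((uint32_t) (112 - e) << 23) | (frac << 13);
    }

  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```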

[PATCH] x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions

2021-07-21 Thread H.J. Lu via Gcc-patches

Since

commit 39671f87b2df6a1894cc11a161e4a7949d1ddccd
Author: H.J. Lu 
Date:   Thu Apr 15 05:59:48 2021 -0700

x86: Use crc32 target option for CRC32 intrinsics

enabled OPTION_MASK_ISA_CRC32 for -msse4 and removed the TARGET_SSE4_2 check
in the sse4_2_crc32 patterns, remove OPTION_MASK_ISA_SSE4_2 from the CRC32
_builtin functions.

gcc/

PR target/101549
* config/i386/i386-builtin.def: Remove OPTION_MASK_ISA_SSE4_2
from CRC32 _builtin functions.

gcc/testsuite/

PR target/101549
* gcc.target/i386/crc32-6.c: New test.
---
 gcc/config/i386/i386-builtin.def|  8 
 gcc/testsuite/gcc.target/i386/crc32-6.c | 13 +
 2 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/crc32-6.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 1cc0cc6968c..4b1ae0eb84c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -970,10 +970,10 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_ptestv2di, "__builtin_ia32_pte
 
 /* SSE4.2 */
 BDESC (OPTION_MASK_ISA_SSE4_2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI)
-BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32qi, "__builtin_ia32_crc32qi", IX86_BUILTIN_CRC32QI, UNKNOWN, (int) UINT_FTYPE_UINT_UCHAR)
-BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32hi, "__builtin_ia32_crc32hi", IX86_BUILTIN_CRC32HI, UNKNOWN, (int) UINT_FTYPE_UINT_USHORT)
-BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32si, "__builtin_ia32_crc32si", IX86_BUILTIN_CRC32SI, UNKNOWN, (int) UINT_FTYPE_UINT_UINT)
-BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32 | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse4_2_crc32di, "__builtin_ia32_crc32di", IX86_BUILTIN_CRC32DI, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64)
+BDESC (OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32qi, "__builtin_ia32_crc32qi", IX86_BUILTIN_CRC32QI, UNKNOWN, (int) UINT_FTYPE_UINT_UCHAR)
+BDESC (OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32hi, "__builtin_ia32_crc32hi", IX86_BUILTIN_CRC32HI, UNKNOWN, (int) UINT_FTYPE_UINT_USHORT)
+BDESC (OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32si, "__builtin_ia32_crc32si", IX86_BUILTIN_CRC32SI, UNKNOWN, (int) UINT_FTYPE_UINT_UINT)
+BDESC (OPTION_MASK_ISA_CRC32 | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse4_2_crc32di, "__builtin_ia32_crc32di", IX86_BUILTIN_CRC32DI, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64)
 
 /* SSE4A */
 BDESC (OPTION_MASK_ISA_SSE4A, 0, CODE_FOR_sse4a_extrqi, "__builtin_ia32_extrqi", IX86_BUILTIN_EXTRQI, UNKNOWN, (int) V2DI_FTYPE_V2DI_UINT_UINT)
diff --git a/gcc/testsuite/gcc.target/i386/crc32-6.c b/gcc/testsuite/gcc.target/i386/crc32-6.c
new file mode 100644
index 000..464e3444069
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/crc32-6.c
@@ -0,0 +1,13 @@
+/* PR target/101549 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4 -mno-crc32" } */
+
+#include 
+
+unsigned int
+test_mm_crc32_u8 (unsigned int CRC, unsigned char V)
+{
+  return _mm_crc32_u8 (CRC, V);
+}
+
+/* { dg-error "needs isa option -mcrc32" "" { target *-*-* } 0  } */
-- 
2.31.1
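For reference, the CRC32 instruction behind these builtins computes CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78), with no pre- or post-inversion inside each step. A portable software equivalent of the byte form, which may be handy for checking results when -mcrc32 is not available, could look like this; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One-byte CRC-32C update, intended to match the semantics of
   _mm_crc32_u8 (CRC, V): bit-reflected, no inversion.  */
static uint32_t
crc32c_u8 (uint32_t crc, uint8_t v)
{
  crc ^= v;
  for (int k = 0; k < 8; ++k)
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

/* Conventional CRC-32C of a buffer: initial value and final XOR of
   0xFFFFFFFF are applied outside the per-byte step.  */
static uint32_t
crc32c (const void *buf, size_t len)
{
  const uint8_t *p = buf;
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i)
    crc = crc32c_u8 (crc, p[i]);
  return ~crc;
}
```

The standard check value, crc32c ("123456789", 9), is 0xE3069283.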

Re: [PATCH] Fix PR 10153: tail recursion for vector types.

2021-07-21 Thread Richard Biener via Gcc-patches

On Wed, Jul 21, 2021 at 12:03 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> The problem here is that we try to create an initialized value
> from a scalar constant.  For vectors we need to do
> a vect_dup instead.  This fixes that issue: we now
> get correct code and it no longer crashes.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

I think the create_tailcall_accumulators should simply
use build_{one,zero}_cst (type) instead of passing down
integer_{zero,one}_node.  Then we can even elide the fold_convert ...

> gcc/ChangeLog:
>
> PR tree-optimize/10153
> * tree-tailcall.c (create_tailcall_accumulator):
> For vector types create a duplicated VECTOR_CST
> before calling fold_convert.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimize/10153
> * gcc.c-torture/compile/pr10153-1.c: New test.
> * gcc.c-torture/compile/pr10153-2.c: New test.
> ---
>  gcc/testsuite/gcc.c-torture/compile/pr10153-1.c | 7 +++
>  gcc/testsuite/gcc.c-torture/compile/pr10153-2.c | 9 +
>  gcc/tree-tailcall.c | 3 +++
>  3 files changed, 19 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr10153-2.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c b/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
> new file mode 100644
> index 000..3f2040f32a1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
> @@ -0,0 +1,7 @@
> +typedef int V __attribute__ ((vector_size (2 * sizeof (int))));
> +V
> +foo (void)
> +{
> +  V v = { };
> +  return v - foo();
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c b/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c
> new file mode 100644
> index 000..1af4c8e2a36
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c
> @@ -0,0 +1,9 @@
> +typedef int V __attribute__ ((vector_size (2 * sizeof (int))));
> +V
> +foo (int t)
> +{
> +  if (t < 10)
> +return (V){1, 1};
> +  V v = { };
> +  return v - foo(t - 1);
> +}
> diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
> index a4d31c90c49..9d1a98b1cfd 100644
> --- a/gcc/tree-tailcall.c
> +++ b/gcc/tree-tailcall.c
> @@ -1080,6 +1080,9 @@ create_tailcall_accumulator (const char *label, basic_block bb, tree init)
>
>phi = create_phi_node (tmp, bb);
>/* RET_TYPE can be a float when -ffast-maths is enabled.  */
> +  /* For vectors create a dup. */
> +  if (VECTOR_TYPE_P (ret_type))
> +init = build_vector_from_val (ret_type, fold_convert (TREE_TYPE (ret_type), init));
>add_phi_arg (phi, fold_convert (ret_type, init), single_pred_edge (bb),
>UNKNOWN_LOCATION);
>return PHI_RESULT (phi);
> --
> 2.27.0
>
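To make the accumulator rewrite concrete, here is a hand-written C sketch (GCC vector extensions) of what tail-recursion elimination conceptually turns pr10153-2.c into. It is an illustration under an assumption, not the pass's actual output: it assumes `v - foo (t - 1)` is treated as `v + (-1) * foo (t - 1)`, i.e. with an additive accumulator starting at zero and a multiplicative one starting at one. For a vector return type those initial values have to be splatted vectors such as `{1, 1}`, which is what the `build_vector_from_val` call supplies instead of a bare scalar `fold_convert`. The names `foo_rec`, `foo_acc` and `v_equal` are hypothetical.

```c
#include <stdbool.h>

typedef int V __attribute__ ((vector_size (2 * sizeof (int))));

/* Recursive form, same shape as pr10153-2.c.  */
V
foo_rec (int t)
{
  if (t < 10)
    return (V){1, 1};
  V v = {0, 0};
  return v - foo_rec (t - 1);
}

/* Accumulator form.  Invariant: foo_rec (t_orig) == a + m * foo_rec (t).
   Note that both accumulators start as vectors, not as scalars 0 and 1.  */
V
foo_acc (int t)
{
  V a = {0, 0};   /* additive accumulator: splat of 0 */
  V m = {1, 1};   /* multiplicative accumulator: splat of 1 */
  while (t >= 10)
    {
      V v = {0, 0};
      /* v - foo (t - 1) == v + (-1) * foo (t - 1).  */
      a = a + m * v;
      m = m * (V){-1, -1};
      t = t - 1;
    }
  /* Base case returns {1, 1}.  */
  return a + m * (V){1, 1};
}

/* Element-wise equality for two-element vectors.  */
bool
v_equal (V x, V y)
{
  return x[0] == y[0] && x[1] == y[1];
}
```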

Re: [PATCH v3] Add QI vector mode support to by-pieces for memset

2021-07-21 Thread Richard Sandiford via Gcc-patches

"H.J. Lu"  writes:
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 39ab139b7e1..1972301ce3c 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -3890,13 +3890,16 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
>  
>  static rtx
>  builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> -  scalar_int_mode mode)
> +  fixed_size_mode mode)
>  {
>/* The REPresentation pointed to by DATA need not be a nul-terminated
>   string but the caller guarantees it's large enough for MODE.  */
>const char *rep = (const char *) data;
>  
> -  return c_readstr (rep + offset, mode, /*nul_terminated=*/false);
> +  /* NB: Vector mode in the by-pieces infrastructure is only used by
> + the memset expander.  */

Sorry to nitpick, but I guess this might get out of date.  Maybe:

  /* The by-pieces infrastructure does not try to pick a vector mode
 for memcpy expansion.  */

> +  return c_readstr (rep + offset, as_a <scalar_int_mode> (mode),
> + /*nul_terminated=*/false);
>  }
>  
>  /* LEN specify length of the block of memcpy/memset operation.
> @@ -6478,14 +6481,16 @@ expand_builtin_stpncpy (tree exp, rtx)
>  
>  rtx
>  builtin_strncpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> -   scalar_int_mode mode)
> +   fixed_size_mode mode)
>  {
>const char *str = (const char *) data;
>  
>if ((unsigned HOST_WIDE_INT) offset > strlen (str))
>  return const0_rtx;
>  
> -  return c_readstr (str + offset, mode);
> +  /* NB: Vector mode in the by-pieces infrastructure is only used by
> + the memset expander.  */

Similarly here for strncpy expansion.

> +  return c_readstr (str + offset, as_a <scalar_int_mode> (mode));
>  }
>  
>  /* Helper to check the sizes of sequences and the destination of calls
> @@ -6686,30 +6691,117 @@ expand_builtin_strncpy (tree exp, rtx target)
>return NULL_RTX;
>  }
>  
> -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> -   bytes from constant string DATA + OFFSET and return it as target
> -   constant.  If PREV isn't nullptr, it has the RTL info from the
> +/* Return the RTL of a register in MODE generated from PREV in the
> previous iteration.  */
>  
> -rtx
> -builtin_memset_read_str (void *data, void *prevp,
> -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> -  scalar_int_mode mode)
> +static rtx
> +gen_memset_value_from_prev (by_pieces_prev *prev, fixed_size_mode mode)
>  {
> -  by_pieces_prev *prev = (by_pieces_prev *) prevp;
> +  rtx target = nullptr;
>if (prev != nullptr && prev->data != nullptr)
>  {
>/* Use the previous data in the same mode.  */
>if (prev->mode == mode)
>   return prev->data;
> +
> +  fixed_size_mode prev_mode = prev->mode;
> +
> +  /* Don't use the previous data to write QImode if it is in a
> +  vector mode.  */
> +  if (VECTOR_MODE_P (prev_mode) && mode == QImode)
> + return target;
> +
> +  rtx prev_rtx = prev->data;
> +
> +  if (REG_P (prev_rtx)
> +   && HARD_REGISTER_P (prev_rtx)
> +   && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0)
> + {
> +   /* If we can't put a hard register in MODE, first generate a
> +  subreg of word mode if the previous mode is wider than word
> +  mode and word mode is wider than MODE.  */
> +   if (UNITS_PER_WORD < GET_MODE_SIZE (prev_mode)
> +   && UNITS_PER_WORD > GET_MODE_SIZE (mode))
> + {
> +   prev_rtx = lowpart_subreg (word_mode, prev_rtx,
> +  prev_mode);
> +   if (prev_rtx != nullptr)
> + prev_mode = word_mode;
> + }
> +   else
> + prev_rtx = nullptr;

I don't understand this.  Why not just do the:

  if (REG_P (prev_rtx)
      && HARD_REGISTER_P (prev_rtx)
      && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0)
    prev_rtx = copy_to_reg (prev_rtx);

that I suggested in the previous review?

IMO anything that relies on a sequence of two subreg operations is
doing something wrong.

> + }
> +  if (prev_rtx != nullptr)
> + target = lowpart_subreg (mode, prev_rtx, prev_mode);
>  }
> +  return target;
> +}
> +
> […]
> @@ -769,21 +769,41 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
>return align;
>  }
>  
> -/* Return the widest integer mode that is narrower than SIZE bytes.  */
> +/* Return the widest QI vector, if QI_MODE is true, or integer mode
> +   that is narrower than SIZE bytes.  */
>  
> -static scalar_int_mode
> -widest_int_mode_for_size (unsigned int size)
> +static fixed_size_mode
> +widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)
>  {
> -  scalar_int_mode result = NARROWEST_INT_MODE;
> +  machine_mode result = NARROWEST_INT_MODE;
>  
>gcc_checking_assert (size > 1);
>  
> +  /* Use QI vec
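
The mode-selection and value-reuse ideas under review can be modelled in plain C. This is a simplification under stated assumptions, not GCC's by-pieces code: piece sizes stand in for machine modes (1, 2, 4 and 8 bytes for integer modes, plus 16, 32 and 64 bytes for QI vector modes when `qi_vector` is true, assuming the target supports them), and every piece is copied from the low bytes of one pattern built once at the widest size, loosely mirroring how `gen_memset_value_from_prev` reuses the previous value instead of building a narrower constant from scratch. `widest_piece_narrower_than` and `memset_by_pieces` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Widest piece size, in bytes, that is strictly narrower than SIZE,
   following the contract of widest_fixed_size_mode_for_size.  The
   candidate sizes are assumptions, not a real target's modes.  */
static unsigned
widest_piece_narrower_than (unsigned size, bool qi_vector)
{
  static const unsigned int_sizes[] = { 1, 2, 4, 8 };
  static const unsigned vec_sizes[] = { 16, 32, 64 };
  unsigned result = 1;

  assert (size > 1);
  for (size_t i = 0; i < sizeof int_sizes / sizeof int_sizes[0]; i++)
    if (int_sizes[i] < size)
      result = int_sizes[i];
  if (qi_vector)
    for (size_t i = 0; i < sizeof vec_sizes / sizeof vec_sizes[0]; i++)
      if (vec_sizes[i] < size)
        result = vec_sizes[i];
  return result;
}

/* Fill LEN bytes at DST with C, one piece at a time.  The byte pattern
   is built once at the widest size; narrower pieces reuse its low
   bytes rather than a freshly built constant.  */
static void
memset_by_pieces (unsigned char *dst, unsigned char c, size_t len,
                  bool qi_vector)
{
  unsigned char pattern[64];
  for (size_t i = 0; i < sizeof pattern; i++)
    pattern[i] = c;   /* the "previous value" */

  while (len > 0)
    {
      /* Widest piece that is no larger than the remaining length.  */
      unsigned limit = len < 64 ? (unsigned) len + 1 : 65;
      unsigned piece = widest_piece_narrower_than (limit, qi_vector);
      memcpy (dst, pattern, piece);
      dst += piece;
      len -= piece;
    }
}
```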

[PATCH 0/4] drop version checks for in-tree gas [PR91602]

2021-07-21 Thread Serge Belyshev

Special-casing checks for in-tree gas features has been unnecessary since
r17, which made configure-gcc depend on all-gas and thus made the
alternate code path in gcc_GAS_CHECK_FEATURE for in-tree gas
redundant.

Along the way, this fixes PR 91602, which is caused by an incorrect guess
about leb128 support being present on RISC-V.
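
As background on the directives whose support is being guessed: `.uleb128` and `.sleb128` emit LEB128 variable-length integers, 7 bits per byte, with the high bit set on every byte except the last. The C sketch below shows only the unsigned encoding; it is not code from GCC or gas, and `uleb128_encode` is a hypothetical name. The configure check for these directives also requires the assembler to accept a symbol difference as the operand, which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode VALUE as unsigned LEB128 into OUT (10 bytes suffice for any
   64-bit value) and return the number of bytes written.  */
size_t
uleb128_encode (uint64_t value, unsigned char *out)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;   /* low 7 bits */
      value >>= 7;
      if (value != 0)
        byte |= 0x80;                      /* more bytes follow */
      out[n++] = byte;
    }
  while (value != 0);
  return n;
}
```

For example, 624485 encodes as the three bytes `e5 8e 26`.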

The first patch removes the alternate code path in gcc_GAS_CHECK_FEATURE
and related code; the rest are further cleanups.  Patches 2 and 3 in the
series make no functional changes, so the generated configure is unchanged.

Bootstrapped/regtested on x86_64-pc-linux-gnu, riscv64-unknown-linux-gnu,
sparc-sun-solaris2.11 and powerpc-ibm-aix7.{1.5.0,2.4.0}, with and without
in-tree binutils (except on AIX, where a combined tree does not appear to
work due to a dynamic-linker peculiarity).

OK for mainline?

Serge Belyshev (4):
  configure: drop version checks for in-tree gas [PR91602]
  configure: remove version argument from gcc_GAS_CHECK_FEATURE
  configure: fixup formatting from previous change
  configure: remove gas versions from tls check

 gcc/acinclude.m4 |  82 +---
 gcc/configure| 472 ++-
 gcc/configure.ac | 335 -
 3 files changed, 188 insertions(+), 701 deletions(-)

[PATCH 1/4] configure: drop version checks for in-tree gas [PR91602]

2021-07-21 Thread Serge Belyshev

configure: drop version checks for in-tree gas [PR91602]

gcc/ChangeLog:

PR build/91602
* acinclude.m4 (_gcc_COMPUTE_GAS_VERSION, _gcc_GAS_VERSION_GTE_IFELSE)
(gcc_GAS_VERSION_GTE_IFELSE): Remove.
(gcc_GAS_CHECK_FEATURE): Do not handle in-tree case specially.
* configure.ac: Remove gcc_cv_gas_major_version, 
gcc_cv_gas_minor_version.
Remove remaining checks for in-tree assembler.
* configure: Regenerate.
---
 gcc/acinclude.m4 |  66 +---
 gcc/configure| 414 +++
 gcc/configure.ac |  26 +--
 3 files changed, 61 insertions(+), 445 deletions(-)

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index f9f6a07b040..e038990cca6 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -442,63 +442,6 @@ AC_DEFINE_UNQUOTED(HAVE_INITFINI_ARRAY_SUPPORT,
   [Define 0/1 if .init_array/.fini_array sections are available and working.])
 ])
 
-dnl # _gcc_COMPUTE_GAS_VERSION
-dnl # Used by gcc_GAS_VERSION_GTE_IFELSE
-dnl #
-dnl # WARNING:
-dnl # gcc_cv_as_gas_srcdir must be defined before this.
-dnl # This gross requirement will go away eventually.
-AC_DEFUN([_gcc_COMPUTE_GAS_VERSION],
-[gcc_cv_as_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd
-for f in $gcc_cv_as_bfd_srcdir/configure \
- $gcc_cv_as_gas_srcdir/configure \
- $gcc_cv_as_gas_srcdir/configure.ac \
- $gcc_cv_as_gas_srcdir/Makefile.in ; do
-  gcc_cv_gas_version=`sed -n -e 's/^[[ 
]]*VERSION=[[^0-9A-Za-z_]]*\([[0-9]]*\.[[0-9]]*.*\)/VERSION=\1/p' < $f`
-  if test x$gcc_cv_gas_version != x; then
-break
-  fi
-done
-case $gcc_cv_gas_version in
-  VERSION=[[0-9]]*) ;;
-  *) AC_MSG_ERROR([[cannot find version of in-tree assembler]]);;
-esac
-gcc_cv_gas_major_version=`expr "$gcc_cv_gas_version" : "VERSION=\([[0-9]]*\)"`
-gcc_cv_gas_minor_version=`expr "$gcc_cv_gas_version" : "VERSION=[[0-9]]*\.\([[0-9]]*\)"`
-gcc_cv_gas_patch_version=`expr "$gcc_cv_gas_version" : "VERSION=[[0-9]]*\.[[0-9]]*\.\([[0-9]]*\)"`
-case $gcc_cv_gas_patch_version in
-  "") gcc_cv_gas_patch_version="0" ;;
-esac
-gcc_cv_gas_vers=`expr \( \( $gcc_cv_gas_major_version \* 1000 \) \
-   + $gcc_cv_gas_minor_version \) \* 1000 \
-   + $gcc_cv_gas_patch_version`
-]) []dnl # _gcc_COMPUTE_GAS_VERSION
-
-dnl # gcc_GAS_VERSION_GTE_IFELSE([elf,] major, minor, patchlevel,
-dnl # [command_if_true = :], [command_if_false = :])
-dnl # Check to see if the version of GAS is greater than or
-dnl # equal to the specified version.
-dnl #
-dnl # The first ifelse() shortens the shell code if the patchlevel
-dnl # is unimportant (the usual case).  The others handle missing
-dnl # commands.  Note that the tests are structured so that the most
-dnl # common version number cases are tested first.
-AC_DEFUN([_gcc_GAS_VERSION_GTE_IFELSE],
-[ifelse([$1], elf,
- [if test $in_tree_gas_is_elf = yes \
-  &&],
- [if]) test $gcc_cv_gas_vers -ge `expr \( \( $2 \* 1000 \) + $3 \) \* 1000 + $4`
-  then dnl
-ifelse([$5],,:,[$5])[]dnl
-ifelse([$6],,,[
-  else $6])
-fi])
-
-AC_DEFUN([gcc_GAS_VERSION_GTE_IFELSE],
-[AC_REQUIRE([_gcc_COMPUTE_GAS_VERSION])dnl
-ifelse([$1], elf, [_gcc_GAS_VERSION_GTE_IFELSE($@)],
-  [_gcc_GAS_VERSION_GTE_IFELSE(,$@)])])
-
 dnl # gcc_GAS_FLAGS
 dnl # Used by gcc_GAS_CHECK_FEATURE 
 dnl #
@@ -531,9 +474,7 @@ dnl gcc_GAS_CHECK_FEATURE(description, cv, [[elf,]major,minor,patchlevel],
 dnl [extra switches to as], [assembler input],
 dnl [extra testing logic], [command if feature available])
 dnl
-dnl Checks for an assembler feature.  If we are building an in-tree
-dnl gas, the feature is available if the associated assembler version
-dnl is greater than or equal to major.minor.patchlevel.  If not, then
+dnl Checks for an assembler feature.
 dnl ASSEMBLER INPUT is fed to the assembler and the feature is available
 dnl if assembly succeeds.  If EXTRA TESTING LOGIC is not the empty string,
 dnl then it is run instead of simply setting CV to "yes" - it is responsible
@@ -542,10 +483,7 @@ AC_DEFUN([gcc_GAS_CHECK_FEATURE],
 [AC_REQUIRE([gcc_GAS_FLAGS])dnl
 AC_CACHE_CHECK([assembler for $1], [$2],
  [[$2]=no
-  ifelse([$3],,,[dnl
-  if test $in_tree_gas = yes; then
-gcc_GAS_VERSION_GTE_IFELSE($3, [[$2]=yes])
-  el])if test x$gcc_cv_as != x; then
+  if test x$gcc_cv_as != x; then
 AS_ECHO([ifelse(m4_substr([$5],0,1),[$], "[$5]", '[$5]')]) > conftest.s
 if AC_TRY_COMMAND([$gcc_cv_as $gcc_cv_as_flags $4 -o conftest.o conftest.s >&AS_MESSAGE_LOG_FD])
 then
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 26da07325e7..c6e0bfdde90 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2556,8 +2556,6 @@ AC_SUBST(enable_fast_install)
 # If build != host, and we aren't building gas in-tree, we identify a
 # build->target assembler and hope that it will have the same features
 # as the host->target assembler we'll be using.
-gcc_cv_gas_major_version=
-gcc_cv_gas_minor_version=
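
For reference, the removed `_gcc_COMPUTE_GAS_VERSION` / `gcc_GAS_VERSION_GTE_IFELSE` pair reduced the in-tree gas version to a single integer, `(major * 1000 + minor) * 1000 + patchlevel` (with a missing patchlevel treated as 0), and compared it with the minimum passed to each check. For an in-tree gas, the feature was then assumed present without assembling anything, which is the kind of guess this series replaces with always running the assembler. The arithmetic is sketched in C below; `gas_version_code` and `gas_version_gte` are hypothetical helpers, not part of the configure machinery.

```c
#include <stdbool.h>

/* Same encoding as the removed gcc_cv_gas_vers computation,
   e.g. 2.13.0 -> 2013000 and 2.15.91 -> 2015091.  */
long
gas_version_code (long major, long minor, long patch)
{
  return (major * 1000 + minor) * 1000 + patch;
}

/* True if the encoded version HAVE is at least major.minor.patch.  */
bool
gas_version_gte (long have, long major, long minor, long patch)
{
  return have >= gas_version_code (major, minor, patch);
}
```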

[PATCH 2/4] configure: remove version argument from gcc_GAS_CHECK_FEATURE

2021-07-21 Thread Serge Belyshev

configure: remove version argument from gcc_GAS_CHECK_FEATURE

gcc/ChangeLog:

* acinclude.m4 (gcc_GAS_CHECK_FEATURE): Remove third argument and ...
* configure.ac: ... update all callers.
---
 gcc/acinclude.m4 |  16 ++--
 gcc/configure.ac | 224 +++
 2 files changed, 120 insertions(+), 120 deletions(-)

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index e038990cca6..082fa16ecb5 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -470,7 +470,7 @@ AC_DEFUN([gcc_GAS_FLAGS],
   esac])
 ])
 
-dnl gcc_GAS_CHECK_FEATURE(description, cv, [[elf,]major,minor,patchlevel],
+dnl gcc_GAS_CHECK_FEATURE(description, cv,
 dnl [extra switches to as], [assembler input],
 dnl [extra testing logic], [command if feature available])
 dnl
@@ -484,23 +484,23 @@ AC_DEFUN([gcc_GAS_CHECK_FEATURE],
 AC_CACHE_CHECK([assembler for $1], [$2],
  [[$2]=no
   if test x$gcc_cv_as != x; then
-AS_ECHO([ifelse(m4_substr([$5],0,1),[$], "[$5]", '[$5]')]) > conftest.s
-if AC_TRY_COMMAND([$gcc_cv_as $gcc_cv_as_flags $4 -o conftest.o conftest.s >&AS_MESSAGE_LOG_FD])
+AS_ECHO([ifelse(m4_substr([$4],0,1),[$], "[$4]", '[$4]')]) > conftest.s
+if AC_TRY_COMMAND([$gcc_cv_as $gcc_cv_as_flags $3 -o conftest.o conftest.s >&AS_MESSAGE_LOG_FD])
 then
-   ifelse([$6],, [$2]=yes, [$6])
+   ifelse([$5],, [$2]=yes, [$5])
 else
   echo "configure: failed program was" >&AS_MESSAGE_LOG_FD
   cat conftest.s >&AS_MESSAGE_LOG_FD
 fi
 rm -f conftest.o conftest.s
   fi])
-ifelse([$7],,,[dnl
+ifelse([$6],,,[dnl
 if test $[$2] = yes; then
-  $7
+  $6
 fi])
-ifelse([$8],,,[dnl
+ifelse([$7],,,[dnl
 if test $[$2] != yes; then
-  $8
+  $7
 fi])])
 
 dnl GCC_TARGET_TEMPLATE(KEY)
diff --git a/gcc/configure.ac b/gcc/configure.ac
index c6e0bfdde90..3846794b949 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2884,27 +2884,27 @@ esac
 
 # Figure out what assembler alignment features are present.
 gcc_GAS_CHECK_FEATURE([.balign and .p2align], gcc_cv_as_balign_and_p2align,
- [2,6,0],,
+ ,
 [.balign 4
 .p2align 2],,
 [AC_DEFINE(HAVE_GAS_BALIGN_AND_P2ALIGN, 1,
   [Define if your assembler supports .balign and .p2align.])])
 
 gcc_GAS_CHECK_FEATURE([.p2align with maximum skip], gcc_cv_as_max_skip_p2align,
- [2,8,0],,
+ ,
  [.p2align 4,,7],,
 [AC_DEFINE(HAVE_GAS_MAX_SKIP_P2ALIGN, 1,
   [Define if your assembler supports specifying the maximum number
of bytes to skip when using the GAS .p2align command.])])
 
 gcc_GAS_CHECK_FEATURE([.literal16], gcc_cv_as_literal16,
- [2,8,0],,
+ ,
  [.literal16],,
 [AC_DEFINE(HAVE_GAS_LITERAL16, 1,
   [Define if your assembler supports .literal16.])])
 
 gcc_GAS_CHECK_FEATURE([working .subsection -1], gcc_cv_as_subsection_m1,
- [elf,2,9,0],,
+ ,
  [conftest_label1: .word 0
 .subsection -1
 conftest_label2: .word 0
@@ -2923,17 +2923,17 @@ conftest_label2: .word 0
emitting at the beginning of your section.])])
 
 gcc_GAS_CHECK_FEATURE([.weak], gcc_cv_as_weak,
- [2,2,0],,
+ ,
  [ .weak foobar],,
 [AC_DEFINE(HAVE_GAS_WEAK, 1, [Define if your assembler supports .weak.])])
 
 gcc_GAS_CHECK_FEATURE([.weakref], gcc_cv_as_weakref,
- [2,17,0],,
+ ,
  [ .weakref foobar, barfnot],,
 [AC_DEFINE(HAVE_GAS_WEAKREF, 1, [Define if your assembler supports .weakref.])])
 
 gcc_GAS_CHECK_FEATURE([.nsubspa comdat], gcc_cv_as_nsubspa_comdat,
- [2,15,91],,
+ ,
  [ .SPACE $TEXT$
.NSUBSPA $CODE$,COMDAT],,
 [AC_DEFINE(HAVE_GAS_NSUBSPA_COMDAT, 1, [Define if your assembler supports .nsubspa comdat option.])])
@@ -2955,7 +2955,7 @@ foobar:'
 ;;
 esac
 gcc_GAS_CHECK_FEATURE([.hidden], gcc_cv_as_hidden,
- [elf,2,13,0],, [$conftest_s])
+ , [$conftest_s])
 case "${target}" in
   *-*-darwin*)
 # Darwin as has some visibility support, though with a different syntax.
@@ -3174,7 +3174,7 @@ gcc_AC_INITFINI_ARRAY
 # Older versions of GAS and some non-GNU assemblers, have a bugs handling
 # these directives, even when they appear to accept them.
 gcc_GAS_CHECK_FEATURE([.sleb128 and .uleb128], gcc_cv_as_leb128,
- [elf,2,11,0],,
+ ,
 [  .data
.uleb128 L2 - L1
 L1:
@@ -3213,7 +3213,7 @@ gcc_fn_eh_frame_ro () {
 
 # Check if we have assembler support for unwind directives.
 gcc_GAS_CHECK_FEATURE([cfi directives], gcc_cv_as_cfi_directive,
-  ,,
+  ,
 [  .text
.cfi_startproc
.cfi_offset 0, 0
@@ -3269,7 +3269,7 @@ gcc_GAS_CHECK_FEATURE([cfi directives], gcc_cv_as_cfi_directive,
 esac])
 if test $gcc_cv_as_cfi_directive = yes && test x$gcc_cv_objdump != x; then
 gcc_GAS_CHECK_FEATURE([working cfi advance], gcc_cv_as_cfi_advance_working,
-  ,,
+  ,
 [  .text
.cfi_startproc
.cfi_adjust_cfa_offset 64
@@ -3294,7 +3294,7 @@ AC_DEFINE_UNQUOTED(HAVE_GAS_CFI_DIRECTIVE,
 
 GCC_TARGET_TEMPLATE(HAVE_GAS_CFI_PERSONALITY_DIRECTIVE)
 gcc_GAS_CHECK_FEATURE([cfi personality directive],
-  gcc_cv_as_cfi_personality_directive, ,,
+  gcc_cv_as_cfi_personality_directive,,
 [  .text
.cfi_startproc

[PATCH 3/4] configure: fixup formatting from previous change

2021-07-21 Thread Serge Belyshev

configure: fixup formatting from previous change

gcc/ChangeLog:

* configure.ac: Fixup formatting.
---
 gcc/configure.ac | 71 ++--
 1 file changed, 27 insertions(+), 44 deletions(-)

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 3846794b949..6b452904ce7 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2883,28 +2883,24 @@ case "$ORIGINAL_DSYMUTIL_FOR_TARGET" in
 esac 
 
 # Figure out what assembler alignment features are present.
-gcc_GAS_CHECK_FEATURE([.balign and .p2align], gcc_cv_as_balign_and_p2align,
- ,
+gcc_GAS_CHECK_FEATURE([.balign and .p2align], gcc_cv_as_balign_and_p2align,,
 [.balign 4
 .p2align 2],,
 [AC_DEFINE(HAVE_GAS_BALIGN_AND_P2ALIGN, 1,
   [Define if your assembler supports .balign and .p2align.])])
 
-gcc_GAS_CHECK_FEATURE([.p2align with maximum skip], gcc_cv_as_max_skip_p2align,
- ,
+gcc_GAS_CHECK_FEATURE([.p2align with maximum skip], gcc_cv_as_max_skip_p2align,,
  [.p2align 4,,7],,
 [AC_DEFINE(HAVE_GAS_MAX_SKIP_P2ALIGN, 1,
   [Define if your assembler supports specifying the maximum number
of bytes to skip when using the GAS .p2align command.])])
 
-gcc_GAS_CHECK_FEATURE([.literal16], gcc_cv_as_literal16,
- ,
+gcc_GAS_CHECK_FEATURE([.literal16], gcc_cv_as_literal16,,
  [.literal16],,
 [AC_DEFINE(HAVE_GAS_LITERAL16, 1,
   [Define if your assembler supports .literal16.])])
 
-gcc_GAS_CHECK_FEATURE([working .subsection -1], gcc_cv_as_subsection_m1,
- ,
+gcc_GAS_CHECK_FEATURE([working .subsection -1], gcc_cv_as_subsection_m1,,
  [conftest_label1: .word 0
 .subsection -1
 conftest_label2: .word 0
@@ -2922,18 +2918,15 @@ conftest_label2: .word 0
   [Define if your assembler supports .subsection and .subsection -1 starts
emitting at the beginning of your section.])])
 
-gcc_GAS_CHECK_FEATURE([.weak], gcc_cv_as_weak,
- ,
+gcc_GAS_CHECK_FEATURE([.weak], gcc_cv_as_weak,,
  [ .weak foobar],,
 [AC_DEFINE(HAVE_GAS_WEAK, 1, [Define if your assembler supports .weak.])])
 
-gcc_GAS_CHECK_FEATURE([.weakref], gcc_cv_as_weakref,
- ,
+gcc_GAS_CHECK_FEATURE([.weakref], gcc_cv_as_weakref,,
  [ .weakref foobar, barfnot],,
 [AC_DEFINE(HAVE_GAS_WEAKREF, 1, [Define if your assembler supports .weakref.])])
 
-gcc_GAS_CHECK_FEATURE([.nsubspa comdat], gcc_cv_as_nsubspa_comdat,
- ,
+gcc_GAS_CHECK_FEATURE([.nsubspa comdat], gcc_cv_as_nsubspa_comdat,,
  [ .SPACE $TEXT$
.NSUBSPA $CODE$,COMDAT],,
 [AC_DEFINE(HAVE_GAS_NSUBSPA_COMDAT, 1, [Define if your assembler supports .nsubspa comdat option.])])
@@ -2954,8 +2947,7 @@ case "${target}" in
 foobar:'
 ;;
 esac
-gcc_GAS_CHECK_FEATURE([.hidden], gcc_cv_as_hidden,
- , [$conftest_s])
+gcc_GAS_CHECK_FEATURE([.hidden], gcc_cv_as_hidden,, [$conftest_s])
 case "${target}" in
   *-*-darwin*)
 # Darwin as has some visibility support, though with a different syntax.
@@ -3173,8 +3165,7 @@ gcc_AC_INITFINI_ARRAY
 # Check if we have .[us]leb128, and support symbol arithmetic with it.
 # Older versions of GAS and some non-GNU assemblers, have a bugs handling
 # these directives, even when they appear to accept them.
-gcc_GAS_CHECK_FEATURE([.sleb128 and .uleb128], gcc_cv_as_leb128,
- ,
+gcc_GAS_CHECK_FEATURE([.sleb128 and .uleb128], gcc_cv_as_leb128,,
 [  .data
.uleb128 L2 - L1
 L1:
@@ -3212,8 +3203,7 @@ gcc_fn_eh_frame_ro () {
 }
 
 # Check if we have assembler support for unwind directives.
-gcc_GAS_CHECK_FEATURE([cfi directives], gcc_cv_as_cfi_directive,
-  ,
+gcc_GAS_CHECK_FEATURE([cfi directives], gcc_cv_as_cfi_directive,,
 [  .text
.cfi_startproc
.cfi_offset 0, 0
@@ -3268,8 +3258,7 @@ gcc_GAS_CHECK_FEATURE([cfi directives], gcc_cv_as_cfi_directive,
 ;;
 esac])
 if test $gcc_cv_as_cfi_directive = yes && test x$gcc_cv_objdump != x; then
-gcc_GAS_CHECK_FEATURE([working cfi advance], gcc_cv_as_cfi_advance_working,
-  ,
+gcc_GAS_CHECK_FEATURE([working cfi advance], gcc_cv_as_cfi_advance_working,,
 [  .text
.cfi_startproc
.cfi_adjust_cfa_offset 64
@@ -3332,8 +3321,7 @@ AC_DEFINE_UNQUOTED(HAVE_GAS_CFI_SECTIONS_DIRECTIVE,
 
 # GAS versions up to and including 2.11.0 may mis-optimize
 # .eh_frame data.
-gcc_GAS_CHECK_FEATURE(eh_frame optimization, gcc_cv_as_eh_frame,
-  ,
+gcc_GAS_CHECK_FEATURE(eh_frame optimization, gcc_cv_as_eh_frame,,
 [  .text
 .LFB1:
.4byte  0
@@ -3636,8 +3624,7 @@ case "${target}" in
 esac
 
 gcc_GAS_CHECK_FEATURE([line table is_stmt support],
- gcc_cv_as_is_stmt,
- ,
+ gcc_cv_as_is_stmt,,
 [  .text
.file 1 "conf.c"
.loc 1 1 0 is_stmt 1],,
@@ -3645,8 +3632,7 @@ gcc_GAS_CHECK_FEATURE([line table is_stmt support],
   [Define if your assembler supports the .loc is_stmt sub-directive.])])
 
 gcc_GAS_CHECK_FEATURE([line table discriminator support],
- gcc_cv_as_discriminator,
- ,
+ gcc_cv_as_discriminator,,
 [  .text
.file 1 "conf.c"
.loc 1 1 0 discriminator 1],,
@@ -4741,16 +4727,15 @@ changequote([,])dnl
# Recent binutils allows the three-operand fo

[PATCH 4/4] configure: remove gas versions from tls check

2021-07-21 Thread Serge Belyshev

configure: remove gas versions from tls check

gcc/ChangeLog:

* configure.ac (thread-local storage support): Remove tls_first_major
and tls_first_minor.  Use "$conftest_s" to check support.
* configure: Regenerate.
---
 gcc/configure| 58 +---
 gcc/configure.ac | 58 +---
 2 files changed, 2 insertions(+), 114 deletions(-)

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 6b452904ce7..02211b376bf 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3653,8 +3653,6 @@ esac], [])
 
 # Thread-local storage - the check is heavily parameterized.
 conftest_s=
-tls_first_major=
-tls_first_minor=
 tls_as_opt=
 case "$target" in
 changequote(,)dnl
@@ -3677,15 +3675,11 @@ foo:.long   25
ldah$2,foo($29) !tprelhi
lda $3,foo($2)  !tprello
lda $4,foo($29) !tprel'
-   tls_first_major=2
-   tls_first_minor=13
tls_as_opt=--fatal-warnings
;;
   arc*-*-*)
 conftest_s='
add_s r0,r0, @foo@tpoff'
-   tls_first_major=2
-   tls_first_minor=23
;;
   cris-*-*|crisv32-*-*)
 conftest_s='
@@ -3694,8 +3688,6 @@ x:  .long   25
 .text
move.d x:IE,$r10
nop'
-   tls_first_major=2
-   tls_first_minor=20
tls_as_opt=--fatal-warnings
;;
   frv*-*-*)
@@ -3704,8 +3696,6 @@ x:  .long   25
 x:  .long   25
 .text
 call#gettlsoff(x)'
-   tls_first_major=2
-   tls_first_minor=14
;;
   hppa*-*-linux*)
 conftest_s='
@@ -3732,8 +3722,6 @@ foo:  .long   25
mfctl %cr27,%t1 
addil LR%foo-$tls_leoff$,%t1
ldo RR%foo-$tls_leoff$(%r1),%t2'
-   tls_first_major=2
-   tls_first_minor=15
tls_as_opt=--fatal-warnings
;;
   arm*-*-*)
@@ -3746,8 +3734,6 @@ foo:  .long   25
 .word foo(tlsgd)
 .word foo(tlsldm)
 .word foo(tlsldo)'
-   tls_first_major=2
-   tls_first_minor=17
;;
   i[34567]86-*-* | x86_64-*-*)
 case "$target" in
@@ -3761,8 +3747,6 @@ foo:  .long   25
 if test x$on_solaris = xyes && test x$gas_flag = xno; then
   conftest_s='
.section .tdata,"awt",@progbits'
-  tls_first_major=0
-  tls_first_minor=0
   tls_section_flag=t
 changequote([,])dnl
   AC_DEFINE(TLS_SECTION_ASM_FLAG, 't',
@@ -3771,8 +3755,6 @@ changequote(,)dnl
 else
   conftest_s='
.section ".tdata","awT",@progbits'
-  tls_first_major=2
-  tls_first_minor=14
   tls_section_flag=T
   tls_as_opt="--fatal-warnings"
 fi
@@ -3831,8 +3813,6 @@ foo:  data8   25
addlr20 = @tprel(foo#), gp
addsr22 = @tprel(foo#), r13
movlr24 = @tprel(foo#)'
-   tls_first_major=2
-   tls_first_minor=13
tls_as_opt=--fatal-warnings
;;
   microblaze*-*-*)
@@ -3843,8 +3823,6 @@ x:
.text
addik r5,r20,x@TLSGD
addik r5,r20,x@TLSLDM'
-   tls_first_major=2
-   tls_first_minor=20
tls_as_opt='--fatal-warnings'
;;
   mips*-*-*)
@@ -3860,8 +3838,6 @@ x:
lw $4, %gottprel(x)($28)
lui $4, %tprel_hi(x)
addiu $4, $4, %tprel_lo(x)'
-   tls_first_major=2
-   tls_first_minor=16
tls_as_opt='-32 --fatal-warnings'
;;
   m68k-*-*)
@@ -3876,15 +3852,11 @@ foo:
move.l x@TLSLDO(%a5),%a0
move.l x@TLSIE(%a5),%a0
move.l x@TLSLE(%a5),%a0'
-   tls_first_major=2
-   tls_first_minor=19
tls_as_opt='--fatal-warnings'
;;
   nios2-*-*)
   conftest_s='
.section ".tdata","awT",@progbits'
-   tls_first_major=2
-   tls_first_minor=23
tls_as_opt="--fatal-warnings"
;;
   aarch64*-*-*)
@@ -3896,8 +3868,6 @@ foo:  .long   25
add   x0, x0, #:tlsgd_lo12:x
 bl__tls_get_addr
nop'
-   tls_first_major=2
-   tls_first_minor=20
tls_as_opt='--fatal-warnings'
;;
   or1k*-*-*)
@@ -3908,8 +3878,6 @@ foo:  .long   25
l.movhi r3, tpoffha(foo)
l.add   r3, r3, r10
l.lwz   r4, tpofflo(foo)(r3)'
-tls_first_major=2
-tls_first_minor=30
 tls_as_opt=--fatal-warnings
 ;;
   powerpc-ibm-aix*)
@@ -3927,8 +3895,6 @@ LC..1:
.csect a[TL],4
 a:
.space 4'
-   tls_first_major=0
-   tls_first_minor=0
;;
   powerpc64*-*-*)
 conftest_s='
@@ -3960,8 +3926,6 @@ x3:   .space 8
add 9,9,3
bl .__tls_get_addr
nop'
-   tls_first_major=2
-   tls_first_minor=14
tls_as_opt="-a64 --fatal-warnings"
;;
   powerpc*-*-*)
@@ -3986,8 +3950,6 @@ x3:   .space 4
addi 9,2,x1@tprel
addis 9,2,x2@tprel@ha
addi 9,9,x2@tprel@l'
-   tls_first_major=2
-   tls_first_minor=14
tls_as_opt="-a32 --fatal-warnings"
;;
   riscv*-

Re: [NEWS] libstdc++: Fix testsuite for skipping gdb tests on remote/non-native target

2021-07-21 Thread Marc Poulhies via Gcc-patches

Here is the correct patch attached this time; sorry for the incorrect previous one!

Marc

- Original Message -
> From: "gcc-patches" 
> To: "gcc-patches" , "libstdc++" 
> 
> Cc: "Luc Michel" 
> Sent: Tuesday, July 20, 2021 4:12:16 PM
> Subject: [NEWS]  libstdc++: Fix testsuite for skipping gdb tests on remote/non-native target

> This fixes an incorrect invocation of gdb on remote targets, where DejaGNU
> would try to run the host's gdb in the remote target simulator.
> gdb-test skips testing when the target is remote or non-native, but the gdb
> version check function does not.
> 
> libstdc++-v3/ChangeLog:
>* testsuite/lib/gdb-test.exp (gdb_batch_check): Exit if non native or remote
> target.


diff --git a/libstdc++-v3/testsuite/lib/gdb-test.exp b/libstdc++-v3/testsuite/lib/gdb-test.exp
index af20c85e5a0..0ec9ac46c68 100644
--- a/libstdc++-v3/testsuite/lib/gdb-test.exp
+++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
@@ -244,6 +244,8 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
 
 # Invoke gdb with a command and pattern-match the output.
 proc gdb_batch_check {command pattern} {
+if { ![isnative] || [is_remote target] } { return 0 }
+
 set gdb_name $::env(GUALITY_GDB_NAME)
 set cmd "$gdb_name -nw -nx -quiet -batch -ex \"$command\""
 send_log "Spawning: $cmd\n"

Re: [PUSHED] Abstract out non_null adjustments in ranger.

2021-07-21 Thread Aldy Hernandez via Gcc-patches

As I mentioned when I pushed the patch in this thread, I have run into
cases where a pointer has a non-varying range, but it includes 0.  The
varying check causes no further refinements to be done.  The previous
cases I had seen were in follow-up threader work, so I was delaying
pushing this until then.  But I'm now going through VRP cases that
evrp is missing, and this is one of the main culprits.
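
The effect of dropping the `varying_p ()` guard can be illustrated with a toy unsigned range type. This is a simplified model, not the ranger's `irange` API: a range is a single inclusive interval, VARYING means `[0, UINT64_MAX]`, intersecting with "non-zero" just raises a zero lower bound to 1, and `toy_range`, `toy_adjust_range` and its `require_varying` flag are hypothetical. With the guard, a dereferenced pointer whose range is `[0, 0xffef]` is left untouched because that range is not VARYING; without it, the range is refined to `[1, 0xffef]`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy range: a single inclusive interval [lo, hi] of unsigned values.  */
typedef struct
{
  uint64_t lo, hi;
  bool undefined;   /* empty range */
} toy_range;

static bool
toy_varying_p (toy_range r)
{
  return !r.undefined && r.lo == 0 && r.hi == UINT64_MAX;
}

/* Intersect R with the non-zero range [1, UINT64_MAX].  */
static toy_range
toy_intersect_nonzero (toy_range r)
{
  if (r.undefined)
    return r;
  if (r.hi == 0)        /* [0, 0] minus zero is empty */
    {
      r.undefined = true;
      return r;
    }
  if (r.lo == 0)
    r.lo = 1;
  return r;
}

/* Model of adjust_range: refine *R when the name is dereferenced in
   the block and non-call exceptions cannot interrupt the block.
   REQUIRE_VARYING models the varying_p () check under discussion.  */
static bool
toy_adjust_range (toy_range *r, bool has_deref, bool can_throw_non_call,
                  bool require_varying)
{
  if (can_throw_non_call || !has_deref)
    return false;
  if (require_varying && !toy_varying_p (*r))
    return false;
  *r = toy_intersect_nonzero (*r);
  return true;
}
```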

I will push this once tests are done.  If, for some reason, varying
restrictions are needed, we can always adjust the callers of
adjust_range.

Aldy

On Thu, Jul 15, 2021 at 2:21 PM Aldy Hernandez  wrote:
>
> There are 4 exact copies of the non-null range adjusting code in the
> ranger.  This patch abstracts the functionality into a separate method.
>
> As a follow-up I would like to remove the varying_p check, since I have
> seen incoming ranges such as [0, 0xffef] which are not varying, but
> are not-null.  Removing the varying restriction catches those.
>
> Tested on x86-64 Linux.
>
> Pushed to trunk.
>
> p.s. Andrew, what are your thoughts on removing the varying_p() check as
> a follow-up?
>
> gcc/ChangeLog:
>
> * gimple-range-cache.cc (non_null_ref::adjust_range): New.
> (ranger_cache::range_of_def): Call adjust_range.
> (ranger_cache::entry_range): Same.
> * gimple-range-cache.h (non_null_ref::adjust_range): New.
> * gimple-range.cc (gimple_ranger::range_of_expr): Call
> adjust_range.
> (gimple_ranger::range_on_entry): Same.
> ---
>  gcc/gimple-range-cache.cc | 35 ++-
>  gcc/gimple-range-cache.h  |  2 ++
>  gcc/gimple-range.cc   |  8 ++--
>  3 files changed, 30 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
> index 98ecdbbd68e..23597ade802 100644
> --- a/gcc/gimple-range-cache.cc
> +++ b/gcc/gimple-range-cache.cc
> @@ -81,6 +81,29 @@ non_null_ref::non_null_deref_p (tree name, basic_block bb, bool search_dom)
>return false;
>  }
>
> +// If NAME has a non-null dereference in block BB, adjust R with the
> +// non-zero information from non_null_deref_p, and return TRUE.  If
> +// SEARCH_DOM is true, non_null_deref_p should search the dominator tree.
> +
> +bool
> +non_null_ref::adjust_range (irange &r, tree name, basic_block bb,
> +   bool search_dom)
> +{
> +  // Check if pointers have any non-null dereferences.  Non-call
> +  // exceptions mean we could throw in the middle of the block, so just
> +  // punt for now on those.
> +  if (!cfun->can_throw_non_call_exceptions
> +  && r.varying_p ()
> +  && non_null_deref_p (name, bb, search_dom))
> +{
> +  int_range<2> nz;
> +  nz.set_nonzero (TREE_TYPE (name));
> +  r.intersect (nz);
> +  return true;
> +}
> +  return false;
> +}
> +
>  // Allocate an populate the bitmap for NAME.  An ON bit for a block
>  // index indicates there is a non-null reference in that block.  In
>  // order to populate the bitmap, a quick run of all the immediate uses
> @@ -857,9 +880,8 @@ ranger_cache::range_of_def (irange &r, tree name, basic_block bb)
> r = gimple_range_global (name);
>  }
>
> -  if (bb && r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
> -  !cfun->can_throw_non_call_exceptions)
> -r = range_nonzero (TREE_TYPE (name));
> +  if (bb)
> +m_non_null.adjust_range (r, name, bb, false);
>  }
>
>  // Get the range of NAME as it occurs on entry to block BB.
> @@ -878,12 +900,7 @@ ranger_cache::entry_range (irange &r, tree name, basic_block bb)
>if (!m_on_entry.get_bb_range (r, name, bb))
>  range_of_def (r, name);
>
> -  // Check if pointers have any non-null dereferences.  Non-call
> -  // exceptions mean we could throw in the middle of the block, so just
> -  // punt for now on those.
> -  if (r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
> -  !cfun->can_throw_non_call_exceptions)
> -r = range_nonzero (TREE_TYPE (name));
> +  m_non_null.adjust_range (r, name, bb, false);
>  }
>
>  // Get the range of NAME as it occurs on exit from block BB.
> diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
> index ecf63dc01b3..f842e9c092a 100644
> --- a/gcc/gimple-range-cache.h
> +++ b/gcc/gimple-range-cache.h
> @@ -34,6 +34,8 @@ public:
>non_null_ref ();
>~non_null_ref ();
>bool non_null_deref_p (tree name, basic_block bb, bool search_dom = true);
> +  bool adjust_range (irange &r, tree name, basic_block bb,
> +bool search_dom = true);
>  private:
>vec <bitmap> m_nn;
>void process_name (tree name);
> diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
> index 1851339c528..b210787d0b7 100644
> --- a/gcc/gimple-range.cc
> +++ b/gcc/gimple-range.cc
> @@ -69,9 +69,7 @@ gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
>if (def_stmt && gimple_bb (def_stmt) == bb)
>  {
>range_of_stmt (r, def_stmt, expr);
>

Re: sync up new type indices for body adjustments

2021-07-21 Thread Martin Jambor

Hi,

On Wed, Jul 21 2021, Alexandre Oliva wrote:
> On Jul 19, 2021, Martin Jambor  wrote:
>
>> So I would first check how come that you request IPA_PARAM_OP_COPY of
>> something that does not seem to have a corresponding type but there is
>> a DECL
>
> The corresponding type is there all right; it was just stored in a
> different vector entry, because some IPA optimization, applied after my
> copying-and-wrapping pass, dropped several of the parms that came before
> the NEW parms added by my pass.
>
> This caused the types of the retained NEW parms to be pushed into lower
> indices in the type array, but then accessed as if all of the dropped
> parms were still there.  That can't be right.
>
> I was actually lucky that enough parms were dropped as to make the
> vector access out of range, flagged by checking.  If that wasn't the
> case, we might have silently accessed an unrelated parm type.
>
>
> Does this scenario make sense to you?
>
> I can try to get you some code for a custom pass to trigger the problem
> if you'd like to look more closely.
>
>> If you believe that what you're doing is correct
>
> I don't really know that it is.  IIRC back when I ran into this problem,
> the logic to change some of the parameters in the wrapped function to
> reference types was using NEW parameters.  Now I'm using COPY, save for
> actual NEW parms, and changing the type of the clone after
> create_version_clone_with_body.
>
> Now, what puzzles me is why we even care about that parm mapping
> afterwards.  The clone is created and materialized very early on, before
> any preexisting ipa transformations, and there were not any edges
> modified to use this clone.  As far as I'm concerned, it should be
> entirely independent from the function it was cloned from, and it makes
> no sense to me for IPA transformations applied to this clone to even
> care what the function it was originally cloned from was: the clone is
> already fully materialized, so argument back-mappings might as well stop
> at it.
>
> But I can't say I understand why it does that.  I haven't looked very
> much into its internals, I'm mostly just trying to use
> create_version_clone_with_body to clone a function, make some changes to
> it, and turn the original function into a wrapper.
>
> I'm not actually introducing IPA deferred transformations, and this is
> all done before any relevant IPA transformations.  I can't even say I'm
> using IPA proper, the reason I made it an IPA pass was because that has
> enabled multiple passes over functions, which was convenient for some
> purposes.  Then, I ended up iterating over aliases and undefined
> functions, and relying on the call graph instead of iterating over
> gimple bodies for some purposes, so now it *has* to be an IPA pass, but
> not a typical one in that it doesn't queue up IPA transformations to be
> applied at a later materialization.

So if I understand correctly, you clone during early tree optimizations
(or early-small-IPA passes) or even earlier, and yet somehow these
confuse clone materialization when it applies IPA modifications to
parameters.  I agree that should not be happening.

I cannot see how this can happen.  IPA-split and omp-simd also use
create_version_clone_with_body with parameter modifications and do not
cause this problem (and I have seen many interactions between ipa-split
and later IPA passes when debugging various issues).  Having said that,
these passes either act on fairly simple functions and/or do not do
sophisticated parameter modifications, so I would not be surprised if
there were bugs when doing them.

I am interested in making the infrastructure work for you, but at the
moment I unfortunately do not have an idea what the problem you are
facing might be.

Martin
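A toy model of the indexing problem Alexandre describes: once an IPA pass drops parameters and the surviving types are packed into a shorter vector, a lookup that still uses the original parameter position has to go through an old-to-new index map first. This is a C sketch with hypothetical names (compact_types, type_of_old_parm); it is not GCC's ipa-param-manipulation code.

```c
#include <stddef.h>

/* Hypothetical toy model (not GCC's ipa-param-manipulation API): parameter
   types are kept in a dense vector, and an IPA pass drops some parameters.  */

/* Copy the entries of OLD_TYPES whose KEEP flag is set into NEW_TYPES and
   record in REMAP the new index of each old entry (-1 if it was dropped).
   Returns the number of surviving entries.  */
size_t
compact_types (const char *const old_types[], const int keep[], size_t n,
               const char *new_types[], int remap[])
{
  size_t next = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (keep[i])
        {
          new_types[next] = old_types[i];
          remap[i] = (int) next++;
        }
      else
        remap[i] = -1;
    }
  return next;
}

/* Look up the type of the parameter originally at OLD_INDEX.  Indexing
   NEW_TYPES directly with OLD_INDEX is wrong once earlier entries were
   dropped, so the index is translated through REMAP first.  Returns NULL
   for a dropped parameter.  */
const char *
type_of_old_parm (const char *const new_types[], size_t new_len,
                  const int remap[], size_t old_index)
{
  int idx = remap[old_index];
  if (idx < 0 || (size_t) idx >= new_len)
    return NULL;
  return new_types[idx];
}
```

If parameters 0, 1 and 3 are dropped, old parameter 4 ends up at new index 1. Indexing the packed vector with 4 directly would be out of range, which is the kind of access the checking build flagged.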

Re: [PATCH 10/55] rs6000: Main function with stubs for parsing and output

2021-07-21 Thread Segher Boessenkool

On Tue, Jul 20, 2021 at 08:51:58PM -0500, Bill Schmidt wrote:
> On 7/20/21 6:22 PM, Segher Boessenkool wrote:
> >On Tue, Jul 20, 2021 at 05:19:54PM -0500, Bill Schmidt wrote:
> >>See the main function.  All three files are guaranteed to have been
> >>opened for writing when this is called, but some of them may have
> >>already been closed.  So the fclose calls may fail to do anything, but
> >>the unlinks will always delete the output files. This is done to avoid
> >>leaving garbage lying around after a parsing failure.
> >That is much worse actually!  From the C spec:
> >   The value of a pointer to a FILE object is indeterminate after the
> >   associated file is closed
> >so this is undefined behaviour.
> >
> >Please fix that?  Just assign 0 after closing, and guard the fclose on
> >error with that?
> 
> No, you're misunderstanding.
> 
> unlink doesn't use a pointer to a FILE object.  It takes a string 
> representing the path and deletes that name from the filesystem. If 
> nobody has the file open, the file is then deleted.

Ah, "the fclose calls may fail to do anything" confused me.  That should
never happen (it can get an error, maybe you meant that?)

> In this case the files are all always closed before unlink is called.  
> The names are removed from the filesystem, and the files are deleted.  
> If somehow the file managed to remain open (really impossible), the file 
> would not be deleted, but the name would be.  No undefined behavior.

Calling fclose on the same FILE * twice is UB.  You said you do that,
but that is probably not true?


Segher
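Segher's suggestion (clear the pointer after closing and guard the error-path fclose on it) can be sketched as follows. The helper names are hypothetical and not taken from rs6000-gen-builtins.c. The sketch uses ISO C remove() where the generator uses POSIX unlink(). Both take a path name rather than a FILE pointer.

```c
#include <stdio.h>

/* Hypothetical helper: close *FP at most once.  After fclose the FILE
   pointer is indeterminate, so it is cleared and later calls become
   no-ops.  Returns 0 on success, EOF if fclose reported an error.  */
int
close_once (FILE **fp)
{
  int rc = 0;
  if (*fp)
    {
      rc = fclose (*fp);
      *fp = NULL;
    }
  return rc;
}

/* Error path: close whatever is still open, then delete the partial
   output files by name.  remove/unlink operate on path names, so they
   are safe whether or not the stream was already closed.  */
void
cleanup_on_error (FILE **files[], const char *const names[], int n)
{
  for (int i = 0; i < n; i++)
    {
      close_once (files[i]);
      remove (names[i]);
    }
}
```

Bill's eventual plan, moving all the fclose calls to one place at the end, also avoids the double close. A guard like close_once mainly helps when some files may already have been closed on the success path.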

Re: [NEWS] libstdc++: Fix testsuite for skipping gdb tests on remote/non-native target

2021-07-21 Thread Jonathan Wakely via Gcc-patches

On Wed, 21 Jul 2021 at 16:02, Marc Poulhies via Libstdc++ wrote:
>
> With the correct patch attached, sorry for the incorrect previous one !

Thanks for the patch. I agree we should skip the version checks, not
only the actual tests. But I wonder whether we want to do that in
xmethods.exp and prettyprinters.exp rather than in the gdb_batch_check
proc. Or maybe like this instead:

--- a/libstdc++-v3/testsuite/lib/gdb-test.exp
+++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
@@ -280,6 +280,8 @@ proc gdb_batch_check {command pattern} {
 # but not earlier versions.
 # Return 1 if the version is ok, 0 otherwise.
 proc gdb_version_check {} {
+    if { ![isnative] || [is_remote target] } { return 0 }
+
     return [gdb_batch_check "python print(gdb.lookup_global_symbol)" \
               ""]
 }
@@ -288,6 +290,8 @@ proc gdb_version_check {} {
 # in a manner similar to the check for a version of gdb which supports the
 # pretty-printer tests below.
 proc gdb_version_check_xmethods {} {
+    if { ![isnative] || [is_remote target] } { return 0 }
+
     return [gdb_batch_check \
               "python import gdb.xmethod; print(gdb.xmethod.XMethod)" \
               ""]

I don't think it really makes much difference, I'm just unsure what is
"cleaner" and more consistent with DG conventions and/or the rest of
the gdb-test.exp file.

Re: [PATCH] x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions

2021-07-21 Thread Uros Bizjak via Gcc-patches

On Wed, 21 Jul 2021 at 14:23, H.J. Lu wrote:

> Since
>
> commit 39671f87b2df6a1894cc11a161e4a7949d1ddccd
> Author: H.J. Lu 
> Date:   Thu Apr 15 05:59:48 2021 -0700
>
> x86: Use crc32 target option for CRC32 intrinsics
>
> enabled OPTION_MASK_ISA_CRC32 for -msse4 and removed TARGET_SSE4_2 check
> in sse4_2_crc32 pattens, remove OPTION_MASK_ISA_SSE4_2 from CRC32
> _builtin functions.
>
> gcc/
>
> PR target/101549
> * config/i386/i386-builtin.def: Remove OPTION_MASK_ISA_SSE4_2
> from CRC32 _builtin functions.
>
> gcc/testsuite/
>
> PR target/101549
> * gcc.target/i386/crc32-6.c: New test.
>

OK.

Thanks,
Uros.

---
>  gcc/config/i386/i386-builtin.def        |  8 ++++----
>  gcc/testsuite/gcc.target/i386/crc32-6.c | 13 +++++++++++++
>  2 files changed, 17 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/crc32-6.c
>
> diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
> index 1cc0cc6968c..4b1ae0eb84c 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -970,10 +970,10 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_ptestv2di, "__builtin_ia32_pte
>
>  /* SSE4.2 */
>  BDESC (OPTION_MASK_ISA_SSE4_2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpgtq", IX86_BUILTIN_PCMPGTQ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI)
> -BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32qi, "__builtin_ia32_crc32qi", IX86_BUILTIN_CRC32QI, UNKNOWN, (int) UINT_FTYPE_UINT_UCHAR)
> -BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32hi, "__builtin_ia32_crc32hi", IX86_BUILTIN_CRC32HI, UNKNOWN, (int) UINT_FTYPE_UINT_USHORT)
> -BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32si, "__builtin_ia32_crc32si", IX86_BUILTIN_CRC32SI, UNKNOWN, (int) UINT_FTYPE_UINT_UINT)
> -BDESC (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32 | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse4_2_crc32di, "__builtin_ia32_crc32di", IX86_BUILTIN_CRC32DI, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64)
> +BDESC (OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32qi, "__builtin_ia32_crc32qi", IX86_BUILTIN_CRC32QI, UNKNOWN, (int) UINT_FTYPE_UINT_UCHAR)
> +BDESC (OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32hi, "__builtin_ia32_crc32hi", IX86_BUILTIN_CRC32HI, UNKNOWN, (int) UINT_FTYPE_UINT_USHORT)
> +BDESC (OPTION_MASK_ISA_CRC32, 0, CODE_FOR_sse4_2_crc32si, "__builtin_ia32_crc32si", IX86_BUILTIN_CRC32SI, UNKNOWN, (int) UINT_FTYPE_UINT_UINT)
> +BDESC (OPTION_MASK_ISA_CRC32 | OPTION_MASK_ISA_64BIT, 0, CODE_FOR_sse4_2_crc32di, "__builtin_ia32_crc32di", IX86_BUILTIN_CRC32DI, UNKNOWN, (int) UINT64_FTYPE_UINT64_UINT64)
>
>  /* SSE4A */
>  BDESC (OPTION_MASK_ISA_SSE4A, 0, CODE_FOR_sse4a_extrqi, "__builtin_ia32_extrqi", IX86_BUILTIN_EXTRQI, UNKNOWN, (int) V2DI_FTYPE_V2DI_UINT_UINT)
> diff --git a/gcc/testsuite/gcc.target/i386/crc32-6.c b/gcc/testsuite/gcc.target/i386/crc32-6.c
> new file mode 100644
> index 00000000000..464e3444069
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/crc32-6.c
> @@ -0,0 +1,13 @@
> +/* PR target/101549 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse4 -mno-crc32" } */
> +
> +#include 
> +
> +unsigned int
> +test_mm_crc32_u8 (unsigned int CRC, unsigned char V)
> +{
> +  return _mm_crc32_u8 (CRC, V);
> +}
> +
> +/* { dg-error "needs isa option -mcrc32" "" { target *-*-* } 0  } */
> --
> 2.31.1
>
>
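When checking code that uses _mm_crc32_u8 (gated only on the crc32 ISA option after this patch), a portable reference can help. The following C sketch implements the bitwise CRC-32C (Castagnoli) byte step that the crc32 instruction performs. The function names are hypothetical. The claim that it matches the intrinsic bit for bit is an assumption worth confirming on hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise model of one CRC32 byte step: CRC-32C with the reflected
   Castagnoli polynomial 0x82F63B78.  Like the intrinsic, no initial or
   final inversion is applied here.  */
uint32_t
crc32c_u8 (uint32_t crc, uint8_t v)
{
  crc ^= v;
  for (int k = 0; k < 8; k++)
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

/* Conventional CRC-32C of a buffer: start from all ones and invert the
   result at the end.  */
uint32_t
crc32c (const void *data, size_t len)
{
  const uint8_t *p = data;
  uint32_t crc = 0xFFFFFFFFu;
  while (len--)
    crc = crc32c_u8 (crc, *p++);
  return crc ^ 0xFFFFFFFFu;
}
```

crc32c ("123456789", 9) gives the standard CRC-32C check value 0xE3069283. On a machine with the instruction, a build with -mcrc32 could compare crc32c_u8 with _mm_crc32_u8 byte by byte.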

Re: [PATCH 10/55] rs6000: Main function with stubs for parsing and output

2021-07-21 Thread Bill Schmidt via Gcc-patches


On 7/21/21 10:43 AM, Segher Boessenkool wrote:
> On Tue, Jul 20, 2021 at 08:51:58PM -0500, Bill Schmidt wrote:
>> On 7/20/21 6:22 PM, Segher Boessenkool wrote:
>>> On Tue, Jul 20, 2021 at 05:19:54PM -0500, Bill Schmidt wrote:
>>>> See the main function.  All three files are guaranteed to have been
>>>> opened for writing when this is called, but some of them may have
>>>> already been closed.  So the fclose calls may fail to do anything, but
>>>> the unlinks will always delete the output files. This is done to avoid
>>>> leaving garbage lying around after a parsing failure.
>>>
>>> That is much worse actually!  From the C spec:
>>>    The value of a pointer to a FILE object is indeterminate after the
>>>    associated file is closed
>>> so this is undefined behaviour.
>>>
>>> Please fix that?  Just assign 0 after closing, and guard the fclose on
>>> error with that?
>>
>> No, you're misunderstanding.
>>
>> unlink doesn't use a pointer to a FILE object.  It takes a string
>> representing the path and deletes that name from the filesystem. If
>> nobody has the file open, the file is then deleted.
>
> Ah, "the fclose calls may fail to do anything" confused me.  That should
> never happen (it can get an error, maybe you meant that?)
>
>> In this case the files are all always closed before unlink is called.
>> The names are removed from the filesystem, and the files are deleted.
>> If somehow the file managed to remain open (really impossible), the file
>> would not be deleted, but the name would be.  No undefined behavior.
>
> Calling fclose on the same FILE * twice is UB.  You said you do that,
> but that is probably not true?

That is unfortunately true.  I guess I'll have to track which files have
been closed, or otherwise make this cleaner.  I had misremembered that
duplicate fclose was ignored. :/

Bill

> Segher

Re: [PATCH 10/55] rs6000: Main function with stubs for parsing and output

2021-07-21 Thread Bill Schmidt via Gcc-patches




On 7/21/21 11:08 AM, Bill Schmidt wrote:
> On 7/21/21 10:43 AM, Segher Boessenkool wrote:
>> On Tue, Jul 20, 2021 at 08:51:58PM -0500, Bill Schmidt wrote:
>>> On 7/20/21 6:22 PM, Segher Boessenkool wrote:
>>>> On Tue, Jul 20, 2021 at 05:19:54PM -0500, Bill Schmidt wrote:
>>>>> See the main function.  All three files are guaranteed to have been
>>>>> opened for writing when this is called, but some of them may have
>>>>> already been closed.  So the fclose calls may fail to do anything, but
>>>>> the unlinks will always delete the output files. This is done to avoid
>>>>> leaving garbage lying around after a parsing failure.
>>>>
>>>> That is much worse actually!  From the C spec:
>>>>    The value of a pointer to a FILE object is indeterminate after the
>>>>    associated file is closed
>>>> so this is undefined behaviour.
>>>>
>>>> Please fix that?  Just assign 0 after closing, and guard the fclose on
>>>> error with that?
>>>
>>> No, you're misunderstanding.
>>>
>>> unlink doesn't use a pointer to a FILE object.  It takes a string
>>> representing the path and deletes that name from the filesystem. If
>>> nobody has the file open, the file is then deleted.
>>
>> Ah, "the fclose calls may fail to do anything" confused me.  That should
>> never happen (it can get an error, maybe you meant that?)
>>
>>> In this case the files are all always closed before unlink is called.
>>> The names are removed from the filesystem, and the files are deleted.
>>> If somehow the file managed to remain open (really impossible), the file
>>> would not be deleted, but the name would be.  No undefined behavior.
>>
>> Calling fclose on the same FILE * twice is UB.  You said you do that,
>> but that is probably not true?
>
> That is unfortunately true.  I guess I'll have to track which files have
> been closed, or otherwise make this cleaner.  I had misremembered that
> duplicate fclose was ignored. :/


I'll just move all the fclose calls to the end and avoid the problem.

Bill

> Bill
>
>> Segher

[committed] libstdc++: Make __gnu_cxx::sequence_buffer move-aware [PR101542]

2021-07-21 Thread Jonathan Wakely via Gcc-patches

The PR explains that Clang trunk now selects a different constructor
when a non-const sequence_buffer is returned in a context where it
qualifies as an implicitly-movable entity. Because lookup is first
performed using an rvalue, the sequence_buffer(const sequence_buffer&)
constructor gets chosen, which makes a copy instead of a "pseudo-move"
via the sequence_buffer(sequence_buffer&) constructor. The problem isn't
seen with GCC because as noted in the r11-2412 commit log, GCC actually
implements a slightly modified rule that avoids breaking exactly this
type of code.

This patch adds a move constructor to sequence_buffer, so that implicit
or explicit moves will have the same effect, calling the
sequence_buffer(sequence_buffer&) constructor. A move assignment
operator is also added to make move assignment work similarly.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101542
* include/ext/rope (sequence_buffer): Add move constructor and
move assignment operator.
* testsuite/ext/rope/101542.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 8edb61420502c62fa2cccdd98876a9aa039b72a6
Author: Jonathan Wakely 
Date:   Wed Jul 21 15:29:19 2021

diff --git a/libstdc++-v3/include/ext/rope b/libstdc++-v3/include/ext/rope
index 81e4f23708f..9681dbc6225 100644
--- a/libstdc++-v3/include/ext/rope
+++ b/libstdc++-v3/include/ext/rope
@@ -203,6 +203,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::copy(__x._M_buffer, __x._M_buffer + __x._M_buf_count, _M_buffer);
   }
   
+  // Non-const "copy" modifies the parameter - yuck
   sequence_buffer(sequence_buffer& __x)
   {
__x.flush();
@@ -213,6 +214,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   sequence_buffer(_Sequence& __s)
   : _M_prefix(&__s), _M_buf_count(0) { }
   
+  // Non-const "copy" modifies the parameter - yuck
   sequence_buffer&
   operator=(sequence_buffer& __x)
   {
@@ -230,7 +232,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::copy(__x._M_buffer, __x._M_buffer + __x._M_buf_count, _M_buffer);
return *this;
   }
-  
+
+#if __cplusplus >= 201103L
+  sequence_buffer(sequence_buffer&& __x) : sequence_buffer(__x) { }
+  sequence_buffer& operator=(sequence_buffer&& __x) { return *this = __x; }
+#endif
+
   void
   push_back(value_type __x)
   {
diff --git a/libstdc++-v3/testsuite/ext/rope/101542.cc 
b/libstdc++-v3/testsuite/ext/rope/101542.cc
new file mode 100644
index 00000000000..e89f23d3d48
--- /dev/null
+++ b/libstdc++-v3/testsuite/ext/rope/101542.cc
@@ -0,0 +1,27 @@
+// { dg-do run { target c++11 } }
+// PR libstdc++/101542
+#include 
+#include 
+
+template T f(T x) { return x; }
+template T g(T x) { return std::move(x); }
+
+int main()
+{
+  std::string s;
+  {
+    __gnu_cxx::sequence_buffer a(s);
+    {
+      __gnu_cxx::sequence_buffer b = std::move(a);
+      b.push_back('h');
+      b.push_back('e');
+      b.push_back('l');
+      b.push_back('l');
+      b.push_back('o');
+
+      __gnu_cxx::sequence_buffer c;
+      c = f(g((std::move(b))));
+    }
+  }
+  VERIFY( s == "hello" );
+}

RE: [PATCH libatomic/arm] avoid warning on constant addresses (PR 101379)

2021-07-21 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Gcc-patches On Behalf Of Martin Sebor
> via Gcc-patches
> Sent: 10 July 2021 00:11
> To: gcc-patches ; Christophe Lyon
> 
> Subject: [PATCH libatomic/arm] avoid warning on constant addresses (PR
> 101379)
> 
> The attached tweak avoids the new -Warray-bounds instances when
> building libatomic for arm. Christophe confirms it resolves
> the problem (thank you!)
> 
> As we have discussed, the main goal of this class of warnings
> is to detect accesses at addresses derived from null pointers
> (e.g., to struct members or array elements at a nonzero offset).
> Diagnosing accesses at hardcoded addresses is incidental because
> at the stage they are detected the two are not distinguishable
> from one another.
> 
> I'm planning (hoping) to implement detection of invalid pointer
> arithmetic involving null for GCC 12, so this patch is a stopgap
> solution to unblock the arm libatomic build without compromising
> the warning.  Once the new detection is in place these workarounds
> can be removed or replaced with something more appropriate (e.g.,
> declaring the objects at the hardwired addresses with an attribute
> like AVR's address or io; that would enable bounds checking at
> those addresses as well).

Let's get this patch in to unbreak bootstrap while the discussion on how to 
avoid these workarounds continues...
So ok.
Thanks,
Kyrill

> 
> Martin

Re: [PATCH] PR fortran/101514 - ICE: out of memory allocating 18446744073709551600 bytes

2021-07-21 Thread Tobias Burnus


On 20.07.21 21:49, Harald Anlauf via Gcc-patches wrote:


While investigating one of Gerhard's latest bug reports, which was almost
obvious to fix after a hint by Richard Biener, I found further variants of
valid and invalid code that lead to either NULL pointer dereferences or
similar OOM situations.

Regtested on x86_64-pc-linux-gnu.  OK for mainline / 11-branch?


LGTM – thanks!

Tobias


Fortran: ICE, OOM while calculating sizes of derived type array components

gcc/fortran/ChangeLog:

  PR fortran/101514
  * target-memory.c (gfc_interpret_derived): Size of array component
  of derived type can only be computed here for explicit size.
  * trans-types.c (gfc_get_nodesc_array_type): Do not dereference
  NULL pointers.

gcc/testsuite/ChangeLog:

  PR fortran/101514
  * gfortran.dg/pr101514.f90: New test.


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [PATCH libatomic/arm] avoid warning on constant addresses (PR 101379)

2021-07-21 Thread Martin Sebor via Gcc-patches


On 7/21/21 10:41 AM, Kyrylo Tkachov wrote:
>
>> -----Original Message-----
>> From: Gcc-patches On Behalf Of Martin Sebor
>> via Gcc-patches
>> Sent: 10 July 2021 00:11
>> To: gcc-patches ; Christophe Lyon
>>
>> Subject: [PATCH libatomic/arm] avoid warning on constant addresses (PR
>> 101379)
>>
>> The attached tweak avoids the new -Warray-bounds instances when
>> building libatomic for arm. Christophe confirms it resolves
>> the problem (thank you!)
>>
>> As we have discussed, the main goal of this class of warnings
>> is to detect accesses at addresses derived from null pointers
>> (e.g., to struct members or array elements at a nonzero offset).
>> Diagnosing accesses at hardcoded addresses is incidental because
>> at the stage they are detected the two are not distinguishable
>> from one another.
>>
>> I'm planning (hoping) to implement detection of invalid pointer
>> arithmetic involving null for GCC 12, so this patch is a stopgap
>> solution to unblock the arm libatomic build without compromising
>> the warning.  Once the new detection is in place these workarounds
>> can be removed or replaced with something more appropriate (e.g.,
>> declaring the objects at the hardwired addresses with an attribute
>> like AVR's address or io; that would enable bounds checking at
>> those addresses as well).
>
> Let's get this patch in to unbreak bootstrap while the discussion on how to
> avoid these workarounds continues...
> So ok.


I just pushed it in r12-2438.

Martin


> Thanks,
> Kyrill
>
>> Martin

Re: [PATCH 20/55] rs6000: Write output to the builtins init file, part 3 of 3

2021-07-21 Thread Segher Boessenkool

Hi!

On Thu, Jun 17, 2021 at 10:19:04AM -0500, Bill Schmidt wrote:
> 2021-06-15  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (typemap): New struct.
>   (TYPE_MAP_SIZE): New macro.
>   (type_map): New initialized variable.
>   (map_token_to_type_node): New function.
>   (write_type_node): Likewise.
>   (write_fntype_init): Implement.

> +/* Look up TOK in the type map and return the corresponding string used
> +   to build the type node.  */

There is a standard "bsearch" function ;-)

> +  /* Avoid side effects of strtok on the original string by using a copy.  */
> +  char *buf = (char *) malloc (strlen (str) + 1);
> +  strcpy (buf, str);

libiberty has xstrdup (and it can also be done using your new best
friend asprintf of course ;-) )

Okay for trunk with or without such improvements.  Thanks!


Segher
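Both of Segher's suggestions fit in a few lines of C: bsearch over a type map sorted by token, and a heap copy of the input so that strtok modifies the copy instead of the caller's string. The table entries and function names are illustrative only. The real rs6000 type map differs. copy_string stands in for libiberty's xstrdup, which also aborts on allocation failure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type map; bsearch requires it sorted by KEY in strcmp
   order.  */
struct typemap
{
  const char *key;
  const char *value;
};

static const struct typemap type_map[] = {
  { "df",   "double_type_node" },
  { "si",   "intSI_type_node" },
  { "v4si", "V4SI_type_node" },
  { "void", "void_type_node" },
};
#define TYPE_MAP_SIZE (sizeof type_map / sizeof type_map[0])

static int
typemap_cmp (const void *key, const void *elt)
{
  return strcmp ((const char *) key, ((const struct typemap *) elt)->key);
}

/* Return the type-node string for TOK, or NULL if TOK is unknown.  */
const char *
lookup_type_node (const char *tok)
{
  const struct typemap *m
    = bsearch (tok, type_map, TYPE_MAP_SIZE, sizeof type_map[0], typemap_cmp);
  return m ? m->value : NULL;
}

/* Copy STR so that strtok can modify the copy instead of the original.  */
char *
copy_string (const char *str)
{
  size_t len = strlen (str) + 1;
  char *buf = malloc (len);
  if (buf)
    memcpy (buf, str, len);
  return buf;
}
```

bsearch silently returns wrong answers if the table is not sorted in strcmp order, so a table used this way should say so in a comment.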

Re: [PATCH 21/55] rs6000: Write static initializations for built-in table

2021-07-21 Thread Segher Boessenkool

Hi!

On Thu, Jun 17, 2021 at 10:19:05AM -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (write_bif_static_init): New
>   function.
>   (write_init_file): Call write_bif_static_init.

> +  for (int j = 0; j < 3; j++)
> +    res[j] = (bifp->proto.restr_opnd[j] == 0 ? "RES_NONE"
> +              : (bifp->proto.restr[j] == RES_BITS ? "RES_BITS"
> +                 : (bifp->proto.restr[j] == RES_RANGE ? "RES_RANGE"
> +                    : (bifp->proto.restr[j] == RES_VALUES ? "RES_VALUES"
> +                       : (bifp->proto.restr[j] == RES_VAR_RANGE
> +                          ? "RES_VAR_RANGE" : "ERROR")))));

The unnecessary parens make this harder to read.

Having ? on the same line as the condition but : on another is not
normal style.

Some "if"s would be more readable anyway?


Okay for trunk.  Thanks!


Segher
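One way to address the readability comment is to replace the nested conditional with an early return and a switch (an if/else chain works equally well). The enum is a hypothetical stand-in that mirrors the RES_* names in the patch. restriction_name and its has_restr parameter, which plays the role of the restr_opnd[j] == 0 test, are also illustrative.

```c
/* Hypothetical stand-in for the restriction kinds used in the patch.  */
enum restriction
{
  RES_NONE, RES_BITS, RES_RANGE, RES_VALUES, RES_VAR_RANGE
};

/* Return the spelling written into the generated file for restriction R,
   or "RES_NONE" if the operand has no restriction at all.  */
const char *
restriction_name (int has_restr, enum restriction r)
{
  if (!has_restr)
    return "RES_NONE";

  switch (r)
    {
    case RES_BITS:
      return "RES_BITS";
    case RES_RANGE:
      return "RES_RANGE";
    case RES_VALUES:
      return "RES_VALUES";
    case RES_VAR_RANGE:
      return "RES_VAR_RANGE";
    default:
      return "ERROR";
    }
}
```

The loop body then reduces to a single call per operand, e.g. res[j] = restriction_name (restr_opnd != 0, restr) with the patch's own fields substituted.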

Re: [PATCH, Fortran] [PR libfortran/101317] Bind(c): Improve error checking in CFI_* functions

2021-07-21 Thread Tobias Burnus


On 17.07.21 02:49, Sandra Loosemore wrote:


This patch is for PR101317, one of the bugs uncovered by the TS29113
testsuite.  Here I'd observed that CFI_establish, etc was not
diagnosing some invalid-argument situations documented in the
standard, although it was properly catching others.  After fixing
those I discovered a couple small mistakes in the test cases and fixed
those too.


Some first comments – I think I have to read through the file
ISO_Fortran_binding.c itself and not only your patch.


--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -232,7 +232,16 @@ CFI_allocate (CFI_cdesc_t *dv, const CFI_index_t 
lower_bounds[],
/* If the type is a Fortran character type, the descriptor's element
   length is replaced by the elem_len argument. */
if (dv->type == CFI_type_char || dv->type == CFI_type_ucs4_char)
-dv->elem_len = elem_len;
+{
+  if (unlikely (compile_options.bounds_check) && elem_len == 0)
+ {
+   fprintf (stderr, "CFI_allocate: The supplied elem_len must be "
+"greater than zero (elem_len = %d).\n",
+(int) elem_len);


I think there is no need to use '(elem_len = %d)' given that it is always zero 
as stated in the error message itself.

(Appears twice)

However, the check itself is also wrong – cf. below.

 * * *

Talking about CFI_allocatable, there is also another bug in that function,
untouched by your patch:

 /* If the type is a character, the descriptor's element length is replaced
 by the elem_len argument. */
  if (dv->type == CFI_type_char || dv->type == CFI_type_ucs4_char ||
  dv->type == CFI_type_signed_char)
dv->elem_len = elem_len;

The bug is that CFI_type_signed_char is not a character type.


+  else if (unlikely (compile_options.bounds_check)
+&& type < 0)

Pointless line break.

+   fprintf (stderr, "CFI_establish: Extents must be nonnegative "
+"(extents[%d] = %d).\n", i, (int)extents[i]);
+   return CFI_INVALID_EXTENT;
+ }


How about PRIiPTR + ptrdiff_t instead of %d + (int) cast? At least for
positive values, the extent may exceed INT_MAX.

(Twice)
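On the format-specifier point: C99's t length modifier prints a ptrdiff_t without truncation. PRIiPTR and PRIdPTR are the corresponding macros for intptr_t. A small sketch, where index_type is a hypothetical stand-in for CFI_index_t (assumed here to be ptrdiff_t). The sketch writes into a buffer with snprintf rather than to stderr so that the output is easy to check.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for CFI_index_t, assumed to be ptrdiff_t.  */
typedef ptrdiff_t index_type;

/* Format the negative-extent diagnostic into BUF.  %td takes a ptrdiff_t,
   so no (int) cast is needed and large extents are not truncated.
   Returns what snprintf returns.  */
int
format_extent_error (char *buf, size_t size, int i, index_type extent)
{
  return snprintf (buf, size,
                   "CFI_establish: Extents must be nonnegative "
                   "(extents[%d] = %td).\n", i, extent);
}
```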


if (result->type == CFI_type_char || result->type == CFI_type_ucs4_char)
-result->elem_len = elem_len;
+{
+  if (unlikely (compile_options.bounds_check) && elem_len == 0)
+ {
+   fprintf (stderr, "CFI_select_part: The supplied elem_len must be "
+"greater than zero (elem_len = %d).\n",
+(int) elem_len);


What's wrong with  ["", ""]? Or with:
  character(len=:), allocatable :: str2(:)
  str2 = [str1(5:4)]
both are len(...) == 0 arrays with 1 or 2 elements.


+   if (source->attribute == CFI_attribute_other
+   && source->rank > 0
+   && source->dim[source->rank - 1].extent == -1)
+ {
+   fprintf (stderr, "CFI_setpointer: The source is a "
+"nonallocatable nonpointer object that is an "
+"assumed-size array.\n");


I think you could just check for assumed size – without
CFI_attribute_other in the 'if' and 'nonallocatable nonpointer' in the
error message. Only nonallocatable nonpointer variables can be of
assumed size (in Fortran); I think that makes the message simpler
(focusing on the issue) and if the C user passes an allocatable/pointer,
which is assumed size, it is also a bug.

Tobias
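The simplified check Tobias proposes needs only the rank and the last dimension's extent, because the patch recognizes an assumed-size array by an extent of -1 there. A minimal C sketch with hypothetical stand-in types. The real CFI_cdesc_t and CFI_dim_t have more fields.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the descriptor fields the check needs.  */
typedef ptrdiff_t index_type;

struct toy_dim
{
  index_type lower_bound;
  index_type extent;
  index_type sm;
};

struct toy_cdesc
{
  int rank;
  struct toy_dim dim[15];   /* 15 matches CFI_MAX_RANK */
};

/* Return nonzero if D describes an assumed-size array, i.e. the extent
   of its last dimension is -1.  No attribute test is needed: only
   nonallocatable, nonpointer objects can be assumed size.  */
int
is_assumed_size (const struct toy_cdesc *d)
{
  return d->rank > 0 && d->dim[d->rank - 1].extent == -1;
}
```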


Re: [PATCH 22/55] rs6000: Write static initializations for overload tables

2021-07-21 Thread Segher Boessenkool

On Thu, Jun 17, 2021 at 10:19:06AM -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (write_ovld_static_init): New
>   function.
>   (write_init_file): Call write_ovld_static_init.

Okay for trunk.  Thanks!


Segher

Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-21 Thread Hafiz Abid Qadeer

On 19/07/2021 17:41, Richard Biener wrote:
> On July 19, 2021 6:13:40 PM GMT+02:00, Hafiz Abid Qadeer wrote:
>> On 19/07/2021 11:45, Richard Biener wrote:
>>> On Fri, Jul 16, 2021 at 10:23 PM Hafiz Abid Qadeer wrote:
>>>>
>>>> On 15/07/2021 13:09, Richard Biener wrote:
>>>>> On Thu, Jul 15, 2021 at 12:35 PM Hafiz Abid Qadeer wrote:
>>>>>>
>>>>>> On 15/07/2021 11:33, Thomas Schwinge wrote:
>>>>>>>
>>>>>>>> Note that the "parent" should be abstract but I don't think dwarf has a
>>>>>>>> way to express a fully abstract parent of a concrete instance child - or
>>>>>>>> at least how GCC expresses this causes consumers to "misinterpret"
>>>>>>>> that.  I wonder if adding a DW_AT_declaration to the late DWARF
>>>>>>>> emitted "parent" would fix things as well here?
>>>>>>>
>>>>>>> (I suppose not, Abid?)
>>>>>>>
>>>>>>
>>>>>> Yes, adding DW_AT_declaration does not fix the problem.
>>>>>
>>>>> Does emitting
>>>>>
>>>>> DW_TAG_compile_unit
>>>>>   DW_AT_name("")
>>>>>
>>>>>   DW_TAG_subprogram // notional parent function (foo) with no code range
>>>>>     DW_AT_declaration 1
>>>>> a:    DW_TAG_subprogram // offload function foo._omp_fn.0
>>>>>       DW_AT_declaration 1
>>>>>
>>>>>   DW_TAG_subprogram // offload function
>>>>>   DW_AT_abstract_origin a
>>>>> ...
>>>>>
>>>>> do the trick?  The following would do this, flattening function
>>>>> definitions for the concrete copies:
>>>>>
>>>>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>>>>> index 82783c4968b..a9c8bc43e88 100644
>>>>> --- a/gcc/dwarf2out.c
>>>>> +++ b/gcc/dwarf2out.c
>>>>> @@ -6076,6 +6076,11 @@ maybe_create_die_with_external_ref (tree decl)
>>>>>    /* Peel types in the context stack.  */
>>>>>    while (ctx && TYPE_P (ctx))
>>>>>      ctx = TYPE_CONTEXT (ctx);
>>>>> +  /* For functions peel the context up to namespace/TU scope.  The abstract
>>>>> +     copies reveal the true nesting.  */
>>>>> +  if (TREE_CODE (decl) == FUNCTION_DECL)
>>>>> +    while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
>>>>> +      ctx = DECL_CONTEXT (ctx);
>>>>>    /* Likewise namespaces in case we do not want to emit DIEs for them.  */
>>>>>    if (debug_info_level <= DINFO_LEVEL_TERSE)
>>>>>      while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
>>>>> @@ -6099,8 +6104,7 @@ maybe_create_die_with_external_ref (tree decl)
>>>>>          /* Leave function local entities parent determination to when
>>>>>             we process scope vars.  */
>>>>>          ;
>>>>> -      else
>>>>> -        parent = lookup_decl_die (ctx);
>>>>> +      parent = lookup_decl_die (ctx);
>>>>>      }
>>>>>    else
>>>>>      /* In some cases the FEs fail to set DECL_CONTEXT properly.
>>>>>
>>>>
>>>> Thanks. This solves the problem. Only the first hunk was required. Second hunk
>>>> actually causes an ICE when TREE_CODE (ctx) == BLOCK.
>>>> OK to commit the attached patch?
>>>
>>> I think we need to merge the TYPE_P peeling and FUNCTION_DECL peeling
>> into
>>> one loop since I suppose we can have a nested function in class
>> scope.
>>> So sth like
>>>
>>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>>> index 82783c4968b..61228410b51 100644
>>> --- a/gcc/dwarf2out.c
>>> +++ b/gcc/dwarf2out.c
>>> @@ -6073,8 +6073,12 @@ maybe_create_die_with_external_ref (tree decl)
>>>  }
>>>else
>>>  ctx = DECL_CONTEXT (decl);
>>> -  /* Peel types in the context stack.  */
>>> -  while (ctx && TYPE_P (ctx))
>>> +  /* Peel types in the context stack.  For functions peel the
>> context up
>>> + to namespace/TU scope.  The abstract copies reveal the true
>> nesting.  */
>>> +  while (ctx
>>> +&& (TYPE_P (ctx)
>>> +|| (TREE_CODE (decl) == FUNCTION_DECL
>>> +&& TREE_CODE (ctx) == FUNCTION_DECL)))
>>>  ctx = TYPE_CONTEXT (ctx);
>>>/* Likewise namespaces in case we do not want to emit DIEs for
>> them.  */
>>>if (debug_info_level <= DINFO_LEVEL_TERSE)
>>>
>> This causes an ICE,
>> internal compiler error: tree check: expected class 'type', have
>> 'declaration' (function_decl)
>>
>> Did you intend something like this:
>>
>> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
>> index 561f8b23517..c61f0041fba 100644
>> --- a/gcc/dwarf2out.c
>> +++ b/gcc/dwarf2out.c
>> @@ -6121,3 +6121,8 @@ maybe_create_die_with_external_ref (tree decl)
>> -  /* Peel types in the context stack.  */
>> -  while (ctx && TYPE_P (ctx))
>> -    ctx = TYPE_CONTEXT (ctx);
>> +  /* Peel types in the context stack.  For functions peel the context up
>> +     to namespace/TU scope.  The abstract copies reveal the true nesting.  */
>> +  while (ctx
>> +         && (TYPE_P (ctx)
>> +             || (TREE_CODE (decl) == FUNCTION_DECL
>> +                 && TREE_CODE (ctx) == FUNCTION_DECL)))
>> +    ctx = TYPE_P (ctx) ? TYPE_CONTEXT (ctx) : DECL_CONTEXT (ctx);
>> +
> 
> Yes, of course. 
> 
>>
>>> if that works it's OK.  Can you run it on the gdb testsuite with -flto added
>>> as well please (you need to do before/after comparison since I

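The ICE above comes from applying TYPE_CONTEXT to a FUNCTION_DECL. Below is a minimal C model of the corrected context-peeling loop. The node kinds, the `context` field and the `type_context`/`decl_context` helpers are hypothetical stand-ins for GCC trees and for `TYPE_CONTEXT`/`DECL_CONTEXT`. The asserts play the role of GCC's tree checking. This is a sketch of the idea, not GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC tree codes; not GCC's real API.  */
enum node_kind
{
  NODE_TYPE,
  NODE_FUNCTION_DECL,
  NODE_VAR_DECL,
  NODE_TRANSLATION_UNIT
};

struct node
{
  enum node_kind kind;
  struct node *context;  /* Enclosing scope, or NULL at the top.  */
};

static int
is_type (const struct node *n)
{
  return n->kind == NODE_TYPE;
}

/* Like TYPE_CONTEXT: only valid on types (mimics GCC's tree check).  */
static struct node *
type_context (const struct node *n)
{
  assert (is_type (n));
  return n->context;
}

/* Like DECL_CONTEXT: only valid on non-types.  */
static struct node *
decl_context (const struct node *n)
{
  assert (!is_type (n));
  return n->context;
}

/* Peel types from the context chain; for a function DECL also peel
   enclosing functions, so nested functions land at namespace/TU scope.  */
static struct node *
peel_context (const struct node *decl, struct node *ctx)
{
  while (ctx
         && (is_type (ctx)
             || (decl->kind == NODE_FUNCTION_DECL
                 && ctx->kind == NODE_FUNCTION_DECL)))
    /* Calling type_context unconditionally here would trip its assert
       on a function context, which is the analogue of the ICE.  */
    ctx = is_type (ctx) ? type_context (ctx) : decl_context (ctx);
  return ctx;
}
```

With this model, a function nested in a class that is itself inside another function is peeled up to the translation unit. A variable's context stops at its enclosing function.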
Re: [PATCH, Fortran] [PR libfortran/101317] Bind(c): Improve error checking in CFI_* functions

2021-07-21 Thread Sandra Loosemore


On 7/21/21 11:26 AM, Tobias Burnus wrote:

On 17.07.21 02:49, Sandra Loosemore wrote:

This patch is for PR101317, one of the bugs uncovered by the TS29113 
testsuite.  Here I'd observed that CFI_establish etc. were not 
diagnosing some invalid-argument situations documented in the 
standard, although it was properly catching others.  After fixing 
those I discovered a couple small mistakes in the test cases and fixed 
those too.


Some first comments – I think I have to read through the file 
ISO_Fortran_binding.c itself and not only your patch.



--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -232,7 +232,16 @@ CFI_allocate (CFI_cdesc_t *dv, const CFI_index_t lower_bounds[],

    /* If the type is a Fortran character type, the descriptor's element
   length is replaced by the elem_len argument. */
    if (dv->type == CFI_type_char || dv->type == CFI_type_ucs4_char)
-    dv->elem_len = elem_len;
+    {
+  if (unlikely (compile_options.bounds_check) && elem_len == 0)
+    {
+  fprintf (stderr, "CFI_allocate: The supplied elem_len must be "
+   "greater than zero (elem_len = %d).\n",
+   (int) elem_len);


I think there is no need to use '(elem_len = %d)' given that it is 
always zero as stated in the error message itself.


Yeah, I could fix this.  I'd initially forgotten that elem_len was an 
unsigned type and was trying to test it by passing a negative value.  :-P




(Appears twice)

However, the check itself is also wrong – cf. below.


Hmmm.  CFI_establish explicitly says that the elem_len has to be greater 
than zero.  It seems somewhat confusing that it's inconsistent with the 
other functions that take an elem_len argument.



Talking about CFI_allocate, there is also another bug in that function,
untouched by your patch:

  /* If the type is a character, the descriptor's element length is replaced
     by the elem_len argument. */
   if (dv->type == CFI_type_char || dv->type == CFI_type_ucs4_char ||
   dv->type == CFI_type_signed_char)
     dv->elem_len = elem_len;

The bug is that CFI_type_signed_char is not a character type.


Ha!  I noticed the same thing and already posted a separate patch for 
that.  :-P


https://gcc.gnu.org/pipermail/fortran/2021-July/056243.html
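Here is a small sketch of the `CFI_type_signed_char` point. Only Fortran character types take their element length from the `elem_len` argument. `signed char` is a C integer type with a fixed size. The `DEMO_type_*` codes and the helper names are hypothetical. The real `CFI_type_*` values come from `ISO_Fortran_binding.h` and are encoded differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes for illustration only.  */
enum demo_type
{
  DEMO_type_signed_char,  /* C signed char: an integer type.  */
  DEMO_type_char,         /* Fortran CHARACTER(KIND=1).  */
  DEMO_type_ucs4_char,    /* Fortran CHARACTER(KIND=4).  */
  DEMO_type_int
};

static int
demo_is_character_type (enum demo_type type)
{
  return type == DEMO_type_char || type == DEMO_type_ucs4_char;
}

/* Element length to store in the descriptor: the caller-supplied
   ELEM_LEN_ARG for character types, otherwise the existing value.  */
static size_t
demo_effective_elem_len (enum demo_type type, size_t current_elem_len,
                         size_t elem_len_arg)
{
  /* Non-character types always have a fixed, nonzero size.  */
  assert (current_elem_len > 0 || demo_is_character_type (type));
  return demo_is_character_type (type) ? elem_len_arg : current_elem_len;
}
```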


+  else if (unlikely (compile_options.bounds_check)
+   && type < 0)

Pointless line break.

+  fprintf (stderr, "CFI_establish: Extents must be nonnegative "
+   "(extents[%d] = %d).\n", i, (int)extents[i]);
+  return CFI_INVALID_EXTENT;
+    }


How about PRIiPTR + ptrdiff_t instead of %d + (int) cast? At least for 
positive values, an extent may exceed INT_MAX.


Hmmm, there are similar problems in existing code in other functions in 
this file (e.g., CFI_section).
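The following sketches the formatting point. It assumes, as in GCC's `ISO_Fortran_binding.h`, that `CFI_index_t` is `ptrdiff_t`, and a local typedef stands in for it. The `PRIiPTR` macro mentioned above is defined for `intptr_t`. This sketch instead uses C99's `%td`, which is specified for `ptrdiff_t` and needs no cast. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: CFI_index_t is ptrdiff_t, as in GCC's ISO_Fortran_binding.h.  */
typedef ptrdiff_t demo_index_t;

/* Format the "negative extent" diagnostic into BUF without an (int) cast,
   so extents outside the int range are printed exactly.  Returns nonzero
   if the whole message fit into BUF.  */
static int
demo_format_extent_error (char *buf, size_t size, int i, demo_index_t extent)
{
  assert (buf != NULL && size > 0);
  int n = snprintf (buf, size,
                    "CFI_establish: Extents must be nonnegative "
                    "(extents[%d] = %td).\n", i, extent);
  return n >= 0 && (size_t) n < size;
}
```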



+  if (source->attribute == CFI_attribute_other
+  && source->rank > 0
+  && source->dim[source->rank - 1].extent == -1)
+    {
+  fprintf (stderr, "CFI_setpointer: The source is a "
+   "nonallocatable nonpointer object that is an "
+   "assumed-size array.\n");


I think you could just check for assumed size – without 
CFI_attribute_other in the 'if' and 'nonallocatable nonpointer' in the 
error message. Only nonallocatable nonpointer variables can be of 
assumed size (in Fortran); I think that makes the message simpler 
(focusing on the issue) and if the C user passes an allocatable/pointer 
that is marked as assumed size, it is also a bug.


The wording of the message reflects the language of the standard:
"source shall be a null pointer or the address of a C descriptor for an 
allocated allocatable object, a data pointer object, or a nonallocatable 
nonpointer data object that is not an assumed-size array."

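The assumed-size test under discussion reduces to a check on the last dimension. For an assumed-size array, the C descriptor has an extent of -1 there. Below is a minimal sketch of the simpler variant suggested in the review, which does not consult the attribute. The trimmed-down descriptor struct and the function name are hypothetical; the real `CFI_cdesc_t` has more members.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RANK 15  /* Same value as CFI_MAX_RANK.  */

/* Hypothetical, trimmed-down descriptor with only the members used here.  */
struct demo_dim
{
  ptrdiff_t lower_bound;
  ptrdiff_t extent;
  ptrdiff_t sm;
};

struct demo_cdesc
{
  int rank;
  struct demo_dim dim[DEMO_MAX_RANK];
};

/* An assumed-size array has extent -1 in its last dimension.  The
   attribute is deliberately not checked: a pointer or allocatable
   descriptor marked this way is invalid input as well.  */
static int
demo_is_assumed_size (const struct demo_cdesc *d)
{
  assert (d != NULL && d->rank >= 0 && d->rank <= DEMO_MAX_RANK);
  return d->rank > 0 && d->dim[d->rank - 1].extent == -1;
}
```

Keeping the attribute test, as in the patch, trades a slightly longer condition for an error message that mirrors the standard's wording.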

-Sandra

[PATCH, committed] rs6000: Add int128 target check to pr101129.c (PR101531)

2021-07-21 Thread Bill Schmidt via Gcc-patches


Hi,

PR101531 observes that gcc.target/powerpc/pr101129.c fails on 32-bit 
targets.  I overlooked that the OP's test case has an __int128 
dependency.  This patch fixes the obvious oversight. Committed as 
obvious.  I plan to backport to 11, 10, and 9 once the 11.2 release is 
complete.
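Code that names `__int128` does not compile on targets without it. GCC predefines `__SIZEOF_INT128__` only where `__int128` is available. A C-level guard can therefore be sketched as below, as a rough analogue of what the `int128` effective-target check does for the testsuite. The function and typedef names are hypothetical.

```c
#include <assert.h>

/* GCC defines __SIZEOF_INT128__ only when __int128 is supported;
   32-bit powerpc targets do not provide it.  */
#ifdef __SIZEOF_INT128__
__extension__ typedef unsigned __int128 demo_u128;

static int
demo_int128_roundtrip (void)
{
  /* A value with bits above 64 only survives in a 128-bit type.  */
  demo_u128 x = (demo_u128) 1 << 64;
  assert ((unsigned long long) (x >> 64) == 1);
  return 1;
}
#else
static int
demo_int128_roundtrip (void)
{
  /* No 128-bit integer type: the equivalent test is skipped.  */
  return 0;
}
#endif
```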


Thanks!
Bill


rs6000: Add int128 target check to pr101129.c (PR101531)

2021-07-21  Bill Schmidt  

gcc/testsuite/
PR target/101531
* gcc.target/powerpc/pr101129.c: Adjust.

diff --git a/gcc/testsuite/gcc.target/powerpc/pr101129.c b/gcc/testsuite/gcc.target/powerpc/pr101129.c
index 1abc12480e4..6b8e5a9b597 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr101129.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr101129.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target p8vector_hw } */
+/* { dg-require-effective-target int128 } */
 /* { dg-options "-mdejagnu-cpu=power8 -O " } */
 
 /* PR101129: The swaps pass was turning a mult-lopart into a mult-hipart.

Re: [PATCH 23/55] rs6000: Incorporate new builtins code into the build machinery

2021-07-21 Thread Segher Boessenkool

On Thu, Jun 17, 2021 at 10:19:07AM -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config.gcc (extra_objs): Include rs6000-builtins.o and
>   rs6000-c.o.

The rs6000-c.o part needs an explanation, and probably should be a
separate (bugfix) patch (and it needs backports?)

The changelog entry should read
* config.gcc (powerpc*-*-*): Include [...] in extra_objs.
or similar.

>   * config/rs6000/t-rs6000 (rs6000-gen-builtins.o): New target.
>   (rbtree.o): Likewise.
>   (rs6000-gen-builtins): Likewise.
>   (rs6000-builtins.c): Likewise.
>   (rs6000-builtins.h): Likewise.
>   (rs6000.o): Add dependency.
>   (EXTRA_HEADERS): Add rs6000-vecdefines.h.
>   (rs6000-vecdefines.h): New target.
>   (rs6000-builtins.o): Likewise.
>   (rs6000-call.o): Add rs6000-builtins.h as a dependency.
>   (rs6000-c.o): Likewise.

> +rs6000-gen-builtins.o: $(srcdir)/config/rs6000/rs6000-gen-builtins.c
> + $(COMPILE) $(CXXFLAGS) $<
> + $(POSTCOMPILE)
> +
> +rbtree.o: $(srcdir)/config/rs6000/rbtree.c
> + $(COMPILE) $<
> + $(POSTCOMPILE)

Why does one need CXXFLAGS and the other does not?

> +# TODO: Whenever GNU make 4.3 is the minimum required, we should use
> +# grouped targets on this:

That may be quite a while still.  GNU make is the foundation of
everything, so we cannot require very new versions of it ever.

In the meantime, you can make all these targets depend on an
intermediate target (that you mark with .INTERMEDIATE), and have that
intermediate target have the dependencies.  This is from version 3.74.3
and we require 3.80 already, so this is fine.

> +EXTRA_HEADERS += rs6000-vecdefines.h
> +rs6000-vecdefines.h : rs6000-builtins.c

No space before the colon please.

Segher

Re: [PATCH v3] Add QI vector mode support to by-pieces for memset

2021-07-21 Thread H.J. Lu via Gcc-patches

On Wed, Jul 21, 2021 at 7:50 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> > index 39ab139b7e1..1972301ce3c 100644
> > --- a/gcc/builtins.c
> > +++ b/gcc/builtins.c
> > @@ -3890,13 +3890,16 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
> >
> >  static rtx
> >  builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> > -  scalar_int_mode mode)
> > +  fixed_size_mode mode)
> >  {
> >/* The REPresentation pointed to by DATA need not be a nul-terminated
> >   string but the caller guarantees it's large enough for MODE.  */
> >const char *rep = (const char *) data;
> >
> > -  return c_readstr (rep + offset, mode, /*nul_terminated=*/false);
> > +  /* NB: Vector mode in the by-pieces infrastructure is only used by
> > + the memset expander.  */
>
> Sorry to nitpick, but I guess this might get out of date.  Maybe:
>
>   /* The by-pieces infrastructure does not try to pick a vector mode
>  for memcpy expansion.  */

Fixed.

> > +  return c_readstr (rep + offset, as_a  (mode),
> > + /*nul_terminated=*/false);
> >  }
> >
> >  /* LEN specify length of the block of memcpy/memset operation.
> > @@ -6478,14 +6481,16 @@ expand_builtin_stpncpy (tree exp, rtx)
> >
> >  rtx
> >  builtin_strncpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> > -   scalar_int_mode mode)
> > +   fixed_size_mode mode)
> >  {
> >const char *str = (const char *) data;
> >
> >if ((unsigned HOST_WIDE_INT) offset > strlen (str))
> >  return const0_rtx;
> >
> > -  return c_readstr (str + offset, mode);
> > +  /* NB: Vector mode in the by-pieces infrastructure is only used by
> > + the memset expander.  */
>
> Similarly here for strncpy expansion.

Fixed.

> > +  return c_readstr (str + offset, as_a  (mode));
> >  }
> >
> >  /* Helper to check the sizes of sequences and the destination of calls
> > @@ -6686,30 +6691,117 @@ expand_builtin_strncpy (tree exp, rtx target)
> >return NULL_RTX;
> >  }
> >
> > -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> > -   bytes from constant string DATA + OFFSET and return it as target
> > -   constant.  If PREV isn't nullptr, it has the RTL info from the
> > +/* Return the RTL of a register in MODE generated from PREV in the
> > previous iteration.  */
> >
> > -rtx
> > -builtin_memset_read_str (void *data, void *prevp,
> > -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> > -  scalar_int_mode mode)
> > +static rtx
> > +gen_memset_value_from_prev (by_pieces_prev *prev, fixed_size_mode mode)
> >  {
> > -  by_pieces_prev *prev = (by_pieces_prev *) prevp;
> > +  rtx target = nullptr;
> >if (prev != nullptr && prev->data != nullptr)
> >  {
> >/* Use the previous data in the same mode.  */
> >if (prev->mode == mode)
> >   return prev->data;
> > +
> > +  fixed_size_mode prev_mode = prev->mode;
> > +
> > +  /* Don't use the previous data to write QImode if it is in a
> > +  vector mode.  */
> > +  if (VECTOR_MODE_P (prev_mode) && mode == QImode)
> > + return target;
> > +
> > +  rtx prev_rtx = prev->data;
> > +
> > +  if (REG_P (prev_rtx)
> > +   && HARD_REGISTER_P (prev_rtx)
> > +   && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0)
> > + {
> > +   /* If we can't put a hard register in MODE, first generate a
> > +  subreg of word mode if the previous mode is wider than word
> > +  mode and word mode is wider than MODE.  */
> > +   if (UNITS_PER_WORD < GET_MODE_SIZE (prev_mode)
> > +   && UNITS_PER_WORD > GET_MODE_SIZE (mode))
> > + {
> > +   prev_rtx = lowpart_subreg (word_mode, prev_rtx,
> > +  prev_mode);
> > +   if (prev_rtx != nullptr)
> > + prev_mode = word_mode;
> > + }
> > +   else
> > + prev_rtx = nullptr;
>
> I don't understand this.  Why not just do the:
>
>   if (REG_P (prev_rtx)
>   && HARD_REGISTER_P (prev_rtx)
>   && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0)
> prev_rtx = copy_to_reg (prev_rtx);
>
> that I suggested in the previous review?

But for
---
extern void *ops;

void
foo (int c)
{
  __builtin_memset (ops, c, 18);
}
---
I got

vpbroadcastb %edi, %xmm31
vmovdqa64 %xmm31, -24(%rsp)
movq ops(%rip), %rax
movzwl -24(%rsp), %edx
vmovdqu8 %xmm31, (%rax)
movw %dx, 16(%rax)
ret

I want to avoid the store and load.  I am testing

  if (REG_P (prev_rtx)
  && HARD_REGISTER_P (prev_rtx)
  && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0)
{
  /* Find the smallest subreg mode in the same mode class which
 is not narrower than MODE and narrower than PREV_MODE.  */
  machine_mode m;
  fixed

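In the 18-byte `memset` example above, the expansion emits one 16-byte vector store and one 2-byte store. The 2-byte value is just the low 16 bits of the byte broadcast already held in `%xmm31`. The sketch below models only that arithmetic:

- a greedy, widest-first split of the length into power-of-two pieces;
- the value of a 2-byte piece of a broadcast byte.

Both helpers are hypothetical and ignore GCC's cost model, overlapping stores and alignment. What the subreg logic aims for is taking that low part from the vector register directly, not through the stack slot at `-24(%rsp)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split LEN bytes into power-of-two pieces, widest first, none wider than
   MAX_PIECE.  Writes the piece sizes to PIECES and returns their count.  */
static size_t
demo_split_by_pieces (size_t len, size_t max_piece, size_t *pieces,
                      size_t max_count)
{
  size_t n = 0;
  assert (max_piece != 0 && (max_piece & (max_piece - 1)) == 0);
  for (size_t piece = max_piece; piece != 0 && len != 0; piece >>= 1)
    while (len >= piece)
      {
        assert (n < max_count);
        pieces[n++] = piece;
        len -= piece;
      }
  return n;
}

/* Value of a 2-byte piece of memset (p, c, n): C replicated into both
   bytes, i.e. the low 16 bits of a byte broadcast.  */
static uint16_t
demo_u16_piece (unsigned char c)
{
  return (uint16_t) (c * 0x0101u);
}
```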
Re: [PATCH v3] Add QI vector mode support to by-pieces for memset

2021-07-21 Thread Richard Sandiford via Gcc-patches

"H.J. Lu via Gcc-patches"  writes:

[PATCH] Fix PR 10153: tail recursion for vector types.

2021-07-21 Thread apinski--- via Gcc-patches

From: Andrew Pinski 

The problem here is that we try to create the initial accumulator value
from a scalar constant.  For vectors we need to do a vect_dup instead.
This fixes that issue by using build_{one,zero}_cst instead of
integer_{one,zero}_node when calling create_tailcall_accumulator.

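For reference, here is the accumulator form of tail-recursion elimination, written out by hand for the new pr10153-2.c test. The initial values of the add and multiply accumulators must have the function's return type, here the vector `V`. `build_zero_cst`/`build_one_cst` provide such values; `integer_zero_node`/`integer_one_node` do not. This is a hand-written approximation for illustration. The pass itself works on GIMPLE, and the `demo_*` function names are hypothetical.

```c
#include <assert.h>

typedef int V __attribute__ ((vector_size (2 * sizeof (int))));

/* Recursive form, following the pr10153-2.c test case.  */
static V
demo_recursive (int t)
{
  if (t < 10)
    return (V){ 1, 1 };
  V v = { 0, 0 };
  return v - demo_recursive (t - 1);
}

/* Loop form.  v - f (t - 1) is treated as v + (-1) * f (t - 1), and the
   result is kept as ADD_ACC + MULT_ACC * f (t).  */
static V
demo_accumulated (int t)
{
  V add_acc = { 0, 0 };   /* Vector zero, as from build_zero_cst (V).  */
  V mult_acc = { 1, 1 };  /* Vector one, as from build_one_cst (V).  */
  const V minus_one = { -1, -1 };
  while (t >= 10)
    {
      V v = { 0, 0 };
      add_acc = add_acc + mult_acc * v;
      mult_acc = mult_acc * minus_one;
      t = t - 1;
    }
  return add_acc + mult_acc * (V){ 1, 1 };
}
```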
Changes from v1:
* v2: Use build_{one,zero}_cst and compute the correct type beforehand.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/10153
* tree-tailcall.c (create_tailcall_accumulator):
Don't call fold_convert as the type should be correct already.
(tree_optimize_tail_calls_1): Use build_{one,zero}_cst instead
of integer_{one,zero}_node for the call of create_tailcall_accumulator.

gcc/testsuite/ChangeLog:

PR tree-optimization/10153
* gcc.c-torture/compile/pr10153-1.c: New test.
* gcc.c-torture/compile/pr10153-2.c: New test.
---
 gcc/testsuite/gcc.c-torture/compile/pr10153-1.c |  7 +++
 gcc/testsuite/gcc.c-torture/compile/pr10153-2.c |  9 +
 gcc/tree-tailcall.c | 10 ++
 3 files changed, 22 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr10153-2.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c b/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
new file mode 100644
index 000..3f2040f32a1
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr10153-1.c
@@ -0,0 +1,7 @@
+typedef int V __attribute__ ((vector_size (2 * sizeof (int))));
+V
+foo (void)
+{
+  V v = { };
+  return v - foo();
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c b/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c
new file mode 100644
index 000..1af4c8e2a36
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr10153-2.c
@@ -0,0 +1,9 @@
+typedef int V __attribute__ ((vector_size (2 * sizeof (int))));
+V
+foo (int t)
+{
+  if (t < 10)
+    return (V){1, 1};
+  V v = { };
+  return v - foo(t - 1);
+}
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index a4d31c90c49..f2833d25ce8 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -1079,8 +1079,7 @@ create_tailcall_accumulator (const char *label, basic_block bb, tree init)
   gphi *phi;
 
   phi = create_phi_node (tmp, bb);
-  /* RET_TYPE can be a float when -ffast-maths is enabled.  */
-  add_phi_arg (phi, fold_convert (ret_type, init), single_pred_edge (bb),
+  add_phi_arg (phi, init, single_pred_edge (bb),
   UNKNOWN_LOCATION);
   return PHI_RESULT (phi);
 }
@@ -1157,14 +1156,17 @@ tree_optimize_tail_calls_1 (bool opt_tailcalls)
  }
  phis_constructed = true;
}
+  tree ret_type = TREE_TYPE (DECL_RESULT (current_function_decl));
+  if (POINTER_TYPE_P (ret_type))
+   ret_type = sizetype;
 
   if (act->add && !a_acc)
a_acc = create_tailcall_accumulator ("add_acc", first,
-integer_zero_node);
+build_zero_cst (ret_type));
 
   if (act->mult && !m_acc)
m_acc = create_tailcall_accumulator ("mult_acc", first,
-integer_one_node);
+build_one_cst (ret_type));
 }
 
   if (a_acc || m_acc)
-- 
2.27.0

Re: [PATCH v3] Add QI vector mode support to by-pieces for memset

2021-07-21 Thread H.J. Lu via Gcc-patches

On Wed, Jul 21, 2021 at 12:20 PM Richard Sandiford
 wrote:

Re: [PATCH v3] Add QI vector mode support to by-pieces for memset

2021-07-21 Thread Richard Sandiford via Gcc-patches

Richard Sandiford  writes:

[PATCH v4] Add QI vector mode support to by-pieces for memset

2021-07-21 Thread H.J. Lu via Gcc-patches

1. Replace scalar_int_mode with fixed_size_mode in the by-pieces
infrastructure to allow non-integer modes.
2. Rename widest_int_mode_for_size to widest_fixed_size_mode_for_size
to return QI vector mode for memset.
3. Add op_by_pieces_d::smallest_fixed_size_mode_for_size to return the
smallest integer or QI vector mode.
4. Remove clear_by_pieces_1 and use builtin_memset_read_str in
clear_by_pieces to support vector mode broadcast.
5. Add lowpart_subreg_regno, a wrapper around simplify_subreg_regno that
uses subreg_lowpart_offset (mode, prev_mode) as the offset.
6. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
scratch register to avoid stack realignment when expanding memset.

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Change the mode argument
from scalar_int_mode to fixed_size_mode.
(builtin_strncpy_read_str): Likewise.
(gen_memset_value_from_prev): New function.
(gen_memset_broadcast): Likewise.
(builtin_memset_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Use gen_memset_value_from_prev
and gen_memset_broadcast.
(builtin_memset_gen_str): Likewise.
(try_store_by_multiple_pieces): Use by_pieces_constfn to declare
constfun.
* builtins.h (builtin_strncpy_read_str): Replace scalar_int_mode
with fixed_size_mode.
(builtin_memset_read_str): Likewise.
* expr.c (widest_int_mode_for_size): Renamed to ...
(widest_fixed_size_mode_for_size): Add a bool argument to
indicate if QI vector mode can be used.
(by_pieces_ninsns): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(pieces_addr::adjust): Change the mode argument from
scalar_int_mode to fixed_size_mode.
(op_by_pieces_d): Make m_len read-only.  Add a bool member,
m_qi_vector_mode, to indicate that QI vector mode can be used.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_qi_vector_mode.  Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(op_by_pieces_d::get_usable_mode): Change the mode argument from
scalar_int_mode to fixed_size_mode.  Call
widest_fixed_size_mode_for_size instead of
widest_int_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): New member
function to return the smallest integer or QI vector mode.
(op_by_pieces_d::run): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.  Call
smallest_fixed_size_mode_for_size instead of
smallest_int_mode_for_size.
(store_by_pieces_d::store_by_pieces_d): Add a bool argument to
indicate that QI vector mode can be used and pass it to
op_by_pieces_d::op_by_pieces_d.
(can_store_by_pieces): Call widest_fixed_size_mode_for_size
instead of widest_int_mode_for_size.
(store_by_pieces): Pass memsetp to
store_by_pieces_d::store_by_pieces_d.
(clear_by_pieces_1): Removed.
(clear_by_pieces): Replace clear_by_pieces_1 with
builtin_memset_read_str and pass true to store_by_pieces_d to
support vector mode broadcast.
(string_cst_read_str): Change the mode argument from
scalar_int_mode to fixed_size_mode.
* expr.h (by_pieces_constfn): Change scalar_int_mode to
fixed_size_mode.
(by_pieces_prev): Likewise.
* rtl.h (lowpart_subreg_regno): New.
* rtlanal.c (lowpart_subreg_regno): New.  A wrapper around
simplify_subreg_regno.
* target.def (gen_memset_scratch_rtx): New hook.
* doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
* doc/tm.texi: Regenerated.

gcc/testsuite/

* gcc.target/i386/pr100865-3.c: Expect vmovdqu8 instead of
vmovdqu.
* gcc.target/i386/pr100865-4b.c: Likewise.
---
 gcc/builtins.c  | 180 
 gcc/builtins.h  |   4 +-
 gcc/doc/tm.texi |   7 +
 gcc/doc/tm.texi.in  |   2 +
 gcc/expr.c  | 168 --
 gcc/expr.h  |   4 +-
 gcc/rtl.h   |   2 +
 gcc/rtlanal.c   |  11 ++
 gcc/target.def  |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-3.c  |   2 +-
 gcc/testsuite/gcc.target/i386/pr100865-4b.c |   2 +-
 11 files changed, 303 insertions(+), 88 deletions(-)
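As a rough illustration of what "by pieces" means for memset, the sketch below splits a store of LEN bytes into chunks, widest first and then progressively narrower. The widest chunk stands in for the widest integer or QI vector mode the target allows. The 32-byte maximum and the function names are hypothetical. The real expander picks modes via widest_fixed_size_mode_for_size and does more than this toy (for example, it may use overlapping stores).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical widest store width, e.g. a 32-byte QI vector mode.  */
#define MAX_PIECE 32

/* Record the piece sizes a by-pieces memset of LEN bytes would use,
   widest first.  MAX_PIECE must be a power of two.  Returns the number
   of pieces written to SIZES (capacity CAP).  */
static size_t
plan_memset_pieces (size_t len, size_t max_piece, size_t *sizes, size_t cap)
{
  size_t n = 0;
  assert (max_piece > 0 && (max_piece & (max_piece - 1)) == 0);
  for (size_t piece = max_piece; piece > 0; piece /= 2)
    while (len >= piece)
      {
        assert (n < cap);
        sizes[n++] = piece;
        len -= piece;
      }
  return n;
}

/* Perform the planned stores, each filled with the same broadcast byte.
   Only meant for the small constant lengths by-pieces expansion handles;
   very long lengths trip the capacity assertion.  */
static void
memset_by_pieces (unsigned char *dst, int c, size_t len)
{
  size_t sizes[64];
  size_t n = plan_memset_pieces (len, MAX_PIECE, sizes, 64);
  for (size_t i = 0; i < n; i++)
    {
      memset (dst, c, sizes[i]);   /* Stands in for one wide store.  */
      dst += sizes[i];
    }
}
```

For a 45-byte memset this plans 32 + 8 + 4 + 1 bytes.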

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 170d776c410..26360b0b11b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3890,13 +3890,16 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
 
 static rtx
 builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
-

Re: [PATCH v3] Add QI vector mode support to by-pieces for memset

2021-07-21 Thread H.J. Lu via Gcc-patches

On Wed, Jul 21, 2021 at 12:42 PM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > "H.J. Lu via Gcc-patches"  writes:
> >> On Wed, Jul 21, 2021 at 7:50 AM Richard Sandiford
> >>  wrote:
> >>>
> >>> "H.J. Lu"  writes:
> >>> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> >>> > index 39ab139b7e1..1972301ce3c 100644
> >>> > --- a/gcc/builtins.c
> >>> > +++ b/gcc/builtins.c
> >>> > @@ -3890,13 +3890,16 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
> >>> >
> >>> >  static rtx
> >>> >  builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> >>> > -  scalar_int_mode mode)
> >>> > +  fixed_size_mode mode)
> >>> >  {
> >>> >/* The REPresentation pointed to by DATA need not be a nul-terminated
> >>> >   string but the caller guarantees it's large enough for MODE.  */
> >>> >const char *rep = (const char *) data;
> >>> >
> >>> > -  return c_readstr (rep + offset, mode, /*nul_terminated=*/false);
> >>> > +  /* NB: Vector mode in the by-pieces infrastructure is only used by
> >>> > + the memset expander.  */
> >>>
> >>> Sorry to nitpick, but I guess this might get out-of-date.  Maybe:
> >>>
> >>>   /* The by-pieces infrastructure does not try to pick a vector mode
> >>>  for memcpy expansion.  */
> >>
> >> Fixed.
> >>
> >>> > +  return c_readstr (rep + offset, as_a  (mode),
> >>> > + /*nul_terminated=*/false);
> >>> >  }
> >>> >
> >>> >  /* LEN specify length of the block of memcpy/memset operation.
> >>> > @@ -6478,14 +6481,16 @@ expand_builtin_stpncpy (tree exp, rtx)
> >>> >
> >>> >  rtx
> >>> >  builtin_strncpy_read_str (void *data, void *, HOST_WIDE_INT offset,
> >>> > -   scalar_int_mode mode)
> >>> > +   fixed_size_mode mode)
> >>> >  {
> >>> >const char *str = (const char *) data;
> >>> >
> >>> >if ((unsigned HOST_WIDE_INT) offset > strlen (str))
> >>> >  return const0_rtx;
> >>> >
> >>> > -  return c_readstr (str + offset, mode);
> >>> > +  /* NB: Vector mode in the by-pieces infrastructure is only used by
> >>> > + the memset expander.  */
> >>>
> >>> Similarly here for strncpy expansion.
> >>
> >> Fixed.
> >>
> >>> > +  return c_readstr (str + offset, as_a  (mode));
> >>> >  }
> >>> >
> >>> >  /* Helper to check the sizes of sequences and the destination of calls
> >>> > @@ -6686,30 +6691,117 @@ expand_builtin_strncpy (tree exp, rtx target)
> >>> >return NULL_RTX;
> >>> >  }
> >>> >
> >>> > -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> >>> > -   bytes from constant string DATA + OFFSET and return it as target
> >>> > -   constant.  If PREV isn't nullptr, it has the RTL info from the
> >>> > +/* Return the RTL of a register in MODE generated from PREV in the
> >>> > previous iteration.  */
> >>> >
> >>> > -rtx
> >>> > -builtin_memset_read_str (void *data, void *prevp,
> >>> > -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> >>> > -  scalar_int_mode mode)
> >>> > +static rtx
> >>> > +gen_memset_value_from_prev (by_pieces_prev *prev, fixed_size_mode mode)
> >>> >  {
> >>> > -  by_pieces_prev *prev = (by_pieces_prev *) prevp;
> >>> > +  rtx target = nullptr;
> >>> >if (prev != nullptr && prev->data != nullptr)
> >>> >  {
> >>> >/* Use the previous data in the same mode.  */
> >>> >if (prev->mode == mode)
> >>> >   return prev->data;
> >>> > +
> >>> > +  fixed_size_mode prev_mode = prev->mode;
> >>> > +
> >>> > +  /* Don't use the previous data to write QImode if it is in a
> >>> > +  vector mode.  */
> >>> > +  if (VECTOR_MODE_P (prev_mode) && mode == QImode)
> >>> > + return target;
> >>> > +
> >>> > +  rtx prev_rtx = prev->data;
> >>> > +
> >>> > +  if (REG_P (prev_rtx)
> >>> > +   && HARD_REGISTER_P (prev_rtx)
> >>> > +   && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0)
> >>> > + {
> >>> > +   /* If we can't put a hard register in MODE, first generate a
> >>> > +  subreg of word mode if the previous mode is wider than word
> >>> > +  mode and word mode is wider than MODE.  */
> >>> > +   if (UNITS_PER_WORD < GET_MODE_SIZE (prev_mode)
> >>> > +   && UNITS_PER_WORD > GET_MODE_SIZE (mode))
> >>> > + {
> >>> > +   prev_rtx = lowpart_subreg (word_mode, prev_rtx,
> >>> > +  prev_mode);
> >>> > +   if (prev_rtx != nullptr)
> >>> > + prev_mode = word_mode;
> >>> > + }
> >>> > +   else
> >>> > + prev_rtx = nullptr;
> >>>
> >>> I don't understand this.  Why not just do the:
> >>>
> >>>   if (REG_P (prev_rtx)
> >>>   && HARD_REGISTER_P (prev_rtx)
> >>>   && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0)
> >>> prev_rtx = copy_to_reg (prev_rtx);
> >>>
> >>> that I suggested in the previous review?
> >>
> >> But
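The idea behind gen_memset_value_from_prev is that a memset value is the same byte repeated, so the low part of a wide broadcast is already the right value for a narrower store. The scalar C sketch below checks that property, modelling registers as uint64_t/uint32_t/uint16_t values. It is only an analogy for RTL lowpart_subreg, and all names in it are hypothetical. It says nothing about the hard-register case discussed in the review, where the register cannot be accessed directly in the narrower mode.

```c
#include <assert.h>
#include <stdint.h>

/* Broadcast byte C into every byte of a 64-bit value, like a memset
   value held in a wide register.  */
static uint64_t
broadcast64 (uint8_t c)
{
  return 0x0101010101010101ULL * c;
}

static uint32_t
broadcast32 (uint8_t c)
{
  return 0x01010101U * c;
}

/* Reuse the previous, wider value by taking its low part; the analogue
   of a lowpart subreg of the previous iteration's register.  */
static uint32_t
lowpart32_from_prev (uint64_t prev)
{
  return (uint32_t) prev;
}

static uint16_t
lowpart16_from_prev (uint64_t prev)
{
  return (uint16_t) prev;
}

/* Assert that narrower stores can reuse the wide broadcast of C.  */
static void
check_lowpart_reuse (uint8_t c)
{
  uint64_t wide = broadcast64 (c);
  assert (lowpart32_from_prev (wide) == broadcast32 (c));
  assert (lowpart16_from_prev (wide) == (uint16_t) (0x0101U * c));
}
```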

[PATCH] PR fortran/101536 - ICE in gfc_conv_expr_descriptor, at fortran/trans-array.c:7324

2021-07-21 Thread Harald Anlauf via Gcc-patches

Another one of Gerhard's infamous testcases.  We did not properly detect
and reject array elements of type CLASS passed as an argument to an
intrinsic that requires an array.

Regtested on x86_64-pc-linux-gnu.  OK for mainline / 11-branch when it
reopens?

Thanks,
Harald


Fortran: extend check for array arguments and reject CLASS array elements.

gcc/fortran/ChangeLog:

PR fortran/101536
* check.c (array_check): Array elements of CLASS type are not
arrays.

gcc/testsuite/ChangeLog:

PR fortran/101536
* gfortran.dg/pr101536.f90: New test.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 27bf3a7eafe..6d2d9fe4007 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -735,6 +735,10 @@ array_check (gfc_expr *e, int n)
 	&& CLASS_DATA (e)->attr.dimension
 	&& CLASS_DATA (e)->as->rank)
 {
+  if (e->ref && e->ref->type == REF_ARRAY
+	  && e->ref->u.ar.type == AR_ELEMENT)
+	goto error;
+
   gfc_add_class_array_ref (e);
   return true;
 }
@@ -742,6 +746,7 @@ array_check (gfc_expr *e, int n)
   if (e->rank != 0 && e->ts.type != BT_PROCEDURE)
 return true;

+error:
   gfc_error ("%qs argument of %qs intrinsic at %L must be an array",
 	 gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic,
 	 &e->where);
diff --git a/gcc/testsuite/gfortran.dg/pr101536.f90 b/gcc/testsuite/gfortran.dg/pr101536.f90
new file mode 100644
index 000..14cb4100bd6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr101536.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! PR fortran/101536 - ICE in gfc_conv_expr_descriptor
+
+program p
+  type t
+  end type
+contains
+  integer function f(x)
+class(t), allocatable :: x(:)
+f = size (x(1)) ! { dg-error "must be an array" }
+  end
+end

[PATCH] PR fortran/101564 - ICE in resolve_allocate_deallocate, at fortran/resolve.c:8169

2021-07-21 Thread Harald Anlauf via Gcc-patches

I have the impression that Gerhard is a hydra: one PR down, he submits
two new ones... :-(
Anyway, here's a straightforward fix for a NULL pointer dereference for
an invalid argument to STAT.  For an alternative patch by Steve see PR.

Regtested on x86_64-pc-linux-gnu.  OK for mainline / 11-branch when it
reopens?

Thanks,
Harald


Fortran: ICE in resolve_allocate_deallocate for invalid STAT argument

gcc/fortran/ChangeLog:

PR fortran/101564
* resolve.c (resolve_allocate_deallocate): Avoid NULL pointer
dereference and shortcut for bad STAT argument to (DE)ALLOCATE.

gcc/testsuite/ChangeLog:

PR fortran/101564
* gfortran.dg/pr101564.f90: New test.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 45c3ad387ac..51d312116eb 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -8165,6 +8165,9 @@ resolve_allocate_deallocate (gfc_code *code, const char *fcn)
 	gfc_error ("Stat-variable at %L must be a scalar INTEGER "
 		   "variable", &stat->where);

+  if (stat->expr_type == EXPR_CONSTANT || stat->symtree == NULL)
+	goto done_stat;
+
   for (p = code->ext.alloc.list; p; p = p->next)
 	if (p->expr->symtree->n.sym->name == stat->symtree->n.sym->name)
 	  {
@@ -8192,6 +8195,8 @@ resolve_allocate_deallocate (gfc_code *code, const char *fcn)
 	  }
 }

+done_stat:
+
   /* Check the errmsg variable.  */
   if (errmsg)
 {
diff --git a/gcc/testsuite/gfortran.dg/pr101564.f90 b/gcc/testsuite/gfortran.dg/pr101564.f90
new file mode 100644
index 000..1e7c9911ce6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr101564.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/101564 - ICE in resolve_allocate_deallocate
+
+program p
+  integer, allocatable :: x(:)
+  integer  :: stat
+  allocate (x(2), stat=stat)
+  deallocate (x,  stat=stat%kind) ! { dg-error "(STAT variable)" }
+end

[committed] fix a couple of typos in a comment

2021-07-21 Thread Martin Sebor via Gcc-patches


My eye was drawn to the typos below so I fixed them.

Martin

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 742a95a549e..cbd51ac4d7c 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -3745,7 +3745,7 @@ walk_non_aliased_vuses (ao_ref *ref, tree vuse, bool tbaa_p,

 }


-/* Based on the memory reference REF call WALKER for each vdef which
+/* Based on the memory reference REF call WALKER for each vdef whose
defining statement may clobber REF, starting with VDEF.  If REF
is NULL_TREE, each defining statement is visited.

@@ -3755,8 +3755,8 @@ walk_non_aliased_vuses (ao_ref *ref, tree vuse, bool tbaa_p,

If function entry is reached, FUNCTION_ENTRY_REACHED is set to true.
The pointer may be NULL and then we do not track this information.

-   At PHI nodes walk_aliased_vdefs forks into one walk for reach
-   PHI argument (but only one walk continues on merge points), the
+   At PHI nodes walk_aliased_vdefs forks into one walk for each
+   PHI argument (but only one walk continues at merge points), the
return value is true if any of the walks was successful.

The function returns the number of statements walked or -1 if

Re: [PATCH 1/2] RISC-V: Add arch flags for T-HEAD.

2021-07-21 Thread Jim Wilson

On Tue, Jul 13, 2021 at 11:06 AM Palmer Dabbelt  wrote:

> Is there any documentation as to what this "theadc" extension is?
>

The best doc I know of is
https://github.com/isrc-cas/c910-llvm
The README is in Chinese, but Google Translate does a decent job on it.  If
you want more details, you have to read the LLVM sources to see exactly
what each instruction does.  They have mentioned that they are working on
English-language docs, but I don't know when they will be available.

There are quite a few T-Head specific instructions here.  This patch is
only adding support for a few of them, probably as a trial to see how it
goes before they try to add the rest.

Jim

Clarification on CTF/BTF workings with LTO

2021-07-21 Thread Indu Bhagat via Gcc-patches


Hello,

Wanted to follow up on the CTF/BTF debug info + LTO workings.

To summarize, the current status/workflow on trunk is:

- The CTF container is written out by ctfout.c or btfout.c via the
ctf_debug_finalize () API.
- ctf_debug_finalize () is currently called once, in
dwarf2out_early_finish ().

- So far, the requirements of CTF and BTF are simple:
   - The generated .ctf/.BTF sections need no demarcation of
"early"/"late" debug. All of it can be generated "early".
   - The generated .ctf/.BTF information does not need to differ
between the final assembly and the fat LTO IR.

   - BPF CO-RE is not yet implemented on trunk.

Writing out the CTF/BTF at dwarf2out_early_finish seems to work: there
will always be a .ctf/.BTF section in both fat and slim LTO objects
(because the emission still happens in dwarf2out_early_finish on trunk).
And we have functionality to copy over the .ctf/.BTF debug sections in
handle_lto_debug_sections (). However, reading through some of the past
emails on the CTF/BTF patch series, it seems that you have pointed out
that CTF/BTF debug info generation is broken when used with LTO. If
that is true, I am most certainly missing some key point here.


So, before we move to the next steps of supporting additional 
requirements of BPF CO-RE etc., I would like to make sure that my 
current understanding is OK and that the current state of CTF/BTF on 
trunk is functional -with LTO-. I have tested some bits (with and 
without fat objects on x86_64) and have not run into issues.


Can you please confirm what you see amiss in the current workings of
CTF/BTF with LTO on trunk?


Thanks
Indu

[committed] analyzer: tweak dumping of min_expr/max_expr

2021-07-21 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as dcdf6bb24e5f113f2bb9298588105a071bddf50f.

gcc/analyzer/ChangeLog:
* svalue.cc (infix_p): New.
(binop_svalue::dump_to_pp): Use it to print MIN_EXPR and MAX_EXPR
in prefix form, rather than infix.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/svalue.cc | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index 094c7256818..a1e6f50b7d7 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -1053,6 +1053,21 @@ unaryop_svalue::maybe_fold_bits_within (tree type,
 
 /* class binop_svalue : public svalue.  */
 
+/* Return whether OP should be printed as an infix operator.  */
+
+static bool
+infix_p (enum tree_code op)
+{
+  switch (op)
+{
+default:
+  return true;
+case MAX_EXPR:
+case MIN_EXPR:
+  return false;
+}
+}
+
 /* Implementation of svalue::dump_to_pp vfunc for binop_svalue.  */
 
 void
@@ -1060,11 +1075,25 @@ binop_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 {
   if (simple)
 {
-  pp_character (pp, '(');
-  m_arg0->dump_to_pp (pp, simple);
-  pp_string (pp, op_symbol_code (m_op));
-  m_arg1->dump_to_pp (pp, simple);
-  pp_character (pp, ')');
+  if (infix_p (m_op))
+   {
+ /* Print "(A OP B)".  */
+ pp_character (pp, '(');
+ m_arg0->dump_to_pp (pp, simple);
+ pp_string (pp, op_symbol_code (m_op));
+ m_arg1->dump_to_pp (pp, simple);
+ pp_character (pp, ')');
+   }
+  else
+   {
+ /* Print "OP(A, B)".  */
+ pp_string (pp, op_symbol_code (m_op));
+ pp_character (pp, '(');
+ m_arg0->dump_to_pp (pp, simple);
+ pp_string (pp, ", ");
+ m_arg1->dump_to_pp (pp, simple);
+ pp_character (pp, ')');
+   }
 }
   else
 {
-- 
2.26.3
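With this change, MIN_EXPR and MAX_EXPR are dumped in prefix form while other binary operators stay infix. A standalone C sketch of the same dispatch follows. The enum and the "min"/"max" spellings are assumptions standing in for tree codes and op_symbol_code, and format_binop is a hypothetical helper that writes to a buffer instead of a pretty_printer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for tree codes; only MIN and MAX are special.  */
enum op_code { OP_PLUS, OP_MULT, OP_MIN, OP_MAX };

static const char *
op_symbol (enum op_code op)
{
  switch (op)
    {
    case OP_PLUS: return "+";
    case OP_MULT: return "*";
    case OP_MIN:  return "min";
    case OP_MAX:  return "max";
    }
  return "?";
}

/* Mirrors infix_p: everything is infix except MIN and MAX.  */
static bool
infix_p (enum op_code op)
{
  switch (op)
    {
    case OP_MIN:
    case OP_MAX:
      return false;
    default:
      return true;
    }
}

/* Format a binary operation into BUF as "(A OP B)" or "OP(A, B)".  */
static int
format_binop (char *buf, size_t len, enum op_code op,
              const char *a, const char *b)
{
  assert (buf != NULL && len > 0);
  if (infix_p (op))
    return snprintf (buf, len, "(%s%s%s)", a, op_symbol (op), b);
  return snprintf (buf, len, "%s(%s, %s)", op_symbol (op), a, b);
}
```

So an addition still reads "(x+y)", while a minimum now reads "min(x, y)" instead of an infix form.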

[committed] analyzer: show BB index in BEFORE_SUPERNODE's in-edge

2021-07-21 Thread David Malcolm via Gcc-patches

This is useful for debugging how the analyzer handles phi nodes.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as 81703584769707c34533e78c7a2bc229b0e14b2d.

gcc/analyzer/ChangeLog:
* program-point.cc (function_point::print): Show src BB index at
BEFORE_SUPERNODE.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/program-point.cc | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/program-point.cc b/gcc/analyzer/program-point.cc
index d8cfc61975e..d73b6211141 100644
--- a/gcc/analyzer/program-point.cc
+++ b/gcc/analyzer/program-point.cc
@@ -119,8 +119,15 @@ function_point::print (pretty_printer *pp, const format &f) const
 case PK_BEFORE_SUPERNODE:
   {
if (m_from_edge)
- pp_printf (pp, "before SN: %i (from SN: %i)",
-m_supernode->m_index, m_from_edge->m_src->m_index);
+ {
+   if (basic_block bb = m_from_edge->m_src->m_bb)
+ pp_printf (pp, "before SN: %i (from SN: %i (bb: %i))",
+m_supernode->m_index, m_from_edge->m_src->m_index,
+bb->index);
+   else
+ pp_printf (pp, "before SN: %i (from SN: %i)",
+m_supernode->m_index, m_from_edge->m_src->m_index);
+ }
else
  pp_printf (pp, "before SN: %i (NULL from-edge)",
 m_supernode->m_index);
-- 
2.26.3

[committed] analyzer: fixes to -fdump-analyzer-state-purge for phi nodes

2021-07-21 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as 6bbad96cd44774bc199b256dbf4260b25b87c7db.

gcc/analyzer/ChangeLog:
* state-purge.cc (state_purge_annotator::add_node_annotations):
Rather than erroneously always using the NULL in-edge, determine
each relevant in-edge, and print the appropriate data for each
in-edge.  Use print_needed to print the data as comma-separated
lists of SSA names.
(print_vec_of_names): Add "within_table" param and use it.
(state_purge_annotator::add_stmt_annotations): Factor out
collation and printing code into...
(state_purge_annotator::print_needed): ...this new function.
* state-purge.h (state_purge_annotator::print_needed): New decl.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/state-purge.cc | 66 ++---
 gcc/analyzer/state-purge.h  |  4 +++
 2 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/gcc/analyzer/state-purge.cc b/gcc/analyzer/state-purge.cc
index e82ea87e735..3c3b77500a6 100644
--- a/gcc/analyzer/state-purge.cc
+++ b/gcc/analyzer/state-purge.cc
@@ -477,23 +477,20 @@ state_purge_annotator::add_node_annotations (graphviz_out *gv,
  "lightblue");
pp_write_text_to_stream (pp);
 
-   // FIXME: passing in a NULL in-edge means we get no hits
-   function_point before_supernode
- (function_point::before_supernode (&n, NULL));
-
-   for (state_purge_map::iterator iter = m_map->begin ();
-   iter != m_map->end ();
-   ++iter)
+   /* Different in-edges mean different names need purging.
+  Determine which points to dump.  */
+   auto_vec points;
+   if (n.entry_p ())
+ points.safe_push (function_point::before_supernode (&n, NULL));
+   else
+ for (auto inedge : n.m_preds)
+   points.safe_push (function_point::before_supernode (&n, inedge));
+
+   for (auto & point : points)
  {
-   tree name = (*iter).first;
-   state_purge_per_ssa_name *per_name_data = (*iter).second;
-   if (per_name_data->get_function () == n.m_fun)
-{
-  if (per_name_data->needed_at_point_p (before_supernode))
-pp_printf (pp, "%qE needed here", name);
-  else
-pp_printf (pp, "%qE not needed here", name);
-}
+   point.print (pp, format (true));
+   pp_newline (pp);
+   print_needed (gv, point, false);
pp_newline (pp);
  }
 
@@ -502,19 +499,20 @@ state_purge_annotator::add_node_annotations (graphviz_out *gv,
return false;
 }
 
-/* Print V to GV as a comma-separated list in braces within a ,
-   titling it with TITLE.
+/* Print V to GV as a comma-separated list in braces, titling it with TITLE.
+   If WITHIN_TABLE is true, print it within a 
 
-   Subroutine of state_purge_annotator::add_stmt_annotations.  */
+   Subroutine of state_purge_annotator::print_needed.  */
 
 static void
 print_vec_of_names (graphviz_out *gv, const char *title,
-   const auto_vec &v)
+   const auto_vec &v, bool within_table)
 {
   pretty_printer *pp = gv->get_pp ();
   tree name;
   unsigned i;
-  gv->begin_trtd ();
+  if (within_table)
+gv->begin_trtd ();
   pp_printf (pp, "%s: {", title);
   FOR_EACH_VEC_ELT (v, i, name)
 {
@@ -523,8 +521,11 @@ print_vec_of_names (graphviz_out *gv, const char *title,
   pp_printf (pp, "%qE", name);
 }
   pp_printf (pp, "}");
-  pp_write_text_as_html_like_dot_to_stream (pp);
-  gv->end_tdtr ();
+  if (within_table)
+{
+  pp_write_text_as_html_like_dot_to_stream (pp);
+  gv->end_tdtr ();
+}
   pp_newline (pp);
 }
 
@@ -556,6 +557,17 @@ state_purge_annotator::add_stmt_annotations (graphviz_out *gv,
   function_point before_stmt
 (function_point::before_stmt (supernode, stmt_idx));
 
+  print_needed (gv, before_stmt, true);
+}
+
+/* Get the ssa names needed and not-needed at POINT, and print them to GV.
+   If WITHIN_TABLE is true, print them within  elements.  */
+
+void
+state_purge_annotator::print_needed (graphviz_out *gv,
+const function_point &point,
+bool within_table) const
+{
   auto_vec needed;
   auto_vec not_needed;
   for (state_purge_map::iterator iter = m_map->begin ();
@@ -564,17 +576,17 @@ state_purge_annotator::add_stmt_annotations (graphviz_out *gv,
 {
   tree name = (*iter).first;
   state_purge_per_ssa_name *per_name_data = (*iter).second;
-  if (per_name_data->get_function () == supernode->m_fun)
+  if (per_name_data->get_function () == point.get_function ())
{
- if (per_name_data->needed_at_point_p (before_stmt))
+ if (per_name_data->needed_at_point_p (point))
needed.safe_push (name);
  else
not_needed.safe_push (name);
}
 }
 
-  print_vec_of_names (gv, "needed here", needed);
-  print_vec_of_names (gv, "not needed here", not_needed);
+  print_vec_of_na
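The new print_needed helper partitions the SSA names of the current function into "needed" and "not needed" at a point and prints each group as a comma-separated list in braces. The C sketch below shows that partition-and-format step. The name_info record, the buffer-based output, and the function names are hypothetical simplifications. For example, GCC prints names with %qE, which adds quotes, and writes through a pretty_printer rather than a char buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: an SSA name and whether it is needed at the
   program point being dumped.  */
struct name_info
{
  const char *name;
  bool needed;
};

/* Append S to BUF without overflowing its LEN bytes (truncating).  */
static void
append (char *buf, size_t len, const char *s)
{
  size_t used = strlen (buf);
  assert (used < len);
  snprintf (buf + used, len - used, "%s", s);
}

/* Append "TITLE: {a, b, ...}" listing the names whose needed flag
   equals WANT.  */
static void
format_names (char *buf, size_t len, const char *title,
              const struct name_info *names, size_t n, bool want)
{
  bool first = true;
  append (buf, len, title);
  append (buf, len, ": {");
  for (size_t i = 0; i < n; i++)
    if (names[i].needed == want)
      {
        if (!first)
          append (buf, len, ", ");
        append (buf, len, names[i].name);
        first = false;
      }
  append (buf, len, "}");
}

/* Partition NAMES into needed and not-needed and format both lists,
   one per line, into BUF.  */
static void
format_needed (char *buf, size_t len, const struct name_info *names, size_t n)
{
  assert (len > 0);
  buf[0] = '\0';
  format_names (buf, len, "needed here", names, n, true);
  append (buf, len, "\n");
  format_names (buf, len, "not needed here", names, n, false);
}
```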

[committed] analyzer: fix issues with phi handling

2021-07-21 Thread David Malcolm via Gcc-patches

The analyzer's state purging code was overzealously purging state
for ssa names that might be used within phi nodes, leading to
false positives from -Wanalyzer-use-of-uninitialized-value.

This patch updates phi handling in the analyzer to fix these issues.
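The phis at the start of a basic block are conceptually evaluated in parallel, which is why the patch reads every phi argument from a snapshot (old_state). Consider a loop-carried swap, a_2 = PHI <a_0, b_2> and b_2 = PHI <b_0, a_2> (names hypothetical). If the phis are applied one at a time to the same state, the second phi sees the already-updated a_2, and both values end up equal. The C sketch below models the state as an array of ints. It is only an analogy for region_model. Selecting the argument by in-edge index loosely mirrors what cfg_superedge::get_phi_arg_idx is for.

```c
#include <assert.h>
#include <string.h>

#define NUM_VARS 4
#define MAX_PREDS 2

/* Hypothetical phi: DST = PHI <ARGS[0], ARGS[1]>, where ARGS[k] is the
   variable flowing in along in-edge K.  */
struct phi
{
  int dst;
  int args[MAX_PREDS];
};

/* Apply the phis one after another, each reading the partly updated
   STATE: the order-dependent behavior the patch removes.  */
static void
apply_phis_sequential (int *state, const struct phi *phis, int n, int edge)
{
  assert (edge >= 0 && edge < MAX_PREDS);
  for (int i = 0; i < n; i++)
    state[phis[i].dst] = state[phis[i].args[edge]];
}

/* Apply the phis as a parallel assignment: every source is read from a
   copy of STATE (NUM_VARS entries) taken before any phi is applied.  */
static void
apply_phis_simultaneous (int *state, const struct phi *phis, int n, int edge)
{
  int old_state[NUM_VARS];
  assert (edge >= 0 && edge < MAX_PREDS);
  memcpy (old_state, state, sizeof old_state);
  for (int i = 0; i < n; i++)
    state[phis[i].dst] = old_state[phis[i].args[edge]];
}
```

Along the latch edge, the sequential version leaves a_2 and b_2 both equal to the old b_2, while the simultaneous version swaps them as intended.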

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as e0a7a6752dad7848eb4b29b826a551c0992256ec.

gcc/analyzer/ChangeLog:
* region-model.cc (region_model::handle_phi): Add "old_state"
param and use it.
(region_model::update_for_phis): Update so that all of the phi
stmts are effectively handled simultaneously, rather than in
order.
* region-model.h (region_model::handle_phi): Add "old_state"
param.
* state-purge.cc (self_referential_phi_p): Replace with...
(name_used_by_phis_p): ...this new function.
(state_purge_per_ssa_name::process_point): Update to use the
above, so that all phi stmts at a basic block are effectively
considered simultaneously, and only consider the phi arguments for
the pertinent in-edge.
* supergraph.cc (cfg_superedge::get_phi_arg_idx): New.
(cfg_superedge::get_phi_arg): Use the above.
* supergraph.h (cfg_superedge::get_phi_arg_idx): New decl.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/explode-2.c: Remove xfail.
* gcc.dg/analyzer/explode-2a.c: Remove expected leak warning on
while stmt.
* gcc.dg/analyzer/phi-2.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc   | 18 +++---
 gcc/analyzer/region-model.h|  1 +
 gcc/analyzer/state-purge.cc| 42 --
 gcc/analyzer/supergraph.cc | 11 +-
 gcc/analyzer/supergraph.h  |  1 +
 gcc/testsuite/gcc.dg/analyzer/explode-2.c  |  2 +-
 gcc/testsuite/gcc.dg/analyzer/explode-2a.c |  2 +-
 gcc/testsuite/gcc.dg/analyzer/phi-2.c  | 27 ++
 8 files changed, 78 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/phi-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6d02c60449c..c029759cb9b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1553,11 +1553,14 @@ region_model::on_longjmp (const gcall *longjmp_call, const gcall *setjmp_call,
 
 /* Update this region_model for a phi stmt of the form
  LHS = PHI <...RHS...>.
-   where RHS is for the appropriate edge.  */
+   where RHS is for the appropriate edge.
+   Get state from OLD_STATE so that all of the phi stmts for a basic block
+   are effectively handled simultaneously.  */
 
 void
 region_model::handle_phi (const gphi *phi,
  tree lhs, tree rhs,
+ const region_model &old_state,
  region_model_context *ctxt)
 {
   /* For now, don't bother tracking the .MEM SSA names.  */
@@ -1566,9 +1569,10 @@ region_model::handle_phi (const gphi *phi,
   if (VAR_DECL_IS_VIRTUAL_OPERAND (var))
return;
 
-  const svalue *rhs_sval = get_rvalue (rhs, ctxt);
+  const svalue *src_sval = old_state.get_rvalue (rhs, ctxt);
+  const region *dst_reg = old_state.get_lvalue (lhs, ctxt);
 
-  set_value (get_lvalue (lhs, ctxt), rhs_sval, ctxt);
+  set_value (dst_reg, src_sval, ctxt);
 
   if (ctxt)
 ctxt->on_phi (phi, rhs);
@@ -3036,6 +3040,10 @@ region_model::update_for_phis (const supernode *snode,
 {
   gcc_assert (last_cfg_superedge);
 
+  /* Copy this state and pass it to handle_phi so that all of the phi stmts
+ are effectively handled simultaneously.  */
+  const region_model old_state (*this);
+
   for (gphi_iterator gpi = const_cast(snode)->start_phis ();
!gsi_end_p (gpi); gsi_next (&gpi))
 {
@@ -3044,8 +3052,8 @@ region_model::update_for_phis (const supernode *snode,
   tree src = last_cfg_superedge->get_phi_arg (phi);
   tree lhs = gimple_phi_result (phi);
 
-  /* Update next_state based on phi.  */
-  handle_phi (phi, lhs, src, ctxt);
+  /* Update next_state based on phi and old_state.  */
+  handle_phi (phi, lhs, src, old_state, ctxt);
 }
 }
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 734ec601237..cc39929db26 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -582,6 +582,7 @@ class region_model
region_model_context *ctxt);
 
   void handle_phi (const gphi *phi, tree lhs, tree rhs,
+  const region_model &old_state,
   region_model_context *ctxt);
 
   bool maybe_update_for_edge (const superedge &edge,
diff --git a/gcc/analyzer/state-purge.cc b/gcc/analyzer/state-purge.cc
index 3c3b77500a6..bfa48a9ef3f 100644
--- a/gcc/analyzer/state-purge.cc
+++ b/gcc/analyzer/state-purge.cc
@@ -288,17 +288,23 @@ state_purge_per_ssa_name::add_to_worklist (const function_point &point,
 }
 }
 
-/* Does this phi depend

Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2021-07-21 Thread Thomas Schwinge

Hi!

Half a decade later...  ;-)

On 2015-11-05T18:10:49-0800, Cesar Philippidis  wrote:
> I've applied this patch to trunk.
> [...]

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/goacc/template.C
> @@ -0,0 +1,141 @@
> +[...]
> +#pragma acc atomic capture
> +c = b++;
> +
> +#pragma atomic update
> +c++;
> +
> +#pragma acc atomic read
> +b = a;
> +
> +#pragma acc atomic write
> +b = a;
> +[...]

Pushed "[OpenACC] Fix '#pragma atomic update' typo in
'g++.dg/goacc/template.C'" to master branch in commit
6099b9cc8ce70d2ec7f2fc9f71da95fbb66d335f, see attached.


(Did I suggest enabling '-Wunknown-pragmas' for '-fopenacc'/'-fopenmp*',
or, if that's not permissible, then at least doing it in the relevant
testsuite '*.exp' files?)


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 6099b9cc8ce70d2ec7f2fc9f71da95fbb66d335f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 21 Jul 2021 08:20:18 +0200
Subject: [PATCH] [OpenACC] Fix '#pragma atomic update' typo in
 'g++.dg/goacc/template.C'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[...]/g++.dg/goacc/template.C:58: warning: ignoring ‘#pragma atomic update’ [-Wunknown-pragmas]
   58 | #pragma atomic update
  |

Small fix-up for r229832 (commit 7a5e4956cc026cba54159d5c764486ac4151db85)
"[openacc] tile, independent, default, private and firstprivate support in
c/++".

	gcc/testsuite/
	* g++.dg/goacc/template.C: Fix '#pragma atomic update' typo.
---
 gcc/testsuite/g++.dg/goacc/template.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/goacc/template.C b/gcc/testsuite/g++.dg/goacc/template.C
index 8bcd2a1ce43..51a3f54e43f 100644
--- a/gcc/testsuite/g++.dg/goacc/template.C
+++ b/gcc/testsuite/g++.dg/goacc/template.C
@@ -55,7 +55,7 @@ oacc_parallel_copy (T a)
 #pragma acc atomic capture
   c = b++;
 
-#pragma atomic update
+#pragma acc atomic update
   c++;
 
 #pragma acc atomic read
-- 
2.30.2

OpenACC 'nohost' clause

2021-07-21 Thread Thomas Schwinge

Hi!

On 2018-10-02T07:11:43-0700, Cesar Philippidis  wrote:
> Attached is a patch that introduces support for the acc routine nohost
> clause. Basically, if an acc routine function is marked as nohost, then
> the compiler does not generate code for the host.

This is particularly useful in combination with the OpenACC 'bind' and
'device_type' clauses, which we don't have yet, so:

> It's kind of strange
> to test for. Basically, we had to use acc_on_device at -O2 so that the
> host references to the dead function get optimized away.

Additionally I figured out something using weak symbols.

> I believe that the nohost clause was added for acc routines to allow
> offloaded acc code to call vendor libraries, such as cuBLAS, which are
> only available for specific accelerators. I haven't seen it used much in
> practice though.

ACK.

> Is this OK for trunk?

After fixing the crucial issue of discarding 'nohost' functions only for
the host and not also for all offload targets ;-), considerably
improving/fixing the Fortran front end changes, and generally boosting
C/C++/Fortran test coverage, I've now pushed "OpenACC 'nohost' clause" to
master branch in commit a61f6afbee370785cf091fe46e2e022748528307, see
attached.


Grüße
 Thomas


From a61f6afbee370785cf091fe46e2e022748528307 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 21 Jul 2021 18:30:00 +0200
Subject: [PATCH] OpenACC 'nohost' clause

Do not "compile a version of this procedure for the host".

	gcc/
	* tree-core.h (omp_clause_code): Add 'OMP_CLAUSE_NOHOST'.
	* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1):
	Handle it.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* omp-general.c (oacc_verify_routine_clauses): Likewise.
	* gimplify.c (gimplify_scan_omp_clauses)
	(gimplify_adjust_omp_clauses): Likewise.
	* tree-nested.c (convert_nonlocal_omp_clauses)
	(convert_local_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	* omp-offload.c (execute_oacc_device_lower): Update.
	gcc/c-family/
	* c-pragma.h (pragma_omp_clause): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Handle 'nohost'.
	(c_parser_oacc_all_clauses): Handle 'PRAGMA_OACC_CLAUSE_NOHOST'.
	(OACC_ROUTINE_CLAUSE_MASK): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
	* c-typeck.c (c_finish_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
	gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Handle 'nohost'.
	(cp_parser_oacc_all_clauses): Handle 'PRAGMA_OACC_CLAUSE_NOHOST'.
	(OACC_ROUTINE_CLAUSE_MASK): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
	* pt.c (tsubst_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
	* semantics.c (finish_omp_clauses): Likewise.
	gcc/fortran/
	* dump-parse-tree.c (show_attr): Update.
	* gfortran.h (symbol_attribute): Add 'oacc_routine_nohost' member.
	(gfc_omp_clauses): Add 'nohost' member.
	* module.c (ab_attribute): Add 'AB_OACC_ROUTINE_NOHOST'.
	(attr_bits, mio_symbol_attribute): Update.
	* openmp.c (omp_mask2): Add 'OMP_CLAUSE_NOHOST'.
	(gfc_match_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
	(OACC_ROUTINE_CLAUSES): Add 'OMP_CLAUSE_NOHOST'.
	(gfc_match_oacc_routine): Update.
	* trans-decl.c (add_attributes_to_decl): Update.
	* trans-openmp.c (gfc_trans_omp_clauses): Likewise.
	gcc/testsuite/
	* c-c++-common/goacc/classify-routine-nohost.c: New file.
	* c-c++-common/goacc/classify-routine.c: Update.
	* c-c++-common/goacc/routine-2.c: Likewise.
	* c-c++-common/goacc/routine-nohost-1.c: New file.
	* c-c++-common/goacc/routine-nohost-2.c: Likewise.
	* g++.dg/goacc/template.C: Update.
	* gfortran.dg/goacc/classify-routine-nohost.f95: New file.
	* gfortran.dg/goacc/classify-routine.f95: Update.
	* gfortran.dg/goacc/pure-elemental-procedures-2.f90: Likewise.
	* gfortran.dg/goacc/routine-6.f90: Likewise.
	* gfortran.dg/goacc/routine-intrinsic-2.f: Likewise.
	* gfortran.dg/goacc/routine-module-1.f90: Likewise.
	* gfortran.dg/goacc/routine-module-2.f90: Likewise.
	* gfortran.dg/goacc/routine-module-3.f90: Likewise.
	* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
	* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.
	* gfortran.dg/goacc/routine-multiple-directives-2.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: New
	file.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2_2.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Likewise.

Co-Authored-By: Joseph Myers 
Co-Authored-By: Cesar Philippidis 
---
 gcc/c-family/c-pragma.h   |   1 +
 gcc/c/c-parser.c  |  10 +-
 gcc/c/c-typeck.c  |   1 +
 gcc/cp/parser.c   |  11 +-
 gc

Re: [PATCH, rs6000] fix execution failure of parity_1.f90 on P10 [PR100952]

2021-07-21 Thread Segher Boessenkool

Sorry for the delay!

On Tue, Jul 13, 2021 at 09:38:33AM +0800, HAO CHEN GUI wrote:
>   PR target/100952
>   * config/rs6000/rs6000.md (cstore4): Fix wrong fall through.

> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 3f59b544f6a..d7c13d4e79d 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -11623,7 +11623,10 @@ (define_expand "cstore4"
>  {
>/* Everything is best done with setbc[r] if available.  */
>if (TARGET_POWER10 && TARGET_ISEL)
> -rs6000_emit_int_cmove (operands[0], operands[1], const1_rtx, const0_rtx);
> +{
> +  rs6000_emit_int_cmove (operands[0], operands[1], const1_rtx, const0_rtx);
> +  DONE;
> +}
>  
>/* Expanding EQ and NE directly to some machine instructions does not help
>   but does hurt combine.  So don't.  */

Perfect.  Okay for trunk and backports (but do not touch GCC 11 right
now without RM approval, see
).  Thanks!


Segher
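The bug fixed above is the classic missing-DONE fall-through in a define_expand. After rs6000_emit_int_cmove has emitted the setbc-based sequence, the C body keeps running, so the rest of the expander also emits code for the same result. Here is a minimal C++ model of that control flow. `expand_cstore` and the instruction strings are hypothetical stand-ins: a real expander emits RTL, and DONE is a macro rather than a `return`.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the cstore<mode>4 expander body. Each string stands
// for an emitted instruction sequence; returning early plays the role of DONE.
std::vector<std::string> expand_cstore(bool power10_isel, bool with_done) {
  std::vector<std::string> emitted;
  if (power10_isel) {
    emitted.push_back("setbc sequence (rs6000_emit_int_cmove)");
    if (with_done)
      return emitted;  // the fix: stop here, as DONE does
  }
  // Without the early exit, control falls through and a second sequence is
  // emitted for the same destination.
  emitted.push_back("generic compare/set sequence");
  return emitted;
}
```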

Re: [PATCH, rs6000] fix failure test cases caused by disabling mode promotion for pseudos [PR100952]

2021-07-21 Thread Segher Boessenkool

Hi!

On Tue, Jul 06, 2021 at 11:11:05AM +0800, HAO CHEN GUI wrote:
>    The patch changed matching conditions in pr81348.c and pr56605.c. 
> The original conditions failed to match due to mode promotion disabled.

>   PR target/100952
>   * gcc/testsuite/gcc.target/powerpc/pr56605.c: Change matching
>   conditions.
>   * gcc/testsuite/gcc.target/powerpc/pr81348.c: Likewise.
> 

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> index 29efd815adc..2b7ddbd7410 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia)
>  ia[i] = (int) sb[i];
>  }
>  
> -/* { dg-final { scan-rtl-dump-times "\\\(compare:CC \\\((?:and|zero_extend):DI \\\(reg:\[SD\]I" 1 "combine" } } */
> +/* { dg-final { scan-rtl-dump-times "\\\(compare:CC \\\((?:and|zero_extend):SI \\\(subreg:SI \\\(reg:\[SD\]I" 1 "combine" } } */

So, this testcase only runs on 64-bit machines (even only on lp64
configurations).  But do we now always get a subreg?  And, can that
change again some time in the future?

Writing it as
/* { dg-final { scan-rtl-dump-times {\(compare:CC \((?:and|zero_extend):SI \(subreg:SI \(reg:[SD]I} 1 "combine" } } */
is easier to read btw.

If you get a subreg:SI of a reg:SI here, something is wrong.  And you
cannot have a zero_extend:SI of anything :SI either.

So what the original matched were
  (compare:CC (and:DI (reg:DI
and
  (compare:CC (zero_extend:DI (reg:SI
and now you want to allow a subreg:SI in that last one as well (and you
do not really care what it is a subreg of, you don't check what offset
anyway), so maybe just
/* { dg-final { scan-rtl-dump-times {\(compare:CC \((?:and|zero_extend):DI \((?:sub)?reg:[SD]I} 1 "combine" } } */
will do what you want?

> --- a/gcc/testsuite/gcc.target/powerpc/pr81348.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr81348.c
> @@ -19,5 +19,5 @@ void d(void)
>  ***c = e;
>  }
>  
> -/* { dg-final { scan-assembler {\mlxsihzx\M}  } } */
> -/* { dg-final { scan-assembler {\mvextsh2d\M} } } */
> +/* { dg-final { scan-assembler {\mlha\M}  } } */
> +/* { dg-final { scan-assembler {\mmtvsrwa\M} } } */

(This test should not test for powerpc64*-*-* but powerpc*-*-* btw,
and that means it can just be left out, so just
/* { dg-do compile { target lp64 } } */
and nothing more).

Okay for trunk with those changes (the RE and lp64).  Thanks!
(Test if it works of course; I did not :-) )

Segher
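You can sanity-check the suggested pattern outside DejaGnu. The sketch below is a rough stand-in for scan-rtl-dump-times and rests on a few assumptions. It uses C++ std::regex with ECMAScript syntax, which accepts the same `\(` escapes and `(?:...)` groups as the Tcl pattern here. It writes the pattern as `\(compare:CC \((?:and|zero_extend):DI \((?:sub)?reg:[SD]I`. The RTL fragments are made-up examples, not real combine dump output.

```cpp
#include <iterator>
#include <regex>
#include <string>

// The suggested pattern as a C++ raw string (custom delimiter "re").
const std::regex kCompare(
    R"re(\(compare:CC \((?:and|zero_extend):DI \((?:sub)?reg:[SD]I)re");

// Count non-overlapping matches in a dump, roughly what
// scan-rtl-dump-times checks against its expected count.
long count_matches(const std::string &dump) {
  return static_cast<long>(std::distance(
      std::sregex_iterator(dump.begin(), dump.end(), kCompare),
      std::sregex_iterator()));
}
```

Note that a `zero_extend:SI` operand, as in the originally proposed pattern, does not match. That is the point of the review comment.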

[committed] analyzer: fix ICE in binding_cluster::purge_state_involving [PR101522]

2021-07-21 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-2459-g87bd75cd49aac68e90bd9b6b5e14582d6e0ccafa.

gcc/analyzer/ChangeLog:
PR analyzer/101522
* store.cc (binding_cluster::purge_state_involving): Don't change
m_map whilst iterating through it.

gcc/testsuite/ChangeLog:
PR analyzer/101522
* g++.dg/analyzer/pr101522.C: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/store.cc| 14 +++
 gcc/testsuite/g++.dg/analyzer/pr101522.C | 31 
 2 files changed, 40 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/pr101522.C

diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 0042a207ba6..8ee414da5c8 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -1323,6 +1323,7 @@ binding_cluster::purge_state_involving (const svalue *sval,
region_model_manager *sval_mgr)
 {
   auto_vec<const binding_key *> to_remove;
+  auto_vec<std::pair<const binding_key *, tree> > to_make_unknown;
   for (auto iter : m_map)
 {
   const binding_key *iter_key = iter.first;
@@ -1335,17 +1336,20 @@ binding_cluster::purge_state_involving (const svalue *sval,
}
   const svalue *iter_sval = iter.second;
   if (iter_sval->involves_p (sval))
-   {
- const svalue *new_sval
-   = sval_mgr->get_or_create_unknown_svalue (iter_sval->get_type ());
- m_map.put (iter_key, new_sval);
-   }
+   to_make_unknown.safe_push (std::make_pair(iter_key,
+ iter_sval->get_type ()));
 }
   for (auto iter : to_remove)
 {
   m_map.remove (iter);
   m_touched = true;
 }
+  for (auto iter : to_make_unknown)
+{
+  const svalue *new_sval
+   = sval_mgr->get_or_create_unknown_svalue (iter.second);
+  m_map.put (iter.first, new_sval);
+}
 }
 
 /* Get any SVAL bound to REG within this cluster via kind KIND,
diff --git a/gcc/testsuite/g++.dg/analyzer/pr101522.C b/gcc/testsuite/g++.dg/analyzer/pr101522.C
new file mode 100644
index 000..634a2ac30cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/pr101522.C
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++11 } }
+
+double
+sqrt ();
+
+namespace std {
+  class gamma_distribution {
+  public:
+gamma_distribution () : _M_param () {}
+
+  private:
+struct param_type {
+  param_type () : _M_beta () { _M_a2 = 1 / ::sqrt (); }
+  double _M_beta, _M_a2;
+};
+param_type _M_param;
+int _M_saved_available, _M_saved = 0, _M_param0 = 0;
+  };
+
+  struct fisher_f_distribution {
+gamma_distribution _M_gd_x, _M_gd_y;
+  };
+}
+
+int
+main ()
+{
+  std::fisher_f_distribution d;
+
+  return 0;
+}
-- 
2.26.3
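The fix uses the usual two-phase pattern for updating a container you are iterating over: record the changes during the walk, then apply them once iteration has finished. Erasing from a std::unordered_map inside a range-for loop, for example, invalidates the loop's iterator, which is the same class of bug. Below is a hypothetical C++ model. std::unordered_map stands in for GCC's hash_map, and integer ids stand in for binding keys and svalues. The `else if` is a simplification of this sketch, not taken from the patch.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the analyzer's "unknown svalue".
constexpr int kUnknown = -1;

// Two-phase update: the first loop only reads M; all erasures and
// overwrites happen after iteration has ended.
void purge_state_involving(std::unordered_map<int, int> &m, int sval) {
  std::vector<int> to_remove;                        // keys that involve SVAL
  std::vector<std::pair<int, int>> to_make_unknown;  // (key, new value)
  for (const auto &kv : m) {
    if (kv.first == sval)
      to_remove.push_back(kv.first);
    else if (kv.second == sval)
      to_make_unknown.emplace_back(kv.first, kUnknown);
  }
  for (int key : to_remove)
    m.erase(key);
  for (const auto &p : to_make_unknown)
    m[p.first] = p.second;
}
```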

[committed] analyzer: bulletproof -Wanalyzer-file-leak [PR101547]

2021-07-21 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-2460-g893b12cc12877aca1c9df6272123b26eddf12722.

gcc/analyzer/ChangeLog:
PR analyzer/101547
* sm-file.cc (file_leak::emit): Handle m_arg being NULL.
(file_leak::describe_final_event): Handle ev.m_expr being NULL.

gcc/testsuite/ChangeLog:
PR analyzer/101547
* gcc.dg/analyzer/pr101547.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-file.cc  | 27 ++--
 gcc/testsuite/gcc.dg/analyzer/pr101547.c | 11 ++
 2 files changed, 32 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr101547.c

diff --git a/gcc/analyzer/sm-file.cc b/gcc/analyzer/sm-file.cc
index b40a9a1edb9..6a17019448e 100644
--- a/gcc/analyzer/sm-file.cc
+++ b/gcc/analyzer/sm-file.cc
@@ -193,9 +193,13 @@ public:
 /* CWE-775: "Missing Release of File Descriptor or Handle after
Effective Lifetime". */
 m.add_cwe (775);
-return warning_meta (rich_loc, m, OPT_Wanalyzer_file_leak,
-"leak of FILE %qE",
-m_arg);
+if (m_arg)
+  return warning_meta (rich_loc, m, OPT_Wanalyzer_file_leak,
+  "leak of FILE %qE",
+  m_arg);
+else
+  return warning_meta (rich_loc, m, OPT_Wanalyzer_file_leak,
+  "leak of FILE");
   }
 
   label_text describe_state_change (const evdesc::state_change &change)
@@ -212,10 +216,21 @@ public:
   label_text describe_final_event (const evdesc::final_event &ev) FINAL OVERRIDE
   {
 if (m_fopen_event.known_p ())
-  return ev.formatted_print ("%qE leaks here; was opened at %@",
-ev.m_expr, &m_fopen_event);
+  {
+   if (ev.m_expr)
+ return ev.formatted_print ("%qE leaks here; was opened at %@",
+ev.m_expr, &m_fopen_event);
+   else
+ return ev.formatted_print ("leaks here; was opened at %@",
+&m_fopen_event);
+  }
 else
-  return ev.formatted_print ("%qE leaks here", ev.m_expr);
+  {
+   if (ev.m_expr)
+ return ev.formatted_print ("%qE leaks here", ev.m_expr);
+   else
+ return ev.formatted_print ("leaks here");
+  }
   }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr101547.c b/gcc/testsuite/gcc.dg/analyzer/pr101547.c
new file mode 100644
index 000..8791cffa2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr101547.c
@@ -0,0 +1,11 @@
+char *
+fopen (const char *restrict, const char *restrict);
+
+void
+k2 (void)
+{
+  char *setfiles[1];
+  int i;
+
+  setfiles[i] = fopen ("", ""); /* { dg-warning "use of uninitialized value 'i'" } */
+} /* { dg-warning "leak of FILE" } */
-- 
2.26.3
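The fix follows a simple pattern. Every message that formats an expression with %qE gets a variant without it, for the case where the analyzer has no expression to print (m_arg or ev.m_expr is NULL). The C++ sketch below shows the resulting message shapes. The helpers are hypothetical, std::string replaces GCC's pretty-printer, and plain quotes stand in for %qE's quoting.

```cpp
#include <string>

// Warning text: omit the expression when none is available.
std::string leak_warning(const char *expr) {
  if (expr)
    return std::string("leak of FILE '") + expr + "'";
  return "leak of FILE";
}

// Final-event text: the expression and the fopen location are both optional.
std::string final_event(const char *expr, const char *opened_at) {
  std::string text =
      expr ? std::string("'") + expr + "' leaks here" : "leaks here";
  if (opened_at)
    text += std::string("; was opened at ") + opened_at;
  return text;
}
```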

[PATCH] RISC-V: Enable overlap-by-pieces via tune param

2021-07-21 Thread Christoph Muellner via Gcc-patches

This patch adds the field overlap_op_by_pieces to the struct
riscv_tune_param, which makes it possible to enable the
overlap_op_by_pieces feature of the by-pieces infrastructure.

gcc/ChangeLog:

* config/riscv/riscv.c (struct riscv_tune_param): New field.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
riscv_overlap_op_by_pieces.

Signed-off-by: Christoph Muellner 
---
 gcc/config/riscv/riscv.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 576960bb37c..824e930ef05 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -220,6 +220,7 @@ struct riscv_tune_param
   unsigned short branch_cost;
   unsigned short memory_cost;
   bool slow_unaligned_access;
+  bool overlap_op_by_pieces;
 };
 
 /* Information about one micro-arch we know about.  */
@@ -285,6 +286,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   3,   /* branch_cost */
   5,   /* memory_cost */
   true,    /* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -298,6 +300,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   4,   /* branch_cost */
   3,   /* memory_cost */
   true,    /* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for T-HEAD c906.  */
@@ -311,6 +314,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   3,/* branch_cost */
   5,/* memory_cost */
   false,/* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for size.  */
@@ -324,6 +328,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   1,   /* branch_cost */
   2,   /* memory_cost */
   false,   /* slow_unaligned_access */
+  false,   /* overlap_op_by_pieces */
 };
 
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
@@ -5201,6 +5206,12 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
   return riscv_slow_unaligned_access_p;
 }
 
+static bool
+riscv_overlap_op_by_pieces (void)
+{
+  return tune_param->overlap_op_by_pieces;
+}
+
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
 
 static bool
@@ -5525,6 +5536,9 @@ riscv_asan_shadow_offset (void)
 #undef TARGET_SLOW_UNALIGNED_ACCESS
 #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
 
+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
+
 #undef TARGET_SECONDARY_MEMORY_NEEDED
 #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
 
-- 
2.31.1
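Here is what overlapping by-pieces operations buy. Without overlap, a 7-byte copy needs a 4-byte, a 2-byte and a 1-byte move. With overlap, the last piece can instead be a 4-byte move at offset 3 that re-copies byte 3. The C++ sketch below shows the idea for lengths 4 to 8. The helper name is hypothetical, and GCC emits this as RTL, not via memcpy. The trailing move is unaligned, which is presumably why this is a per-core tune parameter rather than always on.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy N bytes, 4 <= N <= 8, using exactly two 4-byte
// moves. For N < 8 the second move starts at N - 4 and overlaps the first,
// so e.g. N == 7 needs two moves instead of a 4 + 2 + 1 byte sequence.
void copy_4_to_8(unsigned char *dst, const unsigned char *src, std::size_t n) {
  std::uint32_t head, tail;
  std::memcpy(&head, src, 4);          // bytes [0, 4)
  std::memcpy(&tail, src + n - 4, 4);  // bytes [n - 4, n), possibly unaligned
  std::memcpy(dst, &head, 4);
  std::memcpy(dst + n - 4, &tail, 4);  // rewrites already-stored bytes if n < 8
}
```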

RE: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-21 Thread Liu, Hongtao via Gcc-patches



>-Original Message-
>From: Uros Bizjak 
>Sent: Wednesday, July 21, 2021 4:23 PM
>To: Hongtao Liu 
>Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; H. J. Lu
>; Richard Biener 
>Subject: Re: [PATCH] Support logic shift left/right for avx512 mask type.
>
>On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu  wrote:
>>
>> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak  wrote:
>> >
>> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt  wrote:
>> > >
>> > > Hi:
>> > >   As mentioned in
>> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
>> > >
>> > > cut start-
>> > > > note for the lowpart we can just view-convert away the excess
>> > > > bits, fully re-using the mask.  We generate surprisingly "good" code:
>> > > >
>> > > > kmovb   %k1, %edi
>> > > > shrb    $4, %dil
>> > > > kmovb   %edi, %k2
>> > > >
>> > > > besides the lack of using kshiftrb.  I guess we're just lacking
>> > > > a mask register alternative for
>> > > Yes, we can do it similar as kor/kand/kxor.
>> > > ---cut end
>> > >
>> > >   Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
>> > >   Ok for trunk?
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > > * config/i386/constraints.md (Wb): New constraint.
>> > > (Ww): Ditto.
>> > > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
>> > > shift.
>> > > (*ashlqi3_1): Ditto.
>> > > (*3_1): Ditto.
>> > > (*3_1): Ditto.
>> > > * config/i386/sse.md (k): New define_split after
>> > > it to convert generic shift pattern to mask shift ones.
>> > >
>> > > gcc/testsuite/ChangeLog:
>> > >
>> > > * gcc.target/i386/mask-shift.c: New test.
>
>
>+(define_insn "*lshr3_1"
>+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m, ?k")
>+(lshiftrt:SWI12
>+  (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
>+  (match_operand:QI 2 "nonmemory_operand" "c, ")))
>+   (clobber (reg:CC FLAGS_REG))]
>+  "ix86_binary_operator_ok (LSHIFTRT, mode, operands)"
>
>Also split this one to QImode and HImode to avoid conditions in isa attribute.
>
>OK with this change.
>

Thanks for the review, here's the patch I'm check in.

>Thanks,
>Uros.


V3-0001-Support-logic-shift-left-right-for-avx512-mask-type.patch
Description: V3-0001-Support-logic-shift-left-right-for-avx512-mask-type.patch
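The quoted discussion is about extracting halves of an AVX-512 mask. Model an 8-lane mask as an 8-bit integer, as __mmask8 is. The low 4-lane half is then the same mask with bits 4 to 7 ignored. The high half is a logical right shift by 4, which the patch lets GCC perform in a mask register with kshiftrb instead of the kmovb/shrb/kmovb round trip. The helper names are hypothetical, and the `& 0x0f` only makes the ignored bits explicit in this model.

```cpp
#include <cstdint>

// Low 4-lane half: in real code no instruction is needed, because the
// consumer only looks at bits 0-3; the mask here just makes that visible.
constexpr std::uint8_t mask_low_half(std::uint8_t m) { return m & 0x0f; }

// High 4-lane half: logical shift right by 4 (what kshiftrb $4 computes).
constexpr std::uint8_t mask_high_half(std::uint8_t m) {
  return static_cast<std::uint8_t>(m >> 4);
}
```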

Re: [PATCH 1/2] RISC-V: Add arch flags for T-HEAD.

2021-07-21 Thread Jojo R via Gcc-patches

— Jojo
On Jul 22, 2021, 4:53 AM +0800, Jim Wilson wrote:
> On Tue, Jul 13, 2021 at 11:06 AM Palmer Dabbelt  wrote:
> > Is there any documentation as to what this "theadc" extension is?
>
> The best doc I know of is https://github.com/isrc-cas/c910-llvm. The README 
> is in Chinese, but google translate does a decent job on it.  If you want 
> more details, you have to read the llvm sources to see exactly what each 
> instruction does.  They have mentioned that they are working on English 
> language docs, but I don't know when they will be available.
> There are quite a few T-Head specific instructions here.  This patch is only 
> adding support for a few of them, probably as a trial to see how it goes 
> before they try to add the rest.
Hi,

Let me provide more details about this patch.

There are 100+ instructions in our ISA spec, and we posted an RFC [1]
asking for guidance on how to contribute vendor-extension ISAs. We want
to submit one type of instruction at a time, which should make reviewing
easier.

Some T-HEAD ISA specs (in Chinese) are already on our web page [2],
and we are translating these docs into English to make them easier to
read :) The English version will be out next week, including binutils.

Thanks for your suggestions on the patch.

[1] https://github.com/riscv/riscv-gcc/issues/278
[2] https://www.t-head.cn/technology
> Jim
>

Re: [POWER10] __morestack calls from pcrel code

2021-07-21 Thread Alan Modra via Gcc-patches

On Wed, Jul 21, 2021 at 08:59:04AM -0400, David Edelsohn wrote:
> On Wed, Jul 21, 2021 at 4:29 AM Alan Modra  wrote:
> >
> > On Wed, Jul 14, 2021 at 08:24:16PM -0400, David Edelsohn wrote:
> > > > > > * config/rs6000/morestack.S (R2_SAVE): Define.
> > > > > > (__morestack): Save and restore r2.  Set up r2 for called
> > > > > > functions.
> > >
> > > This patch is okay.
> >
> > Thanks David, the patch is needed on gcc-11 and gcc-10 too.
> > OK for the branches too?
> 
> Backports are fine, but I believe that Richi is planning to cut GCC 11
> RC today, so you really should check with him about a backport at the
> last minute.

Hi Richard,
Is this patch OK at this late stage for the gcc-11 branch?
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573978.html

The impacts of the bug are segfaults and other undesirable behaviour
with Go (or more generally -fsplit-stack) on power10 when libgcc is
not power10 pcrel.  A non-pcrel libgcc is very likely how distros
will ship gcc.

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] [i386] Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

2021-07-21 Thread Hongtao Liu via Gcc-patches

On Wed, Jul 14, 2021 at 8:38 PM H.J. Lu  wrote:
>
> On Tue, Jul 13, 2021 at 9:35 PM Hongtao Liu  wrote:
> >
> > On Wed, Jul 14, 2021 at 10:34 AM liuhongt  wrote:
> > >
> > > By optimizing vector movement to broadcast in ix86_expand_vector_move
> > > during pass_expand, pass_reload/LRA can automatically generate an avx512
> > > embedded broadcast, pass_cpb is not needed.
> > >
> > > Considering that, even in the absence of avx512f, broadcast from memory is
> > > still slightly faster than loading the entire vector from memory, always
> > > enable broadcast.
> > >
> > > benchmark:
> > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast
> > >
> > > The performance diff
> > >
> > > strategy: cycles
> > > memory  : 1046611188
> > > memory  : 1255420817
> > > memory  : 1044720793
> > > memory  : 1253414145
> > > average : 1097868397
> > >
> > > broadcast   : 1044430688
> > > broadcast   : 1044477630
> > > broadcast   : 1253554603
> > > broadcast   : 1044561934
> > > average : 1096756213
> > >
> > > However, broadcast has a larger code size.
> > >
> > > the size diff
> > >
> > > size broadcast.o
> > >    text    data     bss     dec     hex filename
> > >     137       0       0     137      89 broadcast.o
> > >
> > > size memory.o
> > >    text    data     bss     dec     hex filename
> > >     115       0       0     115      73 memory.o
> > >
> > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-expand.c
> > > (ix86_broadcast_from_integer_constant): Rename to ..
> > > (ix86_broadcast_from_constant): .. this, and extend it to
> > > handle float mode.
> > > (ix86_expand_vector_move): Extend to float mode.
> > > * config/i386/i386-features.c
> > > (replace_constant_pool_with_broadcast): Remove.
> > > (remove_partial_avx_dependency_gate): Ditto.
> > > (constant_pool_broadcast): Ditto.
> > > (class pass_constant_pool_broadcast): Ditto.
> > > (make_pass_constant_pool_broadcast): Ditto.
> > > (remove_partial_avx_dependency): Adjust gate.
> > > * config/i386/i386-passes.def: Remove 
> > > pass_constant_pool_broadcast.
> > > * config/i386/i386-protos.h
> > > (make_pass_constant_pool_broadcast): Remove.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.
> > > ---
> > >  gcc/config/i386/i386-expand.c |  29 +++-
> > >  gcc/config/i386/i386-features.c   | 157 +-
> > >  gcc/config/i386/i386-passes.def   |   1 -
> > >  gcc/config/i386/i386-protos.h |   1 -
> > >  .../gcc.target/i386/fuse-caller-save-xmm.c|   2 +-
> > >  5 files changed, 26 insertions(+), 164 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > > index 69ea79e6123..ba870145acd 100644
> > > --- a/gcc/config/i386/i386-expand.c
> > > +++ b/gcc/config/i386/i386-expand.c
> > > @@ -453,8 +453,10 @@ ix86_expand_move (machine_mode mode, rtx operands[])
> > >emit_insn (gen_rtx_SET (op0, op1));
> > >  }
> > >
> > > +/* OP is a memref of CONST_VECTOR, return scalar constant mem
> > > +   if CONST_VECTOR is a vec_duplicate, else return NULL.  */
> > >  static rtx
> > > -ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
> > > +ix86_broadcast_from_constant (machine_mode mode, rtx op)
> > >  {
> > >int nunits = GET_MODE_NUNITS (mode);
> > >if (nunits < 2)
> > > @@ -462,7 +464,8 @@ ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
> > >
> > >/* Don't use integer vector broadcast if we can't move from GPR to SSE
> > >   register directly.  */
> > > -  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> > > +  if (!TARGET_INTER_UNIT_MOVES_TO_VEC
> > > +  && INTEGRAL_MODE_P (mode))
> > >  return nullptr;
> > >
> > >/* Convert CONST_VECTOR to a non-standard SSE constant integer
> > > @@ -470,12 +473,17 @@ ix86_broadcast_from_integer_constant (machine_mode mode, rtx op)
> > >if (!(TARGET_AVX2
> > > || (TARGET_AVX
> > > && (GET_MODE_INNER (mode) == SImode
> > > -   || GET_MODE_INNER (mode) == DImode)))
> > > +   || GET_MODE_INNER (mode) == DImode))
> > > +   || FLOAT_MODE_P (mode))
> > >|| standard_sse_constant_p (op, mode))
> > >  return nullptr;
> > >
> > > -  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.  */
> > > -  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT)
> > > +  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.
> > > + We can still put a 64-bit integer constant in memory when
> > > + avx512 embedded broadcast is available.  */
> > > +  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT
> > > +  && (!TARGET_AVX512F
> > > + || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
> > >
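The new comment on ix86_broadcast_from_constant states the core test. A CONST_VECTOR qualifies only if it is a vec_duplicate, i.e. every element is identical. In that case a single scalar in the constant pool (4 bytes for a float element) can replace the full vector constant. Below is a hypothetical C++ model of just that check. It ignores the ISA and standard_sse_constant_p conditions in the real function. It also compares elements with ==, whereas the real code works on RTL constants, so cases like 0.0 vs -0.0 would presumably be treated differently there.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Return the scalar to broadcast if all elements of V are equal, else nothing.
template <typename T, std::size_t N>
std::optional<T> broadcast_scalar(const std::array<T, N> &v) {
  if (N < 2)
    return std::nullopt;  // mirrors the "nunits < 2" early exit
  for (std::size_t i = 1; i < N; ++i)
    if (!(v[i] == v[0]))
      return std::nullopt;
  return v[0];
}
```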

Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-07-21 Thread Hongtao Liu via Gcc-patches

On Wed, Jul 21, 2021 at 6:35 PM Uros Bizjak  wrote:
>
> On Wed, Jul 21, 2021 at 9:43 AM liuhongt  wrote:
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > * config/i386/i386.c (enum x86_64_reg_class): Add
> > X86_64_SSEHF_CLASS.
> > (merge_classes): Handle X86_64_SSEHF_CLASS.
> > (examine_argument): Ditto.
> > (construct_container): Ditto.
> > (classify_argument): Ditto, and set HFmode/HCmode to
> > X86_64_SSEHF_CLASS.
> > (function_value_32): Return _Float16/Complex _Float16 by
> > %xmm0/%xmm1.
I forgot to update the ChangeLog entry here: Complex _Float16 will be
returned in a single SSE register. This will be fixed in my next version.
> > (function_value_64): Return _Float16/Complex _Float16 by SSE
> > register.
> > (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > (ix86_secondary_reload): Require gpr as intermediate register
> > to store _Float16 from sse register when sse4 is not
> > available.
> > (ix86_hard_regno_mode_ok): Put HFmode in sse register and gpr.
> > (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> > sse2.
> > (ix86_scalar_mode_supported_p): Ditto.
> > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > (ix86_get_excess_precision): Return
> > FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
> > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > (*pushhf): Ditto.
> > (*movhf_internal): Ditto.
> > * doc/extend.texi (Half-Precision Floating Point): Document
> > _Float16 for x86.
> >
> > gcc/lto/ChangeLog:
> >
> > * lto-lang.c (lto_type_for_mode): Return float16_type_node
> > when mode == TYPE_MODE (float16_type_node).
> >
> > gcc/testsuite/ChangeLog
> >
> > * gcc.target/i386/sse2-float16-1.c: New test.
> > * gcc.target/i386/sse2-float16-2.c: Ditto.
> > * gcc.target/i386/sse2-float16-3.c: Ditto.
>
> OK for the x86 part with some small changes inline.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386-modes.def|   1 +
> >  gcc/config/i386/i386.c|  99 ++-
> >  gcc/config/i386/i386.h|   2 +-
> >  gcc/config/i386/i386.md   | 118 +-
> >  gcc/doc/extend.texi   |  16 +++
> >  gcc/lto/lto-lang.c|   3 +
> >  .../gcc.target/i386/sse2-float16-1.c  |   8 ++
> >  .../gcc.target/i386/sse2-float16-2.c  |  16 +++
> >  .../gcc.target/i386/sse2-float16-3.c  |  12 ++
> >  9 files changed, 265 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
> >
> > diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
> > index 4e7014be034..9232f59a925 100644
> > --- a/gcc/config/i386/i386-modes.def
> > +++ b/gcc/config/i386/i386-modes.def
> > @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
> >
> >  FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
> >  FLOAT_MODE (TF, 16, ieee_quad_format);
> > +FLOAT_MODE (HF, 2, ieee_half_format);
> >
> >  /* In ILP32 mode, XFmode has size 12 and alignment 4.
> > In LP64 mode, XFmode has size and alignment 16.  */
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index ff96134fb37..02628d838fc 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -387,6 +387,7 @@ enum x86_64_reg_class
> >  X86_64_INTEGER_CLASS,
> >  X86_64_INTEGERSI_CLASS,
> >  X86_64_SSE_CLASS,
> > +X86_64_SSEHF_CLASS,
> >  X86_64_SSESF_CLASS,
> >  X86_64_SSEDF_CLASS,
> >  X86_64_SSEUP_CLASS,
> > @@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
> >  return X86_64_MEMORY_CLASS;
> >
> >/* Rule #4: If one of the classes is INTEGER, the result is INTEGER.  */
> > -  if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
> > -  || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
> > +  if ((class1 == X86_64_INTEGERSI_CLASS
> > +   && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
> > +  || (class2 == X86_64_INTEGERSI_CLASS
> > + && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
> >  return X86_64_INTEGERSI_CLASS;
> >if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
> >|| class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
> > @@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
> >
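The Rule #4 change reads more easily outside the diff. Below is a reduced C++ model of merge_classes, based on the hunk above and the x86-64 psABI merge rules. It assumes a simplified setting: the X87 rule is omitted and the enum names are shortened from X86_64_*_CLASS. It shows that an INTEGERSI part merged with an SSEHF part now stays INTEGERSI, exactly as with SSESF.

```cpp
// Hypothetical, reduced model of merge_classes (X87 classes left out).
enum reg_class { NO_CLASS, INTEGER, INTEGERSI, SSE, SSEHF, SSESF, SSEDF, MEMORY };

reg_class merge_classes(reg_class a, reg_class b) {
  if (a == b)                      // Rule #1: equal classes
    return a;
  if (a == NO_CLASS)               // Rule #2: NO_CLASS yields the other class
    return b;
  if (b == NO_CLASS)
    return a;
  if (a == MEMORY || b == MEMORY)  // Rule #3: MEMORY wins
    return MEMORY;
  // Rule #4 with the patch's addition: a 4-byte integer part merged with a
  // float or _Float16 part stays INTEGERSI ...
  if ((a == INTEGERSI && (b == SSESF || b == SSEHF))
      || (b == INTEGERSI && (a == SSESF || a == SSEHF)))
    return INTEGERSI;
  // ... otherwise any integer part makes the result INTEGER.
  if (a == INTEGER || a == INTEGERSI || b == INTEGER || b == INTEGERSI)
    return INTEGER;
  return SSE;                      // Rule #6 (Rule #5, X87, omitted)
}
```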

Re: [PATCH 05/10] AVX512FP16: Support vector init/broadcast/set/extract for FP16.

2021-07-21 Thread Hongtao Liu via Gcc-patches

On Wed, Jul 21, 2021 at 3:44 PM liuhongt  wrote:
>
> gcc/ChangeLog:
>
> * config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
> (_mm256_set_ph): Likewise.
> (_mm512_set_ph): Likewise.
> (_mm_setr_ph): Likewise.
> (_mm256_setr_ph): Likewise.
> (_mm512_setr_ph): Likewise.
> (_mm_set1_ph): Likewise.
> (_mm256_set1_ph): Likewise.
> (_mm512_set1_ph): Likewise.
> (_mm_setzero_ph): Likewise.
> (_mm256_setzero_ph): Likewise.
> (_mm512_setzero_ph): Likewise.
> (_mm_set_sh): Likewise.
> (_mm_load_sh): Likewise.
> (_mm_store_sh): Likewise.
> * config/i386/i386-builtin-types.def (V8HF): New type.
> (DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type
> * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
> Support vector HFmodes.
> (ix86_expand_vector_init_one_nonzero): Likewise.
> (ix86_expand_vector_init_one_var): Likewise.
> (ix86_expand_vector_init_interleave): Likewise.
> (ix86_expand_vector_init_general): Likewise.
> (ix86_expand_vector_set): Likewise.
> (ix86_expand_vector_extract): Likewise.
> (ix86_expand_vector_init_concat): Likewise.
> (ix86_expand_sse_movcc): Handle vector HFmodes.
> (ix86_expand_vector_set_var): Ditto.
> * config/i386/i386-modes.def: Add HF vector modes in comment.
> * config/i386/i386.c (classify_argument): Add HF vector modes.
> (ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
> (ix86_vector_mode_supported_p): Likewise.
> (ix86_set_reg_reg_cost): Handle vector HFmode.
> (ix86_get_ssemov): Handle vector HFmode.
> (function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
> by stack.
I got feedback from H.J. that 16/32/64-byte _Float16 vectors should be
passed in SSE registers in 32-bit mode, not on the stack. I will handle
this in function_arg_32 in my next version.
> * config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
> (VALID_AVX256_REG_OR_OI_MODE): Rename to ..
> (VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
> (VALID_SSE2_REG_VHF_MODE): New.
> (VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
> (SSE_REG_MODE_P): Add vector HFmode.
> * config/i386/i386.md (mode): Add HF vector modes.
> (MODE_SIZE): Likewise.
> (ssemodesuffix): Add ph suffix for HF vector modes.
> * config/i386/sse.md (VFH_128): New mode iterator.
> (VMOVE): Adjust for HF vector modes.
> (V): Likewise.
> (V_256_512): Likewise.
> (avx512): Likewise.
> (avx512fmaskmode): Likewise.
> (shuffletype): Likewise.
> (sseinsnmode): Likewise.
> (ssedoublevecmode): Likewise.
> (ssehalfvecmode): Likewise.
> (ssehalfvecmodelower): Likewise.
> (ssePScmode): Likewise.
> (ssescalarmode): Likewise.
> (ssescalarmodelower): Likewise.
> (sseintprefix): Likewise.
> (i128): Likewise.
> (bcstscalarsuff): Likewise.
> (xtg_mode): Likewise.
> (VI12HF_AVX512VL): New mode_iterator.
> (VF_AVX512FP16): Likewise.
> (VIHF): Likewise.
> (VIHF_256): Likewise.
> (VIHF_AVX512BW): Likewise.
> (V16_256): Likewise.
> (V32_512): Likewise.
> (sseintmodesuffix): New mode_attr.
> (sse): Add scalar and vector HFmodes.
> (ssescalarmode): Add vector HFmode mapping.
> (ssescalarmodesuffix): Add sh suffix for HFmode.
> (*_vm3): Use VFH_128.
> (*_vm3): Likewise.
> (*ieee_3): Likewise.
> (_blendm): New define_insn.
> (vec_setv8hf): New define_expand.
> (vec_set_0): New define_insn for HF vector set.
> (*avx512fp16_movsh): Likewise.
> (avx512fp16_movsh): Likewise.
> (vec_extract_lo_v32hi): Rename to ...
> (vec_extract_lo_): ... this, and adjust to allow HF
> vector modes.
> (vec_extract_hi_v32hi): Likewise.
> (vec_extract_hi_): Likewise.
> (vec_extract_lo_v16hi): Likewise.
> (vec_extract_lo_): Likewise.
> (vec_extract_hi_v16hi): Likewise.
> (vec_extract_hi_): Likewise.
> (vec_set_hi_v16hi): Likewise.
> (vec_set_hi_): Likewise.
> (vec_set_lo_v16hi): Likewise.
> (vec_set_lo_): Likewise.
> (*vec_extract_0): New define_insn_and_split for HF
> vector extract.
> (*vec_extracthf): New define_insn.
> (VEC_EXTRACT_MODE): Add HF vector modes.
> (PINSR_MODE): Add V8HF.
> (sse2p4_1): Likewise.
> (pinsr_evex_isa): Likewise.
> (_pinsr): Adjust to support
> insert for V8HFmode.
> (pbroadcast_evex_isa): Add HF vector modes.
> (AVX2_VEC_DUP_MODE): Likewise.
> (VEC_INIT_MODE): Likewise.
>
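New _ph intrinsics are listed above. Assume they follow the same argument-order convention as the existing `_mm_set_ps`/`_mm_setr_ps`. Then `_mm_set_ph` lists elements from the highest lane down to lane 0, `_mm_setr_ph` lists them in lane (memory) order, and `_mm_set1_ph` broadcasts one value. The portable model below uses plain arrays rather than `__m128h`. `set_model`, `setr_model` and `set1_model` are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Index i of each returned array is lane i (lane 0 = lowest address).

// _mm_set_*-style: arguments are written from the highest lane down.
template <typename T, std::size_t N>
std::array<T, N> set_model(const std::array<T, N> &args) {
  std::array<T, N> lanes{};
  for (std::size_t i = 0; i < N; ++i)
    lanes[i] = args[N - 1 - i];
  return lanes;
}

// _mm_setr_*-style: arguments are already in lane order.
template <typename T, std::size_t N>
std::array<T, N> setr_model(const std::array<T, N> &args) {
  return args;
}

// _mm_set1_*-style: every lane gets the same value (a broadcast).
template <typename T, std::size_t N>
std::array<T, N> set1_model(T value) {
  std::array<T, N> lanes{};
  lanes.fill(value);
  return lanes;
}
```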

Re: Clarification on CTF/BTF workings with LTO

2021-07-21 Thread Richard Biener

On Wed, 21 Jul 2021, Indu Bhagat wrote:

> Hello,
> 
> Wanted to follow up on the CTF/BTF debug info + LTO workings.
> 
> To summarize, the current status/workflow on trunk is:
> 
> - The CTF container is written out in ctfout.c or btfout.c via the
> ctf_debug_finalize () API.
> - At this time, the ctf_debug_finalize () itself is called once in
> dwarf2out_early_finish ().
> - Until this time, the requirements of CTF and BTF are simple.
>- The generated .ctf/.BTF sections need no demarcation of "early"/"late"
> debug. All of it can be generated "early".
>- The generated .ctf/.BTF information does not need to be different for the
> final assembly and the fat LTO IR.
>- The BPF CO-RE is not yet implemented on trunk.
> 
> Writing out the CTF/BTF at dwarf2out_early_finish seems to work: there will
> always be a .ctf/.BTF section regardless of whether the objects are fat or slim
> LTO objects (because the emission is still in dwarf2out_early_finish on trunk). And we have
> functionality to copy over the .ctf/.BTF debug sections in
> handle_lto_debug_sections (). However, reading through some of the past emails
> on the CTF/BTF patch series, it seems that you have been pointing to the
> CTF/BTF debug info generation being broken when used with LTO. If true, I am
> most certainly missing some key point here.
> 
> So, before we move to the next steps of supporting additional requirements of
> BPF CO-RE etc., I would like to make sure that my current understanding is OK
> and that the current state of CTF/BTF on trunk is functional -with LTO-. I
> have tested some bits (with and without fat objects on x86_64) and have not
> run into issues.
> 
> Can you please confirm what you see amiss in the current workings of CTF/BTF
> with LTO on trunk?

So on the functional level it seems to do something: I see .ctf
sections in an LTO-linked test program as well as in a non-LTO-linked
program built from fat LTO objects.  When I dump the .ctf section
with readelf I see type info that looks OK, but I don't see any
function objects (my test has a main and a foo function).  This might
be an artifact of the readelf version I have (2.36.1), since the
same happens without LTO.

So yes, in principle it should work as long as all the info can
be emitted early.  ISTR that in the beginning you had pieces
emitted from dwarf2out_finish, and that is where my concerns
were rooted.

For DWARF, the "late" data (anything that needs relocations against
symbols or addresses) is emitted from dwarf2out_finish.  The LTRANS
unit that emits this info does not have the early-generated DWARF
DIE in memory; instead, it knows how to reference it via a
symbol + offset relocation.  So it generates a DIE like

   DW_TAG_subprogram
   DW_AT_abstract_origin $early_debug_symbol + offset
   DW_AT_low_pc .LC0_begin
...

to amend the early DIE with additional information, creating the
"concrete" instance of the subprogram, re-using the early
generated DIE as "abstract" instance.

I understand that CTF doesn't work like this (it has no relocations,
DIE offsets or some such), but don't you need some late annotation,
at least for BPF?

Richard.

RE: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-21 Thread Richard Biener

On Thu, 22 Jul 2021, Liu, Hongtao wrote:

> 
> 
> >-----Original Message-----
> >From: Uros Bizjak 
> >Sent: Wednesday, July 21, 2021 4:23 PM
> >To: Hongtao Liu 
> >Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; H. J. Lu
> >; Richard Biener 
> >Subject: Re: [PATCH] Support logic shift left/right for avx512 mask type.
> >
> >On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu  wrote:
> >>
> >> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak  wrote:
> >> >
> >> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt  wrote:
> >> > >
> >> > > Hi:
> >> > >   As mentioned in
> >> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
> >> > >
> >> > > ---cut start
> >> > > > note for the lowpart we can just view-convert away the excess
> >> > > > bits, fully re-using the mask.  We generate surprisingly "good" code:
> >> > > >
> >> > > > kmovb   %k1, %edi
> >> > > > shrb    $4, %dil
> >> > > > kmovb   %edi, %k2
> >> > > >
> >> > > > besides the lack of using kshiftrb.  I guess we're just lacking
> >> > > > a mask register alternative for
> >> > > Yes, we can do it similar as kor/kand/kxor.
> >> > > ---cut end
> >> > >
> >> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >> > >   Ok for trunk?
> >> > >
> >> > > gcc/ChangeLog:
> >> > >
> >> > > * config/i386/constraints.md (Wb): New constraint.
> >> > > (Ww): Ditto.
> >> > > * config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
> >> > > shift.
> >> > > (*ashlqi3_1): Ditto.
> >> > > (*3_1): Ditto.
> >> > > (*3_1): Ditto.
> >> > > * config/i386/sse.md (k): New define_split after
> >> > > it to convert generic shift pattern to mask shift ones.
> >> > >
> >> > > gcc/testsuite/ChangeLog:
> >> > >
> >> > > * gcc.target/i386/mask-shift.c: New test.
> >
> >
> >+(define_insn "*lshr3_1"
> >+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m, ?k")
> >+        (lshiftrt:SWI12
> >+          (match_operand:SWI12 1 "nonimmediate_operand" "0, k")
> >+          (match_operand:QI 2 "nonmemory_operand" "c, ")))
> >+   (clobber (reg:CC FLAGS_REG))]
> >+  "ix86_binary_operator_ok (LSHIFTRT, mode, operands)"
> >
> >Also split this one into QImode and HImode patterns to avoid conditions in
> >the isa attribute.
> >
> >OK with this change.
> >
> 
> Thanks for the review, here's the patch I'm checking in.

Works with my experimental patches, thanks!

Richard.
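The mask-shift discussion above can be modelled in plain C. This is a portable sketch, not GCC code: the helper names are hypothetical, and it only mirrors the semantics that the k-register shift patterns are meant to implement. When an 8-lane mask is split into two 4-lane halves, the low half needs no instruction (a 4-lane consumer ignores the upper bits), while the high half needs a logical right shift by 4. That shift is what kshiftrb can do directly in a mask register, instead of the kmovb/shrb/kmovb round trip through a general-purpose register quoted above. The per-width helpers also hint at why the QImode and HImode patterns need different isa conditions: kshiftrb requires AVX512DQ, whereas kshiftrw only requires AVX512F.

```c
#include <stdint.h>

/* Hypothetical helpers modelling AVX-512 mask (k-register) shifts in
   portable C.  Per the Intel SDM, a shift count larger than the mask
   width minus one produces an all-zero mask. */

/* kshiftrb: 8-bit mask, requires AVX512DQ in hardware. */
static uint8_t kshiftrb_model(uint8_t mask, unsigned count)
{
    /* Guard explicitly: the hardware zeroes the result, while a
       large enough count would be undefined behaviour in C. */
    return count > 7 ? 0 : (uint8_t)(mask >> count);
}

/* kshiftrw: 16-bit mask, requires only AVX512F in hardware. */
static uint16_t kshiftrw_model(uint16_t mask, unsigned count)
{
    return count > 15 ? 0 : (uint16_t)(mask >> count);
}

/* Masks for the two 4-lane halves of an 8-lane mask. */
struct mask_halves {
    uint8_t lo; /* lanes 0-3 */
    uint8_t hi; /* lanes 4-7 */
};

static struct mask_halves split_mask8(uint8_t mask)
{
    struct mask_halves h;
    /* Low half: a 4-lane consumer ignores bits 4-7, so the mask can be
       reused as is; clearing them here only makes that explicit. */
    h.lo = (uint8_t)(mask & 0x0F);
    /* High half: the logical right shift by 4 discussed in the thread. */
    h.hi = kshiftrb_model(mask, 4);
    return h;
}
```

For example, for the mask 0xA5 the low half is 0x05 and the high half is 0x0A.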

Re: [POWER10] __morestack calls from pcrel code

2021-07-21 Thread Richard Biener

On Thu, 22 Jul 2021, Alan Modra wrote:

> On Wed, Jul 21, 2021 at 08:59:04AM -0400, David Edelsohn wrote:
> > On Wed, Jul 21, 2021 at 4:29 AM Alan Modra  wrote:
> > >
> > > On Wed, Jul 14, 2021 at 08:24:16PM -0400, David Edelsohn wrote:
> > > > > > > * config/rs6000/morestack.S (R2_SAVE): Define.
> > > > > > > (__morestack): Save and restore r2.  Set up r2 for called
> > > > > > > functions.
> > > >
> > > > This patch is okay.
> > >
> > > Thanks David, the patch is needed on gcc-11 and gcc-10 too.
> > > OK for the branches too?
> > 
> > Backports are fine, but I believe that Richi is planning to cut GCC 11
> > RC today, so you really should check with him about a backport at the
> > last minute.
> 
> Hi Richard,
> Is this patch OK at this late stage for the gcc-11 branch?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573978.html
> 
> The impacts of the bug are segfaults and other undesirable behaviour
> with Go (or more generally -fsplit-stack) on power10 when libgcc is
> not power10 pcrel.  A non-pcrel libgcc is very likely how distros
> will ship gcc.

If you think it's safe (well, there are not many __morestack users,
so based on that it's pretty safe) then go ahead.

Richard.
