Re: [PATCH v3] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-08-12 Thread HAO CHEN GUI via Gcc-patches
Hi Segher,

On 12/8/2022 1:40 AM, Segher Boessenkool wrote:
> Yes, but combine will use splitters as well.
The combine pass invokes combine_split_insns for 3-insn combinations.  If we
want to do the split for a 2-insn combination (like the test case in the PR),
do we have to add a special case?

> 
> You can use nonzero_bits in the split condition (the second condition in
> a define_insn_and_split, or the sole condition in a define_split) just fine, as
> long as the replacement RTL does not rely on the nonzero_bits itself.
> You cannot use it in the insn condition (the first condition in a
> define_insn_and_split, or the one condition in a define_insn) because
> that RTL will survive past combine, and then nonzero_bits can have bits
> cleared that were set before (that were determined to be always zero
> during combine, but that knowledge is gone later).

I tried to add a define_insn_and_split pattern in rs6000.md, like the
following code.  nonzero_bits is used in the insn condition (for combine),
and the split condition is empty.  I can't use nonzero_bits in the split
condition, as it then never matches and causes an ICE.

I am not sure whether this is safe.  If such an insn doesn't stem from
combine, there is no guarantee that the nonzero_bits condition still holds.


(define_insn_and_split "*test"
  [(set (match_operand:GPR 0 "gpc_reg_operand")
        (plus_ior_xor:GPR
         (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand")
                     (match_operand:SI 2 "const_int_operand"))
         (match_operand:GPR 3 "gpc_reg_operand")))]
  "nonzero_bits (operands[3], <MODE>mode)
   < HOST_WIDE_INT_1U << INTVAL (operands[2])"
  "#"
  ""
  [(set (match_dup 0)
        (ior:GPR (and:GPR (match_dup 3)
                          (match_dup 4))
                 (ashift:GPR (match_dup 1)
                             (match_dup 2))))]
{
  operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
})
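
For illustration only (my own sketch, not the testcase from the PR), C source
of the kind this pattern is meant to match could look like the following;
here operand 3 comes from a zero-extended char, so its nonzero bits all lie
below the bits shifted in by the ashift:

/* Hypothetical example: b is zero-extended, so the nonzero bits of the
   second ior operand are at most 0xff, i.e. below 1 << 8.  */
unsigned long
f (unsigned long a, unsigned char b)
{
  return (a << 8) | b;
}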

Thanks
Gui Haochen




[PATCH] phiopt: Remove unnecessary checks from spaceship_replacement [PR106506]

2022-08-12 Thread Jakub Jelinek via Gcc-patches
Hi!

Those 2 checks were just me trying to be extra careful.  The
(phires & 1) == phires check, and the variants it is folded to, of course only
makes sense for the -1/0/1/2 result spaceship; for -1/0/1 one can just use
comparisons of phires.  We only expand a floating-point spaceship if NaNs
aren't honored, so the 2 case is ignored, and if it does appear, with Aldy's
changes we can simplify the 2 case away from the PHI, but the
(phires & 1) == phires check stayed.  It is safe to treat that comparison as
phires >= 0 even then.
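
A tiny self-contained check of that equivalence (my own illustration, not
part of the patch):

#include <assert.h>

int
main (void)
{
  /* For the -1/0/1 result spaceship, (phires & 1) == phires is the same
     predicate as phires >= 0.  */
  for (int phires = -1; phires <= 1; phires++)
    assert (((phires & 1) == phires) == (phires >= 0));
  return 0;
}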

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-08-11  Jakub Jelinek  

PR tree-optimization/106506
* tree-ssa-phiopt.cc (spaceship_replacement): Don't punt for
is_cast or orig_use_lhs cases if phi_bb has 3 predecessors.

* g++.dg/opt/pr94589-2.C: New test.

--- gcc/tree-ssa-phiopt.cc.jj   2022-08-10 09:06:53.0 +0200
+++ gcc/tree-ssa-phiopt.cc  2022-08-10 15:33:32.414641593 +0200
@@ -2448,8 +2448,6 @@ spaceship_replacement (basic_block cond_
return false;
   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig_use_lhs))
return false;
-  if (EDGE_COUNT (phi_bb->preds) != 4)
-   return false;
   if (!single_imm_use (orig_use_lhs, &use_p, &use_stmt))
return false;
 
@@ -2467,8 +2465,6 @@ spaceship_replacement (basic_block cond_
   orig_use_lhs = gimple_assign_lhs (use_stmt);
   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig_use_lhs))
return false;
-  if (EDGE_COUNT (phi_bb->preds) != 4)
-   return false;
   if (!single_imm_use (orig_use_lhs, &use_p, &use_stmt))
return false;
 }
--- gcc/testsuite/g++.dg/opt/pr94589-2.C.jj 2022-08-10 09:06:52.921213966 
+0200
+++ gcc/testsuite/g++.dg/opt/pr94589-2.C2022-08-10 15:45:24.599319922 
+0200
@@ -1,7 +1,7 @@
 // PR tree-optimization/94589
 // { dg-do compile { target c++20 } }
 // { dg-options "-O2 -g0 -ffast-math -fdump-tree-optimized" }
-// { dg-final { scan-tree-dump-times "\[ij]_\[0-9]+\\(D\\) (?:<|<=|==|!=|>|>=) 
\[ij]_\[0-9]+\\(D\\)" 12 "optimized" { xfail *-*-* } } }
+// { dg-final { scan-tree-dump-times "\[ij]_\[0-9]+\\(D\\) (?:<|<=|==|!=|>|>=) 
\[ij]_\[0-9]+\\(D\\)" 12 "optimized" } }
 // { dg-final { scan-tree-dump-times "i_\[0-9]+\\(D\\) (?:<|<=|==|!=|>|>=) 
5\\.0" 12 "optimized" } }
 
 #include 

Jakub



[committed] testsuite: Fix up pr104992* tests on i686-linux [PR104992]

2022-08-12 Thread Jakub Jelinek via Gcc-patches
Hi!

These 2 tests were FAILing on i686-linux, or e.g. with
--target_board=unix/-m32/-mno-sse on x86_64-linux, due to
-Wpsabi warnings, and also because dg-options in the latter
test were ignored due to a missing space, so even -O2
wasn't passed at all.

Tested with
make check-gcc check-g++ 
RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse,-m64\} dg.exp=pr104992*'
on x86_64-linux, committed to trunk as obvious.

2022-08-11  Jakub Jelinek  

PR tree-optimization/104992
* gcc.dg/pr104992.c: Add -Wno-psabi to dg-options.
* g++.dg/pr104992-1.C: Likewise.  Add space between " and } in
dg-options.

--- gcc/testsuite/gcc.dg/pr104992.c.jj  2022-08-10 09:06:52.955213523 +0200
+++ gcc/testsuite/gcc.dg/pr104992.c 2022-08-11 10:07:52.047940115 +0200
@@ -1,6 +1,6 @@
 /* PR tree-optimization/104992 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
 
 #define vector __attribute__((vector_size(4*sizeof(int
 
--- gcc/testsuite/g++.dg/pr104992-1.C.jj2022-08-10 09:06:52.922213953 
+0200
+++ gcc/testsuite/g++.dg/pr104992-1.C   2022-08-11 10:11:16.585203417 +0200
@@ -1,6 +1,6 @@
 /* PR tree-optimization/104992 */
 /* { dg-do run } */
-/* { dg-options "-O2"} */
+/* { dg-options "-O2 -Wno-psabi" } */
 
 #include "../gcc.dg/pr104992.c"
 

Jakub



[committed] testsuite: Fix up pr106243* tests on i686-linux [PR106243]

2022-08-12 Thread Jakub Jelinek via Gcc-patches
Hi!

These 2 tests were FAILing on i686-linux or e.g. with
--target_board=unix/-m32/-mno-sse on x86_64-linux due to
-Wpsabi warnings.

Tested with
make check-gcc check-g++ 
RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse,-m64\} dg.exp=pr106243*'
on x86_64-linux, committed to trunk as obvious.

2022-08-11  Jakub Jelinek  

PR tree-optimization/106243
* gcc.dg/pr106243.c: Add -Wno-psabi to dg-options.
* gcc.dg/pr106243-1.c: Likewise.

--- gcc/testsuite/gcc.dg/pr106243.c.jj  2022-08-10 09:06:52.955213523 +0200
+++ gcc/testsuite/gcc.dg/pr106243.c 2022-08-11 10:13:52.965111058 +0200
@@ -1,6 +1,6 @@
 /* PR tree-optimization/106243 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
 
 #define vector __attribute__((vector_size(4*sizeof(int
 
--- gcc/testsuite/gcc.dg/pr106243-1.c.jj2022-08-10 09:06:52.955213523 
+0200
+++ gcc/testsuite/gcc.dg/pr106243-1.c   2022-08-11 10:14:06.381931542 +0200
@@ -1,6 +1,6 @@
 /* PR tree-optimization/106243 */
 /* { dg-do run } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -Wno-psabi" } */
 
 #include "pr106243.c"
 

Jakub



[PATCH v6] LoongArch: add addr_global attribute

2022-08-12 Thread Xi Ruoyao via Gcc-patches
v5 -> v6:

* still use "addr_global" as we don't have a better name.
* add a test case with -mno-explicit-relocs.

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark a variable whose address is not limited
by the code model, so that we can work around such problems.
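
For example (a hypothetical use, not taken from the patch), a variable would
be marked like this so that its address is always materialized through the
GOT:

/* Hypothetical per-CPU-style variable whose final address may be placed
   outside the +/- 2GiB PC-relative range by the loader.  */
int counter __attribute__ ((addr_global));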

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_attribute_table):
New attribute table.
(TARGET_ATTRIBUTE_TABLE): Define the target hook.
(loongarch_handle_addr_global_attribute): New static function.
(loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
SYMBOL_REF_DECL with addr_global attribute.
(loongarch_use_anchors_for_symbol_p): New static function.
(TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
* doc/extend.texi (Variable Attributes): Document new
LoongArch specific attribute.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/attr-addr_global-1.c: New test.
* gcc.target/loongarch/attr-addr_global-2.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 63 +++
 gcc/doc/extend.texi   | 17 +
 .../gcc.target/loongarch/attr-addr_global-1.c | 29 +
 .../gcc.target/loongarch/attr-addr_global-2.c | 29 +
 4 files changed, 138 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-addr_global-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-addr_global-2.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..978e66ed549 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x)
   && !loongarch_symbol_binds_local_p (x))
 return SYMBOL_GOT_DISP;
 
+  if (SYMBOL_REF_P (x))
+{
+  tree decl = SYMBOL_REF_DECL (x);
+  /* An addr_global symbol may be out of the +/- 2GiB range around
+the PC, so we have to use GOT.  */
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+   return SYMBOL_GOT_DISP;
+}
+
   return SYMBOL_PCREL;
 }
 
@@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void)
   return crtl->outgoing_args_size;
 }
 
+static tree
+loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
+   bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+{
+  if (DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+ && !TREE_STATIC (decl))
+   {
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "%qE attribute cannot be specified for local "
+   "variables", name);
+ *no_add_attrs = true;
+   }
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "addr_global", 0, 0, true, false, false, false,
+loongarch_handle_addr_global_attribute, NULL },
+  /* The last attribute spec is set to be NULL.  */
+  {}
+};
+
+bool
+loongarch_use_anchors_for_symbol_p (const_rtx symbol)
+{
+  tree decl = SYMBOL_REF_DECL (symbol);
+
+  /* An addr_global attribute indicates the linker may move the symbol away,
+ so the use of anchor may cause relocation overflow.  */
+  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
+return false;
+
+  return default_use_anchors_for_symbol_p (symbol);
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void)
 #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table
+
+#undef  TARGET_USE_ANCHORS_FOR_SYMBOL_P
+#define TARGET_USE_ANCHORS_FOR_SYMBOL_P loongarch_use_anchors_for_symbol_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-loongarch.h"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7fe7f8817cd..b1173e15c7c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7314,6 +7314,7 @@ attributes.
 * Blackfin Variable Attributes::
 * H8/300 Variable Attributes::
 * IA-64 Variable Attributes::
+* LoongArch Variable Attributes::
 * M32R/D Variable Attributes::
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
@@ -8098,6 +8099,22 @@ defined by shared libraries.
 
 @end ta

[PATCH v2] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-08-12 Thread Kewen.Lin via Gcc-patches
Hi,

As PR99888 and its related PRs show, the current support for
-fpatchable-function-entry on powerpc ELFv2 doesn't work
well when a global entry point exists.  For example, with the
command line option -fpatchable-function-entry=3,2, we got
the following without this patch:

  .LPFE1:
  nop
  nop
  .type   foo, @function
  foo:
  nop
  .LFB0:
  .cfi_startproc
  .LCF0:
  0:  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry foo,.-foo

The assembly is unexpected, since the patched NOPs have no
effect when the function is entered through the local entry point.

This patch updates the NOPs so they are patched before and after
the local entry point; the result looks like:

  .type   foo, @function
  foo:
  .LFB0:
  .cfi_startproc
  .LCF0:
  0:  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  nop
  nop
  .localentry foo,.-foo
  nop
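
For reference, a minimal C input of the kind that exercises this (my own
sketch; the actual tests added by the patch may differ) is a function that
needs the TOC and thus a global entry point, compiled with
-fpatchable-function-entry=3,2:

/* foo references a global, so under ELFv2 it gets a global entry point
   that sets up r2 (the TOC pointer) before the local entry point.  */
int x;

int
foo (void)
{
  return x;
}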

Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8,
and powerpc64le-linux-gnu P9 & P10.

v2: Update some comments, error message wording, and test
cases per Segher's review comments.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599461.html

Is it ok for trunk?

BR,
Kewen
-
PR target/99888
PR target/105649

gcc/ChangeLog:

* config/rs6000/rs6000-internal.h
(rs6000_print_patchable_function_entry): New function declaration.
* config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
Support patchable-function-entry by emitting NOPs before and after
local entry for the function that needs global entry.
* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): Skip
the function that needs global entry till global entry has been
emitted.
* config/rs6000/rs6000.h (struct machine_function): New bool member
global_entry_emitted.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr99888-1.c: New test.
* gcc.target/powerpc/pr99888-2.c: New test.
* gcc.target/powerpc/pr99888-3.c: New test.
* gcc.target/powerpc/pr99888-4.c: New test.
* gcc.target/powerpc/pr99888-5.c: New test.
* gcc.target/powerpc/pr99888-6.c: New test.
* c-c++-common/patchable_function_entry-default.c: Adjust for
powerpc_elfv2 to avoid compilation error.
---
 gcc/config/rs6000/rs6000-internal.h   |  5 +++
 gcc/config/rs6000/rs6000-logue.cc | 32 +
 gcc/config/rs6000/rs6000.cc   | 10 -
 gcc/config/rs6000/rs6000.h|  4 ++
 .../patchable_function_entry-default.c|  1 +
 gcc/testsuite/gcc.target/powerpc/pr99888-1.c  | 45 +++
 gcc/testsuite/gcc.target/powerpc/pr99888-2.c  | 45 +++
 gcc/testsuite/gcc.target/powerpc/pr99888-3.c  | 12 +
 gcc/testsuite/gcc.target/powerpc/pr99888-4.c  | 13 ++
 gcc/testsuite/gcc.target/powerpc/pr99888-5.c  | 13 ++
 gcc/testsuite/gcc.target/powerpc/pr99888-6.c  | 14 ++
 11 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-5.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-6.c

diff --git a/gcc/config/rs6000/rs6000-internal.h 
b/gcc/config/rs6000/rs6000-internal.h
index b9e82c0468d..da809d1ac8b 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -182,10 +182,15 @@ extern tree rs6000_fold_builtin (tree fndecl 
ATTRIBUTE_UNUSED,
 tree *args ATTRIBUTE_UNUSED,
 bool ignore ATTRIBUTE_UNUSED);

+extern void rs6000_print_patchable_function_entry (FILE *,
+  unsigned HOST_WIDE_INT,
+  bool);
+
 extern bool rs6000_passes_float;
 extern bool rs6000_passes_long_double;
 extern bool rs6000_passes_vector;
 extern bool rs6000_returns_struct;
 extern bool cpu_builtin_p;

+
 #endif
diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index 59fe1c8cb8b..3e2b1773154 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -4013,11 +4013,43 @@ rs6000_output_function_prologue (FILE *file)
  fprintf (file, "\tadd 2,2,12\n");
}

+  unsigned short patch_area_size = crtl->patch_area_size;
+  unsigned short patch_area_entry = crtl->patch_area_entry;
+  /* Need to emit the patching area.  */
+  if (patch_area_size > 0)
+   {
+ cfun->machine->global_entry_emitted = true;
+ /* As ELFv2 ABI shows, the allowable bytes between the global
+and local entry points are 0, 4, 8, 16, 32 an

[PATCH] vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

2022-08-12 Thread Kewen.Lin via Gcc-patches
Hi,

As PR106322 shows, in some cases, for a vector type whose
TYPE_MODE is a scalar integral mode instead of a vector mode,
it's possible to obtain wrong target support information when
querying with that scalar integral mode.  For example, for the
test case in PR106322, on 32-bit ppc64 the vectorizer gets vector
type "vector(2) short unsigned int" for scalar type "short
unsigned int", and its mode is SImode instead of V2HImode.  The
target support query checks the umul_highpart optab with SImode
and considers it supported, then the vectorizer generates a
.MULH IFN call for that vector type.  Unfortunately it's wrong
to use the SImode support for that vector type's multiply
highpart here.
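
A reduced sketch of the kind of loop involved (my own illustration, not the
testcase from the PR): an unsigned short highpart multiply that can be
vectorized with two-element vectors of short unsigned int, whose TYPE_MODE
is SImode on that target:

/* Highpart multiply of unsigned shorts; the cast to unsigned int avoids
   the signed "int" overflow of uint16_t * uint16_t.  */
void
mulhigh (unsigned short *a, const unsigned short *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = ((unsigned int) a[i] * b[i]) >> 16;
}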

This patch is to teach vectorizable_call analysis not to allow
vect_emulated_vector_p type for both vectype_in and vectype_out
as Richi suggested.

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?  If it's ok, I guess we want this to be
backported?

BR,
Kewen
-
PR tree-optimization/106322

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_call): Don't allow
vect_emulated_vector_p type for both vectype_in and vectype_out.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr106322.C: New test.
* g++.target/powerpc/pr106322.C: New test.
---
 gcc/testsuite/g++.target/i386/pr106322.C| 196 
 gcc/testsuite/g++.target/powerpc/pr106322.C | 195 +++
 gcc/tree-vect-stmts.cc  |   8 +
 3 files changed, 399 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/pr106322.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106322.C

diff --git a/gcc/testsuite/g++.target/i386/pr106322.C 
b/gcc/testsuite/g++.target/i386/pr106322.C
new file mode 100644
index 000..3cd8d6bf225
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr106322.C
@@ -0,0 +1,196 @@
+/* { dg-do run } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-require-effective-target c++11 } */
+/* { dg-options "-O2 -mtune=generic -march=i686" } */
+
+/* As PR106322, verify this can execute well (not abort).  */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+__attribute__((noipa))
+bool BytesEqual(const void *bytes1, const void *bytes2, const size_t size) {
+  return memcmp(bytes1, bytes2, size) == 0;
+}
+
+#define HWY_ALIGNMENT 64
+constexpr size_t kAlignment = HWY_ALIGNMENT;
+constexpr size_t kAlias = kAlignment * 4;
+
+namespace hwy {
+namespace N_EMU128 {
+template  struct Vec128 {
+  T raw[16 / sizeof(T)] = {};
+};
+} // namespace N_EMU128
+} // namespace hwy
+
+template 
+static void Store(const hwy::N_EMU128::Vec128 v,
+  T *__restrict__ aligned) {
+  __builtin_memcpy(aligned, v.raw, sizeof(T) * N);
+}
+
+template 
+static hwy::N_EMU128::Vec128 Load(const T *__restrict__ aligned) {
+  hwy::N_EMU128::Vec128 v;
+  __builtin_memcpy(v.raw, aligned, sizeof(T) * N);
+  return v;
+}
+
+template 
+static hwy::N_EMU128::Vec128
+MulHigh(hwy::N_EMU128::Vec128 a,
+const hwy::N_EMU128::Vec128 b) {
+  for (size_t i = 0; i < N; ++i) {
+// Cast to uint32_t first to prevent overflow. Otherwise the result of
+// uint16_t * uint16_t is in "int" which may overflow. In practice the
+// result is the same but this way it is also defined.
+a.raw[i] = static_cast(
+(static_cast(a.raw[i]) * static_cast(b.raw[i])) >>
+16);
+  }
+  return a;
+}
+
+#define HWY_ASSERT(condition) assert((condition))
+#define HWY_ASSUME_ALIGNED(ptr, align) __builtin_assume_aligned((ptr), (align))
+
+#pragma pack(push, 1)
+struct AllocationHeader {
+  void *allocated;
+  size_t payload_size;
+};
+#pragma pack(pop)
+
+static void FreeAlignedBytes(const void *aligned_pointer) {
+  HWY_ASSERT(aligned_pointer != nullptr);
+  if (aligned_pointer == nullptr)
+return;
+
+  const uintptr_t payload = reinterpret_cast(aligned_pointer);
+  HWY_ASSERT(payload % kAlignment == 0);
+  const AllocationHeader *header =
+  reinterpret_cast(payload) - 1;
+
+  free(header->allocated);
+}
+
+class AlignedFreer {
+public:
+  template  void operator()(T *aligned_pointer) const {
+FreeAlignedBytes(aligned_pointer);
+  }
+};
+
+template 
+using AlignedFreeUniquePtr = std::unique_ptr;
+
+static inline constexpr size_t ShiftCount(size_t n) {
+  return (n <= 1) ? 0 : 1 + ShiftCount(n / 2);
+}
+
+namespace {
+static size_t NextAlignedOffset() {
+  static std::atomic next{0};
+  constexpr uint32_t kGroups = kAlias / kAlignment;
+  const uint32_t group = next.fetch_add(1, std::memory_order_relaxed) % 
kGroups;
+  const size_t offset = kAlignment * group;
+  HWY_ASSERT((offset % kAlignment == 0) && offset <= kAlias);
+  return offset;
+}
+} // namespace
+
+static void *AllocateAlignedBytes(const size_t payload_size) {
+  HWY_ASSERT(payload_size != 0); // likely a bug in caller
+  if (payload_size >= std::numeric_limits::max() / 2) {
+HWY_ASSERT(false && "payload_size too la

Re: [PATCH] rs6000: avoid ineffective replacement of splitters

2022-08-12 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2022/8/12 14:39, Jiufu Guo via Gcc-patches wrote:
> Hi,
> 
> As a comment in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599556.html
> 
> Those splitters call rs6000_emit_set_const directly, and the replacements
> are never used.  Using (pc) would be less misleading.

Since the replacements are never used, IMHO this subject doesn't
quite match the change.  How about "fix misleading new patterns
of splitters"?

> 
> This patch passes bootstrap & regtest on ppc64 BE and LE.
> Is this OK for trunk?

This patch is OK w/ or w/o subject tweaked.  Thanks!

BR,
Kewen

> 
> BR,
> Jeff(Jiufu)
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.md: (constant splitters): Use "(pc)" as the
>   replacements.
> 
> ---
>  gcc/config/rs6000/rs6000.md | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 1367a2cb779..7fadbeef1aa 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7727,11 +7727,7 @@ (define_split
>[(set (match_operand:SI 0 "gpc_reg_operand")
>   (match_operand:SI 1 "const_int_operand"))]
>"num_insns_constant (operands[1], SImode) > 1"
> -  [(set (match_dup 0)
> - (match_dup 2))
> -   (set (match_dup 0)
> - (ior:SI (match_dup 0)
> - (match_dup 3)))]
> +  [(pc)]
>  {
>if (rs6000_emit_set_const (operands[0], operands[1]))
>  DONE;
> @@ -9662,8 +9658,7 @@ (define_split
>[(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
>   (match_operand:DI 1 "const_int_operand"))]
>"TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
> -  [(set (match_dup 0) (match_dup 2))
> -   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))]
> +  [(pc)]
>  {
>if (rs6000_emit_set_const (operands[0], operands[1]))
>  DONE;
> @@ -9675,8 +9670,7 @@ (define_split
>[(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
>   (match_operand:DI 1 "const_scalar_int_operand"))]
>"TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
> -  [(set (match_dup 0) (match_dup 2))
> -   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))]
> +  [(pc)]
>  {
>if (rs6000_emit_set_const (operands[0], operands[1]))
>  DONE;


[PATCH] s390: Add -munroll-only-small-loops.

2022-08-12 Thread Robin Dapp via Gcc-patches
Hi,

inspired by Power, this also introduces -munroll-only-small-loops, and
enables -funroll-loops and -munroll-only-small-loops at -O2
and above.
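
As a rough sketch of the resulting behaviour (my own illustration; the
constants are the ones from the patch below), the unroll factor is clamped
roughly like this:

/* Mirrors the clamping added to s390_loop_unroll_adjust: loops above
   12 insns are not unrolled at all, smaller loops are unrolled so that
   the total stays within 24 (-O2) or 36 (-O3 and above) insns.  */
static unsigned int
clamp_nunroll (unsigned int nunroll, unsigned int ninsns, int opt_level)
{
  const unsigned int small_threshold = 12;
  if (ninsns > small_threshold)
    return 0;
  const unsigned int max_insns = opt_level >= 3 ? 36 : 24;
  return nunroll < max_insns / ninsns ? nunroll : max_insns / ninsns;
}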

Bootstrapped and regtested.

This introduces one regression in gcc.dg/sms-compare-debug-1.c but
currently dumps for sms are broken as well.  The difference is in the
location of some INSN_DELETED notes so I would consider this a minor issue.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

* common/config/s390/s390-common.cc: Enable -funroll-loops and
-munroll-only-small-loops for OPT_LEVELS_2_PLUS_SPEED_ONLY.
* config/s390/s390.cc (s390_loop_unroll_adjust): Do not unroll
loops larger than 12 instructions.
(s390_override_options_after_change): Set unroll options.
(s390_option_override_internal): Likewise.
* config/s390/s390.opt: Document munroll-only-small-loops.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-copysign.c: Do not unroll.
* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Dito.
* gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: Dito.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Dito.
* gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: Dito.
---
 gcc/common/config/s390/s390-common.cc |  5 +++
 gcc/config/s390/s390.cc   | 31 +++
 gcc/config/s390/s390.opt  |  4 +++
 .../gcc.target/s390/vector/vec-copysign.c |  2 +-
 .../s390/zvector/autovec-double-quiet-uneq.c  |  2 +-
 .../zvector/autovec-double-signaling-ltgt.c   |  2 +-
 .../s390/zvector/autovec-float-quiet-uneq.c   |  2 +-
 .../zvector/autovec-float-signaling-ltgt.c|  2 +-
 8 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/s390/s390-common.cc
b/gcc/common/config/s390/s390-common.cc
index 72a5ef47eaac..be3e6f201429 100644
--- a/gcc/common/config/s390/s390-common.cc
+++ b/gcc/common/config/s390/s390-common.cc
@@ -64,6 +64,11 @@ static const struct default_options
s390_option_optimization_table[] =
 /* Enable -fsched-pressure by default when optimizing.  */
 { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },

+/* Enable -munroll-only-small-loops with -funroll-loops to unroll small
+   loops at -O2 and above by default.  */
+{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL,
1 },
+
 /* ??? There are apparently still problems with -fcaller-saves.  */
 { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5644600edf3d..ef38fbe68c84 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -15457,6 +15457,21 @@ s390_loop_unroll_adjust (unsigned nunroll,
struct loop *loop)
   if (s390_tune < PROCESSOR_2097_Z10)
 return nunroll;

+  if (unroll_only_small_loops)
+{
+  /* Only unroll loops smaller than or equal to 12 insns.  */
+  const unsigned int small_threshold = 12;
+
+  if (loop->ninsns > small_threshold)
+   return 0;
+
+  /* ???: Make this dependent on the type of registers in
+the loop.  Increase the limit for vector registers.  */
+  const unsigned int max_insns = optimize >= 3 ? 36 : 24;
+
+  nunroll = MIN (nunroll, max_insns / loop->ninsns);
+}
+
   /* Count the number of memory references within the loop body.  */
   bbs = get_loop_body (loop);
   subrtx_iterator::array_type array;
@@ -15531,6 +15546,19 @@ static void
 s390_override_options_after_change (void)
 {
   s390_default_align (&global_options);
+
+  /* Explicit -funroll-loops turns -munroll-only-small-loops off.  */
+  if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
+   || (OPTION_SET_P (flag_unroll_all_loops)
+  && flag_unroll_all_loops))
+{
+  if (!OPTION_SET_P (unroll_only_small_loops))
+   unroll_only_small_loops = 0;
+  if (!OPTION_SET_P (flag_cunroll_grow_size))
+   flag_cunroll_grow_size = 1;
+}
+  else if (!OPTION_SET_P (flag_cunroll_grow_size))
+flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
 }

 static void
@@ -15740,6 +15768,9 @@ s390_option_override_internal (struct
gcc_options *opts,
   /* Set the default alignment.  */
   s390_default_align (opts);

+  /* Set unroll options.  */
+  s390_override_options_after_change ();
+
   /* Call target specific restore function to do post-init work.  At
the moment,
  this just sets opts->x_s390_cost_pointer.  */
   s390_function_specific_restore (opts, opts_set, NULL);
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index 9e8d3bfd404c..c375b9c5f729 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -321,3 +321,7 @@ and the default behavior is to emit separate
multiplication and addition
 instructions for long doubles in vector registers, because measurements
show
 that this improves performance.  This option all

[PATCH] s390: Add z15 to s390_issue_rate.

2022-08-12 Thread Robin Dapp via Gcc-patches
Hi,

this patch tries to be more explicit by mentioning z15 in s390_issue_rate.

No changes in testsuite, bootstrap or SPEC obviously.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

* config/s390/s390.cc (s390_issue_rate): Add z15.
---
 gcc/config/s390/s390.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index ef38fbe68c84..528cd8c7f0f6 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8582,6 +8582,7 @@ s390_issue_rate (void)
 case PROCESSOR_2827_ZEC12:
 case PROCESSOR_2964_Z13:
 case PROCESSOR_3906_Z14:
+case PROCESSOR_8561_Z15:
 case PROCESSOR_3931_Z16:
 default:
   return 1;
-- 
2.31.1



[PATCH] s390: Use vpdi and verllg in vec_reve.

2022-08-12 Thread Robin Dapp via Gcc-patches
Hi,

swapping the two elements of a V2DImode or V2DFmode vector can be done
with vpdi instead of the generic way of loading a permutation mask
from the literal pool and using vperm.

Analogously to the V2DI/V2DF case, the elements of a four-element
vector can be reversed by first swapping the two doublewords and
subsequently rotating each doubleword by 32 bits, as illustrated
below.
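
The index manipulation can be sketched in plain C (my own illustration,
independent of the actual vpdi/verllg instructions):

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  uint32_t v[4] = { 0, 1, 2, 3 };
  /* Step 1: swap the two doublewords: { 2, 3, 0, 1 }.  */
  uint32_t t[4] = { v[2], v[3], v[0], v[1] };
  /* Step 2: rotate each 64-bit doubleword by 32 bits, i.e. swap within
     each pair: { 3, 2, 1, 0 }.  */
  uint32_t r[4] = { t[1], t[0], t[3], t[2] };
  for (int i = 0; i < 4; i++)
    printf ("%u ", r[i]);
  printf ("\n");
  return 0;
}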

Bootstrapped and regtested, no regressions.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

PR target/100869
* config/s390/vector.md (@vpdi4_2): New pattern.
(rotl3_di): New pattern.
* config/s390/vx-builtins.md: Use vpdi and verll for reversing
elements.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-reve-int-long.c: New test.
---
 gcc/config/s390/vector.md | 28 +
 gcc/config/s390/vx-builtins.md| 41 +++
 .../s390/zvector/vec-reve-int-long.c  | 31 ++
 3 files changed, 100 insertions(+)
 create mode 100644
gcc/testsuite/gcc.target/s390/zvector/vec-reve-int-long.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 16b162aae0e5..2207f39b80e4 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -791,6 +791,17 @@ (define_insn "@vpdi4"
   "vpdi\t%v0,%v1,%v2,4"
   [(set_attr "op_type" "VRR")])

+; Second DW of op1 and first DW of op2 (when interpreted as 2-element
vector).
+(define_insn "@vpdi4_2"
+  [(set (match_operand:V_HW_4   0 "register_operand" "=v")
+   (vec_select:V_HW_4
+(vec_concat:
+ (match_operand:V_HW_4 1 "register_operand"  "v")
+ (match_operand:V_HW_4 2 "register_operand"  "v"))
+(parallel [(const_int 2) (const_int 3) (const_int 4) (const_int 5)])))]
+  "TARGET_VX"
+  "vpdi\t%v0,%v1,%v2,4"
+  [(set_attr "op_type" "VRR")])

 (define_insn "*vmrhb"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
@@ -1249,6 +1260,23 @@ (define_insn "*3"
   "\t%v0,%v1,%Y2"
   [(set_attr "op_type" "VRS")])

+; verllg for V4SI/V4SF.  This swaps the first and the second two
+; elements of a vector and is only valid in that context.
+(define_expand "rotl3_di"
+ [
+ (set (match_dup 2)
+  (subreg:V2DI (match_operand:V_HW_4 1) 0))
+ (set (match_dup 3)
+  (rotate:V2DI
+   (match_dup 2)
+   (const_int 32)))
+ (set (match_operand:V_HW_4 0)
+  (subreg:V_HW_4 (match_dup 3) 0))]
+ "TARGET_VX"
+ {
+  operands[2] = gen_reg_rtx (V2DImode);
+  operands[3] = gen_reg_rtx (V2DImode);
+ })

 ; Shift each element by corresponding vector element

diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index c46d16eae484..99c4c037b49a 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -2184,6 +2184,47 @@ (define_insn "*eltswap"
vster\t%v1,%v0"
   [(set_attr "op_type" "*,VRX,VRX")])

+; Swapping v2df/v2di can be done via vpdi on z13 and z14.
+(define_split
+  [(set (match_operand:V_HW_2 0 "register_operand" "")
+   (unspec:V_HW_2 [(match_operand:V_HW_2 1 "register_operand" "")]
+  UNSPEC_VEC_ELTSWAP))]
+  "TARGET_VX && can_create_pseudo_p ()"
+  [(set (match_operand:V_HW_2 0 "register_operand" "=v")
+   (vec_select:V_HW_2
+(vec_concat:
+ (match_operand:V_HW_2 1 "register_operand"  "v")
+ (match_dup 1))
+(parallel [(const_int 1) (const_int 2)])))]
+)
+
+
+; Swapping v4df/v4si can be done via vpdi and rot.
+(define_split
+  [(set (match_operand:V_HW_4 0 "register_operand" "")
+   (unspec:V_HW_4 [(match_operand:V_HW_4 1 "register_operand" "")]
+  UNSPEC_VEC_ELTSWAP))]
+  "TARGET_VX && can_create_pseudo_p ()"
+  [(set (match_dup 2)
+   (vec_select:V_HW_4
+(vec_concat:
+ (match_dup 1)
+ (match_dup 1))
+(parallel [(const_int 2) (const_int 3) (const_int 4) (const_int 5)])))
+ (set (match_dup 3)
+  (subreg:V2DI (match_dup 2) 0))
+ (set (match_dup 4)
+  (rotate:V2DI
+   (match_dup 3)
+   (const_int 32)))
+ (set (match_operand:V_HW_4 0)
+  (subreg:V_HW_4 (match_dup 4) 0))]
+{
+  operands[2] = gen_reg_rtx (mode);
+  operands[3] = gen_reg_rtx (V2DImode);
+  operands[4] = gen_reg_rtx (V2DImode);
+})
+
 ; z15 has instructions for doing element reversal from mem to reg
 ; or the other way around.  For reg to reg or on pre z15 machines
 ; we have to emulate it with vector permute.
diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-int-long.c
b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-int-long.c
new file mode 100644
index ..dff3a94066c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-int-long.c
@@ -0,0 +1,31 @@
+/* Test that we use vpdi in order to reverse vectors
+   with two elements instead of creating a literal-pool entry
+   and permuting with vperm.  */
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -march=z14 -mzarch -mzve

[PATCH]AArch64 sve: Fix fcmuo combine patterns [PR106524]

2022-08-12 Thread Tamar Christina via Gcc-patches
Hi All,

There's no encoding for fcmuo with zero.  This restricts the combine patterns
from accepting zero registers.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and GCC 12 branch once unfrozen?

Thanks,
Tamar

gcc/ChangeLog:

PR target/106524
* config/aarch64/aarch64-sve.md (*fcmuo_nor_combine,
*fcmuo_bic_combine): Don't accept comparisons against zero.

gcc/testsuite/ChangeLog:

PR target/106524
* gcc.target/aarch64/sve/pr106524.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
bd60e65b0c3f05f1c931f03807170f3b9d699de5..e08bee197d8570c3e4e50068febc819d6e85cce0
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8231,7 +8231,7 @@ (define_insn_and_split "*fcmuo_bic_combine"
[(match_operand: 1)
 (const_int SVE_KNOWN_PTRUE)
 (match_operand:SVE_FULL_F 2 "register_operand" "w")
-(match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
+(match_operand:SVE_FULL_F 3 "register_operand" "w")]
UNSPEC_COND_FCMUO))
(match_operand: 4 "register_operand" "Upa"))
  (match_dup: 1)))
@@ -8267,7 +8267,7 @@ (define_insn_and_split "*fcmuo_nor_combine"
[(match_operand: 1)
 (const_int SVE_KNOWN_PTRUE)
 (match_operand:SVE_FULL_F 2 "register_operand" "w")
-(match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
+(match_operand:SVE_FULL_F 3 "register_operand" "w")]
UNSPEC_COND_FCMUO))
(not:
  (match_operand: 4 "register_operand" "Upa")))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr106524.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr106524.c
new file mode 100644
index 
..a9f650f971a5cb5ad993f50aadfcac3a8c664a8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr106524.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+sve -O2 -fno-move-loop-invariants" } */
+
+void
+test__zero (int *restrict dest, int *restrict src, float *a, int count)
+{
+  int i;
+
+  for (i = 0; i < count; ++i)
+dest[i] = !__builtin_isunordered (a[i], 0) ? src[i] : 0;
+}









GCC 12.1.1 Status Report (2022-08-12), branch frozen for release

2022-08-12 Thread Richard Biener via Gcc-patches


Status
==

The gcc-12 branch is now frozen in preparation for a GCC 12.2 release
candidate and the GCC 12.2 release next week.  All changes now require
release manager approval.


Quality Data


Priority  #   Change from last report
---   ---
P1  0  
P2  432   +   3
P3  62-   1
P4  239   -   1
P5  25 
---   ---
Total P1-P3 494   +   2
Total   758   +   1


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2022-July/239190.html


[PATCH] s390: Recognize reverse/element swap permute patterns.

2022-08-12 Thread Robin Dapp via Gcc-patches
Hi,

this adds functions to recognize reverse/element swap permute patterns
for vler, vster as well as vpdi and rotate.

Bootstrapped and regtested, no regressions.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_vpdi): Recognize swap pattern.
(is_reverse_perm_mask): New function.
(expand_perm_with_rot): Recognize reverse pattern.
(expand_perm_with_vster): Use vler/vster for element reversal on z15.
(expand_perm_with_vstbrq): New function.
(vectorize_vec_perm_const_1): Use it.
(s390_vectorize_vec_perm_const): Add expand functions.
* config/s390/vx-builtins.md: Prefer vster over vler.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vperm-rev-z14.c: New test.
* gcc.target/s390/vector/vperm-rev-z15.c: New test.
* gcc.target/s390/zvector/vec-reve-store-byte.c: Adjust test
expectation.
---
 gcc/config/s390/s390.cc   | 102 ++-
 gcc/config/s390/vx-builtins.md|  21 
 .../gcc.target/s390/vector/vperm-rev-z14.c|  87 +
 .../gcc.target/s390/vector/vperm-rev-z15.c| 118 ++
 .../s390/zvector/vec-reve-store-byte.c|   6 +-
 5 files changed, 329 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vperm-rev-z14.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vperm-rev-z15.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 528cd8c7f0f6..c86b26933d7a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17225,10 +17225,15 @@ expand_perm_with_vpdi (const struct
expand_vec_perm_d &d)
   if (d.nelt != 2)
 return false;

+  /* If both operands are the same we can swap the elements
+ i.e. reverse the vector.  */
+  bool same = d.op0 == d.op1;
+
   if (d.perm[0] == 0 && d.perm[1] == 3)
 vpdi1_p = true;

-  if (d.perm[0] == 1 && d.perm[1] == 2)
+  if ((d.perm[0] == 1 && d.perm[1] == 2)
+  || (same && d.perm[0] == 1 && d.perm[1] == 0))
 vpdi4_p = true;

   if (!vpdi1_p && !vpdi4_p)
@@ -17249,6 +17254,92 @@ expand_perm_with_vpdi (const struct
expand_vec_perm_d &d)
   return true;
 }

+/* Helper that checks if a vector permutation mask D
+   represents a reversal of the vector's elements.  */
+static inline bool
+is_reverse_perm_mask (const struct expand_vec_perm_d &d)
+{
+  for (int i = 0; i < d.nelt; i++)
+if (d.perm[i] != d.nelt - i - 1)
+  return false;
+  return true;
+}
+
+/* The case of reversing a four-element vector [0, 1, 2, 3]
+   can be handled by first permuting the doublewords
+   [2, 3, 0, 1] and subsequently rotating them by 32 bits.  */
+static bool
+expand_perm_with_rot (const struct expand_vec_perm_d &d)
+{
+  if (d.nelt != 4)
+return false;
+
+  if (d.op0 == d.op1 && is_reverse_perm_mask (d))
+{
+  if (d.testing_p)
+   return true;
+
+  rtx tmp = gen_reg_rtx (d.vmode);
+  rtx op0_reg = force_reg (GET_MODE (d.op0), d.op0);
+
+  emit_insn (gen_vpdi4_2 (d.vmode, tmp, op0_reg, op0_reg));
+  if (d.vmode == V4SImode)
+   emit_insn (gen_rotlv4si3_di (d.target, tmp));
+  else if (d.vmode == V4SFmode)
+   emit_insn (gen_rotlv4sf3_di (d.target, tmp));
+
+  return true;
+}
+
+  return false;
+}
+
+/* If we just reverse the elements, emit an eltswap if we have
+   vler/vster.  */
+static bool
+expand_perm_with_vster (const struct expand_vec_perm_d &d)
+{
+  if (TARGET_VXE2 && d.op0 == d.op1 && is_reverse_perm_mask (d)
+  && (d.vmode == V2DImode || d.vmode == V2DFmode
+ || d.vmode == V4SImode || d.vmode == V4SFmode
+ || d.vmode == V8HImode))
+{
+  if (d.testing_p)
+   return true;
+
+  if (d.vmode == V2DImode)
+   emit_insn (gen_eltswapv2di (d.target, d.op0));
+  else if (d.vmode == V2DFmode)
+   emit_insn (gen_eltswapv2df (d.target, d.op0));
+  else if (d.vmode == V4SImode)
+   emit_insn (gen_eltswapv4si (d.target, d.op0));
+  else if (d.vmode == V4SFmode)
+   emit_insn (gen_eltswapv4sf (d.target, d.op0));
+  else if (d.vmode == V8HImode)
+   emit_insn (gen_eltswapv8hi (d.target, d.op0));
+  return true;
+}
+  return false;
+}
+
+/* If we reverse a byte-vector this is the same as
+   byte reversing it which can be done with vstbrq.  */
+static bool
+expand_perm_with_vstbrq (const struct expand_vec_perm_d &d)
+{
+  if (TARGET_VXE2 && d.op0 == d.op1 && i

Re: [PATCH] phiopt: Remove unnecessary checks from spaceship_replacement [PR106506]

2022-08-12 Thread Richard Biener via Gcc-patches
On Thu, 11 Aug 2022, Jakub Jelinek wrote:

> Hi!
> 
> Those 2 checks were just me trying to be extra careful, the
> (phires & 1) == phires and variants it is folded to of course make only sense
> for the -1/0/1/2 result spaceship, for -1/0/1 one can just use comparisons of
> phires.  We only floating point spaceship if nans aren't honored, so the
> 2 case is ignored, and if it is, with Aldy's changes we can simplify the
> 2 case away from the phi but the (phires & 1) == phires stayed.  It is safe
> to treat the phires comparison as phires >= 0 even then.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2022-08-11  Jakub Jelinek  
> 
>   PR tree-optimization/106506
>   * tree-ssa-phiopt.cc (spaceship_replacement): Don't punt for
>   is_cast or orig_use_lhs cases if phi_bb has 3 predecessors.
> 
>   * g++.dg/opt/pr94589-2.C: New test.
> 
> --- gcc/tree-ssa-phiopt.cc.jj 2022-08-10 09:06:53.0 +0200
> +++ gcc/tree-ssa-phiopt.cc2022-08-10 15:33:32.414641593 +0200
> @@ -2448,8 +2448,6 @@ spaceship_replacement (basic_block cond_
>   return false;
>if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig_use_lhs))
>   return false;
> -  if (EDGE_COUNT (phi_bb->preds) != 4)
> - return false;
>if (!single_imm_use (orig_use_lhs, &use_p, &use_stmt))
>   return false;
>  
> @@ -2467,8 +2465,6 @@ spaceship_replacement (basic_block cond_
>orig_use_lhs = gimple_assign_lhs (use_stmt);
>if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig_use_lhs))
>   return false;
> -  if (EDGE_COUNT (phi_bb->preds) != 4)
> - return false;
>if (!single_imm_use (orig_use_lhs, &use_p, &use_stmt))
>   return false;
>  }
> --- gcc/testsuite/g++.dg/opt/pr94589-2.C.jj   2022-08-10 09:06:52.921213966 
> +0200
> +++ gcc/testsuite/g++.dg/opt/pr94589-2.C  2022-08-10 15:45:24.599319922 
> +0200
> @@ -1,7 +1,7 @@
>  // PR tree-optimization/94589
>  // { dg-do compile { target c++20 } }
>  // { dg-options "-O2 -g0 -ffast-math -fdump-tree-optimized" }
> -// { dg-final { scan-tree-dump-times "\[ij]_\[0-9]+\\(D\\) 
> (?:<|<=|==|!=|>|>=) \[ij]_\[0-9]+\\(D\\)" 12 "optimized" { xfail *-*-* } } }
> +// { dg-final { scan-tree-dump-times "\[ij]_\[0-9]+\\(D\\) 
> (?:<|<=|==|!=|>|>=) \[ij]_\[0-9]+\\(D\\)" 12 "optimized" } }
>  // { dg-final { scan-tree-dump-times "i_\[0-9]+\\(D\\) (?:<|<=|==|!=|>|>=) 
> 5\\.0" 12 "optimized" } }
>  
>  #include 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-12 Thread Richard Biener via Gcc-patches
With the last re-org I failed to make sure not to add SSA names
not supported by ranger into m_imports, which then triggers an
ICE in range_on_path_entry because range_of_expr returns false.  I've
noticed that range_on_path_entry does mightily complicated things
that don't make sense to me, and the commentary might just be
out of date.  For the sake of it I replaced it with range_on_entry,
and statistics show we thread _more_ jumps with that, so better
not to do magic there.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Will push if that succeeds.

PR tree-optimization/106593
* tree-ssa-threadbackward.cc (back_threader::find_paths):
If the imports from the conditional do not satisfy
gimple_range_ssa_p don't try to thread anything.
* gimple-range-path.cc (range_on_path_entry): Just
call range_on_entry.
---
 gcc/gimple-range-path.cc   | 33 +
 gcc/tree-ssa-threadbackward.cc |  6 +-
 2 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index b6148eb5bd7..a7d277c31b8 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -153,38 +153,7 @@ path_range_query::range_on_path_entry (vrange &r, tree 
name)
 {
   gcc_checking_assert (defined_outside_path (name));
   basic_block entry = entry_bb ();
-
-  // Prefer to use range_of_expr if we have a statement to look at,
-  // since it has better caching than range_on_edge.
-  gimple *last = last_stmt (entry);
-  if (last)
-{
-  if (m_ranger->range_of_expr (r, name, last))
-   return;
-  gcc_unreachable ();
-}
-
-  // If we have no statement, look at all the incoming ranges to the
-  // block.  This can happen when we're querying a block with only an
-  // outgoing edge (no statement but the fall through edge), but for
-  // which we can determine a range on entry to the block.
-  Value_Range tmp (TREE_TYPE (name));
-  bool changed = false;
-  r.set_undefined ();
-  for (unsigned i = 0; i < EDGE_COUNT (entry->preds); ++i)
-{
-  edge e = EDGE_PRED (entry, i);
-  if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
- && m_ranger->range_on_edge (tmp, e, name))
-   {
- r.union_ (tmp);
- changed = true;
-   }
-}
-
-  // Make sure we don't return UNDEFINED by mistake.
-  if (!changed)
-r.set_varying (TREE_TYPE (name));
+  m_ranger->range_on_entry (r, entry, name);
 }
 
 // Return the range of NAME at the end of the path being analyzed.
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 0a992213dad..669098e4ec3 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -525,7 +525,11 @@ back_threader::find_paths (basic_block bb, tree name)
   bitmap_clear (m_imports);
   ssa_op_iter iter;
   FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
-   bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
+   {
+ if (!gimple_range_ssa_p (name))
+   return;
+ bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
+   }
 
   // Interesting is the set of imports we still not have see
   // the definition of.  So while imports only grow, the
-- 
2.35.3


[PATCH] s390: Implement vec_revb(vector short)/bswapv8hi with, verllh.

2022-08-12 Thread Robin Dapp via Gcc-patches
Hi,

this patch implements a byte swap for a V8HImode vector via an element
rotate by 8 bits.
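
The underlying identity is simply that byte-swapping a 16-bit value is the
same as rotating it by 8 bits; a quick stand-alone check (my own
illustration, not part of the patch):

#include <assert.h>
#include <stdint.h>

static uint16_t
rot8 (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

int
main (void)
{
  assert (rot8 (0x1234) == 0x3412);
  return 0;
}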

Bootstrapped and regtested, no regressions.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

PR target/100867
* config/s390/vector.md: Add special case for V8HImode.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-revb-short.c: New test.
---
 gcc/config/s390/vector.md | 35 ---
 .../gcc.target/s390/zvector/vec-revb-short.c  | 13 +++
 2 files changed, 35 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-revb-short.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 2207f39b80e4..6f46bed03e00 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -2898,22 +2898,31 @@ (define_expand "bswap"
   for (int i = 0; i < 16; i++)
 perm_rtx[i] = GEN_INT (perm[i]);

-  operands[2] = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16,
perm_rtx));
-
-  /* Without vxe2 we do not have byte swap instructions dealing
- directly with memory operands.  So instead of waiting until
- reload to fix that up switch over to vector permute right
- now.  */
-  if (!TARGET_VXE2)
+  if (!TARGET_VXE2 && mode == V8HImode)
 {
-  rtx in = force_reg (V16QImode, simplify_gen_subreg (V16QImode,
operands[1], mode, 0));
-  rtx permute = force_reg (V16QImode, force_const_mem (V16QImode,
operands[2]));
-  rtx out = gen_reg_rtx (V16QImode);
-
-  emit_insn (gen_vec_permv16qi (out, in, in, permute));
-  emit_move_insn (operands[0], simplify_gen_subreg (mode,
out, V16QImode, 0));
+  /* A byte swap for a short is just a rotate by 8 bits.  */
+  emit_insn (gen_rotlv8hi3 (operands[0], operands[1], GEN_INT (8)));
   DONE;
 }
+  else
+{
+  operands[2] = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16,
perm_rtx));
+
+  /* Without vxe2 we do not have byte swap instructions dealing
+directly with memory operands.  So instead of waiting until
+reload to fix that up switch over to vector permute right
+now.  */
+  if (!TARGET_VXE2)
+   {
+ rtx in = force_reg (V16QImode, simplify_gen_subreg (V16QImode,
operands[1], mode, 0));
+ rtx permute = force_reg (V16QImode, force_const_mem (V16QImode,
operands[2]));
+ rtx out = gen_reg_rtx (V16QImode);
+
+ emit_insn (gen_vec_permv16qi (out, in, in, permute));
+ emit_move_insn (operands[0], simplify_gen_subreg (mode, out,
V16QImode, 0));
+ DONE;
+   }
+}
 })

 ; Switching late to the reg-reg variant requires the vector permute
diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-revb-short.c
b/gcc/testsuite/gcc.target/s390/zvector/vec-revb-short.c
new file mode 100644
index ..bf58a0e12e74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zvector/vec-revb-short.c
@@ -0,0 +1,13 @@
+/* Test that we use verllh for byte reversing a vector of shorts.  */
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -march=z13 -mzvector -mzarch -fno-unroll-loops" } */
+
+/* { dg-final { scan-assembler-times "verllh\t" 1 } } */
+/* { dg-final { scan-assembler-not "vperm" } } */
+
+#include 
+
+vector short revb (vector short a)
+{
+   return vec_revb (a);
+}
-- 
2.31.1



RE: [PATCH]AArch64 sve: Fix fcmuo combine patterns [PR106524]

2022-08-12 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Tamar Christina 
> Sent: Friday, August 12, 2022 11:21 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH]AArch64 sve: Fix fcmuo combine patterns [PR106524]
> 
> Hi All,
> 
> There's no encoding for fcmuo with zero.  This restricts the combine patterns
> from accepting zero registers.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? and GCC 12 branch once unfrozen?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR target/106524
>   * config/aarch64/aarch64-sve.md (*fcmuo_nor_combine,
>   *fcmuo_bic_combine): Don't accept comparisons against
> zero.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/106524
>   * gcc.target/aarch64/sve/pr106524.c: New test.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-sve.md
> b/gcc/config/aarch64/aarch64-sve.md
> index
> bd60e65b0c3f05f1c931f03807170f3b9d699de5..e08bee197d8570c3e4e50068
> febc819d6e85cce0 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -8231,7 +8231,7 @@ (define_insn_and_split
> "*fcmuo_bic_combine"
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
>(match_operand:SVE_FULL_F 2 "register_operand" "w")
> -  (match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero"
> "wDz")]
> +  (match_operand:SVE_FULL_F 3 "register_operand" "w")]
>   UNSPEC_COND_FCMUO))
>   (match_operand: 4 "register_operand" "Upa"))
> (match_dup: 1)))
> @@ -8267,7 +8267,7 @@ (define_insn_and_split
> "*fcmuo_nor_combine"
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
>(match_operand:SVE_FULL_F 2 "register_operand" "w")
> -  (match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero"
> "wDz")]
> +  (match_operand:SVE_FULL_F 3 "register_operand" "w")]
>   UNSPEC_COND_FCMUO))
>   (not:
> (match_operand: 4 "register_operand" "Upa")))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr106524.c
> b/gcc/testsuite/gcc.target/aarch64/sve/pr106524.c
> new file mode 100644
> index
> ..a9f650f971a5cb5ad993f50a
> adfcac3a8c664a8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr106524.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=armv8-a+sve -O2 -fno-move-loop-invariants" } */
> +
> +void
> +test__zero (int *restrict dest, int *restrict src, float *a, int count)
> +{
> +  int i;
> +
> +  for (i = 0; i < count; ++i)
> +dest[i] = !__builtin_isunordered (a[i], 0) ? src[i] : 0;
> +}
> 
> 
> 
> 
> --


Re: [PATCH] vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

2022-08-12 Thread Richard Biener via Gcc-patches
On Fri, Aug 12, 2022 at 11:41 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR106322 shows, in some cases for some vector type whose
> TYPE_MODE is a scalar integral mode instead of a vector mode,
> it's possible to obtain wrong target support information when
> querying with the scalar integral mode.  For example, for the
> test case in PR106322, on ppc64 32bit vectorizer gets vector
> type "vector(2) short unsigned int" for scalar type "short
> unsigned int", its mode is SImode instead of V2HImode.  The
> target support querying checks umul_highpart optab with SImode
> and considers it's supported, then vectorizer further generates
> .MULH IFN call for that vector type.  Unfortunately it's wrong
> to use SImode support for that vector type multiply highpart
> here.
>
> This patch is to teach vectorizable_call analysis not to allow
> vect_emulated_vector_p type for both vectype_in and vectype_out
> as Richi suggested.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK for trunk.

> If it's ok, I guess we want this to be
> backported?

Yes, but you just missed the RC for 12.2, so please wait until after GCC 12.2
is released and the branch is open again.  The testcase looks mightily
complicated, so fallout there might well be possible too ;)  I suppose it
wasn't possible to craft a simple C testcase after the analysis?

Richard.

>
> BR,
> Kewen
> -
> PR tree-optimization/106322
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_call): Don't allow
> vect_emulated_vector_p type for both vectype_in and vectype_out.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr106322.C: New test.
> * g++.target/powerpc/pr106322.C: New test.
> ---
>  gcc/testsuite/g++.target/i386/pr106322.C| 196 
>  gcc/testsuite/g++.target/powerpc/pr106322.C | 195 +++
>  gcc/tree-vect-stmts.cc  |   8 +
>  3 files changed, 399 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr106322.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106322.C
>
> diff --git a/gcc/testsuite/g++.target/i386/pr106322.C 
> b/gcc/testsuite/g++.target/i386/pr106322.C
> new file mode 100644
> index 000..3cd8d6bf225
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr106322.C
> @@ -0,0 +1,196 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target ia32 } */
> +/* { dg-require-effective-target c++11 } */
> +/* { dg-options "-O2 -mtune=generic -march=i686" } */
> +
> +/* As PR106322, verify this can execute well (not abort).  */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +__attribute__((noipa))
> +bool BytesEqual(const void *bytes1, const void *bytes2, const size_t size) {
> +  return memcmp(bytes1, bytes2, size) == 0;
> +}
> +
> +#define HWY_ALIGNMENT 64
> +constexpr size_t kAlignment = HWY_ALIGNMENT;
> +constexpr size_t kAlias = kAlignment * 4;
> +
> +namespace hwy {
> +namespace N_EMU128 {
> +template  struct Vec128 {
> +  T raw[16 / sizeof(T)] = {};
> +};
> +} // namespace N_EMU128
> +} // namespace hwy
> +
> +template 
> +static void Store(const hwy::N_EMU128::Vec128 v,
> +  T *__restrict__ aligned) {
> +  __builtin_memcpy(aligned, v.raw, sizeof(T) * N);
> +}
> +
> +template 
> +static hwy::N_EMU128::Vec128 Load(const T *__restrict__ aligned) {
> +  hwy::N_EMU128::Vec128 v;
> +  __builtin_memcpy(v.raw, aligned, sizeof(T) * N);
> +  return v;
> +}
> +
> +template 
> +static hwy::N_EMU128::Vec128
> +MulHigh(hwy::N_EMU128::Vec128 a,
> +const hwy::N_EMU128::Vec128 b) {
> +  for (size_t i = 0; i < N; ++i) {
> +// Cast to uint32_t first to prevent overflow. Otherwise the result of
> +// uint16_t * uint16_t is in "int" which may overflow. In practice the
> +// result is the same but this way it is also defined.
> +a.raw[i] = static_cast(
> +(static_cast(a.raw[i]) * static_cast(b.raw[i])) 
> >>
> +16);
> +  }
> +  return a;
> +}
> +
> +#define HWY_ASSERT(condition) assert((condition))
> +#define HWY_ASSUME_ALIGNED(ptr, align) __builtin_assume_aligned((ptr), 
> (align))
> +
> +#pragma pack(push, 1)
> +struct AllocationHeader {
> +  void *allocated;
> +  size_t payload_size;
> +};
> +#pragma pack(pop)
> +
> +static void FreeAlignedBytes(const void *aligned_pointer) {
> +  HWY_ASSERT(aligned_pointer != nullptr);
> +  if (aligned_pointer == nullptr)
> +return;
> +
> +  const uintptr_t payload = reinterpret_cast(aligned_pointer);
> +  HWY_ASSERT(payload % kAlignment == 0);
> +  const AllocationHeader *header =
> +  reinterpret_cast(payload) - 1;
> +
> +  free(header->allocated);
> +}
> +
> +class AlignedFreer {
> +public:
> +  template  void operator()(T *aligned_pointer) const {
> +FreeAlignedBytes(aligned_pointer);
> +  }
> +};
> +
> +template 
> +using AlignedFreeUniquePtr = std::unique_ptr;
> +
> +static inline constexpr size_t Shift

Re: [PATCH] vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

2022-08-12 Thread Kewen.Lin via Gcc-patches
on 2022/8/12 19:14, Richard Biener wrote:
> On Fri, Aug 12, 2022 at 11:41 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR106322 shows, in some cases for some vector type whose
>> TYPE_MODE is a scalar integral mode instead of a vector mode,
>> it's possible to obtain wrong target support information when
>> querying with the scalar integral mode.  For example, for the
>> test case in PR106322, on ppc64 32bit vectorizer gets vector
>> type "vector(2) short unsigned int" for scalar type "short
>> unsigned int", its mode is SImode instead of V2HImode.  The
>> target support querying checks umul_highpart optab with SImode
>> and considers it's supported, then vectorizer further generates
>> .MULH IFN call for that vector type.  Unfortunately it's wrong
>> to use SImode support for that vector type multiply highpart
>> here.
>>
>> This patch is to teach vectorizable_call analysis not to allow
>> vect_emulated_vector_p type for both vectype_in and vectype_out
>> as Richi suggested.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
> 
> OK for trunk.
> 
>> If it's ok, I guess we want this to be
>> backported?
> 
> Yes, but you just missed the RC for 12.2 so please wait until after GCC 12.2
> is released and the branch is open again.  The testcase looks mightly
> complicated
> so fallout there might be well possible as well ;)  I suppose it wasn't 
> possible
> to craft a simple C testcase after the analysis?

Thanks for the hints!  Let me give it a try next week and get back to you then.

BR,
Kewen
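
For reference, a hedged sketch (not the PR testcase, and not verified to
reproduce the wrong code) of the kind of scalar loop that can be given an
emulated "vector(2) short unsigned int" type on 32-bit targets, where the
old code would then query umul_highpart support with SImode:

void
mulh_loop (unsigned short *restrict dst, unsigned short *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    /* Highpart of a 16x16 multiply; the cast avoids signed overflow.  */
    dst[i] = ((unsigned int) src[i] * src[i]) >> 16;
}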


Re: [PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-12 Thread Aldy Hernandez via Gcc-patches
On Fri, Aug 12, 2022 at 12:59 PM Richard Biener  wrote:
>
> With the last re-org I failed to make sure to not add SSA names
> nor supported by ranger into m_imports which then triggers an
> ICE in range_on_path_entry because range_of_expr returns false.  I've
> noticed that range_on_path_entry does mightly complicated things
> that don't make sense to me and the commentary might just be
> out of date.  For the sake of it I replaced it with range_on_entry
> and statistics show we thread _more_ jumps with that, so better
> not do magic there.

Hang on, hang on.  range_on_path_entry was written that way for a
reason.  Andrew and I had numerous discussions about this.  For that
matter, my first implementation did exactly what you're proposing, but
he had reservations about using range_on_entry, which IIRC he thought
should be removed from the (public) API because it had a tendency to
blow up lookups.

Let's wait for Andrew to chime in on this.  If indeed the commentary
is out of date, I would much rather use range_on_entry like you
propose, but he and I have fought many times about this... over
various versions of the path solver :).

For now I would return VARYING in range_on_path_entry if range_of_expr
returns false.  We shouldn't be ICEing when we can gracefully handle
things.  This gcc_unreachable was there to catch implementation issues
during development.

I would keep your gimple_range_ssa_p check regardless.  No sense doing
extra work if we're absolutely sure we won't handle it.

Aldy

>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Will push if that succeeds.
>
> PR tree-optimization/106593
> * tree-ssa-threadbackward.cc (back_threader::find_paths):
> If the imports from the conditional do not satisfy
> gimple_range_ssa_p don't try to thread anything.
> * gimple-range-path.cc (range_on_path_entry): Just
> call range_on_entry.
> ---
>  gcc/gimple-range-path.cc   | 33 +
>  gcc/tree-ssa-threadbackward.cc |  6 +-
>  2 files changed, 6 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index b6148eb5bd7..a7d277c31b8 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -153,38 +153,7 @@ path_range_query::range_on_path_entry (vrange &r, tree 
> name)
>  {
>gcc_checking_assert (defined_outside_path (name));
>basic_block entry = entry_bb ();
> -
> -  // Prefer to use range_of_expr if we have a statement to look at,
> -  // since it has better caching than range_on_edge.
> -  gimple *last = last_stmt (entry);
> -  if (last)
> -{
> -  if (m_ranger->range_of_expr (r, name, last))
> -   return;
> -  gcc_unreachable ();
> -}

I
> -
> -  // If we have no statement, look at all the incoming ranges to the
> -  // block.  This can happen when we're querying a block with only an
> -  // outgoing edge (no statement but the fall through edge), but for
> -  // which we can determine a range on entry to the block.
> -  Value_Range tmp (TREE_TYPE (name));
> -  bool changed = false;
> -  r.set_undefined ();
> -  for (unsigned i = 0; i < EDGE_COUNT (entry->preds); ++i)
> -{
> -  edge e = EDGE_PRED (entry, i);
> -  if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
> - && m_ranger->range_on_edge (tmp, e, name))
> -   {
> - r.union_ (tmp);
> - changed = true;
> -   }
> -}
> -
> -  // Make sure we don't return UNDEFINED by mistake.
> -  if (!changed)
> -r.set_varying (TREE_TYPE (name));
> +  m_ranger->range_on_entry (r, entry, name);
>  }
>
>  // Return the range of NAME at the end of the path being analyzed.
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 0a992213dad..669098e4ec3 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -525,7 +525,11 @@ back_threader::find_paths (basic_block bb, tree name)
>bitmap_clear (m_imports);
>ssa_op_iter iter;
>FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
> -   bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> +   {
> + if (!gimple_range_ssa_p (name))
> +   return;
> + bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> +   }
>
>// Interesting is the set of imports we still not have see
>// the definition of.  So while imports only grow, the
> --
> 2.35.3
>



Re: [PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-12 Thread Richard Biener via Gcc-patches
On Fri, 12 Aug 2022, Aldy Hernandez wrote:

> On Fri, Aug 12, 2022 at 12:59 PM Richard Biener  wrote:
> >
> > With the last re-org I failed to make sure to not add SSA names
> > nor supported by ranger into m_imports which then triggers an
> > ICE in range_on_path_entry because range_of_expr returns false.  I've
> > noticed that range_on_path_entry does mightly complicated things
> > that don't make sense to me and the commentary might just be
> > out of date.  For the sake of it I replaced it with range_on_entry
> > and statistics show we thread _more_ jumps with that, so better
> > not do magic there.
> 
> Hang on, hang on.  range_on_path_entry was written that way for a
> reason.  Andrew and I had numerous discussions about this.  For that
> matter, my first implementation did exactly what you're proposing, but
> he had reservations about using range_on_entry, which IIRC he thought
> should be removed from the (public) API because it had a tendency to
> blow up lookups.
> 
> Let's wait for Andrew to chime in on this.  If indeed the commentary
> is out of date, I would much rather use range_on_entry like you
> propose, but he and I have fought many times about this... over
> various versions of the path solver :).
> 
> For now I would return VARYING in range_on_path_entry if range_of_expr
> returns false.  We shouldn't be ICEing when we can gracefully handle
> things.  This gcc_unreachable was there to catch implementation issues
> during development.
> 
> I would keep your gimple_range_ssa_p check regardless.  No sense doing
> extra work if we're absolutely sure we won't handle it.

OK, I'll push just the gimple_range_ssa_p check then since that resolves
the PR on its own.  I was first misled about the gcc_unreachable
and my brain hurt trying to understand this function ... (also as to
why using range_of_expr on a _random_ stmt would be OK).

That said, nothing seems to be (publicly) using range_on_entry,
so if it shouldn't be used (but it's used privately!) then
make it private.

Thanks,
Richard.

> Aldy
> 
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Will push if that succeeds.
> >
> > PR tree-optimization/106593
> > * tree-ssa-threadbackward.cc (back_threader::find_paths):
> > If the imports from the conditional do not satisfy
> > gimple_range_ssa_p don't try to thread anything.
> > * gimple-range-path.cc (range_on_path_entry): Just
> > call range_on_entry.
> > ---
> >  gcc/gimple-range-path.cc   | 33 +
> >  gcc/tree-ssa-threadbackward.cc |  6 +-
> >  2 files changed, 6 insertions(+), 33 deletions(-)
> >
> > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > index b6148eb5bd7..a7d277c31b8 100644
> > --- a/gcc/gimple-range-path.cc
> > +++ b/gcc/gimple-range-path.cc
> > @@ -153,38 +153,7 @@ path_range_query::range_on_path_entry (vrange &r, tree 
> > name)
> >  {
> >gcc_checking_assert (defined_outside_path (name));
> >basic_block entry = entry_bb ();
> > -
> > -  // Prefer to use range_of_expr if we have a statement to look at,
> > -  // since it has better caching than range_on_edge.
> > -  gimple *last = last_stmt (entry);
> > -  if (last)
> > -{
> > -  if (m_ranger->range_of_expr (r, name, last))
> > -   return;
> > -  gcc_unreachable ();
> > -}
> 
> I
> > -
> > -  // If we have no statement, look at all the incoming ranges to the
> > -  // block.  This can happen when we're querying a block with only an
> > -  // outgoing edge (no statement but the fall through edge), but for
> > -  // which we can determine a range on entry to the block.
> > -  Value_Range tmp (TREE_TYPE (name));
> > -  bool changed = false;
> > -  r.set_undefined ();
> > -  for (unsigned i = 0; i < EDGE_COUNT (entry->preds); ++i)
> > -{
> > -  edge e = EDGE_PRED (entry, i);
> > -  if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
> > - && m_ranger->range_on_edge (tmp, e, name))
> > -   {
> > - r.union_ (tmp);
> > - changed = true;
> > -   }
> > -}
> > -
> > -  // Make sure we don't return UNDEFINED by mistake.
> > -  if (!changed)
> > -r.set_varying (TREE_TYPE (name));
> > +  m_ranger->range_on_entry (r, entry, name);
> >  }
> >
> >  // Return the range of NAME at the end of the path being analyzed.
> > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > index 0a992213dad..669098e4ec3 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -525,7 +525,11 @@ back_threader::find_paths (basic_block bb, tree name)
> >bitmap_clear (m_imports);
> >ssa_op_iter iter;
> >FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
> > -   bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> > +   {
> > + if (!gimple_range_ssa_p (name))
> > +   return;
> > + bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> > +  

[PATCH] Support threading of just the exit edge

2022-08-12 Thread Richard Biener via Gcc-patches
This started with noticing we add ENTRY_BLOCK to our threads
just for the sake of simplifying the conditional at the end of
the first block in a function.  That's not really threading
anything, but it ends up duplicating the entry block and
re-writing the result instead of statically folding the jump.

The following tries to handle those by recording simplifications
of the exit conditional as a thread of length one.  That requires
special-casing them in the backward copier, since if we do not
have any block to copy but instead modify the jump in place and
remove not-taken edges, this confuses the hell out of the
remaining threads.

So back_jt_path_registry::update_cfg now first marks all
edges we know are never taken and then prunes the threading
candidates when they include such an edge.  Then it makes sure
to first perform unreachable edge removal (so we avoid copying
such edges when other thread paths contain the prevailing edge)
before continuing to apply the remaining threads.

In the statistics you can see this avoids quite a bunch of useless
threads (I've investigated 3 random files from cc1files with
dropped stats in any of the thread passes).

Still thinking about it, it would be nice to avoid the work of
discovering those candidates we have to throw away later; that
could eventually be done by having the backward threader
perform an RPO walk over the CFG, skipping edges that can be
statically determined as not being executed.  Below I'm
abusing the path range query to statically analyze the exit
branch, but I assume there's a simpler way of folding this stmt
which could then better integrate with such a walk.

In any case it seems worth more consciously handling the
case of exit branches that simplify without path-sensitive
information.
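
As a hedged illustration (mine, not from the patch; earlier passes may
already fold it), one shape of such an exit branch is a conditional in
the function's first block that is false for every input:

extern int g (void);

int
f (unsigned int x)
{
  unsigned int y = x >> 16;
  /* Never true: y has at most 16 significant bits, so this conditional
     simplifies without any path-sensitive information.  */
  if (y > 0xffff)
    return g ();
  return 0;
}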

Then the patch also restricts path discovery when we'd produce
threads we'll reject later during copying - the backward threader
copying cannot handle paths where the blocks to duplicate are
not from exactly the same loop.  I'm probably going to split this
part out.

Any thoughts?

* gimple-range-path.cc (path_range_query::set_path): Adjust
assert to allow paths of size one.
* tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
Paths of size one are always profitable.
(back_threader::find_paths_to_names): Likewise.
Do not walk further if we are leaving the current loop.
(back_threader::find_taken_edge): Remove assert.  Do not
walk to ENTRY_BLOCK.
* tree-ssa-threadupdate.cc (back_jt_path_registry::update_cfg):
Handle jump threads of just the exit edge by modifying the
control statement in-place.
---
 gcc/gimple-range-path.cc   |  2 +-
 gcc/tree-ssa-threadbackward.cc | 21 -
 gcc/tree-ssa-threadupdate.cc   | 54 ++
 3 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 78146f5683e..a7d277c31b8 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -220,7 +220,7 @@ path_range_query::unreachable_path_p ()
 void
 path_range_query::set_path (const vec &path)
 {
-  gcc_checking_assert (path.length () > 1);
+  gcc_checking_assert (!path.is_empty ());
   m_path = path.copy ();
   m_pos = m_path.length () - 1;
   bitmap_clear (m_has_cache_entry);
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index b886027fccf..669098e4ec3 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -241,8 +241,9 @@ back_threader::maybe_register_path ()
   else
{
  bool irreducible = false;
- if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
- &irreducible)
+ if ((m_path.length () == 1
+  || m_profit.profitable_path_p (m_path, m_name, taken_edge,
+ &irreducible))
  && debug_counter ()
  && m_registry.register_path (m_path, taken_edge))
{
@@ -267,7 +268,6 @@ back_threader::maybe_register_path ()
 edge
 back_threader::find_taken_edge (const vec &path)
 {
-  gcc_checking_assert (path.length () > 1);
   switch (gimple_code (m_last_stmt))
 {
 case GIMPLE_COND:
@@ -350,9 +350,15 @@ back_threader::find_paths_to_names (basic_block bb, bitmap 
interesting,
   m_path.safe_push (bb);
 
   // Try to resolve the path without looking back.
-  if (m_path.length () > 1
-  && (!m_profit.profitable_path_p (m_path, m_name, NULL)
- || maybe_register_path ()))
+  if ((m_path.length () > 1
+   && !m_profit.profitable_path_p (m_path, m_name, NULL))
+  || maybe_register_path ())
+;
+
+  // The backwards thread copier cannot copy blocks that do not belong
+  // to the same loop, so when the new source of the path entry no
+  // longer belongs to it we don't need to search further.
+  else if (m_path[0]->loop_father != bb->loop_father)
 ;
 
   // Continue look

Re: [PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-12 Thread Andrew MacLeod via Gcc-patches



On 8/12/22 07:31, Aldy Hernandez wrote:

On Fri, Aug 12, 2022 at 12:59 PM Richard Biener  wrote:

With the last re-org I failed to make sure to not add SSA names
not supported by ranger into m_imports, which then triggers an
ICE in range_on_path_entry because range_of_expr returns false.  I've
noticed that range_on_path_entry does mighty complicated things
that don't make sense to me and the commentary might just be
out of date.  For the sake of it I replaced it with range_on_entry
and statistics show we thread _more_ jumps with that, so better
not do magic there.

Hang on, hang on.  range_on_path_entry was written that way for a
reason.  Andrew and I had numerous discussions about this.  For that
matter, my first implementation did exactly what you're proposing, but
he had reservations about using range_on_entry, which IIRC he thought
should be removed from the (public) API because it had a tendency to
blow up lookups.

Let's wait for Andrew to chime in on this.  If indeed the commentary
is out of date, I would much rather use range_on_entry like you
propose, but he and I have fought many times about this... over
various versions of the path solver :).


The original issue with range-on-entry is that one needed to be very
careful with it.  If you ask for range-on-entry of something which is not
dominated by the definition, then the cache-filling walk could run all
the way back to the top of the IL, and that was both a waste of time and
memory, and in some pathological cases was outrageous.  And it was
happening more frequently than one imagines... even if accidentally.  I
think the most frequent accidental misuse we saw was calling
range-on-entry for a def within the block, or a PHI for the block.


It's a legitimate issue for used-before-defined cases, but there isn't
much we can do about those anyway.


range_of_expr on any stmt within a block, when the definition comes from
outside the block, causes ranger to trigger its internal range-on-entry
"more safely", which is why it didn't need to be part of the API... but
I admit it does cause some conniptions when for instance there is no
stmt in the block.


That said, the improvements since then to the cache to be able to always
use dominators, and selectively update the cache at strategic locations,
probably remove most issues with it.  That plus we're more careful about
timing things these days to make sure something horrid isn't
introduced.  I also notice all my internal range_on_entry and _exit
routines have evolved and are much cleaner than they once were.


So, now that we are sufficiently mature in this space...  I think we can
promote range_on_entry and range_on_exit to the full public API.  It does
seem that there is some practical use for them.


Andrew

PS. It might even be worthwhile to add an assert to make sure it isn't
being called on the def block... just to avoid that particular stupidity
:-)   I'll take care of doing this.
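
For concreteness, a minimal sketch (an assumption on my part, not the
committed change) of what such a guard could look like, assuming the
usual (bb, name) parameters of range_on_entry:

  /* Hypothetical guard: reject range-on-entry queries for a NAME whose
     defining statement is in the very block being queried.  */
  gcc_checking_assert (gimple_bb (SSA_NAME_DEF_STMT (name)) != bb);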







For now I would return VARYING in range_on_path_entry if range_of_expr
returns false.  We shouldn't be ICEing when we can gracefully handle
things.  This gcc_unreachable was there to catch implementation issues
during development.

I would keep your gimple_range_ssa_p check regardless.  No sense doing
extra work if we're absolutely sure we won't handle it.

Aldy


Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Will push if that succeeds.

 PR tree-optimization/106593
 * tree-ssa-threadbackward.cc (back_threader::find_paths):
 If the imports from the conditional do not satisfy
 gimple_range_ssa_p don't try to thread anything.
 * gimple-range-path.cc (range_on_path_entry): Just
 call range_on_entry.
---
  gcc/gimple-range-path.cc   | 33 +
  gcc/tree-ssa-threadbackward.cc |  6 +-
  2 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index b6148eb5bd7..a7d277c31b8 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -153,38 +153,7 @@ path_range_query::range_on_path_entry (vrange &r, tree 
name)
  {
gcc_checking_assert (defined_outside_path (name));
basic_block entry = entry_bb ();
-
-  // Prefer to use range_of_expr if we have a statement to look at,
-  // since it has better caching than range_on_edge.
-  gimple *last = last_stmt (entry);
-  if (last)
-{
-  if (m_ranger->range_of_expr (r, name, last))
-   return;
-  gcc_unreachable ();
-}

I

-
-  // If we have no statement, look at all the incoming ranges to the
-  // block.  This can happen when we're querying a block with only an
-  // outgoing edge (no statement but the fall through edge), but for
-  // which we can determine a range on entry to the block.
-  Value_Range tmp (TREE_TYPE (name));
-  bool changed = false;
-  r.set_undefined ();
-  for (unsigned i = 0; i < EDGE_COUNT (entry->preds); ++i)
-{
-  edge e = EDGE_

[committed] Improve comment for tree_niter_desc.{control,bound,cmp}

2022-08-12 Thread Andrew Carlotti via Gcc-patches
Fix typos and explain ERROR_MARK usage.

gcc/ChangeLog:

* tree-ssa-loop.h: Improve comment

---

diff --git a/gcc/tree-ssa-loop.h b/gcc/tree-ssa-loop.h
index 
415f461c37e4cd7df0b49f6104f796c49cc830fa..6c70f795d171f22b3ed75873fec4920fea75255b
 100644
--- a/gcc/tree-ssa-loop.h
+++ b/gcc/tree-ssa-loop.h
@@ -54,11 +54,11 @@ public:
   widest_int max;  /* The upper bound on the number of iterations of
   the loop.  */
 
-  /* The simplified shape of the exit condition.  The loop exits if
- CONTROL CMP BOUND is false, where CMP is one of NE_EXPR,
- LT_EXPR, or GT_EXPR, and step of CONTROL is positive if CMP is
- LE_EXPR and negative if CMP is GE_EXPR.  This information is used
- by loop unrolling.  */
+  /* The simplified shape of the exit condition.  This information is used by
+ loop unrolling.  If CMP is ERROR_MARK, then the loop cannot be unrolled.
+ Otherwise, the loop exits if CONTROL CMP BOUND is false, where CMP is one
+ of NE_EXPR, LT_EXPR, or GT_EXPR, and CONTROL.STEP is positive if CMP is
+ LT_EXPR and negative if CMP is GT_EXPR.  */
   affine_iv control;
   tree bound;
   enum tree_code cmp;
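
As an illustrative example (mine, not part of the committed comment) of
the shape being described: for a simple counted loop

  for (int i = 0; i < n; ++i)
    body ();

the simplified exit condition would have CONTROL = i with a positive step
of 1, CMP = LT_EXPR and BOUND = n, i.e. the loop keeps iterating while
i < n holds and exits once it is false.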


Re: [PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-12 Thread Andrew MacLeod via Gcc-patches



On 8/12/22 09:38, Andrew MacLeod wrote:


On 8/12/22 07:31, Aldy Hernandez wrote:
On Fri, Aug 12, 2022 at 12:59 PM Richard Biener  
wrote:

With the last re-org I failed to make sure to not add SSA names
not supported by ranger into m_imports, which then triggers an
ICE in range_on_path_entry because range_of_expr returns false.  I've
noticed that range_on_path_entry does mighty complicated things
that don't make sense to me and the commentary might just be
out of date.  For the sake of it I replaced it with range_on_entry
and statistics show we thread _more_ jumps with that, so better
not do magic there.

Hang on, hang on.  range_on_path_entry was written that way for a
reason.  Andrew and I had numerous discussions about this.  For that
matter, my first implementation did exactly what you're proposing, but
he had reservations about using range_on_entry, which IIRC he thought
should be removed from the (public) API because it had a tendency to
blow up lookups.

Let's wait for Andrew to chime in on this.  If indeed the commentary
is out of date, I would much rather use range_on_entry like you
propose, but he and I have fought many times about this... over
various versions of the path solver :).


The original issue with range-on-entry is that one needed to be very
careful with it.  If you ask for range-on-entry of something which is
not dominated by the definition, then the cache-filling walk could run
all the way back to the top of the IL, and that was both a waste of
time and memory, and in some pathological cases was outrageous.  And
it was happening more frequently than one imagines... even if
accidentally.  I think the most frequent accidental misuse we saw was
calling range-on-entry for a def within the block, or a PHI for the block.


It's a legitimate issue for used-before-defined cases, but there isn't
much we can do about those anyway.


range_of_expr on any stmt within a block, when the definition comes
from outside the block, causes ranger to trigger its internal
range-on-entry "more safely", which is why it didn't need to be part
of the API... but I admit it does cause some conniptions when for
instance there is no stmt in the block.


That said, the improvements since then to the cache to be able to
always use dominators, and selectively update the cache at strategic
locations, probably remove most issues with it.  That plus we're more
careful about timing things these days to make sure something horrid
isn't introduced.  I also notice all my internal range_on_entry and
_exit routines have evolved and are much cleaner than they once were.


So, now that we are sufficiently mature in this space...  I think we
can promote range_on_entry and range_on_exit to the full public API.  It
does seem that there is some practical use for them.


Andrew

PS. It might even be worthwhile to add an assert to make sure it isn't
being called on the def block... just to avoid that particular stupidity
:-)   I'll take care of doing this.



Actually, as I look at it, perhaps it is better to leave things as they
are... i.e., not promote it to a part of the range_query API; that appears
fraught with derived issues in other places.


Continue to leave it in ranger's public API and anyone using a ranger can
use it.  I will add the assert to make sure it's not abused in the common
way of the past.


And yes, this will dramatically simplify the path_entry routine :-)

Andrew



[PATCH] s390: Implement vec_extract via vec_select.

2022-08-12 Thread Robin Dapp via Gcc-patches
Hi,

vec_select can handle dynamic/runtime masks nowadays.  Therefore we can
get rid of the UNSPEC_VEC_EXTRACT that was preventing further
optimizations like combining instructions with vec_extract patterns.
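
As a hedged illustration (not part of the patch), a runtime-index element
read via the GNU C vector extension is one way to exercise the
vec_extract expander with a non-constant selector:

typedef int v4si __attribute__ ((vector_size (16)));

int
get_lane (v4si v, int i)
{
  /* Variable-index extract; expected to expand via the new
     vec_select-based pattern, e.g. ending up as a vlgvf.  */
  return v[i];
}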

Bootstrapped and regtested. No regressions.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

* config/s390/s390.md: Remove UNSPEC_VEC_EXTRACT.
* config/s390/vector.md: Rewrite patterns to use vec_select.
* config/s390/vx-builtins.md (vec_scatter_element_SI):
Likewise.
---

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 55c0064bba84..f37d8fd33a15 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -153,7 +153,6 @@ (define_c_enum "unspec" [
UNSPEC_VEC_VMALO

UNSPEC_VEC_GATHER
-   UNSPEC_VEC_EXTRACT
UNSPEC_VEC_INSERT_AND_ZERO
UNSPEC_VEC_LOAD_BNDRY
UNSPEC_VEC_LOAD_LEN
@@ -1744,8 +1743,8 @@ (define_split
&& GENERAL_REG_P (operands[0])
&& VECTOR_REG_P (operands[1])"
   [(set (match_dup 2) (match_dup 4))
-   (set (match_dup 3) (unspec:DI [(match_dup 5) (const_int 1)]
-UNSPEC_VEC_EXTRACT))]
+   (set (match_dup 3) (vec_select:DI (match_dup 5)
+  (parallel [(const_int 1)])))]
 {
   operands[2] = operand_subword (operands[0], 0, 0, TImode);
   operands[3] = operand_subword (operands[0], 1, 0, TImode);
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 6f46bed03e00..6a7ee7870f37 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -264,11 +264,13 @@ (define_split
(match_operand:V_128 1 "register_operand" ""))]
   "TARGET_VX && GENERAL_REG_P (operands[0]) && VECTOR_REG_P (operands[1])"
   [(set (match_dup 2)
-   (unspec:DI [(subreg:V2DI (match_dup 1) 0)
-   (const_int 0)] UNSPEC_VEC_EXTRACT))
+   (vec_select:DI
+ (subreg:V2DI (match_dup 1) 0)
+   (parallel [(const_int 0)])))
(set (match_dup 3)
-   (unspec:DI [(subreg:V2DI (match_dup 1) 0)
-   (const_int 1)] UNSPEC_VEC_EXTRACT))]
+   (vec_select:DI
+ (subreg:V2DI (match_dup 1) 0)
+   (parallel [(const_int 1)])))]
 {
   operands[2] = operand_subword (operands[0], 0, 0, mode);
   operands[3] = operand_subword (operands[0], 1, 0, mode);
@@ -505,22 +507,24 @@ (define_insn "*vec_set_plus"
   [(set_attr "op_type" "VRS")])


-; FIXME: Support also vector mode operands for 0
-; FIXME: This should be (vec_select ..) or something but it does only
allow constant selectors :(
-; This is used via RTL standard name as well as for expanding the builtin
+;; FIXME: Support also vector mode operands for 0
+;; This is used via RTL standard name as well as for expanding the builtin
 (define_expand "vec_extract"
-  [(set (match_operand: 0 "nonimmediate_operand" "")
-   (unspec: [(match_operand:V  1 "register_operand" "")
-  (match_operand:SI 2 "nonmemory_operand" "")]
- UNSPEC_VEC_EXTRACT))]
-  "TARGET_VX")
+  [(set (match_operand:0 "nonimmediate_operand" "")
+   (vec_select:
+ (match_operand:V  1 "register_operand" "")
+ (parallel
+  [(match_operand:SI   2 "nonmemory_operand" "")])))]
+  "TARGET_VX"
+)

 ; vlgvb, vlgvh, vlgvf, vlgvg, vsteb, vsteh, vstef, vsteg
 (define_insn "*vec_extract"
-  [(set (match_operand: 0 "nonimmediate_operand"  "=d,R")
-   (unspec: [(match_operand:V  1 "register_operand"   "v,v")
-  (match_operand:SI 2 "nonmemory_operand" "an,I")]
- UNSPEC_VEC_EXTRACT))]
+  [(set (match_operand: 0 "nonimmediate_operand" "=d,R")
+   (vec_select:
+ (match_operand:V   1 "nonmemory_operand"  "v,v")
+ (parallel
+  [(match_operand:SI2 "nonmemory_operand" "an,I")])))]
   "TARGET_VX
&& (!CONST_INT_P (operands[2])
|| UINTVAL (operands[2]) < GET_MODE_NUNITS (mode))"
@@ -531,11 +535,11 @@ (define_insn "*vec_extract"

 ; vlgvb, vlgvh, vlgvf, vlgvg
 (define_insn "*vec_extract_plus"
-  [(set (match_operand:  0
"nonimmediate_operand" "=d")
-   (unspec: [(match_operand:V   1 "register_operand"
 "v")
-  (plus:SI (match_operand:SI 2 "nonmemory_operand" 
"a")
-   (match_operand:SI 3 "const_int_operand" 
"n"))]
-  UNSPEC_VEC_EXTRACT))]
+  [(set (match_operand:   0 "nonimmediate_operand" "=d")
+   (vec_select:
+(match_operand:V  1 "register_operand"  "v")
+(plus:SI (match_operand:SI2 "nonmemory_operand" "a")
+ (parallel [(match_operand:SI 3 "const_int_operand" "n")]]
   "TARGET_VX"
   "vlgv\t%0,%v1,%Y3(%2)"
   [(set_attr "op_type" "VRS")])
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 22d0355ec219..fc13f0a3393e 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -440,12

Fix invalid devirtualization when combining final keyword and anonymous types

2022-08-12 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes a wrong-code issue where we incorrectly devirtualize to
__builtin_unreachable.  The problem occurs in a combination of anonymous
namespaces and the final keyword used on methods.  We do two optimizations
here:
 1) when reaching a final method we cut the search for possible new targets
 2) if the type is anonymous we detect whether it is ever instantiated by
looking at whether its vtable is referred to.
Now this goes wrong when there is an anonymous type with a final method that
is not instantiated while its derived type is.  So if 1 triggers we need
to make 2 look for vtables of all derived types, as done by this patch.
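
A minimal sketch of that shape (my reduction, hedged: it may need the
exact conditions of the PR to actually mis-devirtualize before this fix):

namespace {
  struct B { virtual int f () final { return 1; } virtual ~B () {} };
  struct D : B { };   /* Only the derived type is ever instantiated.  */
}

int
call (B *p)
{
  /* final => the search stops at B::f; checking only B's vtable would
     conclude B has no instances and wrongly devirtualize to unreachable.  */
  return p->f ();
}

int
main ()
{
  D d;
  return call (&d) == 1 ? 0 : 1;
}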

Bootstrapped/regtested x86_64-linux, committed.

Honza

gcc/ChangeLog:

2022-08-10  Jan Hubicka  

PR middle-end/106057
* ipa-devirt.cc (type_or_derived_type_possibly_instantiated_p): New
function.
(possible_polymorphic_call_targets): Use it.

gcc/testsuite/ChangeLog:

2022-08-10  Jan Hubicka  

PR middle-end/106057
* g++.dg/tree-ssa/pr101839.C: New test.

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 412ca14f66b..265d07bb354 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -285,6 +285,19 @@ type_possibly_instantiated_p (tree t)
   return vnode && vnode->definition;
 }
 
+/* Return true if T or type derived from T may have instance.  */
+
+static bool
+type_or_derived_type_possibly_instantiated_p (odr_type t)
+{
+  if (type_possibly_instantiated_p (t->type))
+return true;
+  for (auto derived : t->derived_types)
+if (type_or_derived_type_possibly_instantiated_p (derived))
+  return true;
+  return false;
+}
+
 /* Hash used to unify ODR types based on their mangled name and for anonymous
namespace types.  */
 
@@ -3172,6 +3185,7 @@ possible_polymorphic_call_targets (tree otr_type,
 {
   odr_type speculative_outer_type;
   bool speculation_complete = true;
+  bool check_derived_types = false;
 
   /* First insert target from type itself and check if it may have
 derived types.  */
@@ -3190,8 +3204,12 @@ possible_polymorphic_call_targets (tree otr_type,
 to walk derivations.  */
   if (target && DECL_FINAL_P (target))
context.speculative_maybe_derived_type = false;
-  if (type_possibly_instantiated_p (speculative_outer_type->type))
-   maybe_record_node (nodes, target, &inserted, can_refer, 
&speculation_complete);
+  if (check_derived_types
+ ? type_or_derived_type_possibly_instantiated_p
+(speculative_outer_type)
+ : type_possibly_instantiated_p (speculative_outer_type->type))
+   maybe_record_node (nodes, target, &inserted, can_refer,
+  &speculation_complete);
   if (binfo)
matched_vtables.add (BINFO_VTABLE (binfo));
 
@@ -3212,6 +3230,7 @@ possible_polymorphic_call_targets (tree otr_type,
 
   if (!speculative || !nodes.length ())
 {
+  bool check_derived_types = false;
   /* First see virtual method of type itself.  */
   binfo = get_binfo_at_offset (TYPE_BINFO (outer_type->type),
   context.offset, otr_type);
@@ -3229,16 +3248,18 @@ possible_polymorphic_call_targets (tree otr_type,
   if (target && DECL_CXX_DESTRUCTOR_P (target))
context.maybe_in_construction = false;
 
-  if (target)
+  /* In the case we get complete method, we don't need 
+to walk derivations.  */
+  if (target && DECL_FINAL_P (target))
{
- /* In the case we get complete method, we don't need 
-to walk derivations.  */
- if (DECL_FINAL_P (target))
-   context.maybe_derived_type = false;
+ check_derived_types = true;
+ context.maybe_derived_type = false;
}
 
   /* If OUTER_TYPE is abstract, we know we are not seeing its instance.  */
-  if (type_possibly_instantiated_p (outer_type->type))
+  if (check_derived_types
+ ? type_or_derived_type_possibly_instantiated_p (outer_type)
+ : type_possibly_instantiated_p (outer_type->type))
maybe_record_node (nodes, target, &inserted, can_refer, &complete);
   else
skipped = true;
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr101839.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr101839.C
new file mode 100644
index 000..bb7b61cad43
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr101839.C
@@ -0,0 +1,53 @@
+// { dg-do run }
+// { dg-options "-O2 -fdump-tree-optimized" }
+// { dg-require-effective-target c++11 }
+
+#include 
+#include 
+#include 
+namespace {
+  struct Buf {
+char * buf; int a{0}; int b{0};
+Buf(char * b) : buf(b) { }
+void add(int v) {
+  ::memcpy(buf, &v, sizeof(v));
+  a += sizeof(v);
+  b += sizeof(v);
+}
+  };
+  struct A {
+virtual void fill(Buf &buf) {
+  buf.add(type());
+  buf.add(type());
+}
+virtual ~A() {}

[PATCH 0/15] arm: Enables return address verification and branch target identification on Cortex-M

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

as I respun a few patches, dropped one and added another, I'm reposting
this series that enables return address verification and branch target
identification based on the Armv8.1-M Pointer Authentication and Branch
Target Identification Extension [1] for Arm Cortex-M.

This feature is controlled by the newly introduced '-mbranch-protection'
option; alongside it, the Armv8.1-M Mainline target feature '+pacbti' is
added.

Best Regards

  Andrea

[1] 



[PATCH] s390: Implement vec_set with vec_merge and, vec_duplicate.

2022-08-12 Thread Robin Dapp via Gcc-patches
Hi,

Similar to other backends, this patch implements vec_set via
vec_merge and vec_duplicate instead of an unspec.  This opens up
more possibilities to combine instructions.
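
As a hedged illustration (not part of the patch), a variable-lane store
through the GNU C vector extension is one way to reach the vec_set
expander:

typedef int v4si __attribute__ ((vector_size (16)));

v4si
set_lane (v4si v, int x, int i)
{
  /* Expected to expand as (vec_merge (vec_duplicate x) v (1 << i))
     instead of an unspec, e.g. ending up as a vlvgf.  */
  v[i] = x;
  return v;
}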

Bootstrapped and regtested. No regressions.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

* config/s390/s390.md: Implement vec_set with vec_merge and
vec_duplicate.
* config/s390/vector.md: Likewise.
* config/s390/vx-builtins.md: Likewise.
* config/s390/s390.cc (s390_expand_vec_init): Emit new pattern.
(print_operand_address): New output modifier.
(print_operand): New output modifier.
---

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index c86b26933d7a..ff89fb83360a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -7073,11 +7073,10 @@ s390_expand_vec_init (rtx target, rtx vals)
   if (!general_operand (elem, GET_MODE (elem)))
elem = force_reg (inner_mode, elem);

-  emit_insn (gen_rtx_SET (target,
- gen_rtx_UNSPEC (mode,
- gen_rtvec (3, elem,
-GEN_INT (i), target),
- UNSPEC_VEC_SET)));
+  emit_insn
+   (gen_rtx_SET
+(target, gen_rtx_VEC_MERGE
+ (mode, gen_rtx_VEC_DUPLICATE (mode, elem), target, GEN_INT (1 << 
i;
 }
 }

@@ -8057,6 +8056,8 @@ print_operand_address (FILE *file, rtx addr)
 'S': print S-type memory reference (base+displacement).
 'Y': print address style operand without index (e.g. shift count or
setmem
 operand).
+'P': print address-style operand without index but with the offset as
+if it were specified by a 'p' format flag.

 'b': print integer X as if it's an unsigned byte.
 'c': print integer X as if it's an signed byte.
@@ -8068,6 +8069,7 @@ print_operand_address (FILE *file, rtx addr)
 'k': print the first nonzero SImode part of X.
 'm': print the first SImode part unequal to -1 of X.
 'o': print integer X as if it's an unsigned 32bit word.
+'p': print N such that 2^N == X (X must be a power of 2 and const int).
 's': "start" of contiguous bitmask X in either DImode or vector
inner mode.
 't': CONST_INT: "start" of contiguous bitmask X in SImode.
 CONST_VECTOR: Generate a bitmask for vgbm instruction.
@@ -8237,6 +8239,16 @@ print_operand (FILE *file, rtx x, int code)
   print_shift_count_operand (file, x);
   return;

+case 'P':
+  if (CONST_INT_P (x))
+   {
+ ival = exact_log2 (INTVAL (x));
+ fprintf (file, HOST_WIDE_INT_PRINT_DEC, ival);
+   }
+  else
+   print_shift_count_operand (file, x);
+  return;
+
 case 'K':
   /* Append @PLT to both local and non-local symbols in order to
support
 Linux Kernel livepatching: patches contain individual functions and
@@ -8321,6 +8333,9 @@ print_operand (FILE *file, rtx x, int code)
case 'o':
  ival &= 0x;
  break;
+   case 'p':
+ ival = exact_log2 (INTVAL (x));
+ break;
case 'e': case 'f':
case 's': case 't':
  {
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index f37d8fd33a15..a82db4c624fa 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -183,7 +183,6 @@ (define_c_enum "unspec" [
UNSPEC_VEC_GFMSUM_128
UNSPEC_VEC_GFMSUM_ACCUM
UNSPEC_VEC_GFMSUM_ACCUM_128
-   UNSPEC_VEC_SET

UNSPEC_VEC_VSUMG
UNSPEC_VEC_VSUMQ
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c50451a8326c..bde3a39db3d4 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -467,12 +467,17 @@ (define_insn "mov"
 ; vec_set is supposed to *modify* an existing vector so operand 0 is
 ; duplicated as input operand.
 (define_expand "vec_set"
-  [(set (match_operand:V0 "register_operand"  "")
-   (unspec:V [(match_operand: 1 "general_operand"   "")
-  (match_operand:SI2 "nonmemory_operand" "")
-  (match_dup 0)]
-  UNSPEC_VEC_SET))]
-  "TARGET_VX")
+  [(set (match_operand:V0 "register_operand" "")
+   (vec_merge:V
+ (vec_duplicate:V
+   (match_operand: 1 "general_operand" ""))
+ (match_dup 0)
+ (match_operand:SI  2 "nonmemory_operand")))]
+  ""
+{
+  if (CONST_INT_P (operands[2]))
+operands[2] = GEN_INT (1 << INTVAL (operands[2]));
+})

 ; FIXME: Support also vector mode operands for 1
 ; FIXME: A target memory operand seems to be useful otherwise we end
@@ -480,28 +485,31 @@ (define_expand "vec_set"
 ; that itself?
 ; vlvgb, vlvgh, vlvgf, vlvgg, vleb, vleh, vlef, vleg, vleib, vleih,
vleif, vleig
 (define_insn "*vec_set"
-  [(set (match_operand:V0 "register_operand"  "=v,v,v")
-   (unspec:V [(match_operand: 1 "general_operand""d,R,K")
-

Re: [PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-12 Thread Aldy Hernandez via Gcc-patches
In that case Richi, go right ahead with your original patch. I for one am
happy we can use range_on_entry, which always seemed cleaner.

Aldy

On Fri, Aug 12, 2022, 16:07 Andrew MacLeod  wrote:

>
> On 8/12/22 09:38, Andrew MacLeod wrote:
> >
> > On 8/12/22 07:31, Aldy Hernandez wrote:
> >> On Fri, Aug 12, 2022 at 12:59 PM Richard Biener 
> >> wrote:
> >>> With the last re-org I failed to make sure to not add SSA names
> >>> nor supported by ranger into m_imports which then triggers an
> >>> ICE in range_on_path_entry because range_of_expr returns false.  I've
> >>> noticed that range_on_path_entry does mightly complicated things
> >>> that don't make sense to me and the commentary might just be
> >>> out of date.  For the sake of it I replaced it with range_on_entry
> >>> and statistics show we thread _more_ jumps with that, so better
> >>> not do magic there.
> >> Hang on, hang on.  range_on_path_entry was written that way for a
> >> reason.  Andrew and I had numerous discussions about this.  For that
> >> matter, my first implementation did exactly what you're proposing, but
> >> he had reservations about using range_on_entry, which IIRC he thought
> >> should be removed from the (public) API because it had a tendency to
> >> blow up lookups.
> >>
> >> Let's wait for Andrew to chime in on this.  If indeed the commentary
> >> is out of date, I would much rather use range_on_entry like you
> >> propose, but he and I have fought many times about this... over
> >> various versions of the path solver :).
> >
> > The original issue with range-on-entry is one needed to be very
> > careful with it.  If you ask for range-on-entry of something which is
> > not dominated by the definition, then the cache filling walk was
> > getting filled all the way back to the top of the IL, and that was
> > both a waste of time and memory., and in some pathological cases was
> > outrageous.  And it was happening more frequently than one imagines...
> > even if accidentally.  I think the most frequent accidental misuse we
> > saw was calling range on entry for a def within the block, or a PHI
> > for the block.
> >
> > Its a legitimate issue for used before defined cases, but there isnt
> > much we can do about those anyway,
> >
> > range_of_expr on any stmt within a block, when the definition comes
> > from outside he block causes ranger to trigger its internal
> > range-on-entry "more safely", which is why it didn't need to be part
> > of the API... but i admit it does cause some conniptions when for
> > instance there is no stmt in the block.
> >
> > That said, the improvements since then to the cache to be able to
> > always use dominators, and selectively update the cache at strategic
> > locations probably removes most issues with it. That plus we're more
> > careful about timing things these days to make sure something horrid
> > isn't introduced.  I also notice all my internal range_on_entry and
> > _exit routines have evolved and are much cleaner than they once were.
> >
> > So. now that we are sufficiently mature in this space...  I think we
> > can promote range_on_entry and range_on_exit to full public API..  It
> > does seem that there is some use practical use for them.
> >
> > Andrew
> >
> > PS. It might even be worthwhile to add an assert to make sure it isnt
> > being called on the def block.. just to avoid that particular stupidty
> > :-)   I'll take care of doing this.
> >
> >
> Actually, as I look at it, perhaps better to leave things as they are..
> ie, not promote it to a part of the range_query API.. that appears
> fraught with derived issues in other places.
>
> Continue to leave it in rangers public API and anyone using a ranger can
> use it. I will add the assert to make sure its not abused in the common
> way of the past.
>
> And yes, this will dramatically simplify the path_entry routine :-)
>
> Andrew
>
>


[PATCH 1/15] arm: Make mbranch-protection opts parsing common to AArch32/64

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

This change refactors all the mbranch-protection option parsing code and
types to make it common to both AArch32 and AArch64 backends.

This change also pulls in some supporting types from AArch64 to make
it common (aarch_parse_opt_result).

The significant changes in this patch are the movement of all branch
protection parsing routines from aarch64.c to aarch-common.c and
supporting data types and static data structures.

This patch also pre-declares the variables and types required in the
aarch32 back-end for the moved function sign scope and key variables,
to prepare for the impending series of patches that support parsing
the mbranch-protection feature in the aarch32 back-end.

Approved here


gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc: Include aarch-common.h.
(all_architectures): Fix comment.
(aarch64_parse_extension): Rename return type, enum value names.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Rename
factored out aarch_ra_sign_scope and aarch_ra_sign_key variables.
Also rename corresponding enum values.
* config/aarch64/aarch64-opts.h (aarch64_function_type): Factor
out aarch64_function_type and move it to common code as
aarch_function_type in aarch-common.h.
* config/aarch64/aarch64-protos.h: Include common types header,
move out types aarch64_parse_opt_result and aarch64_key_type to
aarch-common.h
* config/aarch64/aarch64.cc: Move mbranch-protection parsing types
and functions out into aarch-common.h and aarch-common.cc.  Fix up
all the name changes resulting from the move.
* config/aarch64/aarch64.md: Fix up aarch64_ra_sign_key type name change
and enum value.
* config/aarch64/aarch64.opt: Include aarch-common.h to import
type move.  Fix up name changes from factoring out common code and
data.
* config/arm/aarch-common-protos.h: Export factored out routines to both
backends.
* config/arm/aarch-common.cc: Include newly factored out types.  Move 
all
mbranch-protection code and data structures from aarch64.cc.
* config/arm/aarch-common.h: New header that declares types shared
between aarch32 and aarch64 backends.
* config/arm/arm-protos.h: Declare types and variables that are
made common to aarch64 and aarch32 backends - aarch_ra_sign_key,
aarch_ra_sign_scope and aarch_enable_bti.

Co-Authored-By: Tejas Belagod  

diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index dfda5b8372a..70a5cf98b75 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -30,6 +30,7 @@
 #include "opts.h"
 #include "flags.h"
 #include "diagnostic.h"
+#include "config/arm/aarch-common.h"
 
 #ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
@@ -192,11 +193,11 @@ static const struct arch_to_arch_name all_architectures[] 
=
 
 /* Parse the architecture extension string STR and update ISA_FLAGS
with the architecture features turned on or off.  Return a
-   aarch64_parse_opt_result describing the result.
+   aarch_parse_opt_result describing the result.
When the STR string contains an invalid extension,
a copy of the string is created and stored to INVALID_EXTENSION.  */
 
-enum aarch64_parse_opt_result
+enum aarch_parse_opt_result
 aarch64_parse_extension (const char *str, uint64_t *isa_flags,
 std::string *invalid_extension)
 {
@@ -229,7 +230,7 @@ aarch64_parse_extension (const char *str, uint64_t 
*isa_flags,
adding_ext = 1;
 
   if (len == 0)
-   return AARCH64_PARSE_MISSING_ARG;
+   return AARCH_PARSE_MISSING_ARG;
 
 
   /* Scan over the extensions table trying to find an exact match.  */
@@ -251,13 +252,13 @@ aarch64_parse_extension (const char *str, uint64_t 
*isa_flags,
  /* Extension not found in list.  */
  if (invalid_extension)
*invalid_extension = std::string (str, len);
- return AARCH64_PARSE_INVALID_FEATURE;
+ return AARCH_PARSE_INVALID_FEATURE;
}
 
   str = ext;
 };
 
-  return AARCH64_PARSE_OK;
+  return AARCH_PARSE_OK;
 }
 
 /* Append all architecture extension candidates to the CANDIDATES vector.  */
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index caf8e332ea0..b0c5a4fd6b6 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -183,14 +183,14 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
"__ARM_FEATURE_BTI_DEFAULT", pfile);
 
   cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
-  if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE)
+  if (aarch_ra_sign_scope != AARCH_FUNCTION_NONE)
 {
   int v = 0;
-  if (aarch64_ra_sign_key == AA

[PATCH 2/15] arm: Add Armv8.1-M Mainline target feature +pacbti

2022-08-12 Thread Andrea Corallo via Gcc-patches
This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

Pre-approved here
.

gcc/Changelog:

* config/arm/arm.h (TARGET_HAVE_PACBTI): New macro.
* config/arm/arm-cpus.in (pacbti): New feature.
* doc/invoke.texi (Arm Options): Document it.

Co-Authored-By: Tejas Belagod  

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 0d3082b569f..9502a34fa97 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -229,6 +229,10 @@ define feature cdecp5
 define feature cdecp6
 define feature cdecp7
 
+# M-profile control flow integrity extensions (PAC/AUT/BTI).
+# Optional from Armv8.1-M Mainline.
+define feature pacbti
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -743,6 +747,7 @@ begin arch armv8.1-m.main
  isa ARMv8_1m_main
 # fp => FPv5-sp-d16; fp.dp => FPv5-d16
  option dsp add armv7em
+ option pacbti add pacbti
  option fp add FPv5 fp16
  option fp.dp add FPv5 FP_DBL fp16
  option nofp remove ALL_FP
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f479540812a..3495ab857ea 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -335,6 +335,12 @@ emission of floating point pcs attributes.  */
isa_bit_mve_float) \
   && !TARGET_GENERAL_REGS_ONLY)
 
+/* Non-zero if this target supports Armv8.1-M Mainline pointer-signing
+   extension.  */
+#define TARGET_HAVE_PACBTI (arm_arch8_1m_main \
+   && bitmap_bit_p (arm_active_target.isa, \
+isa_bit_pacbti))
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3936aef69d0..079e34ed98c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21002,6 +21002,9 @@ Disable the floating-point extension.
 @item +cdecp0, +cdecp1, ... , +cdecp7
 Enable the Custom Datapath Extension (CDE) on selected coprocessors according
 to the numbers given in the options in the range 0 to 7.
+
+@item +pacbti
+Enable the Pointer Authentication and Branch Target Identification Extension.
 @end table
 
 @item  armv8-m.main


[PATCH 3/15] arm: Add option -mbranch-protection

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

this adds the -mbranch-protection option.  This option enables the
generation of pointer signing and authentication instructions in
function prologues and epilogues.
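
As a hedged usage sketch (mine, not from the patch): with, say,
-march=armv8.1-m.main+pacbti -mbranch-protection=pac-ret+bti -mthumb, a
non-leaf function such as the one below is expected to get its return
address signed in the prologue and authenticated before returning:

int
call_through (int (*fn) (int), int x)
{
  return fn (x) + 1;   /* Saves lr, so pac-ret signs/authenticates it.  */
}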

gcc/ChangeLog:

* config/arm/arm.c (arm_configure_build_target): Parse and validate
-mbranch-protection option and initialize appropriate data structures.
* config/arm/arm.opt (-mbranch-protection): New option.
* doc/invoke.texi (Arm Options): Document it.

Co-Authored-By: Tejas Belagod  
Co-Authored-By: Richard Earnshaw 

Approved here 

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 60f3eae82a4..0068817b0f2 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -3263,6 +3263,17 @@ arm_configure_build_target (struct arm_build_target 
*target,
   tune_opts = strchr (opts->x_arm_tune_string, '+');
 }
 
+  if (opts->x_arm_branch_protection_string)
+{
+  aarch_validate_mbranch_protection (opts->x_arm_branch_protection_string);
+
+  if (aarch_ra_sign_key != AARCH_KEY_A)
+   {
+ warning (0, "invalid key type for %<-mbranch-protection=%>");
+ aarch_ra_sign_key = AARCH_KEY_A;
+   }
+}
+
   if (arm_selected_arch)
 {
   arm_initialize_isa (target->isa, arm_selected_arch->common.isa_bits);
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index f54ec8356c3..d292e23ea11 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -323,6 +323,10 @@ mbranch-cost=
 Target RejectNegative Joined UInteger Var(arm_branch_cost) Init(-1)
 Cost to assume for a branch insn.
 
+mbranch-protection=
+Target RejectNegative Joined Var(arm_branch_protection_string) Save
+Use branch-protection features.
+
 mgeneral-regs-only
 Target RejectNegative Mask(GENERAL_REGS_ONLY) Save
 Generate code which uses the core registers only (r0-r14).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 079e34ed98c..a2be3446594 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -825,7 +825,9 @@ Objective-C and Objective-C++ Dialects}.
 -mcmse @gol
 -mfix-cmse-cve-2021-35465 @gol
 -mstack-protector-guard=@var{guard} 
-mstack-protector-guard-offset=@var{offset} @gol
--mfdpic}
+-mfdpic @gol
+-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]
+[+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
@@ -21521,6 +21523,40 @@ The opposite @option{-mno-fdpic} option is useful (and 
required) to
 build the Linux kernel using the same (@code{arm-*-uclinuxfdpiceabi})
 toolchain as the one used to build the userland programs.
 
+@item
+-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}][+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]
+@opindex mbranch-protection
+Enable branch protection features (armv8.1-m.main only).
+@samp{none} generate code without branch protection or return address
+signing.
+@samp{standard[+@var{leaf}]} generate code with all branch protection
+features enabled at their standard level.
+@samp{pac-ret[+@var{leaf}]} generate code with return address signing
+set to its standard level, which is to sign all functions that save
+the return address to memory.
+@samp{leaf} When return address signing is enabled, also sign leaf
+functions even if they do not write the return address to memory.
++@samp{bti} Add landing-pad instructions at the permitted targets of
+indirect branch instructions.
+
+If the @samp{+pacbti} architecture extension is not enabled, then all
+branch protection and return address signing operations are
+constrained to use only the instructions defined in the
+architectural-NOP space. The generated code will remain
+backwards-compatible with earlier versions of the architecture, but
+the additional security can be enabled at run time on processors that
+support the @samp{PACBTI} extension.
+
+Branch target enforcement using BTI can only be enabled at runtime if
+all code in the application has been compiled with at least
+@samp{-mbranch-protection=bti}.
+
+Any setting other than @samp{none} is supported only on armv8-m.main
+or later.
+
+The default is to generate code without branch protection or return
+address signing.
+
 @end table
 
 @node AVR Options


[PATCH 4/15] arm: Add testsuite library support for PACBTI target

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

this adds target-checking entities for PACBTI to the testsuite
framework.

Pre-approved with the requested changes here
.
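
For context (illustrative only, not part of this patch), a run-time test
that needs the hardware would carry directives such as:

  /* { dg-do run } */
  /* { dg-require-effective-target arm_pacbti_hw } */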

gcc/testsuite/ChangeLog:

* testsuite/lib/target-supports.exp:
(check_effective_target_arm_pacbti_hw): New.
* doc/sourcebuild.texi: Document arm_pacbti_hw.

Co-Authored-By: Tejas Belagod  

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 613ac29967b..a3f60e9c0cb 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2167,6 +2167,10 @@ ARM target supports options to generate instructions 
from ARMv8.1-M with
 the Custom Datapath Extension (CDE) and M-Profile Vector Extension (MVE).
 Some multilibs may be incompatible with these options.
 
+@item arm_pacbti_hw
+Test system supports executing Pointer Authentication and Branch Target
+Identification instructions.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
@@ -2256,6 +2260,12 @@ ARM target generates Thumb-2 code for @code{-mthumb} but 
does not
 support executing the Armv8.1-M Mainline Low Overhead Loop
 instructions @code{DLS} and @code{LE}.
 
+@item mbranch_protection_ok
+ARM target supporting @code{-mbranch-protection=standard}.
+
+@item arm_pacbti_hw
+Test system supports executing non-nop pacbti instructions.
+
 @end table
 
 @subsubsection AArch64-specific attributes
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index ff8edbd3e17..aa828bd3a07 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5090,6 +5090,22 @@ proc check_effective_target_arm_cmse_clear_ok {} {
 } "-mcmse"];
 }
 
+# Return 1 if the target supports executing PACBTI instructions, 0
+# otherwise.
+
+proc check_effective_target_arm_pacbti_hw {} {
+return [check_runtime arm_pacbti_hw_available {
+   __attribute__ ((naked)) int
+   main (void)
+   {
+ asm ("pac r12, lr, sp");
+ asm ("mov r0, #0");
+ asm ("autg r12, lr, sp");
+ asm ("bx lr");
+   }
+} "-march=armv8.1-m.main+pacbti+fp -mbranch-protection=standard -mthumb 
-mfloat-abi=hard"]
+}
+
 # Return 1 if this compilation turns on string_ops_prefer_neon on.
 
 proc check_effective_target_arm_tune_string_ops_prefer_neon { } {


[PATCH 5/15] arm: Implement target feature macros for PACBTI

2022-08-12 Thread Andrea Corallo via Gcc-patches
This patch implements target feature macros when PACBTI is enabled
through the -march option or -mbranch-protection.  The target feature
macros __ARM_FEATURE_PAC_DEFAULT and __ARM_FEATURE_BTI_DEFAULT are
specified in the ARM ACLE.

__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI are specified in the
pull-request .
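
As a sketch of how user code might consume these macros (illustrative
only, not part of the patch; the APP_HAVE_* names are made up):

  /* Hypothetical user code keying off the new feature-test macros.  */
  #if defined (__ARM_FEATURE_PAC_DEFAULT)
  /* Bit 0: return addresses are signed with the A key;
     bit 2: leaf functions are signed as well.  */
  # define APP_HAVE_PAC_RET 1
  #endif
  #if defined (__ARM_FEATURE_BTI_DEFAULT)
  # define APP_HAVE_BTI 1
  #endif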

Approved here
.

gcc/

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_BTI_DEFAULT, __ARM_FEATURE_PAC_DEFAULT,
__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI.

gcc/testsuite/

* lib/target-supports.exp
(check_effective_target_mbranch_protection_ok): New function.
* gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-4.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-5.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-8.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-9.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-10.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-11.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-12.c: Likewise.

Co-Authored-By: Tejas Belagod  

diff --git a/gcc/config/arm/arm-c.cc b/gcc/config/arm/arm-c.cc
index a8697b8c62f..190099b2c37 100644
--- a/gcc/config/arm/arm-c.cc
+++ b/gcc/config/arm/arm-c.cc
@@ -212,6 +212,24 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_PAUTH", TARGET_HAVE_PACBTI);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BTI", TARGET_HAVE_PACBTI);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BTI_DEFAULT",
+ aarch_enable_bti == 1);
+
+  cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
+  if (aarch_ra_sign_scope != AARCH_FUNCTION_NONE)
+  {
+unsigned int pac = 1;
+
+gcc_assert (aarch_ra_sign_key == AARCH_KEY_A);
+
+if (aarch_ra_sign_scope == AARCH_FUNCTION_ALL)
+  pac |= 0x4;
+
+builtin_define_with_int_value ("__ARM_FEATURE_PAC_DEFAULT", pac);
+  }
+
   cpp_undef (pfile, "__ARM_FEATURE_MVE");
   if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)
 {
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-10.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-10.c
new file mode 100644
index 000..52d18238109
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-10.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-additional-options "-march=armv8.1-m.main+fp 
-mbranch-protection=bti+pac-ret -mfloat-abi=hard" } */
+
+#if (__ARM_FEATURE_BTI_DEFAULT != 1)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined to 1."
+#endif
+
+#if !defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
new file mode 100644
index 000..9f2711097ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" "-mfloat-abi=*" } } */
+/* { dg-options "-march=armv8.1-m.main+pacbti" } */
+
+#if (__ARM_FEATURE_BTI != 1)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined to 1."
+#endif
+
+#if (__ARM_FEATURE_PAUTH != 1)
+#error "Feature test macro __ARM_FEATURE__PAUTH should be defined to 1."
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c
new file mode 100644
index 000..db40b17c3b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-12.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" } } */
+/* { dg-options "-march=armv8-m.main+fp -mfloat-abi=softfp" } */
+
+#if defined (__ARM_FEATURE_BTI)
+#error "Feature test macro __ARM_FEATURE_BTI should not be defined."
+#endif
+
+#if defined (__ARM_FEATURE_PAUTH)
+#error "Feature test macro __ARM_FEATURE_PAUTH should not be defined."
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
new file mode 100644
index 000..cd418ce0c7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-options "-march=armv8.1-m.main+pacbti+fp 
-mbranch-protection=bti+pac-ret+leaf -mthumb -mfloa

Re: [PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-12 Thread Aldy Hernandez via Gcc-patches
On Fri, Aug 12, 2022 at 1:36 PM Richard Biener  wrote:
>
> On Fri, 12 Aug 2022, Aldy Hernandez wrote:
>
> > On Fri, Aug 12, 2022 at 12:59 PM Richard Biener  wrote:
> > >
> > > With the last re-org I failed to make sure to not add SSA names
> > > nor supported by ranger into m_imports which then triggers an
> > > ICE in range_on_path_entry because range_of_expr returns false.  I've
> > > noticed that range_on_path_entry does mighty complicated things
> > > that don't make sense to me and the commentary might just be
> > > out of date.  For the sake of it I replaced it with range_on_entry
> > > and statistics show we thread _more_ jumps with that, so better
> > > not do magic there.
> >
> > Hang on, hang on.  range_on_path_entry was written that way for a
> > reason.  Andrew and I had numerous discussions about this.  For that
> > matter, my first implementation did exactly what you're proposing, but
> > he had reservations about using range_on_entry, which IIRC he thought
> > should be removed from the (public) API because it had a tendency to
> > blow up lookups.
> >
> > Let's wait for Andrew to chime in on this.  If indeed the commentary
> > is out of date, I would much rather use range_on_entry like you
> > propose, but he and I have fought many times about this... over
> > various versions of the path solver :).
> >
> > For now I would return VARYING in range_on_path_entry if range_of_expr
> > returns false.  We shouldn't be ICEing when we can gracefully handle
> > things.  This gcc_unreachable was there to catch implementation issues
> > during development.
> >
> > I would keep your gimple_range_ssa_p check regardless.  No sense doing
> > extra work if we're absolutely sure we won't handle it.
>
> OK, I'll push just the gimple_range_ssa_p then since that resolves
> the PR on its own.  I was first misled about the gcc_unreachable
> and my brain hurt understanding this function ... (also as to
> why using range_of_expr on a _random_ stmt would be OK).

Calling range_of_expr on a random stmt is not OK, and is bound to
lead to subtle issues.  As I mentioned earlier, and both in the
comments for class path_range_query and
path_range_query::internal_range_of_expr, all we really support is
querying range_of_stmt and range_of_expr as it would appear at the end
of the path.

Internally to the path solver, if it uses range_of_expr and the SSA is
defined outside the path, we'll ignore the statement altogether and
return the range on entry to the path.  So yeah... feeding random
statements is not good.  It's meant to be used to query ranges of SSA
names at the end of the path.

Hmmm, perhaps I should rewrite
path_range_query::internal_range_of_expr() to explicitly ignore the
STMT, or even put some asserts if it's being used nonsensibly.

Aldy



[PATCH 6/15] arm: Add pointer authentication for stack-unwinding runtime

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

this patch adds return-address authentication when the stack is
unwound after an exception is taken.  All the changes here are made to
the runtime code in libgcc's unwinder for the Arm target.  They are
guarded under defined (__ARM_FEATURE_PAC_DEFAULT) and activated only
if the +pacbti feature is switched on for the architecture.  This
means that switching on the target feature via -march or -mcpu is
sufficient and -mbranch-protection need not be enabled.  It also
ensures that the unwinder only authenticates if the PACBTI
instructions are available in the non-NOP space, since it uses AUTG.
Just generating PAC/AUT instructions using -mbranch-protection will
not enable authentication in the unwinder.

Approved here:


gcc/ChangeLog:

* ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass): Introduce
new pseudo register class _UVRSC_PAC.
* libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
exception opcode (0xb4) for saving RA_AUTH_CODE and authenticate
with AUTG if found.
* libgcc/config/arm/unwind-arm.c (struct pseudo_regs): New.
(phase1_vrs): Introduce new field to store pseudo-reg state.
(phase2_vrs): Likewise.
(_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
(_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
(_Unwind_VRS_Pop): Load pseudo register value from stack into VRS.

Co-Authored-By: Tejas Belagod  

diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h
index d3831f6c60a..f26702e8c6c 100644
--- a/gcc/ginclude/unwind-arm-common.h
+++ b/gcc/ginclude/unwind-arm-common.h
@@ -127,7 +127,8 @@ extern "C" {
   _UVRSC_VFP = 1,   /* vfp */
   _UVRSC_FPA = 2,   /* fpa */
   _UVRSC_WMMXD = 3, /* Intel WMMX data register */
-  _UVRSC_WMMXC = 4  /* Intel WMMX control register */
+  _UVRSC_WMMXC = 4, /* Intel WMMX control register */
+  _UVRSC_PAC = 5/* Armv8.1-M Mainline PAC/AUTH pseudo-register */
 }
   _Unwind_VRS_RegClass;
 
diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
index 2de96c2a447..e48854587c6 100644
--- a/libgcc/config/arm/pr-support.c
+++ b/libgcc/config/arm/pr-support.c
@@ -106,6 +106,7 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
 {
   _uw op;
   int set_pc;
+  int set_pac = 0;
   _uw reg;
 
   set_pc = 0;
@@ -114,6 +115,27 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
   op = next_unwind_byte (uws);
   if (op == CODE_FINISH)
{
+ /* When we reach end, we have to authenticate R12 we just popped
+earlier.
+
+Note: while the check provides additional security against a
+corrupted unwind chain, it isn't essential for correct unwinding
+of an uncorrupted chain.  */
+#if defined(TARGET_HAVE_PACBTI)
+ if (set_pac)
+   {
+ _uw sp;
+ _uw lr;
+ _uw pac;
+ _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, &sp);
+ _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32, &lr);
+ _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
+  _UVRSD_UINT32, &pac);
+ __asm__ __volatile__
+   ("autg %0, %1, %2" : : "r"(pac), "r"(lr), "r"(sp) :);
+   }
+#endif
+
  /* If we haven't already set pc then copy it from lr.  */
  if (!set_pc)
{
@@ -227,6 +249,16 @@ __gnu_unwind_execute (_Unwind_Context * context, 
__gnu_unwind_state * uws)
return _URC_FAILURE;
  continue;
}
+ /* Pop PAC off the stack into VRS pseudo.pac.  */
+ if (op == 0xb4)
+   {
+ if (_Unwind_VRS_Pop (context, _UVRSC_PAC, 0, _UVRSD_UINT32)
+ != _UVRSR_OK)
+   return _URC_FAILURE;
+ set_pac = 1;
+ continue;
+   }
+
  if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
return _URC_FAILURE;
 
diff --git a/libgcc/config/arm/unwind-arm.c b/libgcc/config/arm/unwind-arm.c
index 386406564af..89f945d047e 100644
--- a/libgcc/config/arm/unwind-arm.c
+++ b/libgcc/config/arm/unwind-arm.c
@@ -64,6 +64,12 @@ struct wmmxc_regs
   _uw wc[4];
 };
 
+/*  Holds value of pseudo registers eg. PAC.  */
+struct pseudo_regs
+{
+  _uw pac;
+};
+
 /* The ABI specifies that the unwind routines may only use core registers,
except when actually manipulating coprocessor state.  This allows
us to write one implementation that works on all platforms by
@@ -78,6 +84,9 @@ typedef struct
   /* The first fields must be the same as a phase2_vrs.  */
   _uw demand_save_flags;
   struct core_regs core;
+  /* Armv8.1-M Mainline PAC/AUTH values.  This field should be in the same 
field
+ order as phase2_vrs.  */
+  struc

[PATCH 7/15] arm: Emit build attributes for PACBTI target feature

2022-08-12 Thread Andrea Corallo via Gcc-patches
This patch emits assembler directives for PACBTI build attributes as
defined by the
ABI.



gcc/ChangeLog:

* config/arm/arm.c (arm_file_start): Emit EABI attributes for
Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use, TAG_PACRET_use.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-3: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-6.c: Likewise.
* gcc.target/arm/acle/pacbti-m-predef-7.c: Likewise.

Co-Authored-By: Tejas Belagod  

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 0068817b0f2..ceec14f84b6 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -28349,6 +28349,8 @@ static void
 arm_file_start (void)
 {
   int val;
+  bool pac = (aarch_ra_sign_scope != AARCH_FUNCTION_NONE);
+  bool bti = (aarch_enable_bti == 1);
 
   arm_print_asm_arch_directives
 (asm_out_file, TREE_TARGET_OPTION (target_option_default_node));
@@ -28419,6 +28421,22 @@ arm_file_start (void)
arm_emit_eabi_attribute ("Tag_ABI_FP_16bit_format", 38,
 (int) arm_fp16_format);
 
+  if (TARGET_HAVE_PACBTI)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 2);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 2);
+   }
+  else if (pac || bti)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 1);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 1);
+   }
+
+  if (bti)
+arm_emit_eabi_attribute ("TAG_BTI_use", 74, 1);
+  if (pac)
+   arm_emit_eabi_attribute ("TAG_PACRET_use", 76, 1);
+
   if (arm_lang_output_object_attributes_hook)
arm_lang_output_object_attributes_hook();
 }
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
new file mode 100644
index 000..122f7a762a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-options "-march=armv8.1-m.main+fp -mbranch-protection=pac-ret+bti 
-mfloat-abi=hard --save-temps" } */
+
+#if !defined (__ARM_FEATURE_BTI_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined."
+#endif
+
+#if !defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
+#endif
+
+/* { dg-final { scan-assembler-not "\.arch_extension pacbti" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
new file mode 100644
index 000..b94f3447ad9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-options "-march=armv8.1-m.main+fp -mbranch-protection=pac-ret+leaf 
-mfloat-abi=hard --save-temps" } */
+
+#if defined (__ARM_FEATURE_BTI_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be undefined."
+#endif
+
+#if !defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
+#endif
+
+/* { dg-final { scan-assembler-not "\.arch_extension pacbti" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 1" } } */
+/* { dg-final { scan-assembler-not "\.eabi_attribute 74" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c 
b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c
new file mode 100644
index 000..ed52afc83c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-additional-options "-march=armv8.1-m.main+fp -mbranch-protection=bti 
-mfloat-abi=hard --save-temps" } */
+
+#if !defined (__ARM_FEATURE_BTI_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined."
+#endif
+
+#if defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be undefined."
+#endif
+/* { dg-final { scan-assembler-not "\.arch_extension pacbti" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler-not "\.eabi_attribute 76" } } */
diff --git a/g

[Committed] arm: Document +no options for Cortex-M55 CPU.

2022-08-12 Thread Srinath Parvathaneni via Gcc-patches
Hi,

This patch documents the following options for the Arm Cortex-M55 CPU under the
-mcpu= list.

+nomve.fp (disables MVE single precision floating point instructions)
+nomve (disables MVE integer and single precision floating point instructions)
+nodsp (disables dsp, MVE integer and single precision floating point 
instructions)
+nofp (disables floating point instructions)
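
For instance (illustrative only), a build that drops the MVE
floating-point instructions while keeping the rest of the CPU's
defaults could use:

  arm-none-eabi-gcc -mcpu=cortex-m55+nomve.fp -mthumb -c file.c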

Committed as obvious to master.

Regards,
Srinath.

gcc/ChangeLog:

2022-08-12  Srinath Parvathaneni  

 * doc/invoke.texi (Arm Options): Document -mcpu=cortex-m55 options.


### Attachment also inlined for ease of reply###


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
3b529c420c94f70519abfd79acd90d216203c8a7..b264ae28fe6dbe5c298f3b91e4ce3fd8e6a0fb7f
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21638,14 +21638,25 @@ The following extension options are common to the 
listed CPUs:
 
 @table @samp
 @item +nodsp
-Disable the DSP instructions on @samp{cortex-m33}, @samp{cortex-m35p}.
+Disable the DSP instructions on @samp{cortex-m33}, @samp{cortex-m35p}
+and @samp{cortex-m55}. Also disable the M-Profile Vector Extension (MVE)
+integer and single precision floating-point instructions on @samp{cortex-m55}.
+
+@item +nomve
+Disable the M-Profile Vector Extension (MVE) integer and single precision
+floating-point instructions on @samp{cortex-m55}.
+
+@item +nomve.fp
+Disable the M-Profile Vector Extension (MVE) single precision floating-point
+instructions on @samp{cortex-m55}.
 
 @item  +nofp
 Disables the floating-point instructions on @samp{arm9e},
 @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},
 @samp{arm1020e}, @samp{arm1022e}, @samp{arm926ej-s},
 @samp{arm1026ej-s}, @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-r8},
-@samp{cortex-m4}, @samp{cortex-m7}, @samp{cortex-m33} and @samp{cortex-m35p}.
+@samp{cortex-m4}, @samp{cortex-m7}, @samp{cortex-m33}, @samp{cortex-m35p}
+and @samp{cortex-m55}.
 Disables the floating-point and SIMD instructions on
 @samp{generic-armv7-a}, @samp{cortex-a5}, @samp{cortex-a7},
 @samp{cortex-a8}, @samp{cortex-a9}, @samp{cortex-a12},





[PATCH 8/15] arm: Introduce multilibs for PACBTI target feature

2022-08-12 Thread Andrea Corallo via Gcc-patches
This patch add the following new multilibs.

thumb/v8.1-m.main+pacbti/mbranch-protection/nofp
thumb/v8.1-m.main+pacbti+dp/mbranch-protection/soft
thumb/v8.1-m.main+pacbti+dp/mbranch-protection/hard
thumb/v8.1-m.main+pacbti+fp/mbranch-protection/soft
thumb/v8.1-m.main+pacbti+fp/mbranch-protection/hard
thumb/v8.1-m.main+pacbti+mve/mbranch-protection/hard

Triggering the following compiler flags:

-mthumb -march=armv8.1-m.main+pacbti -mbranch-protection=standard 
-mfloat-abi=soft
-mthumb -march=armv8.1-m.main+pacbti+fp -mbranch-protection=standard 
-mfloat-abi=softfp
-mthumb -march=armv8.1-m.main+pacbti+fp -mbranch-protection=standard 
-mfloat-abi=hard
-mthumb -march=armv8.1-m.main+pacbti+fp.dp -mbranch-protection=standard 
-mfloat-abi=softfp
-mthumb -march=armv8.1-m.main+pacbti+fp.dp -mbranch-protection=standard 
-mfloat-abi=hard
-mthumb -march=armv8.1-m.main+pacbti+mve -mbranch-protection=standard 
-mfloat-abi=hard

Approved here:


gcc/

* config/arm/t-rmprofile: Add multilib rules for march +pacbti
  and mbranch-protection.

gcc/testsuite/

* gcc.target/arm/multilib.exp: Add pacbti related entries.

diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index eb321e832f1..fe46a1efa1a 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -27,8 +27,11 @@
 
 # Arch and FPU variants to build libraries with
 
-MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve
-MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main 
v8-m.main+fp v8-m.main+dp v8.1-m.main+mve
+MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve/march=armv8.1-m.main+pacbti/march=armv8.1-m.main+pacbti+fp/march=armv8.1-m.main+pacbti+fp.dp/march=armv8.1-m.main+pacbti+mve
+MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main 
v8-m.main+fp v8-m.main+dp v8.1-m.main+mve v8.1-m.main+pacbti 
v8.1-m.main+pacbti+fp v8.1-m.main+pacbti+dp v8.1-m.main+pacbti+mve
+
+MULTI_ARCH_OPTS_RM += mbranch-protection=standard
+MULTI_ARCH_DIRS_RM += mbranch-protection
 
 # Base M-profile (no fp)
 MULTILIB_REQUIRED  += mthumb/march=armv6s-m/mfloat-abi=soft
@@ -50,6 +53,13 @@ MULTILIB_REQUIRED+= 
mthumb/march=armv8-m.main+fp.dp/mfloat-abi=hard
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp
 MULTILIB_REQUIRED  += mthumb/march=armv8.1-m.main+mve/mfloat-abi=hard
 
+MULTILIB_REQUIRED  += 
mthumb/march=armv8.1-m.main+pacbti/mbranch-protection=standard/mfloat-abi=soft
+MULTILIB_REQUIRED  += 
mthumb/march=armv8.1-m.main+pacbti+fp/mbranch-protection=standard/mfloat-abi=softfp
+MULTILIB_REQUIRED  += 
mthumb/march=armv8.1-m.main+pacbti+fp/mbranch-protection=standard/mfloat-abi=hard
+MULTILIB_REQUIRED  += 
mthumb/march=armv8.1-m.main+pacbti+fp.dp/mbranch-protection=standard/mfloat-abi=softfp
+MULTILIB_REQUIRED  += 
mthumb/march=armv8.1-m.main+pacbti+fp.dp/mbranch-protection=standard/mfloat-abi=hard
+MULTILIB_REQUIRED  += 
mthumb/march=armv8.1-m.main+pacbti+mve/mbranch-protection=standard/mfloat-abi=hard
+
 # Arch Matches
 MULTILIB_MATCHES   += march?armv6s-m=march?armv6-m
 
@@ -87,9 +97,23 @@ MULTILIB_MATCHES += $(foreach FP, $(v8_1m_sp_variants), \
 MULTILIB_MATCHES += $(foreach FP, $(v8_1m_dp_variants), \
 
march?armv8-m.main+fp.dp=mlibarch?armv8.1-m.main$(FP))
 
+# Map all mbranch-protection values other than 'none' to 'standard'.
+MULTILIB_MATCHES   += mbranch-protection?standard=mbranch-protection?bti
+MULTILIB_MATCHES   += 
mbranch-protection?standard=mbranch-protection?pac-ret
+MULTILIB_MATCHES   += 
mbranch-protection?standard=mbranch-protection?pac-ret+leaf
+MULTILIB_MATCHES   += 
mbranch-protection?standard=mbranch-protection?pac-ret+bti
+MULTILIB_MATCHES   += 
mbranch-protection?standard=mbranch-protection?pac-ret+leaf+bti
+MULTILIB_MATCHES   += 
mbranch-protection?standard=mbranch-protection?bti+pac-ret
+MULTILIB_MATCHES   += 
mbranch-protection?standard=mbranch-protection?bti+pac-ret+leaf
+MULTILIB_MATCHES   += 
mbranch-protection?standard=mbranch-protection?standard+leaf
+
 # For all the MULTILIB_REQUIRED for v8-m and above, add MULTILIB_MATCHES which
 # maps mlibarch with march for multilib linking.
 MULTILIB_MATCHES   += march?armv8-m.main=mlibarch?armv8-m.main
 MULTILIB_MATCHES   += march?armv8-m.main+fp=mlibarch?armv8-m.main+fp
 MULTILIB_MATCHES   += march?armv8-m.main+fp.dp=mlibarch?armv8-m.main+fp.dp
 MULTILIB_MATCHES   += march?armv8.1-m.main+mve=mlibarch?armv8.1-m.main+mve
+MULTILIB_MATCHES   += 
march?armv8.

[PATCH 9/15] arm: Set again stack pointer as CFA reg when popping if necessary

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

this patch enables 'arm_emit_multi_reg_pop' to set the stack pointer
again as the CFA register when popping, if necessary.

/gcc/

* config/arm/arm.cc (arm_emit_multi_reg_pop): If the frame pointer
was set define again the stack pointer as CFA reg when popping.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index ceec14f84b6..a5cf4225aa2 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -22303,8 +22303,18 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
 
   REG_NOTES (par) = dwarf;
   if (!return_in_pc)
-arm_add_cfa_adjust_cfa_note (par, UNITS_PER_WORD * num_regs,
-stack_pointer_rtx, stack_pointer_rtx);
+{
+  /* If the frame pointer was set define again the stack pointer
+ as CFA reg.  */
+  if (frame_pointer_needed)
+{
+  RTX_FRAME_RELATED_P (par) = 1;
+  add_reg_note (par, REG_CFA_DEF_CFA, stack_pointer_rtx);
+}
+  else
+arm_add_cfa_adjust_cfa_note (par, UNITS_PER_WORD * num_regs,
+ stack_pointer_rtx, stack_pointer_rtx);
+}
 }
 
 /* Generate and emit an insn pattern that we will recognize as a pop_multi


[PATCH 10/15] arm: Implement cortex-M return signing address codegen

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

this patch enables return address signing and verification based on
Armv8.1-M Pointer Authentication [1].

To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This signs LR using SP as a modifier and stores
the result in R12.  R12 is then pushed onto the stack.

During the function epilogue R12 is popped back and AUT R12, LR, SP is
used to verify that the content of LR is still valid before returning.

Here is an example of a PAC-instrumented function prologue and epilogue:

void foo (void);

int main()
{
  foo ();
  return 0;
}

Compiled with '-march=armv8.1-m.main -mbranch-protection=pac-ret
-mthumb' translates into:

main:
pac ip, lr, sp
push{r3, r7, ip, lr}
add r7, sp, #0
bl  foo
movsr3, #0
mov r0, r3
pop {r3, r7, ip, lr}
aut ip, lr, sp
bx  lr

The patch also takes care of generating a PACBTI instruction in place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.

Ex. the previous example compiled with '-march=armv8.1-m.main
-mbranch-protection=pac-ret+bti -mthumb' translates into:

main:
pacbti  ip, lr, sp
push{r3, r7, ip, lr}
add r7, sp, #0
bl  foo
movsr3, #0
mov r0, r3
pop {r3, r7, ip, lr}
aut ip, lr, sp
bx  lr

Following earlier upstream suggestions, a test for varargs has been
added, and '-mtpcs-frame' is deemed incompatible with the return
address signing feature being introduced.

[1] 


gcc/Changelog

2021-11-03  Andrea Corallo  

* config/arm/arm.c: (arm_compute_frame_layout)
(arm_expand_prologue, thumb2_expand_return, arm_expand_epilogue)
(arm_conditional_register_usage): Update for pac codegen.
(arm_current_function_pac_enabled_p): New function.
* config/arm/arm.md (pac_ip_lr_sp, pacbti_ip_lr_sp, aut_ip_lr_sp):
Add new patterns.
* config/arm/unspecs.md (UNSPEC_PAC_IP_LR_SP)
(UNSPEC_PACBTI_IP_LR_SP, UNSPEC_AUT_IP_LR_SP): Add unspecs.

gcc/testsuite/Changelog

2021-11-03  Andrea Corallo  

* gcc.target/arm/pac.h : New file.
* gcc.target/arm/pac-1.c : New test case.
* gcc.target/arm/pac-2.c : Likewise.
* gcc.target/arm/pac-3.c : Likewise.
* gcc.target/arm/pac-4.c : Likewise.
* gcc.target/arm/pac-5.c : Likewise.
* gcc.target/arm/pac-6.c : Likewise.
* gcc.target/arm/pac-7.c : Likewise.
* gcc.target/arm/pac-8.c : Likewise.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cff7ff1da2a..84764bf27ce 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -379,6 +379,7 @@ extern int vfp3_const_double_for_bits (rtx);
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
   rtx);
 extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
+extern bool arm_current_function_pac_enabled_p (void);
 extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
 extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool);
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index a5cf4225aa2..31c6bcdea55 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -3209,6 +3209,9 @@ arm_option_override_internal (struct gcc_options *opts,
   arm_stack_protector_guard_offset = offs;
 }
 
+  if (arm_current_function_pac_enabled_p () && !(arm_arch7 && arm_arch_cmse))
+error ("This architecture does not support branch protection 
instructions");
+
 #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
   SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
 #endif
@@ -21139,6 +21142,9 @@ arm_compute_save_core_reg_mask (void)
 
   save_reg_mask |= arm_compute_save_reg0_reg12_mask ();
 
+  if (arm_current_function_pac_enabled_p ())
+save_reg_mask |= 1 << IP_REGNUM;
+
   /* Decide if we need to save the link register.
  Interrupt routines have their own banked link register,
  so they never need to save it.
@@ -23362,6 +23368,12 @@ output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
+static bool
+aarch_bti_enabled ()
+{
+  return false;
+}
+
 /* Generate the prologue instructions for entry into an ARM or Thumb-2
function.  */
 void
@@ -23440,12 +23452,13 @@ arm_expand_prologue (void)
 
   /* The static chain register is the same as the IP register.  If it is
  clobbered when creating the frame, we need to save and restore it.  */
-  clobber_ip = IS_NESTED (func_type)
-  && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
-  || ((flag_stack_check == STATIC_BUILTIN_STACK_CHECK
-   || flag_stack_

[PATCH 11/15] aarch64: Make bti pass generic so it can be used by the arm backend

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

this patch splits and restructures the aarch64 bti pass code in order
to have it usable by the arm backend as well.  These changes have no
functional impact.

The original patch was approved here:
.

After that, Richard E. noted that it was better to move the new pass
definition for arm into the following patch, and so I did.

Best Regards

  Andrea

gcc/Changelog

* config.gcc (aarch64*-*-*): Rename 'aarch64-bti-insert.o' into
'aarch-bti-insert.o'.
* config/aarch64/aarch64-protos.h: Remove 'aarch64_bti_enabled'
proto.
* config/aarch64/aarch64.cc (aarch_bti_enabled): Rename.
(aarch_bti_j_insn_p, aarch_pac_insn_p): New functions.
(aarch64_output_mi_thunk)
(aarch64_print_patchable_function_entry)
(aarch64_file_end_indicate_exec_stack): Update renamed function
calls to renamed functions.
* config/aarch64/t-aarch64 (aarch-bti-insert.o): Update target.
* config/arm/aarch-bti-insert.cc: New file including and
generalizing code from aarch64-bti-insert.cc.
* config/arm/aarch-common-protos.h: Update.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7b58e1314ff..2021bdf9d2f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -329,7 +329,7 @@ aarch64*-*-*)
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
-   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch64-bti-insert.o aarch64-cc-fusion.o"
+   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index b0c5a4fd6b6..a9aad3abdc2 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -179,7 +179,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_RNG, "__ARM_FEATURE_RNG", pfile);
   aarch64_def_or_undef (TARGET_MEMTAG, "__ARM_FEATURE_MEMORY_TAGGING", pfile);
 
-  aarch64_def_or_undef (aarch64_bti_enabled (),
+  aarch64_def_or_undef (aarch_bti_enabled (),
"__ARM_FEATURE_BTI_DEFAULT", pfile);
 
   cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index fe2180e95ea..9fdf7f9cc9c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -891,7 +891,6 @@ void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 bool aarch64_return_address_signing_enabled (void);
-bool aarch64_bti_enabled (void);
 void aarch64_save_restore_target_globals (tree);
 void aarch64_addti_scratch_regs (rtx, rtx, rtx *,
 rtx *, rtx *,
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index eec743024c1..2f67f3872f6 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8534,11 +8534,61 @@ aarch64_return_address_signing_enabled (void)
 
 /* Return TRUE if Branch Target Identification Mechanism is enabled.  */
 bool
-aarch64_bti_enabled (void)
+aarch_bti_enabled (void)
 {
   return (aarch_enable_bti == 1);
 }
 
+/* Check if INSN is a BTI J insn.  */
+bool
+aarch_bti_j_insn_p (rtx_insn *insn)
+{
+  if (!insn || !INSN_P (insn))
+return false;
+
+  rtx pat = PATTERN (insn);
+  return GET_CODE (pat) == UNSPEC_VOLATILE && XINT (pat, 1) == UNSPECV_BTI_J;
+}
+
+/* Check if X (or any sub-rtx of X) is a PACIASP/PACIBSP instruction.  */
+bool
+aarch_pac_insn_p (rtx x)
+{
+  if (!INSN_P (x))
+return false;
+
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, PATTERN (x), ALL)
+{
+  rtx sub = *iter;
+  if (sub && GET_CODE (sub) == UNSPEC)
+   {
+ int unspec_val = XINT (sub, 1);
+ switch (unspec_val)
+   {
+   case UNSPEC_PACIASP:
+case UNSPEC_PACIBSP:
+ return true;
+
+   default:
+ return false;
+   }
+ iter.skip_subrtxes ();
+   }
+}
+  return false;
+}
+
+rtx aarch_gen_bti_c (void)
+{
+  return gen_bti_c ();
+}
+
+rtx aarch_gen_bti_j (void)
+{
+  return gen_bti_j ();
+}
+
 /* The caller is going to use ST1D or LD1D to save or restore an SVE
register i

[PATCH 12/15] arm: implement bti injection

2022-08-12 Thread Andrea Corallo via Gcc-patches
Hi all,

this patch enables the Armv8.1-M Branch Target Identification (BTI)
mechanism [1].

This is achieved by using the bti pass made common with AArch64.

The pass iterates through the instructions and adds the necessary BTI
instructions at the beginning of every function and at every landing
pad targeted by indirect jumps.
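
Purely for illustration (not part of the patch; the names are made up),
the kind of code affected is anything reached through an indirect
branch, e.g. a call through a function pointer; when built with
-mbranch-protection=bti the entry of 'handler' below starts with a BTI
landing pad:

  /* bti-example.c (hypothetical).  */
  typedef int (*cb_t) (int);

  int handler (int x) { return x + 1; }

  int dispatch (cb_t cb, int v)
  {
    return cb (v);   /* indirect call */
  }

  int use_handler (int v)
  {
    return dispatch (handler, v);
  }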

Best Regards

  Andrea

[1]


gcc/ChangeLog

2022-04-07  Andrea Corallo  

* config.gcc (arm*-*-*): Add 'aarch-bti-insert.o' object.
* config/arm/arm-protos.h: Update.
* config/arm/arm.cc (aarch_bti_enabled, aarch_bti_j_insn_p)
(aarch_pac_insn_p, aarch_gen_bti_c, aarch_gen_bti_j): New
functions.
* config/arm/arm.md (bti_nop): New insn.
* config/arm/t-arm (PASSES_EXTRA): Add 'arm-passes.def'.
(aarch-bti-insert.o): New target.
* config/arm/unspecs.md (UNSPEC_BTI_NOP): New unspec.
* config/arm/aarch-bti-insert.cc (rest_of_insert_bti): Update
to verify arch compatibility.
* config/arm/arm-passes.def: New file.

gcc/testsuite/ChangeLog

2022-04-07  Andrea Corallo  

* gcc.target/arm/bti-1.c: New testcase.
* gcc.target/arm/bti-2.c: Likewise.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2021bdf9d2f..004e1dfa8d8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -353,7 +353,7 @@ arc*-*-*)
;;
 arm*-*-*)
cpu_type=arm
-   extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o"
+   extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o 
aarch-bti-insert.o"
extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h 
arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
target_type_format_char='%'
c_target_objs="arm-c.o"
diff --git a/gcc/config/arm/aarch-bti-insert.cc 
b/gcc/config/arm/aarch-bti-insert.cc
index 2d1d2e334a9..8f045c247bf 100644
--- a/gcc/config/arm/aarch-bti-insert.cc
+++ b/gcc/config/arm/aarch-bti-insert.cc
@@ -41,6 +41,7 @@
 #include "cfgrtl.h"
 #include "tree-pass.h"
 #include "cgraph.h"
+#include "diagnostic-core.h"
 
 /* This pass enables the support for Branch Target Identification Mechanism for
Arm/AArch64.  This is a security feature introduced in ARMv8.5-A
diff --git a/gcc/config/arm/arm-passes.def b/gcc/config/arm/arm-passes.def
new file mode 100644
index 000..71d6b563640
--- /dev/null
+++ b/gcc/config/arm/arm-passes.def
@@ -0,0 +1,21 @@
+/* Arm-specific passes declarations.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 84764bf27ce..6befb6c4445 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -24,6 +24,8 @@
 
 #include "sbitmap.h"
 
+rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
+
 extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 31c6bcdea55..de5a679c92a 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -23368,12 +23368,6 @@ output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
-static bool
-aarch_bti_enabled ()
-{
-  return false;
-}
-
 /* Generate the prologue instructions for entry into an ARM or Thumb-2
function.  */
 void
@@ -32985,6 +32979,58 @@ arm_current_function_pac_enabled_p (void)
&& !crtl->is_leaf);
 }
 
+/* Return TRUE if Branch Target Identification Mechanism is enabled.  */
+bool
+aarch_bti_enabled (void)
+{
+  return aarch_enable_bti == 1;
+}
+
+/* Check if INSN is a BTI J insn.  */
+bool
+aarch_bti_j_insn_p (rtx_insn *insn)
+{
+  if (!insn || !INSN_P (insn))
+return false;
+
+  rtx pat = PATTERN (insn);
+  return GET_CODE (pat) == UNSPEC_VOLATILE && XINT (pat, 1) == UNSPEC_BTI_NOP;
+}
+
+/* Check if X (or any sub-rtx of X) is a PACIASP/PACIBSP instruction.  */
+bool
+aarch_pac_insn_p (rtx x)
+{
+  if (!x || !INSN_P (x))
+return false;
+
+  rtx pat = PATTERN (x);
+
+  if (GET_CODE (pat) == SET)
+{
+  rtx tmp

Re: [PATCH] Support threading of just the exit edge

2022-08-12 Thread Aldy Hernandez via Gcc-patches
On Fri, Aug 12, 2022 at 2:01 PM Richard Biener  wrote:
>
> This started with noticing we add ENTRY_BLOCK to our threads
> just for the sake of simplifying the conditional at the end of
> the first block in a function.  That's not really threading
> anything but it ends up duplicating the entry block, and
> re-writing the result instead of statically fold the jump.

Hmmm, but threading 2 blocks is not really threading at all??  Unless
I'm misunderstanding something, this was even documented in the
backwards threader:

[snip]
 That's not really a jump threading opportunity, but instead is
 simple cprop & simplification.  We could handle it here if we
 wanted by wiring up all the incoming edges.  If we run this
 early in IPA, that might be worth doing.   For now we just
 reject that case.  */
  if (m_path.length () <= 1)
  return false;

Which you undoubtedly ran into because you're specifically eliding the check:

> - if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
> - &irreducible)
> + if ((m_path.length () == 1
> +  || m_profit.profitable_path_p (m_path, m_name, taken_edge,
> + &irreducible))

>
> The following tries to handle those by recording simplifications
> of the exit conditional as a thread of length one.  That requires
> special-casing them in the backward copier since if we do not
> have any block to copy but modify the jump in place and remove
> not taken edges this confuses the hell out of remaining threads.
>
> So back_jt_path_registry::update_cfg now first marks all
> edges we know are never taken and then prunes the threading
> candidates when they include such edge.  Then it makes sure
> to first perform unreachable edge removal (so we avoid
> copying them when other thread paths contain the prevailing
> edge) before continuing to apply the remaining threads.

This is all beyond my pay grade.  I'm not very well versed in the
threader per se.  So if y'all think it's a good idea, by all means.
Perhaps Jeff can chime in, or remembers the above comment?

>
> In statistics you can see this avoids quite a bunch of useless
> threads (I've investigated 3 random files from cc1files with
> dropped stats in any of the thread passes).
>
> Still thinking about it it would be nice to avoid the work of
> discovering those candidates we have to throw away later
> which could eventually be done by having the backward threader
> perform a RPO walk over the CFG, skipping edges that can be
> statically determined as not being executed.  Below I'm
> abusing the path range query to statically analyze the exit
> branch but I assume there's a simpler way of folding this stmt
> which could then better integrate with such a walk.

Unreachable paths can be queried with
path_range_query::unreachable_path_p ().  Could you leverage this?
The idea was that if we ever resolved any SSA name to UNDEFINED, the
path itself was unreachable.

Aldy

>
> In any case it seems worth more conciously handling the
> case of exit branches that simplify without path sensitive
> information.
>
> Then the patch also restricts path discovery when we'd produce
> threads we'll reject later during copying - the backward threader
> copying cannot handle paths where the to duplicate blocks are
> not from exactly the same loop.  I'm probably going to split this
> part out.
>
> Any thoughts?
>
> * gimple-range-path.cc (path_range_query::set_path): Adjust
> assert to allow paths of size one.
> * tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
> Paths of size one are always profitable.
> (back_threader::find_paths_to_names): Likewise.
> Do not walk further if we are leaving the current loop.
> (back_threader::find_taken_edge): Remove assert.  Do not
> walk to ENTRY_BLOCK.
> * tree-ssa-threadupdate.cc (back_jt_path_registry::update_cfg):
> Handle jump threads of just the exit edge by modifying the
> control statement in-place.
> ---
>  gcc/gimple-range-path.cc   |  2 +-
>  gcc/tree-ssa-threadbackward.cc | 21 -
>  gcc/tree-ssa-threadupdate.cc   | 54 ++
>  3 files changed, 69 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index 78146f5683e..a7d277c31b8 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -220,7 +220,7 @@ path_range_query::unreachable_path_p ()
>  void
>  path_range_query::set_path (const vec &path)
>  {
> -  gcc_checking_assert (path.length () > 1);
> +  gcc_checking_assert (!path.is_empty ());
>m_path = path.copy ();
>m_pos = m_path.length () - 1;
>bitmap_clear (m_has_cache_entry);
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index b886027fccf..669098e4ec3 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/t

Re: [PATCH 0/15] arm: Enables return address verification and branch target identification on Cortex-M

2022-08-12 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> Hi all,
>
> as I respun a few patches, dropped one and added another, I'm reposting
> this series that enables return address verification and branch target
> identification based on Armv8.1-M Pointer Authentication and Branch
> Target Identification Extension [1] for Arm Cortex-M.
>
> This feature is controlled by the newly introduced '-mbranch-protection'
> option, contextually the Armv8.1-M Mainline target feature '+pacbti' is
> added.
>
> Best Regards
>
>   Andrea
>
> [1]
> 

Hi all,

FYI I've pushed these rebased on gcc-12 in 'endors/ARM/arm-12-m-pacbti'.

Best Regards

  Andrea


[PATCH 13/15] arm: Add pacbti related multilib support for armv8.1-m.main.

2022-08-12 Thread Srinath Parvathaneni via Gcc-patches
 Hi,

This patch supports the following -march/-mbranch-protection combinations by
linking them to existing pacbti multilibs.

$ -march=armv8.1-m.main+pacbti+fp.dp+mve.fp -mbranch-protection=standard 
-mfloat-abi=hard -mthumb
$ -march=armv8.1-m.main+pacbti+fp.dp+mve -mbranch-protection=standard 
-mfloat-abi=hard -mthumb
$ -march=armv8.1-m.main+dsp+pacbti+fp.dp -mbranch-protection=standard 
-mfloat-abi=hard -mthumb

Regression tested on arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2022-08-12  Srinath Parvathaneni  

* config/arm/t-rmprofile: Add pacbti multililb variants.

gcc/testsuite/ChangeLog:

2022-08-12  Srinath Parvathaneni  

* gcc.target/arm/pac-10.c: New test.
* gcc.target/arm/pac-11.c: Likewise.
* gcc.target/arm/pac-12.c: Likewise.




RE: [GCC][PATCH v2] arm: Add support for Arm Cortex-M85 CPU.

2022-08-12 Thread Srinath Parvathaneni via Gcc-patches
Hi,

This patch adds the -mcpu support for the Arm Cortex-M85 CPU which is an
Armv8.1-M Mainline CPU supporting MVE and PACBTI by default.

The -mcpu=cortex-m85 switch by default matches to
-march=armv8.1-m.main+pacbti+mve.fp+fp.dp.

Also, the following options are provided to disable the default features:
+nomve.fp (disables MVE Floating point)
+nomve (disables MVE Integer and MVE Floating point)
+nodsp (disables dsp, MVE Integer and MVE Floating point)
+nopacbti (disables pacbti)
+nofp (disables floating point and MVE floating point)

Regression tested on arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2022-08-12  Srinath Parvathaneni  

* config/arm/arm-cpus.in (cortex-m85): Define new CPU.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* doc/invoke.texi (Arm Options): Document -mcpu=cortex-m85.
* (-mfix-cmse-cve-2021-35465): Likewise.

gcc/testsuite/ChangeLog:

2022-08-12  Srinath Parvathaneni  

* gcc.target/arm/multilib.exp: Add tests for cortex-m85.




Re: [PATCH v2] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-08-12 Thread Segher Boessenkool
Hi!

On Fri, Aug 12, 2022 at 05:40:06PM +0800, Kewen.Lin wrote:
> This patch is to update the NOPs patched before and after
> local entry, it looks like:

As I said before, please don't say NOPs.  I know some documentation
does.  That documentation needs fixing.  This is not an acronym or
similar: "nop" is short for "noop" or "no-op", meaning "no operation".
It also is a well-accepted term.  It also is an assembler mnemonic, and
has to be written as all lower case there as well.

Just say "nops" please.

> --- a/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
> +++ b/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile { target { ! { nvptx*-*-* visium-*-* } } } } */
>  /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
>  /* { dg-additional-options "-fno-pie" { target sparc*-*-* } } */
> +/* { dg-additional-options "-fpatchable-function-entry=3,2" { target 
> powerpc_elfv2 } } */

Add a comment why this is needed?  People looking at this testcase in
the future (including yourself!) will thank you.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr99888-1.c
> @@ -0,0 +1,45 @@
> +/* { dg-require-effective-target powerpc_elfv2 } */

Does this not work on other PowerPC targets?

> +/* Verify no errors for different NOPs after local entry.  */

(Add "on ELFv2" if you make the test run everywhere :-) )

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr99888-2.c
> @@ -0,0 +1,45 @@
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +
> +/* Verify no errors for 2, 6 and 14 NOPs before local entry.  */

Similar.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr99888-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +/* { dg-options "-fpatchable-function-entry=1" } */
> +
> +/* Verify no errors, using command line option instead of function
> +   attribute.  */
> +
> +extern int a;
> +
> +int test (int b) {
> +  return a + b;
> +}

And more.

Rest looks good, thanks!


Segher


[r13-2029 Regression] FAIL: gcc.dg/analyzer/torture/pr93451.c -Os (test for excess errors) on Linux/x86_64

2022-08-12 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

7e3b45befdbbf1a1f9ff728fa2bac31b4756907c is the first bad commit
commit 7e3b45befdbbf1a1f9ff728fa2bac31b4756907c
Author: Tim Lange 
Date:   Fri Aug 12 10:27:16 2022 +0200

analyzer: out-of-bounds checker [PR106000]

caused

FAIL: gcc.dg/analyzer/torture/pr93451.c   -O0  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93451.c   -O1  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93451.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93451.c   -O2  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93451.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/analyzer/torture/pr93451.c   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2029/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer-torture.exp=gcc.dg/analyzer/torture/pr93451.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer-torture.exp=gcc.dg/analyzer/torture/pr93451.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


[x86 PATCH] PR target/106577: force_reg may clobber operands during split.

2022-08-12 Thread Roger Sayle

This patch fixes PR target/106577, which is a recent ICE-on-valid-code regression
caused by my introduction of a *testti_doubleword pre-reload splitter in
i386.md.  During the split pass before reload, this converts the virtual
*testti_doubleword into an *andti3_doubleword and *cmpti_doubleword,
checking that any immediate operand is a valid "x86_64_hilo_general_operand"
and placing it into a TImode register using force_reg if it isn't.

The unexpected behaviour (that caught me out) is that calling force_reg
may occasionally clobber the contents of the global operands array, or
more accurately recog_data.operand[0], which means that by the time
split_XXX calls gen_split_YYY the replacement insn's operands have been
corrupted.

It's difficult to tell who (if anyone) is at fault.  The re-entrant
stack trace (for the attached PR) looks like:

gen_split_203 (*testti_doubleword) calls
force_reg calls
emit_move_insn calls
emit_move_insn_1 calls
gen_movti calls
ix86_expand_move calls
ix86_convert_const_wide_int_to_broadcast calls
ix86_vector_duplicate_value calls
recog_memoized calls
recog.

By far the simplest and possibly correct fix is rather than attempt
to push and pop recog_data, to simply (in pre-reload splits) save a
copy of any operands that will be needed after force_reg, and use
these copies afterwards.  Many pre-reload splitters avoid this issue
using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions,
but in our case we still need to save a copy of operands[0] (even if we
call emit_insn or expand_* ourselves), so we might as well continue to
use the conveniently generated gen_split.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures. Ok for mainline?


2022-08-12  Roger Sayle  

gcc/ChangeLog
PR target/106577
* config/i386/i386.md (*testti_doubleword): Preserve a copy of
operands[0], and move initialization of operands[2] later, as the
call to force_reg may clobber the contents of the operands array.

gcc/testsuite/ChangeLog
PR target/106577
* gcc.target/i386/pr106577.c: New test case.


Thanks,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2fde8cd..e9232cd 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9772,9 +9772,12 @@
   (clobber (reg:CC FLAGS_REG))])
(set (reg:CCZ FLAGS_REG) (compare:CCZ (match_dup 2) (const_int 0)))]
 {
-  operands[2] = gen_reg_rtx (TImode);
+  /* Calling force_reg may clobber operands[0].  */
+  rtx save_op0 = operands[0];
   if (!x86_64_hilo_general_operand (operands[1], TImode))
 operands[1] = force_reg (TImode, operands[1]);
+  operands[2] = gen_reg_rtx (TImode);
+  operands[0] = save_op0;
 })
 
 ;; Combine likes to form bit extractions for some tests.  Humor it.
diff --git a/gcc/testsuite/gcc.target/i386/pr106577.c 
b/gcc/testsuite/gcc.target/i386/pr106577.c
new file mode 100644
index 000..4e35031
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106577.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -mavx" } */
+
+int i;
+void foo(void) {
+  i ^= !(((unsigned __int128)0xf0f0f0f0f0f0f0f0 << 64 | 0xf0f0f0f0f0f0f0f0) & i);
+}


Re: Rust frontend patches v1

2022-08-12 Thread Mike Stump via Gcc-patches
On Aug 10, 2022, at 11:56 AM, Philip Herron  wrote:
> 
> For my v2 of the patches, I've been spending a lot of time ensuring
> each patch is buildable. It would end up being simpler if it was
> possible if each patch did not have to be like this so I could split
> up the front-end in more patches. Does this make sense? In theory,
> when everything goes well, does this still mean that we can merge in
> one commit, or should it follow a series of buildable patches? I've
> received feedback that it might be possible to ignore making each
> patch an independent chunk and just focus on splitting it up as small
> as possible even if they don't build.

It is a waste of time to make each one buildable on its own.  They all go in together, or not at
all.  The patches are split for review only.  You can then maintain approval 
status for each individually and perform adjustments and updates for each patch.

Once all pieces have been approved in their final form, you can then commit the 
whole at once.  It is that combined commit that you regression test and 
integration test before pushing, to ensure the final form is good.  If not, you 
bounce back to update whichever pieces need it, get approval for those edits, and 
lather, rinse, repeat.

It is handy for larger work (like this) to be on a git branch so that followers 
can see the totality of the work and experiment with it in the large.  I'd 
usually do the commit to the main branch as a squashed commit without the 
review edit histories or the "bad stuff" the reviewers had you change or the 
merge records even.
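
For concreteness, that squash-style landing is roughly the following (the branch
name here is made up, not one of the actual development branches):

  git checkout master
  git merge --squash rust-frontend-wip
  git commit    # one commit on master, without the branch's review/edit history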

[x86 PATCH take #2] Move V1TI shift/rotate lowering from expand to pre-reload split.

2022-08-12 Thread Roger Sayle

Hi Uros,
As requested, here's an updated version of my patch that introduces a new
const_0_to_255_not_mul_8_operand as you've requested.  I think in this
instance, having mutually exclusive patterns that can appear in any order,
without imposing implicit ordering constraints, is slightly preferable,
especially as (thanks to STV)  some related patterns may appear in
sse.md and others appear in i386.md (making ordering tricky).
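
For readers without the attachment in front of them, a predicate with that name
would look something like the sketch below; the exact definition is in the patch
and may differ in detail (IN_RANGE and INTVAL are the usual rtl.h helpers):

  (define_predicate "const_0_to_255_not_mul_8_operand"
    (and (match_code "const_int")
         (match_test "IN_RANGE (INTVAL (op), 0, 255)
                      && (INTVAL (op) & 7) != 0")))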

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-12  Roger Sayle  
Uroš Bizjak  

gcc/ChangeLog
* config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
New predicate for values between 0/1 and 255, not multiples of 8.
* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
shifts by constant bit counts.
(*ashlv1ti3_internal): New define_insn_and_split that lowers
logical left shifts by constant bit counts, that aren't multiples
of 8, before reload.
(lshrv1ti3): Delay lowering of logical right shifts by constant.
(*lshrv1ti3_internal): New define_insn_and_split that lowers
logical right shifts by constant bit counts, that aren't multiples
of 8, before reload.
(ashrv1ti3): Delay lowering of arithmetic right shifts by
constant bit counts.
(*ashrv1ti3_internal): New define_insn_and_split that lowers
arithmetic right shifts by constant bit counts before reload.
(rotlv1ti3): Delay lowering of rotate left by constant.
(*rotlv1ti3_internal): New define_insn_and_split that lowers
rotate left by constant bits counts before reload.
(rotrv1ti3): Delay lowering of rotate right by constant.
(*rotrv1ti3_internal): New define_insn_and_split that lowers
rotate right by constant bits counts before reload.


Thanks again,
Roger

> -Original Message-
> From: Uros Bizjak 
> Sent: 08 August 2022 08:48
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Move V1TI shift/rotate lowering from expand to pre-
> reload split.
> 
> On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle 
> wrote:
> >
> >
> > This patch moves the lowering of 128-bit V1TImode shifts and rotations
> > by constant bit counts to sequences of SSE operations from the RTL
> > expansion pass to the pre-reload split pass.  Postponing this
> > splitting of shifts and rotates enables (will enable) the TImode
> > equivalents of these operations/ instructions to be considered as
> > candidates by the (TImode) STV pass.
> > Technically, this patch changes the existing expanders to continue to
> > lower shifts by variable amounts, but constant operands become RTL
> > instructions, specified by define_insn_and_split that are triggered by
> > x86_pre_reload_split.  The one minor complication is that logical
> > shifts by multiples of eight, don't get split, but are handled by
> > existing insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.
> > There should be no changes in generated code with this patch, which
> > just adjusts the pass in which transformations get applied.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> >
> > 2022-08-05  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
> > shifts by constant bit counts.
> > (*ashlvti3_internal): New define_insn_and_split that lowers
> > logical left shifts by constant bit counts, that aren't multiples
> > of 8, before reload.
> > (lshrv1ti3): Delay lowering of logical right shifts by constant.
> > (*lshrv1ti3_internal): New define_insn_and_split that lowers
> > logical right shifts by constant bit counts, that aren't multiples
> > of 8, before reload.
> > (ashrv1ti3): Delay lowering of arithmetic right shifts by
> > constant bit counts.
> > (*ashrv1ti3_internal): New define_insn_and_split that lowers
> > arithmetic right shifts by constant bit counts before reload.
> > (rotlv1ti3): Delay lowering of rotate left by constant.
> > (*rotlv1ti3_internal): New define_insn_and_split that lowers
> > rotate left by constant bits counts before reload.
> > (rotrv1ti3): Delay lowering of rotate right by constant.
> > (*rotrv1ti3_internal): New define_insn_and_split that lowers
> > rotate right by constant bits counts before reload.
> 
> +(define_insn_and_split "*ashlv1ti3_internal"
> +  [(set (match_operand:V1TI 0 "register_operand")
>   (ashift:V1TI
>   (match_operand:V1TI 1 "register_operand")
> - (match_operand:QI 2 "general_operand")))]
> -  "TARGET_SSE2 && TARGET_64BIT"
> + (match_operand:

[PATCH take #2] PR tree-optimization/71343: Optimize (X<<C)&(Y<<C) as (X&Y)<<C.

2022-08-12 Thread Roger Sayle

Hi Richard,
Many thanks for the review and useful suggestions.  I (think I) agree that
handling non-canonical forms in value_numbering makes more sense,
so this revised patch is just the first (non-controversial) part of the original
submission, that incorporates your observation that it doesn't need to
be limited to (valid) constant shifts, and can be generalized to any
shift, without introducing undefined behaviour that didn't exist before.
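
For reference, the simplification retained here is roughly of the following shape
(a sketch of the idea only; the actual hunk in the attached patch may carry extra
single-use or type checks):

  (for op (bit_and bit_ior bit_xor)
   (simplify
    (op (lshift:s @0 @1) (lshift:s @2 @1))
    (lshift (op @0 @2) @1))
   (simplify
    (op (rshift:s @0 @1) (rshift:s @2 @1))
    (rshift (op @0 @2) @1)))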

This revised patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  Ok for mainline?


2022-08-12  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR tree-optimization/71343
* match.pd (op (lshift @0 @1) (lshift @2 @1)): Optimize the
expression (X<<C) op (Y<<C) to (X op Y)<<C for binary logical
operators, AND, IOR and XOR.
(op (rshift @0 @1) (rshift @2 @1)): Likewise, simplify (X>>C)^(Y>>C)
to (X^Y)>>C for binary logical operators, AND, IOR and XOR.

gcc/testsuite/ChangeLog
PR tree-optimization/71343
* gcc.dg/pr71343-1.c: New test case.


Thanks,
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 08 August 2022 12:42
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [PATCH] PR tree-optimization/71343: Optimize (X<<C)&(Y<<C) as (X&Y)<<C.
> On Mon, Aug 8, 2022 at 10:07 AM Roger Sayle
>  wrote:
> >
> >
> > This patch resolves PR tree-optimization/71343, a missed-optimization
> > enhancement request where GCC fails to see that (a<<2)+(b<<2) == a*4+b*4.
> > This requires two related (sets of) optimizations to be added to match.pd.
> >
> > The first is that (X<<C) op (Y<<C) can be simplified to (X op Y)<<C,
> > for many binary operators, including AND, IOR, XOR, and (if overflow
> > isn't an issue) PLUS and MINUS.  Likewise, the right shifts (both
> > logical and arithmetic) and bit-wise logical operators can be
> > simplified in a similar fashion.  These all reduce the number of
> > GIMPLE binary operations from 3 to 2, by combining/eliminating a shift
> operation.
> >
> > The second optimization reflects that the middle-end doesn't impose a
> > canonical form on multiplications by powers of two, vs. left shifts,
> > instead leaving these operations as specified by the programmer unless
> > there's a good reason to change them.  Hence, GIMPLE code may contain
> > the expressions "X * 8" and "X << 3" even though these represent the
> > same value/computation.  The tweak to match.pd is that comparison
> > operations whose operands are equivalent non-canonical expressions can
> > be taught their equivalence.  Hence "(X * 8) == (X << 3)" will always
> > evaluate to true, and "(X<<2) > 4*X" will always evaluate to false.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> 
> +/* Shifts by constants distribute over several binary operations,
> +   hence (X << C) + (Y << C) can be simplified to (X + Y) << C.  */
> +(for op (plus minus)
> +  (simplify
> +(op (lshift:s @0 INTEGER_CST@1) (lshift:s @2 INTEGER_CST@1))
> +(if (INTEGRAL_TYPE_P (type)
> +&& TYPE_OVERFLOW_WRAPS (type)
> +&& !TYPE_SATURATING (type)
> +&& tree_fits_shwi_p (@1)
> +&& tree_to_shwi (@1) > 0
> +&& tree_to_shwi (@1) < TYPE_PRECISION (type))
> 
> I do wonder why we need to restrict this to shifts by constants?
> Any out-of-bound shift was already there, no?
> 
> +/* Some tree expressions are intentionally non-canonical.
> +   We handle the comparison of the equivalent forms here.  */
> +(for cmp (eq le ge)
> +  (simplify
> +(cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +&& tree_fits_shwi_p (@1)
> +&& tree_to_shwi (@1) > 0
> +&& tree_to_shwi (@1) < TYPE_PRECISION  (TREE_TYPE (@0))
> +&& wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> +  { constant_boolean_node (true, type); })))
> +
> +(for cmp (ne lt gt)
> +  (simplify
> +(cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +&& tree_fits_shwi_p (@1)
> +&& tree_to_shwi (@1) > 0
> +&& tree_to_shwi (@1) < TYPE_PRECISION  (TREE_TYPE (@0))
> +&& wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> +  { constant_boolean_node (false, type); })))
> 
> hmm.  I wonder if it makes more sense to handle this in value-numbering.
> tree-ssa-sccvn.cc:visit_nary_op handles some cases that are not exactly
> canonicalization issues but the shift vs mult could be handled there by just
> performing the alternate lookup.  That would also enable CSE and by means of
> that of course the comparisons you do above.
> 
> Thanks,
> Richard.
> 
> >
> > 2022-08-08  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR tree-optimization/71343
> > * match.pd (op (lshift @0 @1) (lshift @2 @1)): Optimize the
> > expression (X<<C) op (Y<<C) to (X op Y)<<C for binary logical
> > operators, AND, IOR and XOR.
> > (op (rshift @0 @1) (rshift @2 @1)): Likewise, simplify (X>>C)^(Y>>C)
> > 

Re: [PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU

2022-08-12 Thread Maciej W. Rozycki
On Thu, 11 Aug 2022, Kito Cheng wrote:

> LGTM, but with a nit, I don't get set.w but get an andi like below, so
> maybe we should also scan-assembler-not andi? feel free to commit that
> directly with that fix
> 
> ```asm
> sleu:
>sgtua0,a0,a1# 9 [c=4 l=4]  *sgtu_disi
>xoria0,a0,1 # 10[c=4 l=4]  *xorsi3_internal/1
>andia0,a0,1 # 16[c=4 l=4]  anddi3/1
>ret # 25[c=0 l=4]  simple_return
> ```

 Interesting.  I can do that, but can you please share the compilation 
options, given or defaulted (from `--with...' configuration options), this 
happens with?

  Maciej


RE: [PATCH] PR tree-optimization/64992: (B << 2) != 0 is B when B is Boolean.

2022-08-12 Thread Roger Sayle
Hi Richard,

> -Original Message-
> From: Richard Biener 
> Sent: 08 August 2022 12:49
> Subject: Re: [PATCH] PR tree-optimization/64992: (B << 2) != 0 is B when B is
> Boolean.
> 
> On Mon, Aug 8, 2022 at 11:06 AM Roger Sayle
>  wrote:
> >
> > This patch resolves both PR tree-optimization/64992 and PR
> > tree-optimization/98956 which are missed optimization enhancement
> > request, for which Andrew Pinski already has a proposed solution
> > (related to a fix for PR tree-optimization/98954).  Yesterday, I
> > proposed an alternate improved patch for PR98954, which although
> > superior in most respects, alas didn't address this case [which
> > doesn't include a BIT_AND_EXPR], hence this follow-up fix.
> >
> > For many functions, F(B), of a (zero-one) Boolean value B, the
> > expression F(B) != 0 can often be simplified to just B.  Hence "(B *
> > 5) != 0" is B, "-B != 0" is B, "bswap(B) != 0" is B, "(B >>r 3) != 0"
> > is B.  These are all currently optimized by GCC, with the strange
> > exception of left shifts by a constant (possibly due to the
> > undefined/implementation defined behaviour when the shift constant is
> > larger than the first operand's precision).
> > This patch adds support for this particular case, when the shift
> > constant is valid.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> 
> +/* (X << C) != 0 can be simplified to X, when X is zero_one_valued_p.  */
> +(simplify
> +  (ne (lshift zero_one_valued_p@0 INTEGER_CST@1) integer_zerop@2)
> +  (if (tree_fits_shwi_p (@1)
> +   && tree_to_shwi (@1) > 0
> +   && tree_to_shwi (@1) < TYPE_PRECISION (TREE_TYPE (@0)))
> +(convert @0)))
> 
> while we deliberately do not fold int << 34 since the result is undefined 
> there is
> IMHO no reason to not fold the above for any (even non-constant) shift value.
> We have guards with TYPE_OVERFLOW_SANITIZED in some cases but I think
> that's not appropriate here, there's one flag_sanitize check, maybe there's a
> special bit for SHIFT overflow we can use.  Why is (X << 0) != 0 exempt in 
> the
> condition?

In this case, I think it makes more sense to err on the side of caution, and
avoid changing the observable behaviour of programs, even in cases where
the behaviour is officially undefined.  For many targets, (1<

> > 2022-08-08  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR tree-optimization/64992
> > PR tree-optimization/98956
> > * match.pd (ne (lshift @0 @1) 0): Simplify (X << C) != 0 to X
> > when X is zero_one_valued_p and the shift constant C is valid.
> > (eq (lshift @0 @1) 0): Likewise, simplify (X << C) == 0 to !X
> > when X is zero_one_valued_p and the shift constant C is valid.
> >
> > gcc/testsuite/ChangeLog
> > PR tree-optimization/64992
> > * gcc.dg/pr64992.c: New test case.
> >

Thanks,
Roger
--