On Dec  2, 2021, Richard Biener <rguent...@suse.de> wrote:

> While adding ASMNESIA looks OK at this point, I'm not sure about the
> asm handling stuff.  You mention 'reload' above, do you mean LRA?

I meant the allocation was first visible in -fdump-rtl-reload.  I've
added a note about this problem, and a modified testcase, to PR93027
once I realized that the problem was present in an unpatched compiler,
even at -O0.


In the previous patch, I'd already abandoned the intent to use the +X
constraint for ASMNESIA, because it was not useful and created other
problems.  I'd fixed numerous of those additional problems in the
recog.c and cfgexpand.c changes, but those turned out to be not needed
to fix the PR once I backpedaled to +g (or +m).  ASMNESIA also turned
out to be unnecessary, so I prepared and retested another version of the
patch that uses the same switch-to-MEM logic I wrote for ASMNESIA in the
previous patch, but now directly in the detach_value implementation
within the harden-conditional passes, using +m and an addressable
temporary if we find that +g won't do.

If you find that ASMNESIA is still useful as a reusable internal
primitive, I can reintroduce it, now or at a later time.  It would bring
some slight codegen benefit, but we could do without it at this stage.


[PR103149] detach values through mem only if general regs won't do

From: Alexandre Oliva <ol...@adacore.com>

When hardening compares or conditional branches, we perform redundant
tests, and to prevent them from being optimized out, we use asm
statements that preserve a value used in a compare, but in a way that
the compiler can no longer assume it's the same value, so it can't
optimize the redundant test away.

We used to use +g, but that requires general regs or mem.  You might
think that, if a reg constraint can't be satisfied, the register
allocator will fall back to memory, but that's not so: we decide on
matching MEMs very early on, by using the same addressable operand on
both input and output, and only if the constraint does not allow
registers.  If it does, we use gimple registers and then pseudos as
inputs and outputs, and then inputs can be substituted by equivalent
expressions, and then, if no register contraint fits (e.g. because
that mode won't fit in general regs, or won't fit in regs at all), the
register allocator will give up before even trying to allocate some
temporary memory to unify input and output.

This patch arranges for us to create and use the temporary stack slot
if we can tell the mode requires memory, or won't otherwise fit in
general regs, and thus to use +m for that asm.


Regstrapped on x86_64-linux-gnu.  Verified that the mentioned aarch64 PR
is fixed.  Also bootstrapping along with a patch that enables both
compare hardening passes.  Ok to install?


for  gcc/ChangeLog

        PR middle-end/103149
        * gimple-harden-conditionals.cc (detach_value): Use memory if
        general regs won't do.

for  gcc/testsuite/ChangeLog

        PR middle-end/103149
        * gcc.target/aarch64/pr103149.c: New.
---
 gcc/gimple-harden-conditionals.cc           |   67 +++++++++++++++++++++++++--
 gcc/testsuite/gcc.target/aarch64/pr103149.c |   14 ++++++
 2 files changed, 75 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr103149.c

diff --git a/gcc/gimple-harden-conditionals.cc 
b/gcc/gimple-harden-conditionals.cc
index cfa2361d65be0..81867d6e4275f 100644
--- a/gcc/gimple-harden-conditionals.cc
+++ b/gcc/gimple-harden-conditionals.cc
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
+#include "rtl.h"
 #include "tree.h"
 #include "fold-const.h"
 #include "gimple.h"
@@ -132,25 +134,78 @@ detach_value (location_t loc, gimple_stmt_iterator *gsip, 
tree val)
   tree ret = make_ssa_name (TREE_TYPE (val));
   SET_SSA_NAME_VAR_OR_IDENTIFIER (ret, SSA_NAME_IDENTIFIER (val));
 
-  /* Output asm ("" : "=g" (ret) : "0" (val));  */
+  /* Some modes won't fit in general regs, so we fall back to memory
+     for them.  ??? It would be ideal to try to identify an alternate,
+     wider or more suitable register class, and use the corresponding
+     constraint, but there's no logic to go from register class to
+     constraint, even if there is a corresponding constraint, and even
+     if we could enumerate constraints, we can't get to their string
+     either.  So this will do for now.  */
+  bool need_memory = true;
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (val));
+  if (mode != BLKmode)
+    for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+      if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i)
+         && targetm.hard_regno_mode_ok (i, mode))
+       {
+         need_memory = false;
+         break;
+       }
+
+  tree asminput = val;
+  tree asmoutput = ret;
+  const char *constraint_out = need_memory ? "=m" : "=g";
+  const char *constraint_in = need_memory ? "m" : "0";
+
+  if (need_memory)
+    {
+      tree temp = create_tmp_var (TREE_TYPE (val), "dtch");
+      mark_addressable (temp);
+
+      gassign *copyin = gimple_build_assign (temp, asminput);
+      gimple_set_location (copyin, loc);
+      gsi_insert_before (gsip, copyin, GSI_SAME_STMT);
+
+      asminput = asmoutput = temp;
+    }
+
+  /* Output an asm statement with matching input and output.  It does
+     nothing, but after it the compiler no longer knows the output
+     still holds the same value as the input.  */
   vec<tree, va_gc> *inputs = NULL;
   vec<tree, va_gc> *outputs = NULL;
   vec_safe_push (outputs,
                 build_tree_list
                 (build_tree_list
-                 (NULL_TREE, build_string (2, "=g")),
-                 ret));
+                 (NULL_TREE, build_string (strlen (constraint_out),
+                                           constraint_out)),
+                 asmoutput));
   vec_safe_push (inputs,
                 build_tree_list
                 (build_tree_list
-                 (NULL_TREE, build_string (1, "0")),
-                 val));
+                 (NULL_TREE, build_string (strlen (constraint_in),
+                                           constraint_in)),
+                 asminput));
   gasm *detach = gimple_build_asm_vec ("", inputs, outputs,
                                       NULL, NULL);
   gimple_set_location (detach, loc);
   gsi_insert_before (gsip, detach, GSI_SAME_STMT);
 
-  SSA_NAME_DEF_STMT (ret) = detach;
+  if (need_memory)
+    {
+      gassign *copyout = gimple_build_assign (ret, asmoutput);
+      gimple_set_location (copyout, loc);
+      gsi_insert_before (gsip, copyout, GSI_SAME_STMT);
+      SSA_NAME_DEF_STMT (ret) = copyout;
+
+      gassign *clobber = gimple_build_assign (asmoutput,
+                                             build_clobber
+                                             (TREE_TYPE (asmoutput)));
+      gimple_set_location (clobber, loc);
+      gsi_insert_before (gsip, clobber, GSI_SAME_STMT);
+    }
+  else
+    SSA_NAME_DEF_STMT (ret) = detach;
 
   return ret;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr103149.c 
b/gcc/testsuite/gcc.target/aarch64/pr103149.c
new file mode 100644
index 0000000000000..906bc9ae06612
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr103149.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+sve -O2 -fharden-conditional-branches 
-fno-tree-scev-cprop" } */
+
+/* -fharden-conditional-branches prevents optimization of its redundant
+   compares by detaching values from the operands with asm statements.  They
+   used to require GENERAL_REGS, but the vectorized booleans, generated while
+   vectorizing this function, can't be held in GENERAL_REGS.  */
+
+void
+foo (int *p)
+{
+  while (*p < 1)
+    ++*p;
+}


-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>

Reply via email to