On Fri, 3 Dec 2021, Alexandre Oliva wrote: > On Dec 2, 2021, Richard Biener <rguent...@suse.de> wrote: > > > While adding ASMNESIA looks OK at this point, I'm not sure about the > > asm handling stuff. You mention 'reload' above, do you mean LRA? > > I meant the allocation was first visible in -fdump-rtl-reload. I've > added a note about this problem, and a modified testcase, to PR93027 > once I realized that the problem was present in an unpatched compiler, > even at -O0.
Ah, I see. > In the previous patch, I'd already abandoned the intent to use the +X > constraint for ASMNESIA, because it was not useful and created other > problems. I'd fixed numerous of those additional problems in the > recog.c and cfgexpand.c changes, but those turned out to be not needed > to fix the PR once I backpedaled to +g (or +m). ASMNESIA also turned > out to be unnecessary, so I prepared and retested another version of the > patch that uses the same switch-to-MEM logic I wrote for ASMNESIA in the > previous patch, but now directly in the detach_value implementation > within the harden-conditional passes, using +m and an addressable > temporary if we find that +g won't do. > > If you find that ASMNESIA is still useful as a reusable internal > primitive, I can reintroduce it, now or at a later time. It would bring > some slight codegen benefit, but we could do without it at this stage. I prefer to be minimalistic at this point. Maybe we can re-use ASMNESIA for the FENC_ACCESS stuff if somebody picks up Marcs work during next stage1. > > [PR103149] detach values through mem only if general regs won't do > > From: Alexandre Oliva <ol...@adacore.com> > > When hardening compares or conditional branches, we perform redundant > tests, and to prevent them from being optimized out, we use asm > statements that preserve a value used in a compare, but in a way that > the compiler can no longer assume it's the same value, so it can't > optimize the redundant test away. > > We used to use +g, but that requires general regs or mem. You might > think that, if a reg constraint can't be satisfied, the register > allocator will fall back to memory, but that's not so: we decide on > matching MEMs very early on, by using the same addressable operand on > both input and output, and only if the constraint does not allow > registers. If it does, we use gimple registers and then pseudos as > inputs and outputs, and then inputs can be substituted by equivalent > expressions, and then, if no register contraint fits (e.g. because > that mode won't fit in general regs, or won't fit in regs at all), the > register allocator will give up before even trying to allocate some > temporary memory to unify input and output. > > This patch arranges for us to create and use the temporary stack slot > if we can tell the mode requires memory, or won't otherwise fit in > general regs, and thus to use +m for that asm. > > > Regstrapped on x86_64-linux-gnu. Verified that the mentioned aarch64 PR > is fixed. Also bootstrapping along with a patch that enables both > compare hardening passes. Ok to install? OK. Thanks, Richard. > > > for gcc/ChangeLog > > PR middle-end/103149 > * gimple-harden-conditionals.cc (detach_value): Use memory if > general regs won't do. > > for gcc/testsuite/ChangeLog > > PR middle-end/103149 > * gcc.target/aarch64/pr103149.c: New. > --- > gcc/gimple-harden-conditionals.cc | 67 > +++++++++++++++++++++++++-- > gcc/testsuite/gcc.target/aarch64/pr103149.c | 14 ++++++ > 2 files changed, 75 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr103149.c > > diff --git a/gcc/gimple-harden-conditionals.cc > b/gcc/gimple-harden-conditionals.cc > index cfa2361d65be0..81867d6e4275f 100644 > --- a/gcc/gimple-harden-conditionals.cc > +++ b/gcc/gimple-harden-conditionals.cc > @@ -22,6 +22,8 @@ along with GCC; see the file COPYING3. If not see > #include "system.h" > #include "coretypes.h" > #include "backend.h" > +#include "target.h" > +#include "rtl.h" > #include "tree.h" > #include "fold-const.h" > #include "gimple.h" > @@ -132,25 +134,78 @@ detach_value (location_t loc, gimple_stmt_iterator > *gsip, tree val) > tree ret = make_ssa_name (TREE_TYPE (val)); > SET_SSA_NAME_VAR_OR_IDENTIFIER (ret, SSA_NAME_IDENTIFIER (val)); > > - /* Output asm ("" : "=g" (ret) : "0" (val)); */ > + /* Some modes won't fit in general regs, so we fall back to memory > + for them. ??? It would be ideal to try to identify an alternate, > + wider or more suitable register class, and use the corresponding > + constraint, but there's no logic to go from register class to > + constraint, even if there is a corresponding constraint, and even > + if we could enumerate constraints, we can't get to their string > + either. So this will do for now. */ > + bool need_memory = true; > + enum machine_mode mode = TYPE_MODE (TREE_TYPE (val)); > + if (mode != BLKmode) > + for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++) > + if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i) > + && targetm.hard_regno_mode_ok (i, mode)) > + { > + need_memory = false; > + break; > + } > + > + tree asminput = val; > + tree asmoutput = ret; > + const char *constraint_out = need_memory ? "=m" : "=g"; > + const char *constraint_in = need_memory ? "m" : "0"; > + > + if (need_memory) > + { > + tree temp = create_tmp_var (TREE_TYPE (val), "dtch"); > + mark_addressable (temp); > + > + gassign *copyin = gimple_build_assign (temp, asminput); > + gimple_set_location (copyin, loc); > + gsi_insert_before (gsip, copyin, GSI_SAME_STMT); > + > + asminput = asmoutput = temp; > + } > + > + /* Output an asm statement with matching input and output. It does > + nothing, but after it the compiler no longer knows the output > + still holds the same value as the input. */ > vec<tree, va_gc> *inputs = NULL; > vec<tree, va_gc> *outputs = NULL; > vec_safe_push (outputs, > build_tree_list > (build_tree_list > - (NULL_TREE, build_string (2, "=g")), > - ret)); > + (NULL_TREE, build_string (strlen (constraint_out), > + constraint_out)), > + asmoutput)); > vec_safe_push (inputs, > build_tree_list > (build_tree_list > - (NULL_TREE, build_string (1, "0")), > - val)); > + (NULL_TREE, build_string (strlen (constraint_in), > + constraint_in)), > + asminput)); > gasm *detach = gimple_build_asm_vec ("", inputs, outputs, > NULL, NULL); > gimple_set_location (detach, loc); > gsi_insert_before (gsip, detach, GSI_SAME_STMT); > > - SSA_NAME_DEF_STMT (ret) = detach; > + if (need_memory) > + { > + gassign *copyout = gimple_build_assign (ret, asmoutput); > + gimple_set_location (copyout, loc); > + gsi_insert_before (gsip, copyout, GSI_SAME_STMT); > + SSA_NAME_DEF_STMT (ret) = copyout; > + > + gassign *clobber = gimple_build_assign (asmoutput, > + build_clobber > + (TREE_TYPE (asmoutput))); > + gimple_set_location (clobber, loc); > + gsi_insert_before (gsip, clobber, GSI_SAME_STMT); > + } > + else > + SSA_NAME_DEF_STMT (ret) = detach; > > return ret; > } > diff --git a/gcc/testsuite/gcc.target/aarch64/pr103149.c > b/gcc/testsuite/gcc.target/aarch64/pr103149.c > new file mode 100644 > index 0000000000000..906bc9ae06612 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr103149.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=armv8-a+sve -O2 -fharden-conditional-branches > -fno-tree-scev-cprop" } */ > + > +/* -fharden-conditional-branches prevents optimization of its redundant > + compares by detaching values from the operands with asm statements. They > + used to require GENERAL_REGS, but the vectorized booleans, generated while > + vectorizing this function, can't be held in GENERAL_REGS. */ > + > +void > +foo (int *p) > +{ > + while (*p < 1) > + ++*p; > +} > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)