On 10/10/14 11:53, Kyrill Tkachov wrote:
> Hi all,
>
> Some early revisions of the Cortex-A53 have an erratum (835769) whereby
> it is possible for a 64-bit multiply-accumulate instruction in
> AArch64 state to generate an incorrect result.  The details are quite
> complex and hard to determine statically, since branches in the code
> may exist in some circumstances, but all cases end with a memory
> (load, store, or prefetch) instruction followed immediately by the
> multiply-accumulate operation.
>
> The safest work-around for this issue is to make the compiler avoid
> emitting multiply-accumulate instructions immediately after memory
> instructions, and the simplest way to do this is to insert a NOP.  A
> linker patching technique can also be used, by moving a potentially
> affected multiply-accumulate instruction into a patch region and
> replacing the original instruction with a branch to the patch.
>
> This patch achieves the compiler part by using the final prescan pass.
> The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
> so that conditional branches work properly.
>
> The fix is disabled by default and can be turned on by the
> -mfix-cortex-a53-835769 compiler option.
>
> I'm attaching a trunk and a 4.9 version of the patch.
> The 4.9 version differs only in that rtx_insn* is replaced by rtx.
>
> Tested on aarch64-none-linux-gnu (and bootstrapped with the option) and
> built various large benchmarks with it.
>
> Ok?
>
> Thanks,
> Kyrill
>
> 2014-10-10  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>             Ramana Radhakrishnan  <ramana.radhakrish...@arm.com>
>
>     * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
>     (ADJUST_INSN_LENGTH): Define.
>     * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
>     * config/aarch64/aarch64.c (is_mem_p): New function.
>     (is_memory_op): Likewise.
>     (aarch64_prev_real_insn): Likewise.
>     (is_madd_op): Likewise.
>     (dep_between_memop_and_curr): Likewise.
>     (aarch64_madd_needs_nop): Likewise.
>     (aarch64_final_prescan_insn): Likewise.
>     * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
>     and -mno-fix-cortex-a53-835769 options.
>
>
> a53-workaround-new.patch
>
> commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
> Author: Kyrill Tkachov <kyrylo.tkac...@arm.com>
> Date:   Wed Oct 8 12:48:34 2014 +0000
>
>     [AArch64] Add final prescan workaround for Cortex-A53 erratum.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index b5f53d2..c57a467 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
>
>  extern void aarch64_split_combinev16qi (rtx operands[3]);
>  extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
> +extern bool aarch64_madd_needs_nop (rtx_insn *);
> +extern void aarch64_final_prescan_insn (rtx_insn *);
>  extern bool
>  aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
>  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5144c35..76a2480 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
>    return NULL;
>  }
>
> +static int
> +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
> +{
> +  return MEM_P (*x);
> +}
> +
> +static bool
> +is_memory_op (rtx_insn *mem_insn)
> +{
> +  rtx pattern = PATTERN (mem_insn);
> +  return for_each_rtx (&pattern, is_mem_p, NULL);
> +}
> +
> +/* Find the first rtx_insn before insn that will generate an assembly
> +   instruction.
> +   */
> +
> +static rtx_insn *
> +aarch64_prev_real_insn (rtx_insn *insn)
> +{
> +  if (!insn)
> +    return NULL;
> +
> +  do
> +    {
> +      insn = prev_real_insn (insn);
> +    }
> +  while (insn && recog_memoized (insn) < 0);
> +
> +  return insn;
> +}
> +
> +static bool
> +is_madd_op (enum attr_type t1)
> +{
> +  unsigned int i;
> +  /* A number of these may be AArch32 only.  */
> +  enum attr_type mlatypes[] = {
> +    TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
> +    TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
> +    TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
> +  };
> +
> +  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
> +    {
> +      if (t1 == mlatypes[i])
> +        return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Check if there is a register dependency between a load and the insn
> +   for which we hold recog_data.  */
> +
> +static bool
> +dep_between_memop_and_curr (rtx memop)
> +{
> +  rtx load_reg;
> +  int opno;
> +
> +  if (!memop)
> +    return false;
> +
> +  if (!REG_P (SET_DEST (memop)))
> +    return false;
> +
> +  load_reg = SET_DEST (memop);
> +  for (opno = 0; opno < recog_data.n_operands; opno++)
> +    {
> +      rtx operand = recog_data.operand[opno];
> +      if (REG_P (operand)
> +          && reg_overlap_mentioned_p (load_reg, operand))
> +        return true;
> +    }
> +  return false;
> +}
> +
> +bool
> +aarch64_madd_needs_nop (rtx_insn* insn)
> +{
> +  enum attr_type attr_type;
> +  rtx_insn *prev;
> +  rtx body;
> +
> +  if (!aarch64_fix_a53_err835769)
> +    return false;
> +
> +  if (recog_memoized (insn) < 0)
> +    return false;
> +
> +  attr_type = get_attr_type (insn);
> +  if (!is_madd_op (attr_type))
> +    return false;
> +
> +  prev = aarch64_prev_real_insn (insn);
> +  if (!prev)
> +    return false;
> +
> +  body = single_set (prev);
> +
> +  /* If the previous insn is a memory op and there is no dependency between
> +     it and the madd, emit a nop between them.
> +     If we know the previous insn
> +     is a memory op but body is NULL, emit the nop to be safe; it's
> +     probably a load/store pair insn.  */
> +  if (is_memory_op (prev)
> +      && GET_MODE (recog_data.operand[0]) == DImode
> +      && (!dep_between_memop_and_curr (body)))
> +    return true;
> +
> +  return false;
> +
> +}
> +
> +void
> +aarch64_final_prescan_insn (rtx_insn *insn)
> +{
> +  if (aarch64_madd_needs_nop (insn))
> +    fprintf (asm_out_file, "\tnop // between mem op and mult-accumulate\n");
> +}
> +
> +
>  /* Return the equivalent letter for size.  */
>  static char
>  sizetochar (int size)
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index db950da..e9e5fd8 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -486,6 +486,15 @@ enum target_cpus
>    (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
>  #endif
>
> +/* If inserting a NOP before a mult-accumulate insn, remember to adjust the
> +   length so that conditional branching code is updated appropriately.  */
> +#define ADJUST_INSN_LENGTH(insn, length) \
> +  if (aarch64_madd_needs_nop (insn))     \
> +    length += 4;
> +
> +#define FINAL_PRESCAN_INSN(INSN, OPVEC, NOPERANDS) \
> +  aarch64_final_prescan_insn (INSN);               \
> +
>  /* The processor for which instructions should be scheduled.
>    */
>  extern enum aarch64_processor aarch64_tune;
>
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index f5a15b7..77deb2e 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -67,6 +67,10 @@ mgeneral-regs-only
>  Target Report RejectNegative Mask(GENERAL_REGS_ONLY)
>  Generate code which uses only the general registers
>
> +mfix-cortex-a53-835769
> +Target Report Var(aarch64_fix_a53_err835769) Init(0)
> +Workaround for ARM Cortex-A53 Erratum number 835769
> +
>  mlittle-endian
>  Target Report RejectNegative InverseMask(BIG_END)
>  Assume target CPU is configured as little endian
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5fe7e15..acc9dd9 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -489,6 +489,7 @@ Objective-C and Objective-C++ Dialects}.
>  -mstrict-align @gol
>  -momit-leaf-frame-pointer -mno-omit-leaf-frame-pointer @gol
>  -mtls-dialect=desc -mtls-dialect=traditional @gol
> +-mfix-cortex-a53-835769 -mno-fix-cortex-a53-835769 @gol
>  -march=@var{name} -mcpu=@var{name} -mtune=@var{name}}
>
>  @emph{Adapteva Epiphany Options}
> @@ -11744,6 +11745,14 @@ of TLS variables.  This is the default.
>  Use traditional TLS as the thread-local storage mechanism for dynamic accesses
>  of TLS variables.
>
> +@item -mfix-cortex-a53-835769
> +@itemx -mno-fix-cortex-a53-835769
> +@opindex -mfix-cortex-a53-835769
> +@opindex -mno-fix-cortex-a53-835769
> +Enable or disable the workaround for the ARM Cortex-A53 erratum number 835769.
> +This will involve inserting a NOP instruction between memory instructions and
> +64-bit integer multiply-accumulate instructions.
> +
>  @item -march=@var{name}
>  @opindex march
>  Specify the name of the target architecture, optionally suffixed by one or
>
>
> a53-49-workaround.patch
>
> commit 48cc738e2c38dd372ae054d30964dd780478c4d1
> Author: Kyrill Tkachov <kyrylo.tkac...@arm.com>
> Date:   Tue Oct 7 12:42:01 2014 +0000
>
>     [AArch64] Add workaround for Cortex-A53 erratum 835769
>
>     2014-10-07  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>                 Ramana Radhakrishnan  <ramana.radhakrish...@arm.com>
>
>     * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
>     (ADJUST_INSN_LENGTH): Define.
>     * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
>     * config/aarch64/aarch64.c (is_mem_p): New function.
>     (is_memory_op): Likewise.
>     (aarch64_prev_real_insn): Likewise.
>     (is_madd_op): Likewise.
>     (dep_between_memop_and_curr): Likewise.
>     (aarch64_madd_needs_nop): Likewise.
>     (aarch64_final_prescan_insn): Likewise.
>     * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
>     and -mno-fix-cortex-a53-835769 options.
> +
> +/* Check if there is a register dependency between a load and the insn
> +   for which we hold recog_data.  */
> +
> +static bool
> +dep_between_memop_and_curr (rtx memop)
> +{
> +  rtx load_reg;
> +  int opno;
> +
> +  if (!memop)
> +    return false;
> +
> +  if (!REG_P (SET_DEST (memop)))
> +    return false;
> +
> +  load_reg = SET_DEST (memop);
> +  for (opno = 0; opno < recog_data.n_operands; opno++)
> +    {
> +      rtx operand = recog_data.operand[opno];
> +      if (REG_P (operand)
> +          && reg_overlap_mentioned_p (load_reg, operand))
> +        return true;
> +    }
> +  return false;
> +}
> +
> +bool
> +aarch64_madd_needs_nop (rtx insn)
> +{
> +  enum attr_type attr_type;
> +  rtx prev;
> +  rtx body;
> +
> +  if (!aarch64_fix_a53_err835769)
> +    return false;
> +
> +  if (recog_memoized (insn) < 0)
> +    return false;
> +
> +  attr_type = get_attr_type (insn);
> +  if (!is_madd_op (attr_type))
> +    return false;
> +
> +  prev = aarch64_prev_real_insn (insn);
> +  if (!prev)
> +    return false;
> +
> +  body = single_set (prev);
> +
> +  /* If the previous insn is a memory op and there is no dependency between
> +     it and the madd, emit a nop between them.  If we know the previous insn
> +     is a memory op but body is NULL, emit the nop to be safe, it's probably
> +     a load/store pair insn.  */
> +  if (is_memory_op (prev)
> +      && GET_MODE (recog_data.operand[0]) == DImode
> +      && (!dep_between_memop_and_curr (body)))
> +    return true;
> +

It seems a bit strange to me that we expect dep_between_memop_and_curr
to detect the non-single_set () case and act conservatively.  Better
would be to pull that test out to this level, giving us:

  body = single_set (prev);
  if (!body)
    /* Complex memory operation, possibly a load/store pair insn.  Just
       be conservative for now.  */
    return true;

  ...

Now dep_between... can assert that it is passed a valid SET () rtx.

R.