On Thu, May 21, 2015 at 04:57:00PM +0100, Matthew Wahab wrote:
> On AArch64, the __sync builtins are implemented using the __atomic
> operations and barriers. This makes the __sync builtins inconsistent
> with their documentation, which requires stronger barriers than those
> for the __atomic builtins.
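For anyone following along, the distinction the series cares about is
visible at the source level. A minimal sketch (the wrapper names
sync_add and atomic_add are mine, purely for illustration; the comments
describe the intended AArch64 mapping, not a guarantee):

  int
  sync_add (int *p)
  {
    /* Documented as a full barrier.  With this patch the load-exclusive
       is relaxed to an ldxr and a trailing dmb ish is emitted after
       the loop.  */
    return __sync_fetch_and_add (p, 1);
  }

  int
  atomic_add (int *p)
  {
    /* Acquire-release only: an ldaxr/stlxr loop with no trailing
       barrier is sufficient.  */
    return __atomic_fetch_add (p, 1, __ATOMIC_ACQ_REL);
  }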
<snip>

> Ok for trunk?

Close, but I have some comments on style.

Please tie this to the PR which was open in the ChangeLog entry.

> gcc/
> 2015-05-21  Matthew Wahab  <matthew.wa...@arm.com>
>
>       * config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.

If I was being picky, this should be something like
aarch64_emit_sync_op_epilogue, but I don't want to bikeshed too much.

>       (aarch64_split_atomic_op): Check for __sync memory models, emit
>       appropriate initial and final barriers.

I don't see any new initial barriers. I think you are referring to
relaxing the ldaxr to an ldxr for __sync primitives, in which case,
say that.

> From 2092902d2738b0c24a6272e0b3480bb9cffd275c Mon Sep 17 00:00:00 2001
> From: Matthew Wahab <matthew.wa...@arm.com>
> Date: Fri, 15 May 2015 09:26:28 +0100
> Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.
>
> Change-Id: I3342a572d672163ffc703e4e51603744680334fc
> ---
>  gcc/config/aarch64/aarch64.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 7f0cc0d..778571f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9249,6 +9249,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
>    emit_insn (gen_rtx_SET (bval, x));
>  }
>
> +/* Emit a post-operation barrier.  */

This comment could do with some more detail. What is a post-operation
barrier? When do we need one? What is the MODEL parameter?
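Something like the following would answer those questions (my wording,
offered as a sketch of the level of detail I am after rather than text
to copy verbatim):

  /* Emit a barrier, appropriate for memory model MODEL, at the end of
     a sequence implementing an atomic operation.  This is only needed
     for models carrying the MEMMODEL_SYNC flag: a __sync operation
     must not allow later memory accesses to move above it, so for the
     acquire, acquire-release and sequentially-consistent base models
     we emit a full barrier after the operation.  */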
> +static void
> +aarch64_emit_post_barrier (enum memmodel model)
> +{
> +  const enum memmodel base_model = memmodel_base (model);
> +
> +  if (is_mm_sync (model)
> +      && (base_model == MEMMODEL_ACQUIRE
> +          || base_model == MEMMODEL_ACQ_REL
> +          || base_model == MEMMODEL_SEQ_CST))
> +    {
> +      emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
> +    }
> +}
> +
>  /* Split a compare and swap pattern.  */
>
>  void
> @@ -9311,12 +9327,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
>  {
>    machine_mode mode = GET_MODE (mem);
>    machine_mode wmode = (mode == DImode ? DImode : SImode);
> +  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
> +  const bool is_sync = is_mm_sync (model);
> +  rtx load_model_rtx = model_rtx;
>    rtx_code_label *label;
>    rtx x;
>
>    label = gen_label_rtx ();
>    emit_label (label);
>
> +  /* A __sync operation will emit a final fence to stop code hoisting, so the

Can we pick a consistent terminology between fence/barrier? They are
currently used interchangeably, but I think we generally prefer barrier
in the AArch64 port.

> +     load can be relaxed.  */
> +  if (is_sync)
> +    load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
> +
>    if (new_out)
>      new_out = gen_lowpart (wmode, new_out);
>    if (old_out)
> @@ -9325,7 +9349,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
>      old_out = new_out;
>    value = simplify_gen_subreg (wmode, value, mode, 0);
>
> -  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
> +  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);

To my mind, these two hunks would be marginally easier to follow if we
combined them, as so:

  /* A __sync operation will emit a final barrier to stop code
     hoisting, so the load can be relaxed.  */
  if (is_sync)
    aarch64_emit_load_exclusive (mode, old_out, mem,
                                 GEN_INT (MEMMODEL_RELAXED));
  else
    aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);

it is just one less piece of information to juggle when thinking
through what we are emitting here.

>
>    switch (code)
>      {
> @@ -9361,6 +9385,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
>    x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
>                              gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
>    aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +
> +  /* Emit any fence needed for a __sync operation.  */
> +  if (is_sync)
> +    aarch64_emit_post_barrier (model);
>  }

Cheers,
James