On Wed, Mar 08, 2017 at 04:35:38PM +0000, Kyrill Tkachov wrote:
> Hi all,
>
> For the testcase in this patch where the value of x is zero we currently
> generate:
> foo:
>         mov     w1, 4
> .L2:
>         ldaxr   w2, [x0]
>         cmp     w2, 0
>         bne     .L3
>         stxr    w3, w1, [x0]
>         cbnz    w3, .L2
> .L3:
>         cset    w0, eq
>         ret
>
> We currently cannot merge the cmp and b.ne inside the loop into a cbnz
> because we need the condition flags set for the return value of the
> function (i.e. the cset at the end).  But if we re-jig the sequence in
> that case we can generate a tighter loop:
> foo:
>         mov     w1, 4
> .L2:
>         ldaxr   w2, [x0]
>         cbnz    w2, .L3
>         stxr    w3, w1, [x0]
>         cbnz    w3, .L2
> .L3:
>         cmp     w2, 0
>         cset    w0, eq
>         ret
>
> So we add an explicit compare after the loop, and inside the loop we use
> the fact that we're comparing against zero to emit a CBNZ.  This means we
> may end up doing the comparison twice (once inside the CBNZ, once at the
> CMP at the end), but there is now less code inside the loop.
>
> I've seen this sequence appear in glibc locking code, so it may be worth
> adding the extra bit of complexity to the compare-exchange splitter to
> catch this case.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.  In previous
> iterations of the patch, where I had gotten some logic wrong, it would
> cause miscompiles of libgomp leading to timeouts in its testsuite, but
> this version passes everything cleanly.
>
> Ok for GCC 8?  (I know it's early, but might as well get it out in case
> someone wants to try it out)
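For readers following along: the glibc-style locking idiom this helps is
essentially a trylock built on a strong compare-exchange against zero.  A
minimal C11 sketch, purely illustrative and not taken from glibc (the names
are hypothetical):

  #include <stdatomic.h>

  /* Acquire LOCK only if its word is currently zero, i.e. a strong
     compare-exchange against zero -- the exact shape the patch targets.  */
  static int
  try_lock (atomic_int *lock)
  {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit (lock, &expected, 1,
						    memory_order_acquire,
						    memory_order_relaxed);
  }

With the patch, the comparison of the loaded value inside the loop becomes a
CBNZ, and the flags needed for the boolean result are recreated by the single
CMP after the loop.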
OK.

Thanks,
James

> Thanks,
> Kyrill
>
> 2017-03-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * config/aarch64/aarch64.c (aarch64_split_compare_and_swap):
>     Emit CBNZ inside loop when doing a strong exchange and comparing
>     against zero.  Generate the CC flags after the loop.
>
> 2017-03-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 76a2de20dfcd4ea38fb7c58a9e8612509c5987bd..5fa8e197328ce4cb1718ff7d99b1ea95e02129a4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12095,6 +12095,17 @@ aarch64_split_compare_and_swap (rtx operands[])
>    mode = GET_MODE (mem);
>    model = memmodel_from_int (INTVAL (model_rtx));
>
> +  /* When OLDVAL is zero and we want the strong version we can emit a tighter
> +     loop:
> +     .label1:
> +       LD[A]XR  rval, [mem]
> +       CBNZ     rval, .label2
> +       ST[L]XR  scratch, newval, [mem]
> +       CBNZ     scratch, .label1
> +     .label2:
> +       CMP      rval, 0.  */
> +  bool strong_zero_p = !is_weak && oldval == const0_rtx;
> +
>    label1 = NULL;
>    if (!is_weak)
>      {
> @@ -12111,11 +12122,21 @@ aarch64_split_compare_and_swap (rtx operands[])
>    else
>      aarch64_emit_load_exclusive (mode, rval, mem, model_rtx);
>
> -  cond = aarch64_gen_compare_reg (NE, rval, oldval);
> -  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
> -  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> -			    gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> -  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +  if (strong_zero_p)
> +    {
> +      x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
> +      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +				gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> +      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +    }
> +  else
> +    {
> +      cond = aarch64_gen_compare_reg (NE, rval, oldval);
> +      x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
> +      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +				gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
> +      aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +    }
>
>    aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
>
> @@ -12134,7 +12155,15 @@ aarch64_split_compare_and_swap (rtx operands[])
>      }
>
>    emit_label (label2);
> -
> +  /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
> +     to set the condition flags.  If this is not used it will be removed by
> +     later passes.  */
> +  if (strong_zero_p)
> +    {
> +      cond = gen_rtx_REG (CCmode, CC_REGNUM);
> +      x = gen_rtx_COMPARE (CCmode, rval, const0_rtx);
> +      emit_insn (gen_rtx_SET (cond, x));
> +    }
>    /* Emit any final barrier needed for a __sync operation.  */
>    if (is_mm_sync (model))
>      aarch64_emit_post_barrier (model);
> diff --git a/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b14a7c294376f03cd13077d18d865f83a04bd04e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_strong_1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int *a)
> +{
> +  int x = 0;
> +  return __atomic_compare_exchange_n (a, &x, 4, 0,
> +				      __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
> +}
> +
> +/* { dg-final { scan-assembler-times "cbnz\\tw\[0-9\]+" 2 } } */
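As a point of contrast for the new test: strong_zero_p requires
oldval == const0_rtx, so a variant with a nonzero expected value would not
take the new path and would keep the CMP + B.NE pair inside the loop.  An
untested sketch of such a variant (not part of the patch; the function and
file names are made up), e.g. atomic_cmp_exchange_nonzero_strong_1.c:

  /* { dg-do compile } */
  /* { dg-options "-O2" } */

  int
  bar (int *a)
  {
    /* Nonzero expected value: strong_zero_p is false, so the splitter
       falls back to the CMP + B.NE form inside the loop.  */
    int x = 4;
    return __atomic_compare_exchange_n (a, &x, 0, 0,
					__ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
  }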