On Thu, Jun 21, 2012 at 6:55 AM, Alan Modra <amo...@gmail.com> wrote:
> A couple of small tweaks to PowerPC atomic operations.  The first
> omits the "cmp; bc; isync" barrier on atomic_load with mem model
> __ATOMIC_CONSUME.  PowerPC pointer loads don't need a barrier.  Ref
> http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
> As best I can see, mem_thread_fence should not be changed similarly,
> since __ATOMIC_CONSUME doesn't really make sense on a fence.  So a
> fence with __ATOMIC_CONSUME ought to behave as __ATOMIC_ACQUIRE.
>
> The second tweak forces the address used by load_locked and
> store_conditional to a reg when the address is not legitimate for
> those instructions, saving reload some work, reducing register
> pressure and sometimes code size.  Not a big deal, just something I
> noticed a while ago when looking at libgomp.  eg. (-original, +patched)

> Bootstrapped and regression tested powerpc64-linux.  OK for mainline?
>
>        * config/rs6000/rs6000.c (rs6000_pre_atomic_barrier): Pass in and
>        return mem.  Convert to indirect addressing if not indirect or
>        indexed.  Adjust all callers.

The second patch is okay.

The first patch is controversial. Richard, Paul McKenney and I
discussed the meaning of load(MEMMODEL_CONSUME).  Let me quote part of
Paul's comments:


What is required from memory_order_consume loads is that it meet the
following constraints:

1) "Load tearing" is prohibited.  In other words, the implementation
must load the value "in one go" or, alternatively, prevent stores from
running concurrently with a multi-instruction load.  A
multi-instruction load would have to be used for large data structures
that are declared "atomic".

2) Similarly, "store tearing" is prohibited.  In other words, the
implementation must store the value "in one go" or, alternatively,
prevent loads from running concurrently with a multi-instruction
store.

3) The implementation is prohibited from inventing a load or store
not explicitly coded by the programmer.  In contrast, an
implementation might invent loads and stores for non-atomic variables
in order to optimize register usage.

4) Later loads and stores to which a dependency is carried from the
initial load (see 1.10p9 and 1.10p10 in the standard) cannot be
reordered to precede the load(memory_order_consume).  Other loads and
stores following the load(memory_order_consume) may be freely
reordered to precede the load(memory_order_consume).

A relaxed load is subject to constraints #1-#3 above, but not #4.  In
contrast, a memory_order_acquire load would be subject to #1-#3 above,
but would be subject to a stronger version of #4 that prohibits -any-
load or store following the load(memory_order_acquire) from being
reordered prior to that load(memory_order_acquire).
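
To make constraint #4 concrete, here is a minimal sketch of the usual
pointer-publication pattern, written in portable C++ rather than anything
rs6000-specific.  The names (Node, published, publish, consume_payload,
run_publish_consume) are made up for illustration and do not come from
the patch.  The read of p->payload carries a dependency from the consume
load, so it may not be reordered before that load; an access that does
not go through p would get no such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical types and names for illustration only.
struct Node { int payload; };

static std::atomic<Node*> published{nullptr};

// Writer: fill in the node, then publish the pointer with a release store.
static void publish(Node* n) {
    n->payload = 42;
    published.store(n, std::memory_order_release);
}

// Reader: spin until the pointer is visible, then dereference it.
// The dereference carries a dependency from the consume load, so it is
// ordered after that load (constraint #4).
static int consume_payload() {
    Node* p;
    while ((p = published.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    assert(p != nullptr);
    return p->payload;
}

// Runs one writer thread against a reader on the calling thread.
int run_publish_consume() {
    Node n{0};
    published.store(nullptr, std::memory_order_relaxed);  // reset before the writer starts
    std::thread writer(publish, &n);
    int seen = consume_payload();
    writer.join();
    return seen;
}
```

Replacing memory_order_consume with memory_order_acquire in the reader
would also order accesses that do not depend on p, which is the stronger
form of #4 described above.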


The isync is not technically required, but removing it requires subtle
load-store tracking conformance from the compiler and library.  We
left it in to be safe. I'm curious about Richard and Jakub's current
thoughts.

- David
