On Thu, Jun 21, 2012 at 6:55 AM, Alan Modra <amo...@gmail.com> wrote:
> A couple of small tweaks to PowerPC atomic operations.  The first
> omits the "cmp; bc; isync" barrier on atomic_load with mem model
> __ATOMIC_CONSUME.  PowerPC pointer loads don't need a barrier.  Ref
> http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
> As best I can see, mem_thread_fence should not be changed similarly,
> since __ATOMIC_CONSUME doesn't really make sense on a fence.  So a
> fence with __ATOMIC_CONSUME ought to behave as __ATOMIC_ACQUIRE.
>
> The second tweak forces the address used by load_locked and
> store_conditional to a reg when the address is not legitimate for
> those instructions, saving reload some work, reducing register
> pressure and sometimes code size.  Not a big deal, just something I
> noticed a while ago when looking at libgomp.  e.g. (-original, +patched)
>
> Bootstrapped and regression tested powerpc64-linux.  OK for mainline?
>
>       * config/rs6000/rs6000.c (rs6000_pre_atomic_barrier): Pass in and
>       return mem.  Convert to indirect addressing if not indirect or
>       indexed.  Adjust all callers.

The second patch is okay.

The first patch is controversial.  Richard, Paul McKenney and I
discussed the meaning of load(MEMMODEL_CONSUME).  Let me quote part of
Paul's comments:

    What is required from memory_order_consume loads is that they meet
    the following constraints:

    1) "Load tearing" is prohibited.  In other words, the
       implementation must load the value "in one go" or,
       alternatively, prevent stores from running concurrently with a
       multi-instruction load.  A multi-instruction load would have to
       be used for large data structures that are declared "atomic".

    2) Similarly, "store tearing" is prohibited.  In other words, the
       implementation must store the value "in one go" or,
       alternatively, prevent loads from running concurrently with a
       multi-instruction store.

    3) The implementation is prohibited from inventing a load or store
       not explicitly coded by the programmer.  In contrast, an
       implementation might invent loads and stores for non-atomic
       variables in order to optimize register usage.

    4) Later loads and stores to which a dependency is carried from the
       initial load (see 1.10p9 and 1.10p10 in the standard) cannot be
       reordered to precede the load(memory_order_consume).  Other loads
       and stores following the load(memory_order_consume) may be freely
       reordered to precede the load(memory_order_consume).

    A relaxed load is subject to constraints #1-#3 above, but not #4.
    In contrast, a memory_order_acquire load would be subject to #1-#3
    above, and to a stronger version of #4 that prohibits -any- load or
    store following the load(memory_order_acquire) from being reordered
    prior to that load(memory_order_acquire).
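To make constraint #4 concrete, here is a minimal C sketch using GCC's __atomic builtins.  The names (struct msg, publish, read_consume, read_flag_acquire, shared_ptr, unrelated_flag) are hypothetical illustrations, not code from the patch.  The producer publishes a pointer with a release store; a consume reader is only guaranteed to see data reached *through* that pointer, while an acquire reader is also guaranteed to see unrelated_flag.

```c
#include <stddef.h>

/* Hypothetical payload type, for illustration only. */
struct msg {
    int value;
};

static struct msg the_msg;             /* storage for the published message */
static struct msg *shared_ptr = NULL;  /* pointer the producer publishes */
static int unrelated_flag = 0;         /* NOT reached through shared_ptr */

/* Producer: fill in the payload and set the flag, then publish the
   pointer with a release store so both writes happen before it. */
void publish(int v)
{
    the_msg.value = v;
    __atomic_store_n(&unrelated_flag, 1, __ATOMIC_RELAXED);
    __atomic_store_n(&shared_ptr, &the_msg, __ATOMIC_RELEASE);
}

/* Consume reader: p->value carries a dependency from the load of
   shared_ptr (constraint #4), so it cannot be satisfied before that
   load.  A read of unrelated_flag here would carry no dependency and
   would therefore NOT be ordered by the consume load. */
int read_consume(void)
{
    struct msg *p = __atomic_load_n(&shared_ptr, __ATOMIC_CONSUME);
    if (p == NULL)
        return -1;        /* not yet published */
    return p->value;      /* dependent load: ordered after the consume */
}

/* Acquire reader: acquire orders every later access, so once the
   pointer is seen, the relaxed store to unrelated_flag is visible too. */
int read_flag_acquire(void)
{
    struct msg *p = __atomic_load_n(&shared_ptr, __ATOMIC_ACQUIRE);
    if (p == NULL)
        return -1;        /* not yet published */
    return __atomic_load_n(&unrelated_flag, __ATOMIC_RELAXED);
}
```

The sketch only shows the source-level contract; on PowerPC the dependent load in read_consume is what makes the "cmp; bc; isync" sequence unnecessary in principle.  Later GCC manuals document __ATOMIC_CONSUME as currently implemented with the stronger __ATOMIC_ACQUIRE order, so both readers may compile to the same code.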
The isync is not technically required, but removing it requires subtle
load-store tracking conformance from the compiler and library.  We
left it in to be safe.

I'm curious about Richard and Jakub's current thoughts.

- David