Hi,

On 2017-03-31 13:38:31 +0300, Alexander Korotkov wrote:
> > It seems that on this platform the definition of atomics should be
> > provided by fallback.h.  But it doesn't, because I already defined
> > PG_HAVE_ATOMIC_U32_SUPPORT in arch-ppc.h.  I think in this case we
> > shouldn't provide a ppc-specific implementation of
> > pg_atomic_fetch_mask_add_u32().  However, I don't know how to do this,
> > given that arch-ppc.h is included before the compiler-specific headers.
> > Thus, in arch-ppc.h we don't yet know whether we will find an
> > implementation of atomics for this platform.  One possible solution is to
> > provide an assembly implementation for all atomics in arch-ppc.h.

That'd probably not be a good use of effort.
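For anybody following along, the include order in question is roughly this
(a simplified sketch of src/include/port/atomics.h; other architectures,
compilers and the surrounding #if conditions are elided):

/* 1) the architecture specific header comes first, on PowerPC: */
#include "port/atomics/arch-ppc.h"

/* 2) then the compiler specific ones, e.g. for gcc-like compilers: */
#include "port/atomics/generic-gcc.h"

/* 3) and finally fallbacks for whatever is still missing: */
#include "port/atomics/fallback.h"
#include "port/atomics/generic.h"

So at the point arch-ppc.h is processed it can't yet know whether a
compiler-provided implementation will show up later.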
> BTW, an implementation of all atomics in arch-ppc.h would be too invasive
> and shouldn't be considered for v10.
> However, I made the following workaround: declare pg_atomic_uint32 and
> pg_atomic_fetch_mask_add_u32_impl() only when we know that generic-gcc.h
> will declare gcc-based atomics.
> Could you please check it on Apple PPC?

That doesn't sound crazy to me.

> +/*
> + * pg_atomic_fetch_mask_add_u32 - atomically check masked bits in a
> + * variable, and if they are clear, add to the variable.

Hm, expanding on this wouldn't hurt.

> + * Returns the value of ptr before the atomic operation.

Sounds a bit like it'd return the pointer itself, not what it points to...

> + * Full barrier semantics.

Does it actually have full barrier semantics?  Based on a quick skim I don't
think the ppc implementation does.
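For reference, the contract I'd read that comment as describing, leaving
barriers aside, is roughly the following, executed as one atomic step (just a
sketch of the intended semantics with a made-up name, not the patch's code):

static inline uint32
fetch_mask_add_contract(volatile pg_atomic_uint32 *ptr, uint32 mask, uint32 add)
{
	uint32		old = ptr->value;	/* value pointed to, before the operation */

	if ((old & mask) == 0)			/* add only if the masked bits are clear */
		ptr->value = old + add;

	return old;						/* old value is returned either way */
}

I.e. the old *value* is returned in both cases, and the caller has to re-test
(old & mask) to know whether the addition actually happened.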
> +#if defined(HAVE_ATOMICS) \
> +	&& (defined(HAVE_GCC__ATOMIC_INT32_CAS) || defined(HAVE_GCC__SYNC_INT32_CAS))
> +
> +/*
> + * Declare the pg_atomic_uint32 structure before generic-gcc.h does, in
> + * order to use it in function arguments.
> + */

If we do this, then the other side needs a pointer back here, so the two
declarations stay in sync.

> +#define PG_HAVE_ATOMIC_U32_SUPPORT
> +typedef struct pg_atomic_uint32
> +{
> +	volatile uint32 value;
> +} pg_atomic_uint32;
> +
> +/*
> + * Optimized implementation of pg_atomic_fetch_mask_add_u32() for Power
> + * processors.  Atomic operations on Power processors are implemented using
> + * optimistic locking: 'lwarx' makes a reservation, but that reservation can
> + * be broken before the matching 'stwcx.', and then we have to retry.  Thus,
> + * each CAS operation is a loop, and a loop of CAS operations becomes a
> + * two-level nested loop.  Experiments on multicore Power machines show that
> + * we can get a huge benefit from doing this operation in a single loop in
> + * assembly.
> + */
> +#define PG_HAVE_ATOMIC_FETCH_MASK_ADD_U32
> +static inline uint32
> +pg_atomic_fetch_mask_add_u32_impl(volatile pg_atomic_uint32 *ptr,
> +								  uint32 mask, uint32 increment)
> +{
> +	uint32		result,
> +				tmp;
> +
> +	__asm__ __volatile__(
> +	/* read *ptr and reserve index */
> +#ifdef USE_PPC_LWARX_MUTEX_HINT
> +	"	lwarx	%0,0,%5,1	\n"
> +#else
> +	"	lwarx	%0,0,%5		\n"
> +#endif

Hm, we should remove that hint bit stuff at some point...

> +	"	and		%1,%0,%3	\n"	/* calculate '*ptr & mask' */
> +	"	cmpwi	%1,0		\n"	/* compare '*ptr & mask' vs 0 */
> +	"	bne-	$+16		\n"	/* exit on '*ptr & mask != 0' */
> +	"	add		%1,%0,%4	\n"	/* calculate '*ptr + increment' */
> +	"	stwcx.	%1,0,%5		\n"	/* try to store '*ptr + increment' into *ptr */
> +	"	bne-	$-24		\n"	/* retry if the reservation was broken */
> +#ifdef USE_PPC_LWSYNC
> +	"	lwsync	\n"
> +#else
> +	"	isync	\n"
> +#endif
> +	: "=&r"(result), "=&r"(tmp), "+m"(*ptr)
> +	: "r"(mask), "r"(increment), "r"(ptr)
> +	: "memory", "cc");
> +	return result;
> +}

I'm not sure that the barrier semantics here are sufficient.

> +/*
> + * Generic implementation of pg_atomic_fetch_mask_add_u32() via a loop
> + * of compare & exchange.
> + */
> +static inline uint32
> +pg_atomic_fetch_mask_add_u32_impl(volatile pg_atomic_uint32 *ptr,
> +								  uint32 mask_, uint32 add_)
> +{
> +	uint32		old_value;
> +
> +	/*
> +	 * Read once outside the loop; later iterations will get the newer value
> +	 * via compare & exchange.
> +	 */
> +	old_value = pg_atomic_read_u32_impl(ptr);
> +
> +	/* loop until we've determined whether we could make an increment or not */
> +	while (true)
> +	{
> +		uint32		desired_value;
> +		bool		free;
> +
> +		desired_value = old_value;
> +		free = (old_value & mask_) == 0;
> +		if (free)
> +			desired_value += add_;
> +
> +		/*
> +		 * Attempt to swap in the value we are expecting.  If we didn't see
> +		 * the masked bits as clear, that's just the old value.  If we saw
> +		 * them as clear, we'll attempt to make an increment.  The reason we
> +		 * always swap in a value is that this doubles as a memory barrier.
> +		 * We could try to be smarter and only swap in values if we saw the
> +		 * masked bits as clear, but benchmarks haven't shown that to be
> +		 * beneficial so far.
> +		 *
> +		 * Retry if the value changed since we last looked at it.
> +		 */
> +		if (pg_atomic_compare_exchange_u32_impl(ptr, &old_value,
> +												 desired_value))
> +			return old_value;
> +	}
> +	pg_unreachable();
> +}
> +#endif

Have you done x86 benchmarking?

Greetings,

Andres Freund


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers