Hi,

On 2017-03-31 13:38:31 +0300, Alexander Korotkov wrote:
> > It seems that on this platform the definition of atomics should be
> > provided by fallback.h.  But it doesn't, because I already defined
> > PG_HAVE_ATOMIC_U32_SUPPORT in arch-ppc.h.  I think in this case we
> > shouldn't provide a ppc-specific implementation of
> > pg_atomic_fetch_mask_add_u32().  However, I don't know how to do this,
> > given that arch-ppc.h is included before the compiler-specific headers.
> > Thus, in arch-ppc.h we don't yet know whether we will find an
> > implementation of atomics for this platform.  One possible solution is to
> > provide an assembly implementation for all atomics in arch-ppc.h.

That'd probably not be a good use of effort.
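For anybody following along, the include order in question is roughly this
(a simplified sketch of src/include/port/atomics.h; other architectures,
compilers and the surrounding #if conditions are elided):

/* 1) the architecture specific header comes first, on PowerPC: */
#include "port/atomics/arch-ppc.h"

/* 2) then the compiler specific ones, e.g. for gcc-like compilers: */
#include "port/atomics/generic-gcc.h"

/* 3) and finally fallbacks for whatever is still missing: */
#include "port/atomics/fallback.h"
#include "port/atomics/generic.h"

So at the point arch-ppc.h is processed it can't yet know whether a
compiler-provided implementation will show up later.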
> BTW, an implementation of all atomics in arch-ppc.h would be too invasive
> and shouldn't be considered for v10.
> However, I made the following workaround: declare pg_atomic_uint32 and
> pg_atomic_fetch_mask_add_u32_impl() only when we know that generic-gcc.h
> will declare gcc-based atomics.
> Could you please check it on Apple PPC?

That doesn't sound crazy to me.

> +/*
> + * pg_atomic_fetch_mask_add_u32 - atomically check masked bits in a
> + * variable, and if they are clear, add to the variable.

Hm, expanding on this wouldn't hurt.

> + * Returns the value of ptr before the atomic operation.

Sounds a bit like it'd return the pointer itself, not what it points to...

> + * Full barrier semantics.

Does it actually have full barrier semantics?  Based on a quick skim I don't
think the ppc implementation does.
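For reference, the contract I'd read that comment as describing, leaving
barriers aside, is roughly the following, executed as one atomic step (just a
sketch of the intended semantics with a made-up name, not the patch's code):

static inline uint32
fetch_mask_add_contract(volatile pg_atomic_uint32 *ptr, uint32 mask, uint32 add)
{
	uint32		old = ptr->value;	/* value pointed to, before the operation */

	if ((old & mask) == 0)			/* add only if the masked bits are clear */
		ptr->value = old + add;

	return old;						/* old value is returned either way */
}

I.e. the old *value* is returned in both cases, and the caller has to re-test
(old & mask) to know whether the addition actually happened.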
> +#if defined(HAVE_ATOMICS) \
> +	&& (defined(HAVE_GCC__ATOMIC_INT32_CAS) || defined(HAVE_GCC__SYNC_INT32_CAS))
> +
> +/*
> + * Declare the pg_atomic_uint32 structure before generic-gcc.h does, in
> + * order to use it in function arguments.
> + */

If we do this, then the other side needs a pointer back here, so the two
declarations stay in sync.

> +#define PG_HAVE_ATOMIC_U32_SUPPORT
> +typedef struct pg_atomic_uint32
> +{
> +	volatile uint32 value;
> +} pg_atomic_uint32;
> +
> +/*
> + * Optimized implementation of pg_atomic_fetch_mask_add_u32() for Power
> + * processors.  Atomic operations on Power processors are implemented using
> + * optimistic locking: 'lwarx' makes a reservation, but that reservation can
> + * be broken before the matching 'stwcx.', and then we have to retry.  Thus,
> + * each CAS operation is a loop, and a loop of CAS operations becomes a
> + * two-level nested loop.  Experiments on multicore Power machines show that
> + * we can get a huge benefit from doing this operation in a single loop in
> + * assembly.
> + */
> +#define PG_HAVE_ATOMIC_FETCH_MASK_ADD_U32
> +static inline uint32
> +pg_atomic_fetch_mask_add_u32_impl(volatile pg_atomic_uint32 *ptr,
> +								  uint32 mask, uint32 increment)
> +{
> +	uint32		result,
> +				tmp;
> +
> +	__asm__ __volatile__(
> +	/* read *ptr and reserve index */
> +#ifdef USE_PPC_LWARX_MUTEX_HINT
> +	"	lwarx	%0,0,%5,1	\n"
> +#else
> +	"	lwarx	%0,0,%5		\n"
> +#endif

Hm, we should remove that hint bit stuff at some point...

> +	"	and		%1,%0,%3	\n"	/* calculate '*ptr & mask' */
> +	"	cmpwi	%1,0		\n"	/* compare '*ptr & mask' vs 0 */
> +	"	bne-	$+16		\n"	/* exit on '*ptr & mask != 0' */
> +	"	add		%1,%0,%4	\n"	/* calculate '*ptr + increment' */
> +	"	stwcx.	%1,0,%5		\n"	/* try to store '*ptr + increment' into *ptr */
> +	"	bne-	$-24		\n"	/* retry if the reservation was broken */
> +#ifdef USE_PPC_LWSYNC
> +	"	lwsync	\n"
> +#else
> +	"	isync	\n"
> +#endif
> +	: "=&r"(result), "=&r"(tmp), "+m"(*ptr)
> +	: "r"(mask), "r"(increment), "r"(ptr)
> +	: "memory", "cc");
> +	return result;
> +}

I'm not sure that the barrier semantics here are sufficient.

> +/*
> + * Generic implementation of pg_atomic_fetch_mask_add_u32() via a loop
> + * of compare & exchange.
> + */
> +static inline uint32
> +pg_atomic_fetch_mask_add_u32_impl(volatile pg_atomic_uint32 *ptr,
> +								  uint32 mask_, uint32 add_)
> +{
> +	uint32		old_value;
> +
> +	/*
> +	 * Read once outside the loop; later iterations will get the newer value
> +	 * via compare & exchange.
> +	 */
> +	old_value = pg_atomic_read_u32_impl(ptr);
> +
> +	/* loop until we've determined whether we could make an increment or not */
> +	while (true)
> +	{
> +		uint32		desired_value;
> +		bool		free;
> +
> +		desired_value = old_value;
> +		free = (old_value & mask_) == 0;
> +		if (free)
> +			desired_value += add_;
> +
> +		/*
> +		 * Attempt to swap in the value we are expecting.  If we didn't see
> +		 * the masked bits as clear, that's just the old value.  If we saw
> +		 * them as clear, we'll attempt to make an increment.  The reason we
> +		 * always swap in a value is that this doubles as a memory barrier.
> +		 * We could try to be smarter and only swap in values if we saw the
> +		 * masked bits as clear, but benchmarks haven't shown that to be
> +		 * beneficial so far.
> +		 *
> +		 * Retry if the value changed since we last looked at it.
> +		 */
> +		if (pg_atomic_compare_exchange_u32_impl(ptr, &old_value,
> +												 desired_value))
> +			return old_value;
> +	}
> +	pg_unreachable();
> +}
> +#endif

Have you done x86 benchmarking?

Greetings,

Andres Freund


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers