Acked-by: Marek Olšák <marek.ol...@amd.com> Somebody else should review the configure.ac change.
Marek On Tue, Oct 4, 2016 at 4:14 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote: > From: Nicolai Hähnle <nicolai.haeh...@amd.com> > > This is motivated by the fact that p_atomic_read and p_atomic_set may > somewhat surprisingly not do the right thing in the old version: while > stores and loads are de facto atomic at least on x86, the compiler may > apply re-ordering and speculation quite liberally. Basically, the old > version uses the "relaxed" memory ordering. > > The new ordering always uses acquire/release ordering. This is the > strongest possible memory ordering that doesn't require additional > fence instructions on x86. (And the only stronger ordering is > "sequentially consistent", which is usually more than you need anyway.) > > I would feel more comfortable if p_atomic_set/read in the old > implementation were at least using volatile loads and stores, but I > don't see a way to get there without typeof (which we cannot use here > since the code is compiled with -std=c99). > > Eventually, we should really just move to something that is based on > the atomics in C11 / C++11. > --- > configure.ac | 11 +++++++++++ > src/util/u_atomic.h | 21 +++++++++++++++++++++ > 2 files changed, 32 insertions(+) > > diff --git a/configure.ac b/configure.ac > index 1bfac3b..421f4f3 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -380,20 +380,31 @@ int main () { > c = _mm_max_epu32(a, b); > return _mm_cvtsi128_si32(c); > }]])], SSE41_SUPPORTED=1) > CFLAGS="$save_CFLAGS" > if test "x$SSE41_SUPPORTED" = x1; then > DEFINES="$DEFINES -DUSE_SSE41" > fi > AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1]) > AC_SUBST([SSE41_CFLAGS], $SSE41_CFLAGS) > > +dnl Check for new-style atomic builtins > +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ > +int main() { > + int n; > + return __atomic_load_n(&n, __ATOMIC_ACQUIRE); > +}]])], GCC_ATOMIC_BUILTINS_SUPPORTED=1) > +if test "x$GCC_ATOMIC_BUILTINS_SUPPORTED" = x1; then > + DEFINES="$DEFINES -DUSE_GCC_ATOMIC_BUILTINS" > +fi > +AM_CONDITIONAL([GCC_ATOMIC_BUILTINS_SUPPORTED], [test > x$GCC_ATOMIC_BUILTINS_SUPPORTED = x1]) > + > dnl Check for Endianness > AC_C_BIGENDIAN( > little_endian=no, > little_endian=yes, > little_endian=no, > little_endian=no > ) > > dnl Check for POWER8 Architecture > PWR8_CFLAGS="-mpower8-vector" > diff --git a/src/util/u_atomic.h b/src/util/u_atomic.h > index 8675903..2a5bbae 100644 > --- a/src/util/u_atomic.h > +++ b/src/util/u_atomic.h > @@ -29,28 +29,49 @@ > #error "Unsupported platform" > #endif > > > /* Implementation using GCC-provided synchronization intrinsics > */ > #if defined(PIPE_ATOMIC_GCC_INTRINSIC) > > #define PIPE_ATOMIC "GCC Sync Intrinsics" > > +#if defined(USE_GCC_ATOMIC_BUILTINS) > + > +/* The builtins with explicit memory model are available since GCC 4.7. */ > +#define p_atomic_set(_v, _i) __atomic_store_n((_v), (_i), __ATOMIC_RELEASE) > +#define p_atomic_read(_v) __atomic_load_n((_v), __ATOMIC_ACQUIRE) > +#define p_atomic_dec_zero(v) (__atomic_sub_fetch((v), 1, __ATOMIC_ACQ_REL) > == 0) > +#define p_atomic_inc(v) (void) __atomic_add_fetch((v), 1, __ATOMIC_ACQ_REL) > +#define p_atomic_dec(v) (void) __atomic_sub_fetch((v), 1, __ATOMIC_ACQ_REL) > +#define p_atomic_add(v, i) (void) __atomic_add_fetch((v), (i), > __ATOMIC_ACQ_REL) > +#define p_atomic_inc_return(v) __atomic_add_fetch((v), 1, __ATOMIC_ACQ_REL) > +#define p_atomic_dec_return(v) __atomic_sub_fetch((v), 1, __ATOMIC_ACQ_REL) > + > +#else > + > #define p_atomic_set(_v, _i) (*(_v) = (_i)) > #define p_atomic_read(_v) (*(_v)) > #define p_atomic_dec_zero(v) (__sync_sub_and_fetch((v), 1) == 0) > #define p_atomic_inc(v) (void) __sync_add_and_fetch((v), 1) > #define p_atomic_dec(v) (void) __sync_sub_and_fetch((v), 1) > #define p_atomic_add(v, i) (void) __sync_add_and_fetch((v), (i)) > #define p_atomic_inc_return(v) __sync_add_and_fetch((v), 1) > #define p_atomic_dec_return(v) __sync_sub_and_fetch((v), 1) > + > +#endif > + > +/* There is no __atomic_* compare and exchange that returns the current > value. > + * Also, GCC 5.4 seems unable to optimize a compound statement expression > that > + * uses an additional stack variable with __atomic_compare_exchange[_n]. > + */ > #define p_atomic_cmpxchg(v, old, _new) \ > __sync_val_compare_and_swap((v), (old), (_new)) > > #endif > > > > /* Unlocked version for single threaded environments, such as some > * windows kernel modules. > */ > -- > 2.7.4 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev