On Wed, Dec 5, 2012 at 4:42 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Dec 05, 2012 at 03:13:20PM +0400, Dmitry Vyukov wrote:
>> I hope I addressed all your comments in this revision.
>> I've fixed nand atomic operation, made atomic operations atomic again
>> and added visibility attribute to interface functions.
>
> Mostly, still, __sync_lock_test_and_set isn't full barrier unlike most other
> __sync_* builtins, so either you'd need to use __atomic_exchange_n if
> available (you can e.g. guard it with #ifdef __ATOMIC_SEQ_CST), or
> if not available also use __sync_synchronize ().
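Concretely, I read the suggestion as something like this (a sketch; it
assumes that #ifdef __ATOMIC_SEQ_CST is a reliable way to detect the
gcc 4.7 __atomic builtins):

template<typename T>
T func_xchg(volatile T *v, T op) {
#ifdef __ATOMIC_SEQ_CST
  // gcc >= 4.7: a seq_cst exchange is already a full barrier.
  return __atomic_exchange_n(v, op, __ATOMIC_SEQ_CST);
#else
  // Older gcc: __sync_lock_test_and_set is only an acquire barrier,
  // so issue a full barrier by hand after it.
  T res = __sync_lock_test_and_set(v, op);
  __sync_synchronize();
  return res;
#endif
}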
__atomic_xxx are available since gcc 4.7, and we use gcc 4.4 on some
bots. I do not want to clutter the code with macros too much, so I will
add:

template<typename T>
T func_xchg(volatile T *v, T op) {
-  return __sync_lock_test_and_set(v, op);
+  T res = __sync_lock_test_and_set(v, op);
+  // __sync_lock_test_and_set does not provide a full barrier.
+  __sync_synchronize();
+  return res;
}

> Also, for the atomic load and store, it would be nice to use __atomic_load_n
> and __atomic_store_n if available (as fallback __sync_synchronize (), which
> I believe needs to be before resp. after the memory operation depending on
> whether it is load or store).

It would be nice to have atomic operations in C/C++ from day one; it
would make our lives so much easier :)

My current code already contains additional memory barriers:

template<typename T>
static T AtomicLoad(ThreadState *thr, uptr pc, const volatile T *a,
    morder mo) {
  ...
  T v = *a;
  ...
  __sync_synchronize();
  ...
}

template<typename T>
static void AtomicStore(ThreadState *thr, uptr pc, volatile T *a, T v,
    morder mo) {
  ...
  __sync_synchronize();
  ...
  *a = v;
  ...
}

OK, I see what you mean. There is a missing trailing barrier after the
store. I will add:

   thr->clock.ReleaseStore(&s->clock);
   *a = v;
   s->mtx.Unlock();
+  // Trailing memory barrier to provide sequential consistency
+  // for Dekker-like store-load synchronization.
+  __sync_synchronize();
 }

Actually, on x86 (the only platform that we support now), the mutex
provides full barriers on both lock and unlock (because it's based on
atomic RMW) :)

> And we'll need to tweak libsanitizer makefiles, so it uses -mcx16 on x86_64
> 64-bit when compiling that tsan_atomic*.cc, either we just ignore for
> libsanitizer the pre-CX16 AMD CPUs, those were just a few initial models
> I think, or we'd need to do IFUNC dispatcher, or perhaps just one time
> cpuid check that would see if it can do 16-byte atomics or not.

I think we just add -mcx16.

Thanks!
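P.S. To make the Dekker-like store-load point concrete, here is the
classic pattern that the trailing barrier is needed for (a minimal
sketch; the flags and functions are made-up names, not tsan code):

volatile int flag0 = 0;
volatile int flag1 = 0;

void Thread0() {
  flag0 = 1;
  __sync_synchronize();  // full barrier between my store and my load
  if (flag1 == 0) {
    // critical section
  }
}

void Thread1() {
  flag1 = 1;
  __sync_synchronize();
  if (flag0 == 0) {
    // critical section
  }
}

Without the barriers, x86 store buffering allows both loads to return
0, so both threads can enter the critical section at the same time.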
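P.P.S. If we ever do need to care about the pre-CX16 AMD CPUs, the
one-time cpuid check you mention could look roughly like this (a
sketch using gcc's <cpuid.h>; HaveCX16 is a made-up name):

#include <cpuid.h>

// Returns true iff the CPU supports the CMPXCHG16B instruction,
// i.e. 16-byte atomic operations are available.
static bool HaveCX16() {
  unsigned eax, ebx, ecx, edx;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;
  return (ecx & bit_CMPXCHG16B) != 0;
}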