On Wed, Dec 5, 2012 at 4:42 PM, Jakub Jelinek <[email protected]> wrote:
> On Wed, Dec 05, 2012 at 03:13:20PM +0400, Dmitry Vyukov wrote:
>> I hope I addressed all your comments in this revision.
>> I've fixed the nand atomic operation, made atomic operations atomic
>> again, and added the visibility attribute to interface functions.
>
> Mostly, still, __sync_lock_test_and_set isn't a full barrier, unlike most
> other __sync_* builtins, so either you'd need to use __atomic_exchange_n
> if available (you can e.g. guard it with #ifdef __ATOMIC_SEQ_CST), or,
> if it is not available, also use __sync_synchronize().
The __atomic_xxx builtins are only available since GCC 4.7, and we use
GCC 4.4 on some bots. I do not want to clutter the code with too many
macros.
I will add:
template<typename T> T func_xchg(volatile T *v, T op) {
-  return __sync_lock_test_and_set(v, op);
+  T res = __sync_lock_test_and_set(v, op);
+  // __sync_lock_test_and_set is only an acquire barrier, not a full one.
+  __sync_synchronize();
+  return res;
}
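For reference, Jakub's #ifdef __ATOMIC_SEQ_CST guard would look roughly
like this (a sketch only, assuming __ATOMIC_SEQ_CST is defined exactly
when the __atomic_* builtins are available, i.e. GCC >= 4.7):

template<typename T> T func_xchg(volatile T *v, T op) {
#ifdef __ATOMIC_SEQ_CST
  // GCC >= 4.7: a seq_cst exchange is already a full barrier.
  return __atomic_exchange_n(v, op, __ATOMIC_SEQ_CST);
#else
  // __sync_lock_test_and_set is only an acquire barrier,
  // so issue a full barrier explicitly.
  T res = __sync_lock_test_and_set(v, op);
  __sync_synchronize();
  return res;
#endif
}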
> Also, for the atomic load and store, it would be nice to use __atomic_load_n
> and __atomic_store_n if available (as a fallback, __sync_synchronize(),
> which I believe needs to go before or after the memory operation,
> depending on whether it is a load or a store).
It would have been nice to have atomic operations in C/C++ from day one;
it would make our lives so much easier :)
My current code already contains an additional memory barrier:
template<typename T>
static T AtomicLoad(ThreadState *thr, uptr pc, const volatile T *a,
    morder mo) {
  ...
  T v = *a;
  ...
  __sync_synchronize();
  ...
}

template<typename T>
static void AtomicStore(ThreadState *thr, uptr pc, volatile T *a, T v,
    morder mo) {
  ...
  __sync_synchronize();
  ...
  *a = v;
  ...
}
OK, I see what you mean. There is a missing trailing barrier. I will add:
  thr->clock.ReleaseStore(&s->clock);
  *a = v;
  s->mtx.Unlock();
+  // Trailing memory barrier to provide sequential consistency
+  // for Dekker-like store-load synchronization.
+  __sync_synchronize();
}
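With the __atomic_* builtins available, both paths would collapse to
something like the sketch below (atomic_load_impl/atomic_store_impl are
placeholder names; the fallback branches keep the barrier placement from
the code above):

template<typename T> T atomic_load_impl(const volatile T *a) {
#ifdef __ATOMIC_SEQ_CST
  return __atomic_load_n(a, __ATOMIC_SEQ_CST);
#else
  T v = *a;
  __sync_synchronize();
  return v;
#endif
}

template<typename T> void atomic_store_impl(volatile T *a, T v) {
#ifdef __ATOMIC_SEQ_CST
  __atomic_store_n(a, v, __ATOMIC_SEQ_CST);
#else
  __sync_synchronize();
  *a = v;
  // Trailing barrier for Dekker-like store-load synchronization.
  __sync_synchronize();
#endif
}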
Actually, on x86 (the only platform we support at the moment), the mutex
already provides full barriers on both lock and unlock (because it is
based on atomic RMW operations) :)
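To illustrate the Dekker-like pattern the comment refers to: each thread
stores its own flag and then loads the other thread's flag. Without a
full barrier between the store and the load, x86 store buffering allows
both loads to return 0 (a sketch with placeholder flags):

volatile int flag0 = 0, flag1 = 0;

// Thread 0:              // Thread 1:
flag0 = 1;                flag1 = 1;
__sync_synchronize();     __sync_synchronize();
int r0 = flag1;           int r1 = flag0;

// With the barriers, r0 == 0 && r1 == 0 is impossible;
// without them, both threads can miss each other's store.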
> And we'll need to tweak the libsanitizer makefiles so that they use -mcx16
> on 64-bit x86_64 when compiling tsan_atomic*.cc. Either we just ignore
> the pre-CX16 AMD CPUs for libsanitizer (those were just a few initial
> models, I think), or we'd need an IFUNC dispatcher, or perhaps just a
> one-time cpuid check that would see whether it can do 16-byte atomics.
I think we just add -mcx16.
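For reference, the one-time cpuid check Jakub mentions could look roughly
like this (a sketch assuming GCC's <cpuid.h>, which provides __get_cpuid
and the bit_CMPXCHG16B constant):

#include <cpuid.h>

// True if the CPU supports CMPXCHG16B (16-byte atomics):
// CPUID leaf 1, ECX bit 13.
static bool HasCmpxchg16b() {
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;
  return (ecx & bit_CMPXCHG16B) != 0;
}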
Thanks!