On Wed, Dec 5, 2012 at 4:42 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Dec 05, 2012 at 03:13:20PM +0400, Dmitry Vyukov wrote:
>> I hope I addressed all your comments in this revision.
>> I've fixed nand atomic operation, made atomic operations atomic again
>> and added visibility attribute to interface functions.
>
> Mostly, still, __sync_lock_test_and_set isn't a full barrier, unlike most
> other __sync_* builtins, so either you'd need to use __atomic_exchange_n if
> available (you can e.g. guard it with #ifdef __ATOMIC_SEQ_CST), or
> if not available also use __sync_synchronize ().


The __atomic_xxx builtins are only available since gcc 4.7, and we use
gcc 4.4 on some bots. I do not want to clutter the code with too many macros.
I will add:

 template<typename T> T func_xchg(volatile T *v, T op) {
-  return __sync_lock_test_and_set(v, op);
+  T res = __sync_lock_test_and_set(v, op);
+  // __sync_lock_test_and_set is only an acquire barrier, not a full one.
+  __sync_synchronize();
+  return res;
 }
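
For reference, the gcc >= 4.7 path you mention could be guarded as
below. Just a sketch, not what I plan to commit, since we still build
with 4.4:

template<typename T> T func_xchg(volatile T *v, T op) {
#ifdef __ATOMIC_SEQ_CST
  // gcc >= 4.7: __atomic_exchange_n with SEQ_CST is a full barrier.
  return __atomic_exchange_n(v, op, __ATOMIC_SEQ_CST);
#else
  // gcc 4.4: __sync_lock_test_and_set is only an acquire barrier,
  // so issue the full barrier explicitly.
  T res = __sync_lock_test_and_set(v, op);
  __sync_synchronize();
  return res;
#endif
}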


> Also, for the atomic load and store, it would be nice to use __atomic_load_n
> and __atomic_store_n if available (as a fallback, __sync_synchronize (), which
> I believe needs to be before resp. after the memory operation depending on
> whether it is load or store).

It would have been nice to have atomic operations in C/C++ from day one;
it would make our lives so much easier :)
My current code already contains some additional memory barriers:

template<typename T>
static T AtomicLoad(ThreadState *thr, uptr pc, const volatile T *a,
    morder mo) {
...
  T v = *a;  // plain load
...
  __sync_synchronize();  // full barrier after the load
...
}

template<typename T>
static void AtomicStore(ThreadState *thr, uptr pc, volatile T *a, T v,
    morder mo) {
...
  __sync_synchronize();  // full barrier before the store
...
  *a = v;  // plain store
...
}

OK, I see what you mean. There is a missing trailing barrier. I will add:

   thr->clock.ReleaseStore(&s->clock);
   *a = v;
   s->mtx.Unlock();
+  // Trailing memory barrier to provide sequential consistency
+  // for Dekker-like store-load synchronization.
+  __sync_synchronize();
 }
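
Spelled out, the load/store scheme you describe would be roughly the
sketch below (the function names are mine; the #else branches are the
fence placement we have now):

template<typename T>
static T SeqCstLoad(const volatile T *a) {
#ifdef __ATOMIC_SEQ_CST
  return __atomic_load_n(a, __ATOMIC_SEQ_CST);
#else
  T v = *a;
  __sync_synchronize();  // full barrier after the load
  return v;
#endif
}

template<typename T>
static void SeqCstStore(volatile T *a, T v) {
#ifdef __ATOMIC_SEQ_CST
  __atomic_store_n(a, v, __ATOMIC_SEQ_CST);
#else
  __sync_synchronize();  // full barrier before the store
  *a = v;
  __sync_synchronize();  // and after it, for store-load ordering
#endif
}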

Actually, on x86 (the only platform we support now) the mutex already
provides a full barrier on both lock and unlock (because it's based on
an atomic RMW) :)
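
For the record, the store-load pattern I mean is the classic
Dekker-style handshake. A toy sketch (not tsan code):

volatile int flag1 = 0, flag2 = 0;

void Thread1() {
  flag1 = 1;
  __sync_synchronize();  // keep the store ordered before the load
  if (flag2 == 0) {
    // critical section
  }
}

void Thread2() {
  flag2 = 1;
  __sync_synchronize();
  if (flag1 == 0) {
    // critical section
  }
}

Without the barriers x86 can reorder each store after the other
thread's load, so both threads may read 0 and both enter the critical
section.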

> And we'll need to tweak libsanitizer makefiles, so it uses -mcx16 on x86_64
> 64-bit when compiling that tsan_atomic*.cc, either we just ignore for
> libsanitizer the pre-CX16 AMD CPUs, those were just a few initial models
> I think, or we'd need to do IFUNC dispatcher, or perhaps just one time
> cpuid check that would see if it can do 16-byte atomics or not.

I think we just add -mcx16.
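
For 16-byte atomics the builtins expand to lock cmpxchg16b only when
-mcx16 is on. Roughly, this is what has to compile (a sketch, assuming
an __int128-based a128 typedef):

typedef __int128 a128;

// Needs -mcx16 on x86_64; without it gcc has no inline expansion for
// the 16-byte __sync builtins.
a128 cas128(volatile a128 *v, a128 cmp, a128 xch) {
  return __sync_val_compare_and_swap(v, cmp, xch);
}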

Thanks!
