On Sat, Nov 24, 2012 at 3:06 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
>
> On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> > On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
> >> > This patch attempts to instrument __atomic_* and __sync_* builtins.
> >> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> >> > replacements, tsan uses weirdo memory model encoding (instead of values
> >> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> >> > one memory model at the same time), so even for the easy cases GCC
> >> > has to transform them.
> >>
> >>
> >> gcc must be using old version of the library. I've switched to ABI
> >> constants some time ago.
> >
> > Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
> > ago. Guess I'll need to update the patch. So, it now expects 0 for relaxed
> > up to 5 for sequentially consistent?
>
> Yes.
>
>
> >> > More importantly, there is no way to pass through
> >> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE. Right now the patch just
> >> > gives
> >> > up and doesn't replace anything if e.g.
> >> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> >> > is seen, perhaps one could ignore the hle bits which are just an
> >> > optimization, but there is no way to find out in the generic code
> >> > whether the extra bits are just an optimization or change the behavior
> >> > of the builtin.
> >> >
> >>
> >> Do you mean hardware lock elission? oh, boy
> >
> > Yes.
> >
> >> It's not just "an atomic". It should be legal to downgrade them to plain
> >> atomic ops (however, I am not sure what memory order they must have... is
> >> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
> >> that's what we need to do.
> >
> > Perhaps if there wasn't the compatibility hack or what is that 100500
> > comparison, or if it could be tweaked, we could pass through the HLE bits
> > too and let the library decide what to do with it.
>
> 100500 was a tricky multi-step migration to the new scheme, because we
> can't deploy compiler and runtime atomically.
> I think we just need to drop HLE bits. I don't know what the runtime
> can possibly do with them. And I don't want one more tricky multi-step
> migration.
>
>
> >
> > If HLE bits are set, the low order bits (model & 65535) contains the
> > normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
> > (hle acquire) or 131072 (hle release) is ored with that.
> >
> >> Well, it's a dirty implementation that relies on x86 memory model (and no
> >> compiler optimizations, well, apparently there are data races :)).
> >> I think eventually I will just replace them with mutexes.
> >
> > I don't see why mutexes would be better than just plain __atomic_* builtins.
> > With mutexes there wouldn't be any atomicity...
>
> For race detector any atomic operations is a heavy operations, which
> is not atomic anyway.
> Currently I do:
>
> update vector clock
> do the atomic operation
> update vector clock
>
> where 'update vector clock' is
> lock container mutex
> find atomic descriptor
> lock atomic descriptor
> unlock container mutex
> update clock (O(N))
> unlock atomic descriptor
>
> it's much wiser to combine 2 vector clock updates and do the atomic
> operation itself under the atomic descriptor mutex.
>
>
>
> >> I've just committed a patch to llvm with failure_memory_order (r168518).
> >
> > Ok, I'll adjust the patch to pass both memory models through then.
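
As an aside on the encoding discussed above: if the HLE hints are simply
dropped, as Dmitry suggests, the order the runtime sees is just the low bits
of the model argument.  A minimal sketch of that masking follows; the constant
names are made up for the example, they are not GCC's internal constants.

/* Sketch only: per the encoding above, the standard memory order
   (0 = relaxed .. 5 = seq_cst) sits in the low 16 bits and either
   65536 (HLE acquire) or 131072 (HLE release) is ORed on top.  */
enum
{
  MODEL_MASK      = 65535,
  HLE_ACQUIRE_BIT = 65536,
  HLE_RELEASE_BIT = 131072
};

/* Drop the HLE hints, keep only the order that would be handed to the
   __tsan_atomic* entry points.  */
static int
strip_hle_bits (int model)
{
  return model & MODEL_MASK;
}

/* E.g. __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE comes back out as plain
   __ATOMIC_RELEASE (3).  */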
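
And the combined scheme described above (one critical section around the whole
operation instead of two separate vector clock updates) could look roughly
like the sketch below.  Descriptor, find_descriptor and update_clock are
placeholders invented for the example, not tsan's real types.

#include <atomic>
#include <mutex>
#include <unordered_map>

struct Descriptor
{
  std::mutex mtx;   /* protects this atomic's vector clock state */
};

static std::mutex container_mtx;
static std::unordered_map<void *, Descriptor> container;

static Descriptor *
find_descriptor (void *addr)
{
  std::lock_guard<std::mutex> lock (container_mtx);  /* lock container mutex */
  return &container[addr];                           /* find atomic descriptor */
}

static void
update_clock (Descriptor *)
{
  /* the O(N) vector clock acquire/release would happen here */
}

unsigned long
instrumented_fetch_add (std::atomic<unsigned long> *a, unsigned long v)
{
  Descriptor *d = find_descriptor (a);
  std::lock_guard<std::mutex> lock (d->mtx);  /* one descriptor lock for everything */
  update_clock (d);                           /* first clock update */
  unsigned long res = a->fetch_add (v);       /* the atomic operation itself */
  update_clock (d);                           /* second clock update */
  return res;
}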
> >
> >> Yeah, I think it's better to transform them to a more standard ones (llvm
> >> also has own weird atomic ops and own memory orders).
> > Ok, no change on the GCC side then needed for that beyond what I posted.
> >> > Oh, and there are no 16-byte atomics provided by libtsan, the library
> >> > would need to be built with -mcx16 on x86_64 for that and have the
> >> > additional entry points for unsigned __int128. The patch doesn't touch
> >> > the 16-byte atomics, leaves them as is.
> >> I think that's fine for now.
> > Perhaps. Note that such atomics are used also e.g. for #pragma omp atomic
> > on long double etc.
> >> That's what llvm does as well. But it inserts a fast path before
> >> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
> >> same?
> >> tsan intercepts __cxa_guard functions.
> > Yes, except it isn't __atomic_load_*, but plain memory read.
> > _3 = MEM[(char *)&_ZGVZ3foovE1a];
> > if (_3 == 0)
> > goto <bb 3>;
> > else
> > goto <bb 8>;
> >
> > <bb 8>:
> > fast path, whatever;
> >
> > <bb 3>:
> > _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
> > ...
> >
> > So, right now tsan would just instrument it as __tsan_read1 from
> > &_ZGVZ3foovE1a rather than any atomic load.
>
>
> Looks like a bug. That needs to be load-acquire with proper compiler
> and hardware memory ordering.
Can anybody confirm whether it's a bug or not? It can also involve compiler reorderings, especially if the object's ctor/functions contain accesses to other global objects.
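
For context, the fast path in question boils down to something like the sketch
below, assuming the usual Itanium C++ ABI guard layout; foo, a and the
initializer value are placeholders for the function and local static from the
dump above.  The only change illustrated is that the first read of the guard
would be an acquire load instead of a plain byte load, which tsan could then
see as an atomic acquire load rather than a plain __tsan_read1.

/* The real declarations are in <cxxabi.h>; simplified here.  */
extern "C" int __cxa_guard_acquire (unsigned long long *);
extern "C" void __cxa_guard_release (unsigned long long *);

static unsigned long long guard;  /* stands in for _ZGVZ3foovE1a */
static int a;                     /* the guarded function-local static */

int
foo ()
{
  /* GCC currently emits a plain byte load of the guard here; the claim
     above is that it needs to be an acquire load so the initializing
     stores are ordered before any use of 'a' on the fast path.  */
  if (__atomic_load_n ((char *) &guard, __ATOMIC_ACQUIRE) == 0)
    {
      if (__cxa_guard_acquire (&guard))
        {
          a = 42;  /* placeholder for the real initializer */
          __cxa_guard_release (&guard);
        }
    }
  return a;
}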