On Sat, Nov 24, 2012 at 3:06 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
>
> On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> > On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
> >> > This patch attempts to instrument __atomic_* and __sync_* builtins.
> >> > Unfortunately, for none of the builtins is there a 1:1 mapping to the tsan
> >> > replacements; tsan uses a weird memory model encoding (instead of values
> >> > from 0 to 5, apparently 1 << 0 to 1 << 5, as if one could have more than
> >> > one memory model at the same time), so even for the easy cases GCC
> >> > has to transform them.
> >>
> >>
> >> gcc must be using an old version of the library. I switched to ABI
> >> constants some time ago.
> >
> > Ah, it was just me looking at an llvm compiler-rt tsan checkout from a few
> > days ago.  Guess I'll need to update the patch.  So, it now expects 0 for
> > relaxed up to 5 for sequentially consistent?
>
> Yes.
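
(For reference, a minimal sketch of the mapping under discussion; the values
are the documented GCC __ATOMIC_* constants, the enum name is only
illustrative:)

  /* GCC encodes memory orders as plain integers 0..5, which is what the
     runtime now expects directly.  */
  enum memory_order_sketch {
    mo_relaxed = 0,  /* __ATOMIC_RELAXED */
    mo_consume = 1,  /* __ATOMIC_CONSUME */
    mo_acquire = 2,  /* __ATOMIC_ACQUIRE */
    mo_release = 3,  /* __ATOMIC_RELEASE */
    mo_acq_rel = 4,  /* __ATOMIC_ACQ_REL */
    mo_seq_cst = 5   /* __ATOMIC_SEQ_CST */
  };
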
>
>
> >> > More importantly, there is no way to pass through
> >> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just
> >> > gives up and doesn't replace anything if e.g.
> >> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> >> > is seen.  Perhaps one could ignore the HLE bits, which are just an
> >> > optimization, but there is no way to find out in the generic code
> >> > whether the extra bits are just an optimization or change the behavior
> >> > of the builtin.
> >> >
> >>
> >> Do you mean hardware lock elision? Oh, boy.
> >
> > Yes.
> >
> >> It's not just "an atomic". It should be legal to downgrade them to plain
> >> atomic ops (however, I am not sure what memory order they must have... is
> >> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
> >> that's what we need to do.
> >
> > Perhaps if there wasn't the compatibility hack, or whatever that 100500
> > comparison is, or if it could be tweaked, we could pass through the HLE bits
> > too and let the library decide what to do with them.
>
> The 100500 check was part of a tricky multi-step migration to the new
> scheme, because we can't deploy the compiler and runtime atomically.
> I think we just need to drop the HLE bits. I don't know what the runtime
> could possibly do with them. And I don't want one more tricky multi-step
> migration.
>
>
>
> >
> > If HLE bits are set, the low-order bits (model & 65535) contain the
> > normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
> > (hle acquire) or 131072 (hle release) is ORed with that.
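
(To make "dropping the HLE bits" concrete, a rough C sketch; the mask values
are the ones quoted above and the helper name is hypothetical:)

  /* On x86, __ATOMIC_HLE_ACQUIRE == 65536 (1 << 16) and
     __ATOMIC_HLE_RELEASE == 131072 (1 << 17); the low 16 bits carry the
     ordinary memory model 0..5.  */
  static int
  strip_hle_bits (int model)
  {
    return model & 65535;  /* keep only the standard memory order */
  }
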
> >
> >> Well, it's a dirty implementation that relies on the x86 memory model (and
> >> on no compiler optimizations -- well, apparently there are data races :)).
> >> I think eventually I will just replace them with mutexes.
> >
> > I don't see why mutexes would be better than just plain __atomic_* builtins.
> > With mutexes there wouldn't be any atomicity...
>
> For a race detector, any atomic operation is a heavy operation, which
> is not atomic anyway.
> Currently I do:
>
> update vector clock
> do the atomic operation
> update vector clock
>
> where 'update vector clock' is
> lock container mutex
> find atomic descriptor
> lock atomic descriptor
> unlock container mutex
> update clock (O(N))
> unlock atomic descriptor
>
> It's much wiser to combine the 2 vector clock updates and do the atomic
> operation itself under the atomic descriptor mutex.
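
(A rough sketch of the combined scheme; every name below is hypothetical and
only illustrates the locking order described above, not actual tsan runtime
code:)

  #include <pthread.h>

  struct atomic_desc { pthread_mutex_t mtx; /* ... vector clock ... */ };

  extern pthread_mutex_t container_mtx;
  extern struct atomic_desc *find_atomic_descriptor (void *addr);
  extern void update_vector_clock (struct atomic_desc *d);  /* O(N) */

  static unsigned long
  instrumented_fetch_add (unsigned long *addr, unsigned long v)
  {
    pthread_mutex_lock (&container_mtx);
    struct atomic_desc *d = find_atomic_descriptor (addr);
    pthread_mutex_lock (&d->mtx);
    pthread_mutex_unlock (&container_mtx);

    update_vector_clock (d);   /* first clock update */
    /* With the descriptor mutex held across the whole operation, the memory
       access itself no longer needs to be a hardware atomic.  */
    unsigned long r = *addr;
    *addr = r + v;
    update_vector_clock (d);   /* second clock update */

    pthread_mutex_unlock (&d->mtx);
    return r;
  }
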
>
>
>
> >> I've just committed a patch to llvm with failure_memory_order (r168518).
> >
> > Ok, I'll adjust the patch to pass both memory models through then.
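
(For concreteness, the kind of call where both orders now matter -- the GCC
builtin already carries a success and a failure order, and the idea is to
forward both to the replacement; variable names here are illustrative:)

  static int flag;

  void cas_example (void)
  {
    int expected = 0;
    /* Success order __ATOMIC_ACQ_REL, failure order __ATOMIC_ACQUIRE; the
       instrumented call would pass both through to the runtime.  */
    __atomic_compare_exchange_n (&flag, &expected, 1, /*weak=*/0,
                                 __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
  }
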
> >
> >> Yeah, I think it's better to transform them to more standard ones (llvm
> >> also has its own weird atomic ops and its own memory orders).
> >
> > Ok, then no change is needed on the GCC side for that beyond what I posted.
> >
> >> > Oh, and there are no 16-byte atomics provided by libtsan; the library
> >> > would need to be built with -mcx16 on x86_64 for that and have
> >> > additional entry points for unsigned __int128.  The patch doesn't touch
> >> > the 16-byte atomics and leaves them as is.
> >>
> >> I think that's fine for now.
> >
> > Perhaps.  Note that such atomics are also used e.g. for #pragma omp atomic
> > on long double etc.
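
(A minimal example of the kind of access the patch leaves uninstrumented; on
x86_64 it needs -mcx16 for a cmpxchg16b-based expansion, and the library would
need a matching unsigned __int128 entry point:)

  unsigned __int128
  load16 (unsigned __int128 *p)
  {
    /* Currently left as-is by the patch, with no tsan replacement.  */
    return __atomic_load_n (p, __ATOMIC_SEQ_CST);
  }
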
> >
> >> That's what llvm does as well. But it inserts a fast path before
> >> __cxa_guard_acquire -- an acquire-load of the lock word. Doesn't gcc do
> >> the same?
> >> tsan intercepts the __cxa_guard functions.
> >
> > Yes, except it isn't __atomic_load_*, but a plain memory read.
> >   _3 = MEM[(char *)&_ZGVZ3foovE1a];
> >   if (_3 == 0)
> >     goto <bb 3>;
> >   else
> >     goto <bb 8>;
> >
> >   <bb 8>:
> >   fast path, whatever;
> >
> >   <bb 3>:
> >   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
> >   ...
> >
> > So, right now tsan would just instrument it as __tsan_read1 from
> > &_ZGVZ3foovE1a rather than any atomic load.
>
>
> Looks like a bug. That needs to be a load-acquire with proper compiler
> and hardware memory ordering.


Can anybody confirm whether it's a bug or not?
It can also involve compiler reorderings, especially if the object's
ctor/functions contain accesses to other global objects.
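
A minimal C-level sketch of the concern; "guard" and "slow_path_init" are
illustrative stand-ins for the _ZGVZ3foovE1a guard byte and the
__cxa_guard_acquire path from the dump above:

  extern char guard;                  /* stands in for the guard variable */
  extern void slow_path_init (void);  /* __cxa_guard_acquire + ctor, schematically */

  void use_static (void)
  {
    /* Today the fast-path check is a plain load:
         if (guard == 0) slow_path_init ();
       What it arguably needs to be, so the guard load has acquire semantics
       for both the compiler and the hardware:  */
    if (__atomic_load_n (&guard, __ATOMIC_ACQUIRE) == 0)
      slow_path_init ();
    /* ...use of the guarded object, which must observe its initialization... */
  }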
