On Fri, Nov 23, 2012 at 9:07 PM, Xinliang David Li <davi...@google.com> wrote:
> On Fri, Nov 23, 2012 at 8:39 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>>> > This patch attempts to instrument __atomic_* and __sync_* builtins.
>>> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
>>> > replacements, tsan uses weirdo memory model encoding (instead of values
>>> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
>>> > one memory model at the same time), so even for the easy cases GCC
>>> > has to transform them.
>>>
>>> gcc must be using old version of the library. I've switched to ABI
>>> constants some time ago.
>>
>> Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
>> ago. Guess I'll need to update the patch. So, it now expects 0 for relaxed
>> up to 5 for sequentially consistent?
>>
>>> > More importantly, there is no way to pass through
>>> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE. Right now the patch just gives
>>> > up and doesn't replace anything if e.g.
>>> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
>>> > is seen, perhaps one could ignore the hle bits which are just an
>>> > optimization, but there is no way to find out in the generic code
>>> > whether the extra bits are just an optimization or change the behavior
>>> > of the builtin.
>>>
>>> Do you mean hardware lock elision? oh, boy
>>
>> Yes.
>>
>>> It's not just "an atomic". It should be legal to downgrade them to plain
>>> atomic ops (however, I am not sure what memory order they must have... is
>>> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
>>> that's what we need to do.
>>
>> Perhaps if there wasn't the compatibility hack or what is that 100500
>> comparison, or if it could be tweaked, we could pass through the HLE bits
>> too and let the library decide what to do with it.
>>
>> If HLE bits are set, the low order bits (model & 65535) contain the
>> normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
>> (hle acquire) or 131072 (hle release) is ored with that.
>>
>>> Well, it's a dirty implementation that relies on x86 memory model (and no
>>> compiler optimizations, well, apparently there are data races :)).
>>> I think eventually I will just replace them with mutexes.
>>
>> I don't see why mutexes would be better than just plain __atomic_* builtins.
>> With mutexes there wouldn't be any atomicity...
>>
>>> I've just committed a patch to llvm with failure_memory_order (r168518).
>>
>> Ok, I'll adjust the patch to pass both memory models through then.
>>
>>> Yeah, I think it's better to transform them to more standard ones (llvm
>>> also has own weird atomic ops and own memory orders).
>>
>> Ok, no change on the GCC side then needed for that beyond what I posted.
>>
>>> > Oh, and there are no 16-byte atomics provided by libtsan, the library
>>> > would need to be built with -mcx16 on x86_64 for that and have the
>>> > additional entry points for unsigned __int128. The patch doesn't touch
>>> > the 16-byte atomics, leaves them as is.
>>>
>>> I think that's fine for now.
>>
>> Perhaps. Note that such atomics are used also e.g. for #pragma omp atomic
>> on long double etc.
>>
>>> That's what llvm does as well. But it inserts a fast path before
>>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>>> same?
>>> tsan intercepts __cxa_guard functions.
>>
>> Yes, except it isn't __atomic_load_*, but plain memory read.
>>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>>   if (_3 == 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 8>;
>>
>> <bb 8>:
>>   fast path, whatever;
>>
>> <bb 3>:
>>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>>   ...
>>
>> So, right now tsan would just instrument it as __tsan_read1 from
>> &_ZGVZ3foovE1a rather than any atomic load.
>>
>>> Well, yes, the compiler module must pass to the runtime all memory
>>> accesses, whatever form they have in compiler internal representation.
>>> Yes, I think I need to provide range memory access functions in runtime. I
>>> already have this issue for Go language, there are a lot of range accesses
>>> due to arrays and slices.
>>> I will add them next week.
>>
>> Ok. A slight problem then is that where the tsan pass sits right now, there
>> is no easy way to find out if the builtin call will be expanded inline or
>> not, so (similar for asan), if we instrument them in the pass, it might be
>> instrumented twice at runtime if the builtin is expanded as a library call
>> (once the added instrumentation for the builtin, once in the intercepted
>> library call). That isn't wrong, just might need slightly more resources
>> than if we ensured we only instrument the builtin if it isn't expanded
>> inline.
>
> Should inlining of those functions be disabled as if -fno-builtins is
> specified?
Yes, it sounds reasonable. Performance characteristics under tsan differ significantly, so most likely we don't care.
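
Btw, now that the memory order constants in the tsan ABI simply follow the
C++11 values (0 relaxed through 5 seq_cst), the mapping on the GCC side
should be mechanical once the HLE bits are masked off. A rough sketch of the
shape of the rewrite for a 4-byte load and compare-and-swap (the __tsan_*
entry point names are from tsan_interface_atomic.h; the helper and wrapper
names here are made up, and the real pass of course does this on GIMPLE and
picks the entry point by access size):

  typedef int a32;
  typedef int morder;  /* 0 relaxed ... 5 seq_cst, same values as __ATOMIC_*  */

  extern a32 __tsan_atomic32_load (const volatile a32 *a, morder mo);
  extern int __tsan_atomic32_compare_exchange_strong (volatile a32 *a, a32 *c,
                                                      a32 v, morder mo,
                                                      morder fmo);

  /* Strip the HLE hint bits (65536 / 131072); the low bits already match.  */
  static morder to_tsan_order (int gcc_model) { return gcc_model & 65535; }

  a32 load_acquire (a32 *p)
  {
    /* was: __atomic_load_n (p, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)  */
    return __tsan_atomic32_load (p, to_tsan_order (__ATOMIC_ACQUIRE | 65536));
  }

  int cas_acq_rel (a32 *p, a32 *expected, a32 desired)
  {
    /* was: __atomic_compare_exchange_n (p, expected, desired, 0,
             __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE); the failure order is
             passed through separately after r168518.  */
    return __tsan_atomic32_compare_exchange_strong (p, expected, desired,
                                                    to_tsan_order (__ATOMIC_ACQ_REL),
                                                    to_tsan_order (__ATOMIC_ACQUIRE));
  }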
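
On the static init fast path: _ZGVZ3foovE1a in the dump above is just the
Itanium C++ ABI guard variable for a function-local static, i.e. something
along the lines of (bar() here is a hypothetical initializer that forces
dynamic initialization):

  int bar ();                 // anything that forces dynamic initialization
  int foo ()
  {
    static int a = bar ();    // the guard for this 'a' is _ZGVZ3foovE1a
    return a;
  }

So the fast-path check is a plain 1-byte load of that guard, which is why it
currently shows up as an ordinary __tsan_read1 instead of an acquire load.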
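
And for the range accesses, what I plan to add to the runtime looks roughly
like this (the names and exact signatures are not final, and
instrumented_memcpy is only for illustration); the idea is that e.g. an
instrumented memcpy or a Go slice copy becomes two range calls instead of a
per-byte __tsan_read1/__tsan_write1 loop:

  #include <string.h>

  /* Planned entry points; treat the names as tentative.  */
  void __tsan_write_range (void *addr, unsigned long size);
  void __tsan_read_range (void *addr, unsigned long size);

  /* What the instrumentation of __builtin_memcpy (dst, src, n) would
     conceptually expand to.  */
  void *instrumented_memcpy (void *dst, const void *src, unsigned long n)
  {
    __tsan_write_range (dst, n);
    __tsan_read_range ((void *) src, n);
    return memcpy (dst, src, n);
  }

If the builtin is then expanded as a library call anyway, the memcpy
interceptor will report the same range again, which, as noted above, is
redundant but not wrong.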