On Mon, Jun 30, 2014 at 9:26 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> I think it continues in the tradition of making s_lock.h ever harder to
> follow. But it's still better than what we have now from a correctness
> perspective.
Well, as you and I have discussed before, someday we probably need to
get rid of s_lock.h or at least refactor it heavily, but let's do one
thing at a time.  I think we're eventually going to require every
platform that wants to run PostgreSQL to have working barriers and
atomics, and then we can rewrite s_lock.h into something much shorter
and cleaner, but I am opposed to doing that today, because even if we
don't care about people running obscure proprietary compilers on weird
platforms, there are still lots of people running older GCC versions.
For right now, I think the prudent thing to do is to keep s_lock.h on
life support.

>> I think
>> newer releases of Sun Studio support that GCC way of doing a barrier,
>> but I don't know how to test for that, so at the moment that uses the
>> fallback "put it in a function" approach,
>
> Sun studio's support for the gcc way is new (some update to sun studio 12),
> so that's probably not sufficient.
> #include <mbarrier.h> and __compiler_barrier()/__machine_rel_barrier()
> should do the trick for spinlocks. That seems to have existed for
> longer.

Can you either link to relevant documentation or be more specific
about what needs to be done here?

> Solaris seems to run with TSO enabled for SPARC (whereas linux uses RMO,
> relaxed memory ordering), so it's probably fine to just use the compiler
> barrier.

If it isn't, that's a change that has nothing to do with making
spinlock operations compiler barriers and everything to do with fixing
a bug in the existing implementation.

> So you believe we have a reliable barrier implementation - but you don't
> believe it's reliable enough for spinlocks? If we'd *not* use the
> barrier fallback for spinlock release if, and only if, it's implemented
> via spinlocks, I don't see why we'd be worse off?

I can't parse this sentence, because it's not clear to me exactly
which part the "not" applies to, and I think we're talking past each
other a bit, too.

Basically, every platform we support today has a spinlock
implementation that is supposed to prevent CPU reordering across the
acquire and release operations but might not prevent compiler
reordering.  IOW, S_LOCK() should be a CPU acquire fence, and
S_UNLOCK() should be a CPU release fence.  Now, we want to make these
operations compiler fences as well, and AIUI your proposal for that is
to make NEW_S_UNLOCK(lock) = out of line function call + S_LOCK(dummy)
+ S_UNLOCK(dummy) + S_UNLOCK(lock).  My proposal is to make
NEW_S_UNLOCK(lock) = out of line function call + S_UNLOCK(lock), which
I think is strictly better.  There's zip to guarantee that adding
S_LOCK(dummy) + S_UNLOCK(dummy) is doing anything we want in there,
and it's definitely going to be worse for performance.

The other thing that I don't like about your proposal has to do with
the fact that the support matrices for barrier.h and s_lock.h are not
identical.  If s_lock.h is confusing to you today, making it further
intertwined with barrier.h is not going to make things better; at
least, that confuses the crap out of me.  Perhaps the only good thing
to be said about this mess is that, right now, the dependency runs in
just one direction: barrier.h depends on s_lock.h, and not the other
way around.  At some future point we may well want to reverse the
direction of that dependency, but to do that we need barrier.h to have
a working barrier implementation for every platform that s_lock.h
supports.
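To make that concrete, the fallback I'm proposing would look roughly
like this (just a sketch to show the shape, not the actual patch; the
function name here is made up):

/* s_lock.h: used only when we can't show that the platform's own
 * S_UNLOCK is also a compiler barrier */
extern void s_unlock_outofline(volatile slock_t *lock);
#define S_UNLOCK(lock)	s_unlock_outofline(lock)

/* s_lock.c */
void
s_unlock_outofline(volatile slock_t *lock)
{
	/*
	 * The call into a separate translation unit is what keeps the
	 * compiler from moving accesses to protected data across the
	 * release.  The body still has to be whatever the hardware needs
	 * for a CPU release fence; on a strongly ordered machine that is
	 * just the plain store we do today, and weakly ordered platforms
	 * have to supply their own S_UNLOCK rather than use this.
	 */
	*lock = 0;
}

That leaves the platform-specific release sequences alone and only
changes what the compiler is allowed to assume about them.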
Maybe we'll eventually get barrier.h to cover everything s_lock.h
covers by adding to barrier.h, and maybe we'll get there by
subtracting from s_lock.h, but the short path to getting this issue
fixed is to be agnostic to that question.

> 1)
> Afaics ARM will fall back to this for older gccs - and ARM is weakly
> ordered. I think I'd just also use swpb to release the lock. Something
> like
> #define S_INIT_LOCK(lock)	((*(lock)) = 0)
>
> #define S_UNLOCK s_unlock
> static __inline__ void
> s_unlock(volatile slock_t *lock)
> {
>	register slock_t _res = 0;
>
>	__asm__ __volatile__(
>		"	swpb %0, %0, [%2]	\n"
>		: "+r"(_res), "+m"(*lock)
>		: "r"(lock)
>		: "memory");
>	Assert(_res == 1);	// lock should be held by us
> }
>
> 2)
> Afaics it's also wrong for sparc on linux (and probably the BSDs) where
> relaxed memory ordering is used.
>
> 3)
> Also looks wrong on mips which needs a full memory barrier.

You're mixing apples and oranges again.  If the existing definitions
aren't CPU barriers, they're already broken, and we should fix that
and back-patch.  On the other hand, the API contract change making
these into compiler barriers is a master-only change.  I think for
purposes of this patch we should assume that the existing code is
sufficient to prevent CPU reordering and just focus on fixing compiler
ordering problems.  Whatever needs to change on the CPU-reordering end
of things is a separate commit.

>> @@ -826,6 +845,9 @@ spin_delay(void)
>>  }
>>  #endif
>>
>> +#define S_UNLOCK(lock)	\
>> +	do { MemoryBarrier(); (*(lock)) = 0; } while (0)
>> +
>>  #endif
>
> Hm. Won't this end up being a full memory barrier on x86-64 even though
> a compiler barrier would suffice on x86? In my measurements on linux
> x86-64 that has a pretty hefty performance penalty on NUMA systems.

Ah, crap, that should have been _ReadWriteBarrier().

> I think this should also mention that the fallback only works for
> strongly ordered systems.

I can revise the comments.
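To be concrete about the Win32 part, the hunk above should end up
looking roughly like this, since on x86/x86-64 the release only has to
stop the compiler, not the CPU:

#define S_UNLOCK(lock)	\
	do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)

And when I revise the comments, the generic fallback should say
explicitly that a compiler barrier plus a plain store is only a valid
release on strongly ordered hardware; weakly ordered platforms have to
provide an S_UNLOCK that is also a CPU release fence.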