On Thu, 2013-06-20 at 09:53 +0200, Paolo Bonzini wrote:
> On 19/06/2013 22:25, Torvald Riegel wrote:
> > On Wed, 2013-06-19 at 17:14 +0200, Paolo Bonzini wrote:
> >> (1) I don't care about relaxed RMW ops (loads/stores occur in hot paths,
> >> but RMW shouldn't be that bad. I don't care if reference counting is a
> >> little slower than it could be, for example);
> >
> > I doubt relaxed RMW ops are sufficient even for reference counting.
>
> They are enough on the increment side, or so says boost...
>
> http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters

Oh, right, for this kind of refcounting it's okay on the increment side.
But I don't think the explanation on the page you referenced is correct:
"...passing an existing reference from one thread to another must
already provide any required synchronization." is not sufficient,
because that would just create a happens-before from the
reference-passing source to the other thread that gets the reference.
The relaxed RMW increment works because the modification order is
consistent with happens-before (see 6.17 in the model), so the refcount
can never reach zero once we have incremented it, even with a relaxed
RMW.

IMO, the acquire fence on the release side is not 100% correct according
to my understanding of the memory model:

    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
      boost::atomic_thread_fence(boost::memory_order_acquire);
      delete x;
    }

"delete x" is unconditional, and I guess it is not specified to read all
of what x points to. The acquire fence would only result in a
synchronizes-with edge if there is a reads-from edge between the release
store and a load that reads the store's value and is sequenced after the
acquire fence.

Thus, I think the compiler could be allowed to reorder the fence after
the delete in some cases (e.g., when no destructor is called or the
destructor doesn't have any conditionals in it), but I guess it's
unlikely to ever try to do that in practice. Regarding the hardware
fences that this maps to, I suppose this just happens to work fine on
most architectures, perhaps just because "delete" will access some of
the memory when releasing it.

Changing the release to the following would be correct, and probably
adds little overhead:

    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
      if (x->refcount_.load(boost::memory_order_acquire) == 0)
        delete x;
    }

That makes the delete conditional, so it has to happen after we have
established the happens-before edge that we need.

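For illustration, here is a minimal, self-contained sketch of the refcounting pattern discussed above: relaxed increment, release decrement, and an acquire load guarding the delete. It uses std::atomic instead of boost::atomic, and the names Object, ref, unref and run_refcount_demo are hypothetical, made up for this example only. It is a sketch of the idea, not the Boost implementation.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical refcounted object; not taken from Boost.
struct Object {
    std::atomic<int> refcount{1};   // the creator holds the first reference
    std::atomic<int>* destroyed;    // counts destructor runs (demo only)
    explicit Object(std::atomic<int>* d) : destroyed(d) {}
    ~Object() { destroyed->fetch_add(1, std::memory_order_relaxed); }
};

// Relaxed increment: the caller must already own a reference.
inline void ref(Object* x) {
    x->refcount.fetch_add(1, std::memory_order_relaxed);
}

// Release decrement, followed by an acquire load that guards the delete.
inline void unref(Object* x) {
    if (x->refcount.fetch_sub(1, std::memory_order_release) == 1) {
        if (x->refcount.load(std::memory_order_acquire) == 0)
            delete x;
    }
}

// Hands one reference to each worker thread, drops the creator's
// reference, and returns how many times the destructor ran.
int run_refcount_demo(int nthreads) {
    std::atomic<int> destroyed{0};
    Object* x = new Object(&destroyed);
    std::vector<std::thread> workers;
    for (int i = 0; i < nthreads; ++i) {
        ref(x);  // taken by the owner before the pointer is handed over
        workers.emplace_back([x] { unref(x); });
    }
    unref(x);    // drop the creator's reference
    for (auto& t : workers) t.join();
    return destroyed.load();
}
```

Whichever thread drops the last reference performs the delete, so run_refcount_demo should report exactly one destructor run regardless of the thread count.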
> >> By contrast, Java volatile semantics are easily converted to a sequence
> >> of relaxed loads, relaxed stores, and acq/rel/sc fences.
> >
> > The same holds for C11/C++11. If you look at either the standard or the
> > Batty model, you'll see that for every pair like store(rel)--load(acq),
> > there is also store(rel)--fence(acq)+load(relaxed),
> > store(relaxed)+fence(rel)--fence(acq)+load(relaxed), etc. defined,
> > giving the same semantics. Likewise for SC.
>
> Do you have a pointer to that? It would help.

In the full model (n3132.pdf), see 6.12 (which then references which
parts in the standard lead to those parts of the model). SC fences are
also acquire and release fences, so this covers synchronizes-with via
reads-from too. 6.17 has more constraints on SC fences and modification
order, so we get something similar for the ordering of just writes.

Torvald
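
As an illustration of the pairings mentioned above, the following sketch passes a plain int from one thread to another using three variants: store(rel)--load(acq), store(rel)--load(relaxed)+fence(acq), and fence(rel)+store(relaxed)--load(relaxed)+fence(acq). The Variant enum and the message_pass function are hypothetical names for this example. Note that the release fence is placed before the relaxed store and the acquire fence after the relaxed load.

```cpp
#include <atomic>
#include <thread>

// Hypothetical selector for the three pairings.
enum class Variant { RelAcq, RelStoreAcqFence, BothFences };

// Publishes a non-atomic payload through an atomic flag and returns
// the value the consumer observed.
int message_pass(Variant v) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag used for synchronization
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        if (v == Variant::BothFences) {
            // fence(rel) + store(relaxed)
            std::atomic_thread_fence(std::memory_order_release);
            ready.store(true, std::memory_order_relaxed);
        } else {
            // store(rel)
            ready.store(true, std::memory_order_release);
        }
    });

    std::thread consumer([&] {
        if (v == Variant::RelAcq) {
            // load(acq)
            while (!ready.load(std::memory_order_acquire))
                std::this_thread::yield();
        } else {
            // load(relaxed) + fence(acq)
            while (!ready.load(std::memory_order_relaxed))
                std::this_thread::yield();
            std::atomic_thread_fence(std::memory_order_acquire);
        }
        seen = payload;  // ordered after the producer's write in all variants
    });

    producer.join();
    consumer.join();
    return seen;
}
```

In each variant the consumer's read of payload happens after the producer's write, so the function should return 42 for all three.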