Re: [Qemu-devel] Java volatile vs. C11 seq_cst (was Re: [PATCH v2 1/2] add a header file for atomic operations)

Torvald Riegel Sat, 22 Jun 2013 03:57:25 -0700

On Thu, 2013-06-20 at 09:53 +0200, Paolo Bonzini wrote:
> Il 19/06/2013 22:25, Torvald Riegel ha scritto:
> > On Wed, 2013-06-19 at 17:14 +0200, Paolo Bonzini wrote:
> >> (1) I don't care about relaxed RMW ops (loads/stores occur in hot paths,
> >> but RMW shouldn't be that bad.  I don't care if reference counting is a
> >> little slower than it could be, for example);
> > 
> > I doubt relaxed RMW ops are sufficient even for reference counting.
> 
> They are enough on the increment side, or so says boost...
> 
> http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters


Oh, right, for this kind of refcounting it's okay on the increment side.
But I think the explanation on the page you referenced isn't correct:
"...passing an existing reference from one thread to another must
already provide any required synchronization." is not sufficient, because
that would just create a happens-before edge from the reference-passing
source to the other thread that gets the reference.
The relaxed RMW increment works because the modification order is
consistent with happens-before (see 6.17 in the model): once we have
incremented the refcount, it can never reach zero while we hold that
reference, even though the increment was a relaxed RMW.

IMO, the acquire fence in the release path is not 100% correct according to
my understanding of the memory model:
    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
      boost::atomic_thread_fence(boost::memory_order_acquire);
      delete x;
    }
"delete x" is unconditional and, I guess, not specified to read all of
what x points to.  The acquire fence would only result in a
synchronizes-with edge if there is a reads-from edge between the release
store and a load that reads the store's value and is sequenced after the
acquire fence.
Thus, I think the compiler could be allowed to reorder the fence after
the delete in some cases (e.g., when no destructor is called, or when it
doesn't contain any conditionals), but I guess it's unlikely to ever try
that in practice.
Regarding the hardware fences that this maps to, I suppose this just
happens to work fine on most architectures, perhaps simply because
"delete" will access some of the object's memory when freeing it.

Changing the release path to the following would be correct, and would
probably add little overhead:
    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
      if (x->refcount_.load(boost::memory_order_acquire) == 0)
        delete x;
    }

That makes the delete conditional, so it has to happen after we have
established the happens-before edge that we need.

> >> By contrast, Java volatile semantics are easily converted to a sequence
> >> of relaxed loads, relaxed stores, and acq/rel/sc fences.
> > 
> > The same holds for C11/C++11.  If you look at either the standard or the
> > Batty model, you'll see that for every pair like store(rel)--load(acq),
> > there is also store(rel)--fence(acq)+load(relaxed),
> > store(relaxed)+fence(rel)--fence(acq)+load(relaxed), etc. defined,
> > giving the same semantics.  Likewise for SC.
> 
> Do you have a pointer to that?  It would help.

In the full model (n3132.pdf), see 6.12 (which in turn points to the
parts of the standard that lead to those parts of the model).  SC fences
are also acquire and release fences, so this covers synchronizes-with via
reads-from too.  6.17 has more constraints on SC fences and modification
order, so we get something similar for the ordering of writes alone.


Torvald
