On Wed, Sep 30, 2015 at 06:44:32 +0200, Paolo Bonzini wrote: > I have a doubt about your patches for ll/sc emulation, that I hope you > can clarify. > > From 10000ft, both approaches rely on checking a flag during stores. > This is split between the TLB and the CPUState for Alvise's patches (in > order to exploit the existing fast-path checks), and entirely in the > radix tree for Emilio's. However, the idea is the same.
Not quite the same idea, see below. > Now, the patch are okay for serial emulation, but I am not sure if it's > possible to do lock-free ll/sc emulation, because there is a race. > > If we check the flag before the store, the race is as follows: > > CPU0 CPU1 > ------------------------------------------------------- > check flag > load locked: > set flag > load value (normal load on CPU) > store > store conditional (normal store on CPU) > > where the sc doesn't fail. For completeness, if we check it afterwards > (which would be possible with Emilio's approach, though not for the > TLB-based one): > > CPU0 CPU1 > ------------------------------------------------------ > load locked > set bit > load value (normal load on CPU) > store > store conditional (normal store on CPU) > check flag > > and again the sc doesn't fail. (snip) > Tell me I'm wrong. :) The difference between Alvise's implementation and what I submitted is that in the radix tree a cache line that has *ever* had an atomic op performed on it, is marked as "slow path" for the rest of the execution, meaning that *all* subsequent stores to said cache line will take the cache line entry's lock. This does not fix the race completely (e.g. you could have a store and an atomic op concurrently executing on a line that hasn't yet had an atomic op on it), but it significantly closes it. My guess is that it would be very hard to trigger by practical programs. E.