> CPU #0 CPU #1 > > x <- load-locked(A) > y <- load(B) > x+1 -> store(A) > y+1 -> store(B) > x -> store(A) > f(x,y) -> store-cond(A) > > Unless I made a mistake, the above cannot store f(x,y+1) into A, for > any interleaving (assume strongly ordered memory or barriers), on > machines where any store by another CPU breaks the condition. But on > machines which implement store-cond by atomic-cmpxchg using the > load-locked value, f(x,y+1) can be stored.
I investigated this fairly closely for the initial implementation. Your key assumption is that you have strict ordering between CPUs. While it is possible to construct theoretical failure cases there this is observable. In practice you end up falling fall foul of architectural limitations on the use of ll/sc. Your example fails to describe how x and y are transferred from CPU0 to CPU1. I'd regard any code that has a barrier between a load-locked and store- conditional with extreme suspicion. For example PPC states that barrier instructions cause a CPU to loose the lock[1]. Paul [1] We currently get this wrong in QEMU.