----- On Jan 16, 2017, at 6:50 PM, Linus Torvalds torva...@linux-foundation.org wrote:
> Why not just make the write be a "smp_store_release()", and the read
> be a "smp_load_acquire()". That guarantees a certain amount of
> ordering. The only amount that I suspect makes sense, in fact.
>
> But it's not clear what the problem is, so..

If we only use smp_store_release() for the store to membarrier_exped,
the "unregister" (setting it back to 0) would be OK, but not the
"register", as the following scenario shows:

Initial values: A = B = 0

CPU 0                                  | CPU 1 (no-hz full)
                                       |
                                       | membarrier(REGISTER_EXPEDITED)
                                       |   (write barrier implied by store-release)
                                       |   set t->membarrier_exped = 1
                                       |   (store-release implies memory barrier before store)
                                       | store B = 1
                                       | barrier() (compiler-level barrier)
                                       | store A = 1
x = load A                             |
membarrier(CMD_SHARED)                 |
  smp_mb() [1]                         |
  iter. on nohz cpus                   |
    if iter_t->membarrier_exped == 0   |
      (skip)                           |
  smp_mb() [2]                         |
y = load B                             |

Expect: if x == 1, then y == 1

CPU 0 can observe A == 1, membarrier_exped == 0, and B == 0, because
there is no memory barrier between the store to membarrier_exped and
the store to A on CPU 1.

What we seem to need on the registration/unregistration side is a
store-acquire for registration and a store-release for unregistration.
This pairs with a load of membarrier_exped that has both acquire and
release barriers ([1] and [2] above).

> I'm not seeing how a regular fork() could possibly ever make sense to
> have the membarrier state in the newly forked process. Not that
> "fork()" is really well-defined for within a single thread anyway (it
> actually is as far as Linux is concerned, but not in POSIX, afaik).
>
> So if there is no major reason for it, I would strongly suggest that
> _if_ all this makes sense in the first place, the membarrier thing
> should just be cleared unconditionally both for exec and for
> clone/fork.

That's fine with me!

Thanks,

Mathieu

>
> Linus

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
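The registration/unregistration ordering proposed above can be sketched roughly with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: the flag and the helper names register_expedited(), unregister_expedited() and needs_ipi() are hypothetical. C11 has no "store-acquire", so registration is modelled as a store followed by a full fence (roughly the role of smp_mb()). The reader brackets its load with full fences, matching [1] and [2] in the scenario.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for t->membarrier_exped (zero-initialized). */
static atomic_int membarrier_exped;

/* Registration (hypothetical helper): the store must be ordered before
 * every later access of this thread, e.g. "store B = 1" and
 * "store A = 1" in the scenario. Modelled as store + full fence. */
static void register_expedited(void)
{
    atomic_store_explicit(&membarrier_exped, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* roughly smp_mb() */
}

/* Unregistration (hypothetical helper): every earlier access must be
 * ordered before the store, which a store-release provides. */
static void unregister_expedited(void)
{
    atomic_store_explicit(&membarrier_exped, 0, memory_order_release);
}

/* Reader side of membarrier(CMD_SHARED) (hypothetical helper): full
 * fences before and after the load, as [1] and [2]. Returns true when
 * the target thread is registered and would not be skipped. */
static bool needs_ipi(void)
{
    atomic_thread_fence(memory_order_seq_cst); /* [1] */
    int v = atomic_load_explicit(&membarrier_exped, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* [2] */
    return v != 0;
}
```

Only the ordering requirements carry over from the scenario; the C11 fences are an analogy for the kernel barriers, not an exact equivalent.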