On Tue, Jun 13, 2017 at 09:15:47PM -0400, Steven Rostedt wrote:
> On Tue, 13 Jun 2017 16:42:05 -0700
> "Paul E. McKenney" <paul...@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Jun 13, 2017 at 07:23:08PM -0400, Steven Rostedt wrote:
> > > On Fri, 9 Jun 2017 05:45:54 -0700
> > > "Paul E. McKenney" <paul...@linux.vnet.ibm.com> wrote:
> > > 
> > > > On Fri, Jun 09, 2017 at 09:19:57AM +0200, Peter Zijlstra wrote:
> > > > > On Thu, Jun 08, 2017 at 08:25:46PM -0700, Krister Johansen wrote:
> > > > > > The behavior of swake_up() differs from that of wake_up(), and from the
> > > > > > swake_up() that came from RT linux. A memory barrier, or some other
> > > > > > synchronization, is needed prior to a swake_up so that the waiter sees
> > > > > > the condition set by the waker, and so that the waker does not see an
> > > > > > empty wait list.
> > > > > 
> > > > > Urgh.. let me stare at that. But it sounds like the wrong solution since
> > > > > we wanted to keep the wait and swait APIs as close as possible.
> > > > 
> > > > But don't they both need some sort of ordering, be it memory barriers or
> > > > locking, to handle the case where the wait/swait doesn't actually sleep?
> > > > 
> > > Looking at an RCU example, and assuming that ordering can move around
> > > within a spin lock, and that changes can leak into a spin lock region
> > > from both before and after, could we have:
> > > 
> > > (looking at __call_rcu_core() and rcu_gp_kthread())
> > > 
> > >     CPU0                            CPU1
> > >     ----                            ----
> > >                                 __call_rcu_core() {
> > > 
> > >                                  spin_lock(rnp_root)
> > >                                  need_wake = __rcu_start_gp() {
> > >                                   rcu_start_gp_advanced() {
> > >                                    gp_flags = FLAG_INIT
> > >                                    }
> > >                                   }
> > > 
> > >  rcu_gp_kthread() {
> > >    swait_event_interruptible(wq,
> > >             gp_flags & FLAG_INIT) {
> > >    spin_lock(q->lock)
> > > 
> > >                                 *fetch wq->task_list here! *
> > > 
> > >    list_add(wq->task_list, q->task_list)
> > >    spin_unlock(q->lock);
> > > 
> > >    *fetch old value of gp_flags here *
> > 
> > Both reads of ->gp_flags are READ_ONCE(), so having seen the new value
> > in swait_event_interruptible(), this task/CPU cannot see the old value
> > from some later access. You have to have accesses to two different
> > variables to require a memory barrier (at least assuming consistent use
> > of READ_ONCE(), WRITE_ONCE(), or equivalent).
> 
> If I'm not mistaken, READ_ONCE() and WRITE_ONCE() are just volatiles
> added. The compiler may not leak or move the fetches, but what
> about the hardware?
The hardware cannot move the references if both references are in the
same thread and to the same variable, which is the case with ->gp_flags.

> A spin_lock() only needs to make sure what is after it does not leak
> before it.
> 
> A spin_unlock() only needs to make sure what is before it must not leak
> after it.

Both true, with the exception of a spin_is_locked() to that same lock
variable, which cannot be reordered with either spin_lock() or
spin_unlock() in either direction.

> From my understanding of reading memory-barriers.txt, there's no
> guarantee that the hardware doesn't let reads or writes that happen
> before a spin_lock() happen after it. Nor does it guarantee that reads
> or writes that happen after a spin_unlock() don't happen before it.
> 
> The spin_locks only need to protect the inside of the critical section,
> not the outside of it leaking in.

Again, quite true.

> I'm looking at this in particular:
> 
> ====
>  (1) ACQUIRE operation implication:
> 
>      Memory operations issued after the ACQUIRE will be completed after
>      the ACQUIRE operation has completed.
> 
>      Memory operations issued before the ACQUIRE may be completed after
>      the ACQUIRE operation has completed.  An smp_mb__before_spinlock(),
>      combined with a following ACQUIRE, orders prior stores against
>      subsequent loads and stores.  Note that this is weaker than smp_mb()!
>      The smp_mb__before_spinlock() primitive is free on many architectures.
> 
>  (2) RELEASE operation implication:
> 
>      Memory operations issued before the RELEASE will be completed before
>      the RELEASE operation has completed.
> 
>      Memory operations issued after the RELEASE may be completed before
>      the RELEASE operation has completed.
> ====

And here is the part you also need to look at:

====
 (*) Overlapping loads and stores within a particular CPU will appear to
     be ordered within that CPU.  This means that for:

	a = READ_ONCE(*X); WRITE_ONCE(*X, b);

     the CPU will only issue the following sequence of memory operations:

	a = LOAD *X, STORE *X = b

     And for:

	WRITE_ONCE(*X, c); d = READ_ONCE(*X);

     the CPU will only issue:

	STORE *X = c, d = LOAD *X

     (Loads and stores overlap if they are targeted at overlapping pieces
     of memory).
====

This section needs some help -- the actual guarantee is stronger, namely
that all CPUs will agree on the order of volatile same-sized aligned
accesses to a given single location. So if a previous READ_ONCE() sees
the new value, any subsequent READ_ONCE() from that same variable is
guaranteed to also see the new value (or some later value).

Does that help, or am I missing something here?

							Thanx, Paul

> -- Steve
> 
> > 
> > >                                  spin_unlock(rnp_root)
> > > 
> > >                                  rcu_gp_kthread_wake() {
> > >                                   swake_up(wq) {
> > >                                    swait_active(wq) {
> > >                                     list_empty(wq->task_list)
> > > 
> > >                                    } * return false *
> > > 
> > >   if (condition) * false *
> > >     schedule();
> > > 
> > > Looks like a memory barrier is missing. Perhaps we should slap one into
> > > swait_active()? I don't think it is wise to let users add their own, as
> > > I think we currently have bugs now.
> > 
> > I -know- I have bugs now.  ;-)
> > 
> > But I don't believe this is one of them.  Or am I getting confused?
> > 
> > 							Thanx, Paul
> 
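For illustration, here is a minimal sketch of the kind of barrier being
discussed: an smp_mb() placed in swake_up() ahead of its lockless
swait_active() check, so that the waker's store to the wakeup condition
(->gp_flags in the RCU example above) is ordered before its load of
wq->task_list. This is only a sketch against the swait implementation
of that era, not necessarily the change that was eventually merged; an
alternative that closes the same window is to drop the lockless check
and unconditionally take q->lock, letting the lock acquisition provide
the ordering.

/*
 * Sketch only: swake_up() with a full barrier ahead of the lockless
 * waiter check.  The smp_mb() orders the caller's earlier store to the
 * wakeup condition before the list_empty() load inside swait_active(),
 * pairing with the full barrier implied by set_current_state() in the
 * waiter's prepare_to_swait().
 */
#include <linux/swait.h>

void swake_up(struct swait_queue_head *q)
{
	unsigned long flags;

	/* Order the condition store before the check for queued waiters. */
	smp_mb();

	if (!swait_active(q))
		return;

	raw_spin_lock_irqsave(&q->lock, flags);
	swake_up_locked(q);
	raw_spin_unlock_irqrestore(&q->lock, flags);
}

With a barrier in the wakeup path, either the waiter sees the new
->gp_flags value or the waker sees the waiter on q->task_list, so the
lost-wakeup window shown in the diagram above is closed without any
change to the RCU callers themselves.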