Excerpts from Nicholas Piggin's message of June 29, 2020 1:28 pm:
> Excerpts from Linus Torvalds's message of June 28, 2020 3:39 pm:
>> On Fri, Jun 26, 2020 at 8:43 AM Peter Zijlstra <pet...@infradead.org> wrote:
>>>
>>> I ended up with something like the below.. but it is too warm to think
>>> properly.
>>>
>>> I don't particularly like WQ_FLAG_PAGEWAITERS, but I liked open-coding
>>> all that even less.
>>
>> Ugh.
>>
>> I think I have a much simpler approach, actually.
>>
>> So the *problem* is purely around that
>>
>>         if (behavior == EXCLUSIVE) {
>>                 if (!test_and_set_bit_lock(bit_nr, &page->flags))
>>                         break;
>>         } else ..
>>
>> and in particular it is purely *after* that test_and_set_bit_lock()
>> case. We have two cases:
>>
>>  (a) *If* we get the lock there, we're good, and all done, and we have
>> the lock. We don't care about any other wakeups, because they'll be
>> stale anyway (the thing that released the lock that we just took) and
>> because we got the lock, no other exclusive waiters should be woken up
>> anyway (and we will in turn wake up any waiters when we release it)
>>
>>  (b) we did *not* get the lock, because somebody else got it and is
>> about to immediately unlock again. And that _future_ wakeup that they
>> send might get lost because it might end up targeting us (but we might
>> just exit and not care).
>>
>> Agreed?
>>
>> So the only case that really matters is that we didn't get the lock,
>> but we must *not* be woken up afterwards.
>>
>> So how about the attached trivial two-liner? We solve the problem by
>> simply marking ourselves TASK_RUNNING, which means that we won't be
>> counted as an exclusive wakeup.
>>
>> Ok, so the "one" line to do that is actually two lines:
>>
>>         __set_current_state(TASK_RUNNING);
>>         smp_mb__before_atomic();
>>
>> and there's four lines of comments to go with it, but it really is
>> very simple: if we do that before we do the test_and_set_bit_lock(),
>> no wakeups will be lost, because we won't be sleeping for that wakeup.
>>
>> I'm not entirely happy about that "smp_mb__before_atomic()". I think
>> it's right in practice that test_and_set_bit_lock() (when it actually
>> does a write) has at LEAST atomic semantics, so I think it's good.
>> But it's not pretty.
>>
>> But I don't want to use a heavy
>>
>>         set_current_state(TASK_RUNNING);
>>         if (!test_and_set_bit_lock(bit_nr, &page->flags)) ..
>>
>> sequence, because at least on x86, that test_and_set_bit_lock()
>> already has a memory barrier in it, so the extra memory barrier from
>> set_current_state() is all kinds of pointless.
>>
>> Hmm?
>
> Wow, good catch. Does bit_is_set even have to be true? If it's woken due
> to a signal, it may still be on the waitqueue, right?
No, ignore this part (which, as you explained, was just a thinko; your
patch of course would not have worked if this were the case): the
exclusive wakeup doesn't get lost if schedule() was called, because the
task state goes back to running regardless of what woke it. I still
prefer if it can be changed to the fix below, though.

> works, but it looks like a pretty standard variant of "don't lose
> wakeups" bug.
>
> prepare_to_wait_event() has a pretty good pattern (and comment), I would
> favour using that (test the signal when inserting on the waitqueue).
>
> @@ -1133,6 +1133,15 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
>  	for (;;) {
>  		spin_lock_irq(&q->lock);
>
> +		if (signal_pending_state(state, current)) {
> +			/*
> +			 * Must not lose an exclusive wake up, see
> +			 * prepare_to_wait_event comment.
> +			 */
> +			list_del_init(&wait->entry);
> +			spin_unlock_irq(&q->lock);
> +			ret = -EINTR;
> +			break;
> +		}
> +
>  		if (likely(list_empty(&wait->entry))) {
>  			__add_wait_queue_entry_tail(q, wait);
>  			SetPageWaiters(page);
> @@ -1157,11 +1166,6 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
>  			break;
>  		}
>
> -		if (signal_pending_state(state, current)) {
> -			ret = -EINTR;
> -			break;
> -		}
> -
>  		if (behavior == DROP) {
>  			/*
>  			 * We can no longer safely access page->flags:
>
> - mutex_lock_common does the signal check under its wait queue lock.
>
> - rwsem looks like it does it backward: it checks whether it was woken
>   if it got a signal and tries to handle it that way (hopefully okay,
>   but I prefer the standard pattern).
>
> - futex unqueues and tests for wakeup before testing the signal.
>
> Etc. And it's not even exclusive to signals, of course; those are just
> the typical asynchronous thing that would wake us without removing us
> from the wait queue. Bit of a shame there is no standard pattern to
> follow, though.
>
> I wonder how you could improve that. finish_wait could WARN_ON an
> exclusive waiter being found on the queue?
>
> @@ -377,6 +377,7 @@ void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
>  	 * the list).
>  	 */
>  	if (!list_empty_careful(&wq_entry->entry)) {
> +		WARN_ON(wq_entry->flags & WQ_FLAG_EXCLUSIVE);
>  		spin_lock_irqsave(&wq_head->lock, flags);
>  		list_del_init(&wq_entry->entry);
>  		spin_unlock_irqrestore(&wq_head->lock, flags);
>
> That doesn't catch a limited count of wakeups; maybe if you passed the
> success value in to finish_wait, it could warn when a failure has
> WQ_FLAG_WOKEN set. That doesn't help things that invent their own
> waitqueues, mind you. I wonder if we could do some kind of generic
> annotations for anyone implementing wait queues to call, which could
> have debug checks implemented?
>
> Thanks,
> Nick