On Wed, Jun 4, 2014 at 12:44 AM, Peter Zijlstra <pet...@infradead.org> wrote: > On Tue, Jun 03, 2014 at 05:29:50PM -0700, Andy Lutomirski wrote: >> Currently, the only real guarantee provided by the polling bit is >> that, if you hold rq->lock and the polling bit is set, then you can >> set need_resched to force a reschedule. >> >> The only reason the lock is needed is that the idle thread might not >> be running at all when setting its need_resched bit, and rq->lock >> keeps it pinned. >> >> This is easy to fix: just clear the polling bit before scheduling. >> Now the polling bit is only ever set when rq->curr == rq->idle. > > Yah, except of course: > > lkml.kernel.org/r/20131120162736.508462...@infradead.org
Wow, that code was ugly. Can you be persuaded to hold off on that patch until after this series is done? I think the cpu_startup_entry change will just muddy the waters for now. > > which I really need to rebase and post again :/ > > In any case, this is useful even with that, although then we really must > do something like: > > rcu_read_lock(); > if (!set_nr_if_polling(rq->curr)) > smp_send_reschedule(rq->cpu); > rcu_read_unlock(); > > Because there's other tasks than rq->idle which might be 'idle', joy! Is this really a problem? It's certainly the case that a non-idle (in the sense of != rq->idle) task can have the polling bit set, but so what? I'm perfectly happy to wake these ersatz idle things via IPI instead of via TIF_NEED_RESCHED. I think that all of the code that plays with the polling bit after all these patches are applied either holds rq->lock (which prevents the task from going away) or acts directly on rq->idle, which really has no business being deallocated. I think that the RCU solution is actually racy, too. Consider: If rq->curr is one some thermal sort-of-idle thing that has polling set, and no one has made those tasks respect the new polling semantics from this patch, then this could happen: CPU A: rcu_read_lock(); load rq->curr CPU B: deschedule rq->curr and schedule somthing else CPU A: set_nr_if_polling(rq->curr) returns true no IPI sent rcu_read_unlock() RCU prevents us from a use-after-free, but we can still fail to send the required IPI. Using rq->idle in combination with this patch should prevent that race. It's possible for there to be weird timing things that cause unnecessary IPIs to be sent, but my workload (which, I suspect, is unusually heavy on remote wakeups from idle) sees something like 99% of them avoiding the IPI. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/