Hello, I just spent three hours studying TOT scheduler code, and I am leaning towards that this patch being no longer relevant (or being never relevant, really). If it ever fixed a real issue, I think it fixed it incorrectly. However, I would love to understand what exactly we were seeing and why this helped at all.
So a little background on this. I found this change I made that I failed to upstream from a few years ago, so my memory is a little rusty. This was definitely an issue on the setup we had, which was an arch/arm mach-msm (now known as mach-qcom, I believe) fork of 3.4.0 > WTH his SWFI and which arch has that? SWFI is "Stop and Wait For Interrupt". This appears to be a qcom specific term that they stopped using, so I apologize for that. It specifically refers to a CPU that is sitting in some sort of a "wait" or "wfi" instruction and not executing anything until an interrupt occurs. > Yeah, no, this is completely insane. Totally possible. In fact, I sort of hope that it is. > If the remote cpu is running the idle task, check_preempt_curr() should > very much wake it up, if its not the idle class, it should never get > there because there is now an actually runnable task on. > Please explain in detail what happens and/or provide traces of this > happening. We observed two relevant scenarios that are similar but distinct. Looking through top-of-tree code more carefully I don't see either as possible anymore, so the patch probably does not apply, but I would still defer to your expertise We have an SCHED_FIFO thread that falls asleep in a poll() waiting for an interrupt from a device driver. 1) It is the last thread on that CPU, the CPU enters idle and goes into a 'wfi' instruction because of cpuidle_idle_call(), and will not wake up until it receives an interrupt of some sort (like IPI). The interrupt occurs on a different CPU, the task is woken, but doesn't run until much later. 2) It is not the last thread on that CPU. The CPU switches to a lower priority task in a different class. The interrupt occurs on a different CPU, the task is woken, but doesn't preempt until much later. Unfortunately I don't have the ftrace logs from back then, and it would be hard to recreate the scenario. If I recall correctly, the problem seemed to occur because the task cpus_share_cache() was true, the task was never put on the wake_list, and correspondingly the scheduler_ipi() that was triggered by check_preempt_curr() did nothing, when it returned it would somehow wind up in schedule() where it did nothing because task was not TASK_RUNNING. Does that make any amount of sense at all? Sorry if this is a stupendous waste of time, Regards, Yuriy On Thu, Sep 22, 2016 at 12:34 AM, Peter Zijlstra <pet...@infradead.org> wrote: > On Tue, Sep 20, 2016 at 06:14:34PM -0700, Yuriy Romanenko wrote: >> From e9a304ae91fa2a4427bde7d3aea18296d0ebb27f Mon Sep 17 00:00:00 2001 >> From: Yuriy Romanenko <yromane...@carrietech.com> >> Date: Tue, 20 Sep 2016 17:47:28 -0700 >> Subject: [PATCH] Fix tasks being forgotten for a long time on SMP >> >> Observed occasional very high latency on an embedded SMP system between >> a task becoming ready to run and actually running with low system load, >> impacting interactive usage. >> >> A sched_wake() from CPUx on CPUy puts the task into the run queue and >> marks it runnable, but does not trigger an IPI to have the scheduler >> re-run on CPUy and see if the current task needs to get pre-empted and >> does not wake up CPUy if it is asleep. >> >> This is especially evident when a CPU is in SWFI and simply does not >> wake up even though it now has a runnable task. > > WTH his SWFI and which arch has that? > >> This is probably not the most elegant fix and definitely generates some >> unnecessary scheduler runs, but it's better for overall latency. >> >> Signed-off-by: Yuriy Romanenko <yromane...@carrietech.com> >> --- >> kernel/sched/core.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 860070f..7c334b7 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -1686,6 +1686,14 @@ static void ttwu_do_wakeup(struct rq *rq, struct >> task_struct *p, int wake_flags, >> trace_sched_wakeup(p); >> >> #ifdef CONFIG_SMP >> + /* >> + * If the task is not on the current cpu, there is a chance >> + * the other cpu might be asleep and will not get to our task >> + * for a really long time. Send an IPI to avoid that >> + */ >> + if (task_cpu(p) != smp_processor_id()) >> + smp_send_reschedule(task_cpu(p)); >> + > > Yeah, no, this is completely insane. > > If the remote cpu is running the idle task, check_preempt_curr() should > very much wake it up, if its not the idle class, it should never get > there because there is now an actually runnable task on. > > Please explain in detail what happens and/or provide traces of this > happening.