On Mon, Feb 16, 2015 at 03:38:34PM +0300, Kirill Tkhai wrote:
> We shouldn't enqueue migrating tasks. Please, try this one instead ;)

Ha, we should amend that task-rq-lock loop for that. See below.

I've not tested it yet; I'm going to try to reconstruct a .config
that triggers the oops.

---
Subject: sched/dl: Prevent enqueue of a sleeping task in dl_task_timer()
From: Kirill Tkhai <tk...@yandex.ru>
Date: Mon, 16 Feb 2015 15:38:34 +0300

A deadline task may be throttled and dequeued at the same time.
This happens when the task becomes throttled in schedule(), which
is called to go to sleep:

current->state = TASK_INTERRUPTIBLE;
schedule()
    deactivate_task()
        dequeue_task_dl()
            update_curr_dl()
                start_dl_timer()
            __dequeue_task_dl()
    prev->on_rq = 0;
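
Roughly, the state at this point (assuming start_dl_timer() managed
to arm the timer) is:

    p->state           = TASK_INTERRUPTIBLE;
    p->on_rq           = 0;   /* TASK_ON_RQ_DEQUEUED */
    p->dl.dl_throttled = 1;   /* set when the timer was armed */
    /* p->dl.dl_timer pending */

i.e. the task is throttled and dequeued at the same time.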

Later the timer fires, but the task is still dequeued:

dl_task_timer()
    enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */

Someone wakes it up:

try_to_wake_up()

    enqueue_dl_entity()
        BUG_ON(on_dl_rq())
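
For reference, on_dl_rq() just checks whether the entity is still
linked into the dl_rq rb-tree; roughly (from kernel/sched/deadline.c):

    static inline int on_dl_rq(struct sched_dl_entity *dl_se)
    {
            return !RB_EMPTY_NODE(&dl_se->rb_node);
    }

so the BUG_ON() trips because the earlier, timer-driven enqueue left
dl_se->rb_node linked while p->on_rq is still 0.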

Fix this by not queueing !on_rq tasks on the dl_rq: replenish the
entity's runtime, but leave the enqueue to the later wakeup.

Also teach the rq-lock loop about TASK_ON_RQ_MIGRATING as per
cca26e8009d1 ("sched: Teach scheduler to understand
TASK_ON_RQ_MIGRATING state").

Fixes: 1019a359d3dc ("sched/deadline: Fix stale yield state")
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Juri Lelli <juri.le...@arm.com>
Reported-by: Fengguang Wu <fengguang...@intel.com>
Signed-off-by: Kirill Tkhai <ktk...@parallels.com>
[peterz: Wrote comment; fixed task-rq-lock loop]
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Link: http://lkml.kernel.org/r/1374601424090...@web4j.yandex.ru
---
 kernel/sched/deadline.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -515,9 +515,8 @@ static enum hrtimer_restart dl_task_time
 again:
        rq = task_rq(p);
        raw_spin_lock(&rq->lock);
-
-       if (rq != task_rq(p)) {
-               /* Task was moved, retrying. */
+       if (rq != task_rq(p) || task_on_rq_migrating(p)) {
+               /* Task was move{d,ing}, retry */
                raw_spin_unlock(&rq->lock);
                goto again;
        }
@@ -541,6 +540,26 @@ static enum hrtimer_restart dl_task_time
 
        sched_clock_tick();
        update_rq_clock(rq);
+
+       /*
+        * If the throttle happened during sched-out; like:
+        *
+        *   schedule()
+        *     deactivate_task()
+        *       dequeue_task_dl()
+        *         update_curr_dl()
+        *           start_dl_timer()
+        *         __dequeue_task_dl()
+        *     prev->on_rq = 0;
+        *
+        * We can be both throttled and !queued. Replenish the counter
+        * but do not enqueue -- wait for our wakeup to do that.
+        */
+       if (!task_on_rq_queued(p)) {
+               replenish_dl_entity(dl_se, dl_se);
+               goto unlock;
+       }
+
        enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
        if (dl_task(rq->curr))
                check_preempt_curr_dl(rq, p, 0);