kthreadd, the creator of other kernel threads, runs as a normal
priority task. This is a potential for priority inversion when a task
wants to spawn a high-priority kernel thread. A middle priority
SCHED_FIFO task can block kthreadd's execution indefinitely and thus
prevent the timely creation of the high-priority kernel thread.
    
This causes a practical problem. When a runaway real-time task is
eating 100% CPU and we attempt to put the CPU offline, sometimes we
block while waiting for the creation of the highest-priority
"kstopmachine" thread. 

The fix is to run kthreadd with the highest possible SCHED_FIFO
priority. Its children must still run as slightly negatively reniced
SCHED_NORMAL tasks.
    
Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

diff --git a/kernel/kthread.c b/kernel/kthread.c
index dcfe724..a7ce932 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -94,10 +94,17 @@ static void create_kthread(struct kthread_create_info 
*create)
        if (pid < 0) {
                create->result = ERR_PTR(pid);
        } else {
+               struct sched_param param = { .sched_priority = 0 };
                wait_for_completion(&create->started);
                read_lock(&tasklist_lock);
                create->result = find_task_by_pid(pid);
                read_unlock(&tasklist_lock);
+               /*
+                * We (kthreadd) run with SCHED_FIFO, but we don't want
+                * the kthreads we create to have it too by default.
+                */
+               sched_setscheduler(create->result, SCHED_NORMAL, &param);
+               set_user_nice(create->result, -5);
        }
        complete(&create->done);
 }
@@ -217,11 +224,12 @@ EXPORT_SYMBOL(kthread_stop);
 int kthreadd(void *unused)
 {
        struct task_struct *tsk = current;
+       struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
 
        /* Setup a clean context for our children to inherit. */
        set_task_comm(tsk, "kthreadd");
        ignore_signals(tsk);
-       set_user_nice(tsk, -5);
+       sched_setscheduler(tsk, SCHED_FIFO, &param);
        set_cpus_allowed(tsk, CPU_MASK_ALL);
 
        current->flags |= PF_NOFREEZE;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to