Hi, I wrote patches which fixes the problem regarding stop_machine_run() and cpu hotplug.
stop_machine_run() can't accomplish its work if there is a real time process on the CPU on which "kstopmachine" kernel thread is running. For more details, please refer to the following thread: http://lkml.org/lkml/2007/5/7/41 TEST RESULT: I did the following test on my ia64 box. It works fine: ------------------------------------------------------------------------------- # cat loop.sh while true ; do : done ------------------------------------------------------------------------------- # cat test_stop_machine_run_with_rt_proc.sh #!/bin/sh taskset 0x2 chrt -f 98 ./loop.sh & PID=${!} echo 0 >/sys/devices/system/cpu/cpu1/online kill ${PID} echo 1 >/sys/devices/system/cpu/cpu1/online ------------------------------------------------------------------------------- To do the test, just issue the following command. # ./test_stop_machine_run_with_rt_proc.sh # TODO list ========= Some more works are needed. See the TODO list. - If there is a SCHED_FIFO process having max priority, stop_machine_run doesn't work because kstopmachine doesn't be scheduled. -> I'm trying to fix this problem, see the followings: http://lkml.org/lkml/2007/5/8/620 I would submit RFC patches in 1 weeks. - On CPU hot removal, if that RT process is migrated to the CPU on which stop_machine_run() is running, stop_machine_run can't continue to run. -> I'm trying to fix this problem. - Other `stop_machine_run() with FIFO` problem might exist. -> I've not research other subsystem using stop_machine_run yet. # FYI, I'll be offline for 2 days. Thanks, Satoru --- Fix stop_machine_run() problem with naughty real time process stop_machine_run() does its work on "kstopmachine" thread having max priority. However that thread get such priority after woken up. Therefore, in the following case ... - "kstopmachine" try to run on CPU1 - There is a real time process which doesn't relinquish CPU time voluntary on CPU1 ... "kstopmachine" can't start to run and the CPU on which stop_machine_run() is runing hangs up. To fix this problem, call sched_setscheduler() before waking up that thread. Signed-off-by: Satoru Takeuchi <[EMAIL PROTECTED]> Index: linux-2.6.21/kernel/stop_machine.c =================================================================== --- linux-2.6.21.orig/kernel/stop_machine.c 2007-05-11 13:45:34.000000000 +0900 +++ linux-2.6.21/kernel/stop_machine.c 2007-05-11 14:49:17.000000000 +0900 @@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s static int stop_machine(void) { int i, ret = 0; - struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 }; - - /* One high-prio thread per cpu. We'll do this one. */ - sched_setscheduler(current, SCHED_FIFO, ¶m); atomic_set(&stopmachine_thread_ack, 0); stopmachine_num_threads = 0; @@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i p = kthread_create(do_stop, &smdata, "kstopmachine"); if (!IS_ERR(p)) { + struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 }; + + /* One high-prio thread per cpu. We'll do this one. */ + sched_setscheduler(p, SCHED_FIFO, ¶m); kthread_bind(p, cpu); wake_up_process(p); wait_for_completion(&smdata.done); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/