On 07/14, Mathieu Desnoyers wrote: > > @@ -4891,10 +4948,42 @@ static int migration_thread(void *data) > list_del_init(head->next); > > spin_unlock(&rq->lock); > - __migrate_task(req->task, cpu, req->dest_cpu); > + migrated = __migrate_task(req->task, cpu, req->dest_cpu); > local_irq_enable(); > - > - complete(&req->done); > + if (!migrated) { > + /* > + * If the process has not been migrated, let it run > + * until it reaches a migration_check() so it can > + * wake us up. > + */ > + spin_lock_irq(&rq->lock); > + head = &rq->migration_queue; > + list_add(&req->list, head); > + if (req->task->se.on_rq > + || !task_migrate_count(req->task)) { > + /* > + * The process is on the runqueue, it could > + * exit its critical section at any moment, > + * don't race with it and retry actively. > + * Also, if the thread is not on the runqueue > + * and has a zero migration count > + * (__migrate_task failed because cpus allowed > + * changed), just retry. > + */ > + spin_unlock_irq(&rq->lock); > + continue;
Again, this can deadlock. migration_thread() is SCHED_FIFO, and it shares the same CPU with req->task. We are doing a busy-wait loop, req->task may have no chance to finish its critical section. > + } > + set_tsk_thread_flag(req->task, TIF_NEED_MIGRATE); > + set_current_state(TASK_INTERRUPTIBLE); > + spin_unlock_irq(&rq->lock); > + /* > + * Wait for the process currently in its critical > + * section. > + */ > + wake_up_process(req->task); > + schedule(); As Peter pointed out, we hold up all other migration requests. And worse, this can hang forever (until another req comes). do_check_migrate() can wake_up() the wrong rq. set_current_state(TASK_INTERRUPTIBLE) is bogus, we can miss a wakeup from (say) sched_migrate_task(). Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/