On 09/01, Peter Zijlstra wrote: > > On Mon, Aug 25, 2014 at 05:57:38PM +0200, Oleg Nesterov wrote: > > Peter, do you remember another problem with TASK_DEAD we discussed recently? > > (prev_state == TASK_DEAD detection in finish_task_switch() still looks > > racy). > > Uhm, right. That was somewhere on the todo list :-) > > > I am starting to think that perhaps we need something like below, what do > > you all think? > > I'm thinking you lost the hunk that adds rq::dead :-), more comments > below.
And "goto deactivate" should be moved down, after "switch_count" initialization. > > + if (unlikely(rq->dead)) > > + goto deactivate; > > + > > Yeah, it would be best to not have to do that; ideally we would be able > to maybe do both; set rq->dead and current->state == TASK_DEAD. To avoid spin_unlock_wait() in do_exit(). But on a second thought this can't work, please see below. > > --- x/kernel/exit.c > > +++ x/kernel/exit.c > > @@ -815,25 +815,8 @@ void do_exit(long code) > > __this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied); > > exit_rcu(); > > > > - /* > > - * The setting of TASK_RUNNING by try_to_wake_up() may be delayed > > - * when the following two conditions become true. > > - * - There is race condition of mmap_sem (It is acquired by > > - * exit_mm()), and > > - * - SMI occurs before setting TASK_RUNINNG. > > - * (or hypervisor of virtual machine switches to other guest) > > - * As a result, we may become TASK_RUNNING after becoming TASK_DEAD > > - * > > - * To avoid it, we have to wait for releasing tsk->pi_lock which > > - * is held by try_to_wake_up() > > - */ > > - smp_mb(); > > - raw_spin_unlock_wait(&tsk->pi_lock); > > - > > - /* causes final put_task_struct in finish_task_switch(). */ > > - tsk->state = TASK_DEAD; > > tsk->flags |= PF_NOFREEZE; /* tell freezer to ignore us */ > > - schedule(); > > + exit_schedule(); > > BUG(); > > /* Avoid "noreturn function does return". */ > > for (;;) > > Yes, something like this might work fine.. Not really :/ Yes, rq->dead (or just "bool prev_dead") should obviously solve the problem with ttwu() after the last schedule(). But only in a sense that the dying task won't be activated. However, the very fact that another CPU can look at this task_struct means that we still need spin_unlock_wait(). If nothing else to ensure that try_to_wake_up()->spin_unlock(pi_lock) won't write into the memory we are are going to free. So I think the comment in do exit should be updated too, and smp_mb() should be moved under raw_spin_unlock_wait() but ... But. If am right, doesn't this mean we that have even more problems with postmortem wakeups??? Why ttwu() can't _start_ after spin_unlock_wait ? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/