On 09/01, Peter Zijlstra wrote:
>
> On Mon, Aug 25, 2014 at 05:57:38PM +0200, Oleg Nesterov wrote:
> > Peter, do you remember another problem with TASK_DEAD we discussed recently?
> > (prev_state == TASK_DEAD detection in finish_task_switch() still looks 
> > racy).
>
> Uhm, right. That was somewhere on the todo list :-)
>
> > I am starting to think that perhaps we need something like below, what do
> > you all think?
>
> I'm thinking you lost the hunk that adds rq::dead :-), more comments
> below.

And "goto deactivate" should be moved down, after "switch_count"
initialization.

> > +   if (unlikely(rq->dead))
> > +           goto deactivate;
> > +
>
> Yeah, it would be best to not have to do that; ideally we would be able
> to maybe do both; set rq->dead and current->state == TASK_DEAD.

To avoid spin_unlock_wait() in do_exit(). But on a second thought this
can't work, please see below.

> > --- x/kernel/exit.c
> > +++ x/kernel/exit.c
> > @@ -815,25 +815,8 @@ void do_exit(long code)
> >             __this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
> >     exit_rcu();
> >
> > -   /*
> > -    * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
> > -    * when the following two conditions become true.
> > -    *   - There is race condition of mmap_sem (It is acquired by
> > -    *     exit_mm()), and
> > -    *   - SMI occurs before setting TASK_RUNINNG.
> > -    *     (or hypervisor of virtual machine switches to other guest)
> > -    *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
> > -    *
> > -    * To avoid it, we have to wait for releasing tsk->pi_lock which
> > -    * is held by try_to_wake_up()
> > -    */
> > -   smp_mb();
> > -   raw_spin_unlock_wait(&tsk->pi_lock);
> > -
> > -   /* causes final put_task_struct in finish_task_switch(). */
> > -   tsk->state = TASK_DEAD;
> >     tsk->flags |= PF_NOFREEZE;      /* tell freezer to ignore us */
> > -   schedule();
> > +   exit_schedule();
> >     BUG();
> >     /* Avoid "noreturn function does return".  */
> >     for (;;)
>
> Yes, something like this might work fine..

Not really :/ Yes, rq->dead (or just "bool prev_dead") should obviously
solve the problem with ttwu() after the last schedule(). But only in a
sense that the dying task won't be activated.

However, the very fact that another CPU can look at this task_struct
means that we still need spin_unlock_wait(). If nothing else to ensure
that try_to_wake_up()->spin_unlock(pi_lock) won't write into the memory
we are are going to free.

So I think the comment in do exit should be updated too, and smp_mb()
should be moved under raw_spin_unlock_wait() but ...

But. If am right, doesn't this mean we that have even more problems with
postmortem wakeups??? Why ttwu() can't _start_ after spin_unlock_wait ?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to