Re: [RFC PATCH] sched/core: Fix premature p->migration_pending completion

Qais Yousef Thu, 04 Feb 2021 07:32:35 -0800

On 02/03/21 18:59, Valentin Schneider wrote:
> On 03/02/21 17:23, Qais Yousef wrote:
> > On 01/27/21 19:30, Valentin Schneider wrote:
> >> Fiddling some more with a TLA+ model of set_cpus_allowed_ptr() & friends
> >> unearthed one more outstanding issue. This doesn't even involve
> >> migrate_disable(), but rather affinity changes and execution of the stopper
> >> racing with each other.
> >> 
> >> My own interpretation of the (lengthy) TLA+ splat (note the potential for
> >> errors at each level) is:
> >> 
> >>   Initial conditions:
> >>     victim.cpus_mask = {CPU0, CPU1}
> >> 
> >>   CPU0                             CPU1                             CPU<don't care>
> >> 
> >>   switch_to(victim)
> >>                                                                 set_cpus_allowed(victim, {CPU1})
> >>                                                                   kick CPU0 migration_cpu_stop({.dest_cpu = CPU1})
> >>   switch_to(stopper/0)
> >>                                                                 // e.g. CFS load balance
> >>                                                                 move_queued_task(CPU0, victim, CPU1);
> >>                               switch_to(victim)
> >>                                                                 set_cpus_allowed(victim, {CPU0});
> >>                                                                   task_rq_unlock();
> >>   migration_cpu_stop(dest_cpu=CPU1)
> >
> > This migration stop is due to set_cpus_allowed(victim, {CPU1}), right?
> >
> 
> Right
> 
> >>     task_rq(p) != rq && pending
> >>       kick CPU1 migration_cpu_stop({.dest_cpu = CPU1})
> >> 
> >>                               switch_to(stopper/1)
> >>                               migration_cpu_stop(dest_cpu=CPU1)
> >
> > And this migration stop is due to set_cpus_allowed(victim, {CPU0}), right?
> >
> 
> Nein! This is a retriggering of the "current" stopper (triggered by
> set_cpus_allowed(victim, {CPU1})), see the tail of that
> 
>   else if (dest_cpu < 0 || pending)
> 
> branch in migration_cpu_stop(), is what I'm trying to hint at with that 
> 
> task_rq(p) != rq && pending


Okay I see. But AFAIU, the work will be queued in order. So we should first
handle the set_cpus_allowed_ptr(victim, {CPU0}) before the retrigger, no?

So I see migration_cpu_stop() running 3 times

        1. because of set_cpus_allowed(victim, {CPU1}) on CPU0
        2. because of set_cpus_allowed(victim, {CPU0}) on CPU1
        3. because of retrigger of '1' on CPU0

Thanks

--
Qais Yousef

Re: [RFC PATCH] sched/core: Fix premature p->migration_pending completion
