On 02/03/21 18:59, Valentin Schneider wrote: > On 03/02/21 17:23, Qais Yousef wrote: > > On 01/27/21 19:30, Valentin Schneider wrote: > >> Fiddling some more with a TLA+ model of set_cpus_allowed_ptr() & friends > >> unearthed one more outstanding issue. This doesn't even involve > >> migrate_disable(), but rather affinity changes and execution of the stopper > >> racing with each other. > >> > >> My own interpretation of the (lengthy) TLA+ splat (note the potential for > >> errors at each level) is: > >> > >> Initial conditions: > >> victim.cpus_mask = {CPU0, CPU1} > >> > >> CPU0 CPU1 > >> CPU<don't care> > >> > >> switch_to(victim) > >> > >> set_cpus_allowed(victim, {CPU1}) > >> kick CPU0 > >> migration_cpu_stop({.dest_cpu = CPU1}) > >> switch_to(stopper/0) > >> // e.g. CFS > >> load balance > >> > >> move_queued_task(CPU0, victim, CPU1); > >> switch_to(victim) > >> > >> set_cpus_allowed(victim, {CPU0}); > >> > >> task_rq_unlock(); > >> migration_cpu_stop(dest_cpu=CPU1) > > > > This migration stop is due to set_cpus_allowed(victim, {CPU1}), right? > > > > Right > > >> task_rq(p) != rq && pending > >> kick CPU1 migration_cpu_stop({.dest_cpu = CPU1}) > >> > >> switch_to(stopper/1) > >> migration_cpu_stop(dest_cpu=CPU1) > > > > And this migration stop is due to set_cpus_allowed(victim, {CPU0}), right? > > > > Nein! This is a retriggering of the "current" stopper (triggered by > set_cpus_allowed(victim, {CPU1})), see the tail of that > > else if (dest_cpu < 0 || pending) > > branch in migration_cpu_stop(), is what I'm trying to hint at with that > > task_rq(p) != rq && pending
Okay I see. But AFAIU, the work will be queued in order. So we should first handle the set_cpus_allowed_ptr(victim, {CPU0}) before the retrigger, no? So I see migration_cpu_stop() running 3 times 1. because of set_cpus_allowed(victim, {CPU1}) on CPU0 2. because of set_cpus_allowed(victim, {CPU0}) on CPU1 3. because of retrigger of '1' on CPU0 Thanks -- Qais Yousef