cpu stopper threads and load balancing leads to deadlock

Matt Fleming Tue, 17 Apr 2018 07:22:09 -0700

Hi guys,

We've seen a bug in one of our SLE kernels where the cpu stopper
thread ("migration/15") enters idle balance, which then triggers
active load balance.

At the same time, a task on another CPU triggers a page fault and NUMA
balancing kicks in to try and migrate the task closer to the NUMA node
for that page (we're inside stop_two_cpus()). This faulting task is
spinning in try_to_wake_up() (inside smp_cond_load_acquire(&p->on_cpu,
!VAL)), waiting for "migration/15" to context switch.

Unfortunately, because "migration/15" is doing active load balance,
it's spinning waiting for the NUMA-page-faulting CPU's stopper lock,
which is already held (since that CPU is inside stop_two_cpus()).

Deadlock ensues.
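
To make the cycle explicit, here is a minimal user-space sketch in C that
models the two spinners above as a wait-for graph and checks for a cycle.
It is not kernel code: the actor ids, NO_WAIT, fill_reported_scenario() and
has_wait_cycle() are hypothetical names made up for this illustration, and
the model assumes each actor spins on at most one other actor.

```c
#include <assert.h>

/* Hypothetical actor ids for this model; these are not kernel identifiers. */
enum actor { MIGRATION_15, FAULTING_TASK, NR_ACTORS };

#define NO_WAIT (-1)

/*
 * waits_for[a] is the actor that 'a' is spinning on, or NO_WAIT.
 * Encodes the scenario described in this report:
 *  - "migration/15" spins on the faulting CPU's stopper lock, which the
 *    faulting task's CPU holds inside stop_two_cpus().
 *  - the faulting task spins in try_to_wake_up() on p->on_cpu, i.e. it
 *    waits for "migration/15" to context switch.
 */
static void fill_reported_scenario(int *waits_for)
{
	waits_for[MIGRATION_15] = FAULTING_TASK;
	waits_for[FAULTING_TASK] = MIGRATION_15;
}

/*
 * Follow the wait-for edges from 'start'. Returns 1 if the walk comes
 * back to 'start' within n steps (a deadlock cycle), 0 otherwise.
 */
static int has_wait_cycle(const int *waits_for, int n, int start)
{
	int cur = start;

	assert(start >= 0 && start < n);

	for (int steps = 0; steps < n; steps++) {
		cur = waits_for[cur];
		if (cur == NO_WAIT)
			return 0;	/* chain ends: someone can make progress */
		if (cur == start)
			return 1;	/* back where we began: nobody can progress */
	}
	return 0;
}
```

With the array filled by fill_reported_scenario(), has_wait_cycle() returns
1 from either actor. Clearing either edge (for example, if the stopper
thread never entered active load balance) makes it return 0, which is the
property I would expect the scheduler to guarantee.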

This seems like a situation that should be prohibited, but I cannot
find any code to prevent it. Is it OK for stopper threads to load
balance? Is there something that should prevent this situation from
happening?
