On Wed, 2025-03-26 at 17:44 +0100, Frederic Weisbecker wrote:
> Hi Walter Chang,
> 
> On Wed, Mar 26, 2025 at 05:46:38AM +0000, Walter Chang (張維哲) wrote:
> > On Tue, 2025-01-21 at 09:08 -0800, Paul E. McKenney wrote:
> > > On Sat, Jan 18, 2025 at 12:24:33AM +0100, Frederic Weisbecker wrote:
> > > > hrtimers are migrated away from the dying CPU to any online target at
> > > > the CPUHP_AP_HRTIMERS_DYING stage, so as not to delay bandwidth timers
> > > > handling tasks involved in CPU hotplug forward progress.
> > > >
> > > > However, wake-ups can still be performed by the outgoing CPU after
> > > > CPUHP_AP_HRTIMERS_DYING, and those can again result in bandwidth timers
> > > > being armed. Depending on several considerations (crystal-ball
> > > > power-management-based election, earliest timer already enqueued, timer
> > > > migration enabled or not), the target may eventually be the current
> > > > CPU even if offline. If that happens, the timer is eventually ignored.
> > > >
> > > > The most notable example is RCU, which had to deal with each and every
> > > > one of those wake-ups by deferring them to an online CPU, along with
> > > > related workarounds:
> > > >
> > > > _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
> > > > _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
> > > > _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
> > > >
> > > > The problem isn't confined to RCU, though, as the stop machine kthread
> > > > (which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
> > > > of its work through cpu_stop_signal_done() and performs a wake-up that
> > > > eventually arms the deadline server timer:
> > > >
> > > >    WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
> > > >    CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
> > > >    Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
> > > >    RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
> > > >    Call Trace:
> > > >    <TASK>
> > > >    ? hrtimer_start_range_ns
> > > >    start_dl_timer
> > > >    enqueue_dl_entity
> > > >    dl_server_start
> > > >    enqueue_task_fair
> > > >    enqueue_task
> > > >    ttwu_do_activate
> > > >    try_to_wake_up
> > > >    complete
> > > >    cpu_stopper_thread
> > > >    smpboot_thread_fn
> > > >    kthread
> > > >    ret_from_fork
> > > >    ret_from_fork_asm
> > > >    </TASK>
> > > >
> > > > Instead of providing yet another band-aid to work around the situation,
> > > > fix it from the hrtimer infrastructure instead: always migrate a timer
> > > > away to an online target whenever it is enqueued from an offline CPU.
> > > >
> > > > This will also allow reverting all of the above disgraceful RCU hacks.
> > > >
> > > > Reported-by: Vlad Poenaru <vlad.w...@gmail.com>
> > > > Reported-by: Usama Arif <usamaarif...@gmail.com>
> > > > Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > > > Closes: 20241213203739.1519801-1-usamaarif...@gmail.com
> > > > Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
> > > > Signed-off-by: Paul E. McKenney <paul...@kernel.org>
> > > 
> > > This passes over-holiday rcutorture testing, so, perhaps redundantly:
> > > 
> > > Tested-by: Paul E. McKenney <paul...@kernel.org>
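
[ Note: the fix quoted above amounts to retargeting at enqueue time. A
  minimal sketch of the idea, with a hypothetical helper name; this is
  not the actual upstream patch:

        static int hrtimer_pick_target_cpu(void)
        {
                int cpu = smp_processor_id();

                /* Common case: the local CPU is online, keep it. */
                if (likely(cpu_online(cpu)))
                        return cpu;

                /*
                 * The local CPU can only be offline here if we are past
                 * CPUHP_AP_HRTIMERS_DYING: force the timer away to an
                 * online target instead of the local base.
                 */
                return cpumask_any(cpu_online_mask);
        }
]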
> > 
> > Hi,
> > 
> > I encountered the same issue even after applying this patch.
> > Below are the details of the warning and call trace:
> > 
> > migration/3: ------------[ cut here ]------------
> > migration/3: WARNING: CPU: 3 PID: 42 at kernel/time/hrtimer.c:1125 enqueue_hrtimer+0x7c/0xec
> > migration/3: CPU: 3 UID: 0 PID: 42 Comm: migration/3 Tainted: G OE 6.12.18-android16-0-g59cb5a849beb-4k #1 0b440e43fa7b24aaa3b7e6e5d2b938948e0cacdb
> > migration/3: Stopper: multi_cpu_stop+0x0/0x184 <- stop_machine_cpuslocked+0xc0/0x15c
> 
> It's not the first time I have received such a report on an out-of-tree
> kernel. The problem is that I don't know whether the tainted modules are
> involved. But something is probably making an offline CPU visible within
> the hierarchy in get_nohz_timer_target(), and the new warning made
> that visible.

Hi,

Reviewing get_nohz_timer_target(), it probably does make an offline CPU
visible among the timer target candidates; maybe the patch below could
fix it?
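
For reference, before the patch itself, here is a condensed restatement
of the current selection logic (simplified by hand from the upstream
function, the function name suffixed for illustration); none of the
three return paths checks cpu_online():

        int get_nohz_timer_target_sketch(void)
        {
                int i, cpu = smp_processor_id(), default_cpu = -1;
                struct sched_domain *sd;
                const struct cpumask *hk_mask;

                if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE)) {
                        if (!idle_cpu(cpu))
                                return cpu;        /* (1) no online check */
                        default_cpu = cpu;
                }

                hk_mask = housekeeping_cpumask(HK_TYPE_KERNEL_NOISE);

                guard(rcu)();
                for_each_domain(cpu, sd) {
                        /* A domain span may still contain an offline CPU */
                        for_each_cpu_and(i, sched_domain_span(sd), hk_mask) {
                                if (cpu == i)
                                        continue;
                                if (!idle_cpu(i))
                                        return i;  /* (2) no online check */
                        }
                }

                if (default_cpu == -1)
                        default_cpu = housekeeping_any_cpu(HK_TYPE_KERNEL_NOISE);

                return default_cpu;                /* (3) no online check */
        }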
[PATCH] sched/core: Exclude offline CPUs from the timer candidates

The timer target is chosen from the HK_TYPE_KERNEL_NOISE housekeeping
CPUs. However, the candidate may be an offline CPU, so exclude offline
CPUs and choose only from online ones.

Signed-off-by: kuyo chang <kuyo.ch...@mediatek.com>
---
 kernel/sched/core.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cfaca3040b2f..efcc2576e622 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1182,7 +1182,7 @@ int get_nohz_timer_target(void)
 	struct sched_domain *sd;
 	const struct cpumask *hk_mask;
 
-	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE)) {
+	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) && cpu_online(cpu)) {
 		if (!idle_cpu(cpu))
 			return cpu;
 		default_cpu = cpu;
@@ -1197,13 +1197,16 @@ int get_nohz_timer_target(void)
 			if (cpu == i)
 				continue;
 
-			if (!idle_cpu(i))
+			if (!idle_cpu(i) && cpu_online(i))
 				return i;
 		}
 	}
 
-	if (default_cpu == -1)
+	if (default_cpu == -1) {
 		default_cpu = housekeeping_any_cpu(HK_TYPE_KERNEL_NOISE);
+		if (!cpu_online(default_cpu))
+			default_cpu = cpumask_any(cpu_online_mask);
+	}
 
 	return default_cpu;
 }

> Can you try this and tell us if the warning fires?
> 
> Thanks.
> 
> diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
> index 6d67e9a5af6b..f49512628269 100644
> --- a/include/linux/sched/nohz.h
> +++ b/include/linux/sched/nohz.h
> @@ -9,6 +9,7 @@
>  #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
>  extern void nohz_balance_enter_idle(int cpu);
>  extern int get_nohz_timer_target(void);
> +extern void assert_domain_online(void);
>  #else
>  static inline void nohz_balance_enter_idle(int cpu) { }
>  #endif
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 07455d25329c..98c8f8408403 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -13,6 +13,7 @@
>  #include <linux/sched/isolation.h>
>  #include <linux/sched/task.h>
>  #include <linux/sched/smt.h>
> +#include <linux/sched/nohz.h>
>  #include <linux/unistd.h>
>  #include <linux/cpu.h>
>  #include <linux/oom.h>
> @@ -1277,6 +1278,7 @@ static int take_cpu_down(void *_param)
>  	if (err < 0)
>  		return err;
>  
> +	assert_domain_online();
>  	/*
>  	 * Must be called from CPUHP_TEARDOWN_CPU, which means, as we are going
>  	 * down, that the current state is CPUHP_TEARDOWN_CPU - 1.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 175a5a7ac107..88157b1645cc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1163,6 +1163,20 @@ void resched_cpu(int cpu)
>  
>  #ifdef CONFIG_SMP
>  #ifdef CONFIG_NO_HZ_COMMON
> +void assert_domain_online(void)
> +{
> +	int cpu = smp_processor_id();
> +	int i;
> +	struct sched_domain *sd;
> +
> +	guard(rcu)();
> +
> +	for_each_domain(cpu, sd) {
> +		for_each_cpu(i, sched_domain_span(sd)) {
> +			WARN_ON_ONCE(cpu_is_offline(i));
> +		}
> +	}
> +}
>  /*
>   * In the semi idle case, use the nearest busy CPU for migrating timers
>   * from an idle CPU. This is good for power-savings.
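
If the domain walk in assert_domain_online() turns out to be clean, it
may also be worth asserting at the point of choice, to pinpoint which
return path hands back the offline CPU. An untested sketch (the helper
name is made up); each return statement in get_nohz_timer_target()
would become e.g. "return check_timer_target(i);":

        static inline int check_timer_target(int cpu)
        {
                /* Warn the moment an offline CPU is actually chosen. */
                WARN_ONCE(cpu_is_offline(cpu),
                          "nohz timer target CPU %d is offline\n", cpu);
                return cpu;
        }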