On 05/07/18 14:27, Matt Fleming wrote: > On Thu, 05 Jul, at 11:10:42AM, Valentin Schneider wrote: >> Hi, >> >> On 04/07/18 15:24, Matt Fleming wrote: >>> It's possible that the CPU doing nohz idle balance hasn't had its own >>> load updated for many seconds. This can lead to huge deltas between >>> rq->avg_stamp and rq->clock when rebalancing, and has been seen to >>> cause the following crash: >>> >>> divide error: 0000 [#1] SMP >>> Call Trace: >>> [<ffffffff810bcba8>] update_sd_lb_stats+0xe8/0x560
My confusion comes from not seeing where that crash happens. Would you mind sharing the associated line number? I can feel the "how did I not see this" from there but it can't be helped :( >>> [<ffffffff810bd04d>] find_busiest_group+0x2d/0x4b0 >>> [<ffffffff810bd640>] load_balance+0x170/0x950 >>> [<ffffffff810be3ff>] rebalance_domains+0x13f/0x290 >>> [<ffffffff810852bc>] __do_softirq+0xec/0x300 >>> [<ffffffff8108578a>] irq_exit+0xfa/0x110 >>> [<ffffffff816167d9>] reschedule_interrupt+0xc9/0xd0 >>> >> >> Do you have some sort of reproducer for that crash? If not I guess I can cook >> something up with a quiet userspace & rt-app, though I've never seen that one >> on arm64. > > Unfortunately no, I don't have a reproduction case. Would love to have > one to verify the patch though. > >>> Make sure we update the rq clock and load before balancing. >>> >>> Cc: Ingo Molnar <mi...@kernel.org> >>> Cc: Mike Galbraith <umgwanakikb...@gmail.com> >>> Cc: Peter Zijlstra <pet...@infradead.org> >>> Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk> >>> --- >>> kernel/sched/fair.c | 10 ++++++++++ >>> 1 file changed, 10 insertions(+) >>> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>> index 2f0a0be4d344..2c81662c858a 100644 >>> --- a/kernel/sched/fair.c >>> +++ b/kernel/sched/fair.c >>> @@ -9597,6 +9597,16 @@ static bool _nohz_idle_balance(struct rq *this_rq, >>> unsigned int flags, >>> */ >>> smp_mb(); >>> >>> + /* >>> + * Ensure this_rq's clock and load are up-to-date before we >>> + * rebalance since it's possible that they haven't been >>> + * updated for multiple schedule periods, i.e. many seconds. >>> + */ >>> + raw_spin_lock_irq(&this_rq->lock); >>> + update_rq_clock(this_rq); >>> + cpu_load_update_idle(this_rq); >>> + raw_spin_unlock_irq(&this_rq->lock); >>> + >> >> I'm failing to understand why the updates further down below are seemingly >> not enough. After we've potentially done >> >> update_rq_clock(rq); >> cpu_load_update_idle(rq); >> >> for all nohz cpus != this_cpu, we still end up doing: >> >> if (idle != CPU_NEWLY_IDLE) { >> update_blocked_averages(this_cpu); >> has_blocked_load |= this_rq->has_blocked_load; >> } >> >> which should properly update this_rq's clock and load before we attempt to do >> any balancing on it. > > But cpu_load_update_idle() and update_blocked_averages() are not the same > thing. > Right, we don't do any rq->cpu_load[] update for this_rq in the current nohz code (i.e. by using update_blocked_averages()), which I think we do want to do. I'm just miserably failing to find how not doing this leads to a div by 0.