From: Kirill Tkhai <ktk...@virtuozzo.com>

Since calc_load_fold_active() reads two variables (nr_running and
nr_uninterruptible), it may race with a parallel try_to_wake_up(), so
it must be executed under rq::lock to prevent that. This seems to be
the reason for the negative calc_load_tasks observed on several
machines.
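For context, a sketch of the racy reader, adapted from mainline
calc_load_fold_active() in kernel/sched/loadavg.c (the exact type of
nr_uninterruptible differs between kernel versions), with the unlocked
reads that this patch serializes marked:

  long calc_load_fold_active(struct rq *this_rq, long adjust)
  {
  	long nr_active, delta = 0;

  	/* Read #1: runnable tasks on this rq. */
  	nr_active = this_rq->nr_running - adjust;
  	/* Read #2: per-rq uninterruptible count (may be negative). */
  	nr_active += (long)this_rq->nr_uninterruptible;

  	/*
  	 * A parallel try_to_wake_up() moves a task between the two
  	 * counters with separate stores under rq::lock; without taking
  	 * that lock here, the two reads above can see the task in both
  	 * counters or in neither, and the bogus delta below then gets
  	 * folded into calc_load_tasks.
  	 */
  	if (nr_active != this_rq->calc_load_active) {
  		delta = nr_active - this_rq->calc_load_active;
  		this_rq->calc_load_active = nr_active;
  	}

  	return delta;
  }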
https://jira.sw.ru/browse/PSBM-68052

Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>

(cherry-picked from vz7 commit 63138412c98b ("sched: Count loadavg
under rq::lock in calc_load_nohz_start()"))

Rebase to vz8:
- move calc_load_migrate() under the rq lock
- move the rq lock/unlock hunk from calc_load_enter_idle() to
  calc_load_nohz_start()
- use the rq_(un)lock helpers, like in the patch sent to mainstream:
  https://lkml.org/lkml/2017/7/7/472

That patch did not hit mainstream for some reason, but it is likely
still valid; at least I don't see any explicit fix of this problem.

https://jira.sw.ru/browse/PSBM-127780

Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>

khorenko@: we don't disable/enable irqs in calc_load_nohz_start()
because we assume irqs are already disabled when it is called. Stack
example:

  cpuidle_idle_call
    tick_nohz_idle_stop_tick
      __tick_nohz_idle_stop_tick
        tick_nohz_stop_tick
          calc_load_nohz_start

Cherry-picked from vz8 commit e30538246af4 ("sched: Count loadavg under
rq::lock in calc_load_nohz_start()").

Possible paths to reach calc_load_fold_active(), where rq->nr_running
and rq->nr_uninterruptible get used:

(1) scheduler_tick()        <- [code to take rq lock already existed]
      calc_global_load_tick()
        calc_load_fold_active()

(2) sched_tick_remote()     <- [code to take rq lock already existed]
      calc_load_nohz_remote()
        calc_load_nohz_fold()
          calc_load_fold_active()

(3) tick_nohz_stop_tick()
      calc_load_nohz_start() <- [rq lock added here by this patch]
        calc_load_nohz_fold()
          calc_load_fold_active()

(4) sched_cpu_dying()       <- [rq lock ensured here by this patch]
      calc_load_migrate()
        calc_load_fold_active()

Signed-off-by: Nikita Yushchenko <nikita.yushche...@virtuozzo.com>
---
 kernel/sched/core.c    | 2 +-
 kernel/sched/loadavg.c | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f1689ac77af1..dad3dbd6acca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9035,9 +9035,9 @@ int sched_cpu_dying(unsigned int cpu)
 		WARN(true, "Dying CPU not properly vacated!");
 		dump_rq_tasks(rq, KERN_WARNING);
 	}
+	calc_load_migrate(rq);
 	rq_unlock_irqrestore(rq, &rf);
 
-	calc_load_migrate(rq);
 	update_max_interval();
 	hrtick_clear(rq);
 	sched_core_cpu_dying(cpu);
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 0174c40c7468..74cfc755c87d 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -312,11 +312,15 @@ static void calc_load_nohz_fold(struct rq *rq)
 
 void calc_load_nohz_start(void)
 {
+	struct rq *rq = this_rq();
+	struct rq_flags rf;
 	/*
 	 * We're going into NO_HZ mode, if there's any pending delta, fold it
 	 * into the pending NO_HZ delta.
 	 */
-	calc_load_nohz_fold(this_rq());
+	rq_lock(rq, &rf);
+	calc_load_nohz_fold(rq);
+	rq_unlock(rq, &rf);
 }
 
 /*
-- 
2.30.2

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel