On Fri, 12 Aug 2016 15:09:00 +0800 Wanpeng Li <kernel...@gmail.com> wrote: > 2016-08-12 10:44 GMT+08:00 Rik van Riel <r...@redhat.com>:
> > If you pass ULONG_MAX as the maxtime argument to > > steal_account_process_time(), does the steal time > > get accounted properly at 75%? > > Yes. I talked with Paolo this morning, and it turns out that if a guest misses several timer ticks in a row, they will simply get lost. That means the functions calling steal_account_process_time may not know how much CPU time has passed since the last time it was called, but steal_account_process_time will get a good idea on how much time the host spent running something else. Removing the limit, and documenting why, seems like the right way to fix this bug. Wanpeng, does the patch below work for you? Everybody else, does this patch look acceptable? ---8<--- Subject: time,virt: do not limit steal_account_process_time When a guest is interrupted for a longer amount of time, missed clock ticks are not redelivered later. Because of that, we should not limit the amount of steal time accounted to the amount of time that the calling functions think have passed. Instead, simply let steal_account_process_time account however much steal time the host told us elapsed. This can make up timer ticks that were missed when the host scheduled somebody else. Signed-off-by: Rik van Riel <r...@redhat.com> Reported-by: Wanpeng Li <kernel...@gmail.com> --- kernel/sched/cputime.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 1934f658c036..6f15274940fb 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -263,7 +263,12 @@ void account_idle_time(cputime_t cputime) cpustat[CPUTIME_IDLE] += (__force u64) cputime; } -static __always_inline cputime_t steal_account_process_time(cputime_t maxtime) +/* + * When a guest is interrupted for a longer amount of time, missed clock + * ticks are not redelivered later. Due to that, this function may on + * occasion account more time than the calling functions think elapsed. + */ +static __always_inline cputime_t steal_account_process_time(void) { #ifdef CONFIG_PARAVIRT if (static_key_false(¶virt_steal_enabled)) { @@ -273,7 +278,7 @@ static __always_inline cputime_t steal_account_process_time(cputime_t maxtime) steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; - steal_cputime = min(nsecs_to_cputime(steal), maxtime); + steal_cputime = nsecs_to_cputime(steal); account_steal_time(steal_cputime); this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime); @@ -290,7 +295,7 @@ static inline cputime_t account_other_time(cputime_t max) { cputime_t accounted; - accounted = steal_account_process_time(max); + accounted = steal_account_process_time(); if (accounted < max) accounted += irqtime_account_hi_update(max - accounted); @@ -486,7 +491,7 @@ void account_process_tick(struct task_struct *p, int user_tick) } cputime = cputime_one_jiffy; - steal = steal_account_process_time(cputime); + steal = steal_account_process_time(); if (steal >= cputime) return;