On Sat, Mar 05, 2016 at 11:27:01AM +0100, Thomas Gleixner wrote: > Chris, > > On Fri, 4 Mar 2016, Chris Friesen wrote: > > First of all the subject line should contain a subsystem prefix, > i.e. "sched/cputime:" > > > The callers of steal_account_process_tick() expect it to return whether > > the last jiffy was stolen or not. > > > > Currently the return value of steal_account_process_tick() is in units > > of cputime, which vary between either jiffies or nsecs depending on > > CONFIG_VIRT_CPU_ACCOUNTING_GEN. > > Sure, but what is the actual problem? The return value is boolean and tells > whether there was stolen time accounted or not. > > > The fix is to change steal_account_process_tick() to always return > > jiffies. If CONFIG_VIRT_CPU_ACCOUNTING_GEN is not enabled then this > > is a no-op. > > What does that fix? > > > As far as I can tell this bug has been present since commit dee08a72. > > Which bug? > > > Signed-off-by: Chris Friesen <chris.frie...@windriver.com> > > --- > > > > kernel/sched/cputime.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > > index b2ab2ff..e724496 100644 > > --- a/kernel/sched/cputime.c > > +++ b/kernel/sched/cputime.c > > @@ -276,7 +276,7 @@ static __always_inline bool > > steal_account_process_tick(void) > > this_rq()->prev_steal_time += cputime_to_nsecs(steal_ct); > > > > account_steal_time(steal_ct); > > - return steal_ct; > > + return cputime_to_jiffies(steal_ct); > > So if steal time is close to a jiffie, then cputime_to_jiffies will return 0 > and you account a full jiffie to user/system/whatever. > > Without a proper explanation of the problem and the resulting "bug" I really > cannot figure out why we want that change.
Indeed the changelog should better explain the problem. So I think the issue is that if the cputime has nsecs granularity and we have a tiny stolen time to account (lets say a few nanosecs, in fact anything that is below a jiffy), we are not going to account the tick on user/system. But the fix doesn't look right to me because we are still accounting the steal time if it is lower than a jiffy and that steal time will never be substracted to user/system time if it never reach a jiffy. Instead the fix should accumulate the steal time and account it only once it's worth a jiffy and then substract it from system/user time accordingly. Something like that: diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index b2ab2ff..d38e25f 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -262,7 +262,7 @@ static __always_inline bool steal_account_process_tick(void) #ifdef CONFIG_PARAVIRT if (static_key_false(¶virt_steal_enabled)) { u64 steal; - cputime_t steal_ct; + unsigned long steal_jiffies; steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; @@ -272,11 +272,11 @@ static __always_inline bool steal_account_process_tick(void) * based on jiffies). Lets cast the result to cputime * granularity and account the rest on the next rounds. */ - steal_ct = nsecs_to_cputime(steal); - this_rq()->prev_steal_time += cputime_to_nsecs(steal_ct); + steal_jiffies = nsecs_to_jiffies(steal); + this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies); account_steal_time(steal_ct); - return steal_ct; + return steal_jiffies; } #endif return false;