On Fri, Jul 19, 2019 at 01:03:49PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 18, 2019 at 03:18:34PM +0200, Oleg Nesterov wrote:
> > People report that utime and stime from /proc/<pid>/stat become very wrong
> > when the numbers are big enough. In particular, the monitored application
> > can run all the time in user-space but only stime grows.
> >
> > This is because scale_stime() is very inaccurate. It tries to minimize the
> > relative error, but the absolute error can be huge.
> >
> > Andrew wrote the test-case:
> >
> > 	int main(int argc, char **argv)
> > 	{
> > 		struct task_cputime c;
> > 		struct prev_cputime p;
> > 		u64 st, pst, cst;
> > 		u64 ut, put, cut;
> > 		u64 x;
> > 		int i = -1; // one step not printed
> >
> > 		if (argc != 2)
> > 		{
> > 			printf("usage: %s <start_in_seconds>\n", argv[0]);
> > 			return 1;
> > 		}
> > 		x = strtoull(argv[1], NULL, 0) * SEC;
> > 		printf("start=%lld\n", x);
> >
> > 		p.stime = 0;
> > 		p.utime = 0;
> >
> > 		while (i++ < NSTEPS)
> > 		{
> > 			x += STEP;
> > 			c.stime = x;
> > 			c.utime = x;
> > 			c.sum_exec_runtime = x + x;
> > 			pst = cputime_to_clock_t(p.stime);
> > 			put = cputime_to_clock_t(p.utime);
> > 			cputime_adjust(&c, &p, &ut, &st);
> > 			cst = cputime_to_clock_t(st);
> > 			cut = cputime_to_clock_t(ut);
> > 			if (i)
> > 				printf("ut(diff)/st(diff): %20lld (%4lld) %20lld (%4lld)\n",
> > 					cut, cut - put, cst, cst - pst);
> > 		}
> > 	}
> >
> > For example,
> >
> > $ ./stime 300000
> > start=300000000000000
> > ut(diff)/st(diff):            299994875 (   0)            300009124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300011124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300013124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300015124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300017124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300019124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300021124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300023124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300025124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300027124 (2000)
> > ut(diff)/st(diff):            299994875 (   0)            300029124 (2000)
> > ut(diff)/st(diff):            299996875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            299998875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            300000875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            300002875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            300004875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            300006875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            300008875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            300010875 (2000)            300029124 (   0)
> > ut(diff)/st(diff):            300012055 (1180)            300029944 ( 820)
> > ut(diff)/st(diff):            300012055 (   0)            300031944 (2000)
> > ut(diff)/st(diff):            300012055 (   0)            300033944 (2000)
> > ut(diff)/st(diff):            300012055 (   0)            300035944 (2000)
> > ut(diff)/st(diff):            300012055 (   0)            300037944 (2000)
> >
> > shows the problem even when sum_exec_runtime is not that big: 300000 secs.
> >
> > The new implementation of scale_stime() does the additional div64_u64_rem()
> > in a loop but see the comment, as long it is used by cputime_adjust() this
> > can happen only once.
>
> That only shows something after long long staring :/ There's no words on
> what the output actually means or what would've been expected.
>
> Also, your example is incomplete; the below is a test for scale_stime();
> from this we can see that the division results in too large a number,
> but, important for our use-case in cputime_adjust(), it is a step
> function (due to loss in precision) and for every plateau we shift
> runtime into the wrong bucket.

But I'm still confused, since in the long run, it should still end up with a proportionally divided user/system, irrespective of some short term wobblies. So please, better articulate the problem.