On 05/31/2012 02:34, Andrey Zonov wrote:
On 5/30/12 11:27 PM, Andrey Zonov wrote:
Hi,
I have long running process for which `ps -o usertime -p $pid' shows
always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same
process continue to grow and now it's 21538:53.61. It looks like
overflow in resource usage code or something.
I reproduced that problem with attached program. I ran it with 23
threads on machine with 24 CPUs and after night I see this:
$ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134
USERTIME TIME
6351:24.74 14977:35.19
USERTIME TIME
6351:24.74 15000:34.53
Per thread user-time counts correct:
$ ps -H -o usertime,time -p 24134
USERTIME TIME
0:00.00 0:00.00
652:35.84 652:38.59
652:34.75 652:37.97
652:50.46 652:51.97
652:38.93 652:43.08
652:39.73 652:43.36
652:44.09 652:47.36
652:56.49 652:57.94
652:51.84 652:54.41
652:37.48 652:41.57
652:36.61 652:40.90
652:39.41 652:42.52
653:03.72 653:06.72
652:49.96 652:53.25
652:45.92 652:49.03
652:40.33 652:42.05
652:46.53 652:49.31
652:44.77 652:47.33
653:00.54 653:02.24
652:33.31 652:36.13
652:51.03 652:52.91
652:50.73 652:52.71
652:41.32 652:44.64
652:59.86 653:03.25
(kgdb) p $my->p_rux
$14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093,
rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su =
65773652, rux_tu = 904571706136}
(kgdb) p $my->p_rux
$15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087,
rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su =
66458587, rux_tu = 913099969825}
As you can see rux_uu stale, but rux_uticks still ticks. I think the
problem is in calcru1(). This expression
uu = (tu * ut) / tt
overflows.
I applied the following patch:
Index: /usr/src/sys/kern/kern_resource.c
===================================================================
--- /usr/src/sys/kern/kern_resource.c (revision 235394)
+++ /usr/src/sys/kern/kern_resource.c (working copy)
@@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
struct timeval *sp)
{
/* {user, system, interrupt, total} {ticks, usec}: */
- uint64_t ut, uu, st, su, it, tt, tu;
+ uint64_t ut, uu, st, su, it, tt, tu, tmp;
ut = ruxp->rux_uticks;
st = ruxp->rux_sticks;
@@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
* The normal case, time increased.
* Enforce monotonicity of bucketed numbers.
*/
- uu = (tu * ut) / tt;
+ if (ut == 0)
+ uu = 0;
+ else {
+ tmp = tt / ut;
+ uu = tmp ? tu / tmp : 0;
+ }
if (uu < ruxp->rux_uu)
uu = ruxp->rux_uu;
and now ran test again.
This looks related to, and possibly identical to, PR kern/76972:
http://www.freebsd.org/cgi/query-pr.cgi?pr=76972
If you filed a PR, please submit a follow-up to both PRs so they
reference each other.
Thanks,
Eric
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"