Chris Landauer writes:
| 
| hihi, all -
| 
| well, i have an "almost" fix for the problem - read on, ...
| (this is for discussin before send-pr submission)
| 
| description
| 
|       in FreeBSD 5.3-RELEASE (and 5.2.1R, and 5.1R),
|       in file /usr/src/sys/kern/kern_resource.c,
|       lines 657-750 define a function calcru(),
|       and there is a bug in that function -
| 
|       line 713 in that file is
| 
|                uu = (tu * ut) / tt,
| 
|       which is trying to compute a proportion (since ut<=tt) - the problem
|       is that for my application, the values of tt and ut are very large and
|       the value of the intermediate product overflows, leading to incorrect
|       values (i reported the symptoms to freebsd-hackers and a very helpful
|       description and localization of the problem to calcru() was provided
|       by peter jeremy, who also wrote that it is a very old, but only partly
|       known, problem)
| 
| repetition
| 
|       use time in csh or /usr/bin/time or getrusage() to time any program
|       that takes more than a week to run
| 
| fix (almost)
| 
|       it turns out that this problem can be (partly) fixed by replacing that
|       one line above with the following lines:
| 
|       /* we know 0 <= ut <= tt and 1 <= tt */
|       if (tu >= tt)
|               {
|               **whatever type they need to be** q, r;
|               q = tu / tt;
|               r = tu % tt;
|               uu = (q * ut) + (r * ut) / tt;
|               }
|       else    uu = (tu * ut) / tt
| 
|       this change does not solve all the arithmetic problems (especially
|       when ut is very large), and a similar change is needed for system
|       time, computing su in line 714 of the file, but it suffices for me -
| 
|       analysis (yup, proof that it should work 8-)) -
| 
|       i expect that all of these counters are increasing throughout the life
|       of the process -
|               tu is total time in microseconds
|               ut is user 128hz ticks
|               tt is total 128hz ticks
|       i assume therefore that
|               tu ~ tt * 10^6/128
|       strictly because of the clock rates
|       (machines with other kinds of clock rate ratios will need a different
|       analysis, but the same idea should work there, too) -
| 
|       in my case ut ~ tt, so we see overflow in the old computation when
|               tu * ut >= 2^64
|               tt^2 * 10^6/128 >= 2^64
|               tt * 10^3/8*sqrt(2) >= 2^32 ~ 4 * 10^9
|               tt >= 32*sqrt(2)*10^6,
|       which is about
|               sqrt(2)*10^6 / 4 ~ 3.54*10^5 seconds,
|       or
|                       ~ 98 hours
|       (this is the phenomenon i observed)
| 
|       in the new computation offered above, since we know that
|               ut <= tt,
|       we also know that
|               uu <= tu,
|       and
|               (q * ut) <= uu,
|       so there can be no overflow in the (q * ut) term -
|       therefore, this changed code will overflow only when r is large
|               r ~ tt,
|       and then only when
|               r * ut >= 2^64
|               tt^2 >= 2^64
|               tt >= 2^32,
|       which is about
|                       ~ 2^25 seconds
|                       ~ 9300 hours
|                       ~ 388 days
|       (i can live with that for now)
| 
|       for everyone else, it adds the one test in the if statement to every
|       call to calcru() (or two, if system time is similarly fixed) -
|       <philosophy> instrumentation is costly, and correct instrumentation is
|       even more costly, but it's worth it, every time, to get the right
|       answer </philosophy>
| 
|       i'm about to try it out this week - i will report the results when i
|       get some data (which will be a few weeks)
| 
|       i'm thinking about how to solve the problem correctly for really
|       long-running programs, but it's all pretty special case ugly so far

This is my fix:

Index: kern_resource.c
===================================================================
RCS file: /usr/local/cvsroot/freebsd/src/sys/kern/kern_resource.c,v
retrieving revision 1.55.2.5
diff -u -p -r1.55.2.5 kern_resource.c
--- kern_resource.c     3 Nov 2001 01:41:08 -0000       1.55.2.5
+++ kern_resource.c     26 Jan 2005 19:03:27 -0000
@@ -552,10 +552,10 @@ calcru(p, up, sp, ip)
                tu = ptu;
        }
 
-       /* Subdivide tu. */
-       uu = (tu * ut) / tt;
-       su = (tu * st) / tt;
-       iu = tu - uu - su;
+       /* Subdivide tu. try to becareful of overflow */
+       su = tu * (st * 1024 / tt) / 1024;
+       iu = tu * (it * 1024 / tt) / 1024;
+       uu = tu - (su + iu);
 
        /* Enforce monotonicity. */
        if (uu < p->p_uu || su < p->p_su || iu < p->p_iu) {

If people like it I can commit it.  It works for us.  We had problems with
this as well.  It's pretty simple fix.  I used 1k since usage of this
tends to be % so rounding should effect that much.

Doug A.
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to