On Tue, 11 Dec 2007, Otto Moerbeek wrote:

On Tue, Dec 11, 2007 at 03:10:52AM +0000, Jason George wrote:

I have an older dual P3-800 Compaq server that started losing time very
rapidly after upgrading to a -CURRENT snapshot over the weekend.

Running anything that threw the machine into a high (>10% sustained) system
percentage caused the clock to lose an incredible amount of time... to the
tune of 30 minutes in the course of about 4 hours.  My specific repeatable
test was to run "sup" with the compress option turned on.

Once I stopped the userland program that was causing the kernel to have high
system load, the clock would slowly start to try to readjust itself.
Ultimately, it would get within a second but would never fully sync.

Has anyone else seen this behaviour?

(Theo and Otto are aware...)

Yes, but to make things clear, this is a different problem.

The first problem is: some clocks have a large systematic drift. That
can have several reasons: from cheap hardware to hard/firmware
reporting the wrong timecounter source frequency (we have seen that on at
least one macppc, but machines of other platforms might have the same
problem). Current ntpd can only handle drifts up to +/- 500ppm for
timecounter archs.
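
For a sense of scale, the loss reported above (30 minutes in about 4
hours) can be put next to that 500ppm limit. This is only a rough sketch
in Python; the function names are made up for illustration and are not
part of ntpd, and the input figures are the approximate ones from the
original report.

```python
def drift_ppm(lost_seconds: float, elapsed_seconds: float) -> float:
    """Clock drift in parts per million: time lost per unit of elapsed time."""
    return lost_seconds / elapsed_seconds * 1_000_000


def seconds_per_day(ppm: float) -> float:
    """Seconds gained or lost per day at a given constant drift in ppm."""
    return 86400 * ppm / 1_000_000


# Limit quoted in this thread for timecounter archs (assumed symmetric).
NTPD_MAX_PPM = 500.0

# Approximate figures from the report: 30 minutes lost in about 4 hours.
reported = drift_ppm(30 * 60, 4 * 3600)

print(f"reported drift: {reported:.0f} ppm")
print(f"ntpd limit:     {NTPD_MAX_PPM:.0f} ppm "
      f"(about {seconds_per_day(NTPD_MAX_PPM):.1f} s/day)")
print(f"within limit:   {reported <= NTPD_MAX_PPM}")
```

Under those assumptions the reported loss is several hundred times what
a 500ppm frequency correction could absorb.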

The second problem is a drift _depending_ on system load. This is
likely to be a problem with interrupts or scheduling. Both underwent
large changes in current, and the developers in this area should be (and
are) picking this up.

Oh, and ntpd without -s was never designed to adjust for more than a
couple of minutes of offset. So if your clock is way off and you do not
want to or cannot use -s, use rdate once to get close and then let ntpd
do its work.


Thanks for the note.

For clarification, on the load-induced drifts, I am resetting the clock by hand to within a minute of the correct time. ntpd would then bring the clock to within 1s or so of the time it derives from its peers.

With the posted patch, ntpd is getting closer. It is synching then unsynching, adjusting local clock, synching, then adjusting clock frequency, adjusting local clock, adjusting local clock, adjusting frequency, etc, etc. No consistent "success" yet after 2h40m. The system is also essentially "unloaded".

--J
