On Mon, Dec 10, 2007 at 04:07:17PM -0500, Daniel Ouellet wrote:

> Hi,
>
> Looking at the code, I am trying to understand why some servers
> just don't stay in sync with the latest kernel (current).
>
> I see some changes were done to the drift, and a few other things.
>
> What is really the logic in the daemon to actually send a sync message and,
> more importantly, to write the /var/db/drift file and then start to adjust
> the clock?
>
> I am asking because it looks like some clocks drift more than the correction
> applied to them.
>
> I can see the clock get synced for maybe 1 or 2 minutes, and only after one
> to three hours of trying; it then continues to try to catch up and no drift
> file is written.
>
> With the new/current code, is it possible to have a situation where the
> drift is bigger than what can be corrected between samples, so the clock
> never gets to sync, write the drift file and then start to adjust the
> clock to stay more in sync?
>
> Hope my explanations make sense, as I have a few Sun servers that were
> keeping time with no problem before on 4.1, but now, running 4.2-current,
> they can't get in sync and stay in sync.
>
> So, these are the same boxes and only the OS was changed, that's why I am 
> asking.

Some archs use timecounter code now for the clock. That code has a lot
of benefits, but the range of clock drifts that can be compensated for
is not very big. I have an experimental diff here that might solve
your case. See below.

> One example of sampling, where the gap keeps getting bigger:
> Dec 10 15:07:46 ntp1a ntpd[28571]: adjusting local clock by 0.589365s
> Dec 10 15:10:25 ntp1a ntpd[28571]: adjusting local clock by 0.619122s
> Dec 10 15:10:57 ntp1a ntpd[28571]: adjusting local clock by 0.625311s
> Dec 10 15:13:09 ntp1a ntpd[28571]: adjusting local clock by 0.654803s
> Dec 10 15:15:53 ntp1a ntpd[28571]: adjusting local clock by 0.665832s
> Dec 10 15:18:26 ntp1a ntpd[28571]: adjusting local clock by 0.755500s
> Dec 10 15:22:48 ntp1a ntpd[28571]: adjusting local clock by 0.777401s
> Dec 10 15:25:27 ntp1a ntpd[28571]: adjusting local clock by 0.786259s
> Dec 10 15:28:41 ntp1a ntpd[28571]: adjusting local clock by 0.855696s
> Dec 10 15:31:21 ntp1a ntpd[28571]: adjusting local clock by 0.901818s
> Dec 10 15:34:34 ntp1a ntpd[28571]: adjusting local clock by 0.986841s
> Dec 10 15:38:48 ntp1a ntpd[28571]: adjusting local clock by 0.890534s
> Dec 10 15:39:19 ntp1a ntpd[28571]: adjusting local clock by 1.003113s
> Dec 10 15:43:35 ntp1a ntpd[28571]: adjusting local clock by 1.003807s
> Dec 10 15:44:09 ntp1a ntpd[28571]: adjusting local clock by 1.000521s
> Dec 10 15:46:17 ntp1a ntpd[28571]: adjusting local clock by 1.070674s
> Dec 10 15:50:29 ntp1a ntpd[28571]: adjusting local clock by 1.012753s
> Dec 10 15:54:40 ntp1a ntpd[28571]: adjusting local clock by 1.011539s
> Dec 10 15:56:52 ntp1a ntpd[28571]: adjusting local clock by 1.109486s
> Dec 10 16:00:05 ntp1a ntpd[28571]: adjusting local clock by 1.024082s
>
>
> My understanding of the man page is that the drift file will be written only
> after the clock is in sync, and then adjfreq will kick in to adjust it and
> keep the time in sync better. But what if it can't sync, or can't stay in
> sync long enough to write the drift file? What then?

I would really have to look into the code to see if it's feasible to
start adjusting frequency when not synced. Currently I do not think it
will work without some rewriting. I am also worried the complexity of
the code would increase, or some oscillating effect would be
introduced in some cases. 

>
> Wouldn't it make sense to be able to compensate for that and maybe have
> adjfreq kick in to help address cases like this?
>
> I see some code was changed for this, to reset the adjfreq, 2 weeks and 4
> days ago.
>
> Anyway, can it be forced to start using adjfreq somehow before it is in sync,
> if only for testing purposes?

Yes, you can create a drift file yourself and (re)start ntpd. For that
to work you'll need a reasonable estimate of the drift.

So to summarize, you can do three things:

1. Test the diff below
2. Hack the ntpd code to start using adjfreq without being synced (not 
recommended)
3. Estimate the drift yourself and create an ntpd.drift file.

I would start with 1.
 
        -Otto


Index: kern_tc.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.9
diff -u -p -r1.9 kern_tc.c
--- kern_tc.c   9 May 2007 17:42:19 -0000       1.9
+++ kern_tc.c   12 Nov 2007 20:07:17 -0000
@@ -567,11 +567,11 @@ ntp_update_second(int64_t *adjust, time_
        if (adjtimedelta.tv_sec > 0)
                adj.tv_usec = 5000;
        else if (adjtimedelta.tv_sec == 0)
-               adj.tv_usec = MIN(500, adjtimedelta.tv_usec);
+               adj.tv_usec = MIN(5000, adjtimedelta.tv_usec);
        else if (adjtimedelta.tv_sec < -1)
                adj.tv_usec = -5000;
        else if (adjtimedelta.tv_sec == -1)
-               adj.tv_usec = MAX(-500, adjtimedelta.tv_usec - 1000000);
+               adj.tv_usec = MAX(-5000, adjtimedelta.tv_usec - 1000000);
        timersub(&adjtimedelta, &adj, &adjtimedelta);
        *adjust = ((int64_t)adj.tv_usec * 1000) << 32;
        *adjust += timecounter->tc_freq_adj;
