On Mon, Dec 10, 2007 at 04:07:17PM -0500, Daniel Ouellet wrote:

> Hi,
>
> Looking at the code, I am trying to understand something on some servers
> that just don't stay in sync with the latest kernel (current).
>
> I see some changes were done to the drift, and a few other things.
>
> What is really the logic in the daemon to actually send a sync message and,
> more importantly, to write the /var/db/drift file and then start to adjust
> the clock?
>
> I am asking because it looks like some clocks drift more than the
> correction applied to them.
>
> I can see the clock get synced for maybe 1 or 2 minutes only, after one to
> three hours of trying, and then it continues to try to catch up and no
> drift file is written.
>
> With the new/current code, is it possible to have a situation where the
> drift is bigger than what can be corrected between samples, so the clock
> never gets in sync, the drift file never gets written, and the frequency
> adjustment that would keep it in sync never starts?
>
> Hope my explanation makes sense, as I have a few Sun servers that were
> keeping time no problem before with 4.1, but running 4.2 current they
> can't get and stay in sync now.
>
> So, these are the same boxes and only the OS was changed, that's why I am
> asking.

Some archs now use the timecounter code for the clock. That code has a lot
of benefits, but the range of clock drifts it can compensate for is not
very big. I have an experimental diff here that might solve your case. See
below.

> One example of sampling, where the gap keeps getting bigger:
> Dec 10 15:07:46 ntp1a ntpd[28571]: adjusting local clock by 0.589365s
> Dec 10 15:10:25 ntp1a ntpd[28571]: adjusting local clock by 0.619122s
> Dec 10 15:10:57 ntp1a ntpd[28571]: adjusting local clock by 0.625311s
> Dec 10 15:13:09 ntp1a ntpd[28571]: adjusting local clock by 0.654803s
> Dec 10 15:15:53 ntp1a ntpd[28571]: adjusting local clock by 0.665832s
> Dec 10 15:18:26 ntp1a ntpd[28571]: adjusting local clock by 0.755500s
> Dec 10 15:22:48 ntp1a ntpd[28571]: adjusting local clock by 0.777401s
> Dec 10 15:25:27 ntp1a ntpd[28571]: adjusting local clock by 0.786259s
> Dec 10 15:28:41 ntp1a ntpd[28571]: adjusting local clock by 0.855696s
> Dec 10 15:31:21 ntp1a ntpd[28571]: adjusting local clock by 0.901818s
> Dec 10 15:34:34 ntp1a ntpd[28571]: adjusting local clock by 0.986841s
> Dec 10 15:38:48 ntp1a ntpd[28571]: adjusting local clock by 0.890534s
> Dec 10 15:39:19 ntp1a ntpd[28571]: adjusting local clock by 1.003113s
> Dec 10 15:43:35 ntp1a ntpd[28571]: adjusting local clock by 1.003807s
> Dec 10 15:44:09 ntp1a ntpd[28571]: adjusting local clock by 1.000521s
> Dec 10 15:46:17 ntp1a ntpd[28571]: adjusting local clock by 1.070674s
> Dec 10 15:50:29 ntp1a ntpd[28571]: adjusting local clock by 1.012753s
> Dec 10 15:54:40 ntp1a ntpd[28571]: adjusting local clock by 1.011539s
> Dec 10 15:56:52 ntp1a ntpd[28571]: adjusting local clock by 1.109486s
> Dec 10 16:00:05 ntp1a ntpd[28571]: adjusting local clock by 1.024082s
>
>
> My understanding of the man page is that the drift file will be written
> only after the clock is in sync, and then adjfreq will kick in to adjust
> it and keep the time in sync better.
> But what if it can't sync, or can't stay in sync long enough to write
> the drift file, what then?

I would really have to look into the code to see if it's feasible to start
adjusting the frequency when not synced. Currently I do not think it will
work without some rewriting. I am also worried that the complexity of the
code would increase, or that some oscillating effect would be introduced in
some cases.

>
> Wouldn't it make sense to be able to compensate for that and maybe have
> adjfreq start to play a part to help address cases like this?
>
> I see some code changes for this, to reset the adjfreq, from 2 weeks and
> 4 days ago.
>
> Anyway, can it be forced to start using adjfreq somehow before it is in
> sync, if only for testing reasons?

Yes, you can create a drift file yourself and (re)start ntpd. For that to
work you'll need a reasonable estimate of the drift.

So to summarize, you can do three things:

1. Test the diff below
2. Hack the ntpd code to start using adjfreq without being synced (not
   recommended)
3. Estimate the drift yourself and create a ntpd.drift file.

I would start with 1.

	-Otto

Index: kern_tc.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_tc.c,v
retrieving revision 1.9
diff -u -p -r1.9 kern_tc.c
--- kern_tc.c	9 May 2007 17:42:19 -0000	1.9
+++ kern_tc.c	12 Nov 2007 20:07:17 -0000
@@ -567,11 +567,11 @@ ntp_update_second(int64_t *adjust, time_
 	if (adjtimedelta.tv_sec > 0)
 		adj.tv_usec = 5000;
 	else if (adjtimedelta.tv_sec == 0)
-		adj.tv_usec = MIN(500, adjtimedelta.tv_usec);
+		adj.tv_usec = MIN(5000, adjtimedelta.tv_usec);
 	else if (adjtimedelta.tv_sec < -1)
 		adj.tv_usec = -5000;
 	else if (adjtimedelta.tv_sec == -1)
-		adj.tv_usec = MAX(-500, adjtimedelta.tv_usec - 1000000);
+		adj.tv_usec = MAX(-5000, adjtimedelta.tv_usec - 1000000);
 	timersub(&adjtimedelta, &adj, &adjtimedelta);
 	*adjust = ((int64_t)adj.tv_usec * 1000) << 32;
 	*adjust += timecounter->tc_freq_adj;
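
To make the effect of the diff easier to see, here is a rough userland sketch of the per-second step selection in the hunk above, plus a small helper for putting a number on how fast the logged offset is growing. This is not kernel code: slew_step_usec() and offset_growth_ppm() are names made up for illustration, and the `limit` parameter stands in for the constant the diff changes (500 before, 5000 after). It assumes a normalized timeval, i.e. 0 <= tv_usec < 1000000.

```c
/*
 * Userland model of the branch in ntp_update_second() shown in the diff.
 * Hypothetical helper names; `limit` is 500 without the diff, 5000 with it.
 */

/* Microseconds of adjtime slew applied in one second for a remaining delta. */
long
slew_step_usec(long tv_sec, long tv_usec, long limit)
{
	long neg;

	if (tv_sec > 0)
		return 5000;
	if (tv_sec == 0)
		return tv_usec < limit ? tv_usec : limit;	/* MIN() */
	if (tv_sec < -1)
		return -5000;
	/* tv_sec == -1: the remaining delta is tv_usec - 1000000 us */
	neg = tv_usec - 1000000;
	return neg > -limit ? neg : -limit;			/* MAX() */
}

/*
 * Growth rate of the reported offset between two log samples, in parts
 * per million (microseconds per second).
 */
double
offset_growth_ppm(double first_off, double last_off, double elapsed_sec)
{
	return (last_off - first_off) / elapsed_sec * 1e6;
}
```

With the first and last samples from the log above (0.589365s at 15:07:46 and 1.024082s at 16:00:05, 3139 seconds apart), offset_growth_ppm() comes out at roughly 138 ppm. Note that this is growth on top of whatever the kernel was presumably already slewing in that interval, so it is only a lower bound on the raw drift and not a ready-made value for a drift file. In the model, a remaining delta of 0.589365s is stepped by 500 us per second without the diff and by 5000 us per second with it.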