Re: Technical strategy and performance

2016-06-29 Thread Eric S. Raymond
Hal Murray : > The big picture question that comes to mind is why did we start by forking > ntp classic? Why not start from scratch? Did anybody consider chrony? What > other options are/were there? We forked from Classic for the same reasons we didn't start from scratch and didn't use chrony

My task list

2016-06-29 Thread Eric S. Raymond
Here's what I have lined up to do next, in the priority order I currently have in mind: 1. Try replacing our buggy async-DNS code with the c-ares library. 2. If that succeeds, reinstate memlocking long enough to check if the crash bug recurs. If it doesn't, leave memlocking in. 3.

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Hal! On Wed, 29 Jun 2016 22:21:41 -0700 Hal Murray wrote: > > Local clock frequency offset, as opposed to local clock time > > offset. > > Most NTP documentation calls that drift. Its magnitude is not very > interesting when discussing quality of time. Changes over time can > be interes

Re: Kernel PPS processing

2016-06-29 Thread Hal Murray
> Local clock frequency offset, as opposed to local clock time offset. Most NTP documentation calls that drift. Its magnitude is not very interesting when discussing quality of time. Changes over time can be interesting. It's usually much more interesting to look at the clock offset. There a

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Hal! On Wed, 29 Jun 2016 21:35:49 -0700 Hal Murray wrote: > g...@rellim.com said: > > Wow. I thought something was wrong. My local clock offset > > (peerstats file) has always been hanging around 100ppm. Stable to > > ±1ppm so I figured that was normal. > > > After reboot the local cloc

Re: Kernel PPS processing

2016-06-29 Thread Hal Murray
g...@rellim.com said: > Wow. I thought something was wrong. My local clock offset (peerstats file) > has always been hanging around 100ppm. Stable to ±1ppm so I figured > that was normal. > After reboot the local clock offset started at 9ppm and has been slowly > going down, now under 2ppm.

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Hal! On Wed, 29 Jun 2016 15:28:29 -0700 Hal Murray wrote: > g...@rellim.com said: > >> I'm not sure why you would expect performance to be identical. > > Because they use the same kernel generated time stamp and PLL > > algorithm. > > There are two chunks of PPS code in the kernel with

Re: Kernel PPS processing

2016-06-29 Thread Hal Murray
g...@rellim.com said: >> I'm not sure why you would expect performance to be identical. > Because they use the same kernel generated time stamp and PLL algorithm. There are two chunks of PPS code in the kernel with separate RFCs. One is getting the time stamp. The other is doing the PLL. Th

Re: Technical strategy and performance

2016-06-29 Thread Hal Murray
Thanks. I didn't see any surprises. I'm happy with the general idea, it's the details that get interesting. Removing cruft is good. Removing features is not. There is a trade off between the cruftiness of the code and the importance of any features it includes. This example gets tangled up

Re: Kernel PPS processing

2016-06-29 Thread Matthew Selsky
On Wed, Jun 29, 2016 at 12:31:32PM -0700, Hal Murray wrote: > matthew.sel...@twosigma.com said: > > We tested booting with "nohz=off intel_idle.max_cstate=0" and it made a > > difference in our production clocks. > > Interesting. Thanks. > > How did you decide to go there? The "nohz=off" hint

Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-29 Thread Matthew Selsky
On Tue, Jun 28, 2016 at 11:39:16PM -0700, Hal Murray wrote: > > matthew.sel...@twosigma.com said: > > "rlimit memlock 0" using Classic causes ntpd to die after 3 minutes with > > this error 2016-06-29T00:13:21.903+00:00 host.example.com ntpd[27206]: > > libgcc_s.so.1 must be installed for pthread

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Matthew! On Wed, 29 Jun 2016 16:38:40 -0400 Matthew Selsky wrote: > Measuring on this server via ntpq -p we were seeing offsets of +/- > 3us without either of these two kernel parameters. With nohz=off, > the offsets were +/- 1us. With both kernel parameters we see > offsets too small for n

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Matthew! On Wed, 29 Jun 2016 16:38:40 -0400 Matthew Selsky wrote: > > Can you quantify the individual effects? And is that kernel PPS, > > KPPS in ntpsec, or just PPS in ntpsec? > > We're using a GPS PCIe card with OCXO HQ with a daemon that writes to > SHM and then ntpd reads from the SH

Re: Technical strategy and performance

2016-06-29 Thread Mark Atwood
this was discussed heavily in cii meetings. will expand later. on my phone On Wed, Jun 29, 2016, 12:03 PM Hal Murray wrote: > > fallenpega...@gmail.com said: > > Thank you Eric. Have read, am pondering, and welcome other people to > weigh > > in. > > The big picture question that comes to

Re: Kernel PPS processing

2016-06-29 Thread Matthew Selsky
On Wed, Jun 29, 2016 at 12:31:56PM -0700, Gary E. Miller wrote: > Yo Matthew! > > On Wed, 29 Jun 2016 14:55:23 -0400 > Matthew Selsky wrote: > > > On Wed, Jun 29, 2016 at 11:48:08AM -0700, Hal Murray wrote: > > > > Can you quantify the better? I would have expected identical... > > > > > > D

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Hal! > > > Did you look at the graph? > > > http://users.megapathdsl.net/~hmurray/ntpsec/glypnod-pps-kernel.png > > Yeah, your 'No kernel' was awful. If that is your baseline then you > got something really, really, wrong. The scale on the 'kernel PLL' > was too small to really tell, but

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Hal! On Wed, 29 Jun 2016 11:48:08 -0700 Hal Murray wrote: > I'm not sure why you would expect performance to be identical. Because they use the same kernel generated time stamp and PLL algorithm. > Dave > Mills and crew went to a lot of effort to get code into various > kernels, including

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Matthew! On Wed, 29 Jun 2016 14:55:23 -0400 Matthew Selsky wrote: > On Wed, Jun 29, 2016 at 11:48:08AM -0700, Hal Murray wrote: > > > Can you quantify the better? I would have expected identical... > > > > Did you look at the graph? > > http://users.megapathdsl.net/~hmurray/ntpsec/glypn

Re: Kernel PPS processing

2016-06-29 Thread Hal Murray
matthew.sel...@twosigma.com said: > We tested booting with "nohz=off intel_idle.max_cstate=0" and it made a > difference in our production clocks. Interesting. Thanks. How did you decide to go there? Did you try those 2 changes separately? Was that with PPS or just a typical system? Are you

Re: Technical strategy and performance

2016-06-29 Thread Hal Murray
fallenpega...@gmail.com said: > Thank you Eric. Have read, am pondering, and welcome other people to weigh > in. The big picture question that comes to mind is why did we start by forking ntp classic? Why not start from scratch? Did anybody consider chrony? What other options are/were ther

Re: Kernel PPS processing

2016-06-29 Thread Matthew Selsky
On Wed, Jun 29, 2016 at 11:48:08AM -0700, Hal Murray wrote: > > Can you quantify the better? I would have expected identical... > > Did you look at the graph? > http://users.megapathdsl.net/~hmurray/ntpsec/glypnod-pps-kernel.png > > I'm not sure why you would expect performance to be identical

Re: Kernel PPS processing

2016-06-29 Thread Hal Murray
> Can you quantify the better? I would have expected identical... Did you look at the graph? http://users.megapathdsl.net/~hmurray/ntpsec/glypnod-pps-kernel.png I'm not sure why you would expect performance to be identical. Dave Mills and crew went to a lot of effort to get code into various

Re: Technical strategy and performance

2016-06-29 Thread Mark Atwood
Thank you Eric. Have read, am pondering, and welcome other people to weigh in. ..m On Tue, Jun 28, 2016 at 8:30 PM Eric S. Raymond wrote: > In recent discussion of the removal of memlock, Hal Murray said > "Consider ntpd running on an old system that is mostly lightly loaded > and doesn't have

Re: Kernel PPS processing

2016-06-29 Thread Gary E. Miller
Yo Hal! On Wed, 29 Jun 2016 03:21:40 -0700 Hal Murray wrote: > So I pulled over the sources and built a kernel with NO_HZ turned off > and NTP_PPS turned on. I'm interested. > The next project is to figure out why it works so much better, or > rather why the normal ntpd can't do a lot better.

Re: adns is looking plausible

2016-06-29 Thread Gary E. Miller
Yo Eric! On Wed, 29 Jun 2016 08:19:45 -0400 "Eric S. Raymond" wrote: > I haven't looked at the code itself yet, but from reading the C > header file and the website, adns is looking like a plausible > replacement for our homebrew async-DNS. Gentoo has been slowly moving from adns to c-ares.

Re: adns is looking plausible

2016-06-29 Thread Hal Murray
e...@thyrsus.com said: > I haven't looked at the code itself yet, but from reading the C header file > and the website, adns is looking like a plausible replacement for our > homebrew async-DNS. Good find! One feature that pushes me in that direction is being able to get at the TTL. > and redu

adns is looking plausible

2016-06-29 Thread Eric S. Raymond
Heads up, Mark! Licensing policy issue. I haven't looked at the code itself yet, but from reading the C header file and the website, adns is looking like a plausible replacement for our homebrew async-DNS. I know who Ian Jackson is, and there's enough of an ecology of related projects to make me

Kernel PPS processing

2016-06-29 Thread Hal Murray
http://users.megapathdsl.net/~hmurray/ntpsec/glypnod-pps-kernel.png If you turn on flag3 for a PPS driver on a Linux system, you get this error message: 06-20T12:25:32 ntpd[988]: refclock_params: kernel PLL (hardpps, RFC 1589) not implemented I poked around a bit. Those options are in drivers/