Re: [gpsd-dev] refclock 28 gone wacky on me
g...@rellim.com said:
> Yes, that is expected. You need to tell the Skytraq to force the top of
> the second, and save to flash.

What does that mean?

bellyac...@gmail.com said:
> Did that which is why I didn't understand the delivery coming at near the
> end of the second. It appears though that the firmware *slowly* brings
> the delivery back to where it should be.

I've lost track of the details of this discussion.

Many GPS chips seem to have a 100 ms timer that drifts slowly, so the offset within the second when the NMEA strings come out will slowly drift, then jump back and start over.

--
These are my opinions. I hate spam.

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel

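The drift-then-jump pattern described above can be pictured with a toy model. This is only a sketch: it assumes the NMEA output is paced by a free-running timer with a nominal 100 ms period whose phase slips by a fixed amount each second. The function name and the slip rate of 1 ms per second are made up for illustration (real drift is presumably far slower), and nothing here comes from any GPS firmware.

```python
def nmea_offsets_us(slip_us_per_second, seconds, timer_period_us=100_000):
    """Toy model: offset (microseconds) into each second at which output starts.

    Assumes a free-running timer of nominal period timer_period_us whose
    phase slips by slip_us_per_second relative to the true second.
    """
    return [(n * slip_us_per_second) % timer_period_us for n in range(seconds)]


# Hypothetical slip rate, exaggerated so the wrap shows up quickly.
offsets = nmea_offsets_us(slip_us_per_second=1000, seconds=101)
print(offsets[98], offsets[99], offsets[100])
```

The offset climbs steadily until the slip adds up to one full timer period, then falls back to the start, which is the sawtooth shape the message describes.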
Fix for startup bug - please test
Details in https://gitlab.com/NTPsec/ntpsec/issues/68

Fedora kernels missing hardpps
Mumble. Long story.

There are two parts of PPS processing in the kernel. One is RFC 2783 which describes an API for capturing the time when a pulse happens. The other is RFC 1589 which describes a PLL which basically moves all the timekeeping work into the kernel. If you turn on flag3 with the PPS driver, it tries to activate the kernel PLL.

I hadn't tried the kernel PLL for a while, maybe because I turned it off ages ago when the Linux PPS area was being rewritten. So I tried it. It gives a not-implemented error. But I thought the code was there.

Poking around, that chunk of code in the kernel depends upon NTP_PPS which says:

    config NTP_PPS
            bool "PPS kernel consumer support"
            depends on !NO_HZ
            help
              This option adds support for direct in-kernel time
              synchronization using an external PPS signal.

              It doesn't work on tickless systems at the moment.

The kernels shipped with Fedora, Debian, and Ubuntu all have NO_HZ turned on. I haven't found the config file for any of the ARM kernels. One try got the same result.

I guess I'll try building a kernel and/or running on FreeBSD or NetBSD.

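The dependency above can be checked against a saved kernel configuration. The following is a minimal sketch, assuming the usual .config convention in which Kconfig symbols appear with a CONFIG_ prefix and disabled options appear as "# CONFIG_X is not set" comments. The function names are hypothetical, and where a distribution stores its config file (often somewhere under /boot) is left to the reader.

```python
def parse_kernel_config(text):
    """Return {symbol: value} for the 'CONFIG_X=value' lines of a kernel .config."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Comment lines, including "# CONFIG_X is not set", mean "off".
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def hardpps_possible(options):
    """True if NTP_PPS is built in and NO_HZ is off, per the Kconfig text above."""
    return options.get("CONFIG_NTP_PPS") == "y" and options.get("CONFIG_NO_HZ") != "y"


sample = "CONFIG_NO_HZ=y\n# CONFIG_NTP_PPS is not set\n"
print(hardpps_possible(parse_kernel_config(sample)))
```

A kernel configured like the sample would be expected to give the not-implemented error when flag3 is turned on.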
Re: Documenting some progress - magic refclock addresses are almost gone
e...@thyrsus.com said:
> Does anyone on the list understand mode 6 well enough to answer questions?
> My main one is: if I add a field to a mode 6 response, is it going to break
> old ntpqs or will they silently ignore it?

I think they ignore it, but try it to be sure.

> (The response field I intend to add, of course, is to the peer query and is
> a refclock type name - empty for real servers.)

Beware. There are 4 variations on the peers command in ntpq. The "o" types print the dispersion rather than jitter. The "l" types print the local IP address rather than the refid. At least I thought that was what should happen. lpeers doesn't work that way. There is also apeers.

Re: Use of pool servers reveals unacceptable crash rate in async DNS
e...@thyrsus.com said:
> 1. Apply Classic's workaround for the problem, which I don't remember the
> details of but involved some dodgy nonstandard linker hacks done through the
> build system. *However, I did not trust this method when I understood it.*
> It seemed sure to cause porting difficulties and is inherently fragile.

k...@roeckx.be said:
> If it's the one I'm thinking about, I think the solution is to remove the
> locking of memory.

We may be confusing several bugs.

There was a problem with locking stuff into memory. Some library needed by end of thread processing wasn't loaded yet and things worked out such that with the default memory, 32 bit systems worked but 64 bit systems didn't have enough room. I think one solution was to create a dummy thread early on to get that module loaded. Or disable memory locking, or tell it to use more memory, or ...

> 2. Fix the actual problem. Well, that'd be nice, but Hal looked into it
> months ago and said he understood it but couldn't generate a fix. IIRC, he
> said it needed a full rewrite. That tells me the code is probably not
> salvageable.

I don't remember that part. I use the pool command on several systems. I haven't seen a crash in ages.

There was another interesting problem in this area. It was a bug in FreeBSD's trap handler. ntpd managed to trigger it consistently.

> I favor #4.

I favor understanding things more. Can you get a stack trace?

Re: Use of pool servers reveals unacceptable crash rate in async DNS
e...@thyrsus.com said:
> I think the hack is to force libgcc_s to be loaded early. I don't know how
> to do that in waf.

There are two problems in this area. One is the end-of-thread code not getting locked into memory. I think that is what you are running into.

The other is a tangle of error handling on out-of-memory issues by things like pthread_create and DNS lookup. I think the latter end up with a retry error code. I think I fixed some/many of them to crash rather than retry on the assumption that memory wasn't going to get freed and I didn't know of any other reason to retry. But that was a long time ago (maybe pre-fork) and I don't remember the details.

I think we should copy the warmup code from ntp classic. It's basically an upstream bug. Warmup seems like a reasonable workaround. It's in ntpd/ntpd.c. Search for NEED_PTHREAD_WARMUP and back up over the long comment which describes what's going on.

There is a note about not working on FreeBSD. I haven't sorted that out. It may refer to the linker hack.

Here are the bugs I remember:
  https://bugs.ntp.org/show_bug.cgi?id=2831
    FreeBSD page fault story, morphs into lock discussion
  https://bugs.ntp.org/show_bug.cgi?id=2905
    rlimit/memlock discussion

There is more info in various bugs:
  https://bugs.ntp.org/show_bug.cgi?id=2332
  https://bugs.ntp.org/show_bug.cgi?id=2954
  https://bugs.ntp.org/show_bug.cgi?id=2817
The signal/noise may not be good.

Re: Use of pool servers reveals unacceptable crash rate in async DNS
e...@thyrsus.com said:
> In this case, we have two possible complexity-reducing fixes. One is to
> drop the memlock feature entirely. The other is to drop the buggy homebrew
> asynchronous-DNS lookup from Classic and use libc's.

Dropping memlock is an interesting idea. I can't think of any place where it is required today but my crystal ball for what we will need tomorrow has never been very good. What would you do if we discovered a case where we wanted it?

We could try simplifying things to only supporting lock-everything-I-need rather than specifying how much. There might be a slippery slope if something like a thread stack needs a sane size specified.

Is there a simple way to count page faults for a process? Or measure swapped out data and/or code that isn't swapped in?

I don't think your use-libc approach will be as simple as you would like. It's not available on NetBSD or FreeBSD. Maybe I just didn't look in the right place. It's not in netdb.h where it is for Linux.

Re: My first positive structural change to NTP
> Here's how I think it should look:
> --
> refclock shm unit 0 refid GPS
> refclock shm unit 1 prefer refid PPS
> --

I think you should start a list of that sort of change.

Currently, we can switch between our code and ntpd classic. The same ntp.conf works for both. I think we should preserve that until we make an explicit decision that it's the right time to make the break.

---

> Oh well...almost everyone disables remote querying anyway.

It may be disabled for general IP addresses, but it's used all the time for monitoring your own servers.

Re: My first positive structural change to NTP
strom...@nexgo.de said:
> I think that's still perpetuating a mistake. This whole business of having
> to specify two servers (or refclocks) for the same thing should go away.

There is a fundamental issue. With a PPS, there really are two sources of time. Internally, ntpd needs two different handles so you can see both sets of info on ntpq -peers and clockstats.

Normally, each PPS has an associated serial stream. It would be good if there were a clean way to specify that rather than using the prefer kludge.

strom...@nexgo.de said:
> It's easy enough these days to tell udev what each device should be named,
> so in principle there wouldn't even be a need to use anything but the
> default names.

Is there a udev equivalent on other OSes? I don't think it is necessary. A boot time script can set up symbolic links.

Re: Use of pool servers reveals unacceptable crash rate in async DNS
e...@thyrsus.com said:
>> Is getaddrinfo_a() in RTEMS? QNX? BSD?
> It's not an OS thing, it's a toolchain thing. getaddrinfo_a() is
> implemented using standard C and POSIX threads, it doesn't need OS-specific
> support.

Or it's in an optional extra library.

> Linux has it because Linux uses libc whether you're compiling with gcc or
> clang. Any of those other platforms will have it *if* they have (gcc ||
> clang) && glibc.

My Linux man page says:
    #define _GNU_SOURCE         /* See feature_test_macros(7) */
    Link with -lanl.

I couldn't find it in /usr/include/ on NetBSD or FreeBSD. On Linux, it's in netdb.h.

---

If it uses threads, we still have the problem of not being able to load the thread cleanup code.

Re: Use of pool servers reveals unacceptable crash rate in async DNS
e...@thyrsus.com said:
>> We could try simplifying things to only supporting lock-everything-I-need
>> rather than specifying how much. There might be a slippery slope if
>> something like a thread stack needs a sane size specified.
> I'm not intimate with mlockall, but it looks like it works that way now.

There is a back door way to specify a limit. Part of it is the total. Part of it is the stack size for new threads.

[way to count page faults]
> I don't know. I can do some research, but I'm not sure "enough page faults
> to merit memory locking" would be a well-defined threshold even if I knew
> how to count them.

If the answer was 0 then we wouldn't have to discuss the threshold.

---

> I believe you're right that these platforms don't have it. The question is,
> how important is that fact? Is the performance hit from synchronous DNS
> really a showstopper?

I don't know the answer.

There are two cases I know of where ntpd does a DNS lookup after it gets started. One is the try again when DNS for the normal server case doesn't work during initialization. It will try again occasionally until it gets an answer (which might be negative). The main one is the pool code trying for a new server.

I think we should be extending this rather than dropping it. There are several possibles in this area. The main one would be to verify that a server you are using is still in the pool. (There isn't a way to do that yet - the pool doesn't have any DNS support for that.) The other would be to try replacing the poorest server rather than only replacing dead servers.

DNS lookups can take a LONG time. I think I've seen 40 seconds on a failing case.

If we get the recv time stamp from the OS, I think the DNS delays won't introduce any lies on the normal path. We could test that by putting a sleep in the main loop. (There is a filter to reject packets that take too long, but I think that's time-in-flight and excludes time sitting on the server.)

There are two cases I can think of where a pause in ntpd would cause troubles. One is that it would mess up refclocks. The other is that packets will get dropped if too many of them arrive.

I think that means we could use the pool command on a system without refclocks. That covers end nodes and maybe lightly loaded servers.

---

It's worth checking out the input buffering side of things. There may be some code there that we don't need. I think there is a pool of buffers. Where can a buffer sit other than on the free queue? Why do we need a pool?

Re: Use of pool servers reveals unacceptable crash rate in async DNS
Possible crazy idea...

How about we never kill the DNS helper thread. Just let it sit there in case it gets more work to do. The only cost is a bit of memory.

Or maybe only do that if we are locking stuff into memory.

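The never-kill-the-helper idea can be sketched as a worker that is started once and then blocks on a queue between requests. This is an illustration in Python rather than ntpd's C code. The class name and the callback shape are hypothetical, and the resolver is passed in so the sketch does not depend on a working DNS (socket.getaddrinfo would be the obvious thing to pass in practice).

```python
import queue
import threading


class DnsHelper:
    """Long-lived lookup worker: started once and never torn down."""

    def __init__(self, resolve):
        self._resolve = resolve            # callable: name -> result
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            # Blocks here while idle; the only standing cost is memory.
            name, callback = self._requests.get()
            try:
                result = self._resolve(name)
            except OSError as err:         # socket.gaierror is an OSError
                result = err
            callback(name, result)

    def lookup(self, name, callback):
        """Queue a lookup; callback(name, result) runs on the helper thread."""
        self._requests.put((name, callback))
```

Because the thread never exits, the thread-exit path is never exercised, which is the property the suggestion above is after.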
Re: Use of pool servers reveals unacceptable crash rate in async DNS
e...@thyrsus.com said:
> Ugh. Our options have just narrowed. I've just seen
> libgcc_s.so.1 must be installed for pthread_cancel to work Aborted (core
> dumped)
> with memlock off in the build.

Can you reproduce it?

My guess is that you didn't really get memlock turned off. How about putting a break on mlockall or the call to it. (There is only one in ntpd.c)

Re: Wonky NTP startup and the incremental-configuration problem
An alternative option would be to implement rereading ntp.conf.

For each line in ntp.conf, there are 3 possibilities. It's new or the value has changed, nothing has changed, or the item was dropped. The last is the tricky case.

The idea is to save a parsed copy of the old ntp.conf. As the new file is read in, kick out the old items (if any) as they get replaced. (Actually, move them to what will be the new saved info.) Anything left on the old saved-list needs to be set back to the default.

That works for simple things like setting a parameter. It gets more complicated for things like server/pool/refclock.

It feels like something that's reasonably clean with the appropriate table.

We would need a way to test things. I wonder if we could do that from a script driving the debugger?

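For the simple set-a-parameter case, the three-way split described above might look like the sketch below. It assumes the old and new files have already been parsed into name/value dictionaries. The function name and the defaults table are hypothetical, and the harder server/pool/refclock cases are not covered.

```python
_MISSING = object()


def plan_reload(old, new, defaults):
    """Classify settings when a config file is re-read.

    old, new  -- {name: value} parsed from the previous and the current file
    defaults  -- {name: value} to fall back to when an item is dropped
    Returns (to_apply, unchanged, to_reset).
    """
    remaining = dict(old)                  # the "old saved-list"
    to_apply, unchanged = {}, {}
    for name, value in new.items():
        previous = remaining.pop(name, _MISSING)   # kick out the old item
        if previous is _MISSING or previous != value:
            to_apply[name] = value         # new, or the value has changed
        else:
            unchanged[name] = value        # nothing has changed
    # Anything still on the old list was dropped: set it back to the default.
    to_reset = {name: defaults.get(name) for name in remaining}
    return to_apply, unchanged, to_reset


# Placeholder option names, not real ntp.conf keywords.
old = {"alpha": "1", "beta": "2", "gamma": "3"}
new = {"alpha": "1", "beta": "20", "delta": "4"}
print(plan_reload(old, new, defaults={"gamma": "0"}))
```

A table-driven version would presumably attach an apply and a reset action to each name, which is where the "appropriate table" would come in.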
head broken if no refclocks
after a simple ./waf configure

[murray@fed raw]$ ./waf build
--- building host ---
Waf: Entering directory `/home/murray/ntpsec/raw/build/host'
[1/5] Processing ntpd/ntp_parser.y
[2/5] Compiling build/host/ntpd/ntp_parser.tab.c
/home/murray/ntpsec/raw/ntpd/ntp_parser.y: In function ‘yyparse’:
/home/murray/ntpsec/raw/ntpd/ntp_parser.y:996:33: error: ‘num_refclock_conf’ undeclared (first use in this function)
   for (dtype = 1; dtype < (int)num_refclock_conf; dtype++)
                                ^
/home/murray/ntpsec/raw/ntpd/ntp_parser.y:996:33: note: each undeclared identifier is reported only once for each function it appears in
/home/murray/ntpsec/raw/ntpd/ntp_parser.y:997:12: error: ‘refclock_conf’ undeclared (first use in this function)
       if (refclock_conf[dtype]->basename != NULL && !strcasecmp(refclock_conf[dtype]->basename, $2) == 0)
           ^
Waf: Leaving directory `/home/murray/ntpsec/raw/build/host'

Our testing sucks
 1007  ./waf configure --refclock=20,22 --enable-debug-gdb
 1008  ./waf build
 1009  gdb ./build/main/ntpq/ntpq

(gdb) run -p
Starting program: /home/murray/ntpsec/raw/build/main/ntpq/ntpq -p
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.21-13.fc22.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================

Program received signal SIGSEGV, Segmentation fault.
0x00413d68 in strlcpy (dst=0x7fffd700 "", src=0x4f , siz=1025) at ../../libntp/strl_obsd.c:36
36          if ((*d++ = *s++) == '\0')
Missing separate debuginfos, use: dnf debuginfo-install ncurses-libs-5.9-18.20150214.fc22.x86_64
(gdb) bt
#0  0x00413d68 in strlcpy (dst=0x7fffd700 "", src=0x4f , siz=1025) at ../../libntp/strl_obsd.c:36
#1  0x0040a561 in doprintpeers (pvl=0x625460 , associd=1947, rstatus=37914, datalen=2, data=0x62880d "\r\n", fp=0x7725b620 <_IO_2_1_stdout_>, af=0) at ../../ntpq/ntpq-subs.c:1795
#2  0x0040a8c8 in dogetpeers (pvl=0x625460 , associd=1947, fp=0x7725b620 <_IO_2_1_stdout_>, af=0) at ../../ntpq/ntpq-subs.c:1877
#3  0x0040aae1 in dopeers (showall=0, fp=0x7725b620 <_IO_2_1_stdout_>, af=0) at ../../ntpq/ntpq-subs.c:1928
#4  0x0040ad9a in peers (pcmd=0x7fffe220, fp=0x7725b620 <_IO_2_1_stdout_>) at ../../ntpq/ntpq-subs.c:2008
#5  0x00404bc2 in docmd (cmdline=0x419d08 "peers") at ../../ntpq/ntpq.c:1649
#6  0x00402cda in ntpqmain (argc=0, argv=0x7fffe478) at ../../ntpq/ntpq.c:658
#7  0x00402426 in main (argc=2, argv=0x7fffe468) at ../../ntpq/ntpq.c:442
(gdb)

waf list shouldn't need to be configured
$ ./waf --list
--- building host ---
The cache directory is empty: reconfigure the project
$

Re: Our testing sucks
 1010  ./waf configure
 1011  ./waf build

[ 74/206] Compiling ntpd/ntp_intercept.c
../../ntpd/ntp_control.c: In function ‘ctl_putpeer’:
../../ntpd/ntp_control.c:2319:8: error: ‘struct peer’ has no member named ‘procptr’
   if (p->procptr != NULL) {
        ^
../../ntpd/ntp_control.c:2322:10: error: ‘struct peer’ has no member named ‘procptr’
     p->procptr->clockname, p->refclkunit);
          ^
../../ntpd/ntp_control.c:2322:33: error: ‘struct peer’ has no member named ‘refclkunit’
     p->procptr->clockname, p->refclkunit);
                                 ^
Waf: Leaving directory `/home/murray/ntpsec/foo/build/main'

waf --list needs to show old numbers as well as new names
It's handy if you are updating a script.

New ntpq peers chops refclocks to 6 characters
But there is lots more room in that column. I think it will hold a worst case IPv4 numerical address.

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 HP5850          .GPS.            0 l    7   64     1   0.000    0.000   0.000
 PPS(0)          .PPS.            0 l    -   64     0   0.000    0.000   0.000
 SHM(0)          .SHM.            0 l    5   64     1   0.000  -218.14   0.001
 SHM(1)          .SHM.            0 l    4   64     1   0.000   -1.094   0.001
 GPSD(0          .GPSD.           0 l    3   64     1   0.000  -308.94   0.001
 GPSD(1          .GPSD.           0 l    2   64     1   0.000   -0.209   0.001

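To make the column-width point concrete, here is a small sketch. It assumes a column wide enough for a worst-case dotted-quad IPv4 address (15 characters). The function name is hypothetical and this is not the ntpq formatting code.

```python
REMOTE_WIDTH = len("255.255.255.255")   # 15: the longest dotted-quad IPv4 address


def remote_label(driver, unit, width=REMOTE_WIDTH):
    """Format a refclock name such as GPSD(1), cut and padded to width."""
    label = "%s(%d)" % (driver, unit)
    return label[:width].ljust(width)


print(repr(remote_label("GPSD", 1, width=6)))   # the 6-character chop reported above
print(repr(remote_label("GPSD", 1)))            # the full name fits in 15 characters
```

With the wider limit, every name in the listing above would fit with room to spare.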
Re: The new refclock directive is implemented and documented
e...@thyrsus.com said:
> and the
> noun/verb "fudge" is reserved for the two time offset options.

Why? What's the difference between a flag that gets set to 0 or 1 and a time that gets set to a number?

> There will be a *limited* open period for bikeshedding about the driver
> names.

hp58503a should probably be hpgps. It works for several devices.

---

You need a plan for testing this stuff. I won't be helping since I think it's important to be able to run ntp classic.

Re: waf --list needs to show old numbers as well as new names
> Can you show me an example of this sort of script?

How do you build things for your collection of systems? Do you really type the configuration in by hand each time? Do you use --refclock=all?

Here is a fragment that I translated by hand:
  --refclock=irig,nmea,pps,hp58503a,shm,gpsd

Re: New ntpq peers chops refclocks to 6 characters
How does your new stuff handle multiple instances of a refclock type? For a test case, I suggest a USB driver in addition to a HAT. Try both NMEA/PPS as well as both SHM and various combinations.

The JSON driver uses the high bit of the unit to enable/disable the PPS.

The NMEA and HP drivers use the mode/ttl slot to select the baud rate. There are probably others that do something similar.

As long as you are changing things, you might as well clean up how the baud rate gets passed in. It needs that before it opens the /dev/tty. The old fudge stuff is too late. (Or was at one point.)

Re: The new refclock directive is implemented and documented
e...@thyrsus.com said:
> I don't think shm needs to change at all. It says what it is - data coming
> over System V shm, which defines its own format by the shared structure

I like SHM. I think there are non-gpsd sources of SHM data.

I have no strong preferences for gpsd vs json.

Re: waf --list needs to show old numbers as well as new names
e...@thyrsus.com said:
>> --refclock=irig,nmea,pps,hp58503a,shm,gpsd
> I'm not seeing a problem here. Isn't it trivial to get those names from,
> e.g., https://docs.ntpsec.org/latest/refclock.html ?

The problem is not to "get the names", it's to translate an old number to the new name. You may have forgotten which driver you tossed into that setup or why, and maybe now is not the time to clean up that sort of thing.

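The translation being asked for is a small lookup table. The sketch below covers only the six drivers named in the hand-translated fragment earlier in this thread. The old type numbers are recalled from the NTP Classic refclock documentation rather than taken from this thread, so treat them as assumptions to verify. The function name is hypothetical.

```python
# Assumption: classic driver type numbers, recalled from the Classic docs.
OLD_NUMBER_TO_NAME = {
    6: "irig",
    20: "nmea",
    22: "pps",
    26: "hp58503a",
    28: "shm",
    46: "gpsd",
}


def translate_refclocks(spec):
    """Rewrite a --refclock= value, replacing old type numbers with names."""
    names = []
    for item in spec.split(","):
        item = item.strip()
        if item.isdigit():
            number = int(item)
            if number not in OLD_NUMBER_TO_NAME:
                raise ValueError("no known name for driver %d" % number)
            names.append(OLD_NUMBER_TO_NAME[number])
        else:
            names.append(item)             # already a name: pass it through
    return ",".join(names)


print(translate_refclocks("20,22"))
```

Printing this same table from waf --list would let someone update an old build script without looking anything up.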
Re: New ntpq peers chops refclocks to 6 characters
e...@thyrsus.com said:
> More suggestions like this, please.

bps may not be enough. There is also the parity and stop bits, but I don't think they are fiddled much. The HP driver uses one mode bit to switch from whatever the default is to a different baud rate and parity. It may be simpler to use a device-type keyword rather than require the user to know about bps and such.

The NMEA driver uses a chunk of the mode field to select the type of sentence to use. Something symbolic would be nicer.

The palisade driver has an option that selects from several modes.

There are probably others. It's probably worth a pass through the code and/or documentation before you do anything.

offset: time1 or time2
e...@thyrsus.com said:
> Which reminds me: an addition I'm considering is adding "offset" as a
> synonym for time1 or time2, whichever one usually sets an offset for time
> reported from the unit. Only, I'm not clear which it should be; either it
> varies by driver or I'm not understanding the documentation properly. Can
> you shed any light on this?

The problem arises when you have something like the NMEA driver that tries to handle the PPS by itself. That needs two offsets, one for the serial port and one for the PPS. My suggestion would be offset and pps-offset.

I think the only way to be sure what is going on would be to go through the drivers one by one and make a chart of their usage. That might be handy for other uses. If you make one, consider adding it to docs/. Just put a date on it.

It's probably worth grepping the drivers to make sure the code agrees with the documentation.

I think there was a reasonable attempt to keep the usage common across drivers but I won't be surprised by any differences.

Re: The new refclock directive is implemented and documented
e...@thyrsus.com said:
>> hp58503a should probably be hpgps. It works for several devices.
> OK. Can you enumerate some other devices so I can list them in the header
> comment and on the driver page?

The documentation already mentions the Z3801A. There are a lot of them in the ham/hacker community courtesy of the cell phone industry many years ago. I think these were the first GPSDOs available at recycled prices. The manual is available and makes a good read for background info on GPSDOs. There is lots of non-HP info available on the web. They are really old (GPS software says COPYRIGHT 1991-1995 MOTOROLA) so the GPS units aren't very sensitive.

There are several other Z38xx versions available. There are usually a few of them on eBay.

Recently, a batch of new Z3811/3812 two unit pairs appeared, also known as KS-24361. Lucent unloaded their stockpile.

Re: The new refclock directive is implemented and documented
e...@thyrsus.com said:
> An argument for "json", maybe. But not a really compelling one, because
> GPSD defined the protocol and anything else emitting it would probably be
> emulating GPSD deliberately.

I think I prefer JSON for the same reason I like SHM.

I think the real question is does the current driver depend on any GPS(d) cruft. If all I wanted was time, could I send just the offset and would the current driver take it.

Maybe gpsd should have a time-only mode.

Re: Use of pool servers reveals unacceptable crash rate in async DNS
cbwie...@gmail.com said:
> How are pool entries added when the service decides it needs more?

There is some background stuff that roughly says "need more?", and if so fires off the DNS lookup.

> Would it be possible to leverage this code for adding all servers specified
> by name?

Probably not directly, but it wouldn't be hard for the server code to use more than one address if that was desired. Maybe it should be "servers" rather than "server". Do you have an example where that would be useful?

If you don't have lots of servers, you probably don't want to switch to using "pool" since that path will probably keep banging away at the DNS looking for more servers.

Re: Use of pool servers reveals unacceptable crash rate in async DNS
cbwie...@gmail.com said:
> I was thinking of setting up associations using the DNS lookup code. If the
> mechanism for adding new pool servers was blocking on the DNS call but
> asynchronous to the rest of the daemon, I was figuring to call the lookup
> with the name provided by the server directive. The only real difference
> between a specified server and a pool server is that you don't delete the
> specified server.

The DNS lookup for server and pool both take the same general path of using another thread to do the lookup.

If all goes well, the server stuff could do the lookup during startup. But there are all sorts of ways for DNS to not work.

ntpq mrulist: cpu hog
I have a pool server. mru maxmem is set big enough to capture a whole day. Each midnight, a cron job fires off to capture everything to a file. The file is 100 megabytes. While that is going on, ntpq is using 95% of the cpu.

If anybody is looking for a nice distraction, it would be interesting to understand what's going on and see if it could be fixed. (I haven't looked at the code yet.)

Re: Use of pool servers reveals unacceptable crash rate in async DNS
e...@thyrsus.com said:
> After discussion with Daniel about the performance and security issues I
> deleted the memlock code. As the comment explains:

I think changes like that are worthy of a general announcement.

> on modern systems, which swap so seldom
> that many people don't bother with swap partitions

I think you have extrapolated from some modern systems to our whole target environment.

I don't remember any discussion supporting memlock not being interesting/important.

I'd be a lot happier if you had a plan for what to do if it turned out to be a problem and/or a way to verify that we don't need it or detect that it causes trouble.

Consider ntpd running on an old system that is mostly lightly loaded and doesn't have a lot of memory. I could easily imagine ntpd getting swapped out when some load did come along. I don't know how to evaluate if that will cause problems and I don't think we have a test environment that is likely to blunder into it.

I poked around a bit. Linux and NetBSD and FreeBSD all have getrusage(). I didn't notice any differences. It covers page faults and CPU usage. When I'm in the right mood, I'll add another file parallel to sysstats to collect that sort of data. The CPU usage will probably be interesting even if page faults are boring.

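The getrusage() counters mentioned above are also exposed by Python's standard resource module on Unix, which makes it easy to see what such a stats file could record. This is a sketch only. The function names and dictionary keys are made up, and it measures the calling process rather than ntpd.

```python
import resource


def usage_snapshot():
    """Read CPU time and page-fault counters for this process via getrusage()."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "user_cpu": ru.ru_utime,       # seconds of CPU in user mode
        "system_cpu": ru.ru_stime,     # seconds of CPU in kernel mode
        "minor_faults": ru.ru_minflt,  # faults serviced without disk I/O
        "major_faults": ru.ru_majflt,  # faults that needed disk I/O
    }


def usage_delta(before, after):
    """Difference between two snapshots, one value per counter."""
    return {key: after[key] - before[key] for key in before}


before = usage_snapshot()
sum(i * i for i in range(100000))      # stand-in for a stretch of real work
print(usage_delta(before, usage_snapshot()))
```

Logging a delta like this once per interval would show whether major faults ever show up with memory locking turned off.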
Re: Device driver mode bits and other skulduggery.
e...@thyrsus.com said:
> One thing that jumps out at me is that several drivers have a clockstats
> verbosity option, always flag4 (which, alas, is used for other things too).

There may have been a general idea that flag4 would be used to enable clockstats from an individual driver instance. That's how refclock_shm uses it. I'd be happy if you nuked that test, aka always write it. (If clockstats isn't enabled it won't go anywhere.) Same for refclock_gpsd. There may be others. I didn't check the drivers I'm not familiar with. It's probably not worth a lot of work in this area.

> hpgps:
>    time1: PPS time offset

The HP driver doesn't know anything about PPS. I assume that is a typo.

> nmea:
>    flag3: clock discipline selection
> pps:
>    flag3: PPS discipline select

I would say "kernel PLL" in there. "discipline select" doesn't tell me anything.

Re: Use of pool servers reveals unacceptable crash rate in async DNS
matthew.sel...@twosigma.com said:
> "rlimit memlock 0" using Classic causes ntpd to die after 3 minutes with
> this error 2016-06-29T00:13:21.903+00:00 host.example.com ntpd[27206]:
> libgcc_s.so.1 must be installed for pthread_cancel to work

What version of Classic are you running? I thought they had fixed that.

> I've attached 15 minute graphs for "rlimit memlock -1" and "rlimit memlock
> 128" using Classic. Locking memory seems to result in more stable graphs
> over the time period that I was able to collect quickly.

What are you plotting?

Kernel PPS processing
http://users.megapathdsl.net/~hmurray/ntpsec/glypnod-pps-kernel.png

If you turn on flag3 for a PPS driver on a Linux system, you get this error message:
  06-20T12:25:32 ntpd[988]: refclock_params: kernel PLL (hardpps, RFC 1589) not implemented

I poked around a bit. Those options are in drivers/pps/Kconfig. Here is the key chunk:

    config NTP_PPS
            bool "PPS kernel consumer support"
            depends on !NO_HZ
            help
              This option adds support for direct in-kernel time
              synchronization using an external PPS signal.

              It doesn't work on tickless systems at the moment.

So I pulled over the sources and built a kernel with NO_HZ turned off and NTP_PPS turned on.

The next project is to figure out why it works so much better, or rather why the normal ntpd can't do a lot better.

Re: adns is looking plausible
e...@thyrsus.com said:
> I haven't looked at the code itself yet, but from reading the C header file
> and the website, adns is looking like a plausible replacement for our
> homebrew async-DNS.

Good find! One feature that pushes me in that direction is being able to get at the TTL.

> and reducing our KLOC ...

It's not obvious to me that reducing our code at the cost of dragging in another library is progress.

Do we even have a list of the libraries that we now depend upon? How do we evaluate their risks and/or trustworthiness?

It's available as a package in Fedora and NetBSD and FreeBSD. I take that as a vote of confidence, but I don't know how strong. Does anybody know a simple way to find out how many other packages depend upon a given package?

Where does this fit in our overall priorities? What can I do to help get you back to working on TESTFRAME? Should we rip out the current intercept stuff (few KLOC) and start over when we are done with what seems to be turning into a long series of other projects?

--
These are my opinions. I hate spam.
Re: Kernel PPS processing
> Can you quantify the better? I would have expected identical...

Did you look at the graph?
  http://users.megapathdsl.net/~hmurray/ntpsec/glypnod-pps-kernel.png

I'm not sure why you would expect performance to be identical. Dave Mills and crew went to a lot of effort to get code into various kernels, including writing an RFC. I'd be very surprised if it wasn't a significant improvement.

The question in my mind is: have things changed enough since 1994 that we can do as well without that code in the kernel?

--
These are my opinions. I hate spam.
Re: Technical strategy and performance
fallenpega...@gmail.com said: > Thank you Eric. Have read, am pondering, and welcome other people to weigh > in. The big picture question that comes to mind is why did we start by forking ntp classic? Why not start from scratch? Did anybody consider chrony? What other options are/were there? Where would I look to find a crisp statement of the goals of the project? -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Kernel PPS processing
matthew.sel...@twosigma.com said: > We tested booting with "nohz=off intel_idle.max_cstate=0" and it made a > difference in our production clocks. Interesting. Thanks. How did you decide to go there? Did you try those 2 changes separately? Was that with PPS or just a typical system? Are you using the kernel from a distro or building your own? -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Technical strategy and performance
Thanks. I didn't see any surprises. I'm happy with the general idea, it's the details that get interesting. Removing cruft is good. Removing features is not. There is a trade off between the cruftiness of the code and the importance of any features it includes. This example gets tangled up in several issues. I didn't see much discussion. I seem to be the only one who occasionally pushes back when you hint at removing stuff. I can't tell if I'm making the right amount of noise or not enough or too much. Most of the cruft you remove looks like progress to me, but I can't tell if/when you are going too far. It's a judgment call. Sometimes I don't care much. Sometimes I do. One of the complications for this case is that we don't have a good way to test things. This feels like the sort of problem that might come back as a hard to debug example way off in a far away datacenter where it would be even harder to debug. I don't like that sort of problem so I'm probably willing to put up with a bit of cruft in the code in order to reduce the risk. You haven't convinced me that modern hardware will make this problem go away. Yes, it will reduce it, but that also makes it harder to test. Your comment about no swap space was timely. I lost a cron job a few days ago because it ran out of memory. I don't know enough about modern data center operations. On VM systems, they charge for memory. ... Did you consider simplifying things rather than removing everything? (Sorry for not suggesting this sooner.) Most of the cruft was in figuring out how much to lock. Would locking everything be simple enough? --- I thought there was a command line switch to use the real-time scheduler but I can't find it. If it's there, it might be cruft to clean up. If it's not there, it might be a good feature. There would be complications with lots of traffic locking up the CPU. --- There is another interesting consideration when using old hardware. They take a lot of power. 
At some point, it's cheaper to buy new gear that doesn't use as much power and has more memory while you are at it. I computed the payback time once, and it seemed like a good excuse to get some new toys. The next time I did the calculations, I got an answer I didn't like as much.

--
These are my opinions. I hate spam.
Re: Kernel PPS processing
g...@rellim.com said:
>> I'm not sure why you would expect performance to be identical.
> Because they use the same kernel generated time stamp and PLL algorithm.

There are two chunks of PPS code in the kernel with separate RFCs. One is getting the time stamp. The other is doing the PLL. The in-kernel PLL is totally different from anything in ntpd.

--
These are my opinions. I hate spam.
Re: Kernel PPS processing
g...@rellim.com said:
> Wow. I thought something was wrong. My local clock offset (peerstats file)
> has always been hanging around 100ppm. Stable to ±1ppm so I figured
> that was normal.
> After reboot the local clock offset started at 9ppm and has been slowly
> going down, now under 2ppm.

Your units don't make sense. Offsets would be in units of seconds. I assume you mean some fraction of a second. A guess would be microseconds.

--
These are my opinions. I hate spam.
Re: Kernel PPS processing
> Local clock frequency offset, as opposed to local clock time offset.

Most NTP documentation calls that drift. Its magnitude is not very interesting when discussing quality of time. Changes over time can be interesting. It's usually much more interesting to look at the clock offset.

There are two sources for drift. One is crystal error. That part often makes a good thermometer. The other is software. If somebody gets the arithmetic a bit wrong, ntpd can correct just like it does for the initial hardware error.

For many years, Linux had a not-good measurement of the system clock frequency at boot time. If you rebooted, you got a different answer. It was close, just not good enough in the low bits if you wanted good timekeeping.

  Jun  2 10:34:25 fed kernel: tsc: Detected 1596.750 MHz processor
  Jun  9 11:06:24 fed kernel: tsc: Detected 1596.966 MHz processor
  Jun 19 11:42:22 fed kernel: tsc: Detected 1596.978 MHz processor

--
These are my opinions. I hate spam.
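To put rough numbers on the boot-to-boot variation in those three log lines, here is a small Python sketch. It only does arithmetic on the frequencies quoted above; the function name is made up for illustration, and it says nothing about how the kernel actually calibrates the TSC.

```python
# Frequencies (MHz) copied from the three kernel log lines in the message above.
TSC_MHZ = [1596.750, 1596.966, 1596.978]


def ppm_difference(reference_mhz, measured_mhz):
    """Relative difference between two frequency estimates, in parts per million."""
    return (measured_mhz - reference_mhz) / reference_mhz * 1e6


baseline = TSC_MHZ[0]
for mhz in TSC_MHZ[1:]:
    delta = ppm_difference(baseline, mhz)
    print(f"{mhz:.3f} MHz vs {baseline:.3f} MHz: {delta:+.1f} ppm")
```

The first two boots differ by roughly 135 ppm, so a drift value saved before the reboot would be off by about that much afterwards, assuming nothing else changed.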
Re: My task list
> 1. Try replacing our buggy async-DNS code with the c-ares library.

You keep calling the existing code "buggy". Is that correct, or are you just being sloppy since you don't like it (perhaps justifiably) and it has triggered bugs/quirks in other parts of the system?

As far as I can tell, our code is innocent. The recent troubles are some combination of libc/memlockall and pthreads not working well together. We just happened to trigger it reliably enough to cause troubles but not reliably enough to make testing simple.

> 2. If that succeeds, reinstate memlocking long enough to check if the
>    crash bug recurs. If it doesn't, leave memlocking in.

The old memlock code, or a simplified lock-everything (no parameters) version?

If any new code uses threads, it's going to have the same problem.

I'd vote against restoring the old code until you have figured out how to test it.

> 3. Collect the results from my first profiling runs, now about 14 days of
>    data
>    Learn how to graph and interpret them.

You might do that first since you will probably want to tweak something and collect more data.

Data for a day will tell you most of what you will ever get. If you have lots of data, then you have to scan it looking for glitches.

Consider bumping the clock and watching it recover. (util/bumpclock) There are two interesting cases. One is a big bump so it will "step" the clock to recover. The other is a small bump so it will slew (slowly) to recover. The split is 128 ms. So I'd try 200 ms and 100 ms.

> 5. Do the cleanup required to get the code compiling under -std=c99.

What does that involve?

TESTFRAME is missing. How about we both clear our schedules and desks and give it another try? How about next Wed?

--
These are my opinions. I hate spam.
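The step/slew split mentioned above can be sketched in a few lines of Python. This is only the threshold comparison with the 128 ms figure from the message; the names are hypothetical, and the real ntpd decision involves more state than a single comparison.

```python
STEP_THRESHOLD_S = 0.128  # the 128 ms split mentioned in the message above


def recovery_mode(offset_s, threshold_s=STEP_THRESHOLD_S):
    """Return 'step' if the offset magnitude is past the threshold, else 'slew'."""
    return "step" if abs(offset_s) > threshold_s else "slew"


# The two suggested test bumps, one on each side of the split.
for bump_ms in (200, 100):
    print(bump_ms, "ms ->", recovery_mode(bump_ms / 1000.0))
```

A 200 ms bump lands on the step side and a 100 ms bump on the slew side, which is why those two values make a reasonable pair of test cases.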
Re: Technical strategy and performance
e...@thyrsus.com said:
> In many cases, especially in government, they *can't* -- they have lengthy
> certification requirements for new infrastructure components.

If they are on the ball, they will have to do almost as much work for (re)certification after all the changes we have made.

>> Where would I look to find a crisp statement of the goals of the project?
> On the project website.
> https://www.ntpsec.org/announcement.html
> https://www.ntpsec.org/plans.html

I don't see anything in either page that I would call "crisp". Yes, there is lots of good stuff. If you know what the answer is, you can find lots of supporting info.

The plans are all down in the weeds, details rather than big picture. The announcement is background and handwaving. If I asked you, does "X" fit in the scope of the project, you can scan the plans to see if you find a match but if not, good luck trying to figure it out from the announcement.

Take the memlock discussion. Is there anything in the announcement that says we focus on modern systems with lots of memory?

--
These are my opinions. I hate spam.
Re: Kernel PPS processing
g...@rellim.com said:
> I took another look, and realized I misunderstood the y axis. And that you
> are plotting loopstats and I'm looking at offsets. So not the bad I
> thought.

I can't figure out what that means. I was plotting the offset column from loopstats.

> To get apples and apples, can you send me your gnuplot formula? I'll add it
> to my chrony graph.

From another handy graph:

  set ylabel "Offset ms"
  set y2label "Drift PPM"
  plot \
    "glypnod-loop" \
      using ($2/3600):($3*1000) \
      title "Offset" with lines lt 1, \
    "glypnod-loop" \
      using ($2/3600):($4) \
      axes x1y2 \
      title "Drift" with lines lt 3

--
These are my opinions. I hate spam.
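For anybody who wants the same numbers outside gnuplot, here is a rough Python sketch of the scaling those "using" expressions do: column 2 (seconds) becomes hours, column 3 (offset in seconds) becomes milliseconds, and column 4 (drift) is passed through. The function name and the sample line are made up for illustration; the sample is not real loopstats data.

```python
def loopstats_points(lines):
    """Yield (hours, offset_ms, drift_ppm), mirroring the gnuplot 'using' expressions."""
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        seconds = float(fields[1])   # gnuplot $2
        offset_s = float(fields[2])  # gnuplot $3
        drift = float(fields[3])     # gnuplot $4
        yield seconds / 3600.0, offset_s * 1000.0, drift


# Made-up sample line, only to show the column positions.
sample = ["57570 3600.000 0.000012345 -3.210 0.000001 0.001 6"]
for hours, offset_ms, drift_ppm in loopstats_points(sample):
    print(f"{hours:.2f} h  offset {offset_ms:.6f} ms  drift {drift_ppm:.3f} ppm")
```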
Re: Master Does Not Compile on Centos 6
j...@rtems.org said: > This likely fails on other platforms since it is a mismatched brace: Yes. Anything without a refclock. Amar: Buildbot needs a few more build runs with various configurations to catch things like this. They don't need to be run on all systems but they should be run on at least one. My straw man to start with would be one with minimal features and one with everything. We should run them on everything when getting ready for a release. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Master Does Not Compile on Centos 6
>> This likely fails on other platforms since it is a mismatched brace: > Yes. Anything without a refclock. Fix pushed. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Technical strategy and performance
ja...@azze.org said: > This is why I try to make noise when things are broken on RHEL/CentOS 6.x. I > don't see a builder for that OS on buildbot.ntpsec.org. The Red Hat > Enterprise family (RHEL, CentOS, Scientific Linux, Oracle Enterprise Linux) > and SuSE Linux Enterprise Server are where we boring, conservative sysadmins > like to live. There are a lot of us who haven't moved off of RHEL 6 > (supported through 2020) for critical infrastructure because RHEL 7 went > systemd on us. Is CentOS reasonable coverage for the Red Hat side? What versions do we need? Is Scientific Linux enough different that it's worth running it too? If so, what versions? Is openSUSE reasonable coverage for SUSE? If so, what versions do we need? Is there a free version of Oracle? -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Technical strategy and performance
e...@thyrsus.com said:
> There are some prerequisites. Libraries need the library installed to run
> and in addition, the development headers installed to build.
>   Python 2.x, x >= 5
>   bison
>   libevent 2.x
>   libcap
>   OpenSSL
>   GNU readline
>   BSD libedit
>   sys/timepps.h
>   asciidoc, a2x

It's (much?) more complicated than that.

Python and bison are needed to build and install. Python may be needed by some utilities.

libcap and timepps.h are optional. libcap on Linux is needed for drop root. timepps is needed for PPS support. ntpd will run fine without them.

For the crypto stuff, we need libcrypto. On Fedora, that comes from the cryptopp package. I'm not sure how that's tangled up with OpenSSL. We can build without it but you won't get the crypto stuff.

asciidoc is only needed to build the documentation. If necessary, you could build it on a different system and copy it over.

I think readline and libedit are only needed by utilities like ntpq.

I think a tarball avoids some of the build requirements. I'd guess bison and asciidoc, but I'm not sure.

We should probably set up a cross compile example.

e...@thyrsus.com said:
> H. Daniel, shouldn't that OpenSSL be replaced by libsodium? Please
> write up an entry on that.

We have a local copy of whatever we need from libsodium. We need libcrypto for the crypto stuff.

Amar: Buildbot should probably include a system that doesn't have any of the optional stuff installed - just to make sure we really can build on it.

Eric: There is code for md5 and sha1 (I think) that gets built if you don't have the appropriate library. I don't remember when it gets included. Have you considered nuking it? Or do we want to retain a private copy of the basic crypto routines so we don't depend on another package? If so, we will probably need our copy of sha256.

--
These are my opinions. I hate spam.
asciidoc tables
The table widths have things like: [width="100%",cols="<34%,<33%,<33%"] I find that makes a table that is ugly and hard to read. I could tune the widths, but I don't know how wide the viewer's display will be. Is there a better way to do things? I'd like to say "make this column as wide as necessary to hold the widest content", and mark the last column as the one to wrap if the total doesn't fit. This isn't a big deal, but it's annoying enough that I would put in some effort to fix things if I knew how to do it. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Add usestats to collect resource usage statistics
I just pushed the code. You will get things like this:

  57570 76638.360 3600 19.221 29.499 1541 0 0 0 2984 288288 2123 0 8428
  57570 80238.357 3600 20.812 25.956 1062 0 0 0 3024 246608 2274 0 12652
  57570 83838.357 3600 23.353 26.497 833 0 0 0 2992 255329 2556 0 16164
  57571 1038.358 3600 31.154 31.335 1027 0 0 0 3088 310802 2393 0 20280
  57571 4638.357 3600 31.467 28.972 859 0 0 0 3120 266748 2469 0 23676
  57571 8238.357 3600 43.700 38.214 1525 0 0 0 2976 369410 3247 0 29748
  57571 11838.357 3600 35.270 24.945 644 0 0 0 3112 226024 3155 0 32384
  57571 15438.357 3600 46.356 29.400 1439 0 0 0 2856 278092 1971 0 37928

The data is from getrusage(). The last column is the high water mark for the "resident set size" in kilobytes. If anybody figures out exactly what that means, please clue me in.

The above is from a pool server set up with the mrulist limit big enough to hold a whole day. It's still ramping up.

The floating point numbers are user and system CPU usage. The last line shows a total usage of 2.104%.

One of the 0s is page faults.

More info in ./host/docs/monopt.html or docs/includes/mon-commands.txt

--
These are my opinions. I hate spam.
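The 2.104% figure can be reproduced from the last sample line with a little arithmetic. This Python sketch assumes the third field is the reporting interval in seconds and that fields four and five are the two CPU times; that reading is inferred from the numbers above rather than taken from the docs, and the function name is made up.

```python
def cpu_percent(usestats_line):
    """Total CPU (two CPU-time fields) as a percentage of the reporting interval."""
    fields = usestats_line.split()
    interval_s = float(fields[2])                 # assumed: interval length in seconds
    cpu_s = float(fields[3]) + float(fields[4])   # assumed: user and system CPU seconds
    return 100.0 * cpu_s / interval_s


last = "57571 15438.357 3600 46.356 29.400 1439 0 0 0 2856 278092 1971 0 37928"
print(f"{cpu_percent(last):.3f}%")  # prints 2.104%
```

That is (46.356 + 29.400) / 3600, so the numbers hang together with the interval reading.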
Re: Avoiding merge bubbles
Thanks. I hate that crap as much as anybody. > git pull --rebase I missed the --rebase part. Is there any way to set things up so --rebase is the default with pull? Is there any way to recover after I forget? Can we fix the push process to reject pushes if they have that type of comment? (I think it already rejects pushes that don't build, so the mechanism is there.) -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Avoiding merge bubbles
e...@thyrsus.com said: >> Is there any way to set things up so --rebase is the default with pull? > Yes. If you look in your .git/config, adding the "rebase = true" line will > set --rebase for all pulls from master. Thanks. Where should that be documented? I think I set that when you sent out a similar message a long time ago but I lost it when making a new clone. Are there other git quirks that should be documented? >> Is there any way to recover after I forget? > Not short of repository surgery. Remember the hash chain - git is actually > designed to make it difficult to modify old commits. If the crap is in my local copy, I can move the whole directory to the side, get a new clone, and merge my edits back in. Recovering my edits could be ugly, but in the no-collision case it's not that hard. Diff the directories to find the files you have edited, copy them over and commit... >> Can we fix the push process to reject pushes if they have that >> type of comment? > Theoretically possible, but probably a bad idea. We will probably have to > do real branch merges occasionally. I was thinking of looking for an exact match on the default commit message. If there was a real collision the message should say something interesting. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
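The "exact match on the default commit message" idea could look something like this Python sketch. The regular expression is an assumption about what git writes for a plain pull merge (something like "Merge branch 'master' of <remote>"); the wording can vary with git version and configuration, and the function name is made up. It is not an existing hook.

```python
import re

# Assumed shape of the default subject line git generates for a pull merge.
DEFAULT_PULL_MERGE = re.compile(r"^Merge branch '[^']+' of \S+( into \S+)?$")


def looks_like_merge_bubble(subject):
    """True if the commit subject matches the assumed default pull-merge message."""
    return DEFAULT_PULL_MERGE.match(subject.strip()) is not None


print(looks_like_merge_bubble("Merge branch 'master' of gitlab.com:NTPsec/ntpsec"))
print(looks_like_merge_bubble("Merge branch 'proto-refactor': resolve conflicts by hand"))
```

A real branch merge where somebody edited the message to say something interesting would not match, which is the point of checking for the default text only.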
memory locking
e...@thyrsus.com said: > BTW, I think I've knocked the mlockall/threads/async bug on the head. I > swiped some code from chrony that does memlocking after telling ntpd it can > have as much memory as it wants - ntpd's worst-case memory requirement ain't > much. I've had that version running continuously for about 14 hours merrily > swapping pool servers in and out with no crash. Thanks. ntpd can take a lot of memory if you collect the statistics for who is talking to you on a busy machine, for example a pool server. It depends on how busy and how long you want to keep the data. Is there any way to turn that off? I don't see any mention in the documentation. Where should that go? -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: memory locking
e...@thyrsus.com said:
> I'm not sure what the referent of "that" is. The statistics-gathering I've
> seen seems to be all about writing line-at-a-time records to various stats
> files; I can't see that generating a lot of memory pressure.
> If there's somewhere in the code that is allocating memory proportional to
> the size of saved statistics, yes that could be a problem. Do you have some
> specific case in mind?

Different context for statistics.

The MRU list keeps track of traffic from each IP Address. It's in the misc options page under "mru". You can see it with ntpq -c mrulist.

--
These are my opinions. I hate spam.
Is digest mode working for mailing lists?
A few weeks ago, I signed up for bugs and vc in digest mode. I thought I got one message, maybe one from each list, but I haven't seen anything since.

I see stuff in the archives for vc but the archive for bugs is empty.

--
These are my opinions. I hate spam.
Refclock quirk
I'm seeing things like this:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+fe80::21e:c9ff: .PPS.            1 u   86 1024  377    0.483   -3.643   0.330
+fe80::226:2dff: .PPS.            1 u  758 1024  377    0.582   -3.652   0.392
+fe80::226:2dff: 192.168.1.33     2 u  802 1024  377    0.591   -3.632   0.362
*GPS_NMEA(0)     .GPS.            0 l   21   64  377    0.000   74.403   5.791
oPPS(0)          .PPS.            0 l   84 1024  377    0.000   -3.750   0.332
+glypnod         .PPS.            1 u   45   64  377    0.361   -3.683   0.012
+shuksan         .PPS.            1 u   60   64  377    0.313   -3.656   0.021
+mon             .PPS.            1 u   39   64  377    0.552   -3.801   0.034
+tom             .PPS.            1 u   31   64  377    0.506   -3.770   0.037
+cent            .PPS.            1 u   31   64  377    0.442   -3.729   0.014

That's on a Raspberry Pi with a GPS HAT.

The glitch is the 1024 polling interval for the PPS. That shouldn't be there. Note that the NMEA driver is at 64.

I don't remember seeing anything like that before your recent refclock changes.

There is a maxpoll 6 on the 5 servers after the PPS. Nothing on the first 3 or either refclock.

I thought the refclocks used to set their own polling interval, but I can't find that code. (I remember changing something many years ago.)

--
These are my opinions. I hate spam.
Re: memory locking
> esr@snark:~/software/ntp-rescue/ntpsec$ ntpq -c mrulist
> ***Command `mrulist' unknown

I don't know what's wrong on your end. When I cut/paste that line, I get things like this:

Ctrl-C will stop MRU retrieval and display partial results.
Retrieved 5 unique MRU entries and 0 updates.
lstint avgint rstr r m v  count rport remote address
==============================================================================
     1     33    0 . 3 4   3833   123 shuksan
    21     37    0 . 3 4   3394   123 mon
    27     52    0 . 4 4   2399   123 glypnod
    31     65    0 . 4 4   1916   123 cent
    64     44    0 . 4 4   2826   123 tom

e...@thyrsus.com said:
> Name must have changed. But I remember seeing the code for that. Looking in
> ntp_control.c...looks like memory used is O(n) in the number of peers.

Where "peers" includes clients rather than just "peers" from ntp.conf

--
These are my opinions. I hate spam.
Re: Refclock quirk
e...@thyrsus.com said:
> Doesn't show up in bisection, and now doesn't reproduce with the head
> revision either.

There is no point in bisecting unless you have a test case that fails on head.

Please try using NMEA and PPS rather than SHM.

You have to wait a while. I'm not sure how long. I think it's the normal ramp-up on polling interval.

e...@thyrsus.com said:
> Try nuking the build directory, re-configuring, and rebuilding. You might
> have a stale binary somewhere.

I have a script that does that. I'll poke around...

--
These are my opinions. I hate spam.
Re: Refclock quirk
hmur...@megapathdsl.net said:
> You have to wait a while. I'm not sure how long. I think it's the normal
> ramp-up on polling interval.

It takes about 6 minutes.

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ntp.mcast.net   .ACST.          16 a    -   64    0    0.000    0.000   0.004
+glypnod         .PPS.            1 u    7   64   77    0.406    0.650   0.133
-shuksan         .PPS.            1 u   10   64   77    0.288    0.356   0.355
+mon             .PPS.            1 u    6   64   77    0.531    0.637   0.165
-tom             .PPS.            1 u    9   64   77    0.515    0.600   0.161
+cent            .PPS.            1 u    5   64   77    0.448    0.625   0.143
*NMEA(0)         .GPS.            0 l   12   64   77    0.000  -14.045   1.068
oPPS(0)          .PPS.            0 l   11  128   77    0.000    0.070   0.092
+fed             192.168.1.3      2 u    8   64   77    0.521    0.669   0.140
+fed2            192.168.1.3      2 u    7   64   77    0.537    0.694   0.169
+deb             213.74.106.159   2 b   14   64   76    0.501    0.551   0.159
+deb2            192.168.1.3      2 b   36   64   36    0.563    0.643   0.162

--
These are my opinions. I hate spam.
Re: Zero-configuration ntpd
> Default servers should be the global NTP pool.

In general, it's a very bad idea to wire names or addresses into code, especially if you don't own/control the resource being used. This case is less-bad than many others since it is possible (maybe even easy) to change/fix.

The problem is that you need an exit strategy. What are you going to do if your usage puts too much load on the pool (or its DNS servers) or the pool goes out of business?

You could make it a build time option so each distro could set things up to use their NTP servers. But they can already do that with a config file. Wiring in a default seems too complicated.

You might be able to pick up some servers via DHCP. That probably belongs in the startup scripts rather than in ntpd itself.

> Default for statistics can be no stats gathering.

That's the current default.

> There is currently no default drift file location. This is where I am not
> sure of my ground - should there be one? If not, why not?

It will work without one. It should be a bit slower to get started. We should test that case.

--
These are my opinions. I hate spam.
Anti-DDoS
Is there consensus on what we should be doing? Actually, I'm looking for a bigger picture of what all UDP services should be doing. DNS is the other obvious example. If you had asked me a year or two ago, I would have said "rate limiting" and thought that solved the problem. It does solve the reflection attack, but it opens things up to a different type of attack. A bad guy can deny service to Bob at selected servers by sending forged packets to those servers so they start rate limiting him. That doesn't take a lot of traffic so it won't stand out and most of the infrastructure won't even know there is a problem. (That does require that you can figure out what servers Bob is talking to.) Is there any good writeup on why BCP-38 is so hard to implement and/or why it isn't implemented more often? I assume it's money. Is the problem routers can't do it? (fast enough) Or maybe ISPs don't have their act together? -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Zero-configuration ntpd
g...@rellim.com said:
>> Default for statistics can be no stats gathering.
> Agreed. They just grow forever. Ditto ntp.log that should default to the
> system syslog.

The main log file does default to syslog.

> Off-topic: ntpd should have a max number of saved logs.

The default is no log files.

I don't think ntpd should get involved with deleting anything. If nothing else, it's an insecurity opportunity. Debian has a cron job to do it. (I have to kill it since I want them saved forever.)

--
These are my opinions. I hate spam.
Re: asciidoc tables
e...@thyrsus.com said: > Sadly, proportional is all you can do in the table model of XML-DocBook > (which is what asciidoc uses as a back end). Can I specify the total width in characters? Can we assume the width is appropriate for a man page? That might look ugly with narrow or wide web pages but it will probably be better for the typical case. (at least the way I read web pages) -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: question about upgrading from Classic to NTPsec (packaging issue)
j...@systemsartisans.com said:
> I am in the process of trying to create an RPM package from the repo's
> current head. Given that I would expect this to be used by sysadmins, etc.
> who might already have installed the Classic version (very possibly from
> their distro's package sources), how would you all suggest I treat the
> presence of a ''conflicting'' version? Save the config files, and overwrite/
> remove all the old executables? Move everything aside, and put our entire
> "best practices" package in place?? Ask for manual intervention??? Halt,
> catch fire, burn it all to the ground? ;)

So far, the config files are compatible so it makes sense to leave the old one alone if it has been edited. (The admin might have picked some good servers or set up logging.)

If there is a conflict, my suggestion would be to rename the old stuff to classic-xxx and install the new stuff as ntpsec-xxx and set up links and provide a script to swing the links. Or something like that. There probably needs to be a script to uninstall the classic version and undo the links.

Where would you document what happened and/or how to switch back?

--
These are my opinions. I hate spam.
Re: Kernel PPS processing
strom...@nexgo.de said:
> On another tangent back to NTP, I'm wondering if it wouldn't make sense to
> offload the timestamp filtering at least to the VC4. Most NTP boxes would
> run headless anyway, so there'd be 16 processors sitting idle for that sort
> of thing.

Not likely. That sort of work doesn't take enough CPU cycles to worry about. If you put it someplace else, it just makes things harder to maintain and debug.

>> Yeah, I'm wondering why the delay in Linux kernel for 64 bit A8?
> It wouldn't buy anyone anything of immediate use except having a complete
> additional distro to build and maintain and more memory pressure to deal
> with. I suspect that eventually a 64bit port will be added anyway to tick a
> checkbox somewhere.

If you don't need them, 64 bit pointers just take up more memory. That shows up as cache misses.

There are 2 reasons that I know of for needing 64 bit pointers. The first is that you are using more than 32 bits of virtual memory. The second is that your hardware can't address all of physical memory when running in 32 bit mode. If your system uses the top bit for I/O, then you can only use 2 GB of physical memory.

--
These are my opinions. I hate spam.
Re: Kernel PPS processing
g...@rellim.com said:
> The big thing for NTP and gpsd would be the 64 bit math. Both do a lot of
> 64 bit math.

You can do 64 bit arithmetic without using 64 bit pointers.

Somebody mentioned that the plan is to have one boot file that runs on all Raspberry Pis. Are things set up so that user code builds that way too? If so, it might take a magic option to the compiler to get it to use the 64 bit instructions.

--
These are my opinions. I hate spam.
Re: Requesting code review on possible fix for nopeer/pool conflict
dfoxfra...@gmail.com said:
> The whole receive() function you're looking at is about to get blown away in
> my ntp_proto refactor. Can you hold off on touching it until next week?

Please don't push any big changes until Eric and/or I get the polling tangle fixed.

dfoxfra...@gmail.com said:
> One more reason I need to get my ACL language implemented and restrict needs
> to die.

If you kill restrict, we are taking a major step toward making the ntp.conf file no longer compatible.

Would it be possible for your new code to support the old restrict stuff? (Similar to the way Eric's new refclock stuff still works with the old stuff.)

---

> I think this working as designed. 'restrict nopeer' means "Don't establish
> unauthenticated ephemeral associations with this IP address", which is
> exactly what pool does. I agree this is stupid design but I don't think it's
> a bug.

I think the current setup is buggy, but maybe that whole area is more complicated than I currently understand.

Maybe restrict needs a nopool tag so we don't get confused by peer vs pool.

There is a fundamental problem here. What should happen if server/pool and restrict lines conflict? Is server DNS different from server IP Address?

My straw man is that a restrict line with explicit IP Address(es) should block server/pool addresses but the default restrict should not.

I'd also vote for conflicts to generate error messages at startup time and/or at DNS lookup time. The DNS lookup could try the next address and/or try again later (after TTL).

--
These are my opinions. I hate spam.
Re: Requesting code review on possible fix for nopeer/pool conflict
dfoxfra...@gmail.com said: > What exactly is the "polling tangle" you're referring to? I talked to Eric > about this earlier today, and he mentioned something about the polling > interval drifting to 1024 seconds on a consistently reachable server. But > AFAIK, nothing has changed and that's always been exactly the intended > behavior, as set by the "NTP_MAXDPOLL" constant. The problem is that the ramp up on polling interval is happening on refclocks. Maybe only on PPS refclocks. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Refclock polling ramp up
I just pushed a fix. Would you please sanity check... For servers, minpoll and maxpoll default to 6 and 10. There is also a check to make sure that minpoll isn't greater than maxpoll. For refclocks, minpoll defaults to 6 and maxpoll defaults to minpoll. The problem is that you were storing maxpoll and friends in a peer struct and by the time you were checking to see if it needed a default, it had already been defaulted by newpeer so the test was testing clobbered data. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
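A small sketch of the defaulting rules described above, in case it helps the sanity check. The names are made up, and raising an error when minpoll exceeds maxpoll is an assumption; the real check may do something else. The point is that the "was it given?" test has to happen before anything fills in a default.

```python
UNSET = None  # hypothetical marker for "not given in ntp.conf"


def resolve_poll(minpoll=UNSET, maxpoll=UNSET, is_refclock=False):
    """Apply defaults to poll exponents that are still unset.

    Servers: minpoll 6, maxpoll 10.
    Refclocks: minpoll 6, maxpoll defaults to minpoll.
    """
    if minpoll is UNSET:
        minpoll = 6
    if maxpoll is UNSET:
        maxpoll = minpoll if is_refclock else 10
    if minpoll > maxpoll:
        # Assumption: the mail only says there is a check.
        raise ValueError("minpoll %d is greater than maxpoll %d" % (minpoll, maxpoll))
    return minpoll, maxpoll


def poll_seconds(exponent):
    """Poll values are log2 of the interval in seconds."""
    return 2 ** exponent


if __name__ == "__main__":
    print(resolve_poll())                   # server defaults
    print(resolve_poll(is_refclock=True))   # refclock defaults
    print(poll_seconds(10))                 # interval for maxpoll 10
```

If the defaults have already been filled in by the time the refclock test runs, the code can no longer tell "user asked for 10" from "newpeer picked 10", which is the clobbered-data problem.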
Do we have a list of user visible changes from ntp classic?
It's probably all in NEWS (or should be), but that's chronological and seems hard to read. For example, the deleted refclocks are scattered all over the place. I think I'm suggesting something like CHANGES-from-ntp-classic
Re: Removing interleaved mode
dfoxfra...@gmail.com said: > With Eric's permission, I have removed support for interleaved mode in my > proto-refactor branch. Here is its commit-message eulogy: Seems fine with me. I've never used it. We should test things to make sure nothing strange happens. I think that requires 4 systems: 2 new and 2 old. (I guess it could be done with 2 if you are willing to run the tests serially.) This might be a good candidate for Pis with GPS HATs. It might take the kernel PLL to notice the difference. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Linux capabilities check broken on NetBSD
On NetBSD: 07-06T15:42:17 ntpd[4940]: root can't be dropped due to missing capabilities. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Weirdest bug yet.
No rawstats or protostats either.

e...@thyrsus.com said:
> It was adding "subtype" as an alias for "mode" in the lexical analyzer. This
> somehow confuses the crap out of the parser's FSM. ...

Remember the saveconfigandquit stuff you ripped out? That would have caught this (if we had used it).

How are we going to test the parser? Can we replace ntp_config (and most of ntpd) with a skeleton that catches all the call-outs and prints stuff?

--

Are there any other aliases in the grammar? Ahh. I see this in ntpd/keyword-gen.c

    { "subtype",T_Mode, FOLLBY_TOKEN },

How about making a T_Subtype and adding it to ntpd/ntp_parser.y so that it does the same thing?
Re: Linux capabilities check broken on NetBSD
matthew.sel...@twosigma.com said: > NetBSD should be using the clockctl interface: > http://netbsd.gw.com/cgi-bin/man-cgi?clockctl+4.i386+NetBSD-7.0 Thanks. Eric, I should probably fix it since I have a test case. Should we add HAVE_SYS_CLOCKCTL to waf, or just test for __NetBSD__? -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Linux capabilities check broken on NetBSD
> Attempted port fix pushed. Please test. Missing the _H on HAVE_SYS_CLOCKCTL Fix pushed. More testing in the pipeline. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Requesting review of "Eliminate some pointless gymnastics in the config parser."
e...@thyrsus.com said: > About eight hours ago I removed some code that looked so stupid that I now > wonder if it was serving some purpose I don't understand. I don't know of any reason for the old code. Your change looks sane to me. I don't see how to fully test it. The iburst case seems to still work. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Any important bugs/quirks?
Things have been a bit, well, "interesting" the past few days. I think everything has been put back together. Is there anything that needs fixing that I/we have missed? (I'm not looking for new features we haven't implemented yet, just things that we broke or things we changed that don't work right yet.) -- Is this a good time for a release? (before we break anything again) -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
SIGHUP catcher, issue #78
I just pushed code that catches SIGHUP and reopens the log file if it has changed and checks for a new leapseconds file.

You can poke it by hand with

    killall -HUP ntpd

We should get a chance to test the new leap file stuff soon. It's time for a new one. Besides, a day or two ago, the news said they announced one for the end of this year so we'll get to test the run time too.

I think there is a script to fetch a new leap file. This could be added to it.

For logrotate on Linux, you can put a fragment like this in /etc/logrotate.d/ntpd

    /var/log/ntp/ntpd.log {
        monthly
        postrotate
            /usr/bin/killall -HUP ntpd
        endscript
        rotate
    }
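For anybody wondering what "reopens the log file if it has changed" can look like, here is a rough sketch. It is not the ntpd code; the class and function names are invented, and comparing device and inode numbers is just one plausible way to detect that logrotate moved the file out from under us.

```python
import os
import signal


class ReopenableLog:
    """Append-mode log that can notice when its path was rotated away."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a")

    def changed_on_disk(self):
        """True if the path no longer refers to the file we have open."""
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            return True  # rotated away and not recreated yet
        current = os.fstat(self.handle.fileno())
        return (on_disk.st_dev, on_disk.st_ino) != (current.st_dev, current.st_ino)

    def reopen_if_changed(self):
        """Reopen the log if needed. Returns True if it was reopened."""
        if not self.changed_on_disk():
            return False
        self.handle.close()
        self.handle = open(self.path, "a")
        return True


def install_sighup_flag(flags):
    """Have SIGHUP set flags['hup']; the main loop does the real work later."""
    def on_sighup(signum, frame):
        flags["hup"] = True
    signal.signal(signal.SIGHUP, on_sighup)
```

The handler only sets a flag. The main loop would notice it, call reopen_if_changed(), and then look for a new leapseconds file.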
Sandboxing: How important is seccomp?
[For those not familiar with it, seccomp gives the kernel a list of syscalls that the program is allowed to use. All others become illegal. So if a bad guy finds a stack overflow there is (hopefully) a good chance that any code he tries to run will crash.]

I've got it working on Intel. It doesn't work on ARM. More below.

This is likely to have a long tail of cases that I can't test. There is an interesting tangle of which syscalls are used by various combinations of 32 vs 64 bit and the age of libc and kernels. I've been adding things to the list as I discover them. There are probably obscure combinations used by distros that I can't test and/or refclocks that I can't test may use strange calls.

There is also the chance that things will change out from under us. For example, getrandom was added to the 3.17 kernel. So if you updated your kernel and glibc our ntpd would crash until we updated it or you disabled it at runtime.

Assuming the environment supports them...

Should we set things up so that droproot and seccomp are required at build time unless you explicitly disable them? That just requires installing the required packages.

Should we set things up so that droproot is required at runtime? I assume we add options to disable any runtime checks. (Can we use -u root:root to say no-thanks?)

Both work on Linux. NetBSD supports droproot.

-

On ARM, it dies before it gets to our code.

    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
    Cannot access memory at address 0x0
    Program received signal SIGILL, Illegal instruction.
    0x76dabde8 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.0
    (gdb) bt
    #0 0x76dabde8 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.0
    #1 0x76da84b4 in OPENSSL_cpuid_setup () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.0
    #2 0x76fdeffc in call_init (l=, argc=5, argv=0x7efffd34, env=0x7efffd4c) at dl-init.c:78
    #3 0x76fdf0d8 in _dl_init (main_map=0x76fff958, argc=5, argv=0x7efffd34, env=0x7efffd4c) at dl-init.c:126
    #4 0x76fcfd84 in _dl_start_user () from /lib/ld-linux-armhf.so.3
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)
Anybody know how to debug things like this?
I'm working on seccomp. I'm at the stage where things mostly work and I'm trying to find obscure code paths that use a syscall that isn't yet on the OK list.

The SIGSYS means it tried to call something that wasn't on the list. Normally, a simple backtrace will let me figure out what it is and add the appropriate call to the list.

It might be in the magic part of creating a new thread, but that has been working for months.

    Program received signal SIGSYS, Bad system call.
    0x41d810d8 in clone () from /lib/libc.so.6
    Missing separate debuginfos, use: debuginfo-install glibc-2.16-34.fc18.i686 libattr-2.4.46-7.fc18.i686 libcap-2.22-5.fc18.i686 libgcc-4.7.2-8.fc18.i686 libseccomp-1.0.1-0.fc18.i686
    (gdb) bt
    #0 0x41d810d8 in clone () from /lib/libc.so.6
    #1 0x0001 in ?? ()
    #2 0xb7fcbb40 in ?? ()
    #3 0x in ?? ()
    (gdb) info threads
      Id   Target Id         Frame
    * 1    Thread 0xb7fcc6c0 (LWP 18519) "ntpd" 0x41d810d8 in clone () from /lib/libc.so.6
    (gdb)
Re: Anybody know how to debug things like this?
> Seems like a situation made for investigating with Mozilla rr. Could you please say a bit more? I don't know anything about Mozilla rr. Why is that likely to help me in this case? I think I have tracked down the problem. It's trying to start a new thread. The clone syscall wasn't on the list, but we have started new threads before. The catch is that the sandboxing doesn't get called until late in the initialization procedure. So the first thread gets created before sandboxing turns off the clone syscall. If you trigger a second thread, that one dies. (My test case for that is closing the lid on a laptop for a short time.) It seemed like a good idea to move the sandboxing up earlier. The catch is that most of the initialization will get done as ntp rather than root, so permissions on devices for refclocks and files like ntp.keys need to be fixed. Anybody see any problems with that? I think old versions and ntp classic will still work running as root. --- I figured out the problem with gdb on Raspberry Pi. It got an illegal instruction from SSL before calling main. It's also got a signal handler. I assume it's a run time test for something. It works if you continue. The seccomp stuff doesn't work on Raspberry Pi. I haven't figured out why. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Anybody know how to debug things like this?
e...@thyrsus.com said: > It's like a symbolic debugger that keeps an execution trace and lets you > step backwards in time. Under rr you could induce the crash, then step back > to the last syscall. I don't think that's going to help. I'm in a signal handler from the current attempted syscall. What I need is to get the call number. Things are confused since the low level thread stuff is magic. (at least to me) Normally I can just get it from the name of the next frame on the stack. Sometimes glibc uses old or different syscalls. So far, I have always been able to figure things out. For some of the thread stuff, I had to look at the source for pthread_create, but google found it on github. > I'm worried about it. Not for any specific reason, it just trips my > "Danger! Danger!" sensors. I've got a bad feeling that it might be one of > those 'innocuous' changes that come back to bite us in the ass. Me too, but I can't see any solid reason. The lock in memory needs to be moved up too. (Only root can ask for everything, or we need another capability or ...) I'll keep testing it. It makes finding syscalls used by threads/DNS easier, and we'll need that in case the pool stuff decides it needs more servers. > I want to ask a different question: why the early thread launch? Can we move > that? It's getting called from config_peers in ntp_config. I don't know of any reason why we couldn't move it. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Anybody know how to debug things like this?
e...@thyrsus.com said: > The only safe alternative would be to force the initial DNS lookups to be > synchronous. That doesn't work for the pool case. It wants to get more servers if some of the ones it is using stop responding. > A: get configuration (that's the early thread launch) We could split the DNS lookups out of that. It would add a new step to your list. > E. initializing = false > Moving F and B is no problem, but the others worry me - especially the > setting of initialize, which does things I don't understand to the protocol > machine. My spider sense is tingling. The DNS lookups may take a long time, so starting them a bit later should be OK. It wouldn't surprise me if we discovered problems. That's a mixed blessing. Finding problems is always good, but we would also have to fix them. I've found 2 more quirks associated with early sandbox. The PID file needs to be writable by ntp. That should be easy to fix - just some scripting before starting ntpd. systemd doesn't have that scripting, but it doesn't use a pid file. The other is that NetBSD won't let ntp open wildcard sockets. It may be sockets with port# less than 1024. I don't have a solution but I haven't looked very hard.
Re: Anybody know how to debug things like this?
e...@thyrsus.com said: > I'm in favor of cleaning up and fixing some of these order dependencies, but > I'd rather get us to a safe and functioning state first. Accordingly, > splitting out seccomp() implementation to do it early and keeping droproot > late is looking better and better. I found another case that doesn't work with early drop root: opening the first 2 SHM slots. I'll add an option to use early drop root and push what I have. When it gets to the top of the list, we should clean up the SHM handshake. If we make the handshake use 2 counters rather than a ready flag, the read side can be read-only and this sort of problem will go away. That lets us have multiple users so we can run debugging/monitoring code in parallel with ntpd. That will either take a command line switch to gpsd or it will have to set up duplicate SHM slots (with different names).
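Here is a toy model of the 2-counter idea, so it's clear why the reader never has to write. This is not the existing SHM layout; the field names are hypothetical, and a real shared-memory version would also need memory barriers between the steps.

```python
class Slot:
    """Stand-in for one shared-memory segment (field names are made up)."""

    def __init__(self):
        self.begin_count = 0   # bumped before the writer touches the sample
        self.end_count = 0     # set equal to begin_count when the write is done
        self.sample = None


def write_sample(slot, sample):
    """Writer side: the only code that ever modifies the slot."""
    slot.begin_count += 1
    slot.sample = sample
    slot.end_count = slot.begin_count


def read_sample(slot):
    """Reader side: reads only, so the mapping can be read-only.

    Returns (count, sample), or None if a write overlapped the read.
    Each reader keeps the last count it saw to tell new samples from old.
    """
    end_seen = slot.end_count        # read the end counter first
    sample = slot.sample
    begin_seen = slot.begin_count    # read the begin counter last
    if end_seen != begin_seen:
        return None                  # writer was active; try again later
    return end_seen, sample


if __name__ == "__main__":
    slot = Slot()
    write_sample(slot, 1467820937.000012)
    print(read_sample(slot))
```

With a ready flag the reader has to clear it, so it needs write access and only one reader can use a slot. With counters, any number of readers can watch the same slot.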
Odds and ends...
There is some ugly code in ntp_loopfilter that's setting up a signal handler in case ntp_adjtime doesn't work. It's the sort of stuff Eric loves to rip out. I can't figure out why that code would be useful. I expect we should figure that out at build time. I've commented it out. It's still working on all of the systems I have access to. If it blows up for you, please tell us what type of system you are running on. Does anybody have any experience with seccomp? How useful is it? I think it's Linux only. The idea is to tell the kernel what system calls the program uses so that if a bad guy finds something like a stack overflow, the program will die if his exploit uses any other syscall. We inherited code from ntp classic, but it didn't work. I poked around. It needs a library. I've got it working on Intel. It builds on ARM, but gets an Invalid argument error at runtime. I've pushed two configure time options that let us test seccomp. The catch is that it's not simple to figure out which syscalls are actually used. A lot of that stuff is hidden in libc and friends. I've been adding them as I discover them. It's working on all the systems I can test on. --enable-seccomp turns on that code. Testing encouraged. If it crashes for you, please run it from gdb and send me a backtrace. DNS lookup uses a blizzard of syscalls, many involved with threads. Normally, the DNS helper thread gets started before seccomp is activated. That makes it hard to test the syscalls needed by pthread_create. You have to wait for the thread to time out and then for the pool logic to try again which starts a new thread. --enable-early-droproot does the drop root before reading the config file. That turns on seccomp early enough to catch creating the first DNS helper thread. User ntp has to be able to access refclocks and append to existing log/stats files which may be owned by root. That's probably a good idea anyway. There are two known cases where early drop root doesn't work. One is on NetBSD. Opening sockets doesn't work. I haven't checked carefully. I assume it's checking for port numbers less than 1024 or such. The other is SHM. It can't access the first two slots. That can be fixed, but I don't know of a quick workaround.
Re: adev.py
> Are you saying the unix time stamp result in the output is wrong? I didn't look that far. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Removing the worst cruft
e...@thyrsus.com said: > But AUSTRON/IRIG/CHU...I think there's a good (though not absolutely > dispositive) case for simply dropping them all. The Austron driver uses Loran. It was unplugged in the US several years ago. I think it's still used in Northern Europe. It may come back in the US as a backup for GPS. Is this a good time to setup a procedure for second class refclocks? Or think about how to do it? I haven't given this a lot of thought. The idea is to make it easy to add drivers for hardware we don't support directly. I think there are two things we would have to do. One is keep track of names. The other is to setup and document a recipe for adding a driver. Handwave. My straw man is that the keep-track of numbers part means that we maintain everything but the code and documentation for a driver. I think that's just a table entry in pylib/refclock.py and another in ntpd/refclock_conf.c It would be nice to teach waf to make man/web pages for optional drivers. It's possible that some git magic would simplify most of that, but I don't know how to do it. Maybe we just maintain a comment that git can replace. ?? If you use dumbclock as an example, I'll adopt the IRIG driver as a sanity check. How many of the current drivers can we test? Maybe we should move all the others to second class status. We need a web page with a status slot for each driver. We should pick one driver to use as an example. Simpler is better. If you do dump drivers that have survived this far, I think it would be better to move them to another git repository where they are still visible but don't clutter up our main world. Plan B is to not waste time on any of that and save our energy for the great SHM cleanup, or whatever. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Removing the worst cruft
g...@rellim.com said: > Several commercial NTP products do it, we wantt them to convert from NP > Classic to NTPsec. Are they sending IRIG or listening to it? -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Removing the worst cruft
e...@thyrsus.com said: > No, you were right the first time - and it's something I should have > noticed. That driver is designed for an obsolete class of sound card. I don't know much about audio. What is the right API to use? All we need is a batch of samples and the time they arrived. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Removing the worst cruft
fallenpega...@gmail.com said: > What I am wishing for, would be for someone to write a standalone in its own > demon process IRIG driver, that then speaks GPSD or SHM to NTPsec. But > testing such a beast would be specialized task. I think things are much more complicated than they seem. That doesn't actually solve any problems, just pushes them over where they aren't visible unless you go looking for them. It requires a public API and all the version support/coordination problems. We should be trying to make life simpler for sysadmins, not more complicated. That isn't to say that cleaning up the refclock interface isn't a good idea, just that "for someone to write..." is probably going to cause more trouble than it solves. This area needs some serious thought before we start writing code.
Re: Removing the worst cruft
e...@thyrsus.com said: > According to Wikipedia LORAN is dead. The principal station chains shut down > in 1979-1980. Last live use was in China in the 1990s. > What you are probably thinking of is DECCA, which was a hyperbolic radio > navigation system (very similar operating principle to LORAN but better > accuracy) deployed out of Great Britain with several station groups > elsewhere in Northern Europe. It shut down in 2000. A Japanese station > group continued operation until 2001. I think DECCA is/was way way old. I was thinking of eLoran. Looks like Britain pulled the plug on their work at the end of last year after France and Germany (and Norway?) bailed.
Possible cleanup
There is a SAVE_ERRNO macro that wraps around some code to preserve errno. It's only used in a few places. The first place I saw was calling msyslog. That would make sense if following code did something that depended on the error, but I checked all 4 cases and they never looked at errno. They are all in ntp_refclock. Looks like we can rip it out. But that's not the sort of code that's easy to test so we should be careful. SAVE_ERRNO uses a socket_errno() macro to get errno. That looks like it's a hook to work with windows. The answer gets put back into errno. Ahhh. the SAVE_ERRNO macro isn't being used to save errno but rather to copy the windows error over to errno where %m in msyslog can get it. I wonder if we can push that into msyslog. Looks like it's already there. So either we can rip it out, or we have to fix all the other places that use %m. (Or I haven't analyzed things correctly.) -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Removing the worst cruft
e...@thyrsus.com said: > Drivers that very well might fail the ten-year test: truetime, magnavox, > palisade, oncore, jupiter. Palisade is in use. It covers Trimble TSIP which includes the Thunderbolt which was widely available surplus only a few years ago and is popular with time-nuts. It should probably be renamed to Trimble. There are several sub-drivers/modes to cover different Trimble models which use different subsets of the full TSIP protocol. There is also some TSIP code in the generic/parse driver. The oncore driver covers Motorola which is/was very common. They put out a long sequence of chips. Many of them are way old, but probably don't add much to the driver. The M12/M12+ was popular and state of the art only a few years ago. Motorola sold off their GPS business. I forget who bought it. Somebody in Asia. It may have been sold again. I think the M12+ is still in production but the name may have changed and/or they may have updated it again. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: Removing the worst cruft
> Can the palisade/trimble driver be replaced with a parse driver? I doubt it, but I'm far from familiar with the parse driver. Based on Eric's previous comments, the parse driver handles devices that provide the time in an easy to parse format. TSIP might fit that if all goes well. But there are many variations of TSIP. One covers reversing the normal PPS operation. Instead of needing kernel support to time stamp the PPS pulse, you send it a pulse by flapping one of the modem control signals and it tells you the time that happened. My vote would be to not rock that boat. There are more important things to work on. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Kernel PLL graphs
There are two parts to PPS processing in the kernel. RFC 2783 describes an API for capturing time stamps. RFC 1589 describes a PLL that lives in the kernel. Most Linux distros don't support RFC 1589. The code is in the kernel, but it doesn't work with the shipped kernels. It requires !NO_HZ, but most distros prefer NO_HZ. I pulled over the sources and built my own kernel. Here are the before and after graphs: http://users.megapathdsl.net/~hmurray/ntpsec/PPS-kernel.png The data is from two separate days so this isn't a clean comparison. I don't know what that machine was doing on either day. Here is a zoom in on the Kernel PLL day. http://users.megapathdsl.net/~hmurray/ntpsec/PPS-kernel2.png Note that the peak offset is less than a microsecond. We should see if we can get similar results on a Raspberry Pi. I haven't tried building an ARM kernel. I think we should be able to run the PLL code outside the kernel. The PPS time stamp is key. The PLL calculations don't need to be run in the kernel. They need to be run soon after the PPS, but not interrupt level immediately. The API has an option to wakeup on PPS. I don't know if it is implemented on Linux. The no-PLL test was run at the default maxpoll of 6. I should try faster. I also need a standard test load. I remember various FreeBSD-is-better type comments from many years ago. I don't know if the PLL was working in Linux at the time. I should setup a test case. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
Re: drift
g...@rellim.com said: > 1. On startup chronyd checks the time stamp on the drift file. > if the timestamp > sysclock, the sysclock is set to the timestamp I vote that we don't do anything, not even make it optional behind a command line switch. We have more important things to do. The OS should be doing that sort of thing, probably using the root directory. Why stop with the drift file? Should we check the log files too? It's the sort of code that is hard to test and likely to have subtle problems. I think it's a good item to put on the what-do-customers-want list. > 2. ntpd stores the frequency ppm offset in the driftfile. > chronyd stores the frequency ppm offset and the 'skew' > (estimated accuracy of the existing frequency value). > I can see that saving the 'skew' is a nice touch, but I suspect much the > good chronyd startup behavior is explained elsewhere. I'm not sure that ntpd has a parameter equivalent to skew. Again, I vote that we don't do anything now. The current startup stuff is broken. There is no point in working on things like this until we understand and fix the current problems. g...@rellim.com said: > In a related topic, it would be nice (maybe an option) for ntpd to hold off > logging the initial aweful data until after the -g option has set the system > clock. And a bit longer, so the wonky startup data is masked. But that is when you really really want the logging. I might agree to put it someplace other than the normal place. -- These are my opinions. I hate spam. ___ devel mailing list devel@ntpsec.org http://lists.ntpsec.org/mailman/listinfo/devel
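For reference, the check in item 1 boils down to something like the sketch below. This only models the behavior as described in the quoted mail; it is not chronyd's code, the function name is made up, and it only reports the value rather than touching the clock.

```python
import os
import time


def clock_floor_from_file(path, now=None):
    """Return the file's modification time if it is later than `now`.

    Returns None when the system clock is already past the timestamp,
    meaning no correction would be suggested.
    """
    if now is None:
        now = time.time()
    mtime = os.stat(path).st_mtime
    return mtime if mtime > now else None
```

Even this small version shows the testing problem: the interesting case only happens when the system clock is wrong, so a test has to fake `now`.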
Re: Kernel PLL graphs
matthew.sel...@twosigma.com said: > I'm using maxpoll of 1 on my stratum 1 servers. And I have !NO_HZ set. My > offsets stay belong 1 microsecond as reported by ntpq. If we switched the > units to nanoseconds, that might be interesting. Time to make sure I've got the right number of negatives... "I have !NO_HZ set" means you have unset NO_HZ which probably means you had to build your own kernel. Do you have flag3 turned on? If so, the kernel does all the work and maxpoll is essentially ignored. I thought there was a minimum on maxpoll so I'm a bit surprised you could set it to 1. > I don't have !NO_HZ set on my stratum 2 servers, but I'm looking at the > ramifications of that. At least for the effect I'm discussing, it only matters if you have a PPS. > I'm curious what your results are.