On Tue, Oct 19, 2021 at 09:47:22PM +0200, Martijn van Duren wrote:

> On Tue, 2021-10-19 at 19:56 +0200, Otto Moerbeek wrote:
> > On Tue, Oct 19, 2021 at 07:49:15PM +0200, Mischa wrote:
> >
> > > On 2021-10-15 20:05, Otto Moerbeek wrote:
> > > > On Fri, Oct 15, 2021 at 07:47:22PM +0200, Mischa wrote:
> > > > > On 2021-10-15 19:42, Otto Moerbeek wrote:
> > > > > > On Fri, Oct 15, 2021 at 07:16:55PM +0200, Mischa wrote:
> > > > > >
> > > > > > > On 2021-10-15 18:27, Otto Moerbeek wrote:
> > > > > > > >
> > > > > > > > The actual problem (SIGSEGV) happens in the child
> > > > > > > > processes: ktrace the children as well: ktrace -di ...
> > > > > > > >
> > > > > > > > 	-Otto
> > > > > > >
> > > > > > > Thanx Otto.
> > > > > > > Below is the kdump with ktrace -di
> > > > > > > It's quite a lot of data but I didn't want to remove
> > > > > > > something that could potentially be useful.
> > > > > > >
> > > > > > > Mischa
> > > > > > >
> > > > > >
> > > > > > The pattern below happens multiple times:
> > > > > >
> > > > > > A recvfrom of 101 bytes and after that a SIGSEGV.
> > > > > >
> > > > > > Now we do not know for sure if those two lines are related.
> > > > > >
> > > > > > I suspect that it is no coincidence that the 101 is one
> > > > > > larger than 100...
> > > > > >
> > > > > > No other clue yet.
> > > > >
> > > > > Anything else I can collect.
> > > >
> > > > You might want to compile and install nsd with debug symbol info:
> > > >
> > > > cd /usr/src/usr.sbin/nsd
> > > > make -f Makefile.bsd-wrapper obj
> > > > make -f Makefile.bsd-wrapper clean
> > > > DEBUG=-g make -f Makefile.bsd-wrapper
> > > > make -f Makefile.bsd-wrapper install
> > > >
> > > >
> > > > Then: collect a gdb trace from a running process: install gdb
> > > > from ports, run
> > > > egdb --pid=pidofnsdchild /usr/sbin/nsd
> > > >
> > > > and wait for the crash.
> > > >
> > > > But I'm mostly unfamiliar with the nsd code and what has been
> > > > changed recently. I'd say make sure sthen@ and florian@ see
> > > > this: move to bugs@ as I do not know if they read misc@.
> > >
> > > Thanx Otto.
> > >
> > > As this is my first time using gdb, I need some assistance.
> > >
> > > root@name2:~ # ps -aux | grep nsd
> > > _nsd 79188 0.0 1.0 101704 86400 ?? Ip 7:31PM 0:00.20 nsd: xfrd (nsd)
> > > _nsd 24002 0.0 0.4 37188 37388 ?? Ip 7:31PM 0:00.29 nsd: main (nsd)
> > > _nsd 44937 0.0 0.2 37544 18308 ?? Sp 7:45PM 0:00.11 nsd: server 1 (nsd)
> > >
> > > root@name2:~ # egdb --pid=44937 /usr/sbin/nsd
> > > GNU gdb (GDB) 7.12.1
> > > Copyright (C) 2017 Free Software Foundation, Inc.
> > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > This is free software: you are free to change and redistribute it.
> > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > > and "show warranty" for details.
> > > This GDB was configured as "x86_64-unknown-openbsd7.0".
> > > Type "show configuration" for configuration details.
> > > For bug reporting instructions, please see:
> > > <http://www.gnu.org/software/gdb/bugs/>.
> > > Find the GDB manual and other documentation resources online at:
> > > <http://www.gnu.org/software/gdb/documentation/>.
> > > For help, type "help".
> > > Type "apropos word" to search for commands related to "word"...
> > > Reading symbols from /usr/sbin/nsd...(no debugging symbols found)...done.
> > > Attaching to program: /usr/sbin/nsd, process 44937
> > > Reading symbols from /usr/lib/libssl.so.50.0...done.
> > > Reading symbols from /usr/lib/libcrypto.so.47.0...done.
> > > Reading symbols from /usr/lib/libevent.so.4.1...done.
> > > Reading symbols from /usr/lib/libc.so.96.1...done.
> > > Reading symbols from /usr/libexec/ld.so...done.
> > > [Switching to thread 563101]
> > > kevent () at /tmp/-:3
> > > 3 /tmp/-: No such file or directory.
> > >
> > > Anything I am missing?
> > >
> > > Mischa
> > >
> >
> > Do you see a gdb prompt? If so
> >
> > continue
> >
> > should do it (and then wait for the crash).
> >
> > If you still see the crashes, a tcpdump of the traffic to nsd might
> > help as well; I can replay that locally against nsd. I would also
> > need your nsd config for that.
> >
> > 	-Otto
> >
> I did some debugging with Mischa.
>
> Unfortunately I misclicked and deleted the backtrace. However, the
> problem was that query.c calls add_rrset (query.c:736) from
> answer_delegation (query.c:917), where rrset is NULL.
>
> When looking at the original query, it was always a PTR request for
> an IPv6 record. When looking through the file we tried to remove
> some likely suspect entries to see if we could pinpoint the root
> cause, but after re-adding everything it wouldn't crash anymore.
>
> Adding a simple comment to the zonefile of the second NS server
> yielded the same result: the server won't crash anymore.
>
> Mischa is going to monitor the situation to see if the issues
> return, but my current best guess is that some weird state got
> cached somewhere somehow and got flushed when saving the
> zonefile.
>
> martijn@
>

Maybe some form of corruption in the zonefile that was removed when
saving? Who knows...

Anyway, thanks for taking care.

	-Otto
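
For anyone reading this thread later: the failure Martijn describes is a
NULL rrset pointer reaching a function that expects a valid one. Below is
a minimal sketch in C of that failure mode together with a defensive
check. Everything in it is an assumption made for illustration: struct
rrset, struct answer, MAX_RRSETS and add_rrset_checked are hypothetical
stand-ins, not nsd's actual types, code, or fix.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RRSETS 8	/* hypothetical limit, for the sketch only */

/* Hypothetical stand-ins; these are NOT nsd's real data structures. */
struct rrset {
	const char *owner;	/* owner name of the record set */
	int rr_count;		/* number of records in the set */
};

struct answer {
	const struct rrset *rrsets[MAX_RRSETS];
	size_t count;
};

/*
 * Append rrset to the answer.
 * Returns 1 on success, 0 if rrset is NULL or the answer is full.
 * Without the NULL check, a caller that failed to find a record set
 * would hand a NULL pointer to code that later dereferences it.
 */
int
add_rrset_checked(struct answer *ans, const struct rrset *rrset)
{
	assert(ans != NULL);
	if (rrset == NULL)
		return 0;
	if (ans->count >= MAX_RRSETS)
		return 0;
	ans->rrsets[ans->count++] = rrset;
	return 1;
}
```

A caller in this sketch would treat a 0 return as "nothing to add" and
carry on, rather than crash. Whether a guard like this is the right fix
for nsd is a separate question; it would only hide the reason the lookup
came back empty in the first place.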