It looks like glibc's heap consistency checks are aborting on a bad free() call (malloc_printerr called from _int_free in your trace), so this is probably a double free or a corrupted pointer. If you can recompile dnsmasq, I would try building it with AddressSanitizer; otherwise, run the dnsmasq binary under valgrind with --track-origins=yes. Installing debug symbols will help in either case.
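
A rough sketch of both approaches as shell helpers, in case it saves some typing. The exact flag set, the function names, and the default binary and log paths are my assumptions rather than anything dnsmasq-specific; it relies on the dnsmasq Makefile accepting CFLAGS and LDFLAGS overrides, and it uses --keep-in-foreground so that the reports are not lost when dnsmasq daemonizes.

```shell
# Flags for an AddressSanitizer build. -O1 and frame pointers keep the
# reported stack traces readable; adjust to taste.
ASAN_CFLAGS="-g -O1 -fsanitize=address -fno-omit-frame-pointer"
ASAN_LDFLAGS="-fsanitize=address"

# Rebuild dnsmasq with ASan. Run this from the top of the dnsmasq source tree.
build_asan() {
    make clean && make CFLAGS="$ASAN_CFLAGS" LDFLAGS="$ASAN_LDFLAGS"
}

# Run an existing (non-ASan) binary under valgrind. Pass the same dnsmasq
# arguments you use in production. DNSMASQ_BIN and VALGRIND_LOG are
# placeholder variables; %p in the log name expands to the process ID.
run_valgrind() {
    valgrind --track-origins=yes --num-callers=30 \
        --log-file="${VALGRIND_LOG:-/tmp/dnsmasq-valgrind.%p.log}" \
        "${DNSMASQ_BIN:-/usr/sbin/dnsmasq}" --keep-in-foreground "$@"
}
```

After sourcing this, `build_asan` produces an instrumented src/dnsmasq, and something like `run_valgrind --no-hosts --no-resolv ...` starts the stock binary under valgrind. Either way, trigger the reloads as usual and look for the first invalid free or invalid write reported before the abort.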
-Dimitry

> On 13 Feb 2025, at 18:14, Clippinger, Sam via Dnsmasq-discuss
> <dnsmasq-discuss@lists.thekelleys.org.uk> wrote:
>
> Hello everyone,
>
> I have found an issue in dnsmasq v2.90 that is causing problems in our
> Openstack environments. When our Neutron agents rewrite the configs and send
> a SIGHUP to trigger a reload, dnsmasq will (usually) crash with a SIGABRT
> signal. This only seems to happen in our busiest Openstack regions where VMs
> are coming and going constantly, causing dnsmasq to reload many times per
> minute. In other regions where there are no new VMs being created, the
> reloads work fine with no crashes.
>
> I investigated in a very busy region where I see dozens of crashes per
> minute. It is only using dnsmasq for DHCP, it is not receiving DNS queries.
> This is a production environment, but I rebuilt dnsmasq with debug symbols
> and managed to capture this with gdb when it crashes. I tried it a few times
> and the crash always has the same stack trace.
> ################################################################################
> Reading symbols from /usr/lib/debug/usr/sbin/dnsmasq-2.90-1.el9.x86_64.debug...
> Attaching to program: /usr/lib/debug/usr/sbin/dnsmasq-2.90-1.el9.x86_64.debug, process 3075598
> <snip loading messages>
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x00007ff1c8c62ac7 in poll () from target:/lib64/libc.so.6
> (gdb) c
> Continuing.
>
> Program received signal SIGHUP, Hangup.
> 0x00007ff1c8c62ac7 in poll () from target:/lib64/libc.so.6
> (gdb) c
> Continuing.
>
> Program received signal SIGABRT, Aborted.
> 0x00007ff1c8beca6c in __pthread_kill_implementation () from target:/lib64/libc.so.6
> (gdb) where
> #0 0x00007ff1c8beca6c in __pthread_kill_implementation () from target:/lib64/libc.so.6
> #1 0x00007ff1c8b9f686 in raise () from target:/lib64/libc.so.6
> #2 0x00007ff1c8b89833 in abort () from target:/lib64/libc.so.6
> #3 0x00007ff1c8b8a170 in __libc_message.cold () from target:/lib64/libc.so.6
> #4 0x00007ff1c8bf6b17 in malloc_printerr () from target:/lib64/libc.so.6
> #5 0x00007ff1c8bf8800 in _int_free () from target:/lib64/libc.so.6
> #6 0x00007ff1c8bfae55 in free () from target:/lib64/libc.so.6
> #7 0x000055f6521e0c18 in dhcp_netid_free (nid=0x7ff1c8bfae55 <free+85>) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1333
> #8 dhcp_netid_list_free (netid=0x0) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1363
> #9 dhcp_config_free (config=0x55f652b51a60) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1381
> #10 0x000055f652b51930 in ?? ()
> #11 0x000055f6529eb1f8 in ?? ()
> #12 0x0000000000000fa4 in ?? ()
> #13 0x000055f6529eaf60 in ?? ()
> #14 0x000055f6529eaf60 in ?? ()
> #15 0x000055f6521f5259 in clear_dynamic_conf () at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:5777
> #16 reread_dhcp () at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:5818
> #17 clear_cache_and_reload (now=94516438056960) at /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/dnsmasq.c:1742
> #18 0x4141414141414141 in ?? ()
> #19 0x0000000067ae1dbd in ?? ()
> #20 0x0000000000000000 in ?? ()
> (gdb)
> ################################################################################
>
> The dnsmasq command line looks like this (lightly redacted):
> dnsmasq --no-hosts --no-resolv \
> --pid-file=/var/lib/neutron/dhcp/xxx/pid \
> --dhcp-hostsfile=/var/lib/neutron/dhcp/xxx/host \
> --addn-hosts=/var/lib/neutron/dhcp/xxx/addn_hosts \
> --dhcp-optsfile=/var/lib/neutron/dhcp/xxx/opts \
> --dhcp-leasefile=/var/lib/neutron/dhcp/xxx/leases \
> --dhcp-match=set:ipxe,175 \
> --dhcp-userclass=set:ipxe6,iPXE \
> --local-service \
> --bind-dynamic \
> --dhcp-range=set:subnet-yyy,10.1.1.0,static,255.255.248.0,86400s \
> --dhcp-range=set:subnet-zzz,10.2.1.0,static,255.255.252.0,86400s \
> --dhcp-option-force=option:mtu,1500 \
> --dhcp-lease-max=3072 \
> --conf-file=/etc/neutron/dnsmasq-neutron.conf
>
> The /etc/neutron/dnsmasq-neutron.conf file only sets these options (lightly
> redacted):
> dhcp-boot=smsboot\pxelinux.com,boothost,10.0.1.2
> dhcp-option=option:ntp-server,10.0.0.1,10.0.1.1,10.0.2.1
>
> The /var/lib/neutron/dhcp/xxx/host file contains between 800-3000 entries,
> depending on the time of day. They each look something like this (lightly
> redacted):
> fa:16:3e:3b:ad:b9,set:16a8f84b90f640f7a2c9a133d844985e,host-10-1-2-3,10.1.2.3
>
> The /var/lib/neutron/dhcp/xxx/addn_hosts file contains between 800-3000
> entries, depending on the time of day. They each look something like this
> (lightly redacted):
> 10.1.9.9 np0006812233.subdomain.subdomain.mycorp.com. np0006812233
>
> The /var/lib/neutron/dhcp/xxx/opts file contains about 190 entries. The top
> of the file looks like this, the rest of the entries are just like the last
> two lines, defining more domain-name and domain-search values for additional
> subdomains (lightly redacted):
> tag:subnet-xxx,option:dns-server,10.0.0.10,10.0.0.11
> tag:subnet-xxx,option:classless-static-route,10.1.1.0/22,0.0.0.0,169.254.169.254/32,10.2.1.30,0.0.0.0/0,10.2.1.1
> tag:subnet-xxx,249,10.1.1.0/22,0.0.0.0,169.254.169.254/32,10.2.1.30,0.0.0.0/0,10.2.1.1
> tag:subnet-xxx,option:router,10.2.1.1
> tag:subnet-yyy,option:dns-server,0.0.10,10.0.0.11
> tag:subnet-yyy,option:classless-static-route,10.2.1.0/21,0.0.0.0,169.254.169.254/32,10.1.1.30,0.0.0.0/0,10.1.1.1
> tag:subnet-yyy,249,10.2.1.0/21,0.0.0.0,169.254.169.254/32,10.1.1.30,0.0.0.0/0,10.1.1.1
> tag:subnet-yyy,option:router,10.1.1.1
> tag:16a8f84b90f640f7a2c9a133d844985e,option:domain-name,subdomain.subdomain.mycorp.com
> tag:16a8f84b90f640f7a2c9a133d844985e,option:domain-search,subdomain.subdomain.mycorp.com,subdomain.mycorp.com,mycorp.com
>
> The /var/lib/neutron/xxx/leases file contains between 800-3000 entries,
> depending on the time of day. They each look something like this (lightly
> redacted):
> 1739552375 fa:16:3e:3b:ad:b9 10.1.2.3 np0006812233 *
>
> What can I do to help troubleshoot this? I know C but I’m not familiar with
> the dnsmasq code. Thanks in advance!
>
> -- Sam Clippinger
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss