The backtrace shows glibc's heap consistency check aborting inside free() 
(malloc_printerr called from _int_free), so this is most likely a double free 
or a corrupted pointer. If you can recompile dnsmasq, I would build it with 
AddressSanitizer; otherwise, run the dnsmasq binary under valgrind with 
--track-origins=yes. Installing debug symbols will make either report much 
easier to read.
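
Roughly, the two options might look like the sketch below. It assumes the 
dnsmasq Makefile accepts CFLAGS and LDFLAGS overrides on the command line; the 
function names and the log locations under /tmp are placeholders I made up, so 
adjust them to your setup.

```shell
# Sketch only; log locations and function names are placeholders.

# -O1 and -fno-omit-frame-pointer keep the ASan stack traces readable.
ASAN_FLAGS="-g -O1 -fsanitize=address -fno-omit-frame-pointer"

# Run from the top of the dnsmasq source tree.
# Assumes the Makefile accepts CFLAGS/LDFLAGS overrides.
build_asan() {
    make clean && make CFLAGS="$ASAN_FLAGS" LDFLAGS="-fsanitize=address"
}

# Usage: run_asan /path/to/asan-built/dnsmasq <usual dnsmasq options>
# log_path makes ASan write its report to /tmp/dnsmasq-asan.<pid>
# rather than stderr, which may be gone once dnsmasq daemonizes.
run_asan() {
    ASAN_OPTIONS="log_path=/tmp/dnsmasq-asan" "$@"
}

# Usage: run_valgrind /path/to/dnsmasq <usual dnsmasq options>
# --keep-in-foreground keeps dnsmasq from forking into the background.
run_valgrind() {
    bin="$1"; shift
    valgrind --track-origins=yes --log-file=/tmp/dnsmasq-valgrind.%p.log \
        "$bin" --keep-in-foreground "$@"
}
```

Once it is running, send SIGHUP the way the Neutron agent would and look at the 
first invalid or double free reported in the log; that entry should show both 
where the block was allocated and where it was freed the first time.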

-Dimitry

> On 13 Feb 2025, at 18:14, Clippinger, Sam via Dnsmasq-discuss 
> <dnsmasq-discuss@lists.thekelleys.org.uk> wrote:
> 
> Hello everyone,
>  
> I have found an issue in dnsmasq v2.90 that is causing problems in our 
> OpenStack environments.  When our Neutron agents rewrite the configs and send 
> a SIGHUP to trigger a reload, dnsmasq will (usually) crash with a SIGABRT 
> signal.  This only seems to happen in our busiest OpenStack regions where VMs 
> are coming and going constantly, causing dnsmasq to reload many times per 
> minute.  In other regions where there are no new VMs being created, the 
> reloads work fine with no crashes.
>  
> I investigated in a very busy region where I see dozens of crashes per 
> minute.  It is only using dnsmasq for DHCP, it is not receiving DNS queries.  
> This is a production environment, but I rebuilt dnsmasq with debug symbols 
> and managed to capture this with gdb when it crashes.  I tried it a few times 
> and the crash always has the same stack trace.
> ################################################################################
> Reading symbols from 
> /usr/lib/debug/usr/sbin/dnsmasq-2.90-1.el9.x86_64.debug...
> Attaching to program: 
> /usr/lib/debug/usr/sbin/dnsmasq-2.90-1.el9.x86_64.debug, process 3075598
> <snip loading messages>
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x00007ff1c8c62ac7 in poll () from target:/lib64/libc.so.6
> (gdb) c
> Continuing.
>  
> Program received signal SIGHUP, Hangup.
> 0x00007ff1c8c62ac7 in poll () from target:/lib64/libc.so.6
> (gdb) c
> Continuing.
>  
> Program received signal SIGABRT, Aborted.
> 0x00007ff1c8beca6c in __pthread_kill_implementation () from 
> target:/lib64/libc.so.6
> (gdb) where
> #0  0x00007ff1c8beca6c in __pthread_kill_implementation () from 
> target:/lib64/libc.so.6
> #1  0x00007ff1c8b9f686 in raise () from target:/lib64/libc.so.6
> #2  0x00007ff1c8b89833 in abort () from target:/lib64/libc.so.6
> #3  0x00007ff1c8b8a170 in __libc_message.cold () from target:/lib64/libc.so.6
> #4  0x00007ff1c8bf6b17 in malloc_printerr () from target:/lib64/libc.so.6
> #5  0x00007ff1c8bf8800 in _int_free () from target:/lib64/libc.so.6
> #6  0x00007ff1c8bfae55 in free () from target:/lib64/libc.so.6
> #7  0x000055f6521e0c18 in dhcp_netid_free (nid=0x7ff1c8bfae55 <free+85>) at 
> /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1333
> #8  dhcp_netid_list_free (netid=0x0) at 
> /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1363
> #9  dhcp_config_free (config=0x55f652b51a60) at 
> /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:1381
> #10 0x000055f652b51930 in ?? ()
> #11 0x000055f6529eb1f8 in ?? ()
> #12 0x0000000000000fa4 in ?? ()
> #13 0x000055f6529eaf60 in ?? ()
> #14 0x000055f6529eaf60 in ?? ()
> #15 0x000055f6521f5259 in clear_dynamic_conf () at 
> /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:5777
> #16 reread_dhcp () at 
> /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/option.c:5818
> #17 clear_cache_and_reload (now=94516438056960) at 
> /usr/src/debug/dnsmasq-2.90-1.el9.x86_64/src/dnsmasq.c:1742
> #18 0x4141414141414141 in ?? ()
> #19 0x0000000067ae1dbd in ?? ()
> #20 0x0000000000000000 in ?? ()
> (gdb)
> ################################################################################
>  
> The dnsmasq command line looks like this (lightly redacted):
> dnsmasq --no-hosts --no-resolv \
>     --pid-file=/var/lib/neutron/dhcp/xxx/pid \
>     --dhcp-hostsfile=/var/lib/neutron/dhcp/xxx/host \
>     --addn-hosts=/var/lib/neutron/dhcp/xxx/addn_hosts \
>     --dhcp-optsfile=/var/lib/neutron/dhcp/xxx/opts \
>     --dhcp-leasefile=/var/lib/neutron/dhcp/xxx/leases \
>     --dhcp-match=set:ipxe,175 \
>     --dhcp-userclass=set:ipxe6,iPXE \
>     --local-service \
>     --bind-dynamic \
>     --dhcp-range=set:subnet-yyy,10.1.1.0,static,255.255.248.0,86400s \
>     --dhcp-range=set:subnet-zzz,10.2.1.0,static,255.255.252.0,86400s \
>     --dhcp-option-force=option:mtu,1500 \
>     --dhcp-lease-max=3072 \
>     --conf-file=/etc/neutron/dnsmasq-neutron.conf
>  
> The /etc/neutron/dnsmasq-neutron.conf file only sets these options (lightly 
> redacted):
> dhcp-boot=smsboot\pxelinux.com,boothost,10.0.1.2
> dhcp-option=option:ntp-server,10.0.0.1,10.0.1.1,10.0.2.1
>  
> The /var/lib/neutron/dhcp/xxx/host file contains between 800 and 3000 
> entries, depending on the time of day.  They each look something like this 
> (lightly redacted):
> fa:16:3e:3b:ad:b9,set:16a8f84b90f640f7a2c9a133d844985e,host-10-1-2-3,10.1.2.3
>  
> The /var/lib/neutron/dhcp/xxx/addn_hosts file contains between 800 and 3000 
> entries, depending on the time of day.  They each look something like this 
> (lightly redacted):
> 10.1.9.9     np0006812233.subdomain.subdomain.mycorp.com. np0006812233
>  
> The /var/lib/neutron/dhcp/xxx/opts file contains about 190 entries.  The top 
> of the file looks like this; the rest of the entries are just like the last 
> two lines, defining more domain-name and domain-search values for additional 
> subdomains (lightly redacted):
> tag:subnet-xxx,option:dns-server,10.0.0.10,10.0.0.11
> tag:subnet-xxx,option:classless-static-route,10.1.1.0/22,0.0.0.0,169.254.169.254/32,10.2.1.30,0.0.0.0/0,10.2.1.1
> tag:subnet-xxx,249,10.1.1.0/22,0.0.0.0,169.254.169.254/32,10.2.1.30,0.0.0.0/0,10.2.1.1
> tag:subnet-xxx,option:router,10.2.1.1
> tag:subnet-yyy,option:dns-server,0.0.10,10.0.0.11
> tag:subnet-yyy,option:classless-static-route,10.2.1.0/21,0.0.0.0,169.254.169.254/32,10.1.1.30,0.0.0.0/0,10.1.1.1
> tag:subnet-yyy,249,10.2.1.0/21,0.0.0.0,169.254.169.254/32,10.1.1.30,0.0.0.0/0,10.1.1.1
> tag:subnet-yyy,option:router,10.1.1.1
> tag:16a8f84b90f640f7a2c9a133d844985e,option:domain-name,subdomain.subdomain.mycorp.com
> tag:16a8f84b90f640f7a2c9a133d844985e,option:domain-search,subdomain.subdomain.mycorp.com,subdomain.mycorp.com,mycorp.com
>  
> The /var/lib/neutron/dhcp/xxx/leases file contains between 800 and 3000 
> entries, depending on the time of day.  They each look something like this 
> (lightly redacted):
> 1739552375 fa:16:3e:3b:ad:b9 10.1.2.3 np0006812233 *
>  
> What can I do to help troubleshoot this?  I know C but I’m not familiar with 
> the dnsmasq code.  Thanks in advance!
>  
> -- Sam Clippinger 
>  
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
