On 11/08/2021 16:19, Andre Heider wrote:
Hi there,

it seems I can trigger this semi-reliably while powering up 2 KVM windows instances in parallel.

This is the tail of log-queries:
dnsmasq[6591]: query[A] foo.internal from 192.168.0.foo
dnsmasq[6591]: DHCP foo.internal is 192.168.0.foo
dnsmasq[6591]: query[AAAA] foo.internal from 192.168.0.foo
dnsmasq[6591]: config foo.internal is NODATA-IPv6
dnsmasq[6591]: query[A] ipv4only.arpa from 192.168.0.foo

And this is where dnsmasq stop replying altogether and hogs the cpu.
Note that foo.internal is the kvm host with the ip 192.168.0.foo.

Using gdbserver doesn't yield too much info, looks like because of lto maybe (which OpenWrt does for dnsmasq)?
(gdb) info threads
   Id   Target Id                  Frame
* 1    Thread 6591.6591 "dnsmasq" 0x555715f1 in forward_query.lto_priv ()
(gdb) bt full
#0  0x555715f1 in forward_query.lto_priv ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Without LTO, another hang with the last log-queries 'dnsmasq[6842]: query[A] ipv4only.arpa from 192.168.0.foo' again:

#0 0x55568969 in forward_query (udpfd=<optimized out>, udpaddr=0x7fffbab8, dst_addr=0x7fffbaa8, dst_iface=15, header=0x5561c080, plen=31, limit=0x5561c280 "", now=958007, forward=0x5563e8e0, ad_reqd=0, do_bit=0) at forward.c:319
        is_sign = 1432469663
        pheader = 0x0
        flags = 0
        fwd_flags = <optimized out>
        is_dnssec = 0
        master = <optimized out>
        gotname = 128
        hash = 0x55624a10
        oph = 0x0
        old_src = <optimized out>
        first = 1432469663
        last = 12
        start = 0
        subnet = 0
        cacheable = 1431977952
        forwarded = 0
        edns0_len = 1431725623
        pheader = 0x80088 <error: Cannot access memory at address 0x80088>
        ede = -1
#1 0x5556947d in receive_query (listen=0x5561f5c0, now=958007) at forward.c:1640
        ad_reqd = 0
        header = 0x5561c080
source_addr = {sa = {sa_family = 2, sa_data = "\306d\300\250\000(\000\000\000\000\000\000\000"}, in = {sin_family = 2, sin_port = 50788, sin_addr = {s_addr = 3232235560}, sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port = 50788, sin6_flowinfo = 3232235560, sin6_addr = { __in6_union = {__s6_addr = "\000\000\000\000\000\000\000\000\000\000\001Ww\347", <incomplete sequence \360>, __s6_addr16 = {0, 0, 0, 0, 0, 343, 30695, 3824}, __s6_addr32 = {0, 0, 343, 2011631344}}}, sin6_scope_id = 8162}}
        pheader = 0x7fffbb68 "UZ?\340w\373+x"
        type = 1
        udp_size = <optimized out>
dst_addr = {addr4 = {s_addr = 3232235521}, addr6 = {__in6_union = {__s6_addr = "\300\250\000\001a\023\340\345\000\004\024\231w\354\066\240", __s6_addr16 = {49320, 1, 24851, 57573, 4, 5273, 30700, 13984}, __s6_addr32 = {3232235521, 1628692709, 267417, 2011969184}}}, cname = {target = {cache = 0xc0a80001, name = 0xc0a80001 <error: Cannot access memory at address 0xc0a80001>}, uid = 1628692709, is_name_ptr = 267417}, key = {keydata = 0xc0a80001, keylen = 24851, flags = 57573, keytag = 4, algo = 20 '\024'}, ds = {keydata = 0xc0a80001, keylen = 24851, keytag = 57573, algo = 0 '\000', digest = 4 '\004'}, srv = {target = 0xc0a80001, targetlen = 24851, srvport = 57573, priority = 4, weight = 5273}, log = {keytag = 49320, algo = 1, digest = 24851, rcode = 57573, ede = 267417}}
        netmask = {s_addr = <optimized out>}
        dst_addr_4 = {s_addr = 3232235521}
        m = 0
        n = 31
        if_index = 15
        auth_dns = 0
        do_bit = 0
        have_pseudoheader = 0
        mark = 0
        have_mark = 0
        is_single_query = <optimized out>
        allowed = <optimized out>
        local_auth = <optimized out>
        iov = {{iov_base = 0x5561c080, iov_len = 1280}}
msg = {msg_name = 0x7fffbab8, msg_namelen = 16, msg_iov = 0x7fffbaa0, msg_iovlen = 1, msg_control = 0x7fffbad4, msg_controllen = 24, msg_flags = 0}
        cmptr = <optimized out>
control_u = {align = {cmsg_len = 24, cmsg_level = 0, cmsg_type = 8}, control6 = "\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001\000\000\000\030\000\000\000", control = "\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001"}
        family = <optimized out>
        check_dst = <optimized out>
#2 0x5556c853 in check_dns_listeners (now=<optimized out>) at dnsmasq.c:1802
        serverfdp = <optimized out>
        listener = 0x5561f5c0
        rfl = <optimized out>
        i = <optimized out>
        pipefd = {1431977952, 2012949368}
#3 0x555570f9 in main (argc=<optimized out>, argv=<optimized out>) at dnsmasq.c:1229
        timeout = <optimized out>
        bind_fallback = 0
        now = 958007
sigact = {__sa_handler = {sa_handler = 0x1, sa_sigaction = 0x1}, sa_mask = {__bits = {0, 0, 0, 0, 0, 2013266592, 17788, 2013238320, 2013238324, 2013245196, 4, 0, 80, 0, 2012700248, 2012708864, 2013167616, 2013233152, 2013237244, 0, 691091872, 2013233152, 4, 2147466580, 1431657089, 2013091140, 2013086164, 2013165476, 0, 0, 2013095836, 15630}}, sa_flags = 0, sa_restorer = 0x22358}
        if_tmp = <optimized out>
        piperead = 24
        pipefd = {24, 25}
        err_pipe = {0, -1}
        ent_pw = <optimized out>
        script_uid = 0
        script_gid = 0
        gp = <optimized out>
        i = <optimized out>
        max_fd = 1024
        baduser = <optimized out>
        log_err = 0
        chown_warn = 0
        hdr = 0x55634fa0
        data = <optimized out>
        need_cap_net_admin = <optimized out>
        need_cap_net_raw = <optimized out>
        need_cap_net_bind_service = <optimized out>
        bound_device = 0x0
        did_bind = 0
        serv = <optimized out>
        netlink_warn = 0x0

Continuing and breaking again about 30 times mostly yields line 319 as above (which is 'difftime(now, master->forwardtime) > FORWARD_TIME ||')

A few times it hits 341 though ('while (forward->blocking_query)'):

#0 0x5556896f in forward_query (udpfd=<optimized out>, udpaddr=0x7fffbab8, dst_addr=0x7fffbaa8, dst_iface=15, header=0x5561c080, plen=31, limit=0x5561c280 "", now=958007, forward=0x5563e8e0, ad_reqd=0, do_bit=0) at forward.c:341
        is_sign = 1432469663
        pheader = 0x0
        flags = 0
        fwd_flags = <optimized out>
        is_dnssec = 0
        master = <optimized out>
        gotname = 128
        hash = 0x55624a10
        oph = 0x0
        old_src = <optimized out>
        first = 1432469663
        last = 12
        start = 0
        subnet = 0
        cacheable = 1431977952
        forwarded = 0
        edns0_len = 1431725623
        pheader = 0x80088 <error: Cannot access memory at address 0x80088>
        ede = -1
#1 0x5556947d in receive_query (listen=0x5561f5c0, now=958007) at forward.c:1640
        ad_reqd = 0
        header = 0x5561c080
source_addr = {sa = {sa_family = 2, sa_data = "\306d\300\250\000(\000\000\000\000\000\000\000"}, in = {sin_family = 2, sin_port = 50788, sin_addr = {s_addr = 3232235560}, sin_zero = "\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port = 50788, sin6_flowinfo = 3232235560, sin6_addr = { __in6_union = {__s6_addr = "\000\000\000\000\000\000\000\000\000\000\001Ww\347", <incomplete sequence \360>, __s6_addr16 = {0, 0, 0, 0, 0, 343, 30695, 3824}, __s6_addr32 = {0, 0, 343, 2011631344}}}, sin6_scope_id = 8162}}
        pheader = 0x7fffbb68 "UZ?\340w\373+x"
        type = 1
        udp_size = <optimized out>
dst_addr = {addr4 = {s_addr = 3232235521}, addr6 = {__in6_union = {__s6_addr = "\300\250\000\001a\023\340\345\000\004\024\231w\354\066\240", __s6_addr16 = {49320, 1, 24851, 57573, 4, 5273, 30700, 13984}, __s6_addr32 = {3232235521, 1628692709, 267417, 2011969184}}}, cname = {target = {cache = 0xc0a80001, name = 0xc0a80001 <error: Cannot access memory at address 0xc0a80001>}, uid = 1628692709, is_name_ptr = 267417}, key = {keydata = 0xc0a80001, keylen = 24851, flags = 57573, keytag = 4, algo = 20 '\024'}, ds = {keydata = 0xc0a80001, keylen = 24851, keytag = 57573, algo = 0 '\000', digest = 4 '\004'}, srv = {target = 0xc0a80001, targetlen = 24851, srvport = 57573, priority = 4, weight = 5273}, log = {keytag = 49320, algo = 1, digest = 24851, rcode = 57573, ede = 267417}}
        netmask = {s_addr = <optimized out>}
        dst_addr_4 = {s_addr = 3232235521}
        m = 0
        n = 31
        if_index = 15
        auth_dns = 0
        do_bit = 0
        have_pseudoheader = 0
        mark = 0
        have_mark = 0
        is_single_query = <optimized out>
        allowed = <optimized out>
        local_auth = <optimized out>
        iov = {{iov_base = 0x5561c080, iov_len = 1280}}
msg = {msg_name = 0x7fffbab8, msg_namelen = 16, msg_iov = 0x7fffbaa0, msg_iovlen = 1, msg_control = 0x7fffbad4, msg_controllen = 24, msg_flags = 0}
        cmptr = <optimized out>
control_u = {align = {cmsg_len = 24, cmsg_level = 0, cmsg_type = 8}, control6 = "\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001\000\000\000\030\000\000\000", control = "\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001"}
        family = <optimized out>
        check_dst = <optimized out>
#2 0x5556c853 in check_dns_listeners (now=<optimized out>) at dnsmasq.c:1802
        serverfdp = <optimized out>
        listener = 0x5561f5c0
        rfl = <optimized out>
        i = <optimized out>
        pipefd = {1431977952, 2012949368}


Does that help, maybe?

Regards,
Andre

On 11/08/2021 08:36, Andre Heider wrote:
On 10/08/2021 23:11, Simon Kelley wrote:
On 10/08/2021 14:53, Dominik wrote:
Hey Simon,

various dnsmasq-2.86 test tags are around and it doesn't look like
there are any intermediate bugs around. The Pi-hole beta seems to have
attracted at least a few couple of dozen additional testers and nothing
seems to have come up here, either.

How do you feel about tagging a v2.86 release?

Best regards,
Dominik



I've been accumulating patches for a couple of weeks, and started
working through those tonight. None are very intrusive or controversial.
Once those are done, I'll tag rc1.

I'm using 2.86test6 on OpenWrt, and I think I've found a bug. Detail's are vague so far but ever since I've started DoT with stubby as upstream server, dnsmasq every now and then gets into a mode where it stops responding to request completely and just sits there using 100% cpu power on one core. I haven't found a way yet to trigger that reliably, but it feels like timing/parallel requests.

Does that ring any bells?

Any hints on how to get more relevant info on the situation? Sometimes it takes over a day, so verbose logs seem to be out of the question (since it's a small router). How to I get a backtrace without gdb?

config created by OpenWrt:
conf-file=/etc/dnsmasq.conf
dhcp-authoritative
domain-needed
no-resolv
localise-queries
enable-ubus=dnsmasq
expand-hosts
bind-dynamic
local-service
quiet-dhcp
edns-packet-max=1280
domain=internal
local=/internal/
server=127.0.0.1#5453
addn-hosts=/tmp/hosts
dhcp-leasefile=/tmp/dhcp.leases
stop-dns-rebind
conf-file=/usr/share/dnsmasq/trust-anchors.conf
dnssec
dhcp-broadcast=tag:needs-broadcast
conf-dir=/tmp/dnsmasq.d
user=dnsmasq
group=dnsmasq
dhcp-ignore-names=tag:dhcp_bogus_hostname
conf-file=/usr/share/dnsmasq/dhcpbogushostname.conf
bogus-priv
conf-file=/usr/share/dnsmasq/rfc6761.conf
no-dhcp-interface=pppoe-wan
dhcp-range=set:lan,192.168.0.200,192.168.0.254,255.255.255.0,12h

stubby config created by OpenWrt:
resolution_type: GETDNS_RESOLUTION_STUB
round_robin_upstreams: 1
appdata_dir: "/var/lib/stubby"
trust_anchors_backoff_time: 2500
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 128
edns_client_subnet_private: 1
idle_timeout: 10000
listen_addresses:
   - 127.0.0.1@5453
dns_transport_list:
   - GETDNS_TRANSPORT_TLS
upstream_recursive_servers:
   - address_data: 5.1.66.255
     tls_auth_name: "dot.ffmuc.net"
   - address_data: 185.150.99.255
     tls_auth_name: "dot.ffmuc.net"
   - address_data: 185.95.218.42
     tls_auth_name: "dns.digitale-gesellschaft.ch"
   - address_data: 185.95.218.43
     tls_auth_name: "dns.digitale-gesellschaft.ch"




_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

Reply via email to