On 12/08/2021 11:06, Simon Kelley wrote:
This is useful information, but the backtraces are puzzling: the code
isn't in tight loop, certainly.


I wonder if the v4only.arpa thing is not a coincidence?

Some things to try, please.

1) When the dnsmasq process is faulted, run

strace -p <pid of dnsmasq process>

Will do, so far I haven't been able to reproduce the issue today...

2) Try doing a query on ipv4only.arpa to dnsmasq directly, just in case
that's the whole trigger.

3) Same but straight to stubby

dig @127.0.0.1 -p 5453 ipv4only.arpa

Both work just fine, so maybe that was just a red herring.



4) Take a look in /usr/share/dnsmasq/rfc6761.conf

Is .arpa mentioned in there?

Nope, it's the default file as Kevin pasted


Cheers,

Simon.



On 11/08/2021 15:50, Andre Heider wrote:
On 11/08/2021 16:19, Andre Heider wrote:
Hi there,

it seems I can trigger this semi-reliably while powering up 2 KVM
windows instances in parallel.

This is the tail of log-queries:
dnsmasq[6591]: query[A] foo.internal from 192.168.0.foo
dnsmasq[6591]: DHCP foo.internal is 192.168.0.foo
dnsmasq[6591]: query[AAAA] foo.internal from 192.168.0.foo
dnsmasq[6591]: config foo.internal is NODATA-IPv6
dnsmasq[6591]: query[A] ipv4only.arpa from 192.168.0.foo

And this is where dnsmasq stop replying altogether and hogs the cpu.
Note that foo.internal is the kvm host with the ip 192.168.0.foo.

Using gdbserver doesn't yield too much info, looks like because of lto
maybe (which OpenWrt does for dnsmasq)?
(gdb) info threads
    Id   Target Id                  Frame
* 1    Thread 6591.6591 "dnsmasq" 0x555715f1 in forward_query.lto_priv ()
(gdb) bt full
#0  0x555715f1 in forward_query.lto_priv ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt
stack?)

Without LTO, another hang with the last log-queries 'dnsmasq[6842]:
query[A] ipv4only.arpa from 192.168.0.foo' again:

#0  0x55568969 in forward_query (udpfd=<optimized out>,
udpaddr=0x7fffbab8, dst_addr=0x7fffbaa8, dst_iface=15,
header=0x5561c080, plen=31, limit=0x5561c280 "", now=958007,
forward=0x5563e8e0, ad_reqd=0, do_bit=0) at forward.c:319
         is_sign = 1432469663
         pheader = 0x0
         flags = 0
         fwd_flags = <optimized out>
         is_dnssec = 0
         master = <optimized out>
         gotname = 128
         hash = 0x55624a10
         oph = 0x0
         old_src = <optimized out>
         first = 1432469663
         last = 12
         start = 0
         subnet = 0
         cacheable = 1431977952
         forwarded = 0
         edns0_len = 1431725623
         pheader = 0x80088 <error: Cannot access memory at address 0x80088>
         ede = -1
#1  0x5556947d in receive_query (listen=0x5561f5c0, now=958007) at
forward.c:1640
         ad_reqd = 0
         header = 0x5561c080
         source_addr = {sa = {sa_family = 2, sa_data =
"\306d\300\250\000(\000\000\000\000\000\000\000"}, in = {sin_family = 2,
sin_port = 50788, sin_addr = {s_addr = 3232235560}, sin_zero =
"\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port =
50788, sin6_flowinfo = 3232235560, sin6_addr = {
               __in6_union = {__s6_addr =
"\000\000\000\000\000\000\000\000\000\000\001Ww\347", <incomplete
sequence \360>, __s6_addr16 = {0, 0, 0, 0, 0, 343, 30695, 3824},
__s6_addr32 = {0, 0, 343, 2011631344}}}, sin6_scope_id = 8162}}
         pheader = 0x7fffbb68 "UZ?\340w\373+x"
         type = 1
         udp_size = <optimized out>
         dst_addr = {addr4 = {s_addr = 3232235521}, addr6 = {__in6_union
= {__s6_addr =
"\300\250\000\001a\023\340\345\000\004\024\231w\354\066\240",
__s6_addr16 = {49320, 1, 24851, 57573, 4, 5273, 30700, 13984},
__s6_addr32 = {3232235521, 1628692709, 267417, 2011969184}}}, cname =
{target = {cache = 0xc0a80001,
               name = 0xc0a80001 <error: Cannot access memory at address
0xc0a80001>}, uid = 1628692709, is_name_ptr = 267417}, key = {keydata =
0xc0a80001, keylen = 24851, flags = 57573, keytag = 4, algo = 20
'\024'}, ds = {keydata = 0xc0a80001, keylen = 24851, keytag = 57573,
algo = 0 '\000', digest = 4 '\004'},
           srv = {target = 0xc0a80001, targetlen = 24851, srvport =
57573, priority = 4, weight = 5273}, log = {keytag = 49320, algo = 1,
digest = 24851, rcode = 57573, ede = 267417}}
         netmask = {s_addr = <optimized out>}
         dst_addr_4 = {s_addr = 3232235521}
         m = 0
         n = 31
         if_index = 15
         auth_dns = 0
         do_bit = 0
         have_pseudoheader = 0
         mark = 0
         have_mark = 0
         is_single_query = <optimized out>
         allowed = <optimized out>
         local_auth = <optimized out>
         iov = {{iov_base = 0x5561c080, iov_len = 1280}}
         msg = {msg_name = 0x7fffbab8, msg_namelen = 16, msg_iov =
0x7fffbaa0, msg_iovlen = 1, msg_control = 0x7fffbad4, msg_controllen =
24, msg_flags = 0}
         cmptr = <optimized out>
         control_u = {align = {cmsg_len = 24, cmsg_level = 0, cmsg_type =
8}, control6 =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001\000\000\000\030\000\000\000",

           control =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001"}

         family = <optimized out>
         check_dst = <optimized out>
#2  0x5556c853 in check_dns_listeners (now=<optimized out>) at
dnsmasq.c:1802
         serverfdp = <optimized out>
         listener = 0x5561f5c0
         rfl = <optimized out>
         i = <optimized out>
         pipefd = {1431977952, 2012949368}
#3  0x555570f9 in main (argc=<optimized out>, argv=<optimized out>) at
dnsmasq.c:1229
         timeout = <optimized out>
         bind_fallback = 0
         now = 958007
         sigact = {__sa_handler = {sa_handler = 0x1, sa_sigaction = 0x1},
sa_mask = {__bits = {0, 0, 0, 0, 0, 2013266592, 17788, 2013238320,
2013238324, 2013245196, 4, 0, 80, 0, 2012700248, 2012708864, 2013167616,
2013233152, 2013237244, 0, 691091872, 2013233152, 4, 2147466580,
1431657089, 2013091140, 2013086164,
               2013165476, 0, 0, 2013095836, 15630}}, sa_flags = 0,
sa_restorer = 0x22358}
         if_tmp = <optimized out>
         piperead = 24
         pipefd = {24, 25}
         err_pipe = {0, -1}
         ent_pw = <optimized out>
         script_uid = 0
         script_gid = 0
         gp = <optimized out>
         i = <optimized out>
         max_fd = 1024
         baduser = <optimized out>
         log_err = 0
         chown_warn = 0
         hdr = 0x55634fa0
         data = <optimized out>
         need_cap_net_admin = <optimized out>
         need_cap_net_raw = <optimized out>
         need_cap_net_bind_service = <optimized out>
         bound_device = 0x0
         did_bind = 0
         serv = <optimized out>
         netlink_warn = 0x0

Continuing and breaking again about 30 times mostly yields line 319 as
above (which is 'difftime(now, master->forwardtime) > FORWARD_TIME ||')

A few times it hits 341 though ('while (forward->blocking_query)'):

#0  0x5556896f in forward_query (udpfd=<optimized out>,
udpaddr=0x7fffbab8, dst_addr=0x7fffbaa8, dst_iface=15,
header=0x5561c080, plen=31, limit=0x5561c280 "", now=958007,
forward=0x5563e8e0, ad_reqd=0, do_bit=0) at forward.c:341
         is_sign = 1432469663
         pheader = 0x0
         flags = 0
         fwd_flags = <optimized out>
         is_dnssec = 0
         master = <optimized out>
         gotname = 128
         hash = 0x55624a10
         oph = 0x0
         old_src = <optimized out>
         first = 1432469663
         last = 12
         start = 0
         subnet = 0
         cacheable = 1431977952
         forwarded = 0
         edns0_len = 1431725623
         pheader = 0x80088 <error: Cannot access memory at address 0x80088>
         ede = -1
#1  0x5556947d in receive_query (listen=0x5561f5c0, now=958007) at
forward.c:1640
         ad_reqd = 0
         header = 0x5561c080
         source_addr = {sa = {sa_family = 2, sa_data =
"\306d\300\250\000(\000\000\000\000\000\000\000"}, in = {sin_family = 2,
sin_port = 50788, sin_addr = {s_addr = 3232235560}, sin_zero =
"\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port =
50788, sin6_flowinfo = 3232235560, sin6_addr = {
               __in6_union = {__s6_addr =
"\000\000\000\000\000\000\000\000\000\000\001Ww\347", <incomplete
sequence \360>, __s6_addr16 = {0, 0, 0, 0, 0, 343, 30695, 3824},
__s6_addr32 = {0, 0, 343, 2011631344}}}, sin6_scope_id = 8162}}
         pheader = 0x7fffbb68 "UZ?\340w\373+x"
         type = 1
         udp_size = <optimized out>
         dst_addr = {addr4 = {s_addr = 3232235521}, addr6 = {__in6_union
= {__s6_addr =
"\300\250\000\001a\023\340\345\000\004\024\231w\354\066\240",
__s6_addr16 = {49320, 1, 24851, 57573, 4, 5273, 30700, 13984},
__s6_addr32 = {3232235521, 1628692709, 267417, 2011969184}}}, cname =
{target = {cache = 0xc0a80001,
               name = 0xc0a80001 <error: Cannot access memory at address
0xc0a80001>}, uid = 1628692709, is_name_ptr = 267417}, key = {keydata =
0xc0a80001, keylen = 24851, flags = 57573, keytag = 4, algo = 20
'\024'}, ds = {keydata = 0xc0a80001, keylen = 24851, keytag = 57573,
algo = 0 '\000', digest = 4 '\004'},
           srv = {target = 0xc0a80001, targetlen = 24851, srvport =
57573, priority = 4, weight = 5273}, log = {keytag = 49320, algo = 1,
digest = 24851, rcode = 57573, ede = 267417}}
         netmask = {s_addr = <optimized out>}
         dst_addr_4 = {s_addr = 3232235521}
         m = 0
         n = 31
         if_index = 15
         auth_dns = 0
         do_bit = 0
         have_pseudoheader = 0
         mark = 0
         have_mark = 0
         is_single_query = <optimized out>
         allowed = <optimized out>
         local_auth = <optimized out>
         iov = {{iov_base = 0x5561c080, iov_len = 1280}}
         msg = {msg_name = 0x7fffbab8, msg_namelen = 16, msg_iov =
0x7fffbaa0, msg_iovlen = 1, msg_control = 0x7fffbad4, msg_controllen =
24, msg_flags = 0}
         cmptr = <optimized out>
         control_u = {align = {cmsg_len = 24, cmsg_level = 0, cmsg_type =
8}, control6 =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001\000\000\000\030\000\000\000",

           control =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001"}

         family = <optimized out>
         check_dst = <optimized out>
#2  0x5556c853 in check_dns_listeners (now=<optimized out>) at
dnsmasq.c:1802
         serverfdp = <optimized out>
         listener = 0x5561f5c0
         rfl = <optimized out>
         i = <optimized out>
         pipefd = {1431977952, 2012949368}


Does that help, maybe?

Regards,
Andre

On 11/08/2021 08:36, Andre Heider wrote:
On 10/08/2021 23:11, Simon Kelley wrote:
On 10/08/2021 14:53, Dominik wrote:
Hey Simon,

various dnsmasq-2.86 test tags are around and it doesn't look like
there are any intermediate bugs around. The Pi-hole beta seems to have
attracted at least a few couple of dozen additional testers and
nothing
seems to have come up here, either.

How do you feel about tagging a v2.86 release?

Best regards,
Dominik



I've been accumulating patches for a couple of weeks, and started
working through those tonight. None are very intrusive or
controversial.
Once those are done, I'll tag rc1.

I'm using 2.86test6 on OpenWrt, and I think I've found a bug.
Detail's are vague so far but ever since I've started DoT with stubby
as upstream server, dnsmasq every now and then gets into a mode where
it stops responding to request completely and just sits there using
100% cpu power on one core.
I haven't found a way yet to trigger that reliably, but it feels like
timing/parallel requests.

Does that ring any bells?

Any hints on how to get more relevant info on the situation?
Sometimes it takes over a day, so verbose logs seem to be out of the
question (since it's a small router). How to I get a backtrace
without gdb?

config created by OpenWrt:
conf-file=/etc/dnsmasq.conf
dhcp-authoritative
domain-needed
no-resolv
localise-queries
enable-ubus=dnsmasq
expand-hosts
bind-dynamic
local-service
quiet-dhcp
edns-packet-max=1280
domain=internal
local=/internal/
server=127.0.0.1#5453
addn-hosts=/tmp/hosts
dhcp-leasefile=/tmp/dhcp.leases
stop-dns-rebind
conf-file=/usr/share/dnsmasq/trust-anchors.conf
dnssec
dhcp-broadcast=tag:needs-broadcast
conf-dir=/tmp/dnsmasq.d
user=dnsmasq
group=dnsmasq
dhcp-ignore-names=tag:dhcp_bogus_hostname
conf-file=/usr/share/dnsmasq/dhcpbogushostname.conf
bogus-priv
conf-file=/usr/share/dnsmasq/rfc6761.conf
no-dhcp-interface=pppoe-wan
dhcp-range=set:lan,192.168.0.200,192.168.0.254,255.255.255.0,12h

stubby config created by OpenWrt:
resolution_type: GETDNS_RESOLUTION_STUB
round_robin_upstreams: 1
appdata_dir: "/var/lib/stubby"
trust_anchors_backoff_time: 2500
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 128
edns_client_subnet_private: 1
idle_timeout: 10000
listen_addresses:
    - 127.0.0.1@5453
dns_transport_list:
    - GETDNS_TRANSPORT_TLS
upstream_recursive_servers:
    - address_data: 5.1.66.255
      tls_auth_name: "dot.ffmuc.net"
    - address_data: 185.150.99.255
      tls_auth_name: "dot.ffmuc.net"
    - address_data: 185.95.218.42
      tls_auth_name: "dns.digitale-gesellschaft.ch"
    - address_data: 185.95.218.43
      tls_auth_name: "dns.digitale-gesellschaft.ch"






_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

Reply via email to