On 11/08/2021 16:19, Andre Heider wrote:
Hi there,
it seems I can trigger this semi-reliably while powering up 2 KVM
windows instances in parallel.
This is the tail of log-queries:
dnsmasq[6591]: query[A] foo.internal from 192.168.0.foo
dnsmasq[6591]: DHCP foo.internal is 192.168.0.foo
dnsmasq[6591]: query[AAAA] foo.internal from 192.168.0.foo
dnsmasq[6591]: config foo.internal is NODATA-IPv6
dnsmasq[6591]: query[A] ipv4only.arpa from 192.168.0.foo
And this is where dnsmasq stop replying altogether and hogs the cpu.
Note that foo.internal is the kvm host with the ip 192.168.0.foo.
Using gdbserver doesn't yield too much info, looks like because of lto
maybe (which OpenWrt does for dnsmasq)?
(gdb) info threads
Id Target Id Frame
* 1 Thread 6591.6591 "dnsmasq" 0x555715f1 in forward_query.lto_priv ()
(gdb) bt full
#0 0x555715f1 in forward_query.lto_priv ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Without LTO, another hang with the last log-queries 'dnsmasq[6842]:
query[A] ipv4only.arpa from 192.168.0.foo' again:
#0 0x55568969 in forward_query (udpfd=<optimized out>,
udpaddr=0x7fffbab8, dst_addr=0x7fffbaa8, dst_iface=15,
header=0x5561c080, plen=31, limit=0x5561c280 "", now=958007,
forward=0x5563e8e0, ad_reqd=0, do_bit=0) at forward.c:319
is_sign = 1432469663
pheader = 0x0
flags = 0
fwd_flags = <optimized out>
is_dnssec = 0
master = <optimized out>
gotname = 128
hash = 0x55624a10
oph = 0x0
old_src = <optimized out>
first = 1432469663
last = 12
start = 0
subnet = 0
cacheable = 1431977952
forwarded = 0
edns0_len = 1431725623
pheader = 0x80088 <error: Cannot access memory at address 0x80088>
ede = -1
#1 0x5556947d in receive_query (listen=0x5561f5c0, now=958007) at
forward.c:1640
ad_reqd = 0
header = 0x5561c080
source_addr = {sa = {sa_family = 2, sa_data =
"\306d\300\250\000(\000\000\000\000\000\000\000"}, in = {sin_family = 2,
sin_port = 50788, sin_addr = {s_addr = 3232235560}, sin_zero =
"\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port =
50788, sin6_flowinfo = 3232235560, sin6_addr = {
__in6_union = {__s6_addr =
"\000\000\000\000\000\000\000\000\000\000\001Ww\347", <incomplete
sequence \360>, __s6_addr16 = {0, 0, 0, 0, 0, 343, 30695, 3824},
__s6_addr32 = {0, 0, 343, 2011631344}}}, sin6_scope_id = 8162}}
pheader = 0x7fffbb68 "UZ?\340w\373+x"
type = 1
udp_size = <optimized out>
dst_addr = {addr4 = {s_addr = 3232235521}, addr6 = {__in6_union
= {__s6_addr =
"\300\250\000\001a\023\340\345\000\004\024\231w\354\066\240",
__s6_addr16 = {49320, 1, 24851, 57573, 4, 5273, 30700, 13984},
__s6_addr32 = {3232235521, 1628692709, 267417, 2011969184}}}, cname =
{target = {cache = 0xc0a80001,
name = 0xc0a80001 <error: Cannot access memory at address
0xc0a80001>}, uid = 1628692709, is_name_ptr = 267417}, key = {keydata =
0xc0a80001, keylen = 24851, flags = 57573, keytag = 4, algo = 20
'\024'}, ds = {keydata = 0xc0a80001, keylen = 24851, keytag = 57573,
algo = 0 '\000', digest = 4 '\004'},
srv = {target = 0xc0a80001, targetlen = 24851, srvport =
57573, priority = 4, weight = 5273}, log = {keytag = 49320, algo = 1,
digest = 24851, rcode = 57573, ede = 267417}}
netmask = {s_addr = <optimized out>}
dst_addr_4 = {s_addr = 3232235521}
m = 0
n = 31
if_index = 15
auth_dns = 0
do_bit = 0
have_pseudoheader = 0
mark = 0
have_mark = 0
is_single_query = <optimized out>
allowed = <optimized out>
local_auth = <optimized out>
iov = {{iov_base = 0x5561c080, iov_len = 1280}}
msg = {msg_name = 0x7fffbab8, msg_namelen = 16, msg_iov =
0x7fffbaa0, msg_iovlen = 1, msg_control = 0x7fffbad4, msg_controllen =
24, msg_flags = 0}
cmptr = <optimized out>
control_u = {align = {cmsg_len = 24, cmsg_level = 0, cmsg_type
= 8}, control6 =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001\000\000\000\030\000\000\000",
control =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001"}
family = <optimized out>
check_dst = <optimized out>
#2 0x5556c853 in check_dns_listeners (now=<optimized out>) at
dnsmasq.c:1802
serverfdp = <optimized out>
listener = 0x5561f5c0
rfl = <optimized out>
i = <optimized out>
pipefd = {1431977952, 2012949368}
#3 0x555570f9 in main (argc=<optimized out>, argv=<optimized out>) at
dnsmasq.c:1229
timeout = <optimized out>
bind_fallback = 0
now = 958007
sigact = {__sa_handler = {sa_handler = 0x1, sa_sigaction =
0x1}, sa_mask = {__bits = {0, 0, 0, 0, 0, 2013266592, 17788, 2013238320,
2013238324, 2013245196, 4, 0, 80, 0, 2012700248, 2012708864, 2013167616,
2013233152, 2013237244, 0, 691091872, 2013233152, 4, 2147466580,
1431657089, 2013091140, 2013086164,
2013165476, 0, 0, 2013095836, 15630}}, sa_flags = 0,
sa_restorer = 0x22358}
if_tmp = <optimized out>
piperead = 24
pipefd = {24, 25}
err_pipe = {0, -1}
ent_pw = <optimized out>
script_uid = 0
script_gid = 0
gp = <optimized out>
i = <optimized out>
max_fd = 1024
baduser = <optimized out>
log_err = 0
chown_warn = 0
hdr = 0x55634fa0
data = <optimized out>
need_cap_net_admin = <optimized out>
need_cap_net_raw = <optimized out>
need_cap_net_bind_service = <optimized out>
bound_device = 0x0
did_bind = 0
serv = <optimized out>
netlink_warn = 0x0
Continuing and breaking again about 30 times mostly yields line 319 as
above (which is 'difftime(now, master->forwardtime) > FORWARD_TIME ||')
A few times it hits 341 though ('while (forward->blocking_query)'):
#0 0x5556896f in forward_query (udpfd=<optimized out>,
udpaddr=0x7fffbab8, dst_addr=0x7fffbaa8, dst_iface=15,
header=0x5561c080, plen=31, limit=0x5561c280 "", now=958007,
forward=0x5563e8e0, ad_reqd=0, do_bit=0) at forward.c:341
is_sign = 1432469663
pheader = 0x0
flags = 0
fwd_flags = <optimized out>
is_dnssec = 0
master = <optimized out>
gotname = 128
hash = 0x55624a10
oph = 0x0
old_src = <optimized out>
first = 1432469663
last = 12
start = 0
subnet = 0
cacheable = 1431977952
forwarded = 0
edns0_len = 1431725623
pheader = 0x80088 <error: Cannot access memory at address 0x80088>
ede = -1
#1 0x5556947d in receive_query (listen=0x5561f5c0, now=958007) at
forward.c:1640
ad_reqd = 0
header = 0x5561c080
source_addr = {sa = {sa_family = 2, sa_data =
"\306d\300\250\000(\000\000\000\000\000\000\000"}, in = {sin_family = 2,
sin_port = 50788, sin_addr = {s_addr = 3232235560}, sin_zero =
"\000\000\000\000\000\000\000"}, in6 = {sin6_family = 2, sin6_port =
50788, sin6_flowinfo = 3232235560, sin6_addr = {
__in6_union = {__s6_addr =
"\000\000\000\000\000\000\000\000\000\000\001Ww\347", <incomplete
sequence \360>, __s6_addr16 = {0, 0, 0, 0, 0, 343, 30695, 3824},
__s6_addr32 = {0, 0, 343, 2011631344}}}, sin6_scope_id = 8162}}
pheader = 0x7fffbb68 "UZ?\340w\373+x"
type = 1
udp_size = <optimized out>
dst_addr = {addr4 = {s_addr = 3232235521}, addr6 = {__in6_union
= {__s6_addr =
"\300\250\000\001a\023\340\345\000\004\024\231w\354\066\240",
__s6_addr16 = {49320, 1, 24851, 57573, 4, 5273, 30700, 13984},
__s6_addr32 = {3232235521, 1628692709, 267417, 2011969184}}}, cname =
{target = {cache = 0xc0a80001,
name = 0xc0a80001 <error: Cannot access memory at address
0xc0a80001>}, uid = 1628692709, is_name_ptr = 267417}, key = {keydata =
0xc0a80001, keylen = 24851, flags = 57573, keytag = 4, algo = 20
'\024'}, ds = {keydata = 0xc0a80001, keylen = 24851, keytag = 57573,
algo = 0 '\000', digest = 4 '\004'},
srv = {target = 0xc0a80001, targetlen = 24851, srvport =
57573, priority = 4, weight = 5273}, log = {keytag = 49320, algo = 1,
digest = 24851, rcode = 57573, ede = 267417}}
netmask = {s_addr = <optimized out>}
dst_addr_4 = {s_addr = 3232235521}
m = 0
n = 31
if_index = 15
auth_dns = 0
do_bit = 0
have_pseudoheader = 0
mark = 0
have_mark = 0
is_single_query = <optimized out>
allowed = <optimized out>
local_auth = <optimized out>
iov = {{iov_base = 0x5561c080, iov_len = 1280}}
msg = {msg_name = 0x7fffbab8, msg_namelen = 16, msg_iov =
0x7fffbaa0, msg_iovlen = 1, msg_control = 0x7fffbad4, msg_controllen =
24, msg_flags = 0}
cmptr = <optimized out>
control_u = {align = {cmsg_len = 24, cmsg_level = 0, cmsg_type
= 8}, control6 =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001\000\000\000\030\000\000\000",
control =
"\000\000\000\030\000\000\000\000\000\000\000\b\000\000\000\017\300\250\000\001\300\250\000\001"}
family = <optimized out>
check_dst = <optimized out>
#2 0x5556c853 in check_dns_listeners (now=<optimized out>) at
dnsmasq.c:1802
serverfdp = <optimized out>
listener = 0x5561f5c0
rfl = <optimized out>
i = <optimized out>
pipefd = {1431977952, 2012949368}
Does that help, maybe?
Regards,
Andre
On 11/08/2021 08:36, Andre Heider wrote:
On 10/08/2021 23:11, Simon Kelley wrote:
On 10/08/2021 14:53, Dominik wrote:
Hey Simon,
various dnsmasq-2.86 test tags are around and it doesn't look like
there are any intermediate bugs around. The Pi-hole beta seems to have
attracted at least a few couple of dozen additional testers and nothing
seems to have come up here, either.
How do you feel about tagging a v2.86 release?
Best regards,
Dominik
I've been accumulating patches for a couple of weeks, and started
working through those tonight. None are very intrusive or controversial.
Once those are done, I'll tag rc1.
I'm using 2.86test6 on OpenWrt, and I think I've found a bug. Detail's
are vague so far but ever since I've started DoT with stubby as
upstream server, dnsmasq every now and then gets into a mode where it
stops responding to request completely and just sits there using 100%
cpu power on one core.
I haven't found a way yet to trigger that reliably, but it feels like
timing/parallel requests.
Does that ring any bells?
Any hints on how to get more relevant info on the situation? Sometimes
it takes over a day, so verbose logs seem to be out of the question
(since it's a small router). How to I get a backtrace without gdb?
config created by OpenWrt:
conf-file=/etc/dnsmasq.conf
dhcp-authoritative
domain-needed
no-resolv
localise-queries
enable-ubus=dnsmasq
expand-hosts
bind-dynamic
local-service
quiet-dhcp
edns-packet-max=1280
domain=internal
local=/internal/
server=127.0.0.1#5453
addn-hosts=/tmp/hosts
dhcp-leasefile=/tmp/dhcp.leases
stop-dns-rebind
conf-file=/usr/share/dnsmasq/trust-anchors.conf
dnssec
dhcp-broadcast=tag:needs-broadcast
conf-dir=/tmp/dnsmasq.d
user=dnsmasq
group=dnsmasq
dhcp-ignore-names=tag:dhcp_bogus_hostname
conf-file=/usr/share/dnsmasq/dhcpbogushostname.conf
bogus-priv
conf-file=/usr/share/dnsmasq/rfc6761.conf
no-dhcp-interface=pppoe-wan
dhcp-range=set:lan,192.168.0.200,192.168.0.254,255.255.255.0,12h
stubby config created by OpenWrt:
resolution_type: GETDNS_RESOLUTION_STUB
round_robin_upstreams: 1
appdata_dir: "/var/lib/stubby"
trust_anchors_backoff_time: 2500
tls_authentication: GETDNS_AUTHENTICATION_REQUIRED
tls_query_padding_blocksize: 128
edns_client_subnet_private: 1
idle_timeout: 10000
listen_addresses:
- 127.0.0.1@5453
dns_transport_list:
- GETDNS_TRANSPORT_TLS
upstream_recursive_servers:
- address_data: 5.1.66.255
tls_auth_name: "dot.ffmuc.net"
- address_data: 185.150.99.255
tls_auth_name: "dot.ffmuc.net"
- address_data: 185.95.218.42
tls_auth_name: "dns.digitale-gesellschaft.ch"
- address_data: 185.95.218.43
tls_auth_name: "dns.digitale-gesellschaft.ch"
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss