I am still not sure what exactly causes this problem, but I have hit it
again. I am sure it happens at least sometimes when I disconnect from my
Lenovo docking station and then reconnect to it.
One interesting thing I have found is that it gets unblocked by sending
a simple dig -4 @localhost +tcp fedoraproject.org query. A TCP query
seems to call enumerate_interfaces(0) every time, which fixes the
incorrect ifindex and unblocks dnsmasq.
I am not sure why check_servers(0), called from dbus.c, does not fix
this reliably. It seems to me it should. It may just be delayed or run
too soon. I think we can afford to enumerate interfaces on a fatal
error, which results in a REFUSED response anyway.
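As a rough illustration of that idea, here is a toy model in C. It is
not dnsmasq code: struct fake_server, fake_bind(), fake_enumerate() and
forward_once() are hypothetical names, and the index values are only
examples. A failed bind still fails the current query, but it triggers
a refresh so that the next query can succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an upstream server entry (hypothetical, not dnsmasq's). */
struct fake_server {
  unsigned int cached_ifindex; /* index remembered by the daemon */
  unsigned int live_ifindex;   /* index the kernel currently uses */
};

/* Pretend bind: succeeds only when the cached index is still valid. */
static int fake_bind(const struct fake_server *s)
{
  return s->cached_ifindex == s->live_ifindex;
}

/* Pretend enumeration: re-reads the live index into the cache. */
static void fake_enumerate(struct fake_server *s)
{
  s->cached_ifindex = s->live_ifindex;
}

/* Returns 1 if the query could be forwarded, 0 if it must be refused.
   On failure the cache is refreshed, so only this one query is lost. */
int forward_once(struct fake_server *s)
{
  assert(s != NULL);

  if (fake_bind(s))
    return 1;

  fake_enumerate(s); /* repair state for the next query */
  return 0;
}
```

Starting from cached_ifindex = 6 and live_ifindex = 7, the first
forward_once() call fails and repairs the cached value; the second
call succeeds.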
It runs with these parameters:
/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts
--bind-interfaces --pid-file=/run/NetworkManager/dnsmasq.pid
--listen-address=127.0.0.1 --cache-size=400 --clear-on-reload
--conf-file=/dev/null --proxy-dnssec
--enable-dbus=org.freedesktop.NetworkManager.dnsmasq
--conf-dir=/etc/NetworkManager/dnsmasq.d
But it seems to me local_bind would bind to the interface whether or
not --bind-interfaces or --bind-dynamic is present. So I think the
enumerate_interfaces(0) call should not be conditional in this case
either.
I have created bug #2247269 [1] to track this.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2247269
On 16. 10. 23 15:02, Petr Menšík wrote:
Hello everyone.
Today I returned to work, where I am running dnsmasq 2.89 on my
Fedora 37 laptop. It is configured by NetworkManager through its
dns=dnsmasq plugin. When I returned today, I found that resolution of
any name on our internal network was refused. I dug into what dnsmasq
was doing. The problem is that it did not recover after a while, but
stubbornly kept failing.
It was failing quite often in the local_bind call made from
random_sock(). The errno returned was 99 (EADDRNOTAVAIL on Linux). I
noticed that dnsmasq had failed to pick up a change of ifindex on the
interface it should be bound to.
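The sketch below shows one way to provoke a similar failure in
isolation, using an index that names no interface. It assumes that
local_bind pins the socket to an ifindex with the Linux IP_UNICAST_IF
socket option; that is an assumption about dnsmasq internals, not
something visible in this trace. unused_ifindex() and try_unicast_if()
are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return an index that currently names no interface, or 0 if none found. */
unsigned int unused_ifindex(void)
{
  char name[IF_NAMESIZE];

  for (unsigned int idx = 1000000; idx < 1000100; idx++)
    if (if_indextoname(idx, name) == NULL)
      return idx;
  return 0;
}

/* Try to pin a fresh UDP socket to the given interface index.
   Returns 0 on success, otherwise the errno of the failing call. */
int try_unicast_if(unsigned int ifindex)
{
  int err = 0;
  int fd = socket(AF_INET, SOCK_DGRAM, 0);

  if (fd == -1)
    return errno;

#ifdef IP_UNICAST_IF
  uint32_t opt = htonl(ifindex); /* the option takes network byte order */
  assert(sizeof(opt) == sizeof(int)); /* the option value is int-sized */
  if (setsockopt(fd, IPPROTO_IP, IP_UNICAST_IF, &opt, sizeof(opt)) == -1)
    err = errno; /* save before close() can overwrite it */
#else
  (void)ifindex;
  err = ENOPROTOOPT; /* option not available on this platform */
#endif

  close(fd);
  return err;
}
```

Passing the result of unused_ifindex() to try_unicast_if() should give
a non-zero error code; printing it with strerror() shows which error
the kernel reports for an index that no longer exists.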
(gdb) bt
#0 0x00007f53305e7020 in strerror () from /lib64/libc.so.6
#1 0x00005557a3ec2c4b in random_sock (s=s@entry=0x5557a43fef50) at
/usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2511
#2 0x00005557a3ec62f2 in allocate_rfd
(fdlp=fdlp@entry=0x5557a43f5280, serv=serv@entry=0x5557a43fef50)
at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2607
#3 0x00005557a3ec72dc in forward_query (udpfd=4,
udpaddr=0x7ffdb6bfbd30, dst_addr=0x7ffdb6bfbd00, dst_iface=0,
header=0x5557a43e03d0, plen=51,
limit=0x5557a43e0880 "", now=1697453089, forward=0x5557a43f5230,
ad_reqd=1, do_bit=0, fast_retry=0)
at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:498
#4 0x00005557a3ed0ebd in receive_query (now=1697453089,
listen=0x5557a43e0cc0) at
/usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:1869
#5 check_dns_listeners (now=1697453089) at
/usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1845
#6 0x00005557a3eac9ef in main (argc=<optimized out>, argv=<optimized
out>) at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1266
(gdb) p *$d->servers->next->next->next->next->next->next
$8 = {flags = 800, domain_len = 14, domain = 0x5557a43f5eb0
"brq.redhat.com", next = 0x5557a43ffa10, serial = 6, arrayposn = 23,
last_server = -1, addr = {sa = {sa_family = 2, sa_data =
"\0005\n&\005\032\226\r\2170S\177\000"}, in = {sin_family = 2,
sin_port = 13568,
sin_addr = {s_addr = 436545034}, sin_zero =
"\226\r\2170S\177\000"}, in6 = {sin6_family = 2, sin6_port = 13568,
sin6_flowinfo = 436545034,
sin6_addr = {__in6_u = {__u6_addr8 =
"\226\r\2170S\177\000\0000\275\001\a\220\000\000", __u6_addr16 =
{3478, 12431, 32595, 0, 48432, 1793,
144, 0}, __u6_addr32 = {814681494, 32595, 117554480,
144}}}, sin6_scope_id = 3446832640}}, source_addr = {sa = {sa_family = 2,
sa_data = "\000\000\000\000\000\000@\274\277\266\375\177\000"},
in = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 0},
sin_zero = "@\274\277\266\375\177\000"}, in6 = {sin6_family = 2,
sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
__u6_addr8 =
"@\274\277\266\375\177\000\000@\274\277\266\375\177\000", __u6_addr16
= {48192, 46783, 32765, 0, 48192, 46783, 32765, 0},
__u6_addr32 = {3066018880, 32765, 3066018880, 32765}}},
sin6_scope_id = 814672583}},
interface = "enp9s0u1\000\000\000\000\000\000\000\000", ifindex = 7,
sfd = 0x0, tcpfd = 0, edns_pktsz = 1232, pktsz_reduced = 0, queries =
446,
failed_queries = 0, nxdomain_replies = 0, retrys = 4, query_latency
= 0, mma_latency = 0, forwardtime = 0, forwardcount = 0, uid =
3867576473}
(gdb) p *$d->servers->next->next->next->next->next->next->next
$9 = {flags = 800, domain_len = 10, domain = 0x5557a43ff9f0
"redhat.com", next = 0x5557a43f5fb0, serial = 7, arrayposn = 25,
last_server = -1,
addr = {sa = {sa_family = 2, sa_data =
"\0005\n&\005\032\226\r\2170S\177\000"}, in = {sin_family = 2,
sin_port = 13568, sin_addr = {
s_addr = 436545034}, sin_zero = "\226\r\2170S\177\000"}, in6 =
{sin6_family = 2, sin6_port = 13568, sin6_flowinfo = 436545034,
sin6_addr = {__in6_u = {__u6_addr8 =
"\226\r\2170S\177\000\0000\275\001\a\220\000\000", __u6_addr16 =
{3478, 12431, 32595, 0, 48432, 1793,
144, 0}, __u6_addr32 = {814681494, 32595, 117554480,
144}}}, sin6_scope_id = 3446832640}}, source_addr = {sa = {sa_family = 2,
sa_data = "\000\000\000\000\000\000@\274\277\266\375\177\000"},
in = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 0},
sin_zero = "@\274\277\266\375\177\000"}, in6 = {sin6_family = 2,
sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
__u6_addr8 =
"@\274\277\266\375\177\000\000@\274\277\266\375\177\000", __u6_addr16
= {48192, 46783, 32765, 0, 48192, 46783, 32765, 0},
__u6_addr32 = {3066018880, 32765, 3066018880, 32765}}},
sin6_scope_id = 814672583}},
interface = "enp9s0u1\000\000\000\000\000\000\000\000", ifindex = 6,
sfd = 0x0, tcpfd = 0, edns_pktsz = 1232, pktsz_reduced = 0,
queries = 6480, failed_queries = 0, nxdomain_replies = 0, retrys =
134, query_latency = 34, mma_latency = 4414, forwardtime = 0,
forwardcount = 0, uid = 3578949556}
$ ip a show dev enp9s0u1
7: enp9s0u1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
state UP group default qlen 1000
link/ether 00:50:b6:b4:17:b2 brd ff:ff:ff:ff:ff:ff
inet 10.43.2.229/24 brd 10.43.2.255 scope global dynamic
noprefixroute enp9s0u1
valid_lft 56729sec preferred_lft 56729sec
inet6 2620:52:0:2b02:b3ba:7320:65f8:1fff/64 scope global dynamic
noprefixroute
valid_lft 2591999sec preferred_lft 604799sec
inet6 fe80::b2f:65c5:d743:524b/64 scope link noprefixroute
valid_lft forever preferred_lft forever
The problem seems to be a wrong ifindex for the redhat.com domain (6,
while the interface now has index 7), whereas for brq.redhat.com it has
been refreshed correctly. I am not sure how exactly that happened, but
I think I have seen it a few times already. I am not sure about the
exact steps required to reproduce this issue, but I think it is related
to undocking from Thunderbolt and reconnecting again. Has anyone else
seen similar behaviour?
It seems to me a call to enumerate_interfaces(0) should have fixed this.
I wonder whether it would make sense to call it explicitly after a
local_bind failure. Because the journal is full of these errors, I no
longer have details about the interface changes:
journalctl -xeu NetworkManager | grep 'failed to bind server socket to enp9s0u1' | wc -l
711
Has a similar error been seen in the wild? Is there a fix for it that I
have failed to find?
Cheers,
Petr
--
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
From 94c928d6fbc5ed829302970b42cf82a2533eb24a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Tue, 31 Oct 2023 16:22:17 +0100
Subject: [PATCH] Force interface enumeration after local_bind failure
I have seen such errors when dnsmasq is run by NetworkManager as a
D-Bus managed service. NM configures resolvers including interface
names. In some situations, however, it fails to refresh interfaces
correctly and the old ifindex stays in place.
I have seen that with a Thunderbolt docking station, whose interface is
removed after disconnection; on reconnection a new interface with the
same name but a different ifindex is created. Because NM does not use
bind-dynamic, dnsmasq is not watching the state of interfaces. For some
reason D-Bus reconfiguration sometimes does not help, leaving the
machine without working name resolution.
It will not fix itself even after some time, but it can be fixed by a
single TCP query, which always enumerates interfaces. Do the same when
we have printed a bind error for an interface, to prevent logging the
same error indefinitely.
---
src/forward.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/forward.c b/src/forward.c
index 6c36cde..07f954f 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -2536,6 +2536,8 @@ static int random_sock(struct server *s)
my_syslog(LOG_ERR, _("failed to bind server socket to %s: %s"),
daemon->addrbuff, strerror(errno));
+ /* If we failed to catch interface changes, force it here. */
+ enumerate_interfaces(0);
}
close(fd);
--
2.41.0