When analysing report [1] for non-responding queries over TCP, I have
found forwarded TCP connections have quite high timeout. If for whatever
reason the forwarder currently set as a last used forwarder is dropping
packets without reject, the TCP will timeout for about 120 seconds on my
system. That is way too much, I think any TCP clients will give up far
before that. This is just quick workaround to improve the situation, not
final fix.
The problem is if the client chooses to use TCP for whatever reason,
dnsmasq will never switch to actually working server until some UDP
request arrives to trigger re-evaluation of last_server responsiveness.
It might do so, but just inside single TCP process. It just stubbornly
forwards even 5th retry to the same server now, when another one might
be able to answer right away.
I think the proper solution would be implementation of event driven
reading of TCP sockets in the main process. I don't think using threads
is possible, because quite a lot of globals used. It should not fork
another processes without --no-daemon parameter, but just use poll() to
watch socket readiness, then read whatever is prepared already. Since
TCP DNS message says its length at the start, it can just allocate
buffer big enough for that connection and iteratively read without
blocking. Once it is read, it can parse it, process it. A bit of socket
magic would be required, but similar approach could solve also sending
with multiple calls without blocking. That would be big change however.
I think some feedback should be delivered to main dnsmasq process from
tcp processing children, just like cache is updated from
cache_end_insert(). I think it should be able to switch last_server used
due to feedback from tcp client process. I even think there should be
different last_server for UDP and different for TCP, but not untill TCP
can report issues too. But not sure what approach to choose. At first I
though about special F_flag, but all bits for cache record (struct crec)
are already used.
Alternative quick-fix might be in case the TCP request sending fails to
some server to generate UDP request with EDNS header added from tcp
child process to main process UDP socket. It would ensure UDP check is
done at the main process, which might change current used resolver for
following TCP connections too.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2160466
--
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
From c02cfcb0a358e04636ffd2bcc595860b25b3a440 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemen...@redhat.com>
Date: Wed, 17 May 2023 21:40:19 +0200
Subject: [PATCH] Add --dns-tcp-timeout option
Changes send timeout of outgoing TCP connections. Allows waiting just
short time for successful connection before trying another server.
Makes possible faster switch to working server if previous is not
responding. Default socket timeout seems too high for DNS connections.
---
man/dnsmasq.8 | 4 ++++
src/config.h | 1 +
src/dnsmasq.h | 1 +
src/forward.c | 10 ++++++++--
src/option.c | 10 +++++++++-
5 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/man/dnsmasq.8 b/man/dnsmasq.8
index 30429df..94fd5b1 100644
--- a/man/dnsmasq.8
+++ b/man/dnsmasq.8
@@ -860,6 +860,10 @@ where this needs to be increased is when using web-server log file
resolvers, which can generate large numbers of concurrent queries. This
parameter actually controls the number of concurrent queries per server group, where a server group is the set of server(s) associated with a single domain. So if a domain has it's own server via --server=/example.com/1.2.3.4 and 1.2.3.4 is not responding, but queries for *.example.com cannot go elsewhere, then other queries will not be affected. On configurations with many such server groups and tight resources, this value may need to be reduced.
.TP
+.B --dns-tcp-timeout=<seconds>
+Sets send timeout for forwarded TCP connections. Can be used to reduce time of waiting
+for successful TCP connection. Default value 0 skips the change of it.
+.TP
.B --dnssec
Validate DNS replies and cache DNSSEC data. When forwarding DNS queries, dnsmasq requests the
DNSSEC records needed to validate the replies. The replies are validated and the result returned as
diff --git a/src/config.h b/src/config.h
index 88cf72e..5fd5cdf 100644
--- a/src/config.h
+++ b/src/config.h
@@ -61,6 +61,7 @@
#define LOOP_TEST_TYPE T_TXT
#define DEFAULT_FAST_RETRY 1000 /* ms, default delay before fast retry */
#define STALE_CACHE_EXPIRY 86400 /* 1 day in secs, default maximum expiry time for stale cache data */
+#define TCP_TIMEOUT 0 /* timeout of tcp outgoing connections */
/* compile-time options: uncomment below to enable or do eg.
make COPTS=-DHAVE_BROKEN_RTC
diff --git a/src/dnsmasq.h b/src/dnsmasq.h
index 2f95c12..4113ccb 100644
--- a/src/dnsmasq.h
+++ b/src/dnsmasq.h
@@ -1256,6 +1256,7 @@ extern struct daemon {
int tcp_pipes[MAX_PROCS];
int pipe_to_parent;
int numrrand;
+ int tcp_timeout;
struct randfd *randomsocks;
struct randfd_list *rfl_spare, *rfl_poll;
int v6pktinfo;
diff --git a/src/forward.c b/src/forward.c
index ecfeebd..f323fee 100644
--- a/src/forward.c
+++ b/src/forward.c
@@ -1929,8 +1929,14 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
/* Copy connection mark of incoming query to outgoing connection. */
if (have_mark)
setsockopt(serv->tcpfd, SOL_SOCKET, SO_MARK, &mark, sizeof(unsigned int));
-#endif
-
+#endif
+
+ if (daemon->tcp_timeout>0)
+ {
+ struct timeval tv = {daemon->tcp_timeout, 0};
+ setsockopt(serv->tcpfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
+ }
+
if ((!local_bind(serv->tcpfd, &serv->source_addr, serv->interface, 0, 1)))
{
close(serv->tcpfd);
diff --git a/src/option.c b/src/option.c
index 8322725..b94a5ff 100644
--- a/src/option.c
+++ b/src/option.c
@@ -190,6 +190,7 @@ struct myoption {
#define LOPT_FILTER_RR 381
#define LOPT_NO_DHCP6 382
#define LOPT_NO_DHCP4 383
+#define LOPT_TCP_TIMEOUT 384
#ifdef HAVE_GETOPT_LONG
static const struct option opts[] =
@@ -279,6 +280,7 @@ static const struct myoption opts[] =
{ "leasefile-ro", 0, 0, '9' },
{ "script-on-renewal", 0, 0, LOPT_SCRIPT_TIME},
{ "dns-forward-max", 1, 0, '0' },
+ { "dns-tcp-timeout", 1, 0, LOPT_TCP_TIMEOUT },
{ "clear-on-reload", 0, 0, LOPT_RELOAD },
{ "dhcp-ignore-names", 2, 0, LOPT_NO_NAMES },
{ "enable-tftp", 2, 0, LOPT_TFTP },
@@ -3391,7 +3393,12 @@ static int one_opt(int option, char *arg, char *errstr, char *gen_err, int comma
case '0': /* --dns-forward-max */
if (!atoi_check(arg, &daemon->ftabsize))
ret_err(gen_err);
- break;
+ break;
+
+ case LOPT_TCP_TIMEOUT: /* --dns-tcp-timeout */
+ if (!atoi_check(arg, &daemon->tcp_timeout))
+ ret_err(gen_err);
+ break;
case 'q': /* --log-queries */
set_option_bool(OPT_LOG);
@@ -5833,6 +5840,7 @@ void read_opts(int argc, char **argv, char *compile_opts)
daemon->soa_expiry = SOA_EXPIRY;
daemon->randport_limit = 1;
daemon->host_index = SRC_AH;
+ daemon->tcp_timeout = TCP_TIMEOUT;
/* See comment above make_servers(). Optimises server-read code. */
mark_servers(0);
--
2.40.1
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss