Hi Simon and dnsmasq contributors,

I am running dnsmasq with a blocklist from
https://github.com/notracking/hosts-blocklists/blob/master/dnsmasq/dnsmasq.blacklist.txt

I have noticed that building dnsmasq with libidn2 support (which my distro
does) can cause extreme slowdowns. The slowdowns seem to come from the call
to idn2_to_ascii_lz in canonicalise() being very slow.

idn2_to_ascii_lz is run on every domain name in the blocklist to encode
special characters, and it is surprisingly slow even when a name contains
no special characters. I have developed a patch (attached to this email)
that checks each domain name for characters other than . - a-z 0-9. If
any such character is found, the name is encoded as before. If none is
found, the encoding step is skipped, since encoding would not change the
name. This removes most of the overhead of using libidn2. Unless you find
any problems with this approach, I hope the patch can be mainlined.
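
To make the idea concrete, here is a minimal standalone sketch of the
check. The attached patch contains the actual change; the name
needs_encoding is hypothetical and used only for this illustration.

```c
#include <stddef.h>

/* Illustrative copy of the check (hypothetical name, not the function
   added by the patch). Returns 1 if the name contains any character
   outside a-z 0-9 - . and 0 otherwise. */
static int needs_encoding(const char *s)
{
  for (; *s; ++s)
    if (!((*s >= 'a' && *s <= 'z') ||
          (*s >= '0' && *s <= '9') ||
          *s == '-' ||
          *s == '.'))
      return 1; /* unknown character: let the IDN library handle it */
  return 0;     /* plain name: encoding can be skipped */
}
```

The check is deliberately conservative: upper-case letters, underscores
and any non-ASCII byte all return 1, so such names still take the
existing IDN code path and only plain lower-case names skip it.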

Some benchmarks on a Raspberry Pi (slow, but probably not an uncommon
device for running dnsmasq) running ArchLinux and dnsmasq git master:

# Without libidn2: Acceptable speed
> make
> time ./src/dnsmasq -C dnsmasq.blacklist.txt --test
dnsmasq: syntax check OK.

real 0m3.699s
user 0m3.468s
sys 0m0.200s



# With libidn2: Too slow to be usable
> make COPTS="-DHAVE_LIBIDN2"
> time ./src/dnsmasq -C dnsmasq.blacklist.txt --test
dnsmasq: syntax check OK.

real 1m6.921s
user 0m59.509s
sys 0m0.606s


# With libidn2 and attached patch: Back to acceptable speed
> git am 0001-Avoid-IDN-translations-when-not-needed.patch
> make COPTS="-DHAVE_LIBIDN2"
> time ./src/dnsmasq -C dnsmasq.blacklist.txt --test
dnsmasq: syntax check OK.

real 0m3.903s
user 0m3.643s
sys 0m0.219s

Best regards,
Gustaf
From 17c6833d6718650cb6927679185e8f9be7152591 Mon Sep 17 00:00:00 2001
From: Gustaf Ullberg <gustaf.ullb...@gmail.com>
Date: Mon, 6 Sep 2021 10:45:51 +0000
Subject: [PATCH] Avoid IDN translations when not needed

This is an optimisation that bypasses the IDN encoding of domain
names that do not contain special characters.
---
 src/util.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/util.c b/src/util.c
index 1425764..771abe5 100644
--- a/src/util.c
+++ b/src/util.c
@@ -193,6 +193,21 @@ int legal_hostname(char *name)
   return 1;
 }
   
+#if defined(HAVE_IDN) || defined(HAVE_LIBIDN2)
+/* Domain names consisting solely of a-z 0-9 - .
+   do not need IDN encoding. */
+int domain_name_needs_encoding(const char *s)
+{
+  for(; *s; ++s)
+    if (!((*s >= 'a' && *s <= 'z') ||
+	  (*s >= '0' && *s <= '9') ||
+	   *s == '-' ||
+	   *s == '.'))
+      return 1;
+  return 0;
+}
+#endif
+
 char *canonicalise(char *in, int *nomem)
 {
   char *ret = NULL;
@@ -204,12 +219,14 @@ char *canonicalise(char *in, int *nomem)
   if (!(rc = check_name(in)))
     return NULL;
   
+#if defined(HAVE_IDN) || defined(HAVE_LIBIDN2)
+  if (domain_name_needs_encoding(in)
 #if defined(HAVE_LIBIDN2) && (!defined(IDN2_VERSION_NUMBER) || IDN2_VERSION_NUMBER < 0x02000003)
   /* older libidn2 strips underscores, so don't do IDN processing
      if the name has an underscore (check_name() returned 2) */
-  if (rc != 2)
+  && (rc != 2)
 #endif
-#if defined(HAVE_IDN) || defined(HAVE_LIBIDN2)
+  )
     {
 #  ifdef HAVE_LIBIDN2
       rc = idn2_to_ascii_lz(in, &ret, IDN2_NONTRANSITIONAL);
-- 
2.33.0
