Thanks Petr for having a look at this.
Since IDN processing do turns uppercase letters into lowercase, I
deliberately left uppercase letters out.
I think your approach to put everything in check_name() makes sense, even
if that function grows and maybe is starting to become a bit hard to read.

I took your patch and made a couple of changes:
- IDN processing is performed if there are uppercase letters present in the
name (unless we have an old version of libidn2 and there is an underscore
in the name).
- IDN processing is always performed if there are non-ascii characters in
the name, no matter if there are underscores or not (a non-processed name
containing non-ascii characters sounds dangerous).

The size of the blacklist is about 230000 lines, and I agree that it would
make sense to also file a bug on libidn2.

Den ons 8 sep. 2021 kl 15:25 skrev Petr Menšík <pemen...@redhat.com>:

> I think your check should also accept uppercase ASCII letters. Anyway,
> similar check is already done in check_names, which is there to skip names
> containing underscore with older libidn2 versions. I guess it could return
> 2 also in case ascii-only characters were detected, instead of checking the
> name again in another loop.
>
> Attached alternative change, which would process only names not only ascii
> names. Changes check_names to return 2 when IDN should be used. Printing
> ascii names should be safe, even when they contain characters not allowed
> by hostnames. Such as _, +, = or whatever garbage is present. As long as it
> is readable in logs, it should not matter.
>
> How many lines does your dnsmasq.blacklist.txt contain? Those differences
> are significant. Maybe bug should be filled on libidn2. Conversion from
> ascii-only name to ascii name should not take too long even if it was
> called.
> On 9/6/21 3:27 PM, Gustaf Ullberg wrote:
>
> Hi Simon and dnsmasq contributors,
>
> I am running dnsmasq with a blocklist from
>
> https://github.com/notracking/hosts-blocklists/blob/master/dnsmasq/dnsmasq.blacklist.txt
>
> I have noticed that building dnsmasq with libidn2 support (which my distro
> does) can cause extreme slowdowns. The slowdowns seem to come from the call
> to idn2_to_ascii_lz in canonicalise() being very slow.
>
> idn2_to_ascii_lz is run on every domain name in the blocklist to encode
> special characters, and this is surprisingly slow even when there are no
> special characters. I developed a patch (attached to this email) that
> checks a domain name for other characters than . - a-z 0-9. If any such
> character is found, the domain name will be encoded. If no such character
> is found the domain name will not be encoded (as encoding won't change it).
> This removes most of the overhead of using libidn2. Unless you find any
> problems with this approach, I wish the patch can be mainlined.
>
> Some benchmarks on a Raspberry Pi (slow, but probably not an uncommon
> device for running dnsmasq) running ArchLinux and dnsmasq git master:
>
> # Without libidn2: Acceptable speed
> > make
> > time ./src/dnsmasq -C dnsmasq.blacklist.txt --test
> dnsmasq: syntax check OK.
>
> real 0m3.699s
> user 0m3.468s
> sys 0m0.200s
>
>
>
> # With libidn2: To slow to be usable
> > make COPTS="-DHAVE_LIBIDN2"
> > time ./src/dnsmasq -C dnsmasq.blacklist.txt --test
> dnsmasq: syntax check OK.
>
> real 1m6.921s
> user 0m59.509s
> sys 0m0.606s
>
>
> # With libidn2 and attached patch: Back to acceptable speed
> > git am 0001-Avoid-IDN-translations-when-not-needed.patch
> > make COPTS="-DHAVE_LIBIDN2"
> > time ./src/dnsmasq -C dnsmasq.blacklist.txt --test
> dnsmasq: syntax check OK.
>
> real 0m3.903s
> user 0m3.643s
> sys 0m0.219s
>
> Best regards,
> Gustaf
>
> _______________________________________________
> Dnsmasq-discuss mailing 
> listdnsmasq-disc...@lists.thekelleys.org.ukhttps://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>
> --
> Petr Menšík
> Software Engineer
> Red Hat, http://www.redhat.com/
> email: pemen...@redhat.com
> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>
From d5dbb66203c6dcd77276ebb243b8921ac975166d Mon Sep 17 00:00:00 2001
From: Gustaf Ullberg <gustaf.ullb...@gmail.com>
Date: Wed, 8 Sep 2021 22:57:15 +0200
Subject: [PATCH] check_name() determines if IDN processing is needed.

Optimization that only runs IDN processing if it would alter the domain
name (non-ascii or uppercase characters).
---
 src/util.c | 60 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/src/util.c b/src/util.c
index 1425764..3842738 100644
--- a/src/util.c
+++ b/src/util.c
@@ -115,7 +115,8 @@ u64 rand64(void)
   return (u64)out[outleft+1] + (((u64)out[outleft]) << 32);
 }
 
-/* returns 2 if names is OK but contains one or more underscores */
+/* returns 1 if name is OK and ascii printable
+ * returns 2 if name should be processed by IDN */
 static int check_name(char *in)
 {
   /* remove trailing . 
@@ -123,7 +124,9 @@ static int check_name(char *in)
   size_t dotgap = 0, l = strlen(in);
   char c;
   int nowhite = 0;
+  int idn_encode = 0;
   int hasuscore = 0;
+  int hasucase = 0;
   
   if (l == 0 || l > MAXDNAME) return 0;
   
@@ -136,28 +139,49 @@ static int check_name(char *in)
   for (; (c = *in); in++)
     {
       if (c == '.')
-	dotgap = 0;
+        dotgap = 0;
       else if (++dotgap > MAXLABEL)
-	return 0;
+        return 0;
       else if (isascii((unsigned char)c) && iscntrl((unsigned char)c)) 
-	/* iscntrl only gives expected results for ascii */
-	return 0;
-#if !defined(HAVE_IDN) && !defined(HAVE_LIBIDN2)
+        /* iscntrl only gives expected results for ascii */
+        return 0;
       else if (!isascii((unsigned char)c))
-	return 0;
+#if !defined(HAVE_IDN) && !defined(HAVE_LIBIDN2)
+        return 0;
+#else
+        idn_encode = 1;
 #endif
       else if (c != ' ')
-	{
-	  nowhite = 1;
-	  if (c == '_')
-	    hasuscore = 1;
-	}
+        {
+          nowhite = 1;
+#if defined(HAVE_LIBIDN2) && (!defined(IDN2_VERSION_NUMBER) || IDN2_VERSION_NUMBER < 0x02000003)
+          if (c == '_')
+            hasuscore = 1;
+#else
+          (void)hasuscore;
+#endif
+
+#if defined(HAVE_IDN) || defined(HAVE_LIBIDN2)
+          if (c >= 'A' && c <= 'Z')
+            hasucase = 1;
+#else
+          (void)hasucase;
+#endif
+        }
     }
 
   if (!nowhite)
     return 0;
 
-  return hasuscore ? 2 : 1;
+#if defined(HAVE_LIBIDN2) && (!defined(IDN2_VERSION_NUMBER) || IDN2_VERSION_NUMBER < 0x02000003)
+  /* Older libidn2 strips underscores, so don't do IDN processing
+     if the name has an underscore unless it also has non-ascii characters. */
+  idn_encode = idn_encode || (hasucase && !hasuscore);
+#else
+  idn_encode = idn_encode || hasucase;
+#endif
+
+  return (idn_encode) ? 2 : 1;
 }
 
 /* Hostnames have a more limited valid charset than domain names
@@ -204,12 +228,8 @@ char *canonicalise(char *in, int *nomem)
   if (!(rc = check_name(in)))
     return NULL;
   
-#if defined(HAVE_LIBIDN2) && (!defined(IDN2_VERSION_NUMBER) || IDN2_VERSION_NUMBER < 0x02000003)
-  /* older libidn2 strips underscores, so don't do IDN processing
-     if the name has an underscore (check_name() returned 2) */
-  if (rc != 2)
-#endif
 #if defined(HAVE_IDN) || defined(HAVE_LIBIDN2)
+  if (rc == 2)
     {
 #  ifdef HAVE_LIBIDN2
       rc = idn2_to_ascii_lz(in, &ret, IDN2_NONTRANSITIONAL);
@@ -234,12 +254,14 @@ char *canonicalise(char *in, int *nomem)
       
       return ret;
     }
+#else
+  (void)rc;
 #endif
   
   if ((ret = whine_malloc(strlen(in)+1)))
     strcpy(ret, in);
   else if (nomem)
-    *nomem = 1;    
+    *nomem = 1;
 
   return ret;
 }
-- 
2.33.0

_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

Reply via email to