Thanks Petr for having a look at this. Since IDN processing do turns uppercase letters into lowercase, I deliberately left uppercase letters out. I think your approach to put everything in check_name() makes sense, even if that function grows and maybe is starting to become a bit hard to read.
I took your patch and made a couple of changes: - IDN processing is performed if there are uppercase letters present in the name (unless we have an old version of libidn2 and there is an underscore in the name). - IDN processing is always performed if there are non-ascii characters in the name, no matter if there are underscores or not (a non-processed name containing non-ascii characters sounds dangerous). The size of the blacklist is about 230000 lines, and I agree that it would make sense to also file a bug on libidn2. Den ons 8 sep. 2021 kl 15:25 skrev Petr Menšík <pemen...@redhat.com>: > I think your check should also accept uppercase ASCII letters. Anyway, > similar check is already done in check_names, which is there to skip names > containing underscore with older libidn2 versions. I guess it could return > 2 also in case ascii-only characters were detected, instead of checking the > name again in another loop. > > Attached alternative change, which would process only names not only ascii > names. Changes check_names to return 2 when IDN should be used. Printing > ascii names should be safe, even when they contain characters not allowed > by hostnames. Such as _, +, = or whatever garbage is present. As long as it > is readable in logs, it should not matter. > > How many lines does your dnsmasq.blacklist.txt contain? Those differences > are significant. Maybe bug should be filled on libidn2. Conversion from > ascii-only name to ascii name should not take too long even if it was > called. > On 9/6/21 3:27 PM, Gustaf Ullberg wrote: > > Hi Simon and dnsmasq contributors, > > I am running dnsmasq with a blocklist from > > https://github.com/notracking/hosts-blocklists/blob/master/dnsmasq/dnsmasq.blacklist.txt > > I have noticed that building dnsmasq with libidn2 support (which my distro > does) can cause extreme slowdowns. The slowdowns seem to come from the call > to idn2_to_ascii_lz in canonicalise() being very slow. > > idn2_to_ascii_lz is run on every domain name in the blocklist to encode > special characters, and this is surprisingly slow even when there are no > special characters. I developed a patch (attached to this email) that > checks a domain name for other characters than . - a-z 0-9. If any such > character is found, the domain name will be encoded. If no such character > is found the domain name will not be encoded (as encoding won't change it). > This removes most of the overhead of using libidn2. Unless you find any > problems with this approach, I wish the patch can be mainlined. > > Some benchmarks on a Raspberry Pi (slow, but probably not an uncommon > device for running dnsmasq) running ArchLinux and dnsmasq git master: > > # Without libidn2: Acceptable speed > > make > > time ./src/dnsmasq -C dnsmasq.blacklist.txt --test > dnsmasq: syntax check OK. > > real 0m3.699s > user 0m3.468s > sys 0m0.200s > > > > # With libidn2: To slow to be usable > > make COPTS="-DHAVE_LIBIDN2" > > time ./src/dnsmasq -C dnsmasq.blacklist.txt --test > dnsmasq: syntax check OK. > > real 1m6.921s > user 0m59.509s > sys 0m0.606s > > > # With libidn2 and attached patch: Back to acceptable speed > > git am 0001-Avoid-IDN-translations-when-not-needed.patch > > make COPTS="-DHAVE_LIBIDN2" > > time ./src/dnsmasq -C dnsmasq.blacklist.txt --test > dnsmasq: syntax check OK. > > real 0m3.903s > user 0m3.643s > sys 0m0.219s > > Best regards, > Gustaf > > _______________________________________________ > Dnsmasq-discuss mailing > listdnsmasq-disc...@lists.thekelleys.org.ukhttps://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss > > -- > Petr Menšík > Software Engineer > Red Hat, http://www.redhat.com/ > email: pemen...@redhat.com > PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB > > _______________________________________________ > Dnsmasq-discuss mailing list > Dnsmasq-discuss@lists.thekelleys.org.uk > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss >
From d5dbb66203c6dcd77276ebb243b8921ac975166d Mon Sep 17 00:00:00 2001 From: Gustaf Ullberg <gustaf.ullb...@gmail.com> Date: Wed, 8 Sep 2021 22:57:15 +0200 Subject: [PATCH] check_name() determines if IDN processing is needed. Optimization that only runs IDN processing if it would alter the domain name (non-ascii or uppercase characters). --- src/util.c | 60 +++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 41 insertions(+), 19 deletions(-) diff --git a/src/util.c b/src/util.c index 1425764..3842738 100644 --- a/src/util.c +++ b/src/util.c @@ -115,7 +115,8 @@ u64 rand64(void) return (u64)out[outleft+1] + (((u64)out[outleft]) << 32); } -/* returns 2 if names is OK but contains one or more underscores */ +/* returns 1 if name is OK and ascii printable + * returns 2 if name should be processed by IDN */ static int check_name(char *in) { /* remove trailing . @@ -123,7 +124,9 @@ static int check_name(char *in) size_t dotgap = 0, l = strlen(in); char c; int nowhite = 0; + int idn_encode = 0; int hasuscore = 0; + int hasucase = 0; if (l == 0 || l > MAXDNAME) return 0; @@ -136,28 +139,49 @@ static int check_name(char *in) for (; (c = *in); in++) { if (c == '.') - dotgap = 0; + dotgap = 0; else if (++dotgap > MAXLABEL) - return 0; + return 0; else if (isascii((unsigned char)c) && iscntrl((unsigned char)c)) - /* iscntrl only gives expected results for ascii */ - return 0; -#if !defined(HAVE_IDN) && !defined(HAVE_LIBIDN2) + /* iscntrl only gives expected results for ascii */ + return 0; else if (!isascii((unsigned char)c)) - return 0; +#if !defined(HAVE_IDN) && !defined(HAVE_LIBIDN2) + return 0; +#else + idn_encode = 1; #endif else if (c != ' ') - { - nowhite = 1; - if (c == '_') - hasuscore = 1; - } + { + nowhite = 1; +#if defined(HAVE_LIBIDN2) && (!defined(IDN2_VERSION_NUMBER) || IDN2_VERSION_NUMBER < 0x02000003) + if (c == '_') + hasuscore = 1; +#else + (void)hasuscore; +#endif + +#if defined(HAVE_IDN) || defined(HAVE_LIBIDN2) + if (c >= 'A' && c <= 'Z') + hasucase = 1; +#else + (void)hasucase; +#endif + } } if (!nowhite) return 0; - return hasuscore ? 2 : 1; +#if defined(HAVE_LIBIDN2) && (!defined(IDN2_VERSION_NUMBER) || IDN2_VERSION_NUMBER < 0x02000003) + /* Older libidn2 strips underscores, so don't do IDN processing + if the name has an underscore unless it also has non-ascii characters. */ + idn_encode = idn_encode || (hasucase && !hasuscore); +#else + idn_encode = idn_encode || hasucase; +#endif + + return (idn_encode) ? 2 : 1; } /* Hostnames have a more limited valid charset than domain names @@ -204,12 +228,8 @@ char *canonicalise(char *in, int *nomem) if (!(rc = check_name(in))) return NULL; -#if defined(HAVE_LIBIDN2) && (!defined(IDN2_VERSION_NUMBER) || IDN2_VERSION_NUMBER < 0x02000003) - /* older libidn2 strips underscores, so don't do IDN processing - if the name has an underscore (check_name() returned 2) */ - if (rc != 2) -#endif #if defined(HAVE_IDN) || defined(HAVE_LIBIDN2) + if (rc == 2) { # ifdef HAVE_LIBIDN2 rc = idn2_to_ascii_lz(in, &ret, IDN2_NONTRANSITIONAL); @@ -234,12 +254,14 @@ char *canonicalise(char *in, int *nomem) return ret; } +#else + (void)rc; #endif if ((ret = whine_malloc(strlen(in)+1))) strcpy(ret, in); else if (nomem) - *nomem = 1; + *nomem = 1; return ret; } -- 2.33.0
_______________________________________________ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss