> On Apr 27, 2021, at 8:58 PM, Wim Lewis <w...@hhhh.org> wrote:
>
> On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:
>> We just added a patch to our twisted to prevent twisted from doing idna
>> validation.
>> _idnaBytes and _idnaText now convert from bytes to unicode based on the
>> type of the provided arg.
>>
>> We had to do this because there are domain names that youtube.com uses
>> that are not valid under IDNA-2008
>> https://tools.ietf.org/html/rfc5891#section-4.2.3.1
>
> My reading of the RFC is that the YouTube domain you mention
> (r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that doesn't
> mean it's an entirely invalid domain label. It just means you can't legally
> run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The intent, as
> I understand it, is to forbid any possibility of double-encoding or
> double-decoding a label, not to forbid the possibility of using labels like
> the one you mention.

I agree with this reading.

>> I can see why a UI would need to do IDNA-2008 converts and validation
>> but I'm not clear why it's of value deep in the guts of twisted.
>
> My guess is that this is just an accident of the way that the
> bytes/characters distinction and the IDNA features were added to Twisted,
> and is probably a bug.

+1. We also have other issues with the Python IDNA library:
https://github.com/kjd/idna/issues/18 and would generally like to reduce our
strictness via whatever mechanisms we can, even for things that genuinely
require it (which this does not).

>> Why is this code needed at all in twisted?
>> If it's for a high level API then why isn't it being called at the
>> edge of the high level API calls?
>
> I'd argue that resolving URLs is in fact a high level API (from the point of
> view of the name resolution system) but even so, it seems to me that Twisted
> is doing the wrong thing here. The format of that label should prevent it
> from ever being transformed by IDNA, but shouldn't prevent it from being
> passed through unchanged, since it doesn't contain any codepoints outside of
> the usual ASCII range.

Also agreed with all of this.

>> The key idea here is that it's human input that will be converted.
>> But the code is used deep in the _sslverify.py where no human
>> input is entered.
>
> _sslverify has to check whether the information in the server's certificate
> matches the URL that the user supplied. Certificates can contain Unicode
> text, at least in the (completely obsolete) CN-as-domain-name situation, so
> _sslverify probably picked up the requirement for IDNA transformations from
> that. (I don't remember whether dNSName SANs can contain unicode.)

Yep.

> What is the patch you decided to add to your version? Where in _sslverify
> did the problem surface?

I am also very curious about this :).
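
To make the "pass it through unchanged" idea concrete, here is a rough sketch of the behavior I'd expect. This is not Twisted's actual code: hostname_to_bytes is a hypothetical name, and I'm using the stdlib "idna" codec (which implements IDNA 2003, not 2008) purely so the example has no dependencies.

```python
def hostname_to_bytes(hostname: str) -> bytes:
    """Hypothetical helper, not part of Twisted: wire form of a hostname."""
    try:
        # Pure-ASCII names are passed through untouched, with no IDNA
        # validation at all, so oddly shaped labels survive unchanged.
        return hostname.encode("ascii")
    except UnicodeEncodeError:
        # Only names containing non-ASCII code points need an IDNA
        # transformation. Assumption: the stdlib codec (IDNA 2003) stands
        # in here for whatever IDNA implementation is really wanted.
        return hostname.encode("idna")


if __name__ == "__main__":
    print(hostname_to_bytes("r2---sn-aigzrn7e.googlevideo.com"))
    print(hostname_to_bytes("bücher.example"))
```

With something like this, the googlevideo.com name never reaches the IDNA code path, while a name that really does contain non-ASCII text still gets converted to an "xn--" form.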