On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:
We just added a patch to our copy of Twisted to prevent Twisted from
doing IDNA validation.
_idnaBytes and _idnaText now convert between bytes and unicode
based on the type of the provided arg.
We had to do this because youtube.com uses some domain names that are
not valid under IDNA-2008:
https://tools.ietf.org/html/rfc5891#section-4.2.3.1
My reading of the RFC is that the YouTube domain you mention
(r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that
doesn't mean it's an entirely invalid domain label. It just means you can't
legally run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The
intent, as I understand it, is to forbid any possibility of double-encoding
or double-decoding a label, not to forbid the possibility of using labels
like the one you mention.
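For illustration, the rule in RFC 5891, section 4.2.3.1 (a U-label must not have "--" in its third and fourth character positions) can be checked separately from ordinary hostname (LDH) syntax. The function names below are hypothetical and not part of Twisted. This minimal sketch only shows that the YouTube label fails the U-label rule while still being a well-formed ASCII DNS label.

```python
import re

# Hostname ("LDH": letters, digits, hyphen) label syntax per RFC 1034/1123:
# 1 to 63 ASCII alphanumerics or hyphens, not starting or ending with "-".
_LDH_LABEL = re.compile(r"\A(?!-)[A-Za-z0-9-]{1,63}(?<!-)\Z")


def is_ldh_label(label: str) -> bool:
    """True if *label* is a syntactically valid ASCII hostname label."""
    return _LDH_LABEL.match(label) is not None


def breaks_u_label_hyphen_rule(label: str) -> bool:
    """True if *label* has "--" in positions 3 and 4 (RFC 5891, 4.2.3.1),
    which forbids treating it as a U-label."""
    return label[2:4] == "--"


if __name__ == "__main__":
    label = "r2---sn-aigzrn7e"
    print(is_ldh_label(label))                # True: fine as a DNS label
    print(breaks_u_label_hyphen_rule(label))  # True: not allowed as a U-label
```

The same positions are why "xn--" labels, which have "--" in exactly that place, are reserved for A-labels.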
I can see why a UI would need to do IDNA-2008 conversion and validation,
but I'm not clear why it's of value deep in the guts of Twisted.
My guess is that this is just an accident of the way that the
bytes/characters distinction and the IDNA features were added to Twisted,
and is probably a bug.
Why is this code needed at all in Twisted?
If it's for a high-level API, then why isn't it being called at the
edge of the high-level API calls?
I'd argue that resolving URLs is in fact a high-level API (from the point
of view of the name resolution system), but even so, it seems to me that
Twisted is doing the wrong thing here. The format of that label should
prevent it from ever being transformed by IDNA, but shouldn't prevent it
from being passed through unchanged, since it doesn't contain any
codepoints outside of the usual ASCII range.
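Here is a minimal sketch of the pass-through behaviour argued for here. The helper names are hypothetical; this is not Twisted's actual _idnaBytes/_idnaText. Names that are already pure ASCII pass through unchanged, and only names containing non-ASCII code points go through IDNA. As a stand-in, it uses Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490) rather than IDNA 2008. A real implementation would more likely use a dedicated IDNA 2008 library.

```python
from typing import Union


def hostname_to_bytes(name: Union[bytes, str]) -> bytes:
    """Return *name* as ASCII bytes for use on the wire.

    Pure-ASCII names (including labels such as "r2---sn-aigzrn7e" that
    are not valid U-labels) are returned unchanged; only names with
    non-ASCII code points are IDNA-encoded.
    """
    if isinstance(name, bytes):
        return name  # already in wire form; do not re-validate
    if name.isascii():
        return name.encode("ascii")
    # Stand-in: the stdlib "idna" codec implements IDNA 2003, not IDNA 2008.
    return name.encode("idna")


def hostname_to_text(name: Union[bytes, str]) -> str:
    """Return *name* as text, IDNA-decoding only when an "xn--" label is present."""
    if isinstance(name, str):
        return name
    labels = name.split(b".")
    if any(label.lower().startswith(b"xn--") for label in labels):
        return name.decode("idna")
    return name.decode("ascii")
```

Doing this conversion once, where user-supplied text enters the API, means lower layers only ever handle ASCII bytes and never need to re-run IDNA validation on them.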
The key idea here is that it's human input that will be converted.
But the code is used deep in _sslverify.py, where no human
input is entered.
_sslverify has to check whether the information in the server's certificate
matches the URL that the user supplied. Certificates can contain Unicode
text — at least in the (completely obsolete) CN-as-domain-name situation —
so _sslverify probably picked up the requirement for IDNA transformations
from that. (I don't remember whether dNSName SANs can contain unicode.)
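To make the certificate side concrete, here is a rough sketch of a hostname check under stated assumptions. The function name is hypothetical, and this is not Twisted's _sslverify code. It assumes the hostname has already been converted to ASCII and that the certificate's dNSName entries are ASCII. RFC 5280 defines dNSName as an IA5String and stores internationalized names there in A-label form. Only exact matches and a single whole-label "*" in the leftmost position are handled, loosely following RFC 6125, section 6.4.3.

```python
from typing import Iterable


def hostname_matches(hostname: bytes, dns_names: Iterable[bytes]) -> bool:
    """Hypothetical check: does ASCII *hostname* match any SAN dNSName?

    Comparison is case-insensitive on ASCII bytes; no IDNA processing is
    done because both sides are assumed to be ASCII already.
    """
    host_labels = hostname.rstrip(b".").lower().split(b".")
    for pattern in dns_names:
        pattern_labels = pattern.rstrip(b".").lower().split(b".")
        if len(pattern_labels) != len(host_labels):
            continue
        first, rest = pattern_labels[0], pattern_labels[1:]
        # A bare "*" stands for exactly one leftmost label; requiring at
        # least two labels after it (rejecting "*.com") is a conservative
        # policy choice made for this sketch.
        if first == b"*" and len(rest) >= 2:
            if rest == host_labels[1:]:
                return True
        elif pattern_labels == host_labels:
            return True
    return False
```

With this approach, "r2---sn-aigzrn7e.googlevideo.com" matches "*.googlevideo.com" by plain byte comparison, with no U-label validation involved.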
What is the patch you decided to add to your version? Where in _sslverify
did the problem surface?
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python