> On Apr 27, 2021, at 8:58 PM, Wim Lewis <w...@hhhh.org> wrote:
> 
> On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:
>> We just added a patch to our twisted to prevent twisted from doing idna 
>> validation.
>> _idnaBytes and _idnaText now convert from bytes to unicode based on the type 
>> of the provided arg.
>> 
>> We had to do this because there are domain names that youtube.com uses that 
>> are not valid under IDNA-2008:
>> https://tools.ietf.org/html/rfc5891#section-4.2.3.1
> 
> My reading of the RFC is that the YouTube domain you mention 
> (r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that doesn't 
> mean it's an entirely invalid domain label. It just means you can't legally 
> run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The intent, as 
> I understand it, is to forbid any possibility of double-encoding or 
> double-decoding a label, not to forbid the possibility of using labels like 
> the one you mention.

I agree with this reading.
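
As a rough illustration of that reading (a sketch only, not Twisted's code; 
both helper names are hypothetical), the label is a perfectly ordinary 
letters-digits-hyphens label, and the only thing it trips is the RFC 5891 
section 4.2.3.1 rule about "--" in the third and fourth positions:

```python
import re

# Hypothetical helpers for illustration; neither exists in Twisted or idna.
# An LDH label: 1-63 ASCII letters, digits, or hyphens, with no leading or
# trailing hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if the label is a plain ASCII letters-digits-hyphens label."""
    return _LDH_LABEL.fullmatch(label) is not None


def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    """Return True if the third and fourth characters are both hyphens."""
    return label[2:4] == "--"


label = "r2---sn-aigzrn7e"
print(is_ldh_label(label))                      # ordinary ASCII label?
print(has_hyphens_in_positions_3_and_4(label))  # rejected as a U-label?
```

Both checks come back true for this label, which is the whole point: it is 
not a valid U-label, but it is still a usable ASCII label.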

>> I can see why a UI would need to do IDNA-2008 converts and validation
>> but I'm not clear why it's of value deep in the guts of twisted.
> 
> My guess is that this is just an accident of the way that the 
> bytes/characters distinction and the IDNA features were added to Twisted, and 
> is probably a bug.

+1.

We also have other issues with the Python IDNA library: 
https://github.com/kjd/idna/issues/18 
and would generally like to reduce our strictness via whatever mechanisms we 
can, even for things that genuinely require it (which this does not).

>> Why is this code needed at all in twisted?
>> If it's for a high level API then why isn't it being called at the
>> edge of the high level API calls?
> 
> I'd argue that resolving URLs is in fact a high level API (from the point of 
> view of the name resolution system) but even so, it seems to me that Twisted 
> is doing the wrong thing here. The format of that label should prevent it 
> from ever being transformed by IDNA, but shouldn't prevent it from being 
> passed through unchanged, since it doesn't contain any codepoints outside of 
> the usual ASCII range.

Also agreed with all of this.
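
The pass-through behaviour described above might look roughly like the 
following. This is a sketch under assumptions, not a proposed patch: the 
function name is hypothetical, and the standard library's "idna" codec 
(which implements IDNA 2003, not IDNA 2008) stands in for whatever encoder 
Twisted would really use on non-ASCII names.

```python
def hostname_to_bytes(hostname: str) -> bytes:
    """Hypothetical helper: only run IDNA on names that actually need it."""
    if hostname.isascii():
        # Nothing outside ASCII, so there is nothing for IDNA to transform.
        # Pass the name through unchanged instead of validating it.
        return hostname.encode("ascii")
    # Assumption: the stdlib IDNA 2003 codec is a stand-in encoder here.
    return hostname.encode("idna")


print(hostname_to_bytes("r2---sn-aigzrn7e.googlevideo.com"))
print(hostname_to_bytes("bücher.example"))
```

With this shape, an all-ASCII name never reaches the IDNA validator at all, 
so a label like the googlevideo.com one cannot be rejected by it.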

>> The key idea here is that it's human input that will be converted.
>> But the code is used deep in the _sslverify.py where no human
>> input is entered.
> 
> _sslverify has to check whether the information in the server's certificate 
> matches the URL that the user supplied. Certificates can contain Unicode text 
> — at least in the (completely obsolete) CN-as-domain-name situation — so 
> _sslverify probably picked up the requirement for IDNA transformations from 
> that. (I don't remember whether dNSName SANs can contain unicode.)

Yep.

> What is the patch you decided to add to your version? Where in _sslverify did 
> the problem surface?

I am also very curious about this :).
