Re: [Twisted-Python] IDNA problem in twisted

2021-04-29 Thread Barry Scott
On Wednesday, 28 April 2021 06:52:30 BST Glyph wrote:
> 
> > On Apr 27, 2021, at 8:58 PM, Wim Lewis  wrote:
> > 
> > On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:
> >> We just added a patch to our twisted to prevent twisted from doing idna 
> >> validation.
> >> _idnaBytes and _idnaText not convert from bytes to unicode based on the 
> >> type of
> >> the provided arg.
> >> 
> >> We had to do this because there are domain names that youtube.com uses 
> >> that are
> >> not valid under IDNA-2008 
> >> https://tools.ietf.org/html/rfc5891#section-4.2.3.1
> > 
> > My reading of the RFC is that the YouTube domain you mention 
> > (r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that 
> > doesn't mean it's an entirely invaid domain label. It just means you can't 
> > legally run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The 
> > intent, as I understand it, is to forbid any possibility of double-encoding 
> > or double-decoding a label, not to forbid the possibility of using labels 
> > like the one you mention.
> 
> I agree with this reading.
> 
> >> I can see why a UI would need to do IDNA-2008 converts and validation
> >> but I'm not clear why its of value deep in the guts of twisted.
> > 
> > My guess is that this is just an accident of the way that the 
> > bytes/characters distinction and the IDNA features were added to Twisted, 
> > and is probably a bug.
> 
> +1.
> 
> We also have other issues with the Python IDNA library: 
> https://github.com/kjd/idna/issues/18  
> and would generally like to reduce our strictness via whatever mechanisms we 
> can, even for things that genuinely require it (which this does not).
> 
> >> Why is this code needed at all in twisted?
> >> If its for a high level API then why isn't it being called at the
> >> edge of the high level API calls?
> > 
> > I'd argue that resolving URLs is in fact a high level API (from the point 
> > of view of the name resoution system) but even so, it seems to me that 
> > Twisted is doing the wrong thing here. The format of that label should 
> > prevent it from ever being transformed by IDNA, but shouldn't prevent it 
> > from being passed through unchanged, since it doesn't contain any 
> > codepoints outside of the usual ASCII range.
> 
> Also agreed with all of this.
> 
> >> The key idea here is that its human input that will be converted.
> >> But the code is used deep in the _sslverify.py where no human
> >> input is entered.
> > 
> > _sslverify has to check whether the information in the server's certificate 
> > matches the URL that the user supplied. Certificates can contain Unicode 
> > text — at least in the (completely obsolete) CN-as-domain-name situation — 
> > so _sslverify probably picked up the requirement for IDNA transformations 
> > from that. (I don't remember whether dNSName SANs can contain unicode.)
> 
> Yep.
> 
> > What is the patch you decided to add to your version? Where in _sslverify 
> > did the problem surface?

When _idaBytes was called to raise an exception in ClientTLSOptions.__init__.

> I am also very curious about this :).

Attached is the patch we are using. We are using 19.07 for sad reasons.

Barry
Only in tmp2: twisted-remove-idna-checks.patch
diff -r -u tmp1/twisted-twisted-19.7.0/src/twisted/internet/_idna.py tmp2/twisted-twisted-19.7.0/src/twisted/internet/_idna.py
--- tmp1/twisted-twisted-19.7.0/src/twisted/internet/_idna.py	2019-07-28 10:17:29.0 +0100
+++ tmp2/twisted-twisted-19.7.0/src/twisted/internet/_idna.py	2021-04-08 14:38:04.449987636 +0100
@@ -22,14 +22,10 @@
 @return: The domain name's IDNA representation, encoded as bytes.
 @rtype: L{bytes}
 """
-try:
-import idna
-except ImportError:
-return text.encode("idna")
+if type(text) == bytes:
+return text
 else:
-return idna.encode(text)
-
-
+return text.encode('ascii')
 
 def _idnaText(octets):
 """
@@ -43,12 +39,10 @@
 @return: A human-readable domain name.
 @rtype: L{unicode}
 """
-try:
-import idna
-except ImportError:
-return octets.decode("idna")
+if type(octets) == bytes:
+return octets.decode('ascii') 
 else:
-return idna.decode(octets)
+return octets
 
 
 
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] startTLS errors not propagating to Factory

2021-04-29 Thread Richard van der Hoff

On 29/04/2021 07:36, Glyph wrote:


In my tests at least, TLSMemoryBIOProtocol.connectionLost is doing 
exactly the right thing - it is called with an unhelpful reason, but 
substitutes back in the helpful reason which has already been stashed.


Rather, the problem, as I see it, is that it's not 
TLSMemoryBIOProtocol.connectionLost that calls 
Factory.clientConnectionLost. That is done by 
tcp.Client.connectionLost, via one of tcp.Client's myriad of base 
classes, at 
https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/internet/tcp.py#L508. 
Of course, that doesn't get the benefit of TLSMemoryBIOProtocol's 
reason switcheroo.


I'm still not quite sure who is in the wrong here.


Aah, yeah, this is a weird quirk of the ancient-style layering in the 
SMTP code :-|.  The way this /should/ work is by using HostnameEndpoint.


I'm not sure /exactly/ where we're going off the rails, but by using 
/both /the old 'startTLS' style of starting a TLS connection, as well 
as relying on ClientFactory rather than an Endpoint of some kind, 
means that we're getting this duplicate notification; the one that you 
get to Protocol.connectionLost will come from TLS and have useful 
information, but the one that goes to the Connector will be coming 
straight from TCP.


The right thing to fix here, I think, is to ignore 
clientConnectionLost entirely, and instead to have the protocol object 
relay its failure to some other differently-named method on 
SMTPSenderFactory.


Right! That sounds plausible, and certainly gives me some places to 
poke. I'll have another look later. Thanks very much!


R
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] startTLS errors not propagating to Factory

2021-04-29 Thread Glyph

> On Apr 29, 2021, at 3:09 AM, Richard van der Hoff  wrote:
> Right! That sounds plausible, and certainly gives me some places to poke. 
> I'll have another look later. Thanks very much!
> 

Glad to help. Looking forward to the resolution on this!

For what it's worth: it may also be useful to explicitly account for the 
Connector object in the transport.startTLS subsystem, it would be good to have 
the same reason object delivered in both places.  So fixing the SMTP code isn't 
necessarily the only path forward here.

-g___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python