[issue43577] Deadlock when using SSLContext._msg_callback and SSLContext.sni_callback

2021-03-20 Thread Andrew Dailey


New submission from Andrew Dailey :

Hello,

I think I might've stumbled onto an oversight with how an SSLSocket handles 
overwriting its SSLContext within an sni_callback. If both "_msg_callback" and 
"sni_callback" are defined on an SSLContext object and the sni_callback 
replaces the context with new one, the interpreter locks up indefinitely. It 
fails to respond to keyboard interrupts and must be forcefully killed.

This seems to be a common use case of the sni_callback: create a new context 
with a different cert chain and attach it to the current socket (which replaces 
the existing one). If _msg_callback never gets defined on the original context 
then this deadlock never occurs. Curiously, if you assign the same 
_msg_callback to the new context before replacement, this also avoids the 
deadlock.

I've attached as minimal of a reproduction as I could come up with. I think the 
code within will probably do a better job explaining this problem than I've 
done here in prose. I've only tested it on a couple Linux distros (Ubuntu 
Server and Void Linux) but the lock occurs 100% of the time in my experience.

In the brief time I've spent digging into the CPython source, I've come to 
understand that replacing the SSLContext on an SSLSocket isn't "just" a simple 
replacement but actually involves some OpenSSL mechanics (specifically, 
SSL_set_SSL_CTX) [0]. I'm wondering if maybe this context update routine isn't 
properly cleaning up whatever resources / references were being used by the 
msg_callback? Maybe this is even closer to an OpenSSL bug (or a least a gotcha)?

I also feel the need to explain why I'd even be using an undocumented property 
(SSLContext._msg_callback) in the first place. I'm trying to implement a 
program that automatically manages TLS certs on a socket via Let's Encrypt and 
the ACME protocol (RFC8555). Part of this process involves serving up a 
specific cert when a connection requests the acme-tls/1 ALPN protocol. Given 
the existing Python SSL API, I don't believe there is any way for me to do this 
"correctly".

The documentation for SSLContext.sni_callback [1] mentions that the 
selected_alpn_protocol function should be usable within the callback but I 
don't that is quite true. According to the OpenSSL docs [2]:
Several callbacks are executed during ClientHello processing, including the 
ClientHello, ALPN, and servername callbacks. The ClientHello callback is 
executed first, then the servername callback, followed by the ALPN callback.

If there is a better way for me to identify a specific ALPN protocol _before_ 
the sni_callback, I could definitely use the guidance. That would avoid this 
deadlock altogether (even though it'd still be waiting to catch someone 
else...).

This is my first Python issue so I hope what I've supplied makes sense. If 
there is anything more I can do to help or provide more info, please let me 
know.

[0] https://github.com/python/cpython/blob/3.9/Modules/_ssl.c#L2194
[1] https://docs.python.org/3/library/ssl.html#ssl.SSLContext.sni_callback
[2] 
https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_tlsext_servername_callback.html

--
assignee: christian.heimes
components: SSL
files: deadlock.zip
messages: 389216
nosy: christian.heimes, theandrew168
priority: normal
severity: normal
status: open
title: Deadlock when using SSLContext._msg_callback and SSLContext.sni_callback
type: behavior
versions: Python 3.8, Python 3.9
Added file: https://bugs.python.org/file49897/deadlock.zip

___
Python tracker 
<https://bugs.python.org/issue43577>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43577] Deadlock when using SSLContext._msg_callback and SSLContext.sni_callback

2021-03-21 Thread Andrew Dailey


Andrew Dailey  added the comment:

I'm glad that the info I provided was helpful! I'll go ahead and create another 
issue for the misleading docs surrounding SSLContext.sni_callback. Thanks for 
looking into this and coming up with a fix so quickly.

I do have one more question: does python provide a "safe" way to test for 
deadlocks like this? I noticed that you added a test case to verify that this 
lockup doesn't happen but what would happen if someone ran that test on an 
earlier version? Would the test runner also freeze or are there facilities 
in-place to catch such behavior? Maybe something nutty like:

with should_deadlock():
my_buggy_test()

--

___
Python tracker 
<https://bugs.python.org/issue43577>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43582] SSLContext.sni_callback docs inaccurately describe available handshake info

2021-03-21 Thread Andrew Dailey


New submission from Andrew Dailey :

Hello,

The documentation for SSLContext.sni_callback [0] seems to incorrectly describe 
the information available at that stage of the TLS handshake.

According to the docs:
Due to the early negotiation phase of the TLS connection, only limited methods 
and attributes are usable like SSLSocket.selected_alpn_protocol() and 
SSLSocket.context. SSLSocket.getpeercert(), SSLSocket.getpeercert(), 
SSLSocket.cipher() and SSLSocket.compress() methods require that the TLS 
connection has progressed beyond the TLS Client Hello and therefore will not 
contain return meaningful values nor can they be called safely.

This paragraph claims that SSLSocket.selected_alpn_protocol() should be usable 
within sni_callback but I think this is inaccurate. Based on the OpenSSL docs 
[1] and my own testing, the servername callback occurs after ClientHello but 
_before_ the ALPN callback. This prevents accurate ALPN information from being 
available until later. I believe that any call to 
SSLSocket.selected_alpn_protocol() within an SSLContext.sni_callback will 
simply return None.

Excerpt from the OpenSSL docs:
Several callbacks are executed during ClientHello processing, including the 
ClientHello, ALPN, and servername callbacks. The ClientHello callback is 
executed first, then the servername callback, followed by the ALPN callback.

I think it'd be better to explain that the only "useful" thing you can do 
within sni_callback is to see what sni_name is desired an optionally swap out 
the context for one with a more appropriate cert chain. Any information about 
the selected ALPN protocol has to wait until later in the handshake.

[0] https://docs.python.org/3/library/ssl.html#ssl.SSLContext.sni_callback
[1] 
https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_tlsext_servername_callback.html

--
assignee: docs@python
components: Documentation, SSL
messages: 389231
nosy: docs@python, theandrew168
priority: normal
severity: normal
status: open
title: SSLContext.sni_callback docs inaccurately describe available handshake 
info
type: enhancement
versions: Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue43582>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43582] SSLContext.sni_callback docs inaccurately describe available handshake info

2021-03-21 Thread Andrew Dailey


Change by Andrew Dailey :


--
nosy: +christian.heimes

___
Python tracker 
<https://bugs.python.org/issue43582>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43582] SSLContext.sni_callback docs inaccurately describe available handshake info

2021-03-21 Thread Andrew Dailey


Andrew Dailey  added the comment:

Yea, I'm still on the hunt for a better way to solve my primary problem: detect 
an acme-tls/1 ALPN protocol request during the TLS handshake so that I can swap 
out the context to one with the cert chain that Let's Encrypt is expecting to 
see.

It seems like OpenSSL provides three primary hooks into the handshake: 
ClientHello, servername, and ALPN. The servername callback is the only one that 
can be "officially" customized by Python's SSL API. The ALPN callback seems to 
be used under the hood to implement SSLContext.set_alpn_protocols() but there 
isn't a way to specify complete control of the callback.

My current "hack" is to use the SSLContext._msg_callback to check for the 
acme-tls/1 protocol explicitly:

def msg_callback(conn, direction, version, content_type, msg_type, data):
if direction == 'read' and b'acme-tls/1' in data:
print('got an acme-tls/1 request')
print('set a flag for sni_callback to check, etc etc')

I know this probably isn't a good or safe way to solve the problem. The current 
docs make it sound like sni_callback would be my one-stop shop but that ended 
up not being the case. Maybe I could subclass SSLSocket, override 
do_handshake(), and then swap out the context before or after 
super().do_handshake()? I'm quite new to Python/OpenSSL internals so I'm not 
sure if that is even possible. Can a context be swapped out so late in the 
handshake process?

The SSL_client_hello_get0_ext() function you mentioned could be a contender. 
The _msg_callback I'm currently using _does_ do the trick but maybe shouldn't 
be documented and made official? Regardless of how best to solve my current 
acme-tls/1 ALPN detection issue, the sni_callback won't ever be the full answer 
unless some internal mechanics are added to watch ClientHello and preemptively 
peek at the requested ALPN protocol(s).

--

___
Python tracker 
<https://bugs.python.org/issue43582>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43582] SSLContext.sni_callback docs inaccurately describe available handshake info

2021-03-21 Thread Andrew Dailey


Andrew Dailey  added the comment:

Yea, I noticed that through some of my digging. The ALPN callback is used to 
implement SSLContext.set_alpn_protocols() but full control of the callback 
isn't exposed. Aside from adjusting how the ALPN callback used, do you know of 
any other way to swap contexts once the selected ALPN proto is known but not 
before it's too late? As I said before, I'm not super familiar with Python / 
OpenSSL internals but maybe overriding SSLSocket.do_handshake() would suffice? 
I don't want this issue to get too far off track.

I'm still doing research on how I'd go about drafting and submitting a formal 
patch here on the issue tracker. I'm new to this process but definitely want to 
help out as much as I can.

Here's my current idea for how to adjust the documentation given the current 
behavior / capabilities.

CURRENT:
Due to the early negotiation phase of the TLS connection, only limited methods 
and attributes are usable like SSLSocket.selected_alpn_protocol() and 
SSLSocket.context. SSLSocket.getpeercert(), SSLSocket.getpeercert(), 
SSLSocket.cipher() and SSLSocket.compress() methods require that the TLS 
connection has progressed beyond the TLS Client Hello and therefore will not 
contain return meaningful values nor can they be called safely.

REVISED:
Based on the value of `sni_name`, a new SSLContext can be created and attached 
to the current SSLSocket. Due to the early negotiation phase of the TLS 
connection, only the Client Hello will have occurred by the time this callback 
is called. Methods and attributes such as SSLSocket.selected_alpn_protocol(), 
SSLSocket.getpeercert(), SSLSocket.cipher(), and SSLSocket.compress() require 
that the TLS connection has progressed beyond the TLS Client Hello and 
therefore will not contain return meaningful values nor can they be called 
safely.

--

___
Python tracker 
<https://bugs.python.org/issue43582>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43582] SSLContext.sni_callback docs inaccurately describe available handshake info

2021-03-25 Thread Andrew Dailey


Andrew Dailey  added the comment:

Okay, that makes sense. I appreciate the overview! What do you feel is the best 
way to "solve" this issue, then? Should the docs be updated to reflect the fact 
that ALPN info isn't available in the sni_callback? Or should some code be 
modified to make the docs correct (even though that'd have to be a bit hacky 
since the OpenSSL handshake callback order seems fairly set in stone).

I've got the contributor agreement signed and ready to go. I can formalize my 
ideas for revising the docs into a patch if that would make sense and be 
helpful.

--

___
Python tracker 
<https://bugs.python.org/issue43582>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com