Jan-Philip Gehrcke added the comment: Hello!
Like everybody in this thread I would love to see this land and have prepared a new patch, hoping that we can process this still for 3.6. Antoine summarized the core task here very well: > Let's stay focused on what is > necessary to solve this use case, IMO (i.e. specify non-file-backed data > as a certification chain or private key). What follows is what I believe is the consensus in this thread about a minimum viable solution (points mentioned earlier, enriched with some detail): - The `certfile` and `keyfile` arguments to `SSLContext.load_cert_chain()` can both be either a file path or a file object. - `str` or `bytes` values to both, `certfile` and `keyfile`, are interpreted as paths. - The `password` behavior must stay as is, independent of `certfile`/`keyfile` being file object or path. - When handing file paths to `load_cert_chain()`, the behavior must stay as is, in all detail. - When handing file objects to `load_cert_chain()`, - use the OpenSSL API call `PEM_read_bio_PrivateKey()` for loading the key data (reading PEM data is what's currently supported). - support the case where the `certfile` file object first presents a private key and then (a) certificate(s), as currently documented: """ Often the private key is stored in the same file as the certificate; in this case, only the certfile parameter to SSLContext.load_cert_chain() and wrap_socket() needs to be passed. If the private key is stored with the certificate, it should come before the first certificate in the certificate chain: """ Ref: - https://docs.python.org/3/library/ssl.html#combined-key-and-certificate - resemble the OpenSSL API call `SSL_CTX_use_certificate_chain_file()` behavior for loading the certificate data (as this is what's currently used in the `SSLContext.load_cert_chain()` implementation in _ssl.c. There is no direct correspondence for this function reading the data directly from memory BIOs. Relevant documentation for resembling that behavior: """ SSL_CTX_use_certificate_chain_file() adds the first certificate found in the file to the certificate store. The other certificates are added to the store of chain certificates using SSL_CTX_add1_chain_cert. """ """ For a longer chain, the client must send the complete chain [...]. This can only be accomplished by either adding the intermediate CA certificates into the trusted certificate store for the SSL_CTX object [...], or by adding the chain certificates using the SSL_CTX_add_extra_chain_cert function, """ """ When building its own certificate chain, an OpenSSL client/server will try to fill in missing certificates from CAfile/CApath, if the certificate chain was not explicitly specified (see SSL_CTX_add_extra_chain_cert, SSL_CTX_use_certificate. """ Refs: - https://github.com/python/cpython/blob/0852878a81edd5c16776a68ce34c45cca233deae/Modules/_ssl.c#L2824 - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_use_certificate.html - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_set_client_cert_cb.html - https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_load_verify_locations.html When inspecting the OpenSSL code for SSL_CTX_use_certificate_chain_file(), some of the above statements are not entirely correct. After inspection, to me the safest bet seems to keep the memory/BIO-based load_cert_chain() implementation as close as possible to SSL_CTX_use_certificate_chain_file() from OpenSSL 1.0.2h: https://fossies.org/dox/openssl-1.0.2h/ssl__rsa_8c_source.html#l00685 use_certificate_chain_file() in OpenSSL 1.1.0-pre6: https://fossies.org/dox/openssl-1.1.0-pre6/ssl__rsa_8c_source.html#l00599 Using this as a template (instead of coming up with a solution purely based on OpenSSL docs) ensures that this part of the current `SSLContext.load_cert_chain()` docs will be remain correct: """ The certfile [...] in PEM format containing the certificate as well as any number of CA certificates needed to establish the certificate’s authenticity. """ More behavioral detail that I planned with: - `certfile` and `keyfile`, if given both, must both be a file path or both be a file object. A mixture is undefined behavior and leads to an error depending on the specifics of the mixture. - If `certfile` and `keyfile` are not file paths, expect them to have a read() method and just call it. Let a TypeError or AttributeError be handled by the caller (as done elsewhere in the stdlib). - Use the existing Python MemoryBIO abstractions introduced by Geert Jansen in this beautiful patch: https://github.com/python/cpython/commit/08802a11fdb844a5aaec3a1dc67d8281d582d22b That is, prepare these BIOs in Python code and pass them to an underlying C function that consumes BIOs. A summary of a few things that are in my opinion beyond a minimal implementation (but probably useful in the future!): - Private key deserialization using d2i_PKCS8PrivateKey_bio(). The current implementation only supports SSL_FILETYPE_PEM and some of the semantics of the load_cert_chain() method is specifically adjusted to PEM features. - Certificate deserialization supporting formats other than PEM (the current implementation uses SSL_CTX_use_certificate_chain_file() which only supports PEM). - New types for certificates and keys. Supporting additional serialization formats greatly increases the complexity of the task at hand here. We can save that for later. An important note on backwards compatibility: >> Antoine, are you suggesting that we remove the current c-level >> capability to use file system files (using open()) and just go with >> raw bytes data at the C api level, and then do the 'filename or >> filelike object' in Python land? > > Yes, I think that's reasonable. Hopefully it won't break any existing uses. I absolutely agree that Python code should be responsible for doing the distinction between file paths and file objects. However, I do not agree with > remove the current c-level capability to use file system files After following the code I am rather sure that we should leave the current C implementation (`_ssl__SSLContext_load_cert_chain_impl()` in Modules/_ssl.c) as untouched as possible, and not try to integrate the new logic into that very same function. Instead, I figured, we should use Python code to delegate to one of two different C functions after the distinction between file path and file object has been made, whereas one of the two C functions is the one we used so far. The central argument for having two separate C code paths is that old code must proceed using OpenSSL's SSL_CTX_use_certificate_chain_file(). Hence, I opted for implementing SSLContext.load_cert_chain() in Python, delegating the actual work * to either the current load_cert_chain C implementation if file paths were given or * to a new C implementation resembling SSL_CTX_use_certificate_chain_file(), after read()ing the file object(s) and creating the corresponding MemoryBIO object(s) containing the cert/key data. Comments on tests: * The diff seems radical due to restructuring. I have manually confirmed that the old tests pass against the new code. * Major goal of the test refactoring was to take as much of the old test code as possible (now `_test_load_cert_chain_core()`) and automatically test it with three different input data types: * file paths, as before (now `test_load_cert_chain_core_filepaths()`) * ByteIO buffers (`test_load_cert_chain_core_byte_buffers()`) * StringIO buffers (`test_load_cert_chain_core_text_buffers()`) * Also, there are new test methods specifically targeting those behavioral details that the current load_cert_chain() implementation and the new implementation do _not_ have in common. This mainly accounts for new (better, IMO) error messages created by the OpenSSL API calls in the BIO-based implementation. * In just _one_ of the more complex tests involving actual client/server communication, I have replaced `SIGNED_CERTFILE` (a path) with `bytebuf(SIGNED_CERTFILE)`. Now I'd appreciate your review! Jan-Philip ---------- nosy: +jgehrcke Added file: http://bugs.python.org/file44092/python_16487_jgehrcke_01.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue16487> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com