
Aniruddh J updated FLINK-36370:

> Flink 1.18 fails with Empty server certificate chain when High Availability 
> and mTLS both enabled
> -------------------------------------------------------------------------------------------------
>                 Key: FLINK-36370
>                 URL: https://issues.apache.org/jira/browse/FLINK-36370
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator, Runtime / Coordination
>    Affects Versions: kubernetes-operator-1.7.0, 1.18.1
>            Reporter: Aniruddh J
>            Priority: Major
>         Attachments: flink-cert-issue.log, 
> flink-kubernetes-operator-54b9b99bd5-hkh8q-flink-kubernetes-operator.log, 
> flink-ssl-66c8dfbcc7-l725q-flink-main-container.log
> Hi, in my Kubernetes cluster I have flink-kubernetes-operator v1.7.0 and 
> Apache Flink v1.18.1 installed. In the FlinkDeployment CR, when I enable 
> Kubernetes high availability services together with mTLS, as shown below:
> {code:java}
> high-availability.type: kubernetes
> high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> high-availability.storageDir: 'file:///mnt/pv/ha'
> security.ssl.rest.authentication-enabled: 'true'{code}
> I end up with an *SSLHandshakeException with empty client certificate*, 
> even though each feature works fine when enabled on its own. After enabling 
> *{{-Djavax.net.debug=all}}* I observed the client-server communication and 
> found that 
> [https://github.com/apache/flink/blob/release-1.18/flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClient.java]
>  is where the client gets set up, and that this happens from the operator side at 
> [https://github.com/apache/flink-kubernetes-operator/blob/b081b75b72ddde643710e869b95b214912882363/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L750]
>  (please correct me if this is wrong).
> When both mTLS and HA are enabled, the client does not seem to get set up. 
> Not only that, it does not follow the same client creation path. Below is 
> the part of the SSL handshake log just before the error (the full SSL 
> handshake log is attached):
> {code:java}
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 
> 15:16:12.508 GMT|null:-1|Produced CertificateRequest handshake message (
> "CertificateRequest":
> { "certificate types": [ecdsa_sign, rsa_sign, dss_sign] "supported signature 
> algorithms": [ecdsa_secp256r1_sha256, .., rsa_sha224, dsa_sha224, ecdsa_sha1, 
> rsa_pkcs1_sha1, dsa_sha1] "certificate authorities": [CN=FlinkCA, O=Apache 
> Flink] }
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 
> 15:16:12.512 GMT|null:-1|Raw read (
> 0000: 1603030007 0B 000003000000 ............
> )
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 
> 15:16:12.513 GMT|null:-1|READ: TLSv1.2 handshake, length = 7
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 
> 15:16:12.513 GMT|null:-1|Consuming client Certificate handshake message (
> "Certificates": <empty list>
> )
> javax.net.ssl|ERROR|53|flink-rest-server-netty-worker-thread-1|2024-09-19 
> 15:16:12.514 GMT|null:-1|Fatal (BAD_CERTIFICATE): Empty server certificate 
> chain (
> "throwable" : {
> javax.net.ssl.SSLHandshakeException: Empty server certificate chain
> {code}
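
The 12-byte "Raw read" record in the log above can be decoded by hand to confirm what the server received. The following is a minimal sketch, not part of the original report; the function name `decode_certificate_record` is hypothetical, and the field layout assumed is the standard TLS 1.2 record and handshake framing (1-byte content type, 2-byte version, 2-byte record length, then a 1-byte handshake type and two 3-byte lengths).

```python
# Bytes copied from the "Raw read" line of the handshake log above.
RAW = bytes.fromhex("1603030007" "0B" "000003000000")


def decode_certificate_record(raw: bytes) -> dict:
    """Decode a single TLS 1.2 record that carries a Certificate message."""
    content_type = raw[0]                              # 0x16 = handshake
    version = (raw[1], raw[2])                         # (3, 3) = TLS 1.2
    record_length = int.from_bytes(raw[3:5], "big")    # length of the record body
    body = raw[5:5 + record_length]
    handshake_type = body[0]                           # 0x0B = Certificate
    handshake_length = int.from_bytes(body[1:4], "big")
    cert_list_length = int.from_bytes(body[4:7], "big")
    return {
        "content_type": content_type,
        "version": version,
        "record_length": record_length,
        "handshake_type": handshake_type,
        "handshake_length": handshake_length,
        "cert_list_length": cert_list_length,
    }


decoded = decode_certificate_record(RAW)
```

The record length of 7 agrees with the "READ: TLSv1.2 handshake, length = 7" line, and a certificate list length of 0 agrees with the `"Certificates": <empty list>` line: the peer answered the CertificateRequest with a Certificate message that contains no certificates.
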
> At first glance it seems that when the Flink server requests a certificate 
> from the client, the client sends nothing back, possibly because it has no 
> certificate matching the CA?
> Some client is sending a REST request to the Flink server, which the Netty 
> library handles, but until we identify that client we cannot tell whether the 
> problem is the truststore on the client or something else not visible here.
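
One way to narrow this down could be to probe the REST endpoint with a standalone TLS client, once with and once without a client certificate. The sketch below is an assumption-laden illustration, not something from the report: `build_client_context` is a hypothetical helper, the file paths are placeholders, and it assumes the Flink CA and client key material have been exported to PEM (Python's `ssl` module does not read JKS keystores).

```python
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None,
                         cert_file: Optional[str] = None,
                         key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context, optionally presenting a client certificate."""
    # PROTOCOL_TLS_CLIENT enables server certificate and hostname verification.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        # Trust the (self-signed) CA that issued the server certificate.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Without this call the client has no certificate to offer when the
        # server sends a CertificateRequest.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


# Context with no client certificate, i.e. the suspected failing case.
no_cert_ctx = build_client_context()
```

Wrapping a socket to the REST port with `no_cert_ctx.wrap_socket(sock, server_hostname=host)` after loading only the CA should reproduce the server-side error if mutual authentication is enforced, whereas a context built with a certificate issued by the same CA should complete the handshake. If the second case works, the server configuration is likely fine and the remaining question is which client is connecting without a certificate.
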
> *Note: The certificates for Flink are self-signed certificates.*
> Thanks!

This message was sent by Atlassian Jira
