[ https://issues.apache.org/jira/browse/FLINK-36370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl updated FLINK-36370:
----------------------------------
    Affects Version/s: 1.18.1
                       kubernetes-operator-1.7.0

> Flink 1.18 fails with Empty server certificate chain when High Availability and mTLS both enabled
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36370
>                 URL: https://issues.apache.org/jira/browse/FLINK-36370
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: kubernetes-operator-1.7.0, 1.18.1
>            Reporter: Aniruddh J
>            Priority: Major
>         Attachments: flink-cert-issue.log
>
>
> Hi, in my Kubernetes cluster I have flink-kubernetes-operator v1.7.0 and
> apache-flink v1.18.1 installed. In the FlinkDeployment CR, when I enable
> Kubernetes high availability services together with mTLS, as below:
> {code:java}
> high-availability.type: kubernetes
> high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> high-availability.storageDir: 'file:///mnt/pv/ha'
> security.ssl.rest.authentication-enabled: 'true'{code}
> I end up with an *SSLHandshakeException with empty client certificate*.
>
> Both features work fine when enabled individually. After enabling
> {{-Djavax.net.debug=all}} I observed the client-server communication and
> figured out that
> [https://github.com/apache/flink/blob/release-1.18/flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClient.java]
> is where the client gets set up, and that this happens from the operator side at
> [https://github.com/apache/flink-kubernetes-operator/blob/b081b75b72ddde643710e869b95b214912882363/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L750]
> (please correct me if I am wrong).
>
> When we enable both mTLS and HA, the client does not seem to get set up.
> Not only that, it does not follow the same path of client creation.
> Below is the part of the SSL handshake log just before the error (the
> entire SSL handshake log is attached):
> {code:java}
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.508 GMT|null:-1|Produced CertificateRequest handshake message (
> "CertificateRequest":
> { "certificate types": [ecdsa_sign, rsa_sign, dss_sign] "supported signature algorithms": [ecdsa_secp256r1_sha256, .., rsa_sha224, dsa_sha224, ecdsa_sha1, rsa_pkcs1_sha1, dsa_sha1] "certificate authorities": [CN=FlinkCA, O=Apache Flink] }
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.512 GMT|null:-1|Raw read (
> 0000: 1603030007 0B 000003000000 ............
> )
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.513 GMT|null:-1|READ: TLSv1.2 handshake, length = 7
> javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.513 GMT|null:-1|Consuming client Certificate handshake message (
> "Certificates": <empty list>
> )
> javax.net.ssl|ERROR|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.514 GMT|null:-1|Fatal (BAD_CERTIFICATE): Empty server certificate chain (
> "throwable" : {
> javax.net.ssl.SSLHandshakeException: Empty server certificate chain
> {code}
> At first glance it seems that when the Flink server requests a certificate
> from the client, the client sends nothing back, perhaps because it has no
> certificate matching the CA?
>
> Some client is sending a REST request to the Flink server, which the netty
> library is handling. Until we identify that client, we cannot tell whether
> the problem is the truststore on the client or something else we do not see
> here.
>
> *Note: The certificates for Flink are self-signed certificates.*
> Thanks!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
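
The "Raw read" bytes in the handshake log above can be decoded by hand to confirm what the client actually sent. The following is only a minimal sketch, assuming the standard TLS 1.2 record and handshake layout (5-byte record header, 1-byte handshake type, 3-byte body length, 3-byte certificate_list length). The class and method names are hypothetical and are not part of Flink or the operator.

```java
// Hypothetical helper for reading the 12 bytes shown in the "Raw read" log entry.
class TlsCertificatePeek {

    /** Parses a hex string (spaces allowed) into bytes. */
    static byte[] fromHex(String hex) {
        String s = hex.replace(" ", "");
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Reads an unsigned big-endian integer of n bytes starting at off. */
    static int uint(byte[] b, int off, int n) {
        int v = 0;
        for (int i = 0; i < n; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }

    /**
     * Returns the certificate_list length of a TLS 1.2 record carrying a
     * Certificate handshake message, or -1 if the record is something else.
     */
    static int certificateListLength(byte[] rec) {
        if (rec.length < 12) {
            return -1;
        }
        int contentType = uint(rec, 0, 1);   // 0x16 = handshake
        int handshakeType = uint(rec, 5, 1); // 0x0B = certificate
        if (contentType != 0x16 || handshakeType != 0x0B) {
            return -1;
        }
        return uint(rec, 9, 3);
    }

    public static void main(String[] args) {
        // Bytes copied from the "Raw read" entry in the log.
        byte[] rec = fromHex("1603030007 0B 000003000000");
        System.out.println("record length         = " + uint(rec, 3, 2));
        System.out.println("handshake body length = " + uint(rec, 6, 3));
        System.out.println("certificate_list len  = " + certificateListLength(rec));
    }
}
```

Under these assumptions the record is a Certificate handshake message whose certificate_list length is 0, which agrees with the `"Certificates": <empty list>` entry: the client answered the CertificateRequest but attached no certificate.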