[ https://issues.apache.org/jira/browse/FLINK-36370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aniruddh J updated FLINK-36370:
-------------------------------
    Attachment: flink-cert-issue.log
    Description:
Hi, in my Kubernetes cluster I have flink-kubernetes-operator v1.7.0 and apache-flink v1.18.1 installed. In the FlinkDeployment CR, when I enable Kubernetes high availability services together with mTLS, using something like the following:
{code:java}
high-availability.type: kubernetes
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: 'file:///mnt/pv/ha'
security.ssl.rest.authentication-enabled: 'true'{code}
I am ending up with *SSLHandshakeException with empty client certificate*.

Each of the two features works fine when enabled on its own. After enabling *{{-Djavax.net.debug=all}}* I observed the client-server communication and figured out that [https://github.com/apache/flink/blob/release-1.18/flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClient.java] is where the client gets set up, and that this happens from the operator side at [https://github.com/apache/flink-kubernetes-operator/blob/b081b75b72ddde643710e869b95b214912882363/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L750] (please correct me if I am wrong).

When both mTLS and HA are enabled, the client does not seem to get set up. Not only that, client creation does not follow the same code path. Below is the part of the SSL handshake log just before the error (the entire SSL handshake log is attached):
{code:java}
javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.508 GMT|null:-1|Produced CertificateRequest handshake message (
"CertificateRequest": {
  "certificate types": [ecdsa_sign, rsa_sign, dss_sign]
  "supported signature algorithms": [ecdsa_secp256r1_sha256, .., rsa_sha224, dsa_sha224, ecdsa_sha1, rsa_pkcs1_sha1, dsa_sha1]
  "certificate authorities": [CN=FlinkCA, O=Apache Flink]
}
javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.512 GMT|null:-1|Raw read (
  0000: 1603030007 0B 000003000000 ............
)
javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.513 GMT|null:-1|READ: TLSv1.2 handshake, length = 7
javax.net.ssl|DEBUG|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.513 GMT|null:-1|Consuming client Certificate handshake message (
"Certificates": <empty list>
)
javax.net.ssl|ERROR|53|flink-rest-server-netty-worker-thread-1|2024-09-19 15:16:12.514 GMT|null:-1|Fatal (BAD_CERTIFICATE): Empty server certificate chain (
"throwable" : {
javax.net.ssl.SSLHandshakeException: Empty server certificate chain
{code}
At first glance it looks like the Flink server requests a certificate from the client and the client sends nothing back, perhaps because it has no certificate matching the CA?

Some client is sending a REST request to the Flink server, which the Netty library is handling, but until we identify that client we cannot tell whether the problem is the truststore on the client or something else we don't see here.

Thanks!

> Flink 1.18 fails with Empty server certificate chain when High Availability and mTLS both enabled
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36370
>                 URL: https://issues.apache.org/jira/browse/FLINK-36370
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Aniruddh J
>            Priority: Minor
>         Attachments: flink-cert-issue.log

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
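
The 12-byte {{Raw read}} in the handshake log above can be decoded by hand to confirm what the server received. The following is a minimal sketch, assuming the hex dump in the log is the complete TLS 1.2 record; the function name and the dictionary keys are illustrative and are not part of Flink or the JDK.

```python
import struct


def decode_tls_certificate_record(hex_dump: str) -> dict:
    """Decode one TLS record that carries a single Certificate handshake message."""
    data = bytes.fromhex(hex_dump.replace(" ", ""))

    # Record header: content type (1 byte), version (2 bytes), length (2 bytes).
    content_type, major, minor, record_len = struct.unpack(">BBBH", data[:5])
    body = data[5:5 + record_len]

    # Handshake header: message type (1 byte), length (3 bytes).
    handshake_type = body[0]
    handshake_len = int.from_bytes(body[1:4], "big")

    # Certificate message body: 3-byte length of the certificate list.
    cert_list_len = int.from_bytes(body[4:7], "big")

    return {
        "content_type": content_type,      # 0x16 = handshake
        "version": (major, minor),         # (3, 3) = TLS 1.2
        "record_len": record_len,
        "handshake_type": handshake_type,  # 0x0B = Certificate
        "handshake_len": handshake_len,
        "cert_list_len": cert_list_len,    # 0 = no certificates sent
    }


# Hex bytes copied from the "Raw read" entry in the log.
record = decode_tls_certificate_record("1603030007 0B 000003000000")
print(record)
```

Under that assumption, the record is a TLS 1.2 handshake record of length 7 that holds a Certificate message whose certificate list has length 0. This agrees with the {{READ: TLSv1.2 handshake, length = 7}} and {{"Certificates": <empty list>}} entries: the client did answer the CertificateRequest, but with an empty certificate list.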