Hi Arvid, thanks for the reply. Our stores are world-readable, so I don’t think that it’s an access issue. All of our clients have the stores present through a shared mount as well. I’m able to see the shipped stores in the directory.info output when pulling the YARN logs, and can confirm the account submitting the application has correct privileges.
The exception I shared occurs during the cluster deployment phase. Here’s the full stack trace:

2021-04-26 13:37:17,468 [main] ERROR ClusterEntrypoint - Could not start cluster entrypoint YarnSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnSessionClusterEntrypoint.
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:182)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:501)
    at org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint.main(YarnSessionClusterEntrypoint.java:93)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
    at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:210)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:163)
    ... 2 more
Caused by: org.apache.flink.util.ConfigurationException: Failed to initialize SSLEngineFactory for REST server endpoint.
    at org.apache.flink.runtime.rest.RestServerEndpointConfiguration.fromConfiguration(RestServerEndpointConfiguration.java:162)
    at org.apache.flink.runtime.rest.SessionRestEndpointFactory.createRestEndpoint(SessionRestEndpointFactory.java:54)
    at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:150)
    ... 9 more
Caused by: java.nio.file.NoSuchFileException: /home/user/ssl/deploy-keys/rest.keystore
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at org.apache.flink.runtime.net.SSLUtils.getKeyManagerFactory(SSLUtils.java:266)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:392)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:365)
    at org.apache.flink.runtime.net.SSLUtils.createRestServerSSLEngineFactory(SSLUtils.java:163)
    at org.apache.flink.runtime.rest.RestServerEndpointConfiguration.fromConfiguration(RestServerEndpointConfiguration.java:160)
    ... 11 more

Given the number of machines in our YARN compute cluster, we’d really like to avoid copying the stores to each machine, as that would add another configuration step each time a machine is added or replaced. The YARN shipping feature is really what we need.
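
To make the path question concrete, here is a minimal sketch in Python (not Flink code) of how a configured store path is resolved by the process that opens it. It assumes only the general rule that an absolute path is looked up as-is on the local machine, while a relative path is looked up against the process's working directory. The helper name resolve_store and the temporary directory standing in for a YARN container's working directory are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_store(configured: str, working_dir: Path) -> Path:
    """Return the file a process started in working_dir would try to open."""
    path = Path(configured)
    # Absolute paths ignore the working directory; relative ones are joined to it.
    return path if path.is_absolute() else working_dir / path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a container working directory that
        # holds a shipped deploy-keys/ directory.
        container = Path(tmp)
        (container / "deploy-keys").mkdir()
        (container / "deploy-keys" / "rest.keystore").write_bytes(b"placeholder")

        for configured in (
            "/home/user/ssl/deploy-keys/rest.keystore",
            "deploy-keys/rest.keystore",
        ):
            target = resolve_store(configured, container)
            print(f"{configured} -> {target} (exists: {target.exists()})")
```

Under that assumption, one configured value can resolve to an existing file in one process and to a missing file in another, depending on where each process runs and what its working directory is. Whether Flink applies any further handling to these options is exactly what I'm asking about.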
The documentation [1] says that we should be able to ship the stores directly from our client:

    flink run -m yarn-cluster -yt deploy-keys/ flinkapp.jar

But it doesn’t provide an example of the corresponding change to flink-conf.yaml that supports shipped stores. If the stores are available in a local directory called /home/user/ssl/deploy-keys/, and we’re shipping that directory through the -yt option, what do the values of:

1. security.ssl.rest.keystore
2. security.ssl.rest.truststore

need to be in order for this to work? Happy to share our failed application’s YARN logs with you if you require them.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/security-ssl.html#tips-for-yarn--mesos-deployment

// ah

From: Arvid Heise <ar...@apache.org>
Sent: Wednesday, April 21, 2021 1:05 PM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
Cc: user@flink.apache.org
Subject: Re: [1.9.2] Flink SSL on YARN - NoSuchFileException

Hi Andreas,

I'd check where the exception occurs (not clear from what you posted) and double-check that that part of the system can access the given path deploy-keys/rest.keystore. The brute-force solution is to manually copy the files onto all worker nodes in the respective directory + potentially the client.

On Mon, Apr 19, 2021 at 4:45 PM Hailu, Andreas [Engineering] <andreas.ha...@gs.com> wrote:

Hi Flink team,

I’m trying to configure Flink on YARN with SSL enabled. I’ve followed the documentation’s instructions [1] to generate a keystore and truststore locally, and added the properties to my flink-conf.yaml:

    security.ssl.rest.keystore: /home/user/ssl/deploy-keys/rest.keystore
    security.ssl.rest.truststore: /home/user/ssl/deploy-keys/rest.truststore

I’ve also added the yarnship option so that the keystore and truststore are deployed as suggested in [1]:

    -m yarn-cluster --class <class> [...] -yt /home/user/ssl/deploy-keys/

However, starting the Flink cluster results in a NoSuchFileException:

Caused by: java.nio.file.NoSuchFileException: /home/user/ssl/deploy-keys/rest.keystore
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at org.apache.flink.runtime.net.SSLUtils.getKeyManagerFactory(SSLUtils.java:266)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:392)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:365)
    at org.apache.flink.runtime.net.SSLUtils.createRestServerSSLEngineFactory(SSLUtils.java:163)
    at org.apache.flink.runtime.rest.RestServerEndpointConfiguration.fromConfiguration(RestServerEndpointConfiguration.java:160)

I can see in launch_container.sh that the shipped directory was created successfully:

    mkdir -p deploy-keys
    ln -sf "/fs/htmp/yarn/local/usercache/delp/appcache/application_1618711298408_2664/filecache/16/rest.truststore" "deploy-keys/rest.truststore"
    mkdir -p deploy-keys
    ln -sf "/fs/htmp/yarn/local/usercache/delp/appcache/application_1618711298408_2664/filecache/13/rest.keystore" "deploy-keys/rest.keystore"

So given the above, I tried editing flink-conf.yaml to reflect what I saw:

    security.ssl.rest.keystore: deploy-keys/rest.keystore
    security.ssl.rest.truststore: deploy-keys/rest.truststore

But that didn’t work either:

Caused by: java.nio.file.NoSuchFileException: deploy-keys/rest.truststore
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at org.apache.flink.runtime.net.SSLUtils.getTrustManagerFactory(SSLUtils.java:233)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:397)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:365)
    at org.apache.flink.runtime.net.SSLUtils.createRestClientSSLEngineFactory(SSLUtils.java:181)
    at org.apache.flink.runtime.rest.RestClientConfiguration.fromConfiguration(RestClientConfiguration.java:106)

What needs to be done to get the YARN application to point to the right keystore and truststore?

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/security-ssl.html#tips-for-yarn--mesos-deployment

____________
Andreas Hailu
Data Lake Engineering | Goldman Sachs & Co.
________________________________
Your Personal Data: We may collect and process information about you that may be subject to data protection laws.
For more information about how we use and disclose your personal data, how we protect your information, our legal basis to use your information, your rights and who you can contact, please refer to: www.gs.com/privacy-notices