Hi Arvid, thanks for the reply.
Our stores are world-readable, so I don’t think that it’s an access issue. All
of our clients have the stores present through a shared mount as well. I’m able
to see the shipped stores in the directory.info output when pulling the YARN
logs, and can confirm the account submitting the application has correct
privileges.
The exception I shared occurs during the cluster deployment phase. Here’s the
full stacktrace:
2021-04-26 13:37:17,468 [main] ERROR ClusterEntrypoint - Could not start cluster entrypoint YarnSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnSessionClusterEntrypoint.
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:182)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:501)
    at org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint.main(YarnSessionClusterEntrypoint.java:93)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
    at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:210)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:163)
    ... 2 more
Caused by: org.apache.flink.util.ConfigurationException: Failed to initialize SSLEngineFactory for REST server endpoint.
    at org.apache.flink.runtime.rest.RestServerEndpointConfiguration.fromConfiguration(RestServerEndpointConfiguration.java:162)
    at org.apache.flink.runtime.rest.SessionRestEndpointFactory.createRestEndpoint(SessionRestEndpointFactory.java:54)
    at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:150)
    ... 9 more
Caused by: java.nio.file.NoSuchFileException: /home/user/ssl/deploy-keys/rest.keystore
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at org.apache.flink.runtime.net.SSLUtils.getKeyManagerFactory(SSLUtils.java:266)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:392)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:365)
    at org.apache.flink.runtime.net.SSLUtils.createRestServerSSLEngineFactory(SSLUtils.java:163)
    at org.apache.flink.runtime.rest.RestServerEndpointConfiguration.fromConfiguration(RestServerEndpointConfiguration.java:160)
    ... 11 more
Given the number of machines in our YARN compute cluster, we’d really like to
avoid having to copy the stores to each machine, as that would add another
configuration step each time a machine is replaced, added, etc. The YARN
shipping feature is really what we need.
The documentation [1] says that we should be able to ship the stores directly
from our client:
flink run -m yarn-cluster -yt deploy-keys/ flinkapp.jar
But it doesn’t provide an example of the requisite change made in the
flink-conf.yaml that supports shipped stores.
If we consider that we have the stores available in a local directory called
/home/user/ssl/deploy-keys/, and we’re shipping the directory through the -yt
option, what do the values of:
1. security.ssl.rest.keystore
2. security.ssl.rest.truststore
need to be in order for this to work? Happy to share our failed application’s
YARN logs with you if you require them.
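A minimal sketch of the path-resolution behaviour behind this question, offered as an assumption rather than a confirmed diagnosis. A relative value such as deploy-keys/rest.keystore is resolved against the working directory of whichever process reads it. The relative-path attempt quoted further down fails inside RestClientConfiguration, which suggests it was the client, not the YARN container, that could not resolve the path. In the sketch, a temporary directory stands in for /home/user/ssl, and check_stores is a hypothetical helper, not part of Flink.

```shell
#!/bin/sh
# Illustration only: a temporary directory stands in for /home/user/ssl.
base=$(mktemp -d)
mkdir -p "$base/deploy-keys"
: > "$base/deploy-keys/rest.keystore"
: > "$base/deploy-keys/rest.truststore"

# check_stores is a hypothetical helper (not a Flink command).
# It reports whether the relative store paths resolve when the
# working directory is $1.
check_stores() {
    (
        cd "$1" || exit 1
        for f in deploy-keys/rest.keystore deploy-keys/rest.truststore; do
            [ -r "$f" ] || { echo "missing: $f (cwd=$1)"; exit 1; }
        done
        echo "ok"
    )
}

# From the parent of deploy-keys/ the relative paths resolve.
check_stores "$base"

# From any other directory the same relative paths do not resolve.
check_stores "$base/deploy-keys" || true

# Remove "$base" by hand when finished experimenting.
```

If that assumption holds, launching flink run from /home/user/ssl with -yt deploy-keys/ would let the same relative values resolve both on the client and in the container working directory, where launch_container.sh links deploy-keys/. This has not been verified against 1.9.2.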
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/security-ssl.html#tips-for-yarn--mesos-deployment
// ah
From: Arvid Heise <[email protected]>
Sent: Wednesday, April 21, 2021 1:05 PM
To: Hailu, Andreas [Engineering] <[email protected]>
Cc: [email protected]
Subject: Re: [1.9.2] Flink SSL on YARN - NoSuchFileException
Hi Andreas,
I'd check where the exception occurs (not clear from what you posted) and
double-check that that part of the system can access the given path
deploy-keys/rest.keystore.
The brute-force solution is to manually copy the files into the respective
directory on all worker nodes, and potentially on the client.
On Mon, Apr 19, 2021 at 4:45 PM Hailu, Andreas [Engineering]
<[email protected]> wrote:
Hi Flink team,
I’m trying to configure Flink on YARN with SSL enabled. I’ve followed the
documentation’s instructions [1] to generate a keystore and truststore locally,
and added the properties to my flink-conf.yaml.
security.ssl.rest.keystore: /home/user/ssl/deploy-keys/rest.keystore
security.ssl.rest.truststore: /home/user/ssl/deploy-keys/rest.truststore
I’ve also added the yarnship option so that the keystore and truststore are
deployed as suggested in [1].
-m yarn-cluster --class <class> [...] -yt /home/user/ssl/deploy-keys/
However, starting the Flink cluster results in a NoSuchFileException:
Caused by: java.nio.file.NoSuchFileException: /home/user/ssl/deploy-keys/rest.keystore
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at org.apache.flink.runtime.net.SSLUtils.getKeyManagerFactory(SSLUtils.java:266)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:392)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:365)
    at org.apache.flink.runtime.net.SSLUtils.createRestServerSSLEngineFactory(SSLUtils.java:163)
    at org.apache.flink.runtime.rest.RestServerEndpointConfiguration.fromConfiguration(RestServerEndpointConfiguration.java:160)
I can see in launch_container.sh that the shipped directory was created
successfully:
mkdir -p deploy-keys
ln -sf "/fs/htmp/yarn/local/usercache/delp/appcache/application_1618711298408_2664/filecache/16/rest.truststore" "deploy-keys/rest.truststore"
mkdir -p deploy-keys
ln -sf "/fs/htmp/yarn/local/usercache/delp/appcache/application_1618711298408_2664/filecache/13/rest.keystore" "deploy-keys/rest.keystore"
So given the above logs, I tried editing flink-conf.yaml to reflect what I saw:
security.ssl.rest.keystore: deploy-keys/rest.keystore
security.ssl.rest.truststore: deploy-keys/rest.truststore
But that didn’t seem to work, either:
Caused by: java.nio.file.NoSuchFileException: deploy-keys/rest.truststore
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at org.apache.flink.runtime.net.SSLUtils.getTrustManagerFactory(SSLUtils.java:233)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:397)
    at org.apache.flink.runtime.net.SSLUtils.createRestNettySSLContext(SSLUtils.java:365)
    at org.apache.flink.runtime.net.SSLUtils.createRestClientSSLEngineFactory(SSLUtils.java:181)
    at org.apache.flink.runtime.rest.RestClientConfiguration.fromConfiguration(RestClientConfiguration.java:106)
What needs to be done to get the YARN application to point to the right
keystore and truststore?
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/security-ssl.html#tips-for-yarn--mesos-deployment
____________
Andreas Hailu
Data Lake Engineering | Goldman Sachs & Co.
________________________________
Your Personal Data: We may collect and process information about you that may
be subject to data protection laws. For more information about how we use and
disclose your personal data, how we protect your information, our legal basis
to use your information, your rights and who you can contact, please refer to:
www.gs.com/privacy-notices<http://www.gs.com/privacy-notices>