FYI: The 1.6 docs reflect the setup where internal and external SSL are separately configured, and where internal SSL uses client authentication.
https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/security-ssl.html

On Mon, Aug 13, 2018 at 8:54 AM, Stephan Ewen <se...@apache.org> wrote:

> Sounds good, Eron!
>
> Please go ahead...
>
> On Sat, Jul 28, 2018 at 1:33 AM, Eron Wright <eronwri...@gmail.com> wrote:
>
>> As an update to this thread, Stephan opted to split the internal/external
>> configuration (by providing overrides for a common SSL configuration):
>> https://github.com/apache/flink/pull/6326
>>
>> Note that Akka doesn't support hostname verification in its 'classic'
>> remoting implementation (though the new Artery implementation apparently
>> does), and such verification wouldn't apply to the client certificate
>> anyway. So the reality is that one should use a limited truststore
>> (never the system truststore) for Akka communication.
>>
>> On the question of routing external communication through the YARN resource
>> proxy or Mesos/DCOS admin router, the value proposition is:
>> a) simplifies service discovery on the part of external clients,
>> b) permits single sign-on (SSO) by delegating authentication to a central
>> authority,
>> c) facilitates access from outside the cluster, via a public address.
>> The main challenge is that the Flink client code must support a more
>> diverse array of authentication methods, e.g. Kerberos when communicating
>> with the YARN proxy.
>>
>> Given #6326, the next steps would be (unordered):
>> a) create an umbrella issue for the overall effort
>> b) dive into the authorization work for external communication
>> c) implement auto-generation of a certificate for internal communication
>> d) implement TLS on queryable state interface (FLINK-5029)
>>
>> I'll take care of (a) unless there is any objection.
>> -Eron
>>
>>
>> On Sun, May 13, 2018 at 5:45 AM Stephan Ewen <ewenstep...@gmail.com>
>> wrote:
>>
>> > Throwing in some more food for thought:
>> >
>> > An alternative to the above proposed separation of internal and external
>> > SSL would be the following:
>> >
>> > - We separate channel encryption and authentication
>> > - We use one common SSL layer (internal and external) that is in both
>> > cases only responsible for establishing an encrypted connection
>> > - Authentication / authorization internally is done by SASL with
>> > username/password or shared secret.
>> > - Authentication externally must be through a proxy and authorization
>> > based on validating HTTP headers set by the proxy, as discussed above.
>> >
>> > Advantages:
>> > - There is only one certificate needed, which could also be shared across
>> > applications
>> > - One or two lines in the config authenticate and authorize internal
>> > communication
>> > - One could possibly still fall back to the other mode by skipping
>> >
>> > Open Questions / Disadvantages
>> > - Given that hostname verification during SSL handshake is not possible
>> > in many setups, the encrypted channel is vulnerable to man-in-the-middle
>> > attacks without mutual authentication. Not sure how serious that is,
>> > because it would need an attacker to have compromised network nodes of the
>> > cluster already. Is that not a universal issue in the K8s world?
>> >
>> > This is anyways a bit hypothetical, because as long as we have Akka beneath
>> > the RPC layer, we cannot go with that approach.
>> >
>> > However, if we want to at least keep the door open towards something like
>> > that in the future, we would need to set up configuration in such a way
>> > that we have a "common SSL" configuration (keystore, truststore, etc.) and
>> > internal/external options that override those. That would anyways be
>> > helpful for backwards compatibility.
>> >
>> > @Eron - what are your thoughts on that?
>> >
>> >
>> > On Sun, May 13, 2018 at 1:40 AM, Stephan Ewen <ewenstep...@gmail.com>
>> > wrote:
>> >
>> > > Thank you for bringing this proposal up. It looks very good and we seem to
>> > > be thinking along very similar lines.
>> > >
>> > > Below are some comments and thoughts on the FLIP.
>> > >
>> > > *Internal vs. External Connectivity*
>> > >
>> > > That is a very helpful distinction, let's build on that.
>> > >
>> > > - I would suggest eventually treating all communication potentially coming
>> > > from users as external, meaning Client-to-Dispatcher,
>> > > Client-to-JobManager (trigger savepoint, change parallelism, ...), Web UI,
>> > > Queryable State.
>> > >
>> > > - That leaves communication that is only between JobManager/TaskManager/
>> > > ResourceManager/Dispatcher/HistoryServer as internal.
>> > >
>> > > - I am somewhat operating under the assumption that all external
>> > > communication will eventually be HTTP/REST. That works best with many
>> > > setups and is the basis for using service proxies that
>> > > handle authentication/authorization.
>> > >
>> > >
>> > > In Flink 1.5 and future versions, we have the following update there:
>> > >
>> > > - Akka is now strictly internal connectivity; the clients (except the
>> > > legacy client) do not use it any more.
>> > >
>> > > - The Blob Server will move to purely internal connectivity in Flink
>> > > 1.6, where a POST of a job to the Dispatcher has the jars and the JobGraph.
>> > > That is important for Kubernetes setups, where exposing the BlobServer and
>> > > querying the blob port causes quite some friction.
>> > >
>> > > - Treating queryable state as "internal connectivity" is fine for now.
>> > > We should treat it as "external" connectivity in the future if we move it
>> > > to HTTP/REST.
>> > >
>> > >
>> > > *Internal Connectivity and SSL Mutual Authentication*
>> > >
>> > > Simply activating SSL mutual authentication for the internal communication
>> > > is really low-hanging fruit.
>> > >
>> > > Activating client authentication for Akka, network stack Netty (and Blob
>> > > Server/Client in Flink 1.6) should require no change in the configurations
>> > > with respect to Flink 1.4. All processes are, with respect to internal
>> > > communication, simultaneously server and client endpoints. Because of that,
>> > > they already need KeyStore and TrustStore files for SSL handshakes, where
>> > > the TrustStore needs to trust the KeyStore Certificate.
>> > >
>> > > I personally favor the suggestion made to have a script that generates a
>> > > self-signed certificate and adds it to "conf" and updates the
>> > > configuration. That should be picked up by the Yarn and Mesos clients
>> > > anyways.
>> > >
>> > >
>> > > *External Connectivity*
>> > >
>> > > There is a huge surface area and I think we need to give users a way to
>> > > plug in their own tools.
>> > > From what I see (and after some discussions with Patrick and Gary) I think
>> > > it makes sense to look at proxies in a broad way, similar to the approach
>> > > Eron outlined.
>> > >
>> > > The basic approach could be like that:
>> > >
>> > > - Everything goes through HTTPS, so the proxy can work with HTTP headers.
>> > > - The proxy handles authentication and possibly authorization. The proxy
>> > > adds some header, for example a user name, a group id, an authorization
>> > > token.
>> > > - Flink can configure an implementation of an 'authorizer' or validator
>> > > on the headers to decide whether the request is valid.
>> > >
>> > > - Example 1: The proxy does authentication and adds the user name /
>> > > group as a header.
>> > > The Flink-side authorizer simply checks whether the
>> > > name is in the config (simple ACL-style scheme).
>> > > - Example 2: The proxy adds a JSON Web Token and the authorizer
>> > > validates that token.
>> > >
>> > > For secure connections between the Proxy and the Flink Endpoint I would
>> > > follow Eron's suggestion, to use separate KeyStores and TrustStores from
>> > > those for internal communication.
>> > >
>> > > For Yarn and Mesos, I would like to see if we could handle those again as
>> > > a special case of the proxies above:
>> > > - DCOS Admin Router forwards the user authentication token, so that
>> > > could be another authorizer implementation.
>> > > - In YARN we could see if we can implement the IP filter via such an
>> > > authorizer.
>> > >
>> > >
>> > > *Hostname Verification*
>> > >
>> > > For internal communication, and especially in dynamic environments like
>> > > Kubernetes, it is very hard to work with certificates and have hostname
>> > > verification on.
>> > >
>> > > If we assume internal communication works strictly with a shared secret
>> > > certificate and with client authentication, does hostname verification
>> > > actually still add security in that particular setup? My understanding was
>> > > that hostname verification is important so that the server presents not
>> > > just some valid certificate, but the one bound to the server you want to
>> > > talk to. If we have only one trusted certificate anyway, isn't that
>> > > already implied?
>> > >
>> > > On the other hand, it is still possible (and potentially valuable) for
>> > > users in standalone mode to use keystores and truststores from a PKI, in
>> > > which case there may still be an argument in favor of hostname verification.
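
Example 1 in the message above (the proxy authenticates and adds a user name header, and a Flink-side authorizer checks that name against a configured list) could be sketched roughly as follows. This is only an illustration of the idea, not a Flink API: the class name `HeaderAclAuthorizer`, the method `isAuthorized`, and the header name `X-Forwarded-User` are all assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical ACL-style authorizer; the names here are not part of Flink. */
final class HeaderAclAuthorizer {

    // Assumed name of the header that the authenticating proxy sets.
    static final String USER_HEADER = "X-Forwarded-User";

    private final Set<String> allowedUsers;

    HeaderAclAuthorizer(Set<String> allowedUsers) {
        // Defensive copy so later changes to the caller's set have no effect.
        this.allowedUsers = Collections.unmodifiableSet(new HashSet<>(allowedUsers));
    }

    /**
     * Accepts the request only if the proxy supplied a user name that is on
     * the configured list. A missing header is rejected. The lookup is
     * case-sensitive, so this sketch assumes the caller has already
     * normalized header names.
     */
    boolean isAuthorized(Map<String, String> headers) {
        String user = headers.get(USER_HEADER);
        return user != null && allowedUsers.contains(user);
    }

    public static void main(String[] args) {
        HeaderAclAuthorizer authorizer =
                new HeaderAclAuthorizer(new HashSet<>(Arrays.asList("alice", "bob")));

        Map<String, String> request = new HashMap<>();
        request.put(USER_HEADER, "alice");

        System.out.println("alice allowed: " + authorizer.isAuthorized(request));
        System.out.println("no header allowed: "
                + authorizer.isAuthorized(Collections.<String, String>emptyMap()));
    }
}
```

Such a check is only meaningful if the endpoint accepts connections exclusively from the proxy (for example via the separate keystore and truststore mentioned above), since otherwise any client could set the header itself.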
>> > >
>> > > On Thu, May 10, 2018, 02:30 Eron Wright <eronwri...@gmail.com> wrote:
>> > >
>> > >> Hello,
>> > >>
>> > >> Given that some SSL enhancement bugs have been posted lately, I took some
>> > >> time to revise FLIP-26 which explores how to harden both external and
>> > >> internal communication.
>> > >>
>> > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80453255
>> > >>
>> > >> Some recent related issues:
>> > >> - FLINK-9312 - mutual auth for intra-cluster communication
>> > >> - FLINK-5030 - original SSL feature work
>> > >>
>> > >> There's also some recent discussion of how to use Flink SSL effectively in
>> > >> a Kubernetes environment. The issue is about hostname verification. The
>> > >> proposal that I've put forward in FLIP-26 is to not use hostname
>> > >> verification for intra-cluster communication, but rather to rely on a
>> > >> cluster-internal certificate and a truststore consisting only of that
>> > >> certificate. Meanwhile, a new "external" certificate would be
>> > >> configurable for the web/api endpoint and associated with a well-known DNS
>> > >> name as provided by a K8s Service resource.
>> > >>
>> > >> Stephan is this in-line with your thinking re FLINK-9312?
>> > >>
>> > >> Thanks
>> > >> Eron
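
The intra-cluster setup proposed in the May 10 message (a dedicated truststore, mandatory client authentication, no hostname verification) can be illustrated with plain JSSE. The sketch below is not how Flink wires Akka or Netty; the class `InternalSslSketch` and the method `newEngine` are hypothetical names, and the demo in `main` uses empty in-memory keystores only so that it runs without any files.

```java
import java.security.KeyStore;
import java.util.Objects;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical sketch of an internal, mutually authenticated TLS endpoint. */
final class InternalSslSketch {

    static SSLEngine newEngine(KeyStore keyStore, char[] keyPassword,
                               KeyStore trustStore, boolean clientMode) throws Exception {
        // A null truststore would make JSSE fall back to the system truststore,
        // which is exactly what the thread advises against for internal traffic.
        Objects.requireNonNull(trustStore, "an explicit, limited truststore is required");

        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, keyPassword);

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);

        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(clientMode);

        SSLParameters params = engine.getSSLParameters();
        // Server side: reject peers that present no trusted certificate.
        params.setNeedClientAuth(true);
        // No endpoint identification algorithm means no hostname verification;
        // trust rests only on the contents of the truststore.
        params.setEndpointIdentificationAlgorithm(null);
        engine.setSSLParameters(params);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        // Empty stores keep the demo self-contained. A real setup would load the
        // cluster keystore and a truststore holding only the cluster certificate.
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null);
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);

        SSLEngine server = newEngine(keyStore, "changeit".toCharArray(), trustStore, false);
        System.out.println("client auth required: " + server.getNeedClientAuth());
    }
}
```

Because every process acts as both server and client for internal communication, the same keystore and truststore pair would be passed for both modes; only the `clientMode` flag differs.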