Sounds good, Eron!
Please go ahead...
On Sat, Jul 28, 2018 at 1:33 AM, Eron Wright wrote:
> As an update to this thread, Stephan opted to split the internal/external
> configuration (by providing overrides for a common SSL configuration):
> https://github.com/apache/flink/pull/6326
>
> Note that Akka doesn't support hostname verification in its 'classic'
> remoting implementation (though the new Artery implementation apparently
> does), and such verification wouldn't apply to the client certificate
> anyway. So the reality is that one should use a limited truststore (never
> the system truststore) for Akka communication.
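A minimal Java sketch of what "a limited truststore" means in JSSE terms. It assumes the internal endpoints obtain their SSLContext from code like this; the class and method names are hypothetical, not Flink's. The key detail is that the TrustManagerFactory is initialized with an explicit KeyStore instead of null, so the JVM's default system truststore (cacerts) is never consulted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper: builds TLS trust material for internal (Akka) traffic only. */
public final class InternalTrust {

    private InternalTrust() {}

    /** Loads a dedicated truststore file; path and password come from the deployment. */
    static KeyStore loadTrustStore(Path path, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(path)) {
            ks.load(in, password);
        }
        return ks;
    }

    /** Returns an X509TrustManager backed ONLY by the given truststore. */
    static X509TrustManager trustManager(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // Explicit KeyStore here; passing null would fall back to the JVM's cacerts.
        tmf.init(trustStore);
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return (X509TrustManager) tm;
            }
        }
        throw new GeneralSecurityException("no X509TrustManager available");
    }

    /** SSLContext for internal endpoints; key managers are omitted (null) to keep the sketch short. */
    static SSLContext internalContext(KeyStore trustStore) throws GeneralSecurityException {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] {trustManager(trustStore)}, null);
        return ctx;
    }
}
```

For mutual authentication a real setup would also pass the node's own KeyManagers to `init` instead of null.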
>
> On the question of routing external communication through the YARN resource
> proxy or Mesos/DCOS admin router, the value proposition is:
> a) simplifies service discovery on the part of external clients,
> b) permits single sign-on (SSO) by delegating authentication to a central
> authority,
> c) facilitates access from outside the cluster, via a public address.
> The main challenge is that the Flink client code must support a more
> diverse array of authentication methods, e.g. Kerberos when communicating
> with the YARN proxy.
>
> Given #6326, the next steps would be (unordered):
> a) create an umbrella issue for the overall effort
> b) dive into the authorization work for external communication
> c) implement auto-generation of a certificate for internal communication
> d) implement TLS on queryable state interface (FLINK-5029)
>
> I'll take care of (a) unless there is any objection.
> -Eron
>
>
> On Sun, May 13, 2018 at 5:45 AM Stephan Ewen wrote:
>
> > Throwing in some more food for thought:
> >
> > An alternative to the above proposed separation of internal and external
> > SSL would be the following:
> >
> > - We separate channel encryption and authentication
> > - We use one common SSL layer (internal and external) that is in both
> > cases only responsible for establishing an encrypted connection
> > - Authentication / authorization internally is done by SASL with
> > username/password or shared secret.
> > - Authentication externally must go through a proxy, and authorization
> > is based on validating HTTP headers set by the proxy, as discussed above.
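The "shared secret" option for internal authentication can be illustrated with a plain HMAC challenge-response in Java. This is a swapped-in stand-in, not SASL itself: SASL mechanisms such as SCRAM use a related but more elaborate challenge-response exchange, and the class name, nonce size and choice of HMAC-SHA256 here are assumptions. The point is that the secret never crosses the wire, which complements an SSL layer that is only responsible for encryption.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical shared-secret challenge-response (HMAC-SHA256), shown instead of a real SASL mechanism. */
public final class SharedSecretAuth {

    private static final String ALGORITHM = "HmacSHA256";
    private static final SecureRandom RANDOM = new SecureRandom();

    private SharedSecretAuth() {}

    /** Server side: a fresh random nonce per connection, so an old response does not verify again. */
    static byte[] newChallenge() {
        byte[] nonce = new byte[32];
        RANDOM.nextBytes(nonce);
        return nonce;
    }

    /** Client side: proves knowledge of the secret without sending it. */
    static byte[] respond(byte[] secret, byte[] challenge) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            return mac.doFinal(challenge);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 not available", e);
        }
    }

    /** Server side: recompute the expected response and compare in constant time. */
    static boolean verify(byte[] secret, byte[] challenge, byte[] response) {
        return MessageDigest.isEqual(respond(secret, challenge), response);
    }
}
```

Without some form of channel binding, a man-in-the-middle could still relay such an exchange, which ties into the hostname-verification question raised in this same mail.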
> >
> > Advantages:
> > - There is only one certificate needed, which could also be shared
> > across applications
> > - One or two lines in the config authenticate and authorize internal
> > communication
> > - One could possibly still fall back to the other mode by skipping
> >
> > Open Questions / Disadvantages:
> > - Given that hostname verification during the SSL handshake is not
> > possible in many setups, the encrypted channel is vulnerable to
> > man-in-the-middle attacks without mutual authentication. Not sure how
> > serious that is, because it would need an attacker to have compromised
> > network nodes of the cluster already. Is that not a universal issue in
> > the K8s world?
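For reference, the two JSSE settings this open question hinges on look roughly like this in Java: hostname verification on the client side of a connection, and a required client certificate (mutual authentication) on the server side. The class name and the host/port values are illustrative assumptions; whether Flink's transports expose these settings is not something this sketch answers.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

/** Hypothetical helper showing hostname verification and mutual TLS on an SSLEngine. */
public final class TlsEngines {

    private TlsEngines() {}

    /** Client engine that checks the server certificate against peerHost during the handshake. */
    static SSLEngine clientEngine(SSLContext ctx, String peerHost, int peerPort) {
        SSLEngine engine = ctx.createSSLEngine(peerHost, peerPort);
        engine.setUseClientMode(true);
        SSLParameters params = engine.getSSLParameters();
        // "HTTPS" selects RFC 2818-style hostname matching against the certificate's SAN/CN.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }

    /** Server engine that refuses peers which do not present a trusted certificate. */
    static SSLEngine serverEngine(SSLContext ctx) {
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(false);
        SSLParameters params = engine.getSSLParameters();
        // Set on the parameters object: setSSLParameters would otherwise reset client-auth mode.
        params.setNeedClientAuth(true);
        engine.setSSLParameters(params);
        return engine;
    }
}
```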
> >
> > This is a bit hypothetical anyway, because as long as we have Akka
> > beneath the RPC layer, we cannot go with that approach.
> >
> > However, if we want to at least keep the door open towards something like
> > that in the future, we would need to set up configuration in such a way
> > that we have a "common SSL" configuration (keystore, truststore, etc.)
> > and internal/external options that override those. That would be helpful
> > for backwards compatibility anyway.
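The "common SSL configuration plus internal/external overrides" layout boils down to a fallback lookup, sketched here in Java. The key names (`security.ssl.keystore`, `security.ssl.internal.keystore`, `security.ssl.rest.keystore`) are illustrative assumptions and not necessarily the option names that ended up in #6326; what matters is the lookup order, which keeps an existing single-config setup working unchanged.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver: a scope-specific SSL option overrides the common one. */
public final class SslConfigResolver {

    /** The two connectivity scopes discussed in this thread, each with an assumed key segment. */
    enum Scope {
        INTERNAL("internal"),
        EXTERNAL("rest");

        final String keySegment;

        Scope(String keySegment) {
            this.keySegment = keySegment;
        }
    }

    /** Assumed prefix for the common SSL options. */
    static final String PREFIX = "security.ssl.";

    private SslConfigResolver() {}

    /**
     * Looks up e.g. "security.ssl.internal.keystore" first and falls back to
     * "security.ssl.keystore"; returns empty if neither is set.
     */
    static Optional<String> resolve(Map<String, String> conf, Scope scope, String option) {
        String scoped = conf.get(PREFIX + scope.keySegment + "." + option);
        if (scoped != null) {
            return Optional.of(scoped);
        }
        return Optional.ofNullable(conf.get(PREFIX + option));
    }
}
```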
> >
> > @Eron - what are your thoughts on that?
> >
> > On Sun, May 13, 2018 at 1:40 AM, Stephan Ewen wrote:
> >
> > > Thank you for bringing this proposal up. It looks very good and we seem
> > > to be thinking along very similar lines.
> > >
> > > Below are some comments and thoughts on the FLIP.
> > >
> > > *Internal vs. External Connectivity*
> > >
> > > That is a very helpful distinction, let's build on that.
> > >
> > > - I would suggest eventually treating all communication that may come
> > > from users as external, meaning Client-to-Dispatcher,
> > > Client-to-JobManager (trigger savepoint, change parallelism, ...), Web
> > > UI, and Queryable State.
> > >
> > > - That leaves communication that is only between
> > > JobManager/TaskManager/ResourceManager/Dispatcher/HistoryServer as
> > > internal.
> > >
> > > - I am somewhat operating under the assumption that all external
> > > communication will eventually be HTTP/REST. That works best with many
> > > setups and is the basis for using service proxies that
> > > handle authentication/authorization.
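The external half of this model, where a service proxy authenticates the user and Flink only authorizes based on a header the proxy sets, might look like the Java sketch below. The header name `X-Forwarded-User`, the action strings and the class are assumptions for illustration, not Flink APIs.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical authorizer for REST calls that reach Flink only through an authenticating proxy. */
public final class ProxyHeaderAuthorizer {

    /** Assumed header name; use whatever header the deployed proxy actually sets. */
    static final String USER_HEADER = "X-Forwarded-User";

    private final Map<String, Set<String>> allowedActionsByUser;

    ProxyHeaderAuthorizer(Map<String, Set<String>> allowedActionsByUser) {
        this.allowedActionsByUser = allowedActionsByUser;
    }

    /** HTTP header names are case-insensitive, so match the user header accordingly. */
    static String userFrom(Map<String, String> headers) {
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (e.getKey().equalsIgnoreCase(USER_HEADER)) {
                return e.getValue();
            }
        }
        return null;
    }

    /**
     * Allows the action only if a proxy-supplied identity is present and permitted.
     * Trusting the header is only safe if clients cannot reach this endpoint directly
     * and the proxy overwrites any client-supplied copy of the header.
     */
    boolean isAuthorized(Map<String, String> headers, String action) {
        String user = userFrom(headers);
        if (user == null || user.isEmpty()) {
            return false;
        }
        return allowedActionsByUser.getOrDefault(user, Set.of()).contains(action);
    }
}
```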
> > >
> > >
> > > In Flink 1.5 and later versions, we have the following updates:
> > >
> > > - Akka is now strictly for internal connectivity; the client (except the
> > > legacy client) does not use it any more.
> > >
> > > - The Blob Server will move to purely internal connectivity in Flink
> > > 1.6, where a POST of a job to the Dispatcher contains the jars and the
> > > JobGraph.
> > > That is important for Kubernetes setups, where exposing the