Sounds good, Eron!

Please go ahead...

On Sat, Jul 28, 2018 at 1:33 AM, Eron Wright <eronwri...@gmail.com> wrote:

>  As an update to this thread, Stephan opted to split the internal/external
> configuration (by providing overrides for a common SSL configuration):
> https://github.com/apache/flink/pull/6326
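>
> For reference, the layered configuration then looks roughly like the
> following in flink-conf.yaml (a sketch only; the PR and its docs are
> authoritative for the exact key names):
>
>   # common SSL settings, used as a fallback by both sides
>   security.ssl.keystore: /path/to/flink.keystore
>   security.ssl.keystore-password: change-me
>   security.ssl.key-password: change-me
>   security.ssl.truststore: /path/to/flink.truststore
>   security.ssl.truststore-password: change-me
>
>   # internal (Akka, data plane, blob service) overrides
>   security.ssl.internal.enabled: true
>   security.ssl.internal.keystore: /path/to/internal.keystore
>   security.ssl.internal.truststore: /path/to/internal.truststore
>
>   # external (REST / web UI) overrides
>   security.ssl.rest.enabled: true
>   security.ssl.rest.keystore: /path/to/rest.keystore
>   security.ssl.rest.truststore: /path/to/rest.truststore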
>
> Note that Akka doesn't support hostname verification in its 'classic'
> remoting implementation (though the new Artery implementation apparently
> does), and such verification wouldn't apply to the client certificate
> anyway.   So the reality is that one should use a limited truststore (never
> the system truststore) for Akka communication.
>
> On the question of routing external communication through the YARN resource
> proxy or Mesos/DCOS admin router, the value proposition is:
> a) simplifies service discovery on the part of external clients,
> b) permits single sign-on (SSO) by delegating authentication to a central
> authority,
> c) facilitates access from outside the cluster, via a public address.
> The main challenge is that the Flink client code must support a more
> diverse array of authentication methods, e.g. Kerberos when communicating
> with the YARN proxy.
>
> Given #6326, the next steps would be (unordered):
> a) create an umbrella issue for the overall effort
> b) dive into the authorization work for external communication
> c) implement auto-generation of a certificate for internal communication
> d) implement TLS on queryable state interface (FLINK-5029)
>
> I'll take care of (a) unless there is any objection.
> -Eron
>
>
> On Sun, May 13, 2018 at 5:45 AM Stephan Ewen <ewenstep...@gmail.com>
> wrote:
>
> > Throwing in some more food for thought:
> >
> > An alternative to the above proposed separation of internal and external
> > SSL would be the following:
> >
> >   - We separate channel encryption and authentication
> >   - We use one common SSL layer (internal and external) that is in both
> > cases only responsible for establishing an encrypted connection
> >   - Authentication / authorization internally is done by SASL with
> > username/password or shared secret.
> >   - Authentication externally must be through a proxy, with authorization
> > based on validating HTTP headers set by the proxy, as discussed above.
> >
> > Advantages:
> >   - There is only one certificate needed, which could also be shared
> across
> > applications
> >   - One or two lines in the config authenticate and authorize internal
> > communication
> >   - One could possibly still fall back to the other mode by skipping
> >
> > Open Questions / Disadvantages
> >   - Given that hostname verification during SSL handshake is not possible
> > in many setups, the encrypted channel is vulnerable to man-in-the-middle
> > attacks without mutual authentication. Not sure how serious that is,
> > because it would need an attacker to have compromised network nodes of the
> > cluster already. Is that not a universal issue in the K8s world?
> >
> > This is anyway a bit hypothetical, because as long as we have Akka
> beneath
> > the RPC layer, we cannot go with that approach.
> >
> > However, if we want to at least keep the door open towards something like
> > that in the future, we would need to set up configuration in such a way
> > that we have a "common SSL" configuration (keystore, truststore, etc.)
> and
> > internal/external options that override those. That would anyway be
> > helpful for backwards compatibility.
> >
> > @Eron - what are your thoughts on that?
> >
> >
> >
> >
> >
> >
> >
> >
> > On Sun, May 13, 2018 at 1:40 AM, Stephan Ewen <ewenstep...@gmail.com>
> > wrote:
> >
> > > Thank you for bringing this proposal up. It looks very good and we seem
> > to
> > > be thinking along very similar lines.
> > >
> > > Below are some comments and thoughts on the FLIP.
> > >
> > > *Internal vs. External Connectivity*
> > >
> > > That is a very helpful distinction; let's build on that.
> > >
> > >   - I would suggest that we eventually treat all communication that
> > > potentially comes from users as external, meaning Client-to-Dispatcher,
> > > Client-to-JobManager (trigger savepoint, change parallelism, ...), the Web
> > > UI, and Queryable State.
> > >
> > >   - That leaves communication that is only between
> > JobManager/TaskManager/
> > > ResourceManager/Dispatcher/HistoryServer as internal.
> > >
> > >   - I am somewhat operating under the assumption that all external
> > > communication will eventually be HTTP/REST. That works best with many
> > > setups and is the basis for using service proxies that
> > > handle  authentication/authorization.
> > >
> > >
> > > In Flink 1.5 and future versions, we have the following updates in that area:
> > >
> > >   - Akka is now strictly internal connectivity; the client (except the
> > > legacy client) does not use it any more.
> > >
> > >   - The Blob Server will move to purely internal connectivity in Flink
> > > 1.6, where a POST of a job to the Dispatcher has the jars and the
> > JobGraph.
> > > That is important for Kubernetes setups, where exposing the BlobServer
> > and
> > > querying the blob port causes quite some friction.
> > >
> > >   - Treating queryable state as "internal connectivity" is fine for
> now.
> > > We should treat it as "external" connectivity in the future if we move
> it
> > > to HTTP/REST.
> > >
> > >
> > > *Internal Connectivity and SSL Mutual Authentication*
> > >
> > > Simply activating SSL mutual authentication for the internal
> > communication
> > > is really low-hanging fruit.
> > >
> > > Activating client authentication for Akka, network stack Netty (and
> Blob
> > > Server/Client in Flink 1.6) should require no change in the
> > configurations
> > > with respect to Flink 1.4. All processes are, with respect to internal
> > > communication, simultaneously server and client endpoints. Because of
> > that,
> > > they already need KeyStore and TrustStore files for SSL handshakes,
> where
> > > the TrustStore needs to trust the KeyStore Certificate.
> > >
> > > I personally favor the suggestion made to have a script that generates
> a
> > > self-signed certificate and adds it to "conf" and updates the
> > > configuration. That should be picked up by the Yarn and Mesos clients
> > > anyway.
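> > >
> > > As a rough sketch (alias, paths, and passwords below are placeholders),
> > > such a script could boil down to a few keytool calls:
> > >
> > >   # generate a self-signed key pair shared by all Flink processes
> > >   keytool -genkeypair -alias flink.internal -keyalg RSA -keysize 4096 \
> > >     -validity 3650 -dname "CN=flink.internal" \
> > >     -keystore conf/internal.keystore -storepass change-me -keypass change-me
> > >   # export the certificate and build a truststore that trusts only it
> > >   keytool -exportcert -alias flink.internal -keystore conf/internal.keystore \
> > >     -storepass change-me -file conf/internal.cer
> > >   keytool -importcert -alias flink.internal -file conf/internal.cer \
> > >     -keystore conf/internal.truststore -storepass change-me -noprompt
> > >
> > > plus writing the corresponding keystore/truststore paths and passwords
> > > into the configuration.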
> > >
> > >
> > > *External Connectivity*
> > >
> > > There is a huge surface area and I think we need to give users a way to
> > > plug in their own tools.
> > > From what I see (and after some discussions with Patrick and Gary) I
> > think
> > > it makes sense to look at proxies in a broad way, similar to the
> approach
> > > Eron outlined.
> > >
> > > The basic approach could look like this:
> > >
> > >   - Everything goes through HTTPS, so the proxy can work with HTTP
> > headers.
> > >   - The proxy handles authentication and possibly authorization. The
> > proxy
> > > adds some headers, for example a user name, a group id, or an authorization
> > > token.
> > >   - Flink can configure an implementation of an 'authorizer' or validator
> > > on the headers to decide whether the request is valid (see the sketch
> > > after the two examples below).
> > >
> > >   - Example 1: The proxy does authentication and adds the user name /
> > > group as a header. The Flink-side authorizer simply checks whether the
> > > name is in the config (a simple ACL-style scheme).
> > >   - Example 2: The proxy adds a JSON Web Token and the authorizer
> > > validates that token.
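> > >
> > > To make the authorizer idea concrete, a hypothetical interface (names
> > > made up for illustration; nothing like this exists in Flink yet) could
> > > look like:
> > >
> > >   public interface RestRequestAuthorizer {
> > >       /** Decide, from the headers set by the proxy, whether the request may proceed. */
> > >       boolean isAuthorized(java.util.Map<String, String> headers, String requestPath);
> > >   }
> > >
> > > Example 1 would then be an implementation that compares the user/group
> > > header against a configured list, and Example 2 one that validates the
> > > token's signature and claims.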
> > >
> > > For secure connections between the Proxy and the Flink Endpoint I would
> > > follow Eron's suggestion to use separate KeyStores and TrustStores from
> > > those used for internal communication.
> > >
> > > For Yarn and Mesos, I would like to see if we could handle those again
> as
> > > a special case of the proxies above:
> > >   - DCOS Admin Router forwards the user authentication token, so that
> > > could be another authorizer implementation.
> > >   - In YARN we could see if we can implement the IP filter via such an
> > > authorizer.
> > >
> > >
> > > *Hostname Verification*
> > >
> > > For internal communication, and especially on dynamic environments like
> > > Kubernetes, it is very hard to work with certificates and have hostname
> > > verification on.
> > >
> > > If we assume internal communication works strictly with a shared secret
> > > certificate and with client authentication, does hostname verification
> > > actually still add security in that particular setup? My understanding
> > was
> > > that hostname verification is important to ensure that not just any valid
> > > certificate is presented, but the one bound to the server you want to talk
> > > to. If we anyway have only one trusted certificate, isn't that already
> > > implied?
> > >
> > > On the other hand, it is still possible (and potentially valuable) for
> > > users in standalone mode to use keystores and truststores from a PKI,
> in
> > > which case there may still be an argument in favor of hostname
> > verification.
> > >
> > > On Thu, May 10, 2018, 02:30 Eron Wright <eronwri...@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> Given that some SSL enhancement bugs have been posted lately, I took
> > some
> > >> time to revise FLIP-26 which explores how to harden both external and
> > >> internal communication.
> > >>
> > >>
> > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80453255
> > >>
> > >> Some recent related issues:
> > >> - FLINK-9312 - mutual auth for intra-cluster communication
> > >> - FLINK-5030 - original SSL feature work
> > >>
> > >> There's also some recent discussion of how to use Flink SSL
> effectively
> > in
> > >> a Kubernetes environment.   The issue is about hostname verification.
> > The
> > >> proposal that I've put forward in FLIP-26 is to not use hostname
> > >> verification for intra-cluster communication, but rather to rely on a
> > >> cluster-internal certificate and a truststore consisting only of that
> > >> certificate.   Meanwhile, a new "external" certificate would be
> > >> configurable for the web/api endpoint and associated with a well-known
> > DNS
> > >> name as provided by a K8s Service resource.
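> > >>
> > >> For illustration (names and labels are hypothetical), a Service like
> > >>
> > >>   apiVersion: v1
> > >>   kind: Service
> > >>   metadata:
> > >>     name: flink-jobmanager
> > >>     namespace: flink
> > >>   spec:
> > >>     selector:
> > >>       app: flink
> > >>       component: jobmanager
> > >>     ports:
> > >>       - name: rest
> > >>         port: 8081
> > >>
> > >> gives the REST endpoint the stable DNS name
> > >> flink-jobmanager.flink.svc.cluster.local, and the "external" certificate
> > >> would then be issued for that name (as CN or SAN).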
> > >>
> > >> Stephan, is this in line with your thinking re FLINK-9312?
> > >>
> > >> Thanks
> > >> Eron
> > >>
> > >
> >
>
