Hi! I think there might be possible alternatives but it seems Kerberos on the REST endpoint ticks all the right boxes and provides a super clean and simple solution for strong authentication.
I wouldn’t even consider sidecar proxies etc if we can solve it in such a simple way as proposed by G.

Cheers
Gyula

On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <trohrm...@apache.org> wrote:

> I am not saying that we shouldn't add a strong authentication mechanism if
> there are good reasons for it. I primarily would like to understand the
> context a bit better in order to give qualified feedback and come to a good
> decision. In order to do this, I have the feeling that we haven't fully
> considered all available options which are on the table, tbh.
>
> Does the problem of certificate expiry also apply for self-signed
> certificates? If yes, then this should then also be a problem for the
> internal encryption of Flink's communication. If not, then one could use
> self-signed certificates with a longer validity to solve the mentioned
> issue.
>
> I think you can set up Flink in such a way that you don't have to handle
> all the different certificates. For example, you could deploy Flink with a
> "sidecar proxy" which is responsible for the authentication using an
> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
> network interface. That way, the REST endpoint would only be available
> through the sidecar proxy. Additionally, one could enable SSL for this
> communication. Would this be a solution for the problem?
>
> Cheers,
> Till
>
> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <balassi.mar...@gmail.com>
> wrote:
>
>> That is an interesting idea, Till.
>>
>> The main issue with it is that TLS certificates have an expiration time,
>> usually they get approved for a couple years. Forcing our users to restart
>> jobs to reprovision TLS certificates would be weird when we could just
>> implement a single proper strong authentication mechanism instead in a
>> couple hundred lines of code.
>> :-)
>>
>> In many cases it is also impractical to go the TLS mutual route, because
>> the Flink Dashboard can end up on any node in the k8s/Yarn cluster which
>> means that we need a certificate per node (due to the mutual auth), but if
>> we also want to protect the private key of these from users accidentally or
>> intentionally leaking them then we need this per user. As in we end up
>> managing user*machine number certificates and having to renew them
>> periodically, which albeit automatable is unfortunately not yet automated
>> in all large organizations.
>>
>> I fully agree that TLS certificate mutual authentication has its nice
>> properties, especially at very large (multiple thousand node) clusters -
>> but it has its own challenges too. Thanks for bringing it up.
>>
>> Happy to have this added to the rejected alternative list so that we have
>> the full picture documented.
>>
>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> I guess the idea would then be to let the proxy do the authentication
>>> job and only forward the request via an SSL mutually encrypted connection
>>> to the Flink cluster. Would this be possible? The beauty of this setup is
>>> in my opinion that this setup should work with all kinds of authentication
>>> mechanisms.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for giving options to fulfil the need.
>>>>
>>>> Users are looking for a solution where users can be identified on the
>>>> whole cluster and restrict access to resources/actions.
>>>> A good example for such an action is cancelling other users' running
>>>> jobs.
>>>>
>>>> * SSL does provide mutual authentication, but once authentication has
>>>> passed no user-based restrictions can be made.
>>>> * The less problematic part is that generating/maintaining short-lived
>>>> certificates would be hard (that's the reason KDC-like servers exist).
>>>> Having long-lived certificates would widen the attack surface, but
>>>> since the first concern is there this is just a cosmetic issue.
>>>>
>>>> All in all using TLS certificates is not sufficient in these
>>>> environments unfortunately.
>>>>
>>>> BR,
>>>> G
>>>>
>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks for the information Gabor. If it is about securing the
>>>>> communication between the REST client and the REST server, then Flink
>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>> enough to secure the communication and to pass an audit?
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Till,
>>>>>>
>>>>>> Since I've been working in the security area for 10+ years, let me
>>>>>> share my thoughts.
>>>>>> I would like to emphasise there are experts better than me but I have
>>>>>> some basics.
>>>>>> The discussion is open and not trying to tell alone things...
>>>>>>
>>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>>> > should also be possible to obtain the right Kerberos token.
>>>>>> Not necessarily. For example if one gets access to a specific user's
>>>>>> credentials then it's not possible to compromise other users' jobs,
>>>>>> data, etc...
>>>>>> Security is like an onion: the more layers have been added, the more
>>>>>> time an attacker needs to proceed.
>>>>>> At the end of the day if one is in, then most probably can find the
>>>>>> way, but this time is normally enough for sysadmins or security
>>>>>> experts to close down the system and minimize the damage.
>>>>>>
>>>>>> The other thing is that all tokens have a timeout and if the token is
>>>>>> invalid then the attacker can't proceed further.
>>>>>>
>>>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>>>> > deployments?
>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic
>>>>>> and it can be used in any deployments including k8s.
>>>>>> The main intention is to use Kerberos in k8s deployments too since
>>>>>> we're going this direction as well.
>>>>>> Please see how Spark does this:
>>>>>>
>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>
>>>>>> Last but not least the most important reason to add at least one
>>>>>> strong authentication is that we have users who have
>>>>>> hard requirements on this. They're doing security audits and if they
>>>>>> fail then it's deal breaking.
>>>>>> That is why we have added Kerberos in the first place. Unfortunately
>>>>>> we can't name them in this public list, however
>>>>>> the customers who specifically asked for this were mainly in the
>>>>>> banking and telco sector.
>>>>>>
>>>>>> BR,
>>>>>> G
>>>>>>
>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <trohrm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get
>>>>>> > access to one of the machines, then it should also be possible to
>>>>>> > obtain the right Kerberos token.
>>>>>> >
>>>>>> > I am not an authentication expert and that's why I wanted to ask
>>>>>> > what are other authentication protocols other than Kerberos? Why did
>>>>>> > we select Kerberos and not any other authentication protocol? Maybe
>>>>>> > you can list the pros and cons for the different protocols. Is
>>>>>> > Kerberos also the standard authentication protocol for Kubernetes
>>>>>> > deployments? If not, what would be the answer when deploying on K8s?
>>>>>> >
>>>>>> > Cheers,
>>>>>> > Till
>>>>>> >
>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Hi team,
>>>>>> >>
>>>>>> >> Happy to be here and hope I can provide quality additions in the
>>>>>> >> future.
>>>>>> >>
>>>>>> >> Thank you all for the helpful suggestions!
>>>>>> >> Considering them the FLIP has been modified and the work continues
>>>>>> >> on the already existing Jira.
>>>>>> >>
>>>>>> >> BR,
>>>>>> >> G
>>>>>> >>
>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <balassi.mar...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket
>>>>>> >>> too, let us continue there then.
>>>>>> >>>
>>>>>> >>> Till, I agree that we should keep this codepath as slim as
>>>>>> >>> possible. It is an important design decision that we aim to keep
>>>>>> >>> the list of authentication protocols to a minimum. We believe that
>>>>>> >>> this should not be a primary concern of Flink and a trusted proxy
>>>>>> >>> service (for example Apache Knox) should be used to enable a
>>>>>> >>> multitude of end-user authentication mechanisms.
>>>>>> >>> The bare minimum of authentication mechanisms to support
>>>>>> >>> consequently consists of a single strong authentication protocol,
>>>>>> >>> for which Kerberos is the enterprise solution, and HTTP Basic,
>>>>>> >>> primarily for development and light-weight scenarios.
>>>>>> >>>
>>>>>> >>> Added the above wording to G's doc.
>>>>>> >>>
>>>>>> >>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>> >>>
>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <ches...@apache.org>
>>>>>> >>> wrote:
>>>>>> >>>
>>>>>> >>>> There's a related effort:
>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>> >>>>
>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>> >>>> >
>>>>>> >>>> > Thanks for sharing this proposal with the community Márton. In
>>>>>> >>>> > general, I agree that authentication is missing and that this
>>>>>> >>>> > is required for using Flink within an enterprise. The thing I
>>>>>> >>>> > am wondering is whether this feature strictly needs to be
>>>>>> >>>> > implemented inside of Flink or whether a proxy setup could do
>>>>>> >>>> > the job? Have you considered this option? If yes, then it
>>>>>> >>>> > would be good to list it under the point of rejected
>>>>>> >>>> > alternatives.
>>>>>> >>>> >
>>>>>> >>>> > I do see the benefit of implementing this feature inside of
>>>>>> >>>> > Flink if many users need it. If not, then it might be easier
>>>>>> >>>> > for the project to not increase the surface area since it
>>>>>> >>>> > makes the overall maintenance harder.
>>>>>> >>>> >
>>>>>> >>>> > Cheers,
>>>>>> >>>> > Till
>>>>>> >>>> >
>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <mbala...@apache.org>
>>>>>> >>>> > wrote:
>>>>>> >>>> >
>>>>>> >>>> >> Hi team,
>>>>>> >>>> >>
>>>>>> >>>> >> Firstly I would like to introduce Gabor, or G [1] for short,
>>>>>> >>>> >> to the community. He is a Spark committer who has recently
>>>>>> >>>> >> transitioned to the Flink Engineering team at Cloudera and
>>>>>> >>>> >> is looking forward to contributing to Apache Flink.
>>>>>> >>>> >> Previously G primarily focused on Spark Streaming and
>>>>>> >>>> >> security.
>>>>>> >>>> >>
>>>>>> >>>> >> Based on requests from our customers G has implemented
>>>>>> >>>> >> Kerberos and HTTP Basic Authentication for the Flink
>>>>>> >>>> >> Dashboard and HistoryServer. These previously lacked an
>>>>>> >>>> >> authentication story.
>>>>>> >>>> >>
>>>>>> >>>> >> We are looking to contribute this functionality back to the
>>>>>> >>>> >> community, we believe that given Flink's maturity there
>>>>>> >>>> >> should be a common code solution for this general pattern.
>>>>>> >>>> >>
>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>> >>>> >>
>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>> >>>> >> [2]
>>>>>> >>>> >> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit