Hi!

I think there may be alternatives, but it seems Kerberos on the
REST endpoint ticks all the right boxes and provides a super clean and
simple solution for strong authentication.

I wouldn’t even consider sidecar proxies etc if we can solve it in such a
simple way as proposed by G.

Cheers
Gyula

On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <trohrm...@apache.org> wrote:

> I am not saying that we shouldn't add a strong authentication mechanism if
> there are good reasons for it. I primarily would like to understand the
> context a bit better in order to give qualified feedback and come to a good
> decision. I have the feeling that we haven't yet fully
> considered all the options on the table, tbh.
>
> Does the problem of certificate expiry also apply for self-signed
> certificates? If yes, then this should also be a problem for the
> internal encryption of Flink's communication. If not, then one could use
> self-signed certificates with a longer validity to solve the mentioned
> issue.
>
> I think you can set up Flink in such a way that you don't have to handle
> all the different certificates. For example, you could deploy Flink with a
> "sidecar proxy" which is responsible for the authentication using an
> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
> network interface. That way, the REST endpoint would only be available
> through the sidecar proxy. Additionally, one could enable SSL for this
> communication. Would this be a solution for the problem?
>
> Cheers,
> Till
>
> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <balassi.mar...@gmail.com>
> wrote:
>
>> That is an interesting idea, Till.
>>
>> The main issue with it is that TLS certificates have an expiration time,
>> usually they get approved for a couple years. Forcing our users to restart
>> jobs to reprovision TLS certificates would be weird when we could just
>> implement a single proper strong authentication mechanism instead in a
>> couple hundred lines of code. :-)
>>
>> In many cases it is also impractical to go the TLS mutual route, because
>> the Flink Dashboard can end up on any node in the k8s/Yarn cluster which
>> means that we need a certificate per node (due to the mutual auth), but if
>> we also want to protect the private key of these from users accidentally or
>> intentionally leaking them then we need this per user. That is, we end up
>> managing users × machines certificates and having to renew them
>> periodically, which, albeit automatable, is unfortunately not yet automated
>> in all large organizations.
>>
>> I fully agree that TLS certificate mutual authentication has its nice
>> properties, especially at very large (multiple thousand node) clusters -
>> but it has its own challenges too. Thanks for bringing it up.
>>
>> Happy to have this added to the rejected alternative list so that we have
>> the full picture documented.
>>
>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> I guess the idea would then be to let the proxy do the authentication
>>> job and only forward the request via a mutually authenticated SSL
>>> connection to the Flink cluster. Would this be possible? The beauty of
>>> this setup is, in my opinion, that it should work with all kinds of
>>> authentication mechanisms.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for giving options to fulfil the need.
>>>>
>>>> Users are looking for a solution where users can be identified on the
>>>> whole cluster and restrict access to resources/actions.
>>>> A good example of such an action is cancelling other users' running
>>>> jobs.
>>>>
>>>> * SSL does provide mutual authentication, but once authentication has
>>>> passed there is no user identity on which restrictions can be based.
>>>> * The less problematic part is that generating/maintaining short-lived
>>>> certificates would be hard (that's the reason KDC-like servers
>>>> exist).
>>>> Long-lived certificates would widen the attack surface, but
>>>> since the first concern already applies this is just a cosmetic issue.
>>>>
>>>> All in all using TLS certificates is not sufficient in these
>>>> environments unfortunately.
>>>>
>>>> BR,
>>>> G
>>>>
>>>>
>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks for the information Gabor. If it is about securing the
>>>>> communication between the REST client and the REST server, then Flink
>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>> enough to secure the communication and to pass an audit?
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>
>>>>>> Hi Till,
>>>>>>
>>>>>> Since I've been working in the security area for 10+ years, let me share my thoughts.
>>>>>> I would like to emphasise there are experts better than me but I have
>>>>>> some
>>>>>> basics.
>>>>>> The discussion is open and I'm not trying to dictate things on my own...
>>>>>>
>>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>>> should also be possible to obtain the right Kerberos token.
>>>>>> Not necessarily. For example if one gets access to a specific user's
>>>>>> credentials then it's not possible to compromise other users' jobs,
>>>>>> data,
>>>>>> etc...
>>>>>> Security is like an onion: the more layers have been added, the more
>>>>>> time an attacker needs to proceed.
>>>>>> At the end of the day, if one is in, they can most probably find a way,
>>>>>> but this time is normally enough for sysadmins or security experts to
>>>>>> close down the system and minimize the damage.
>>>>>>
>>>>>> The other thing is that all tokens have a timeout and if the token is
>>>>>> invalid then the attacker can't proceed further.
>>>>>>
>>>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>>>> deployments?
>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic
>>>>>> and it
>>>>>> can be used in any deployments including k8s.
>>>>>> The main intention is to use Kerberos in k8s deployments too since
>>>>>> we're going in this direction as well.
>>>>>> Please see how Spark does this:
>>>>>>
>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>
>>>>>> Last but not least, the most important reason to add at least one
>>>>>> strong authentication mechanism is that we have users who have
>>>>>> hard requirements on this. They're doing security audits and if they
>>>>>> fail then it's a deal breaker.
>>>>>> That is why we added Kerberos in the first place. Unfortunately we
>>>>>> can't name them on this public list; however,
>>>>>> the customers who specifically asked for this were mainly in the
>>>>>> banking and telco sectors.
>>>>>>
>>>>>> BR,
>>>>>> G
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <trohrm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get
>>>>>> access
>>>>>> > to one of the machines, then it should also be possible to obtain
>>>>>> the right
>>>>>> > Kerberos token.
>>>>>> >
>>>>>> > I am not an authentication expert and that's why I wanted to ask
>>>>>> > what authentication protocols exist other than Kerberos. Why did we
>>>>>> > select
>>>>>> > Kerberos and not any other authentication protocol? Maybe you can
>>>>>> list the
>>>>>> > pros and cons for the different protocols. Is Kerberos also the
>>>>>> standard
>>>>>> > authentication protocol for Kubernetes deployments? If not, what
>>>>>> would be
>>>>>> > the answer when deploying on K8s?
>>>>>> >
>>>>>> > Cheers,
>>>>>> > Till
>>>>>> >
>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
>>>>>> gabor.g.somo...@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> Hi team,
>>>>>> >>
>>>>>> >> Happy to be here and hope I can provide quality additions in the
>>>>>> future.
>>>>>> >>
>>>>>> >> Thank you all for the helpful suggestions!
>>>>>> >> Considering them the FLIP has been modified and the work continues
>>>>>> on the
>>>>>> >> already existing Jira.
>>>>>> >>
>>>>>> >> BR,
>>>>>> >> G
>>>>>> >>
>>>>>> >>
>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
>>>>>> balassi.mar...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket
>>>>>> too, let
>>>>>> >>> us continue there then.
>>>>>> >>>
>>>>>> >>> Till, I agree that we should keep this codepath as slim as
>>>>>> possible. It
>>>>>> >>> is an important design decision that we aim to keep the list of
>>>>>> >>> authentication protocols to a minimum. We believe that this
>>>>>> should not be a
>>>>>> >>> primary concern of Flink and a trusted proxy service (for example
>>>>>> Apache
>>>>>> >>> Knox) should be used to enable a multitude of end-user
>>>>>> >>> authentication mechanisms. The bare minimum of authentication
>>>>>> >>> mechanisms to support consequently consists of a single strong
>>>>>> >>> authentication protocol, for which Kerberos is the enterprise
>>>>>> >>> solution, and HTTP Basic, primarily for development and
>>>>>> >>> light-weight scenarios.
>>>>>> >>>
>>>>>> >>> Added the above wording to G's doc.
>>>>>> >>>
>>>>>> >>>
>>>>>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <
>>>>>> ches...@apache.org>
>>>>>> >>> wrote:
>>>>>> >>>
>>>>>> >>>> There's a related effort:
>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>> >>>>
>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>> >>>> >
>>>>>> >>>> > Thanks for sharing this proposal with the community Márton. In
>>>>>> >>>> general, I
>>>>>> >>>> > agree that authentication is missing and that this is required
>>>>>> for
>>>>>> >>>> using
>>>>>> >>>> > Flink within an enterprise. The thing I am wondering is
>>>>>> whether this
>>>>>> >>>> > feature strictly needs to be implemented inside of Flink or
>>>>>> whether a
>>>>>> >>>> proxy
>>>>>> >>>> > setup could do the job? Have you considered this option? If
>>>>>> yes, then
>>>>>> >>>> it
>>>>>> >>>> > would be good to list it under the point of rejected
>>>>>> alternatives.
>>>>>> >>>> >
>>>>>> >>>> > I do see the benefit of implementing this feature inside of
>>>>>> Flink if
>>>>>> >>>> many
>>>>>> >>>> > users need it. If not, then it might be easier for the project
>>>>>> to not
>>>>>> >>>> > increase the surface area since it makes the overall
>>>>>> maintenance
>>>>>> >>>> harder.
>>>>>> >>>> >
>>>>>> >>>> > Cheers,
>>>>>> >>>> > Till
>>>>>> >>>> >
>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <
>>>>>> mbala...@apache.org>
>>>>>> >>>> wrote:
>>>>>> >>>> >
>>>>>> >>>> >> Hi team,
>>>>>> >>>> >>
>>>>>> >>>> >> Firstly, I would like to introduce Gabor, or G [1] for short, to
>>>>>> >>>> >> the community. He is a Spark committer who has recently
>>>>>> >>>> >> transitioned to the Flink Engineering team at Cloudera and is
>>>>>> >>>> >> looking forward to contributing to Apache Flink. Previously G
>>>>>> >>>> >> primarily focused on Spark Streaming and security.
>>>>>> >>>> >>
>>>>>> >>>> >> Based on requests from our customers, G has implemented
>>>>>> >>>> >> Kerberos and HTTP Basic Authentication for the Flink Dashboard
>>>>>> >>>> >> and HistoryServer, which previously lacked an authentication
>>>>>> >>>> >> story.
>>>>>> >>>> >>
>>>>>> >>>> >> We are looking to contribute this functionality back to the
>>>>>> >>>> >> community; we believe that given Flink's maturity there should
>>>>>> >>>> >> be a common code solution for this general pattern.
>>>>>> >>>> >>
>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>> >>>> >>
>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>> >>>> >> [2]
>>>>>> >>>> >>
>>>>>> >>>> >>
>>>>>> >>>>
>>>>>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>> >>>> >>
>>>>>> >>>>
>>>>>> >>>>
>>>>>>
>>>>>
