I did not mean that the users sign their own certificates, but that the
operator of the cluster does. Once the user request hits the proxy, it is no
longer under the user's control. I do not yet fully understand why this
would not work.
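
A minimal sketch of the proxy flow I have in mind, in Python. It assumes two hypothetical hooks, `authenticate` and `forward`; neither is an existing Flink or proxy API. The point is only that authentication happens at the proxy, and the mutual TLS credentials towards Flink belong to the proxy, not to the user.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Request:
    """An incoming HTTP request as seen by the proxy (simplified)."""
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


# Hypothetical hooks, not an existing Flink or proxy API:
# - authenticate() checks the user's credentials with any mechanism
#   (Kerberos, LDAP, OIDC, ...) and returns the user name, or None on failure.
# - forward() sends the request to Flink's REST endpoint over mutual TLS,
#   using a client certificate issued by the cluster operator to the proxy.
Authenticator = Callable[[Request], Optional[str]]
Forwarder = Callable[[Request], Tuple[int, str]]


def handle(request: Request,
           authenticate: Authenticator,
           forward: Forwarder) -> Tuple[int, str]:
    """Authenticate at the proxy; only authenticated requests reach Flink."""
    user = authenticate(request)
    if user is None:
        # Rejected at the proxy: the request never touches the REST endpoint.
        return 401, "Unauthorized"
    # The proxy, not the user, holds the key material for the TLS connection.
    return forward(request)
```

Swapping Kerberos for another mechanism would only change `authenticate`. Flink itself would only ever see the proxy's certificate.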

What I would like to avoid is adding more complexity to Flink if there is
an easy solution which fulfills the requirements. That's why I would like
to go thoroughly through the different alternatives. Also, I am
missing a comparison of Kerberos with other authentication mechanisms
and an explanation of why they were rejected in favour of Kerberos.

Cheers,
Till

On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <gyf...@apache.org> wrote:

> Hi!
>
> I think there might be possible alternatives, but Kerberos on the
> REST endpoint seems to tick all the right boxes and provides a super clean and
> simple solution for strong authentication.
>
> I wouldn’t even consider sidecar proxies etc. if we can solve it in such a
> simple way as proposed by G.
>
> Cheers
> Gyula
>
> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <trohrm...@apache.org> wrote:
>
>> I am not saying that we shouldn't add a strong authentication mechanism
>> if there are good reasons for it. I would primarily like to understand the
>> context a bit better in order to give qualified feedback and come to a good
>> decision. To do this, I have the feeling that we haven't fully
>> considered all the options which are on the table, tbh.
>>
>> Does the problem of certificate expiry also apply to self-signed
>> certificates? If yes, then this should also be a problem for the
>> internal encryption of Flink's communication. If not, then one could use
>> self-signed certificates with a longer validity to solve the mentioned
>> issue.
>>
>> I think you can set up Flink in such a way that you don't have to handle
>> all the different certificates. For example, you could deploy Flink with a
>> "sidecar proxy" which is responsible for the authentication using an
>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
>> network interface. That way, the REST endpoint would only be available
>> through the sidecar proxy. Additionally, one could enable SSL for this
>> communication. Would this be a solution for the problem?
>>
>> Cheers,
>> Till
>>
>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <balassi.mar...@gmail.com>
>> wrote:
>>
>>> That is an interesting idea, Till.
>>>
>>> The main issue with it is that TLS certificates have an expiration time;
>>> usually they are valid for a couple of years. Forcing our users to restart
>>> jobs to reprovision TLS certificates would be weird when we could just
>>> implement a single proper strong authentication mechanism instead in a
>>> couple hundred lines of code. :-)
>>>
>>> In many cases it is also impractical to go the mutual TLS route, because
>>> the Flink Dashboard can end up on any node in the k8s/YARN cluster, which
>>> means that we need a certificate per node (due to the mutual auth). But if
>>> we also want to protect the private keys of these certificates from users
>>> leaking them, accidentally or intentionally, then we need them per user.
>>> That is, we end up managing one certificate per (user, machine) pair and
>>> having to renew them periodically, which, albeit automatable, is
>>> unfortunately not yet automated in all large organizations.
>>>
>>> I fully agree that mutual TLS certificate authentication has its nice
>>> properties, especially in very large (multi-thousand-node) clusters,
>>> but it has its own challenges too. Thanks for bringing it up.
>>>
>>> Happy to have this added to the rejected alternatives list so that we
>>> have the full picture documented.
>>>
>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <trohrm...@apache.org>
>>> wrote:
>>>
>>>> I guess the idea would then be to let the proxy do the authentication
>>>> job and only forward the request via a mutually authenticated SSL
>>>> connection to the Flink cluster. Would this be possible? In my opinion,
>>>> the beauty of this setup is that it should work with all kinds of
>>>> authentication mechanisms.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for giving options to fulfil the need.
>>>>>
>>>>> Users are looking for a solution where users can be identified across the
>>>>> whole cluster and access to resources/actions can be restricted.
>>>>> A good example of such an action is cancelling other users' running
>>>>> jobs.
>>>>>
>>>>> * SSL does provide mutual authentication, but once authentication has
>>>>> passed, no user-based restrictions can be applied.
>>>>> * The less problematic part is that generating/maintaining short-lived
>>>>> certificates would be hard (that's the reason KDC-like servers
>>>>> exist).
>>>>> Having long-lived certificates would widen the attack surface, but
>>>>> given the first concern this is just a cosmetic issue.
>>>>>
>>>>> All in all, using TLS certificates is unfortunately not sufficient in
>>>>> these environments.
>>>>>
>>>>> BR,
>>>>> G
>>>>>
>>>>>
>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <trohrm...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the information, Gabor. If it is about securing the
>>>>>> communication between the REST client and the REST server, then Flink
>>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>>> enough to secure the communication and to pass an audit?
>>>>>>
>>>>>> [1]
>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Till,
>>>>>>>
>>>>>>> Since I have been working in the security area for 10+ years, let me
>>>>>>> share my thoughts. I would like to emphasise that there are experts
>>>>>>> better than me, but I have some basics.
>>>>>>> The discussion is open, and I am not trying to settle things on my own...
>>>>>>>
>>>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>>>> > should also be possible to obtain the right Kerberos token.
>>>>>>> Not necessarily. For example, if one gets access to a specific user's
>>>>>>> credentials, then it's not possible to compromise other users' jobs,
>>>>>>> data, etc...
>>>>>>> Security is like an onion: the more layers are added, the more time an
>>>>>>> attacker needs to proceed.
>>>>>>> At the end of the day, if one is in, then one can most probably find a
>>>>>>> way, but this time is normally enough for sysadmins or security experts
>>>>>>> to close down the system and minimize the damage.
>>>>>>>
>>>>>>> The other thing is that all tokens have a timeout, and if the token is
>>>>>>> invalid, then the attacker can't proceed further.
>>>>>>>
>>>>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>>>>> > deployments?
>>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic, and
>>>>>>> it can be used in any deployment, including k8s.
>>>>>>> The main intention is to use Kerberos in k8s deployments too, since
>>>>>>> we're going in this direction as well.
>>>>>>> Please see how Spark does this:
>>>>>>>
>>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>>
>>>>>>> Last but not least, the most important reason to add at least one strong
>>>>>>> authentication mechanism is that we have users who have hard
>>>>>>> requirements on this. They run security audits, and if they fail, it's a
>>>>>>> deal breaker.
>>>>>>> That is why we added Kerberos in the first place. Unfortunately, we
>>>>>>> can't name them on this public list; however, the customers who
>>>>>>> specifically asked for this were mainly in the banking and telco
>>>>>>> sectors.
>>>>>>>
>>>>>>> BR,
>>>>>>> G
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <trohrm...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get access
>>>>>>> > to one of the machines, then it should also be possible to obtain the right
>>>>>>> > Kerberos token.
>>>>>>> >
>>>>>>> > I am not an authentication expert, and that's why I wanted to ask what
>>>>>>> > other authentication protocols there are besides Kerberos. Why did we
>>>>>>> > select Kerberos and not any other authentication protocol? Maybe you can
>>>>>>> > list the pros and cons of the different protocols. Is Kerberos also the
>>>>>>> > standard authentication protocol for Kubernetes deployments? If not, what
>>>>>>> > would be the answer when deploying on K8s?
>>>>>>> >
>>>>>>> > Cheers,
>>>>>>> > Till
>>>>>>> >
>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> >> Hi team,
>>>>>>> >>
>>>>>>> >> Happy to be here, and I hope I can provide quality additions in the
>>>>>>> >> future.
>>>>>>> >>
>>>>>>> >> Thank you all for the helpful suggestions!
>>>>>>> >> Taking them into account, the FLIP has been modified, and the work
>>>>>>> >> continues on the already existing Jira ticket.
>>>>>>> >>
>>>>>>> >> BR,
>>>>>>> >> G
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <balassi.mar...@gmail.com>
>>>>>>> >> wrote:
>>>>>>> >>
>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket too;
>>>>>>> >>> let us continue there then.
>>>>>>> >>>
>>>>>>> >>> Till, I agree that we should keep this code path as slim as possible.
>>>>>>> >>> It is an important design decision that we aim to keep the list of
>>>>>>> >>> authentication protocols to a minimum. We believe that this should not
>>>>>>> >>> be a primary concern of Flink, and that a trusted proxy service (for
>>>>>>> >>> example Apache Knox) should be used to enable a multitude of end-user
>>>>>>> >>> authentication mechanisms. Consequently, the bare minimum of
>>>>>>> >>> authentication mechanisms to support consists of a single strong
>>>>>>> >>> authentication protocol, for which Kerberos is the enterprise solution,
>>>>>>> >>> and HTTP Basic, primarily for development and lightweight scenarios.
>>>>>>> >>>
>>>>>>> >>> Added the above wording to G's doc.
>>>>>>> >>>
>>>>>>> >>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>> >>>
>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <ches...@apache.org>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>>> There's a related effort:
>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>>> >>>>
>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>>> >>>> >
>>>>>>> >>>> > Thanks for sharing this proposal with the community, Márton. In
>>>>>>> >>>> > general, I agree that authentication is missing and that it is
>>>>>>> >>>> > required for using Flink within an enterprise. What I am wondering
>>>>>>> >>>> > is whether this feature strictly needs to be implemented inside
>>>>>>> >>>> > Flink or whether a proxy setup could do the job. Have you
>>>>>>> >>>> > considered this option? If yes, then it would be good to list it
>>>>>>> >>>> > under rejected alternatives.
>>>>>>> >>>> >
>>>>>>> >>>> > I do see the benefit of implementing this feature inside Flink if
>>>>>>> >>>> > many users need it. If not, then it might be better for the project
>>>>>>> >>>> > not to increase the surface area, since that makes overall
>>>>>>> >>>> > maintenance harder.
>>>>>>> >>>> >
>>>>>>> >>>> > Cheers,
>>>>>>> >>>> > Till
>>>>>>> >>>> >
>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <mbala...@apache.org>
>>>>>>> >>>> > wrote:
>>>>>>> >>>> >
>>>>>>> >>>> >> Hi team,
>>>>>>> >>>> >>
>>>>>>> >>>> >> Firstly, I would like to introduce Gabor, or G [1] for short, to the
>>>>>>> >>>> >> community. He is a Spark committer who has recently transitioned to
>>>>>>> >>>> >> the Flink Engineering team at Cloudera and is looking forward to
>>>>>>> >>>> >> contributing to Apache Flink. Previously, G primarily focused on
>>>>>>> >>>> >> Spark Streaming and security.
>>>>>>> >>>> >>
>>>>>>> >>>> >> Based on requests from our customers, G has implemented Kerberos and
>>>>>>> >>>> >> HTTP Basic Authentication for the Flink Dashboard and HistoryServer,
>>>>>>> >>>> >> which previously lacked an authentication story.
>>>>>>> >>>> >>
>>>>>>> >>>> >> We are looking to contribute this functionality back to the
>>>>>>> >>>> >> community. We believe that, given Flink's maturity, there should be
>>>>>>> >>>> >> a common code solution for this general pattern.
>>>>>>> >>>> >>
>>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>>> >>>> >>
>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>>> >>>> >> [2] https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>> >>>> >>
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>>
>>>>>>
