Hi Till,

We have considered your proxy suggestion in depth and updated the FLIP
accordingly.
We evaluated two proxy implementations (Nginx and Squid), but according to
our analysis and testing the proxy approach is not suitable for the mentioned
use cases.
Please take a look at the rejected alternatives for a detailed explanation.

Thanks in advance for your time!

BR,
G


On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <trohrm...@apache.org> wrote:

> As I've said I am not a security expert and that's why I have to ask for
> clarification, Gabor. You are saying that if we configure a truststore for
> the REST endpoint with a single trusted certificate which has been
> generated by the operator of the Flink cluster, then the attacker can
> generate a new certificate, sign it and then talk to the Flink cluster if
> he has access to the node on which the REST endpoint runs? My understanding
> was that you need the corresponding private key which in my proposed setup
> would be under the control of the operator as well (e.g. stored in a
> keystore on the same machine but guarded by some secret). That way (if I am
> not mistaken), only the entity which has access to the keystore is able to
> talk to the Flink cluster.
>
> Maybe we are also getting our wires crossed here and are talking about
> different things.
>
> Thanks for listing the pros and cons of Kerberos. Concerning what other
> authentication mechanisms are used in the industry, I am not 100% sure.
>
> Cheers,
> Till
>
> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
>> > I did not mean for the user to sign its own certificates but for the
>> operator of the cluster. Once the user request hits the proxy, it should no
>> longer be under his control. I think I do not fully understand yet why this
>> would not work.
>> I said it does not solve the authentication problem over any proxy. Even
>> if the operator signs the certificate, someone can gain access to an
>> internal node.
>> In such a case anybody can craft certificates which are accepted by the
>> server. Once they are accepted, an attacker can cancel jobs, causing huge
>> impact.
>>
>> > Also, I am missing a bit the comparison of Kerberos to other
>> authentication mechanisms and why they were rejected in favour of Kerberos.
>> PROS:
>> * It does not depend on the cloud provider or on the deployment type (k8s,
>> bare metal, etc.), which is the biggest plus
>> * Centralized and comes with tooling, so there is no need to write tons of
>> tools around it
>> * There are clients/tools for almost all OSes and several languages
>> * Very large users have been running it in production for years without
>> major issues
>> * Provides cross-realm trust, amongst other features
>> * Several open source components use it, which could increase
>> compatibility
>>
>> CONS:
>> * Not everybody uses Kerberos
>> * It would increase the code footprint, but this is true for many features
>> (as a side note, I'm here to maintain it)
>>
>> Feel free to add your points because this only represents a single
>> viewpoint.
>> Also if you have any better option for strong authentication please share
>> it and we can consider the pros/cons here.
>>
>> BR,
>> G
>>
>>
>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> I did not mean for the user to sign its own certificates but for the
>>> operator of the cluster. Once the user request hits the proxy, it should no
>>> longer be under his control. I think I do not fully understand yet why this
>>> would not work.
>>>
>>> What I would like to avoid is to add more complexity into Flink if there
>>> is an easy solution which fulfills the requirements. That's why I would
>>> like to exercise thoroughly through the different alternatives. Also, I am
>>> missing a bit the comparison of Kerberos to other authentication mechanisms
>>> and why they were rejected in favour of Kerberos.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <gyf...@apache.org> wrote:
>>>
>>>> Hi!
>>>>
>>>> I think there might be possible alternatives but it seems Kerberos on
>>>> the rest endpoint ticks all the right boxes and provides a super clean and
>>>> simple solution for strong authentication.
>>>>
>>>> I wouldn’t even consider sidecar proxies etc if we can solve it in such
>>>> a simple way as proposed by G.
>>>>
>>>> Cheers
>>>> Gyula
>>>>
>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> I am not saying that we shouldn't add a strong authentication
>>>>> mechanism if there are good reasons for it. I primarily would like to
>>>>> understand the context a bit better in order to give qualified feedback
>>>>> and come to a good decision. In order to do this, I have the feeling that
>>>>> we haven't fully considered all available options which are on the table,
>>>>> tbh.
>>>>>
>>>>> Does the problem of certificate expiry also apply for self-signed
>>>>> certificates? If yes, then this should then also be a problem for the
>>>>> internal encryption of Flink's communication. If not, then one could use
>>>>> self-signed certificates with a longer validity to solve the mentioned
>>>>> issue.
>>>>>
>>>>> I think you can set up Flink in such a way that you don't have to
>>>>> handle all the different certificates. For example, you could deploy Flink
>>>>> with a "sidecar proxy" which is responsible for the authentication using
>>>>> an arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a
>>>>> local network interface. That way, the REST endpoint would only be
>>>>> available through the sidecar proxy. Additionally, one could enable SSL
>>>>> for this communication. Would this be a solution for the problem?
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
>>>>> balassi.mar...@gmail.com> wrote:
>>>>>
>>>>>> That is an interesting idea, Till.
>>>>>>
>>>>>> The main issue with it is that TLS certificates have an expiration
>>>>>> time; usually they get approved for a couple of years. Forcing our users
>>>>>> to restart jobs to reprovision TLS certificates would be weird when we
>>>>>> could just implement a single proper strong authentication mechanism
>>>>>> instead in a couple hundred lines of code. :-)
>>>>>>
>>>>>> In many cases it is also impractical to go the mutual TLS route,
>>>>>> because the Flink Dashboard can end up on any node in the k8s/Yarn
>>>>>> cluster, which means that we need a certificate per node (due to the
>>>>>> mutual auth). If we also want to protect the private keys of these from
>>>>>> users accidentally or intentionally leaking them, then we need one per
>>>>>> user as well. As a result we end up managing (number of users) x
>>>>>> (number of machines) certificates and having to renew them
>>>>>> periodically, which, albeit automatable, is unfortunately not yet
>>>>>> automated in all large organizations.
>>>>>>
>>>>>> I fully agree that TLS certificate mutual authentication has its nice
>>>>>> properties, especially at very large (multiple thousand node) clusters -
>>>>>> but it has its own challenges too. Thanks for bringing it up.
>>>>>>
>>>>>> Happy to have this added to the rejected alternative list so that we
>>>>>> have the full picture documented.
>>>>>>
>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <trohrm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I guess the idea would then be to let the proxy do the
>>>>>>> authentication job and only forward the request via an SSL mutually
>>>>>>> encrypted connection to the Flink cluster. Would this be possible? The
>>>>>>> beauty of this setup is in my opinion that this setup should work with
>>>>>>> all kinds of authentication mechanisms.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
>>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for giving options to fulfil the need.
>>>>>>>>
>>>>>>>> Users are looking for a solution where users can be identified across
>>>>>>>> the whole cluster and access to resources/actions can be restricted.
>>>>>>>> A good example of such an action is cancelling other users' running
>>>>>>>> jobs.
>>>>>>>>
>>>>>>>> * SSL does provide mutual authentication, but once authentication has
>>>>>>>> passed there is no user identity on which restrictions can be based.
>>>>>>>> * The less problematic part is that generating/maintaining short-lived
>>>>>>>> certificates would be hard (that's the reason KDC-like servers
>>>>>>>> exist).
>>>>>>>> Long-lived certificates would widen the attack surface, but since the
>>>>>>>> first concern remains, this is just a cosmetic issue.
>>>>>>>>
>>>>>>>> All in all using TLS certificates is not sufficient in these
>>>>>>>> environments unfortunately.
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> G
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <trohrm...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the information Gabor. If it is about securing the
>>>>>>>>> communication between the REST client and the REST server, then Flink
>>>>>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>>>>>> enough to secure the communication and to pass an audit?
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>>>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Till,
>>>>>>>>>>
>>>>>>>>>> Since I've been working in the security area for 10+ years, let me
>>>>>>>>>> share my thoughts.
>>>>>>>>>> I would like to emphasise that there are experts better than me, but I
>>>>>>>>>> have some basics.
>>>>>>>>>> The discussion is open and I'm not trying to decide things alone...
>>>>>>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>>>>>>> > should also be possible to obtain the right Kerberos token.
>>>>>>>>>> Not necessarily. For example, if one gets access to a specific user's
>>>>>>>>>> credentials then it's not possible to compromise other users' jobs,
>>>>>>>>>> data, etc...
>>>>>>>>>> Security is like an onion: the more layers have been added, the more
>>>>>>>>>> time an attacker needs to proceed.
>>>>>>>>>> At the end of the day, if one is in, then one can most probably find a
>>>>>>>>>> way, but this time is normally enough for sysadmins or security experts
>>>>>>>>>> to close down the system and minimize the damage.
>>>>>>>>>>
>>>>>>>>>> The other thing is that all tokens have a timeout, and if the token is
>>>>>>>>>> invalid then the attacker can't proceed further.
>>>>>>>>>>
>>>>>>>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>>>>>>>> > deployments?
>>>>>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic and
>>>>>>>>>> it can be used in any deployment, including k8s.
>>>>>>>>>> The main intention is to use Kerberos in k8s deployments too, since
>>>>>>>>>> we're going in this direction as well.
>>>>>>>>>> Please see how Spark does this:
>>>>>>>>>>
>>>>>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>>>>>
>>>>>>>>>> Last but not least, the most important reason to add at least one strong
>>>>>>>>>> authentication mechanism is that we have users who have hard
>>>>>>>>>> requirements on this. They're doing security audits and if they fail
>>>>>>>>>> then it's a deal breaker.
>>>>>>>>>> That is why we added Kerberos in the first place. Unfortunately we
>>>>>>>>>> can't name them on this public list; however, the customers who
>>>>>>>>>> specifically asked for this were mainly in the banking and telco
>>>>>>>>>> sector.
>>>>>>>>>> BR,
>>>>>>>>>> G
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
>>>>>>>>>> trohrm...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get
>>>>>>>>>> > access to one of the machines, then it should also be possible to
>>>>>>>>>> > obtain the right Kerberos token.
>>>>>>>>>> >
>>>>>>>>>> > I am not an authentication expert and that's why I wanted to ask what
>>>>>>>>>> > are other authentication protocols other than Kerberos? Why did we
>>>>>>>>>> > select Kerberos and not any other authentication protocol? Maybe you
>>>>>>>>>> > can list the pros and cons for the different protocols. Is Kerberos
>>>>>>>>>> > also the standard authentication protocol for Kubernetes deployments?
>>>>>>>>>> > If not, what would be the answer when deploying on K8s?
>>>>>>>>>> >
>>>>>>>>>> > Cheers,
>>>>>>>>>> > Till
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi
>>>>>>>>>> > <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> >> Hi team,
>>>>>>>>>> >>
>>>>>>>>>> >> Happy to be here and hope I can provide quality additions in the
>>>>>>>>>> >> future.
>>>>>>>>>> >>
>>>>>>>>>> >> Thank you all for the helpful suggestions!
>>>>>>>>>> >> Considering them, the FLIP has been modified and the work continues
>>>>>>>>>> >> on the already existing Jira.
>>>>>>>>>> >>
>>>>>>>>>> >> BR,
>>>>>>>>>> >> G
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi
>>>>>>>>>> >> <balassi.mar...@gmail.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket too,
>>>>>>>>>> >>> let us continue there then.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as possible.
>>>>>>>>>> >>> It is an important design decision that we aim to keep the list of
>>>>>>>>>> >>> authentication protocols to a minimum. We believe that this should
>>>>>>>>>> >>> not be a primary concern of Flink and a trusted proxy service (for
>>>>>>>>>> >>> example Apache Knox) should be used to enable a multitude of
>>>>>>>>>> >>> end-user authentication mechanisms. The bare minimum of
>>>>>>>>>> >>> authentication mechanisms to support consequently consists of a
>>>>>>>>>> >>> single strong authentication protocol, for which Kerberos is the
>>>>>>>>>> >>> enterprise solution, and HTTP Basic, primarily for development and
>>>>>>>>>> >>> light-weight scenarios.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Added the above wording to G's doc.
>>>>>>>>>> >>>
>>>>>>>>>> >>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler
>>>>>>>>>> >>> <ches...@apache.org> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>>> There's a related effort:
>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>>>>>> >>>> >
>>>>>>>>>> >>>> > Thanks for sharing this proposal with the community Márton. In
>>>>>>>>>> >>>> > general, I agree that authentication is missing and that this is
>>>>>>>>>> >>>> > required for using Flink within an enterprise. The thing I am
>>>>>>>>>> >>>> > wondering is whether this feature strictly needs to be implemented
>>>>>>>>>> >>>> > inside of Flink or whether a proxy setup could do the job? Have you
>>>>>>>>>> >>>> > considered this option? If yes, then it would be good to list it
>>>>>>>>>> >>>> > under the point of rejected alternatives.
>>>>>>>>>> >>>> >
>>>>>>>>>> >>>> > I do see the benefit of implementing this feature inside of Flink
>>>>>>>>>> >>>> > if many users need it. If not, then it might be easier for the
>>>>>>>>>> >>>> > project to not increase the surface area since it makes the overall
>>>>>>>>>> >>>> > maintenance harder.
>>>>>>>>>> >>>> >
>>>>>>>>>> >>>> > Cheers,
>>>>>>>>>> >>>> > Till
>>>>>>>>>> >>>> >
>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi
>>>>>>>>>> >>>> > <mbala...@apache.org> wrote:
>>>>>>>>>> >>>> >
>>>>>>>>>> >>>> >> Hi team,
>>>>>>>>>> >>>> >>
>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor, or G [1] for short, to the
>>>>>>>>>> >>>> >> community. He is a Spark committer who has recently transitioned to
>>>>>>>>>> >>>> >> the Flink Engineering team at Cloudera and is looking forward to
>>>>>>>>>> >>>> >> contributing to Apache Flink. Previously G primarily focused on
>>>>>>>>>> >>>> >> Spark Streaming and security.
>>>>>>>>>> >>>> >>
>>>>>>>>>> >>>> >> Based on requests from our customers, G has implemented Kerberos and
>>>>>>>>>> >>>> >> HTTP Basic Authentication for the Flink Dashboard and HistoryServer.
>>>>>>>>>> >>>> >> Previously these lacked an authentication story.
>>>>>>>>>> >>>> >>
>>>>>>>>>> >>>> >> We are looking to contribute this functionality back to the
>>>>>>>>>> >>>> >> community. We believe that given Flink's maturity there should be a
>>>>>>>>>> >>>> >> common code solution for this general pattern.
>>>>>>>>>> >>>> >>
>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>>>>>> >>>> >>
>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>>>>>> >>>> >> [2]
>>>>>>>>>> >>>> >> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>>>> >>>> >>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>>
>>>>>>>>>
