Hi Till,

Have you had a chance to take a look at the doc? I haven't seen any update yet.

BR,
G


On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Thanks for the update Gabor. I'll take a look and respond in the document.
>
> Cheers,
> Till
>
> On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
>> Hi Till,
>>
>> Your proxy suggestion has been considered in depth and the FLIP has been
>> updated accordingly.
>> We've considered two proxy implementations (Nginx and Squid), but according
>> to our analysis and testing they are not suitable for the mentioned use cases.
>> Please take a look at the rejected alternatives for a detailed explanation.
>>
>> Thanks for your time in advance!
>>
>> BR,
>> G
>>
>>
>> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> As I've said I am not a security expert and that's why I have to ask for
>>> clarification, Gabor. You are saying that if we configure a truststore for
>>> the REST endpoint with a single trusted certificate which has been
>>> generated by the operator of the Flink cluster, then the attacker can
>>> generate a new certificate, sign it and then talk to the Flink cluster if
>>> he has access to the node on which the REST endpoint runs? My understanding
>>> was that you need the corresponding private key which in my proposed setup
>>> would be under the control of the operator as well (e.g. stored in a
>>> keystore on the same machine but guarded by some secret). That way (if I am
>>> not mistaken), only the entity which has access to the keystore is able to
>>> talk to the Flink cluster.
>>>
>>> Maybe we are also getting our wires crossed here and are talking about
>>> different things.
>>>
>>> Thanks for listing the pros and cons of Kerberos. Concerning what other
>>> authentication mechanisms are used in the industry, I am not 100% sure.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>> wrote:
>>>
>>>> > I did not mean for the user to sign its own certificates but for the
>>>> > operator of the cluster. Once the user request hits the proxy, it should no
>>>> > longer be under his control. I think I do not fully understand yet why this
>>>> > would not work.
>>>> I said it does not solve the authentication problem over any proxy. Even
>>>> if the operator signs the certificate, someone can gain access to an
>>>> internal node.
>>>> In that case anybody can craft certificates which are accepted by the
>>>> server. Once they are accepted, an attacker can cancel jobs, causing huge impact.
>>>>
>>>> > Also, I am missing a bit the comparison of Kerberos to other
>>>> > authentication mechanisms and why they were rejected in favour of Kerberos.
>>>> PROS:
>>>> * It does not depend on the cloud provider or on the deployment type
>>>> (k8s, bare-metal, etc.), which is the biggest plus
>>>> * Centralized, with existing tooling, so no need to write tons of tools around it
>>>> * There are clients/tools for almost all OSes and several languages
>>>> * Very large users have been running it in production for years without major issues
>>>> * Provides cross-realm trust possibility amongst other features
>>>> * Several open source components use it, which could increase
>>>> compatibility
>>>>
>>>> CONS:
>>>> * Not everybody uses Kerberos
>>>> * It would increase the code footprint, but this is true for many
>>>> features (as a side note I'm here to maintain it)
>>>>
>>>> Feel free to add your points, because this only represents a single
>>>> viewpoint.
>>>> Also, if you have a better option for strong authentication, please
>>>> share it and we can consider the pros/cons here.
>>>>
>>>> BR,
>>>> G
>>>>
>>>>
>>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> I did not mean for the user to sign its own certificates but for the
>>>>> operator of the cluster. Once the user request hits the proxy, it should
>>>>> no longer be under his control. I think I do not fully understand yet why
>>>>> this would not work.
>>>>>
>>>>> What I would like to avoid is to add more complexity into Flink if
>>>>> there is an easy solution which fulfills the requirements. That's why I
>>>>> would like to exercise thoroughly through the different alternatives.
>>>>> Also, I am missing a bit the comparison of Kerberos to other authentication
>>>>> mechanisms and why they were rejected in favour of Kerberos.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <gyf...@apache.org> wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I think there might be possible alternatives but it seems Kerberos on
>>>>>> the rest endpoint ticks all the right boxes and provides a super clean
>>>>>> and simple solution for strong authentication.
>>>>>>
>>>>>> I wouldn’t even consider sidecar proxies etc if we can solve it in
>>>>>> such a simple way as proposed by G.
>>>>>>
>>>>>> Cheers
>>>>>> Gyula
>>>>>>
>>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <trohrm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I am not saying that we shouldn't add a strong authentication
>>>>>>> mechanism if there are good reasons for it. I primarily would like to
>>>>>>> understand the context a bit better in order to give qualified feedback
>>>>>>> and come to a good decision. In order to do this, I have the feeling that
>>>>>>> we haven't fully considered all available options which are on the table,
>>>>>>> tbh.
>>>>>>>
>>>>>>> Does the problem of certificate expiry also apply for self-signed
>>>>>>> certificates? If yes, then this should then also be a problem for the
>>>>>>> internal encryption of Flink's communication. If not, then one could use
>>>>>>> self-signed certificates with a longer validity to solve the mentioned
>>>>>>> issue.
>>>>>>>
>>>>>>> I think you can set up Flink in such a way that you don't have to
>>>>>>> handle all the different certificates. For example, you could deploy
>>>>>>> Flink with a "sidecar proxy" which is responsible for the authentication
>>>>>>> using an arbitrary method (e.g. Kerberos) and then bind the REST endpoint
>>>>>>> to a local network interface. That way, the REST endpoint would only be
>>>>>>> available through the sidecar proxy. Additionally, one could enable SSL
>>>>>>> for this communication. Would this be a solution for the problem?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
>>>>>>> balassi.mar...@gmail.com> wrote:
>>>>>>>
>>>>>>>> That is an interesting idea, Till.
>>>>>>>>
>>>>>>>> The main issue with it is that TLS certificates have an expiration
>>>>>>>> time, usually they get approved for a couple years. Forcing our users
>>>>>>>> to restart jobs to reprovision TLS certificates would be weird when we
>>>>>>>> could just implement a single proper strong authentication mechanism
>>>>>>>> instead in a couple hundred lines of code. :-)
>>>>>>>>
>>>>>>>> In many cases it is also impractical to go the TLS mutual route,
>>>>>>>> because the Flink Dashboard can end up on any node in the k8s/Yarn
>>>>>>>> cluster which means that we need a certificate per node (due to the
>>>>>>>> mutual auth), but if we also want to protect the private key of these
>>>>>>>> from users accidentally or intentionally leaking them then we need this
>>>>>>>> per user. As in we end up managing user*machine number certificates and
>>>>>>>> having to renew them periodically, which albeit automatable is
>>>>>>>> unfortunately not yet automated in all large organizations.
>>>>>>>>
>>>>>>>> I fully agree that TLS certificate mutual authentication has its
>>>>>>>> nice properties, especially at very large (multiple thousand node)
>>>>>>>> clusters - but it has its own challenges too. Thanks for bringing it up.
>>>>>>>>
>>>>>>>> Happy to have this added to the rejected alternative list so that
>>>>>>>> we have the full picture documented.
>>>>>>>>
>>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <trohrm...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I guess the idea would then be to let the proxy do the
>>>>>>>>> authentication job and only forward the request via an SSL mutually
>>>>>>>>> encrypted connection to the Flink cluster. Would this be possible? The
>>>>>>>>> beauty of this setup is in my opinion that this setup should work
>>>>>>>>> with all kinds of authentication mechanisms.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
>>>>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for giving options to fulfil the need.
>>>>>>>>>>
>>>>>>>>>> Users are looking for a solution where users can be identified across
>>>>>>>>>> the whole cluster and access to resources/actions can be restricted.
>>>>>>>>>> A good example of such an action is cancelling other users'
>>>>>>>>>> running jobs.
>>>>>>>>>>
>>>>>>>>>> * SSL does provide mutual authentication, but once authentication has
>>>>>>>>>> passed, no user-based restrictions can be applied.
>>>>>>>>>> * The less problematic part is that generating/maintaining short-lived
>>>>>>>>>> certificates would be hard (that's the reason KDC-like servers
>>>>>>>>>> exist).
>>>>>>>>>> Having long-lived certificates would widen the attack
>>>>>>>>>> surface, but since the first concern stands, this is just a cosmetic issue.
>>>>>>>>>>
>>>>>>>>>> All in all, using TLS certificates is unfortunately not sufficient in these
>>>>>>>>>> environments.
>>>>>>>>>>
>>>>>>>>>> BR,
>>>>>>>>>> G
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
>>>>>>>>>> trohrm...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the information Gabor. If it is about securing the
>>>>>>>>>>> communication between the REST client and the REST server, then
>>>>>>>>>>> Flink already supports enabling mutual SSL authentication [1]. Would
>>>>>>>>>>> this be enough to secure the communication and to pass an audit?
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Till
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>>>>>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Till,
>>>>>>>>>>>>
>>>>>>>>>>>> Since I've been working in the security area for 10+ years, let me share my
>>>>>>>>>>>> thoughts.
>>>>>>>>>>>> I would like to emphasise that there are experts better than me, but I have
>>>>>>>>>>>> some basics.
>>>>>>>>>>>> The discussion is open and I'm not trying to dictate things...
>>>>>>>>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>>>>>>>>> > should also be possible to obtain the right Kerberos token.
>>>>>>>>>>>> Not necessarily. For example, if one gets access to a specific user's
>>>>>>>>>>>> credentials, then it's not possible to compromise other users' jobs, data,
>>>>>>>>>>>> etc.
>>>>>>>>>>>> Security is like an onion: the more layers that have been added, the more
>>>>>>>>>>>> time an attacker needs to proceed.
>>>>>>>>>>>> At the end of the day, if one is in, then they can most probably find a way,
>>>>>>>>>>>> but this time is normally enough for sysadmins or security experts to
>>>>>>>>>>>> close down the system and minimize the damage.
>>>>>>>>>>>>
>>>>>>>>>>>> The other thing is that all tokens have a timeout, and if the token is
>>>>>>>>>>>> invalid then the attacker can't proceed further.
>>>>>>>>>>>>
>>>>>>>>>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>>>>>>>>>> > deployments?
>>>>>>>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic, and it
>>>>>>>>>>>> can be used in any deployment, including k8s.
>>>>>>>>>>>> The main intention is to use Kerberos in k8s deployments too, since we're
>>>>>>>>>>>> going in this direction as well.
>>>>>>>>>>>> Please see how Spark does this:
>>>>>>>>>>>>
>>>>>>>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>>>>>>>
>>>>>>>>>>>> Last but not least, the most important reason to add at least one strong
>>>>>>>>>>>> authentication mechanism is that we have users who have hard requirements
>>>>>>>>>>>> on this. They're doing security audits and if they fail then it's a deal
>>>>>>>>>>>> breaker.
>>>>>>>>>>>> That is why we added Kerberos in the first place. Unfortunately we
>>>>>>>>>>>> can't name them on this public list; however,
>>>>>>>>>>>> the customers who specifically asked for this were mainly in the banking
>>>>>>>>>>>> and telco sectors.
>>>>>>>>>>>>
>>>>>>>>>>>> BR,
>>>>>>>>>>>> G
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
>>>>>>>>>>>> trohrm...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get access
>>>>>>>>>>>> > to one of the machines, then it should also be possible to obtain the right
>>>>>>>>>>>> > Kerberos token.
>>>>>>>>>>>> >
>>>>>>>>>>>> > I am not an authentication expert and that's why I wanted to ask what are
>>>>>>>>>>>> > other authentication protocols other than Kerberos? Why did we select
>>>>>>>>>>>> > Kerberos and not any other authentication protocol? Maybe you can list the
>>>>>>>>>>>> > pros and cons for the different protocols. Is Kerberos also the standard
>>>>>>>>>>>> > authentication protocol for Kubernetes deployments? If not, what would be
>>>>>>>>>>>> > the answer when deploying on K8s?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Cheers,
>>>>>>>>>>>> > Till
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> >> Hi team,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Happy to be here and hope I can provide quality additions in the future.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Thank you all for the helpful suggestions!
>>>>>>>>>>>> >> Considering them, the FLIP has been modified and the work continues on the
>>>>>>>>>>>> >> already existing Jira.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> BR,
>>>>>>>>>>>> >> G
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <balassi.mar...@gmail.com>
>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket too, let
>>>>>>>>>>>> >>> us continue there then.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as possible. It
>>>>>>>>>>>> >>> is an important design decision that we aim to keep the list of
>>>>>>>>>>>> >>> authentication protocols to a minimum. We believe that this should not be a
>>>>>>>>>>>> >>> primary concern of Flink and a trusted proxy service (for example Apache
>>>>>>>>>>>> >>> Knox) should be used to enable a multitude of end-user authentication
>>>>>>>>>>>> >>> mechanisms. The bare minimum of authentication mechanisms to support
>>>>>>>>>>>> >>> consequently consists of a single strong authentication protocol, for which
>>>>>>>>>>>> >>> Kerberos is the enterprise solution, and HTTP Basic, primarily for development
>>>>>>>>>>>> >>> and light-weight scenarios.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Added the above wording to G's doc.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <ches...@apache.org>
>>>>>>>>>>>> >>> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>> There's a related effort:
>>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>>>>>>>> >>>> >
>>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the community Márton. In general, I
>>>>>>>>>>>> >>>> > agree that authentication is missing and that this is required for using
>>>>>>>>>>>> >>>> > Flink within an enterprise. The thing I am wondering is whether this
>>>>>>>>>>>> >>>> > feature strictly needs to be implemented inside of Flink or whether a proxy
>>>>>>>>>>>> >>>> > setup could do the job? Have you considered this option? If yes, then it
>>>>>>>>>>>> >>>> > would be good to list it under the point of rejected alternatives.
>>>>>>>>>>>> >>>> >
>>>>>>>>>>>> >>>> > I do see the benefit of implementing this feature inside of Flink if many
>>>>>>>>>>>> >>>> > users need it. If not, then it might be easier for the project to not
>>>>>>>>>>>> >>>> > increase the surface area since it makes the overall maintenance harder.
>>>>>>>>>>>> >>>> >
>>>>>>>>>>>> >>>> > Cheers,
>>>>>>>>>>>> >>>> > Till
>>>>>>>>>>>> >>>> >
>>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <mbala...@apache.org>
>>>>>>>>>>>> >>>> > wrote:
>>>>>>>>>>>> >>>> >
>>>>>>>>>>>> >>>> >> Hi team,
>>>>>>>>>>>> >>>> >>
>>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G [1] for short to the
>>>>>>>>>>>> >>>> >> community, he is a Spark committer who has recently transitioned to the
>>>>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is looking forward to contributing
>>>>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused on Spark Streaming and
>>>>>>>>>>>> >>>> >> security.
>>>>>>>>>>>> >>>> >>
>>>>>>>>>>>> >>>> >> Based on requests from our customers G has implemented Kerberos and HTTP
>>>>>>>>>>>> >>>> >> Basic Authentication for the Flink Dashboard and HistoryServer, which
>>>>>>>>>>>> >>>> >> previously lacked an authentication story.
>>>>>>>>>>>> >>>> >>
>>>>>>>>>>>> >>>> >> We are looking to contribute this functionality back to the community, we
>>>>>>>>>>>> >>>> >> believe that given Flink's maturity there should be a common code solution
>>>>>>>>>>>> >>>> >> for this general pattern.
>>>>>>>>>>>> >>>> >>
>>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>>>>>>>> >>>> >>
>>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>>>>>>>> >>>> >> [2] https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>>>>>> >>>> >>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>>
>>>>>>>>>>>
