Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-04 Thread David Morávek
There is a really strong tendency lately to push many things out of Flink and keep only the main building blocks. However, I really think that this belongs among the minimal building blocks that Flink should provide out of the box. It's also very likely that security topics will start getting more

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-04 Thread Gyula Fóra
Hi Chesnay, Thanks for the proposal for the alternative mechanism. I see the conceptual value of separating this process from Flink but in practice I feel there are a few very serious limitations with that. Just a few points that come to mind: 1. Implementing this as independent distributed proce

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-04 Thread Chesnay Schepler
The concrete proposal would be to add a generic process startup lifecycle hook (essentially a Consumer) that is run at the start of each process (JobManager, TaskManager, HistoryServer, and possibly the CLI). Everything else would be left to the implementation, which would live outside of Flink. For this
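A minimal sketch of what such a startup hook could look like, written only to illustrate the idea in the message above. The snippet does not say what the Consumer would receive, so the `ProcessKind` argument, the `StartupHooks` class, and all method names are hypothetical and are not Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical description of the process being started; not a Flink type. */
enum ProcessKind { JOB_MANAGER, TASK_MANAGER, HISTORY_SERVER }

/** Hypothetical registry that runs every registered hook once at process start. */
final class StartupHooks {
    private final List<Consumer<ProcessKind>> hooks = new ArrayList<>();

    /** Registers a hook; an external token implementation would plug in here. */
    void register(Consumer<ProcessKind> hook) {
        hooks.add(hook);
    }

    /** Runs hooks in registration order; an exception from a hook aborts startup. */
    void runAll(ProcessKind kind) {
        for (Consumer<ProcessKind> hook : hooks) {
            hook.accept(kind);
        }
    }
}
```

In this sketch everything beyond the hook invocation (obtaining, renewing, and distributing tokens) would live in the registered implementation, outside of the framework.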

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-04 Thread Gabor Somogyi
Hi All, First of all, sorry that I took a couple of mails too hard! I had the impression that, after we invested roughly 2 months into the FLIP, it was moving toward a rejection without an alternative that we could work on. As I said earlier, and it still stands: if there is a better idea how that could be solve

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Till Rohrmann
Sorry I didn't want to offend anybody if it was perceived like this. I can see that me joining very late into the discussion w/o constructive ideas was not nice. My motivation for asking for the reasoning behind the current design proposal is primarily the lack of Kerberos knowledge. Moreover, it h

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gyula Fóra
Hi Team! Let's all calm down a little and not let our emotions affect the discussion too much. There has been a lot of effort spent from all involved parties so this is quite understandable :) Even though not everyone said this explicitly, it seems that everyone more or less agrees that a feature

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Chesnay Schepler
First off, at no point have we questioned the use-case and importance of this feature, and the fact that David, Till and I spent time looking at the FLIP, asking questions, and discussing different aspects of it should make this obvious. I'd appreciate it if you didn't dismiss our replies that

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
> And even if we do it like this, there is no guarantee that it works because there can be other applications bombing the KDC with requests. 1. The main issue to solve here is that workloads using delegation tokens are stopping after 7 days with default configuration. 2. This is not new design, it

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gyula Fóra
Hi Till! The delegation token framework solves a few production problems; KDC scalability is just one and probably not the most important. As Gabor has explained, some of them are: - Solves the problem of token renewal for long running jobs which would currently time out and die - Improves sec

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Till Rohrmann
I don't have a good alternative solution but it sounds to me a bit as if we are trying to solve Kerberos' scalability problems within Flink. And even if we do it like this, there is no guarantee that it works because there can be other applications bombing the KDC with requests. From a maintainabil

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
Oh, and the most important reason, which I forgot: without the feature in the FLIP, all secure workloads with delegation tokens are going to stop when the tokens reach their max lifetime 🙂 This is around 7 days with default config... On Thu, Feb 3, 2022 at 5:30 PM Gabor Somogyi wrote: > That's no

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
That's not the single purpose of the feature but in some environments it caused problems. The main intention is not to deploy keytab to all the nodes because the attack surface is bigger + reduce the KDC load. I've already described the situation previously in this thread so copying it here. -

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Chesnay Schepler
What I don't understand is how this could overload the KDC. Aren't tokens valid for a relatively long time period? For new deployments where many TMs are started at once I could imagine it temporarily, but shouldn't the accesses to the KDC eventually naturally spread out? The FLIP mentions s

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
> I would prefer not choosing the first option Then only the second option remains in play. > I am not a Kerberos expert but is it really so that every application that wants to use Kerberos needs to implement the token propagation itself? This somehow feels as if there is something missing. OK, so fir

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Till Rohrmann
I would prefer not choosing the first option > Make the TM accept tasks only after registration(not sure if it's possible or makes sense at all) because it effectively means that we change how Flink's component lifecycle works for distributing Kerberos tokens. It also effectively means that a TM

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
> Isn't this something the underlying resource management system could do or which every process could do on its own? I looked for such a feature but did not find one. Maybe we can solve the propagation more easily, but then I'm waiting for a better suggestion. If anybody has a better/simpler idea then plea

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Till Rohrmann
Hi everyone, Sorry for joining this discussion late. I also did not read all responses in this thread so my question might already be answered: Why does Flink need to be involved in the propagation of the tokens? Why do we need explicit RPC calls in the Flink domain? Isn't this something the under

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Chesnay Schepler
Here's an example for the TM to run workloads without being connected to the RM, while potentially having a valid token: 1. TM registers at RM 2. JobMaster requests slot from RM -> TM gets notified 3. JM fails over 4. TM re-offers the slot to the failed over JobMaster 5. TM reconnects to RM at s

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
> but it can happen that the JobMaster+TM collaborate to run stuff without the TM being registered at the RM Honestly, I don't know Flink well enough to give an example of such a scenario. Until now I thought JM defines tasks to be done and TM just blindly connects to external systems and does

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Chesnay Schepler
> Just to learn something new. I think local recovery is clear to me which is not touching external systems like Kafka or so (correct me if I'm wrong). Is it possible that such case the user code just starts to run blindly w/o JM coordination and connects to external systems to do data processi

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
> Any error in loading the provider (be it by accident or explicit checks) then is a setup error and we can fail the cluster. Fail fast is a good direction in my view. In Spark I wanted to go in this direction but there were other opinions, so there, if a provider is not loaded then the workload goe

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Chesnay Schepler
1) The manager certainly shouldn't check for specific implementations. The problem with classpath-based checks is it can easily happen that the provider can't be loaded in the first place (e.g., if you don't use reflection, which you currently kinda force), and in that case Flink can't tell whe

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
Thanks for the quick response! Appreciate your invested time... G On Thu, Feb 3, 2022 at 11:12 AM Chesnay Schepler wrote: > Thanks for answering the questions! > > 1) Does the HBase provider require HBase to be on the classpath? > To be instantiated no, to obtain a token yes. > If so, th

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Chesnay Schepler
Thanks for answering the questions! 1) Does the HBase provider require HBase to be on the classpath?     If so, then could it even be loaded if Hbase is on the classpath?     If not, then you're assuming the classpath of the JM/TM to be the same, which isn't necessarily true (in general; and als

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Gabor Somogyi
Please see my answers inline. Hope provided satisfying answers to all questions. G On Thu, Feb 3, 2022 at 9:17 AM Chesnay Schepler wrote: > I have a few question that I'd appreciate if you could answer them. > >1. How does the Provider know whether it is required or not? > > All registered

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-02-03 Thread Chesnay Schepler
I have a few questions that I'd appreciate answers to. 1. How does the Provider know whether it is required or not? 2. How does the configuration of Providers work (how do they get access to a configuration)? 3. How does a user select providers? (Is it purely based on the provi

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-31 Thread Gabor Somogyi
Thanks for the confirmation, now it works! G On Mon, Jan 31, 2022 at 12:25 PM Chesnay Schepler wrote: > You should have permissions now. Note that I saw 2 accounts matching > your name, and I picked gaborgsomogyi. > > On 31/01/2022 11:28, Gabor Somogyi wrote: > > Not sure if the mentioned writ

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-31 Thread Chesnay Schepler
You should have permissions now. Note that I saw 2 accounts matching your name, and I picked gaborgsomogyi. On 31/01/2022 11:28, Gabor Somogyi wrote: Not sure if the mentioned write right already given or not but I still don't see any edit button. G On Fri, Jan 28, 2022 at 5:08 PM Gabor Somo

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-31 Thread Gabor Somogyi
Not sure whether the mentioned write permission has already been granted, but I still don't see any edit button. G On Fri, Jan 28, 2022 at 5:08 PM Gabor Somogyi wrote: > Hi Robert, > > That would be awesome. > > My cwiki username: gaborgsomogyi > > G > > > On Fri, Jan 28, 2022 at 5:06 PM Robert Metzger > wr

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread Gabor Somogyi
Hi Robert, That would be awesome. My cwiki username: gaborgsomogyi G On Fri, Jan 28, 2022 at 5:06 PM Robert Metzger wrote: > Hey Gabor, > > let me know your cwiki username, and I can give you write permissions. > > > On Fri, Jan 28, 2022 at 4:05 PM Gabor Somogyi > wrote: > > > Thanks for ma

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread Gabor Somogyi
We've made the changes both in the doc + wiki. Please have a look and notify me if I've missed something based on our agreement. G On Fri, Jan 28, 2022 at 4:04 PM Gabor Somogyi wrote: > Thanks for making the design better! No further thing to discuss from my > side. > > Started to reflect the

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread Robert Metzger
Hey Gabor, let me know your cwiki username, and I can give you write permissions. On Fri, Jan 28, 2022 at 4:05 PM Gabor Somogyi wrote: > Thanks for making the design better! No further thing to discuss from my > side. > > Started to reflect the agreement in the FLIP doc. > Since I don't have a

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread Gabor Somogyi
Thanks for making the design better! No further thing to discuss from my side. Started to reflect the agreement in the FLIP doc. Since I don't have access to the wiki I need to ask Marci to do that which may take some time. G On Fri, Jan 28, 2022 at 3:52 PM David Morávek wrote: > Hi, > > AFAI

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread David Morávek
Hi, AFAIU an under registration TM is not added to the registered TMs map until > RegistrationResponse .. > I think you're right, with a careful design around threading (delegating update broadcasts to the main thread) + synchronous initial update (that would be nice to avoid) this should be doab

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread Gabor Somogyi
> - Make sure DTs issued by single DTMs are monotonically increasing (can be sorted on TM side) AFAIU an under registration TM is not added to the registered TMs map until RegistrationResponse is processed which would contain the initial tokens. If that's true then how is it possible to have race

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread David Morávek
We had a long discussion with Chesnay about the possible edge cases and it basically boils down to the following two scenarios: 1) There is a possible race condition between TM registration (the first DT update) and token refresh if they happen simultaneously. Then the registration might beat the
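One way to picture the safeguard discussed in this part of the thread (delegation tokens issued by a single manager being monotonically increasing, so the TM side can order them) is the sketch below. It is only an illustration under assumptions: the `TokenHolder` class, the `long` version number, and the opaque `byte[]` payload are hypothetical and do not come from the FLIP.

```java
/** Hypothetical TM-side holder that keeps only the newest token update. */
final class TokenHolder {
    private long currentVersion = -1L;          // -1 means no tokens received yet
    private byte[] currentTokens = new byte[0]; // opaque serialized tokens

    /**
     * Applies the update only if its version is strictly newer than the one held.
     * Returns true if applied, false if the update was stale or a duplicate.
     */
    synchronized boolean update(long version, byte[] tokens) {
        if (version <= currentVersion) {
            return false; // an older update arrived late, e.g. the registration response
        }
        currentVersion = version;
        currentTokens = tokens.clone();
        return true;
    }

    synchronized long version() {
        return currentVersion;
    }
}
```

With this ordering, it no longer matters whether the initial update sent at registration or a concurrent refresh arrives first: the TM ends up holding the update with the highest version.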

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread Gabor Somogyi
Thanks for investing your time! The first 2 bullet points are clear. If there is a chance that a TM can get into an inconsistent state then I agree with the 3rd bullet point. Just before we agree on that, I would like to learn something new and understand how it is possible that a TM gets corrupted. (In

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-28 Thread David Morávek
Hi Gabor, This is definitely headed in the right direction, +1. I think we still need to have a safeguard in case some of the TMs get into an inconsistent state though, which will also eliminate the need for implementing a custom retry mechanism (when _updateDelegationToken_ call fails for some re

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-26 Thread David Morávek
Thanks for the update, I'll go over it tomorrow. On Wed, Jan 26, 2022 at 5:33 PM Gabor Somogyi wrote: > Hi All, > > Since it has turned out that DTM can't be added as member of JobMaster > < > https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-26 Thread Gabor Somogyi
Hi All, Since it has turned out that DTM can't be added as a member of JobMaster, I've come up with a better proposal. David, thank

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-25 Thread Gabor Somogyi
First of all thanks for investing your time and helping me out. As I see you have pretty solid knowledge in the RPC area. I would like to rely on your knowledge since I'm learning this part. > - Do we need to introduce a new RPC method or can we for example piggyback on heartbeats? I'm fine with

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-24 Thread David Morávek
> > Do we need to introduce a new RPC method or can we for example piggyback > on heartbeats? > Seems we can use the very same approach as _ResourceManagerPartitionTracker_ is using: - _TaskManagers_ periodically report which token they're using (eg. identified by some id). This involves adding a
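The reporting approach described above (TaskManagers periodically report which token they are using, identified by some id) could be reconciled on the manager side roughly as follows. This is a sketch under assumptions, not the FLIP's design: `TokenReconciler`, the `long` token id, and the string TaskManager ids are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical manager-side check of the token ids reported by TaskManagers. */
final class TokenReconciler {
    private long currentTokenId;

    TokenReconciler(long initialTokenId) {
        this.currentTokenId = initialTokenId;
    }

    /** Called after new tokens were obtained. */
    void onTokensRenewed(long newTokenId) {
        this.currentTokenId = newTokenId;
    }

    /**
     * Returns the ids of TaskManagers whose reported token id differs from the
     * current one, sorted by TaskManager id; these would be sent the tokens again.
     */
    List<String> staleTaskManagers(Map<String, Long> reportedIds) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(reportedIds).entrySet()) {
            if (e.getValue().longValue() != currentTokenId) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }
}
```

Because every report is compared against the current id, a TM that missed an update is picked up on its next report, which is why no separate retry mechanism would be needed in this sketch.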

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-24 Thread David Morávek
> > Could you point to a code where you think it could be added exactly? A > helping hand is welcome here 🙂 > I think you can take a look at _ResourceManagerPartitionTracker_ [1] which seems to have somewhat similar properties to the DTM. One topic that needs to be addressed there is how the RPC

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-24 Thread Gabor Somogyi
> There is a separate JobMaster for each job within a Flink cluster and each JobMaster only has a partial view of the task managers Good point! I've had a deeper look and you're right. We definitely need to find another place. > Related per-cluster or per-job keytab: In the current code per-clus

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-24 Thread David Morávek
Hi Gabor, There is actually a huge difference between JobManager (process) and JobMaster (job coordinator). The naming is unfortunately a bit misleading here for historical reasons. There is a separate JobMaster for each job within a Flink cluster and each JobMaster only has a partial view of the t

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-21 Thread Gabor Somogyi
Here is the class; I'm on mobile so I haven't checked the exact class name: https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176 That keeps track of TMs where the tokens can be

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-21 Thread David Morávek
> > JobManager is the Flink class. There is no such class in Flink. The closest thing to the JobManager is a ClusterEntrypoint. The cluster entrypoint spawns new RM Runner & Dispatcher Runner that start participating in the leader election. Once they gain leadership they spawn the actual underlyi

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-21 Thread Gabor Somogyi
> I think we might both mean something different by the RM. You sensed it right; I did not define these terms well in the explanation. By RM I meant the resource management framework. JobManager is the Flink class. This means that inside the JM instance there will be a DTM instance, so they would have the sam

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-21 Thread David Morávek
Hi Gabor, 1. One thing is important, token management is planned to be done > generically within Flink and not scattered in RM specific code. JobManager > has a DelegationTokenManager which obtains tokens time-to-time (if > configured properly). JM knows which TaskManagers are in place so it can >

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-21 Thread Gabor Somogyi
Oh, and one more thing. I'm planning to add this feature in small PRs because security is a super hairy area. That way reviewers can grasp the concept more easily. On Fri, 21 Jan 2022, 18:03 David Morávek, wrote: > Hi Gabor, > > thanks for drafting the FLIP, I think having a solid Ker

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-21 Thread Gabor Somogyi
1. One thing is important: token management is planned to be done generically within Flink and not scattered in RM specific code. JobManager has a DelegationTokenManager which obtains tokens from time to time (if configured properly). JM knows which TaskManagers are in place so it can distribute it to a
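The obtain-and-distribute flow described in this message can be sketched as below. This is a hedged illustration only: `SketchTokenManager`, the `Supplier` standing in for token acquisition, and the `Consumer` standing in for a TaskManager gateway are hypothetical, and where such a manager should live was still being discussed elsewhere in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical manager that obtains tokens and pushes them to known TaskManagers. */
final class SketchTokenManager {
    private final Supplier<byte[]> tokenSource; // stand-in for obtaining tokens
    private final Map<String, Consumer<byte[]>> taskManagers = new LinkedHashMap<>();
    private byte[] latest; // null until tokens were obtained at least once

    SketchTokenManager(Supplier<byte[]> tokenSource) {
        this.tokenSource = tokenSource;
    }

    /** Tracks a TaskManager and hands it the latest tokens if any exist already. */
    void registerTaskManager(String id, Consumer<byte[]> gateway) {
        taskManagers.put(id, gateway);
        if (latest != null) {
            gateway.accept(latest.clone());
        }
    }

    /** Obtains fresh tokens and distributes them; a scheduler would call this periodically. */
    void obtainAndDistribute() {
        latest = tokenSource.get();
        for (Consumer<byte[]> gateway : taskManagers.values()) {
            gateway.accept(latest.clone());
        }
    }
}
```

The sketch keeps all token handling in one place and leaves scheduling, error handling, and the actual RPC mechanism out of scope.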

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-21 Thread David Morávek
Hi Gabor, thanks for drafting the FLIP, I think having a solid Kerberos support is crucial for many enterprise deployments. I have multiple questions regarding the implementation (note that I have very limited knowledge of Kerberos): 1) If I understand it correctly, we'll only obtain tokens in t

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-13 Thread Junfan Zhang
Hi G, Thanks for your detailed explanation. I understand your thoughts, and in any case this proposal is a great improvement. Looking forward to your implementation; I will keep following it. Thanks again. Best JunFan. On Jan 13, 2022, 9:20 PM +0800, Gabor Somogyi , wrote: > Just to confirm keeping

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-13 Thread Gabor Somogyi
Just to confirm keeping "security.kerberos.fetch.delegation-token" is added to the doc. BR, G On Thu, Jan 13, 2022 at 1:34 PM Gabor Somogyi wrote: > Hi JunFan, > > > By the way, maybe this should be added in the migration plan or > intergation section in the FLIP-211. > > Going to add this so

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-13 Thread Gabor Somogyi
Hi JunFan, > By the way, maybe this should be added in the migration plan or intergation section in the FLIP-211. Going to add this soon. > Besides, I have a question that the KDC will collapse when the cluster reached 200 nodes you described in the google doc. Do you have any attachment or ref

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-13 Thread 张俊帆
Hi G, Thanks for your quick reply. I think keeping the *security.kerberos.fetch.delegation-token* config and simply using it to disable token fetching is a good idea. By the way, maybe this should be added in the migration plan or integration section in the FLIP-211. Besides, I have a question

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-13 Thread Gabor Somogyi
Hi Junfan, Thanks for investing your time to make this feature better. I've had a look at FLINK-21700 and now I think I see your point (plz correct me if I misunderstood something). According to the actual plans *security.kerberos.fetch.delegation-token* is intended to be removed because *securit

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-12 Thread 张俊帆
Hi G, Thanks for starting the discussion. I think this is an important improvement for Flink. The proposal looks good to me. I would like to focus on one point. 1. I hope this stays consistent with the current implementation: we rely on the config of 'security.kerberos.fetch.delegation-token' to submit F

Re: [DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-11 Thread Márton Balassi
Hi G, Thanks for taking this challenge on. Scalable Kerberos authentication support is important for Flink, and delegation tokens are a great mechanism to future-proof this. I second your assessment that the existing implementation could use some improvement too and like the approach you have outlined.

[DISCUSS] FLIP-211: Kerberos delegation token framework

2022-01-11 Thread Gabor Somogyi
Hi All, Hope all of you have enjoyed the holiday season. I would like to start the discussion on FLIP-211 which aims to provide a Kerberos delegation token framework that /obtains/renews/distribute