Thanks for the update, I'll go over it tomorrow.

On Wed, Jan 26, 2022 at 5:33 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> Hi All,
>
> Since it has turned out that DTM can't be added as a member of JobMaster
> <https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176>,
> I've come up with a better proposal.
> David, thanks for pointing this out, you've caught a bug in the early phase!
>
> Namely, ResourceManager
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124>
> is a single-instance class where DTM can be added as a member variable.
> It has a list of all already registered TMs, and new TM registration also happens there.
> To be more specific, the following logic can be added:
> * Create a new DTM instance in ResourceManager
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124>
> and start it (a recurring thread to obtain new tokens)
> * Add a new function named "updateDelegationTokens" to TaskExecutorGateway
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutorGateway.java#L54>
> * Call "updateDelegationTokens" on all registered TMs to propagate new DTs
> * In case of a new TM registration, call "updateDelegationTokens" before the registration succeeds, to set up the new TM properly
>
> This way:
> * only a single DTM would live within a cluster, which is the expected behavior
> * DTM is going to be added to a central place where all deployment targets can make use of it
> * DTs are going to be pushed to TMs, which generates less network traffic than a pull-based approach
> (please see my previous mail where I described both approaches)
> * the HA scenario is going to be consistent, because such
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java#L1069>
> a solution can be added to "updateDelegationTokens"
>
> @David or all others, please share whether you agree with this or have a better idea/suggestion.
>
> BR,
> G
>
> On Tue, Jan 25, 2022 at 11:00 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> > First of all, thanks for investing your time and helping me out. As I see, you have pretty solid knowledge in the RPC area.
> > I would like to rely on your knowledge since I'm learning this part.
> >
> > > - Do we need to introduce a new RPC method or can we for example piggyback on heartbeats?
> >
> > I'm fine with either solution, but one thing is important conceptually. There are fundamentally 2 ways tokens can be updated:
> > - Push: when there are new DTs, the JM JVM pushes them to the TM JVMs. This is the preferred one, since only a tiny amount of control logic is needed.
> > - Pull: each TM polls the JM from time to time for new tokens, and each TM decides alone whether the DTs need to be updated or not.
> > As you've mentioned, some ID would need to be generated for this, and it would generate quite some additional network traffic which can definitely be avoided.
> > As a final thought, in Spark we had this pull-based way of DT propagation and we had major issues with it.
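> > To make the push approach concrete, here is a minimal sketch (purely illustrative; the interface and types are simplified stand-ins, not the final Flink API):
> >
> >     import java.util.List;
> >     import java.util.concurrent.CompletableFuture;
> >
> >     // Simplified stand-in for the real TaskExecutorGateway in flink-runtime.
> >     interface TaskExecutorTokenGateway {
> >         // JM -> TM push: replace the TM-side tokens with freshly obtained ones.
> >         CompletableFuture<Void> updateDelegationTokens(byte[] serializedTokens);
> >     }
> >
> >     final class TokenBroadcaster {
> >         // Push new tokens to every registered TM; the returned future fails if any single TM fails.
> >         static CompletableFuture<Void> broadcast(List<TaskExecutorTokenGateway> registeredTms, byte[] tokens) {
> >             CompletableFuture<?>[] acks = registeredTms.stream()
> >                     .map(tm -> tm.updateDelegationTokens(tokens))
> >                     .toArray(CompletableFuture[]::new);
> >             return CompletableFuture.allOf(acks);
> >         }
> >     }
> >
> > The failing-future behavior matches the delivery semantics discussed below: the DTM gets an exception as soon as at least one TM could not be updated.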
> > So, all in all, DTM needs to obtain new tokens, and there must be a way to send this data from the JM to all TMs.
> >
> > > - What delivery semantics are we looking for? (what if we're only able to update a subset of TMs / what happens if we exhaust retries / should we even have the retry mechanism whatsoever) - I have a feeling that somehow leveraging the existing heartbeat mechanism could help to answer these questions
> >
> > Let's go through these questions one by one.
> >
> > > What delivery semantics are we looking for?
> >
> > DTM must receive an exception when at least one TM was not able to get the DTs.
> >
> > > what if we're only able to update a subset of TMs?
> >
> > In such a case DTM will reschedule the token obtainment after "security.kerberos.tokens.retry-wait" time.
> >
> > > what happens if we exhaust retries?
> >
> > There is no fixed number of retries. In the default configuration tokens need to be re-obtained after one day.
> > DTM tries to obtain new tokens after 1 day * 0.75 (security.kerberos.tokens.renewal-ratio) = 18 hours.
> > When that fails, it retries after "security.kerberos.tokens.retry-wait", which is 1 hour by default.
> > If it never succeeds, then an authentication error is going to happen on the TM side and the workload is going to stop.
> >
> > > should we even have the retry mechanism whatsoever?
> >
> > Yes, because there are always temporary cluster issues.
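> > To sketch the retry semantics in code (illustrative only; the real scheduling and config wiring are omitted):
> >
> >     import java.util.concurrent.Executors;
> >     import java.util.concurrent.ScheduledExecutorService;
> >     import java.util.concurrent.TimeUnit;
> >
> >     final class RenewalLoop {
> >         private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
> >
> >         // Defaults from above: 1 day lifetime * 0.75 renewal-ratio = 18 h, 1 h retry-wait.
> >         private static final long RENEWAL_DELAY_MS = TimeUnit.HOURS.toMillis(18);
> >         private static final long RETRY_WAIT_MS = TimeUnit.HOURS.toMillis(1);
> >
> >         void scheduleNext(long delayMs) {
> >             scheduler.schedule(this::obtainAndPush, delayMs, TimeUnit.MILLISECONDS);
> >         }
> >
> >         private void obtainAndPush() {
> >             try {
> >                 byte[] tokens = obtainNewTokens();      // talk to the KDC / token providers
> >                 pushToAllTaskManagers(tokens);          // the RPC broadcast sketched earlier
> >                 scheduleNext(RENEWAL_DELAY_MS);         // success: schedule the regular renewal
> >             } catch (Exception e) {
> >                 scheduleNext(RETRY_WAIT_MS);            // failure: retry until tokens expire on the TMs
> >             }
> >         }
> >
> >         // Placeholders for the real logic.
> >         private byte[] obtainNewTokens() throws Exception { return new byte[0]; }
> >         private void pushToAllTaskManagers(byte[] tokens) throws Exception {}
> >     }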
> > > What does it mean for the running application (how does this look from the user perspective)? As far as I remember the logs are only collected ("aggregated") after the container is stopped, is that correct?
> >
> > With the default config it works like that, but it can be forced to aggregate at specific intervals.
> > A useful feature is forcing YARN to aggregate logs while the job is still running. For long-running jobs such as streaming jobs, this is invaluable.
> > To do this, yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds must be set to a non-negative value.
> > When this is set, a timer will be set for the given duration, and whenever that timer goes off, log aggregation will run on new files.
> >
> > > I think this topic should get its own section in the FLIP (having some cross reference to a YARN ticket would be really useful, but I'm not sure if there are any).
> >
> > I think this is important knowledge, but this FLIP is not touching the already existing behavior.
> > DTs are set on the AM container, which is renewed by YARN until it's not possible anymore.
> > No new code is going to change this limitation. BTW, there is no jira for this.
> > If you think it's worth writing this down, then I think the good place is the official security doc area, as a caveat.
> >
> > > If we split the FLIP into the two parts / sections that I've suggested, I don't really think that you need to explicitly test for each deployment scenario / cluster framework, because the DTM part is completely independent of the deployment target. Basically this is what I'm aiming for with "making it work with the standalone" (as simple as starting a new java process) Flink first (which is also how most people deploy streaming applications on k8s and the direction we're pushing forward with the auto-scaling / reactive mode initiatives).
> >
> > I see your point and agree with the main direction. k8s is the megatrend which most people will use sooner or later.
> > Not 100% sure what kind of split you suggest, but in my view the main target is to add this feature, and I'm open to any logical work ordering.
> > Please share the specific details and we'll work it out...
> >
> > G
> >
> > On Mon, Jan 24, 2022 at 3:04 PM David Morávek <d...@apache.org> wrote:
> >
> > > > Could you point to a code where you think it could be added exactly? A helping hand is welcome here 🙂
> > >
> > > I think you can take a look at _ResourceManagerPartitionTracker_ [1], which seems to have somewhat similar properties to the DTM.
> > >
> > > One topic that needs to be addressed there is how the RPC with the _TaskExecutorGateway_ should look.
> > > - Do we need to introduce a new RPC method or can we for example piggyback on heartbeats?
> > > - What delivery semantics are we looking for? (what if we're only able to update a subset of TMs / what happens if we exhaust retries / should we even have the retry mechanism whatsoever) - I have a feeling that somehow leveraging the existing heartbeat mechanism could help to answer these questions
> > >
> > > > In short, after the DT reaches its max lifetime, log aggregation stops
> > >
> > > What does it mean for the running application (how does this look from the user perspective)? As far as I remember the logs are only collected ("aggregated") after the container is stopped, is that correct? I think this topic should get its own section in the FLIP (having some cross reference to a YARN ticket would be really useful, but I'm not sure if there are any).
> > >
> > > > All deployment modes (per-job, per-app, ...) are planned to be tested and are expected to work with the initial implementation; however, not all deployment targets (k8s, local, ...
> > >
> > > If we split the FLIP into the two parts / sections that I've suggested, I don't really think that you need to explicitly test for each deployment scenario / cluster framework, because the DTM part is completely independent of the deployment target. Basically this is what I'm aiming for with "making it work with the standalone" (as simple as starting a new java process) Flink first (which is also how most people deploy streaming applications on k8s and the direction we're pushing forward with the auto-scaling / reactive mode initiatives).
> > >
> > > The whole integration with YARN (let's forget about log aggregation for a moment) / k8s-native only boils down to how we make the keytab file local to the JobManager so the DTM can read it, so it's basically built on top of that. The only special thing that needs to be tested there is the "keytab distribution" code path.
> > >
> > > [1] https://github.com/apache/flink/blob/release-1.14.3/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResourceManagerPartitionTracker.java
> > >
> > > Best,
> > > D.
> > >
> > > On Mon, Jan 24, 2022 at 12:35 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >
> > > > > There is a separate JobMaster for each job within a Flink cluster and each JobMaster only has a partial view of the task managers
> > > >
> > > > Good point! I've had a deeper look and you're right. We definitely need to find another place.
> > > > > Related per-cluster or per-job keytab:
> > > >
> > > > In the current code a per-cluster keytab is implemented, and I intend to keep it like this within this FLIP. The reason is simple: tokens on the TM side can be stored within the UserGroupInformation (UGI) structure, which is global. I'm not saying it's impossible to change that, but I think that is a complexity which the initial implementation is not required to contain. Additionally, we've not seen such a need from the user side. If the need arises later on, then another FLIP on this topic can be created and discussed. Proper multi-UGI handling within a single JVM is a topic where several rounds of deep-dives with the Hadoop/YARN guys would be required.
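> > > > For reference, this is roughly what the TM side could do with pushed tokens (an illustrative sketch on top of the standard Hadoop API, not the actual prototype code):
> > > >
> > > >     import java.io.ByteArrayInputStream;
> > > >     import java.io.DataInputStream;
> > > >     import java.io.IOException;
> > > >     import org.apache.hadoop.security.Credentials;
> > > >     import org.apache.hadoop.security.UserGroupInformation;
> > > >
> > > >     final class TokenReceiver {
> > > >         // Deserialize the pushed tokens and add them to the process-wide UGI.
> > > >         static void onNewTokens(byte[] serializedTokens) throws IOException {
> > > >             Credentials credentials = new Credentials();
> > > >             try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serializedTokens))) {
> > > >                 credentials.readTokenStorageStream(in);
> > > >             }
> > > >             // UGI is global to the JVM, which is why a per-cluster keytab keeps things simple.
> > > >             UserGroupInformation.getCurrentUser().addCredentials(credentials);
> > > >         }
> > > >     }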
> > > > > single DTM instance embedded with the ResourceManager (the Flink component)
> > > >
> > > > Could you point to a code where you think it could be added exactly? A helping hand is welcome here 🙂
> > > >
> > > > > Then the single (initial) implementation should work with all the deployment modes out of the box (which is not what the FLIP suggests). Is that correct?
> > > >
> > > > All deployment modes (per-job, per-app, ...) are planned to be tested and are expected to work with the initial implementation; however, not all deployment targets (k8s, local, ...) are intended to be tested. A new jira needs to be created per deployment target, where I expect a small amount of code to be added and a relatively expensive testing effort to be required.
> > > >
> > > > > I've taken a look into the prototype and in the "YarnClusterDescriptor" you're injecting a delegation token into the AM [1] (that's obtained using the provided keytab). If I understand this correctly from previous discussion / FLIP, this is to support log aggregation and the DT has a limited validity. How is this DT going to be renewed?
> > > >
> > > > You're clever and have touched on a limitation which Spark has too. In short, after the DT reaches its max lifetime, log aggregation stops. I've had several deep-dive rounds with the YARN guys during my Spark years because I wanted to fill this gap. They couldn't provide us any way to re-inject the newly obtained DT, so in the end I gave up on this.
> > > >
> > > > BR,
> > > > G
> > > >
> > > > On Mon, 24 Jan 2022, 11:00 David Morávek, <d...@apache.org> wrote:
> > > >
> > > > > Hi Gabor,
> > > > >
> > > > > There is actually a huge difference between JobManager (process) and JobMaster (job coordinator). The naming is unfortunately a bit misleading here for historical reasons. There is a separate JobMaster for each job within a Flink cluster, and each JobMaster only has a partial view of the task managers (depending on where the slots for a particular job are allocated). This means that you'd end up with N "DelegationTokenManagers" competing with each other (N = number of running jobs in the cluster).
> > > > >
> > > > > This makes me think we're mixing two abstraction levels here:
> > > > >
> > > > > a) Per-cluster delegation tokens
> > > > > - Simpler approach, it would involve a single DTM instance embedded with the ResourceManager (the Flink component)
> > > > > b) Per-job delegation tokens
> > > > > - More complex approach, but could be more flexible from the user side of things.
> > > > > - Multiple DTM instances that are bound to the JobMaster lifecycle. Delegation tokens are attached to the particular slots that are executing the job's tasks instead of the whole task manager (a TM could be executing multiple jobs with different tokens).
> > > > > - The question is which keytab should be used for the clustering framework, to support log aggregation on YARN (an extra keytab, the keytab that comes with the first job?)
> > > > >
> > > > > I think these are the things that need to be clarified in the FLIP before proceeding.
> > > > >
> > > > > A follow-up question for getting a better understanding of where this should be headed: Are there any use cases where a user may want to use different keytabs with each job, or are we fine with using a cluster-wide keytab? If we go with per-cluster keytabs, is it OK that all jobs submitted into this cluster can access it (even the future ones)? Should this be a security concern?
> > > > >
> > > > > > I presume you thought I would implement a new class with the JobManager name. The plan is not that.
> > > > >
> > > > > I've never suggested such a thing.
> > > > >
> > > > > > No. As said earlier, DT handling is planned to be done completely in Flink. DTM has a renewal thread which re-obtains tokens at the proper time when needed.
> > > > >
> > > > > Then the single (initial) implementation should work with all the deployment modes out of the box (which is not what the FLIP suggests). Is that correct?
> > > > >
> > > > > If the cluster framework also requires delegation tokens for its inner workings (IMO this only applies to YARN), it might need an extra step (injecting the token into the application master container).
> > > > >
> > > > > Separating the individual layers (actual Flink cluster - basically making this work with a standalone deployment / "cluster framework" - support for YARN log aggregation) in the FLIP would be useful.
> > > > >
> > > > > > Reading the linked Spark readme could be useful.
> > > > >
> > > > > I've read that, but please be patient with the questions, Kerberos is not an easy topic to get into and I've had very little contact with it in the past.
> > > > >
> > > > > > https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176
> > > > >
> > > > > I've taken a look into the prototype and in the "YarnClusterDescriptor" you're injecting a delegation token into the AM [1] (that's obtained using the provided keytab). If I understand this correctly from previous discussion / FLIP, this is to support log aggregation and the DT has a limited validity. How is this DT going to be renewed?
> > > > > [1] https://github.com/gaborgsomogyi/flink/commit/8ab75e46013f159778ccfce52463e7bc63e395a9#diff-02416e2d6ca99e1456f9c3949f3d7c2ac523d3fe25378620c09632e4aac34e4eR1261
> > > > >
> > > > > Best,
> > > > > D.
> > > > >
> > > > > On Fri, Jan 21, 2022 at 9:35 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > >
> > > > > > Here is the exact class (I'm on mobile, so I haven't checked the exact class name):
> > > > > > https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176
> > > > > > That keeps track of the TMs where the tokens can be sent.
> > > > > >
> > > > > > > My feeling would be that we shouldn't really introduce a new component with a custom lifecycle, but rather we should try to incorporate this into existing ones.
> > > > > >
> > > > > > Can you be more specific? I presume you thought I would implement a new class with the JobManager name. The plan is not that.
> > > > > >
> > > > > > > If I understand this correctly, this means that we then push the token renewal logic to YARN.
> > > > > >
> > > > > > No. As said earlier, DT handling is planned to be done completely in Flink. DTM has a renewal thread which re-obtains tokens at the proper time when needed. YARN log aggregation is a totally different feature, where YARN does the renewal. Log aggregation was an example of why the code can't be 100% reusable for all resource managers. Reading the linked Spark readme could be useful.
> > > > > >
> > > > > > G
> > > > > >
> > > > > > On Fri, 21 Jan 2022, 21:05 David Morávek, <d...@apache.org> wrote:
> > > > > >
> > > > > > > > JobManager is the Flink class.
> > > > > > >
> > > > > > > There is no such class in Flink. The closest thing to the JobManager is the ClusterEntrypoint. The cluster entrypoint spawns a new RM Runner & Dispatcher Runner that start participating in the leader election. Once they gain leadership, they spawn the actual underlying instances of these two "main components".
> > > > > > >
> > > > > > > My feeling would be that we shouldn't really introduce a new component with a custom lifecycle, but rather we should try to incorporate this into existing ones.
> > > > > > >
> > > > > > > My biggest concerns would be:
> > > > > > >
> > > > > > > - How would the lifecycle of the new component look with regards to HA setups? If we really decide to introduce a completely new component, how should this work in case of multiple JobManager instances?
> > > > > > > - Which components does it talk to / how? For example, how does the broadcast of new tokens to task managers (TaskManagerGateway) look? Do we simply introduce a new RPC on the ResourceManagerGateway that broadcasts it, or does the new component need to do some kind of bookkeeping of the task managers that it needs to notify?
> > > > > > >
> > > > > > > > YARN based HDFS log aggregation would not work if we dropped that code. Just to be crystal clear, the actual implementation contains this for exactly this reason.
> > > > > > >
> > > > > > > This is the missing part +1. If I understand this correctly, this means that we then push the token renewal logic to YARN. How do you plan to implement the renewal logic on k8s?
> > > > > > >
> > > > > > > D.
> > > > > > >
> > > > > > > On Fri, Jan 21, 2022 at 8:37 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > I think we might both mean something different by the RM.
> > > > > > > >
> > > > > > > > You sense it well, I've not specified these terms well in the explanation. By RM I meant the resource management framework. JobManager is the Flink class. This means that inside the JM instance there will be a DTM instance, so they would have the same lifecycle. Hope I've answered the question.
> > > > > > > >
> > > > > > > > > If we have tokens available on the client side, why do we need to set them into the AM (yarn specific concept) launch context?
> > > > > > > >
> > > > > > > > YARN based HDFS log aggregation would not work if we dropped that code. Just to be crystal clear, the actual implementation contains this for exactly this reason.
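> > > > > > > > The client-side step in question is roughly the following (a sketch using the standard YARN/Hadoop APIs; the real code lives in YarnClusterDescriptor):
> > > > > > > >
> > > > > > > >     import java.io.IOException;
> > > > > > > >     import java.nio.ByteBuffer;
> > > > > > > >     import org.apache.hadoop.io.DataOutputBuffer;
> > > > > > > >     import org.apache.hadoop.security.Credentials;
> > > > > > > >     import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
> > > > > > > >
> > > > > > > >     final class AmTokenSetup {
> > > > > > > >         // Serialize the obtained tokens and attach them to the AM container launch
> > > > > > > >         // context, so YARN can use them (e.g. HDFS access for log aggregation).
> > > > > > > >         static void setTokens(ContainerLaunchContext amContainer, Credentials credentials) throws IOException {
> > > > > > > >             DataOutputBuffer dob = new DataOutputBuffer();
> > > > > > > >             credentials.writeTokenStorageToStream(dob);
> > > > > > > >             amContainer.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
> > > > > > > >         }
> > > > > > > >     }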
> > > > > > > >
> > > > > > > > G
> > > > > > > >
> > > > > > > > On Fri, 21 Jan 2022, 20:12 David Morávek, <d...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi Gabor,
> > > > > > > > >
> > > > > > > > > > 1. One thing is important, token management is planned to be done generically within Flink and not scattered in RM specific code. JobManager has a DelegationTokenManager which obtains tokens from time to time (if configured properly). The JM knows which TaskManagers are in place so it can distribute the tokens to all TMs. That's it basically.
> > > > > > > > >
> > > > > > > > > I think we might both mean something different by the RM. The JobManager is basically just a process encapsulating multiple components, one of which is the ResourceManager, which is the component that manages task manager registrations [1]. There is more or less a single implementation of the RM with pluggable drivers for the active integrations (yarn, k8s).
> > > > > > > > >
> > > > > > > > > It would be great if you could share more details of how exactly the DTM is going to fit into the current JM architecture.
> > > > > > > > >
> > > > > > > > > > 2. 99.9% of the code is generic but each RM handles tokens differently. A good example is YARN, which obtains tokens on the client side and then sets them on the newly created AM container launch context. This is purely YARN specific and can't be spared. With my actual plans, standalone can be changed to use the framework. By using it I mean no RM specific DTM or whatsoever is needed.
> > > > > > > > >
> > > > > > > > > If we have tokens available on the client side, why do we need to set them into the AM (yarn specific concept) launch context? Why can't we simply send them to the JM, e.g. as a parameter of the job submission / via a separate RPC call? There might be something I'm missing due to limited knowledge, but handling the token at the "cluster framework" level doesn't seem necessary.
> > > > > > > > >
> > > > > > > > > [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/flink-architecture/#jobmanager
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > D.
> > > > > > > > >
> > > > > > > > > On Fri, Jan 21, 2022 at 7:48 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Oh, and one more thing. I'm planning to add this feature in small chunks of PRs because security is a super hairy area. That way reviewers can grasp the concept more easily.
> > > > > > > > > >
> > > > > > > > > > On Fri, 21 Jan 2022, 18:03 David Morávek, <d...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Gabor,
> > > > > > > > > > >
> > > > > > > > > > > thanks for drafting the FLIP, I think having solid Kerberos support is crucial for many enterprise deployments.
> > > > > > > > > > >
> > > > > > > > > > > I have multiple questions regarding the implementation (note that I have very limited knowledge of Kerberos):
> > > > > > > > > > >
> > > > > > > > > > > 1) If I understand it correctly, we'll only obtain tokens in the job manager and then we'll distribute them via RPC (needs to be secured).
> > > > > > > > > > >
> > > > > > > > > > > Can you please outline how the communication will look? Is the DelegationTokenManager going to be a part of the ResourceManager? Can you outline its lifecycle / how it's going to be integrated there?
> > > > > > > > > > >
> > > > > > > > > > > 2) Do we really need YARN / k8s specific implementations? Is it possible to obtain / renew a token in a generic way? Maybe to rephrase that, is it possible to implement a DelegationTokenManager for standalone Flink? If we're able to solve this point, it could be possible to target all deployment scenarios with a single implementation.
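> > > > > > > > > > > Just to illustrate what I mean by "generic" (a hypothetical sketch, not an existing Flink interface):
> > > > > > > > > > >
> > > > > > > > > > >     // One generic DTM, with a pluggable provider per external system (HDFS, HBase, Kafka, ...).
> > > > > > > > > > >     interface DelegationTokenProvider {
> > > > > > > > > > >         String serviceName();                    // e.g. "hadoopfs"
> > > > > > > > > > >         boolean tokensRequired();                // e.g. Kerberos enabled and service configured
> > > > > > > > > > >         byte[] obtainTokens() throws Exception;  // serialized tokens, independent of the deployment target
> > > > > > > > > > >     }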
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > D.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 14, 2022 at 3:47 AM Junfan Zhang <zuston.sha...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi G
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for your detailed explanation. I have gotten your thoughts, and in any case this proposal is a great improvement.
> > > > > > > > > > > >
> > > > > > > > > > > > Looking forward to your implementation, and I will keep my focus on it. Thanks again.
> > > > > > > > > > > >
> > > > > > > > > > > > Best
> > > > > > > > > > > > JunFan.
> > > > > > > > > > > > On Jan 13, 2022, 9:20 PM +0800, Gabor Somogyi <gabor.g.somo...@gmail.com>, wrote:
> > > > > > > > > > > > > Just to confirm: keeping "security.kerberos.fetch.delegation-token" has been added to the doc.
> > > > > > > > > > > > >
> > > > > > > > > > > > > BR,
> > > > > > > > > > > > > G
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jan 13, 2022 at 1:34 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi JunFan,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > By the way, maybe this should be added in the migration plan or integration section in the FLIP-211.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Going to add this soon.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Besides, I have a question that the KDC will collapse when the cluster reached 200 nodes you described in the google doc. Do you have any attachment or reference to prove it?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > "KDC *may* collapse under some circumstances" is the proper wording.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We have several customers who are executing workloads on Spark/Flink. Most of the time I'm facing their daily issues, which are heavily environment and use-case dependent.
> > > > > > > > > > > > > > I've seen various cases:
> > > > > > > > > > > > > > * where the mentioned ~1k nodes were working fine
> > > > > > > > > > > > > > * where the KDC thought the number of requests was coming from a DDOS attack, so it discontinued authentication
> > > > > > > > > > > > > > * where the KDC was simply not responding because of the load
> > > > > > > > > > > > > > * where the KDC intermittently had some outage (this was the most nasty thing)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Since you're managing a relatively big cluster, you know that the KDC is not only used by Spark/Flink workloads; the whole company IT infrastructure is bombing it, so it really depends on other factors too whether the KDC is reaching its limit or not. Not sure what kind of evidence you are looking for, but I'm not authorized to share any information about our clients' data.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One thing is for sure: the more external system types are used in workloads (for example HDFS, HBase, Hive, Kafka) which authenticate through the KDC, the higher the possibility of reaching this threshold when the cluster is big enough.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > All in all, this feature is here to help all users never reach this limitation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > G
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 13, 2022 at 1:00 PM 张俊帆 <zuston.sha...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi G
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for your quick reply. I think reserving the config *security.kerberos.fetch.delegation-token* and simply disabling the token fetching is a good idea. By the way, maybe this should be added in the migration plan or integration section in FLIP-211.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Besides, I have a question: you described in the google doc that the KDC will collapse when the cluster reaches 200 nodes. Do you have any attachment or reference to prove it? Because in our internal per-cluster setup the nodes reach > 1000 and the KDC looks good.
> > > > > > > > > > > > > > > Did I miss or misunderstand something? Please correct me.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > JunFan.
> > > > > > > > > > > > > > > On Jan 13, 2022, 5:26 PM +0800, dev@flink.apache.org, wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://docs.google.com/document/d/1JzMbQ1pCJsLVz8yHrCxroYMRP2GwGwvacLrGyaIx5Yc/edit?fbclid=IwAR0vfeJvAbEUSzHQAAJfnWTaX46L6o7LyXhMfBUCcPrNi-uXNgoOaI8PMDQ