Hi All,

Since it has turned out that the DTM can't be added as a member of JobMaster <https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176> I've come up with a better proposal. David, thanks for pointing this out, you've caught a bug in the early phase!
Namely, ResourceManager <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124> is a single-instance class where the DTM can be added as a member variable. It has a list of all already registered TMs, and new TM registration also happens there. To be more specific, the following logic can be added:

* Create a new DTM instance in ResourceManager <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124> and start it (a recurring thread obtains new tokens)
* Add a new function named "updateDelegationTokens" to TaskExecutorGateway <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutorGateway.java#L54>
* Call "updateDelegationTokens" on all registered TMs to propagate new DTs
* In case of a new TM registration, call "updateDelegationTokens" before the registration succeeds so that the new TM is set up properly

This way:

* only a single DTM would live within a cluster, which is the expected behavior
* the DTM is going to live in a central place where every deployment target can make use of it
* DTs are going to be pushed to TMs, which generates less network traffic than a pull-based approach (please see my previous mail where I've described both approaches)
* the HA scenario is going to be consistent because such <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java#L1069> a solution can be added to "updateDelegationTokens"

(A rough, illustrative sketch of this flow is attached at the very bottom of this mail, below the quoted thread.)

@David and all others, please share whether you agree with this or have a better idea/suggestion.

BR,
G

On Tue, Jan 25, 2022 at 11:00 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> First of all thanks for investing your time and helping me out. As I see > you have pretty solid knowledge in the RPC area. > I would like to rely on your knowledge since I'm learning this part. > > > - Do we need to introduce a new RPC method or can we for example > piggyback > on heartbeats? > > I'm fine with either solution but one thing is important conceptually. > There are fundamentally 2 ways how tokens can be updated: > - Push way: When there are new DTs then JM JVM pushes DTs to TM JVMs. This > is the preferred one since tiny amount of control logic needed. > - Pull way: Each time a TM would like to poll JM whether there are new > tokens and each TM wants to decide alone whether DTs needs to be updated or > not. > As you've mentioned here some ID needs to be generated, it would generated > quite some additional network traffic which can be definitely avoided. > As a final thought in Spark we've had this way of DT propagation logic and > we've had major issues with it. > > So all in all DTM needs to obtain new tokens and there must a way to send > this data to all TMs from JM. > > > - What delivery semantics are we looking for? (what if we're only able to > update subset of TMs / what happens if we exhaust retries / should we even > have the retry mechanism whatsoever) - I have a feeling that somehow > leveraging the existing heartbeat mechanism could help to answer these > questions > > Let's go through these questions one by one. > > What delivery semantics are we looking for? > > DTM must receive an exception when at least one TM was not able to get DTs.
> > > what if we're only able to update subset of TMs? > > Such case DTM will reschedule token obtain after > "security.kerberos.tokens.retry-wait" time. > > > what happens if we exhaust retries? > > There is no number of retries. In default configuration tokens needs to be > re-obtained after one day. > DTM tries to obtain new tokens after 1day * 0.75 > (security.kerberos.tokens.renewal-ratio) = 18 hours. > When fails it retries after "security.kerberos.tokens.retry-wait" which is > 1 hour by default. > If it never succeeds then authentication error is going to happen on the > TM side and the workload is > going to stop. > > > should we even have the retry mechanism whatsoever? > > Yes, because there are always temporary cluster issues. > > > What does it mean for the running application (how does this look like > from > the user perspective)? As far as I remember the logs are only collected > ("aggregated") after the container is stopped, is that correct? > > With default config it works like that but it can be forced to aggregate > at specific intervals. > A useful feature is forcing YARN to aggregate logs while the job is still > running. > For long-running jobs such as streaming jobs, this is invaluable. To do > this, > yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds must be > set to a non-negative value. > When this is set, a timer will be set for the given duration, and whenever > that timer goes off, > log aggregation will run on new files. > > > I think > this topic should get its own section in the FLIP (having some cross > reference to YARN ticket would be really useful, but I'm not sure if there > are any). > > I think this is important knowledge but this FLIP is not touching the > already existing behavior. > DTs are set on the AM container which is renewed by YARN until it's not > possible anymore. > Any kind of new code is not going to change this limitation. BTW, there is > no jira for this. > If you think it worth to write this down then I think the good place is > the official security doc > area as caveat. > > > If we split the FLIP into two parts / sections that I've suggested, I > don't > really think that you need to explicitly test for each deployment scenario > / cluster framework, because the DTM part is completely independent of the > deployment target. Basically this is what I'm aiming for with "making it > work with the standalone" (as simple as starting a new java process) Flink > first (which is also how most people deploy streaming application on k8s > and the direction we're pushing forward with the auto-scaling / reactive > mode initiatives). > > I see your point and agree the main direction. k8s is the megatrend which > most of the peoples > will use sooner or later. Not 100% sure what kind of split you suggest but > in my view > the main target is to add this feature and I'm open to any logical work > ordering. > Please share the specific details and we work it out... > > G > > > On Mon, Jan 24, 2022 at 3:04 PM David Morávek <d...@apache.org> wrote: > >> > >> > Could you point to a code where you think it could be added exactly? A >> > helping hand is welcome here 🙂 >> > >> >> I think you can take a look at _ResourceManagerPartitionTracker_ [1] which >> seems to have somewhat similar properties to the DTM. >> >> One topic that needs to be addressed there is how the RPC with the >> _TaskExecutorGateway_ should look like. >> - Do we need to introduce a new RPC method or can we for example piggyback >> on heartbeats? 
>> - What delivery semantics are we looking for? (what if we're only able to >> update subset of TMs / what happens if we exhaust retries / should we even >> have the retry mechanism whatsoever) - I have a feeling that somehow >> leveraging the existing heartbeat mechanism could help to answer these >> questions >> >> In short, after DT reaches it's max lifetime then log aggregation stops >> > >> >> What does it mean for the running application (how does this look like >> from >> the user perspective)? As far as I remember the logs are only collected >> ("aggregated") after the container is stopped, is that correct? I think >> this topic should get its own section in the FLIP (having some cross >> reference to YARN ticket would be really useful, but I'm not sure if there >> are any). >> >> All deployment modes (per-job, per-app, ...) are planned to be tested and >> > expect to work with the initial implementation however not all >> deployment >> > targets (k8s, local, ... >> > >> >> If we split the FLIP into two parts / sections that I've suggested, I >> don't >> really think that you need to explicitly test for each deployment scenario >> / cluster framework, because the DTM part is completely independent of the >> deployment target. Basically this is what I'm aiming for with "making it >> work with the standalone" (as simple as starting a new java process) Flink >> first (which is also how most people deploy streaming application on k8s >> and the direction we're pushing forward with the auto-scaling / reactive >> mode initiatives). >> >> The whole integration with YARN (let's forget about log aggregation for a >> moment) / k8s-native only boils down to how do we make the keytab file >> local to the JobManager so the DTM can read it, so it's basically built on >> top of that. The only special thing that needs to be tested there is the >> "keytab distribution" code path. >> >> [1] >> >> https://github.com/apache/flink/blob/release-1.14.3/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResourceManagerPartitionTracker.java >> >> Best, >> D. >> >> On Mon, Jan 24, 2022 at 12:35 PM Gabor Somogyi <gabor.g.somo...@gmail.com >> > >> wrote: >> >> > > There is a separate JobMaster for each job >> > within a Flink cluster and each JobMaster only has a partial view of the >> > task managers >> > >> > Good point! I've had a deeper look and you're right. We definitely need >> to >> > find another place. >> > >> > > Related per-cluster or per-job keytab: >> > >> > In the current code per-cluster keytab is implemented and I'm intended >> to >> > keep it like this within this FLIP. The reason is simple: tokens on TM >> side >> > can be stored within the UserGroupInformation (UGI) structure which is >> > global. I'm not telling it's impossible to change that but I think that >> > this is such a complexity which the initial implementation is not >> required >> > to contain. Additionally we've not seen such need from user side. If the >> > need may rise later on then another FLIP with this topic can be created >> and >> > discussed. Proper multi-UGI handling within a single JVM is a topic >> where >> > several round of deep-dive with the Hadoop/YARN guys are required. >> > >> > > single DTM instance embedded with >> > the ResourceManager (the Flink component) >> > >> > Could you point to a code where you think it could be added exactly? 
A >> > helping hand is welcome here🙂 >> > >> > > Then the single (initial) implementation should work with all the >> > deployments modes out of the box (which is not what the FLIP suggests). >> Is >> > that correct? >> > >> > All deployment modes (per-job, per-app, ...) are planned to be tested >> and >> > expect to work with the initial implementation however not all >> deployment >> > targets (k8s, local, ...) are not intended to be tested. Per deployment >> > target new jira needs to be created where I expect small number of codes >> > needs to be added and relatively expensive testing effort is required. >> > >> > > I've taken a look into the prototype and in the >> "YarnClusterDescriptor" >> > you're injecting a delegation token into the AM [1] (that's obtained >> using >> > the provided keytab). If I understand this correctly from previous >> > discussion / FLIP, this is to support log aggregation and DT has a >> limited >> > validity. How is this DT going to be renewed? >> > >> > You're clever and touched a limitation which Spark has too. In short, >> after >> > DT reaches it's max lifetime then log aggregation stops. I've had >> several >> > deep-dive rounds with the YARN guys at Spark years because wanted to >> fill >> > this gap. They can't provide us any way to re-inject the newly obtained >> DT >> > so at the end I gave up this. >> > >> > BR, >> > G >> > >> > >> > On Mon, 24 Jan 2022, 11:00 David Morávek, <d...@apache.org> wrote: >> > >> > > Hi Gabor, >> > > >> > > There is actually a huge difference between JobManager (process) and >> > > JobMaster (job coordinator). The naming is unfortunately bit >> misleading >> > > here from historical reasons. There is a separate JobMaster for each >> job >> > > within a Flink cluster and each JobMaster only has a partial view of >> the >> > > task managers (depends on where the slots for a particular job are >> > > allocated). This means that you'll end up with N >> > "DelegationTokenManagers" >> > > competing with each other (N = number of running jobs in the cluster). >> > > >> > > This makes me think we're mixing two abstraction levels here: >> > > >> > > a) Per-cluster delegation tokens >> > > - Simpler approach, it would involve a single DTM instance embedded >> with >> > > the ResourceManager (the Flink component) >> > > b) Per-job delegation tokens >> > > - More complex approach, but could be more flexible from the user >> side of >> > > things. >> > > - Multiple DTM instances, that are bound with the JobMaster lifecycle. >> > > Delegation tokens are attached with a particular slots that are >> executing >> > > the job tasks instead of the whole task manager (TM could be executing >> > > multiple jobs with different tokens). >> > > - The question is which keytab should be used for the clustering >> > framework, >> > > to support log aggregation on YARN (an extra keytab, keytab that comes >> > with >> > > the first job?) >> > > >> > > I think these are the things that need to be clarified in the FLIP >> before >> > > proceeding. >> > > >> > > A follow-up question for getting a better understanding where this >> should >> > > be headed: Are there any use cases where user may want to use >> different >> > > keytabs with each job, or are we fine with using a cluster-wide >> keytab? >> > If >> > > we go with per-cluster keytabs, is it OK that all jobs submitted into >> > this >> > > cluster can access it (even the future ones)? Should this be a >> security >> > > concern? 
>> > > >> > > Presume you though I would implement a new class with JobManager name. >> > The >> > > > plan is not that. >> > > > >> > > >> > > I've never suggested such thing. >> > > >> > > >> > > > No. That said earlier DT handling is planned to be done completely >> in >> > > > Flink. DTM has a renewal thread which re-obtains tokens in the >> proper >> > > time >> > > > when needed. >> > > > >> > > >> > > Then the single (initial) implementation should work with all the >> > > deployments modes out of the box (which is not what the FLIP >> suggests). >> > Is >> > > that correct? >> > > >> > > If the cluster framework, also requires delegation token for their >> inner >> > > working (this is IMO only applies to YARN), it might need an extra >> step >> > > (injecting the token into application master container). >> > > >> > > Separating the individual layers (actual Flink cluster - basically >> making >> > > this work with a standalone deployment / "cluster framework" - >> support >> > for >> > > YARN log aggregation) in the FLIP would be useful. >> > > >> > > Reading the linked Spark readme could be useful. >> > > > >> > > >> > > I've read that, but please be patient with the questions, Kerberos is >> not >> > > an easy topic to get into and I've had a very little contact with it >> in >> > the >> > > past. >> > > >> > > >> > > >> > >> https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176 >> > > > >> > > >> > > I've taken a look into the prototype and in the >> "YarnClusterDescriptor" >> > > you're injecting a delegation token into the AM [1] (that's obtained >> > using >> > > the provided keytab). If I understand this correctly from previous >> > > discussion / FLIP, this is to support log aggregation and DT has a >> > limited >> > > validity. How is this DT going to be renewed? >> > > >> > > [1] >> > > >> > > >> > >> https://github.com/gaborgsomogyi/flink/commit/8ab75e46013f159778ccfce52463e7bc63e395a9#diff-02416e2d6ca99e1456f9c3949f3d7c2ac523d3fe25378620c09632e4aac34e4eR1261 >> > > >> > > Best, >> > > D. >> > > >> > > On Fri, Jan 21, 2022 at 9:35 PM Gabor Somogyi < >> gabor.g.somo...@gmail.com >> > > >> > > wrote: >> > > >> > > > Here is the exact class, I'm from mobile so not had a look at the >> exact >> > > > class name: >> > > > >> > > > >> > > >> > >> https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176 >> > > > That keeps track of TMs where the tokens can be sent to. >> > > > >> > > > > My feeling would be that we shouldn't really introduce a new >> > component >> > > > with >> > > > a custom lifecycle, but rather we should try to incorporate this >> into >> > > > existing ones. >> > > > >> > > > Can you be more specific? Presume you though I would implement a new >> > > class >> > > > with JobManager name. The plan is not that. >> > > > >> > > > > If I understand this correctly, this means that we then push the >> > token >> > > > renewal logic to YARN. >> > > > >> > > > No. That said earlier DT handling is planned to be done completely >> in >> > > > Flink. DTM has a renewal thread which re-obtains tokens in the >> proper >> > > time >> > > > when needed. YARN log aggregation is a totally different feature, >> where >> > > > YARN does the renewal. 
Log aggregation was an example why the code >> > can't >> > > be >> > > > 100% reusable for all resource managers. Reading the linked Spark >> > readme >> > > > could be useful. >> > > > >> > > > G >> > > > >> > > > On Fri, 21 Jan 2022, 21:05 David Morávek, <d...@apache.org> wrote: >> > > > >> > > > > > >> > > > > > JobManager is the Flink class. >> > > > > >> > > > > >> > > > > There is no such class in Flink. The closest thing to the >> JobManager >> > > is a >> > > > > ClusterEntrypoint. The cluster entrypoint spawns new RM Runner & >> > > > Dispatcher >> > > > > Runner that start participating in the leader election. Once they >> > gain >> > > > > leadership they spawn the actual underlying instances of these two >> > > "main >> > > > > components". >> > > > > >> > > > > My feeling would be that we shouldn't really introduce a new >> > component >> > > > with >> > > > > a custom lifecycle, but rather we should try to incorporate this >> into >> > > > > existing ones. >> > > > > >> > > > > My biggest concerns would be: >> > > > > >> > > > > - How would the lifecycle of the new component look like with >> regards >> > > to >> > > > HA >> > > > > setups. If we really try to decide to introduce a completely new >> > > > component, >> > > > > how should this work in case of multiple JobManager instances? >> > > > > - Which components does it talk to / how? For example how does the >> > > > > broadcast of new token to task managers (TaskManagerGateway) look >> > like? >> > > > Do >> > > > > we simply introduce a new RPC on the ResourceManagerGateway that >> > > > broadcasts >> > > > > it or does the new component need to do some kind of bookkeeping >> of >> > > task >> > > > > managers that it needs to notify? >> > > > > >> > > > > YARN based HDFS log aggregation would not work by dropping that >> code. >> > > > Just >> > > > > > to be crystal clear, the actual implementation contains this fir >> > > > exactly >> > > > > > this reason. >> > > > > > >> > > > > >> > > > > This is the missing part +1. If I understand this correctly, this >> > means >> > > > > that we then push the token renewal logic to YARN. How do you >> plan to >> > > > > implement the renewal logic on k8s? >> > > > > >> > > > > D. >> > > > > >> > > > > On Fri, Jan 21, 2022 at 8:37 PM Gabor Somogyi < >> > > gabor.g.somo...@gmail.com >> > > > > >> > > > > wrote: >> > > > > >> > > > > > > I think we might both mean something different by the RM. >> > > > > > >> > > > > > You feel it well, I've not specified these terms well in the >> > > > explanation. >> > > > > > RM I meant resource management framework. JobManager is the >> Flink >> > > > class. >> > > > > > This means that inside JM instance there will be a DTM >> instance, so >> > > > they >> > > > > > would have the same lifecycle. Hope I've answered the question. >> > > > > > >> > > > > > > If we have tokens available on the client side, why do we >> need to >> > > set >> > > > > > them >> > > > > > into the AM (yarn specific concept) launch context? >> > > > > > >> > > > > > YARN based HDFS log aggregation would not work by dropping that >> > code. >> > > > > Just >> > > > > > to be crystal clear, the actual implementation contains this fir >> > > > exactly >> > > > > > this reason. >> > > > > > >> > > > > > G >> > > > > > >> > > > > > On Fri, 21 Jan 2022, 20:12 David Morávek, <d...@apache.org> >> wrote: >> > > > > > >> > > > > > > Hi Gabor, >> > > > > > > >> > > > > > > 1. 
One thing is important, token management is planned to be >> done >> > > > > > > > generically within Flink and not scattered in RM specific >> code. >> > > > > > > JobManager >> > > > > > > > has a DelegationTokenManager which obtains tokens >> time-to-time >> > > (if >> > > > > > > > configured properly). JM knows which TaskManagers are in >> place >> > so >> > > > it >> > > > > > can >> > > > > > > > distribute it to all TMs. That's it basically. >> > > > > > > >> > > > > > > >> > > > > > > I think we might both mean something different by the RM. >> > > JobManager >> > > > is >> > > > > > > basically just a process encapsulating multiple components, >> one >> > of >> > > > > which >> > > > > > is >> > > > > > > a ResourceManager, which is the component that manages task >> > manager >> > > > > > > registrations [1]. There is more or less a single >> implementation >> > of >> > > > the >> > > > > > RM >> > > > > > > with plugable drivers for the active integrations (yarn, k8s). >> > > > > > > >> > > > > > > It would be great if you could share more details of how >> exactly >> > > the >> > > > > DTM >> > > > > > is >> > > > > > > going to fit in the current JM architecture. >> > > > > > > >> > > > > > > 2. 99.9% of the code is generic but each RM handles tokens >> > > > > differently. A >> > > > > > > > good example is YARN obtains tokens on client side and then >> > sets >> > > > them >> > > > > > on >> > > > > > > > the newly created AM container launch context. This is >> purely >> > > YARN >> > > > > > > specific >> > > > > > > > and cant't be spared. With my actual plans standalone can be >> > > > changed >> > > > > to >> > > > > > > use >> > > > > > > > the framework. By using it I mean no RM specific DTM or >> > > whatsoever >> > > > is >> > > > > > > > needed. >> > > > > > > > >> > > > > > > >> > > > > > > If we have tokens available on the client side, why do we >> need to >> > > set >> > > > > > them >> > > > > > > into the AM (yarn specific concept) launch context? Why can't >> we >> > > > simply >> > > > > > > send them to the JM, eg. as a parameter of the job submission >> / >> > via >> > > > > > > separate RPC call? There might be something I'm missing due to >> > > > limited >> > > > > > > knowledge, but handling the token on the "cluster framework" >> > level >> > > > > > doesn't >> > > > > > > seem necessary. >> > > > > > > >> > > > > > > [1] >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/flink-architecture/#jobmanager >> > > > > > > >> > > > > > > Best, >> > > > > > > D. >> > > > > > > >> > > > > > > On Fri, Jan 21, 2022 at 7:48 PM Gabor Somogyi < >> > > > > gabor.g.somo...@gmail.com >> > > > > > > >> > > > > > > wrote: >> > > > > > > >> > > > > > > > Oh and one more thing. I'm planning to add this feature in >> > small >> > > > > chunk >> > > > > > of >> > > > > > > > PRs because security is super hairy area. That way reviewers >> > can >> > > be >> > > > > > more >> > > > > > > > easily obtains the concept. >> > > > > > > > >> > > > > > > > On Fri, 21 Jan 2022, 18:03 David Morávek, <d...@apache.org> >> > > wrote: >> > > > > > > > >> > > > > > > > > Hi Gabor, >> > > > > > > > > >> > > > > > > > > thanks for drafting the FLIP, I think having a solid >> Kerberos >> > > > > support >> > > > > > > is >> > > > > > > > > crucial for many enterprise deployments. 
>> > > > > > > > > >> > > > > > > > > I have multiple questions regarding the implementation >> (note >> > > > that I >> > > > > > > have >> > > > > > > > > very limited knowledge of Kerberos): >> > > > > > > > > >> > > > > > > > > 1) If I understand it correctly, we'll only obtain tokens >> in >> > > the >> > > > > job >> > > > > > > > > manager and then we'll distribute them via RPC (needs to >> be >> > > > > secured). >> > > > > > > > > >> > > > > > > > > Can you please outline how the communication will look >> like? >> > Is >> > > > the >> > > > > > > > > DelegationTokenManager going to be a part of the >> > > ResourceManager? >> > > > > Can >> > > > > > > you >> > > > > > > > > outline it's lifecycle / how it's going to be integrated >> > there? >> > > > > > > > > >> > > > > > > > > 2) Do we really need a YARN / k8s specific >> implementations? >> > Is >> > > it >> > > > > > > > possible >> > > > > > > > > to obtain / renew a token in a generic way? Maybe to >> rephrase >> > > > that, >> > > > > > is >> > > > > > > it >> > > > > > > > > possible to implement DelegationTokenManager for the >> > standalone >> > > > > > Flink? >> > > > > > > If >> > > > > > > > > we're able to solve this point, it could be possible to >> > target >> > > > all >> > > > > > > > > deployment scenarios with a single implementation. >> > > > > > > > > >> > > > > > > > > Best, >> > > > > > > > > D. >> > > > > > > > > >> > > > > > > > > On Fri, Jan 14, 2022 at 3:47 AM Junfan Zhang < >> > > > > > zuston.sha...@gmail.com> >> > > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Hi G >> > > > > > > > > > >> > > > > > > > > > Thanks for your explain in detail. I have gotten your >> > > thoughts, >> > > > > and >> > > > > > > any >> > > > > > > > > > way this proposal >> > > > > > > > > > is a great improvement. >> > > > > > > > > > >> > > > > > > > > > Looking forward to your implementation and i will keep >> > focus >> > > on >> > > > > it. >> > > > > > > > > > Thanks again. >> > > > > > > > > > >> > > > > > > > > > Best >> > > > > > > > > > JunFan. >> > > > > > > > > > On Jan 13, 2022, 9:20 PM +0800, Gabor Somogyi < >> > > > > > > > gabor.g.somo...@gmail.com >> > > > > > > > > >, >> > > > > > > > > > wrote: >> > > > > > > > > > > Just to confirm keeping >> > > > > > "security.kerberos.fetch.delegation-token" >> > > > > > > is >> > > > > > > > > > added >> > > > > > > > > > > to the doc. >> > > > > > > > > > > >> > > > > > > > > > > BR, >> > > > > > > > > > > G >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > On Thu, Jan 13, 2022 at 1:34 PM Gabor Somogyi < >> > > > > > > > > gabor.g.somo...@gmail.com >> > > > > > > > > > > >> > > > > > > > > > > wrote: >> > > > > > > > > > > >> > > > > > > > > > > > Hi JunFan, >> > > > > > > > > > > > >> > > > > > > > > > > > > By the way, maybe this should be added in the >> > migration >> > > > > plan >> > > > > > or >> > > > > > > > > > > > intergation section in the FLIP-211. >> > > > > > > > > > > > >> > > > > > > > > > > > Going to add this soon. >> > > > > > > > > > > > >> > > > > > > > > > > > > Besides, I have a question that the KDC will >> collapse >> > > > when >> > > > > > the >> > > > > > > > > > cluster >> > > > > > > > > > > > reached 200 nodes you described >> > > > > > > > > > > > in the google doc. Do you have any attachment or >> > > reference >> > > > to >> > > > > > > prove >> > > > > > > > > it? 
>> > > > > > > > > > > > >> > > > > > > > > > > > "KDC *may* collapse under some circumstances" is the >> > > proper >> > > > > > > > wording. >> > > > > > > > > > > > >> > > > > > > > > > > > We have several customers who are executing >> workloads >> > on >> > > > > > > > Spark/Flink. >> > > > > > > > > > Most >> > > > > > > > > > > > of the time I'm facing their >> > > > > > > > > > > > daily issues which is heavily environment and >> use-case >> > > > > > dependent. >> > > > > > > > > I've >> > > > > > > > > > > > seen various cases: >> > > > > > > > > > > > * where the mentioned ~1k nodes were working fine >> > > > > > > > > > > > * where KDC thought the number of requests are >> coming >> > > from >> > > > > DDOS >> > > > > > > > > attack >> > > > > > > > > > so >> > > > > > > > > > > > discontinued authentication >> > > > > > > > > > > > * where KDC was simply not responding because of the >> > load >> > > > > > > > > > > > * where KDC was intermittently had some outage (this >> > was >> > > > the >> > > > > > most >> > > > > > > > > nasty >> > > > > > > > > > > > thing) >> > > > > > > > > > > > >> > > > > > > > > > > > Since you're managing relatively big cluster then >> you >> > > know >> > > > > that >> > > > > > > KDC >> > > > > > > > > is >> > > > > > > > > > not >> > > > > > > > > > > > only used by Spark/Flink workloads >> > > > > > > > > > > > but the whole company IT infrastructure is bombing >> it >> > so >> > > it >> > > > > > > really >> > > > > > > > > > depends >> > > > > > > > > > > > on other factors too whether KDC is reaching >> > > > > > > > > > > > it's limit or not. Not sure what kind of evidence >> are >> > you >> > > > > > looking >> > > > > > > > for >> > > > > > > > > > but >> > > > > > > > > > > > I'm not authorized to share any information about >> > > > > > > > > > > > our clients data. >> > > > > > > > > > > > >> > > > > > > > > > > > One thing is for sure. The more external system >> types >> > are >> > > > > used >> > > > > > in >> > > > > > > > > > > > workloads (for ex. HDFS, HBase, Hive, Kafka) which >> > > > > > > > > > > > are authenticating through KDC the more possibility >> to >> > > > reach >> > > > > > this >> > > > > > > > > > > > threshold when the cluster is big enough. >> > > > > > > > > > > > >> > > > > > > > > > > > All in all this feature is here to help all users >> never >> > > > reach >> > > > > > > this >> > > > > > > > > > > > limitation. >> > > > > > > > > > > > >> > > > > > > > > > > > BR, >> > > > > > > > > > > > G >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > On Thu, Jan 13, 2022 at 1:00 PM 张俊帆 < >> > > > zuston.sha...@gmail.com >> > > > > > >> > > > > > > > wrote: >> > > > > > > > > > > > >> > > > > > > > > > > > > Hi G >> > > > > > > > > > > > > >> > > > > > > > > > > > > Thanks for your quick reply. I think reserving the >> > > config >> > > > > of >> > > > > > > > > > > > > *security.kerberos.fetch.delegation-token* >> > > > > > > > > > > > > and simplifying disable the token fetching is a >> good >> > > > > idea.By >> > > > > > > the >> > > > > > > > > way, >> > > > > > > > > > > > > maybe this should be added >> > > > > > > > > > > > > in the migration plan or intergation section in >> the >> > > > > FLIP-211. >> > > > > > > > > > > > > >> > > > > > > > > > > > > Besides, I have a question that the KDC will >> collapse >> > > > when >> > > > > > the >> > > > > > > > > > cluster >> > > > > > > > > > > > > reached 200 nodes you described >> > > > > > > > > > > > > in the google doc. 
Do you have any attachment or >> > > > reference >> > > > > to >> > > > > > > > prove >> > > > > > > > > > it? >> > > > > > > > > > > > > Because in our internal per-cluster, >> > > > > > > > > > > > > the nodes reaches > 1000 and KDC looks good. Do i >> > > missed >> > > > or >> > > > > > > > > > misunderstood >> > > > > > > > > > > > > something? Please correct me. >> > > > > > > > > > > > > >> > > > > > > > > > > > > Best >> > > > > > > > > > > > > JunFan. >> > > > > > > > > > > > > On Jan 13, 2022, 5:26 PM +0800, >> dev@flink.apache.org >> > , >> > > > > wrote: >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://docs.google.com/document/d/1JzMbQ1pCJsLVz8yHrCxroYMRP2GwGwvacLrGyaIx5Yc/edit?fbclid=IwAR0vfeJvAbEUSzHQAAJfnWTaX46L6o7LyXhMfBUCcPrNi-uXNgoOaI8PMDQ >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >