Thanks for the update, I'll go over it tomorrow.

On Wed, Jan 26, 2022 at 5:33 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> Hi All,
>
> Since it has turned out that DTM can't be added as a member of JobMaster
> <https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176>,
> I've come up with a better proposal.
> David, thanks for pointing this out, you've caught a bug in the early phase!
>
> Namely, ResourceManager
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124>
> is a single-instance class where DTM can be added as a member variable.
> It has a list of all already registered TMs, and new TM registration also happens there.
> To be more specific, the following logic can be added:
> * Create a new DTM instance in ResourceManager
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124>
> and start it (a recurring thread to obtain new tokens)
> * Add a new function named "updateDelegationTokens" to TaskExecutorGateway
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutorGateway.java#L54>
> * Call "updateDelegationTokens" on all registered TMs to propagate new DTs
> * In case of a new TM registration, call "updateDelegationTokens" before the registration succeeds, to set up the new TM properly
>
> This way:
> * only a single DTM would live within a cluster, which is the expected behavior
> * DTM is going to be added to a central place where all deployment targets can make use of it
> * DTs are going to be pushed to TMs, which generates less network traffic than a pull-based approach
> (please see my previous mail where I described both approaches)
> * the HA scenario is going to be consistent, because such
> <https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java#L1069>
> a solution can be added to "updateDelegationTokens"
>
> @David or all others, please share whether you agree with this or have a better idea/suggestion.
>
> BR,
> G
>
> On Tue, Jan 25, 2022 at 11:00 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> > First of all, thanks for investing your time and helping me out. As I see, you have pretty solid knowledge in the RPC area.
> > I would like to rely on your knowledge since I'm learning this part.
> >
> > > - Do we need to introduce a new RPC method or can we for example piggyback on heartbeats?
> >
> > I'm fine with either solution, but one thing is important conceptually. There are fundamentally 2 ways tokens can be updated:
> > - Push: when there are new DTs, the JM JVM pushes them to the TM JVMs. This is the preferred one, since only a tiny amount of control logic is needed.
> > - Pull: each TM polls the JM from time to time for new tokens, and each TM decides alone whether the DTs need to be updated or not.
> > As you've mentioned, some ID would need to be generated for this, and it would generate quite some additional network traffic which can definitely be avoided.
> > As a final thought, in Spark we had this pull-based way of DT propagation and we had major issues with it.
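> > To make the push approach concrete, here is a minimal sketch (purely illustrative; the interface and types are simplified stand-ins, not the final Flink API):
> >
> >     import java.util.List;
> >     import java.util.concurrent.CompletableFuture;
> >
> >     // Simplified stand-in for the real TaskExecutorGateway in flink-runtime.
> >     interface TaskExecutorTokenGateway {
> >         // JM -> TM push: replace the TM-side tokens with freshly obtained ones.
> >         CompletableFuture<Void> updateDelegationTokens(byte[] serializedTokens);
> >     }
> >
> >     final class TokenBroadcaster {
> >         // Push new tokens to every registered TM; the returned future fails if any single TM fails.
> >         static CompletableFuture<Void> broadcast(List<TaskExecutorTokenGateway> registeredTms, byte[] tokens) {
> >             CompletableFuture<?>[] acks = registeredTms.stream()
> >                     .map(tm -> tm.updateDelegationTokens(tokens))
> >                     .toArray(CompletableFuture[]::new);
> >             return CompletableFuture.allOf(acks);
> >         }
> >     }
> >
> > The failing-future behavior matches the delivery semantics discussed below: the DTM gets an exception as soon as at least one TM could not be updated.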
> > So, all in all, DTM needs to obtain new tokens, and there must be a way to send this data from the JM to all TMs.
> >
> > > - What delivery semantics are we looking for? (what if we're only able to update a subset of TMs / what happens if we exhaust retries / should we even have the retry mechanism whatsoever) - I have a feeling that somehow leveraging the existing heartbeat mechanism could help to answer these questions
> >
> > Let's go through these questions one by one.
> >
> > > What delivery semantics are we looking for?
> >
> > DTM must receive an exception when at least one TM was not able to get the DTs.
> >
> > > what if we're only able to update a subset of TMs?
> >
> > In such a case DTM will reschedule the token obtainment after "security.kerberos.tokens.retry-wait" time.
> >
> > > what happens if we exhaust retries?
> >
> > There is no fixed number of retries. In the default configuration tokens need to be re-obtained after one day.
> > DTM tries to obtain new tokens after 1 day * 0.75 (security.kerberos.tokens.renewal-ratio) = 18 hours.
> > When that fails, it retries after "security.kerberos.tokens.retry-wait", which is 1 hour by default.
> > If it never succeeds, then an authentication error is going to happen on the TM side and the workload is going to stop.
> >
> > > should we even have the retry mechanism whatsoever?
> >
> > Yes, because there are always temporary cluster issues.
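> > To sketch the retry semantics in code (illustrative only; the real scheduling and config wiring are omitted):
> >
> >     import java.util.concurrent.Executors;
> >     import java.util.concurrent.ScheduledExecutorService;
> >     import java.util.concurrent.TimeUnit;
> >
> >     final class RenewalLoop {
> >         private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
> >
> >         // Defaults from above: 1 day lifetime * 0.75 renewal-ratio = 18 h, 1 h retry-wait.
> >         private static final long RENEWAL_DELAY_MS = TimeUnit.HOURS.toMillis(18);
> >         private static final long RETRY_WAIT_MS = TimeUnit.HOURS.toMillis(1);
> >
> >         void scheduleNext(long delayMs) {
> >             scheduler.schedule(this::obtainAndPush, delayMs, TimeUnit.MILLISECONDS);
> >         }
> >
> >         private void obtainAndPush() {
> >             try {
> >                 byte[] tokens = obtainNewTokens();      // talk to the KDC / token providers
> >                 pushToAllTaskManagers(tokens);          // the RPC broadcast sketched earlier
> >                 scheduleNext(RENEWAL_DELAY_MS);         // success: schedule the regular renewal
> >             } catch (Exception e) {
> >                 scheduleNext(RETRY_WAIT_MS);            // failure: retry until tokens expire on the TMs
> >             }
> >         }
> >
> >         // Placeholders for the real logic.
> >         private byte[] obtainNewTokens() throws Exception { return new byte[0]; }
> >         private void pushToAllTaskManagers(byte[] tokens) throws Exception {}
> >     }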
> > > What does it mean for the running application (how does this look from the user perspective)? As far as I remember the logs are only collected ("aggregated") after the container is stopped, is that correct?
> >
> > With the default config it works like that, but it can be forced to aggregate at specific intervals.
> > A useful feature is forcing YARN to aggregate logs while the job is still running. For long-running jobs such as streaming jobs, this is invaluable.
> > To do this, yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds must be set to a non-negative value.
> > When this is set, a timer will be set for the given duration, and whenever that timer goes off, log aggregation will run on new files.
> >
> > > I think this topic should get its own section in the FLIP (having some cross reference to a YARN ticket would be really useful, but I'm not sure if there are any).
> >
> > I think this is important knowledge, but this FLIP is not touching the already existing behavior.
> > DTs are set on the AM container, which is renewed by YARN until it's not possible anymore.
> > No new code is going to change this limitation. BTW, there is no jira for this.
> > If you think it's worth writing this down, then I think the good place is the official security doc area, as a caveat.
> >
> > > If we split the FLIP into the two parts / sections that I've suggested, I don't really think that you need to explicitly test for each deployment scenario / cluster framework, because the DTM part is completely independent of the deployment target. Basically this is what I'm aiming for with "making it work with the standalone" (as simple as starting a new java process) Flink first (which is also how most people deploy streaming applications on k8s and the direction we're pushing forward with the auto-scaling / reactive mode initiatives).
> >
> > I see your point and agree with the main direction. k8s is the megatrend which most people will use sooner or later.
> > Not 100% sure what kind of split you suggest, but in my view the main target is to add this feature, and I'm open to any logical work ordering.
> > Please share the specific details and we'll work it out...
> >
> > G
> >
> > On Mon, Jan 24, 2022 at 3:04 PM David Morávek <d...@apache.org> wrote:
> >
> > > > Could you point to a code where you think it could be added exactly? A helping hand is welcome here 🙂
> > >
> > > I think you can take a look at _ResourceManagerPartitionTracker_ [1], which seems to have somewhat similar properties to the DTM.
> > >
> > > One topic that needs to be addressed there is how the RPC with the _TaskExecutorGateway_ should look.
> > > - Do we need to introduce a new RPC method or can we for example piggyback on heartbeats?
> > > - What delivery semantics are we looking for? (what if we're only able to update a subset of TMs / what happens if we exhaust retries / should we even have the retry mechanism whatsoever) - I have a feeling that somehow leveraging the existing heartbeat mechanism could help to answer these questions
> > >
> > > > In short, after the DT reaches its max lifetime, log aggregation stops
> > >
> > > What does it mean for the running application (how does this look from the user perspective)? As far as I remember the logs are only collected ("aggregated") after the container is stopped, is that correct? I think this topic should get its own section in the FLIP (having some cross reference to a YARN ticket would be really useful, but I'm not sure if there are any).
> > >
> > > > All deployment modes (per-job, per-app, ...) are planned to be tested and are expected to work with the initial implementation; however, not all deployment targets (k8s, local, ...
> > >
> > > If we split the FLIP into the two parts / sections that I've suggested, I don't really think that you need to explicitly test for each deployment scenario / cluster framework, because the DTM part is completely independent of the deployment target. Basically this is what I'm aiming for with "making it work with the standalone" (as simple as starting a new java process) Flink first (which is also how most people deploy streaming applications on k8s and the direction we're pushing forward with the auto-scaling / reactive mode initiatives).
> > >
> > > The whole integration with YARN (let's forget about log aggregation for a moment) / k8s-native only boils down to how we make the keytab file local to the JobManager so the DTM can read it, so it's basically built on top of that. The only special thing that needs to be tested there is the "keytab distribution" code path.
> > >
> > > [1] https://github.com/apache/flink/blob/release-1.14.3/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResourceManagerPartitionTracker.java
> > >
> > > Best,
> > > D.
> > >
> > > On Mon, Jan 24, 2022 at 12:35 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >
> > > > > There is a separate JobMaster for each job within a Flink cluster and each JobMaster only has a partial view of the task managers
> > > >
> > > > Good point! I've had a deeper look and you're right. We definitely need to find another place.
> > > > > Related per-cluster or per-job keytab:
> > > >
> > > > In the current code a per-cluster keytab is implemented, and I intend to keep it like this within this FLIP. The reason is simple: tokens on the TM side can be stored within the UserGroupInformation (UGI) structure, which is global. I'm not saying it's impossible to change that, but I think that is a complexity which the initial implementation is not required to contain. Additionally, we've not seen such a need from the user side. If the need arises later on, then another FLIP on this topic can be created and discussed. Proper multi-UGI handling within a single JVM is a topic where several rounds of deep-dives with the Hadoop/YARN guys would be required.
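> > > > For reference, this is roughly what the TM side could do with pushed tokens (an illustrative sketch on top of the standard Hadoop API, not the actual prototype code):
> > > >
> > > >     import java.io.ByteArrayInputStream;
> > > >     import java.io.DataInputStream;
> > > >     import java.io.IOException;
> > > >     import org.apache.hadoop.security.Credentials;
> > > >     import org.apache.hadoop.security.UserGroupInformation;
> > > >
> > > >     final class TokenReceiver {
> > > >         // Deserialize the pushed tokens and add them to the process-wide UGI.
> > > >         static void onNewTokens(byte[] serializedTokens) throws IOException {
> > > >             Credentials credentials = new Credentials();
> > > >             try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serializedTokens))) {
> > > >                 credentials.readTokenStorageStream(in);
> > > >             }
> > > >             // UGI is global to the JVM, which is why a per-cluster keytab keeps things simple.
> > > >             UserGroupInformation.getCurrentUser().addCredentials(credentials);
> > > >         }
> > > >     }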
> > > > > single DTM instance embedded with the ResourceManager (the Flink component)
> > > >
> > > > Could you point to a code where you think it could be added exactly? A helping hand is welcome here 🙂
> > > >
> > > > > Then the single (initial) implementation should work with all the deployment modes out of the box (which is not what the FLIP suggests). Is that correct?
> > > >
> > > > All deployment modes (per-job, per-app, ...) are planned to be tested and are expected to work with the initial implementation; however, not all deployment targets (k8s, local, ...) are intended to be tested. A new jira needs to be created per deployment target, where I expect a small amount of code to be added and a relatively expensive testing effort to be required.
> > > >
> > > > > I've taken a look into the prototype and in the "YarnClusterDescriptor" you're injecting a delegation token into the AM [1] (that's obtained using the provided keytab). If I understand this correctly from previous discussion / FLIP, this is to support log aggregation and the DT has a limited validity. How is this DT going to be renewed?
> > > >
> > > > You're clever and have touched on a limitation which Spark has too. In short, after the DT reaches its max lifetime, log aggregation stops. I've had several deep-dive rounds with the YARN guys during my Spark years because I wanted to fill this gap. They couldn't provide us any way to re-inject the newly obtained DT, so in the end I gave up on this.
> > > >
> > > > BR,
> > > > G
> > > >
> > > > On Mon, 24 Jan 2022, 11:00 David Morávek, <d...@apache.org> wrote:
> > > >
> > > > > Hi Gabor,
> > > > >
> > > > > There is actually a huge difference between JobManager (process) and JobMaster (job coordinator). The naming is unfortunately a bit misleading here for historical reasons. There is a separate JobMaster for each job within a Flink cluster, and each JobMaster only has a partial view of the task managers (depending on where the slots for a particular job are allocated). This means that you'd end up with N "DelegationTokenManagers" competing with each other (N = number of running jobs in the cluster).
> > > > >
> > > > > This makes me think we're mixing two abstraction levels here:
> > > > >
> > > > > a) Per-cluster delegation tokens
> > > > > - Simpler approach, it would involve a single DTM instance embedded with the ResourceManager (the Flink component)
> > > > > b) Per-job delegation tokens
> > > > > - More complex approach, but could be more flexible from the user side of things.
> > > > > - Multiple DTM instances that are bound to the JobMaster lifecycle. Delegation tokens are attached to the particular slots that are executing the job's tasks instead of the whole task manager (a TM could be executing multiple jobs with different tokens).
> > > > > - The question is which keytab should be used for the clustering framework, to support log aggregation on YARN (an extra keytab, the keytab that comes with the first job?)
> > > > >
> > > > > I think these are the things that need to be clarified in the FLIP before proceeding.
> > > > >
> > > > > A follow-up question for getting a better understanding of where this should be headed: Are there any use cases where a user may want to use different keytabs with each job, or are we fine with using a cluster-wide keytab? If we go with per-cluster keytabs, is it OK that all jobs submitted into this cluster can access it (even the future ones)? Should this be a security concern?
> > > > >
> > > > > > I presume you thought I would implement a new class with the JobManager name. The plan is not that.
> > > > >
> > > > > I've never suggested such a thing.
> > > > >
> > > > > > No. As said earlier, DT handling is planned to be done completely in Flink. DTM has a renewal thread which re-obtains tokens at the proper time when needed.
> > > > >
> > > > > Then the single (initial) implementation should work with all the deployment modes out of the box (which is not what the FLIP suggests). Is that correct?
> > > > >
> > > > > If the cluster framework also requires delegation tokens for its inner workings (IMO this only applies to YARN), it might need an extra step (injecting the token into the application master container).
> > > > >
> > > > > Separating the individual layers (actual Flink cluster - basically making this work with a standalone deployment / "cluster framework" - support for YARN log aggregation) in the FLIP would be useful.
> > > > >
> > > > > > Reading the linked Spark readme could be useful.
> > > > >
> > > > > I've read that, but please be patient with the questions, Kerberos is not an easy topic to get into and I've had very little contact with it in the past.
> > > > >
> > > > > > https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176
> > > > >
> > > > > I've taken a look into the prototype and in the "YarnClusterDescriptor" you're injecting a delegation token into the AM [1] (that's obtained using the provided keytab). If I understand this correctly from previous discussion / FLIP, this is to support log aggregation and the DT has a limited validity. How is this DT going to be renewed?
> > > > > [1] https://github.com/gaborgsomogyi/flink/commit/8ab75e46013f159778ccfce52463e7bc63e395a9#diff-02416e2d6ca99e1456f9c3949f3d7c2ac523d3fe25378620c09632e4aac34e4eR1261
> > > > >
> > > > > Best,
> > > > > D.
> > > > >
> > > > > On Fri, Jan 21, 2022 at 9:35 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > >
> > > > > > Here is the exact class (I'm on mobile, so I haven't checked the exact class name):
> > > > > > https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176
> > > > > > That keeps track of the TMs where the tokens can be sent.
> > > > > >
> > > > > > > My feeling would be that we shouldn't really introduce a new component with a custom lifecycle, but rather we should try to incorporate this into existing ones.
> > > > > >
> > > > > > Can you be more specific? I presume you thought I would implement a new class with the JobManager name. The plan is not that.
> > > > > >
> > > > > > > If I understand this correctly, this means that we then push the token renewal logic to YARN.
> > > > > >
> > > > > > No. As said earlier, DT handling is planned to be done completely in Flink. DTM has a renewal thread which re-obtains tokens at the proper time when needed. YARN log aggregation is a totally different feature, where YARN does the renewal. Log aggregation was an example of why the code can't be 100% reusable for all resource managers. Reading the linked Spark readme could be useful.
> > > > > >
> > > > > > G
> > > > > >
> > > > > > On Fri, 21 Jan 2022, 21:05 David Morávek, <d...@apache.org> wrote:
> > > > > >
> > > > > > > > JobManager is the Flink class.
> > > > > > >
> > > > > > > There is no such class in Flink. The closest thing to the JobManager is the ClusterEntrypoint. The cluster entrypoint spawns a new RM Runner & Dispatcher Runner that start participating in the leader election. Once they gain leadership, they spawn the actual underlying instances of these two "main components".
> > > > > > >
> > > > > > > My feeling would be that we shouldn't really introduce a new component with a custom lifecycle, but rather we should try to incorporate this into existing ones.
> > > > > > >
> > > > > > > My biggest concerns would be:
> > > > > > >
> > > > > > > - How would the lifecycle of the new component look with regards to HA setups? If we really decide to introduce a completely new component, how should this work in case of multiple JobManager instances?
> > > > > > > - Which components does it talk to / how? For example, how does the broadcast of new tokens to task managers (TaskManagerGateway) look? Do we simply introduce a new RPC on the ResourceManagerGateway that broadcasts it, or does the new component need to do some kind of bookkeeping of the task managers that it needs to notify?
> > > > > > >
> > > > > > > > YARN based HDFS log aggregation would not work if we dropped that code. Just to be crystal clear, the actual implementation contains this for exactly this reason.
> > > > > > >
> > > > > > > This is the missing part +1. If I understand this correctly, this means that we then push the token renewal logic to YARN. How do you plan to implement the renewal logic on k8s?
> > > > > > >
> > > > > > > D.
> > > > > > >
> > > > > > > On Fri, Jan 21, 2022 at 8:37 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > I think we might both mean something different by the RM.
> > > > > > > >
> > > > > > > > You sense it well, I've not specified these terms well in the explanation. By RM I meant the resource management framework. JobManager is the Flink class. This means that inside the JM instance there will be a DTM instance, so they would have the same lifecycle. Hope I've answered the question.
> > > > > > > >
> > > > > > > > > If we have tokens available on the client side, why do we need to set them into the AM (yarn specific concept) launch context?
> > > > > > > >
> > > > > > > > YARN based HDFS log aggregation would not work if we dropped that code. Just to be crystal clear, the actual implementation contains this for exactly this reason.
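> > > > > > > > The client-side step in question is roughly the following (a sketch using the standard YARN/Hadoop APIs; the real code lives in YarnClusterDescriptor):
> > > > > > > >
> > > > > > > >     import java.io.IOException;
> > > > > > > >     import java.nio.ByteBuffer;
> > > > > > > >     import org.apache.hadoop.io.DataOutputBuffer;
> > > > > > > >     import org.apache.hadoop.security.Credentials;
> > > > > > > >     import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
> > > > > > > >
> > > > > > > >     final class AmTokenSetup {
> > > > > > > >         // Serialize the obtained tokens and attach them to the AM container launch
> > > > > > > >         // context, so YARN can use them (e.g. HDFS access for log aggregation).
> > > > > > > >         static void setTokens(ContainerLaunchContext amContainer, Credentials credentials) throws IOException {
> > > > > > > >             DataOutputBuffer dob = new DataOutputBuffer();
> > > > > > > >             credentials.writeTokenStorageToStream(dob);
> > > > > > > >             amContainer.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
> > > > > > > >         }
> > > > > > > >     }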
> > > > > > > >
> > > > > > > > G
> > > > > > > >
> > > > > > > > On Fri, 21 Jan 2022, 20:12 David Morávek, <d...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi Gabor,
> > > > > > > > >
> > > > > > > > > > 1. One thing is important, token management is planned to be done generically within Flink and not scattered in RM specific code. JobManager has a DelegationTokenManager which obtains tokens from time to time (if configured properly). The JM knows which TaskManagers are in place so it can distribute the tokens to all TMs. That's it basically.
> > > > > > > > >
> > > > > > > > > I think we might both mean something different by the RM. The JobManager is basically just a process encapsulating multiple components, one of which is the ResourceManager, which is the component that manages task manager registrations [1]. There is more or less a single implementation of the RM with pluggable drivers for the active integrations (yarn, k8s).
> > > > > > > > >
> > > > > > > > > It would be great if you could share more details of how exactly the DTM is going to fit into the current JM architecture.
> > > > > > > > >
> > > > > > > > > > 2. 99.9% of the code is generic but each RM handles tokens differently. A good example is YARN, which obtains tokens on the client side and then sets them on the newly created AM container launch context. This is purely YARN specific and can't be spared. With my actual plans, standalone can be changed to use the framework. By using it I mean no RM specific DTM or whatsoever is needed.
> > > > > > > > >
> > > > > > > > > If we have tokens available on the client side, why do we need to set them into the AM (yarn specific concept) launch context? Why can't we simply send them to the JM, e.g. as a parameter of the job submission / via a separate RPC call? There might be something I'm missing due to limited knowledge, but handling the token at the "cluster framework" level doesn't seem necessary.
> > > > > > > > >
> > > > > > > > > [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/flink-architecture/#jobmanager
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > D.
> > > > > > > > >
> > > > > > > > > On Fri, Jan 21, 2022 at 7:48 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Oh, and one more thing. I'm planning to add this feature in small chunks of PRs because security is a super hairy area. That way reviewers can grasp the concept more easily.
> > > > > > > > > >
> > > > > > > > > > On Fri, 21 Jan 2022, 18:03 David Morávek, <d...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Gabor,
> > > > > > > > > > >
> > > > > > > > > > > thanks for drafting the FLIP, I think having solid Kerberos support is crucial for many enterprise deployments.
> > > > > > > > > > >
> > > > > > > > > > > I have multiple questions regarding the implementation (note that I have very limited knowledge of Kerberos):
> > > > > > > > > > >
> > > > > > > > > > > 1) If I understand it correctly, we'll only obtain tokens in the job manager and then we'll distribute them via RPC (needs to be secured).
> > > > > > > > > > >
> > > > > > > > > > > Can you please outline how the communication will look? Is the DelegationTokenManager going to be a part of the ResourceManager? Can you outline its lifecycle / how it's going to be integrated there?
> > > > > > > > > > >
> > > > > > > > > > > 2) Do we really need YARN / k8s specific implementations? Is it possible to obtain / renew a token in a generic way? Maybe to rephrase that, is it possible to implement a DelegationTokenManager for standalone Flink? If we're able to solve this point, it could be possible to target all deployment scenarios with a single implementation.
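> > > > > > > > > > > Just to illustrate what I mean by "generic" (a hypothetical sketch, not an existing Flink interface):
> > > > > > > > > > >
> > > > > > > > > > >     // One generic DTM, with a pluggable provider per external system (HDFS, HBase, Kafka, ...).
> > > > > > > > > > >     interface DelegationTokenProvider {
> > > > > > > > > > >         String serviceName();                    // e.g. "hadoopfs"
> > > > > > > > > > >         boolean tokensRequired();                // e.g. Kerberos enabled and service configured
> > > > > > > > > > >         byte[] obtainTokens() throws Exception;  // serialized tokens, independent of the deployment target
> > > > > > > > > > >     }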
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > D.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 14, 2022 at 3:47 AM Junfan Zhang <zuston.sha...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi G
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for your detailed explanation. I have gotten your thoughts, and in any case this proposal is a great improvement.
> > > > > > > > > > > >
> > > > > > > > > > > > Looking forward to your implementation, and I will keep my focus on it. Thanks again.
> > > > > > > > > > > >
> > > > > > > > > > > > Best
> > > > > > > > > > > > JunFan.
> > > > > > > > > > > > On Jan 13, 2022, 9:20 PM +0800, Gabor Somogyi <gabor.g.somo...@gmail.com>, wrote:
> > > > > > > > > > > > > Just to confirm: keeping "security.kerberos.fetch.delegation-token" has been added to the doc.
> > > > > > > > > > > > >
> > > > > > > > > > > > > BR,
> > > > > > > > > > > > > G
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jan 13, 2022 at 1:34 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi JunFan,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > By the way, maybe this should be added in the migration plan or integration section in the FLIP-211.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Going to add this soon.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Besides, I have a question that the KDC will collapse when the cluster reached 200 nodes you described in the google doc. Do you have any attachment or reference to prove it?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > "KDC *may* collapse under some circumstances" is the proper wording.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We have several customers who are executing workloads on Spark/Flink. Most of the time I'm facing their daily issues, which are heavily environment and use-case dependent.
> > > > > > > > > > > > > > I've seen various cases:
> > > > > > > > > > > > > > * where the mentioned ~1k nodes were working fine
> > > > > > > > > > > > > > * where the KDC thought the number of requests was coming from a DDOS attack, so it discontinued authentication
> > > > > > > > > > > > > > * where the KDC was simply not responding because of the load
> > > > > > > > > > > > > > * where the KDC intermittently had some outage (this was the most nasty thing)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Since you're managing a relatively big cluster, you know that the KDC is not only used by Spark/Flink workloads; the whole company IT infrastructure is bombing it, so it really depends on other factors too whether the KDC is reaching its limit or not. Not sure what kind of evidence you are looking for, but I'm not authorized to share any information about our clients' data.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One thing is for sure: the more external system types are used in workloads (for example HDFS, HBase, Hive, Kafka) which authenticate through the KDC, the higher the possibility of reaching this threshold when the cluster is big enough.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > All in all, this feature is here to help all users never reach this limitation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > G
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 13, 2022 at 1:00 PM 张俊帆 <zuston.sha...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi G
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for your quick reply. I think reserving the config *security.kerberos.fetch.delegation-token* and simply disabling the token fetching is a good idea. By the way, maybe this should be added in the migration plan or integration section in FLIP-211.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Besides, I have a question: you described in the google doc that the KDC will collapse when the cluster reaches 200 nodes. Do you have any attachment or reference to prove it? Because in our internal per-cluster setup the nodes reach > 1000 and the KDC looks good.
> > > > > > > > > > > > > > > Did I miss or misunderstand something? Please correct me.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best
> > > > > > > > > > > > > > > JunFan.
> > > > > > > > > > > > > > > On Jan 13, 2022, 5:26 PM +0800, dev@flink.apache.org, wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://docs.google.com/document/d/1JzMbQ1pCJsLVz8yHrCxroYMRP2GwGwvacLrGyaIx5Yc/edit?fbclid=IwAR0vfeJvAbEUSzHQAAJfnWTaX46L6o7LyXhMfBUCcPrNi-uXNgoOaI8PMDQ