Thanks for the confirmation, now it works!

G


On Mon, Jan 31, 2022 at 12:25 PM Chesnay Schepler <ches...@apache.org>
wrote:

> You should have permissions now. Note that I saw 2 accounts matching
> your name, and I picked gaborgsomogyi.
>
> On 31/01/2022 11:28, Gabor Somogyi wrote:
> > Not sure if the mentioned write permission was already given or not, but I
> > still don't see any edit button.
> >
> > G
> >
> >
> > On Fri, Jan 28, 2022 at 5:08 PM Gabor Somogyi <gabor.g.somo...@gmail.com
> >
> > wrote:
> >
> >> Hi Robert,
> >>
> >> That would be awesome.
> >>
> >> My cwiki username: gaborgsomogyi
> >>
> >> G
> >>
> >>
> >> On Fri, Jan 28, 2022 at 5:06 PM Robert Metzger <metrob...@gmail.com>
> >> wrote:
> >>
> >>> Hey Gabor,
> >>>
> >>> let me know your cwiki username, and I can give you write permissions.
> >>>
> >>>
> >>> On Fri, Jan 28, 2022 at 4:05 PM Gabor Somogyi <
> gabor.g.somo...@gmail.com>
> >>> wrote:
> >>>
> >>>> Thanks for making the design better! Nothing further to discuss from my
> >>>> side.
> >>>>
> >>>> I've started to reflect the agreement in the FLIP doc.
> >>>> Since I don't have access to the wiki I need to ask Marci to do that,
> >>>> which may take some time.
> >>>>
> >>>> G
> >>>>
> >>>>
> >>>> On Fri, Jan 28, 2022 at 3:52 PM David Morávek <d...@apache.org>
> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> AFAIU an under registration TM is not added to the registered TMs map
> >>>> until
> >>>>>> RegistrationResponse ..
> >>>>>>
> >>>>> I think you're right; with a careful design around threading (delegating
> >>>>> update broadcasts to the main thread) plus a synchronous initial update
> >>>>> (which would be nice to avoid) this should be doable.
> >>>>>
> >>>>> Not sure what you mean by "we can't register the TM without providing it
> >>>>> with token" but in an unsecured configuration registration must happen
> >>>>> w/o tokens.
> >>>>> Exactly as you describe it, this was meant only for the "kerberized /
> >>>>> secured" cluster case; in other cases we wouldn't enforce a non-null
> >>>>> token in the response.
> >>>>>
> >>>>> I think this is a good idea in general.
> >>>>> +1
> >>>>>
> >>>>> If you don't have any more thoughts on the RPC / lifecycle part, can you
> >>>>> please reflect it in the FLIP?
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Fri, Jan 28, 2022 at 3:16 PM Gabor Somogyi <
> >>> gabor.g.somo...@gmail.com
> >>>>> wrote:
> >>>>>
> >>>>>>> - Make sure DTs issued by single DTMs are monotonically increasing
> >>>> (can
> >>>>>> be
> >>>>>> sorted on TM side)
> >>>>>>
> >>>>>> AFAIU an under-registration TM is not added to the registered TMs map
> >>>>>> until the RegistrationResponse, which would contain the initial tokens,
> >>>>>> is processed. If that's true then how is it possible to have a race
> >>>>>> with a DTM update, which is working on the registered TMs list?
> >>>>>> To be more specific, "taskExecutors" is the registered map of TMs to
> >>>>>> which DTM can send updated tokens, but this doesn't contain the
> >>>>>> under-registration TM while the RegistrationResponse is not processed,
> >>>>>> right?
> >>>>>>
> >>>>>> Of course if DTM can update while RegistrationResponse is processed,
> >>>>>> then some kind of sorting would be required, and in that case I would
> >>>>>> agree.
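
A minimal sketch of such a deterministic "latest wins" rule on the TM side (illustrative Java, not actual Flink code; it assumes tokens carry an expiration timestamp):

    // Sketch: keep whichever token set expires later, so concurrent
    // registration/refresh updates converge on the freshest tokens.
    final class TokenHolder {
        static final class DelegationTokens { // hypothetical container
            final byte[] serialized;
            final long expirationTime;
            DelegationTokens(byte[] serialized, long expirationTime) {
                this.serialized = serialized;
                this.expirationTime = expirationTime;
            }
        }

        private DelegationTokens current;

        synchronized void onTokensReceived(DelegationTokens incoming) {
            if (current == null || incoming.expirationTime >= current.expirationTime) {
                current = incoming; // an older in-flight update can no longer win
            }
        }
    }
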
> >>>>>>
> >>>>>> - Scope DT updates by the RM ID and ensure that TM only accepts
> >>> update
> >>>>> from
> >>>>>> the current leader
> >>>>>>
> >>>>>> I initially planned this the mentioned way, so agreed.
> >>>>>>
> >>>>>> - Return initial token with the RegistrationResponse, which should make
> >>>>>> the RPC contract a bit clearer (ensure that we can't register the TM
> >>>>>> without providing it with a token)
> >>>>>>
> >>>>>> I think this is a good idea in general. Not sure what you mean by "we
> >>>>>> can't register the TM without providing it with token" but in an
> >>>>>> unsecured configuration registration must happen w/o tokens.
> >>>>>> All in all, the newly added tokens field must be somehow optional.
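
A minimal sketch of what the optional token field on the registration response could look like (hypothetical class shape; in an unsecured setup the field would simply stay null):

    import java.io.Serializable;
    import javax.annotation.Nullable;

    // Sketch: registration success message with an optional token payload.
    public final class RegistrationSuccessWithTokens implements Serializable {
        @Nullable private final byte[] initialDelegationTokens; // null when security is off

        public RegistrationSuccessWithTokens(@Nullable byte[] initialDelegationTokens) {
            this.initialDelegationTokens = initialDelegationTokens;
        }

        @Nullable
        public byte[] getInitialDelegationTokens() {
            return initialDelegationTokens;
        }
    }
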
> >>>>>>
> >>>>>> G
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jan 28, 2022 at 2:22 PM David Morávek <d...@apache.org>
> >>> wrote:
> >>>>>>> We had a long discussion with Chesnay about the possible edge
> >>> cases
> >>>> and
> >>>>>> it
> >>>>>>> basically boils down to the following two scenarios:
> >>>>>>>
> >>>>>>> 1) There is a possible race condition between TM registration (the
> >>>>>>> first DT update) and token refresh if they happen simultaneously. Then
> >>>>>>> the registration might beat the refreshed token. This could be easily
> >>>>>>> addressed if DTs could be sorted (e.g. by the expiration time) on the
> >>>>>>> TM side. In other words, if there are multiple updates at the same
> >>>>>>> time we need to make sure that we have a deterministic way of choosing
> >>>>>>> the latest one.
> >>>>>>>
> >>>>>>> One idea by Chesnay that popped up during this discussion was
> >>> whether
> >>>>> we
> >>>>>>> could simply return the initial token with the
> >>> RegistrationResponse
> >>>> to
> >>>>>>> avoid making an extra call during the TM registration.
> >>>>>>>
> >>>>>>> 2) When the RM leadership changes (eg. because zookeeper session
> >>>> times
> >>>>>> out)
> >>>>>>> there might be a race condition where the old RM is shutting down and
> >>>>>>> updates the tokens, so that it might again beat the registration token
> >>>>>>> of the new RM. This could be avoided if we scope the token by
> >>>>>>> _ResourceManagerId_ and only accept updates for the current leader
> >>>>>>> (basically we'd have an extra parameter to the _updateDelegationToken_
> >>>>>>> method).
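
On the TM side that leader check could be a simple guard; a sketch (the method shape is hypothetical, ResourceManagerId is the existing Flink fencing-token class; decoding and applying the tokens is sketched further down the thread):

    import org.apache.flink.runtime.resourcemanager.ResourceManagerId;

    // Sketch: TM drops token updates that don't come from the currently
    // established leader ResourceManager.
    final class TokenUpdateGuard {
        private volatile ResourceManagerId establishedResourceManagerId;

        void updateDelegationTokens(ResourceManagerId senderId, byte[] tokens) {
            if (!senderId.equals(establishedResourceManagerId)) {
                return; // stale update from an old leader; ignore it
            }
            applyTokens(tokens);
        }

        private void applyTokens(byte[] tokens) {
            // hypothetical: decode and add to the process-wide UGI
        }
    }
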
> >>>>>>>
> >>>>>>> -
> >>>>>>>
> >>>>>>> DTM is way simpler than for example slot management, which could
> >>>>>>> receive updates from the JobMaster that RM might not know about.
> >>>>>>>
> >>>>>>> So if you want to go down the path you're describing it should be
> >>>>>>> doable and we'd propose the following to cover all cases:
> >>>>>>>
> >>>>>>> - Make sure DTs issued by single DTMs are monotonically increasing
> >>>> (can
> >>>>>> be
> >>>>>>> sorted on TM side)
> >>>>>>> - Scope DT updates by the RM ID and ensure that TM only accepts
> >>>> update
> >>>>>> from
> >>>>>>> the current leader
> >>>>>>> - Return initial token with the RegistrationResponse, which should
> >>>>>>> make the RPC contract a bit clearer (ensure that we can't register the
> >>>>>>> TM without providing it with a token)
> >>>>>>>
> >>>>>>> Any thoughts?
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jan 28, 2022 at 10:53 AM Gabor Somogyi <
> >>>>>> gabor.g.somo...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for investing your time!
> >>>>>>>>
> >>>>>>>> The first 2 bulletpoints are clear.
> >>>>>>>> If there is a chance that a TM can go into an inconsistent state then
> >>>>>>>> I agree with the 3rd bulletpoint.
> >>>>>>>> Just before we agree on that I would like to learn something new and
> >>>>>>>> understand how it is possible that a TM gets corrupted. (In Spark
> >>>>>>>> I've never seen such a thing and there is no mechanism to fix this,
> >>>>>>>> but Flink is definitely not Spark.)
> >>>>>>>>
> >>>>>>>> Here is my understanding:
> >>>>>>>> * DTM pushes newly obtained DTs to TMs and if any exception occurs
> >>>>>>>> then a retry after "security.kerberos.tokens.retry-wait" happens.
> >>>>>>>> This means DTM keeps retrying until it manages to send the new DTs to
> >>>>>>>> all registered TMs.
> >>>>>>>> * New TM registration must fail if "updateDelegationToken" fails
> >>>>>>>> * "updateDelegationToken" fails consistently like a DB (at least I
> >>>>>>>> plan to implement it that way).
> >>>>>>>> If DTs are arriving on the TM side then a single
> >>>>>>>> "UserGroupInformation.getCurrentUser().addCredentials(...)"
> >>>>>>>> will be called, which I've never seen fail.
> >>>>>>>> * I hope all other code parts are not touching existing DTs
> >>> within
> >>>>> the
> >>>>>>> JVM
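
For reference, applying received tokens on the TM side boils down to a few Hadoop calls; a minimal sketch, assuming the tokens arrive as a serialized Hadoop Credentials object:

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;
    import org.apache.hadoop.security.Credentials;
    import org.apache.hadoop.security.UserGroupInformation;

    // Sketch: decode the serialized tokens and add them to the process-wide UGI.
    final class TokenApplier {
        static void applyTokens(byte[] serializedTokens) throws IOException {
            Credentials credentials = new Credentials();
            credentials.readTokenStorageStream(
                    new DataInputStream(new ByteArrayInputStream(serializedTokens)));
            UserGroupInformation.getCurrentUser().addCredentials(credentials);
        }
    }
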
> >>>>>>>> I would like to emphasize I'm not against adding it, I just want to
> >>>>>>>> see what kind of problems we are facing.
> >>>>>>>> It would make it easier to catch bugs earlier and help with
> >>>>>>>> maintenance.
> >>>>>>>>
> >>>>>>>> All in all I would buy the idea of adding the 3rd bullet if we
> >>>>>>>> foresee the need.
> >>>>>>>>
> >>>>>>>> G
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jan 28, 2022 at 10:07 AM David Morávek <d...@apache.org
> >>>>>> wrote:
> >>>>>>>>> Hi Gabor,
> >>>>>>>>>
> >>>>>>>>> This is definitely headed in the right direction +1.
> >>>>>>>>>
> >>>>>>>>> I think we still need to have a safeguard in case some of the TMs
> >>>>>>>>> get into an inconsistent state though, which will also eliminate the
> >>>>>>>>> need for implementing a custom retry mechanism (when the
> >>>>>>>>> _updateDelegationToken_ call fails for some reason).
> >>>>>>>>>
> >>>>>>>>> We already have this safeguard in place for the slot pool (in case
> >>>>>>>>> there are some slots in an inconsistent state - e.g. we haven't
> >>>>>>>>> freed them for some reason) and for the partition tracker, which
> >>>>>>>>> could be simply enhanced. This is done via a periodic heartbeat from
> >>>>>>>>> TaskManagers to the ResourceManager that contains a report about the
> >>>>>>>>> state of these two components (from the TM perspective) so the RM
> >>>>>>>>> can reconcile their state if necessary.
> >>>>>>>>>
> >>>>>>>>> I don't think adding an additional field to
> >>>>>>>>> _TaskExecutorHeartbeatPayload_ should be a concern as we only
> >>>>>>>>> heartbeat every ~10s by default and the new field would be small
> >>>>>>>>> compared to the rest of the existing payload. Also the heartbeat
> >>>>>>>>> doesn't need to contain the whole DT, but just some identifier which
> >>>>>>>>> signals whether it uses the right one, which could be significantly
> >>>>>>>>> smaller.
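
A sketch of such a fingerprint (hypothetical field and class; the real heartbeat payload carries more state):

    import java.io.Serializable;
    import java.util.Arrays;

    // Sketch: the heartbeat only carries a cheap fingerprint of the tokens in
    // use, so the RM can detect drift and re-push via updateDelegationToken.
    final class HeartbeatTokenReport implements Serializable {
        private final int tokenFingerprint; // e.g. hash of the serialized tokens

        HeartbeatTokenReport(byte[] serializedTokens) {
            this.tokenFingerprint = Arrays.hashCode(serializedTokens);
        }

        int getTokenFingerprint() {
            return tokenFingerprint;
        }
    }
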
> >>>>>>>>>
> >>>>>>>>> This is still a PUSH based approach as the RM would again call the
> >>>>>>>>> newly introduced _updateDelegationToken_ when it encounters an
> >>>>>>>>> inconsistency (e.g. due to a temporary network partition / a race
> >>>>>>>>> condition we didn't test for / some other scenario we didn't think
> >>>>>>>>> about). In practice these inconsistencies are super hard to avoid
> >>>>>>>>> and reason about (and unfortunately yes, we see them happen from
> >>>>>>>>> time to time), so reusing the existing mechanism that is designed
> >>>>>>>>> for this exact problem simplifies things.
> >>>>>>>>> To sum this up we'd have three code paths for calling
> >>>>>>>>> _updateDelegationToken_:
> >>>>>>>>> 1) When the TM registers, we push the token (if DTM already
> >>> has
> >>>> it)
> >>>>>> to
> >>>>>>> it
> >>>>>>>>> 2) When DTM obtains a new token it broadcasts it to all
> >>> currently
> >>>>>>>> connected
> >>>>>>>>> TMs
> >>>>>>>>> 3) When a TM gets out of sync, DTM would reconcile its state
> >>>>>>>>>
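
Code path 2 (the broadcast) could look roughly like the sketch below; names are illustrative, and a failure of any single update would trigger the retry-wait rescheduling described earlier in the thread:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    // Sketch: push freshly obtained tokens to every registered TM; if any
    // update fails, the whole obtain+broadcast round is retried later.
    final class TokenBroadcaster {
        interface TokenReceiver { // stand-in for the TM gateway
            CompletableFuture<Void> updateDelegationTokens(byte[] tokens);
        }

        static CompletableFuture<Void> broadcast(
                Collection<TokenReceiver> registeredTaskManagers, byte[] tokens) {
            List<CompletableFuture<Void>> updates = new ArrayList<>();
            for (TokenReceiver tm : registeredTaskManagers) {
                updates.add(tm.updateDelegationTokens(tokens));
            }
            // Completes exceptionally if any single update fails.
            return CompletableFuture.allOf(updates.toArray(new CompletableFuture[0]));
        }
    }
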
> >>>>>>>>> WDYT?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> D.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Jan 26, 2022 at 9:03 PM David Morávek <
> >>> d...@apache.org>
> >>>>>> wrote:
> >>>>>>>>>> Thanks for the update, I'll go over it tomorrow.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jan 26, 2022 at 5:33 PM Gabor Somogyi <
> >>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi All,
> >>>>>>>>>>>
> >>>>>>>>>>> Since it has turned out that DTM can't be added as a member of
> >>>>>>>>>>> JobMaster
> >>>>>>>>>>> <
> https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176
> >>>>>>>>>>> I've come up with a better proposal.
> >>>>>>>>>>> David, thanks for pointing this out, you've caught a bug in the
> >>>>>>>>>>> early phase!
> >>>>>>>>>>>
> >>>>>>>>>>> Namely ResourceManager
> >>>>>>>>>>> <
> >>>>>>>>>>>
> >>>
> https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124
> >>>>>>>>>>> is a single instance class where DTM can be added as a member
> >>>>>>>>>>> variable. It has a list of all already registered TMs, and new TM
> >>>>>>>>>>> registration is also happening here.
> >>>>>>>>>>> To be more specific, the following can be added from a logic
> >>>>>>>>>>> perspective:
> >>>>>>>>>>> * Create new DTM instance in ResourceManager
> >>>>>>>>>>> <
> >>>>>>>>>>>
> >>>
> https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L124
> >>>>>>>>>>> and
> >>>>>>>>>>> start it (re-occurring thread to obtain new tokens)
> >>>>>>>>>>> * Add a new function named "updateDelegationTokens" to
> >>>>>>>>> TaskExecutorGateway
> >>>>>>>>>>> <
> >>>>>>>>>>>
> >>>
> https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutorGateway.java#L54
> >>>>>>>>>>> * Call "updateDelegationTokens" on all registered TMs to propagate
> >>>>>>>>>>> new DTs
> >>>>>>>>>>> * In case of new TM registration, call "updateDelegationTokens"
> >>>>>>>>>>> before registration succeeds to set up the new TM properly (a
> >>>>>>>>>>> sketch of the gateway method follows below)
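
A rough sketch of the new gateway method referenced in the bullets above (the signature is illustrative, not a final API; real Flink gateway methods would also carry an RPC timeout):

    import java.util.concurrent.CompletableFuture;

    // Sketch: hypothetical addition to TaskExecutorGateway. Returning a future
    // lets the caller (the RM-side DTM) know whether the update was applied.
    interface TokenUpdatingGateway {
        CompletableFuture<Void> updateDelegationTokens(byte[] serializedTokens);
    }
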
> >>>>>>>>>>>
> >>>>>>>>>>> This way:
> >>>>>>>>>>> * only a single DTM would live within a cluster, which is the
> >>>>>>>>>>> expected behavior
> >>>>>>>>>>> * DTM is going to be added to a central place where all deployment
> >>>>>>>>>>> targets can make use of it
> >>>>>>>>>>> * DTs are going to be pushed to TMs, which would generate less
> >>>>>>>>>>> network traffic than a pull-based approach
> >>>>>>>>>>> (please see my previous mail where I've described both approaches)
> >>>>>>>>>>> * HA scenario is going to be consistent because such
> >>>>>>>>>>> <
> >>>>>>>>>>>
> >>>
> https://github.com/apache/flink/blob/674bc96662285b25e395fd3dddf9291a602fc183/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java#L1069
> >>>>>>>>>>> a solution can be added to "updateDelegationTokens"
> >>>>>>>>>>>
> >>>>>>>>>>> @David or all others, please share whether you agree on this or
> >>>>>>>>>>> have a better idea/suggestion.
> >>>>>>>>>>>
> >>>>>>>>>>> BR,
> >>>>>>>>>>> G
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Jan 25, 2022 at 11:00 AM Gabor Somogyi <
> >>>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> First of all thanks for investing your time and helping
> >>> me
> >>>>> out.
> >>>>>>> As I
> >>>>>>>>> see
> >>>>>>>>>>>> you have pretty solid knowledge in the RPC area.
> >>>>>>>>>>>> I would like to rely on your knowledge since I'm learning
> >>>> this
> >>>>>>> part.
> >>>>>>>>>>>>> - Do we need to introduce a new RPC method or can we
> >>> for
> >>>>>> example
> >>>>>>>>>>>> piggyback
> >>>>>>>>>>>> on heartbeats?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm fine with either solution but one thing is important
> >>>>>>>>>>>> conceptually. There are fundamentally 2 ways tokens can be
> >>>>>>>>>>>> updated:
> >>>>>>>>>>>> - Push way: When there are new DTs then the JM JVM pushes DTs to
> >>>>>>>>>>>> the TM JVMs. This is the preferred one since only a tiny amount
> >>>>>>>>>>>> of control logic is needed.
> >>>>>>>>>>>> - Pull way: Each TM would poll the JM from time to time whether
> >>>>>>>>>>>> there are new tokens, and each TM decides alone whether DTs need
> >>>>>>>>>>>> to be updated or not.
> >>>>>>>>>>>> As you've mentioned, here some ID needs to be generated, and it
> >>>>>>>>>>>> would generate quite a lot of additional network traffic which
> >>>>>>>>>>>> can definitely be avoided.
> >>>>>>>>>>>> As a final thought in Spark we've had this way of DT
> >>>>> propagation
> >>>>>>>> logic
> >>>>>>>>>>> and
> >>>>>>>>>>>> we've had major issues with it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So all in all DTM needs to obtain new tokens and there must be a
> >>>>>>>>>>>> way to send this data to all TMs from the JM.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> - What delivery semantics are we looking for? (what if
> >>>> we're
> >>>>>>> only
> >>>>>>>>>>> able to
> >>>>>>>>>>>> update subset of TMs / what happens if we exhaust
> >>> retries /
> >>>>>> should
> >>>>>>>> we
> >>>>>>>>>>> even
> >>>>>>>>>>>> have the retry mechanism whatsoever) - I have a feeling
> >>> that
> >>>>>>> somehow
> >>>>>>>>>>>> leveraging the existing heartbeat mechanism could help to
> >>>>> answer
> >>>>>>>> these
> >>>>>>>>>>>> questions
> >>>>>>>>>>>>
> >>>>>>>>>>>> Let's go through these questions one by one.
> >>>>>>>>>>>>> What delivery semantics are we looking for?
> >>>>>>>>>>>> DTM must receive an exception when at least one TM was
> >>> not
> >>>>> able
> >>>>>> to
> >>>>>>>> get
> >>>>>>>>>>> DTs.
> >>>>>>>>>>>>> what if we're only able to update subset of TMs?
> >>>>>>>>>>>> In such a case DTM will reschedule the token obtainment after
> >>>>>>>>>>>> "security.kerberos.tokens.retry-wait" time.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> what happens if we exhaust retries?
> >>>>>>>>>>>> There is no retry limit. In the default configuration tokens
> >>>>>>>>>>>> need to be re-obtained after one day.
> >>>>>>>>>>>> DTM tries to obtain new tokens after 1 day * 0.75
> >>>>>>>>>>>> (security.kerberos.tokens.renewal-ratio) = 18 hours.
> >>>>>>>>>>>> When it fails, it retries after
> >>>>>>>>>>>> "security.kerberos.tokens.retry-wait", which is 1 hour by
> >>>>>>>>>>>> default.
> >>>>>>>>>>>> If it never succeeds then an authentication error is going to
> >>>>>>>>>>>> happen on the TM side and the workload is going to stop.
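
Putting the numbers together with the (proposed) option names quoted above, the schedule would be driven by configuration along these lines (illustrative flink-conf.yaml entries):

    security.kerberos.tokens.renewal-ratio: 0.75   # obtain new tokens at 75% of the 1-day lifetime (18 h)
    security.kerberos.tokens.retry-wait: 1 h       # on failure, retry hourly until the tokens expire
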
> >>>>>>>>>>>>
> >>>>>>>>>>>>> should we even have the retry mechanism whatsoever?
> >>>>>>>>>>>> Yes, because there are always temporary cluster issues.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> What does it mean for the running application (how does
> >>>> this
> >>>>>>> look
> >>>>>>>>> like
> >>>>>>>>>>>> from
> >>>>>>>>>>>> the user perspective)? As far as I remember the logs are
> >>>> only
> >>>>>>>>> collected
> >>>>>>>>>>>> ("aggregated") after the container is stopped, is that
> >>>>> correct?
> >>>>>>>>>>>> With the default config it works like that, but it can be forced
> >>>>>>>>>>>> to aggregate at specific intervals.
> >>>>>>>>>>>> A useful feature is forcing YARN to aggregate logs while the job
> >>>>>>>>>>>> is still running.
> >>>>>>>>>>>> For long-running jobs such as streaming jobs, this is invaluable.
> >>>>>>>>>>>> To do this,
> >>>>>> yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
> >>>>>>>>>>>> must be set to a positive value.
> >>>>>>>>>>>> When this is set, a timer will be set for the given duration, and
> >>>>>>>>>>>> whenever that timer goes off, log aggregation will run on new
> >>>>>>>>>>>> files.
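
For reference, in yarn-site.xml that could look like the following (the interval value is an example; YARN enforces a minimum for this property):

    <property>
      <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
      <!-- upload logs of still-running containers once per hour -->
      <value>3600</value>
    </property>
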
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I think
> >>>>>>>>>>>> this topic should get its own section in the FLIP (having
> >>>> some
> >>>>>>> cross
> >>>>>>>>>>>> reference to YARN ticket would be really useful, but I'm
> >>> not
> >>>>>> sure
> >>>>>>> if
> >>>>>>>>>>> there
> >>>>>>>>>>>> are any).
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think this is important knowledge but this FLIP is not
> >>>>>> touching
> >>>>>>>> the
> >>>>>>>>>>>> already existing behavior.
> >>>>>>>>>>>> DTs are set on the AM container and are renewed by YARN until
> >>>>>>>>>>>> it's not possible anymore.
> >>>>>>>>>>>> No new code is going to change this limitation. BTW, there is no
> >>>>>>>>>>>> jira for this.
> >>>>>>>>>>>> If you think it's worth writing this down then I think the right
> >>>>>>>>>>>> place is the official security doc area, as a caveat.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> If we split the FLIP into two parts / sections that
> >>> I've
> >>>>>>>> suggested,
> >>>>>>>>> I
> >>>>>>>>>>>> don't
> >>>>>>>>>>>> really think that you need to explicitly test for each
> >>>>>> deployment
> >>>>>>>>>>> scenario
> >>>>>>>>>>>> / cluster framework, because the DTM part is completely
> >>>>>>> independent
> >>>>>>>> of
> >>>>>>>>>>> the
> >>>>>>>>>>>> deployment target. Basically this is what I'm aiming for
> >>>> with
> >>>>>>>> "making
> >>>>>>>>> it
> >>>>>>>>>>>> work with the standalone" (as simple as starting a new
> >>> java
> >>>>>>> process)
> >>>>>>>>>>> Flink
> >>>>>>>>>>>> first (which is also how most people deploy streaming
> >>>>>> application
> >>>>>>> on
> >>>>>>>>> k8s
> >>>>>>>>>>>> and the direction we're pushing forward with the
> >>>> auto-scaling
> >>>>> /
> >>>>>>>>> reactive
> >>>>>>>>>>>> mode initiatives).
> >>>>>>>>>>>>
> >>>>>>>>>>>> I see your point and agree with the main direction. k8s is the
> >>>>>>>>>>>> megatrend which most people will use sooner or later. Not 100%
> >>>>>>>>>>>> sure what kind of split you suggest, but in my view the main
> >>>>>>>>>>>> target is to add this feature and I'm open to any logical work
> >>>>>>>>>>>> ordering.
> >>>>>>>>>>>> Please share the specific details and we'll work it out...
> >>>>>>>>>>>>
> >>>>>>>>>>>> G
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Jan 24, 2022 at 3:04 PM David Morávek <
> >>>>> d...@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>> Could you point to the code where you think it could be added
> >>>>>>>>>>>>>> exactly? A helping hand is welcome here 🙂
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> I think you can take a look at
> >>>>>> _ResourceManagerPartitionTracker_
> >>>>>>>> [1]
> >>>>>>>>>>> which
> >>>>>>>>>>>>> seems to have somewhat similar properties to the DTM.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> One topic that needs to be addressed there is what the RPC with
> >>>>>>>>>>>>> the _TaskExecutorGateway_ should look like.
> >>>>>>>>>>>>> - Do we need to introduce a new RPC method or can we for
> >>>>>> example
> >>>>>>>>>>> piggyback
> >>>>>>>>>>>>> on heartbeats?
> >>>>>>>>>>>>> - What delivery semantics are we looking for? (what if
> >>>> we're
> >>>>>> only
> >>>>>>>>> able
> >>>>>>>>>>> to
> >>>>>>>>>>>>> update subset of TMs / what happens if we exhaust
> >>> retries /
> >>>>>>> should
> >>>>>>>> we
> >>>>>>>>>>> even
> >>>>>>>>>>>>> have the retry mechanism whatsoever) - I have a feeling
> >>>> that
> >>>>>>>> somehow
> >>>>>>>>>>>>> leveraging the existing heartbeat mechanism could help
> >>> to
> >>>>>> answer
> >>>>>>>>> these
> >>>>>>>>>>>>> questions
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In short, after the DT reaches its max lifetime, log aggregation
> >>>>>>>>>>>>> stops
> >>>>>>>>>>>>> What does it mean for the running application (how does
> >>>> this
> >>>>>> look
> >>>>>>>>> like
> >>>>>>>>>>>>> from
> >>>>>>>>>>>>> the user perspective)? As far as I remember the logs are
> >>>> only
> >>>>>>>>> collected
> >>>>>>>>>>>>> ("aggregated") after the container is stopped, is that
> >>>>>> correct? I
> >>>>>>>>> think
> >>>>>>>>>>>>> this topic should get its own section in the FLIP
> >>> (having
> >>>>> some
> >>>>>>>> cross
> >>>>>>>>>>>>> reference to YARN ticket would be really useful, but I'm
> >>>> not
> >>>>>> sure
> >>>>>>>> if
> >>>>>>>>>>> there
> >>>>>>>>>>>>> are any).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> All deployment modes (per-job, per-app, ...) are
> >>> planned to
> >>>>> be
> >>>>>>>> tested
> >>>>>>>>>>> and
> >>>>>>>>>>>>>> expect to work with the initial implementation however
> >>>> not
> >>>>>> all
> >>>>>>>>>>>>> deployment
> >>>>>>>>>>>>>> targets (k8s, local, ...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> If we split the FLIP into two parts / sections that I've
> >>>>>>>> suggested, I
> >>>>>>>>>>>>> don't
> >>>>>>>>>>>>> really think that you need to explicitly test for each
> >>>>>> deployment
> >>>>>>>>>>> scenario
> >>>>>>>>>>>>> / cluster framework, because the DTM part is completely
> >>>>>>> independent
> >>>>>>>>> of
> >>>>>>>>>>> the
> >>>>>>>>>>>>> deployment target. Basically this is what I'm aiming for
> >>>> with
> >>>>>>>> "making
> >>>>>>>>>>> it
> >>>>>>>>>>>>> work with the standalone" (as simple as starting a new
> >>> java
> >>>>>>>> process)
> >>>>>>>>>>> Flink
> >>>>>>>>>>>>> first (which is also how most people deploy streaming
> >>>>>> application
> >>>>>>>> on
> >>>>>>>>>>> k8s
> >>>>>>>>>>>>> and the direction we're pushing forward with the
> >>>>> auto-scaling /
> >>>>>>>>>>> reactive
> >>>>>>>>>>>>> mode initiatives).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The whole integration with YARN (let's forget about log
> >>>>>>>>>>>>> aggregation for a moment) / k8s-native only boils down to how we
> >>>>>>>>>>>>> make the keytab file local to the JobManager so the DTM can read
> >>>>>>>>>>>>> it, so it's basically built on top of that. The only special
> >>>>>>>>>>>>> thing that needs to be tested there is the "keytab distribution"
> >>>>>>>>>>>>> code path.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>
> https://github.com/apache/flink/blob/release-1.14.3/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResourceManagerPartitionTracker.java
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> D.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jan 24, 2022 at 12:35 PM Gabor Somogyi <
> >>>>>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There is a separate JobMaster for each job
> >>>>>>>>>>>>>> within a Flink cluster and each JobMaster only has a
> >>>>> partial
> >>>>>>> view
> >>>>>>>>> of
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> task managers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Good point! I've had a deeper look and you're right.
> >>> We
> >>>>>>>> definitely
> >>>>>>>>>>> need
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>> find another place.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Related per-cluster or per-job keytab:
> >>>>>>>>>>>>>> In the current code a per-cluster keytab is implemented and I
> >>>>>>>>>>>>>> intend to keep it like this within this FLIP. The reason is
> >>>>>>>>>>>>>> simple: tokens on the TM side can be stored within the
> >>>>>>>>>>>>>> UserGroupInformation (UGI) structure which is global. I'm not
> >>>>>>>>>>>>>> saying it's impossible to change that, but I think this is a
> >>>>>>>>>>>>>> complexity which the initial implementation is not required to
> >>>>>>>>>>>>>> contain. Additionally we've not seen such a need from the user
> >>>>>>>>>>>>>> side. If the need arises later on then another FLIP with this
> >>>>>>>>>>>>>> topic can be created and discussed. Proper multi-UGI handling
> >>>>>>>>>>>>>> within a single JVM is a topic where several rounds of
> >>>>>>>>>>>>>> deep-dives with the Hadoop/YARN guys are required.
> >>>>>>>>>>>>>>> single DTM instance embedded with
> >>>>>>>>>>>>>> the ResourceManager (the Flink component)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Could you point to the code where you think it could be added
> >>>>>>>>>>>>>> exactly? A helping hand is welcome here 🙂
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Then the single (initial) implementation should work
> >>>> with
> >>>>>> all
> >>>>>>>> the
> >>>>>>>>>>>>>> deployments modes out of the box (which is not what
> >>> the
> >>>>> FLIP
> >>>>>>>>>>> suggests).
> >>>>>>>>>>>>> Is
> >>>>>>>>>>>>>> that correct?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> All deployment modes (per-job, per-app, ...) are planned to be
> >>>>>>>>>>>>>> tested and expected to work with the initial implementation;
> >>>>>>>>>>>>>> however, not all deployment targets (k8s, local, ...) are
> >>>>>>>>>>>>>> intended to be tested. Per deployment target a new jira needs
> >>>>>>>>>>>>>> to be created, where I expect a small amount of code needs to
> >>>>>>>>>>>>>> be added but a relatively expensive testing effort is required.
> >>>>>>>>>>>>>>> I've taken a look into the prototype and in the
> >>>>>>>>>>>>> "YarnClusterDescriptor"
> >>>>>>>>>>>>>> you're injecting a delegation token into the AM [1]
> >>>> (that's
> >>>>>>>>> obtained
> >>>>>>>>>>>>> using
> >>>>>>>>>>>>>> the provided keytab). If I understand this correctly
> >>> from
> >>>>>>>> previous
> >>>>>>>>>>>>>> discussion / FLIP, this is to support log aggregation
> >>> and
> >>>>> DT
> >>>>>>> has
> >>>>>>>> a
> >>>>>>>>>>>>> limited
> >>>>>>>>>>>>>> validity. How is this DT going to be renewed?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> You're clever and touched a limitation which Spark has too. In
> >>>>>>>>>>>>>> short, after the DT reaches its max lifetime, log aggregation
> >>>>>>>>>>>>>> stops. I've had several deep-dive rounds with the YARN guys
> >>>>>>>>>>>>>> during my Spark years because I wanted to fill this gap. They
> >>>>>>>>>>>>>> couldn't provide us any way to re-inject the newly obtained DT,
> >>>>>>>>>>>>>> so in the end I gave up on this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> BR,
> >>>>>>>>>>>>>> G
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, 24 Jan 2022, 11:00 David Morávek, <
> >>>> d...@apache.org
> >>>>>>>> wrote:
> >>>>>>>>>>>>>>> Hi Gabor,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There is actually a huge difference between
> >>> JobManager
> >>>>>>>> (process)
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>> JobMaster (job coordinator). The naming is unfortunately a bit
> >>>>>>>>>>>>>>> misleading here for historical reasons. There is a separate
> >>>>> JobMaster
> >>>>>>> for
> >>>>>>>>>>> each
> >>>>>>>>>>>>> job
> >>>>>>>>>>>>>>> within a Flink cluster and each JobMaster only has a
> >>>>>> partial
> >>>>>>>> view
> >>>>>>>>>>> of
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> task managers (depends on where the slots for a
> >>>>> particular
> >>>>>>> job
> >>>>>>>>> are
> >>>>>>>>>>>>>>> allocated). This means that you'll end up with N
> >>>>>>>>>>>>>> "DelegationTokenManagers"
> >>>>>>>>>>>>>>> competing with each other (N = number of running
> >>> jobs
> >>>> in
> >>>>>> the
> >>>>>>>>>>> cluster).
> >>>>>>>>>>>>>>> This makes me think we're mixing two abstraction
> >>> levels
> >>>>>> here:
> >>>>>>>>>>>>>>> a) Per-cluster delegation tokens
> >>>>>>>>>>>>>>> - Simpler approach, it would involve a single DTM
> >>>>> instance
> >>>>>>>>> embedded
> >>>>>>>>>>>>> with
> >>>>>>>>>>>>>>> the ResourceManager (the Flink component)
> >>>>>>>>>>>>>>> b) Per-job delegation tokens
> >>>>>>>>>>>>>>> - More complex approach, but could be more flexible
> >>>> from
> >>>>>> the
> >>>>>>>> user
> >>>>>>>>>>>>> side of
> >>>>>>>>>>>>>>> things.
> >>>>>>>>>>>>>>> - Multiple DTM instances, that are bound with the
> >>>>> JobMaster
> >>>>>>>>>>> lifecycle.
> >>>>>>>>>>>>>>> Delegation tokens are attached with a particular
> >>> slots
> >>>>> that
> >>>>>>> are
> >>>>>>>>>>>>> executing
> >>>>>>>>>>>>>>> the job tasks instead of the whole task manager (TM
> >>>> could
> >>>>>> be
> >>>>>>>>>>> executing
> >>>>>>>>>>>>>>> multiple jobs with different tokens).
> >>>>>>>>>>>>>>> - The question is which keytab should be used for the cluster
> >>>>>>>>>>>>>>> framework, to support log aggregation on YARN (an extra
> >>>>>>>>>>>>>>> keytab, or the keytab that comes with the first job?)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think these are the things that need to be
> >>> clarified
> >>>> in
> >>>>>> the
> >>>>>>>>> FLIP
> >>>>>>>>>>>>> before
> >>>>>>>>>>>>>>> proceeding.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> A follow-up question for getting a better understanding of
> >>>>>>>>>>>>>>> where this should be headed: Are there any use cases where a
> >>>>>>>>>>>>>>> user may want to use different keytabs with each job, or are
> >>>>>>>>>>>>>>> we fine with using a cluster-wide keytab?
> >>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>> we go with per-cluster keytabs, is it OK that all
> >>> jobs
> >>>>>>>> submitted
> >>>>>>>>>>> into
> >>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>> cluster can access it (even the future ones)? Should
> >>>> this
> >>>>>> be
> >>>>>>> a
> >>>>>>>>>>>>> security
> >>>>>>>>>>>>>>> concern?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I presume you thought I would implement a new class named
> >>>>>>>>>>>>>>>> JobManager. The plan is not that.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I've never suggested such a thing.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> No. That said earlier DT handling is planned to be
> >>>> done
> >>>>>>>>>>> completely
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>> Flink. DTM has a renewal thread which re-obtains
> >>>> tokens
> >>>>>> in
> >>>>>>>> the
> >>>>>>>>>>>>> proper
> >>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>> when needed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Then the single (initial) implementation should work
> >>>> with
> >>>>>> all
> >>>>>>>> the
> >>>>>>>>>>>>>>> deployments modes out of the box (which is not what
> >>> the
> >>>>>> FLIP
> >>>>>>>>>>>>> suggests).
> >>>>>>>>>>>>>> Is
> >>>>>>>>>>>>>>> that correct?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If the cluster framework also requires a delegation token for
> >>>>>>>>>>>>>>> its inner workings (IMO this only applies to YARN), it might
> >>>>>>>>>>>>>>> need an extra step (injecting the token into the application
> >>>>>>>>>>>>>>> master container).
> >>>>>>>>>>>>>>> Separating the individual layers (actual Flink
> >>> cluster
> >>>> -
> >>>>>>>>> basically
> >>>>>>>>>>>>> making
> >>>>>>>>>>>>>>> this work with a standalone deployment  / "cluster
> >>>>>>> framework" -
> >>>>>>>>>>>>> support
> >>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>> YARN log aggregation) in the FLIP would be useful.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Reading the linked Spark readme could be useful.
> >>>>>>>>>>>>>>> I've read that, but please be patient with the questions;
> >>>>>>>>>>>>>>> Kerberos is not an easy topic to get into and I've had very
> >>>>>>>>>>>>>>> little contact with it in the past.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>
> https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176
> >>>>>>>>>>>>>>> I've taken a look into the prototype and in the
> >>>>>>>>>>>>> "YarnClusterDescriptor"
> >>>>>>>>>>>>>>> you're injecting a delegation token into the AM [1]
> >>>>> (that's
> >>>>>>>>>>> obtained
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>>> the provided keytab). If I understand this correctly
> >>>> from
> >>>>>>>>> previous
> >>>>>>>>>>>>>>> discussion / FLIP, this is to support log
> >>> aggregation
> >>>> and
> >>>>>> DT
> >>>>>>>> has
> >>>>>>>>> a
> >>>>>>>>>>>>>> limited
> >>>>>>>>>>>>>>> validity. How is this DT going to be renewed?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>
> https://github.com/gaborgsomogyi/flink/commit/8ab75e46013f159778ccfce52463e7bc63e395a9#diff-02416e2d6ca99e1456f9c3949f3d7c2ac523d3fe25378620c09632e4aac34e4eR1261
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> D.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Jan 21, 2022 at 9:35 PM Gabor Somogyi <
> >>>>>>>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Here is the exact class; I'm on mobile so I haven't had a
> >>>>>>>>>>>>>>>> look at the exact class name:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>
> https://github.com/gaborgsomogyi/flink/blob/8ab75e46013f159778ccfce52463e7bc63e395a9/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L176
> >>>>>>>>>>>>>>>> That keeps track of the TMs where the tokens can be sent.
> >>>>>>>>>>>>>>>>> My feeling would be that we shouldn't really
> >>>>> introduce
> >>>>>> a
> >>>>>>>> new
> >>>>>>>>>>>>>> component
> >>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>> a custom lifecycle, but rather we should try to
> >>>>>> incorporate
> >>>>>>>>> this
> >>>>>>>>>>>>> into
> >>>>>>>>>>>>>>>> existing ones.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you be more specific? I presume you thought I would
> >>>>>>>>>>>>>>>> implement a new class named JobManager. The plan is not that.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If I understand this correctly, this means that
> >>> we
> >>>>> then
> >>>>>>>> push
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> token
> >>>>>>>>>>>>>>>> renewal logic to YARN.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> No. As said earlier, DT handling is planned to be done
> >>>>>>>>>>>>>>>> completely in Flink. DTM has a renewal thread which
> >>>>>>>>>>>>>>>> re-obtains tokens at the proper time
> >>>>>>>>>>>>>>>> when needed. YARN log aggregation is a totally
> >>>>> different
> >>>>>>>>> feature,
> >>>>>>>>>>>>> where
> >>>>>>>>>>>>>>>> YARN does the renewal. Log aggregation was an
> >>> example
> >>>>> why
> >>>>>>> the
> >>>>>>>>>>> code
> >>>>>>>>>>>>>> can't
> >>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>> 100% reusable for all resource managers. Reading
> >>> the
> >>>>>> linked
> >>>>>>>>> Spark
> >>>>>>>>>>>>>> readme
> >>>>>>>>>>>>>>>> could be useful.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> G
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, 21 Jan 2022, 21:05 David Morávek, <
> >>>>>> d...@apache.org
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>> JobManager is the Flink class.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> There is no such class in Flink. The closest
> >>> thing
> >>>> to
> >>>>>> the
> >>>>>>>>>>>>> JobManager
> >>>>>>>>>>>>>>> is a
> >>>>>>>>>>>>>>>>> ClusterEntrypoint. The cluster entrypoint spawns
> >>>> new
> >>>>> RM
> >>>>>>>>> Runner
> >>>>>>>>>>> &
> >>>>>>>>>>>>>>>> Dispatcher
> >>>>>>>>>>>>>>>>> Runner that start participating in the leader
> >>>>> election.
> >>>>>>>> Once
> >>>>>>>>>>> they
> >>>>>>>>>>>>>> gain
> >>>>>>>>>>>>>>>>> leadership they spawn the actual underlying
> >>>> instances
> >>>>>> of
> >>>>>>>>> these
> >>>>>>>>>>> two
> >>>>>>>>>>>>>>> "main
> >>>>>>>>>>>>>>>>> components".
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> My feeling would be that we shouldn't really
> >>>>> introduce
> >>>>>> a
> >>>>>>>> new
> >>>>>>>>>>>>>> component
> >>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>> a custom lifecycle, but rather we should try to
> >>>>>>> incorporate
> >>>>>>>>>>> this
> >>>>>>>>>>>>> into
> >>>>>>>>>>>>>>>>> existing ones.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> My biggest concerns would be:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - How would the lifecycle of the new component
> >>> look
> >>>>>> like
> >>>>>>>> with
> >>>>>>>>>>>>> regards
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> HA
> >>>>>>>>>>>>>>>>> setups. If we really try to decide to introduce
> >>> a
> >>>>>>>> completely
> >>>>>>>>>>> new
> >>>>>>>>>>>>>>>> component,
> >>>>>>>>>>>>>>>>> how should this work in case of multiple
> >>> JobManager
> >>>>>>>>> instances?
> >>>>>>>>>>>>>>>>> - Which components does it talk to / how? For
> >>>> example
> >>>>>> how
> >>>>>>>>> does
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> broadcast of new token to task managers
> >>>>>>>> (TaskManagerGateway)
> >>>>>>>>>>> look
> >>>>>>>>>>>>>> like?
> >>>>>>>>>>>>>>>> Do
> >>>>>>>>>>>>>>>>> we simply introduce a new RPC on the
> >>>>>>> ResourceManagerGateway
> >>>>>>>>>>> that
> >>>>>>>>>>>>>>>> broadcasts
> >>>>>>>>>>>>>>>>> it or does the new component need to do some
> >>> kind
> >>>> of
> >>>>>>>>>>> bookkeeping
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> task
> >>>>>>>>>>>>>>>>> managers that it needs to notify?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> YARN based HDFS log aggregation would not work by dropping
> >>>>>>>>>>>>>>>>>> that code. Just to be crystal clear, the actual
> >>>>>>>>>>>>>>>>>> implementation contains this for exactly this reason.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This is the missing part +1. If I understand
> >>> this
> >>>>>>>> correctly,
> >>>>>>>>>>> this
> >>>>>>>>>>>>>> means
> >>>>>>>>>>>>>>>>> that we then push the token renewal logic to
> >>> YARN.
> >>>>> How
> >>>>>> do
> >>>>>>>> you
> >>>>>>>>>>>>> plan to
> >>>>>>>>>>>>>>>>> implement the renewal logic on k8s?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> D.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Jan 21, 2022 at 8:37 PM Gabor Somogyi <
> >>>>>>>>>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I think we might both mean something
> >>> different
> >>>> by
> >>>>>> the
> >>>>>>>> RM.
> >>>>>>>>>>>>>>>>>> You sensed it well; I've not specified these terms well in
> >>>>>>>>>>>>>>>>>> the explanation.
> >>>>>>>>>>>>>>>> explanation.
> >>>>>>>>>>>>>>>>>> By RM I meant the resource management framework; JobManager
> >>>>>>>>>>>>>>>>>> is the Flink class. This means that inside the JM instance
> >>>>>>>>>>>>>>>>>> there will be a DTM instance, so they would have the same
> >>>>>>>>>>>>>>>>>> lifecycle. Hope I've answered the question.
> >>>>>>>>>>>>>>>>>>> If we have tokens available on the client
> >>> side,
> >>>>> why
> >>>>>>> do
> >>>>>>>> we
> >>>>>>>>>>>>> need to
> >>>>>>>>>>>>>>> set
> >>>>>>>>>>>>>>>>>> them
> >>>>>>>>>>>>>>>>>> into the AM (yarn specific concept) launch
> >>>> context?
> >>>>>>>>>>>>>>>>>> YARN based HDFS log aggregation would not work by dropping
> >>>>>>>>>>>>>>>>>> that code. Just to be crystal clear, the actual
> >>>>>>>>>>>>>>>>>> implementation contains this for exactly this reason.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> G
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, 21 Jan 2022, 20:12 David Morávek, <
> >>>>>>>> d...@apache.org
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>> Hi Gabor,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. One thing is important, token management
> >>> is
> >>>>>>> planned
> >>>>>>>> to
> >>>>>>>>>>> be
> >>>>>>>>>>>>> done
> >>>>>>>>>>>>>>>>>>>> generically within Flink and not
> >>> scattered in
> >>>>> RM
> >>>>>>>>> specific
> >>>>>>>>>>>>> code.
> >>>>>>>>>>>>>>>>>>> JobManager
> >>>>>>>>>>>>>>>>>>>> has a DelegationTokenManager which obtains tokens from
> >>>>>>>>>>>>>>>>>>>> time to time (if configured properly). JM knows which
> >>>>>>>>>>>>>>>>>>>> TaskManagers are in place so it can distribute them to
> >>>>>>>>>>>>>>>>>>>> all TMs. That's it basically.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I think we might both mean something
> >>> different
> >>>> by
> >>>>>> the
> >>>>>>>> RM.
> >>>>>>>>>>>>>>> JobManager
> >>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> basically just a process encapsulating
> >>> multiple
> >>>>>>>>> components,
> >>>>>>>>>>>>> one
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> a ResourceManager, which is the component
> >>> that
> >>>>>>> manages
> >>>>>>>>> task
> >>>>>>>>>>>>>> manager
> >>>>>>>>>>>>>>>>>>> registrations [1]. There is more or less a
> >>>> single
> >>>>>>>>>>>>> implementation
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> RM
> >>>>>>>>>>>>>>>>>>> with pluggable drivers for the active integrations (yarn,
> >>>>>>>>>>>>>>>>>>> k8s).
> >>>>>>>>>>>>>>>>>>> It would be great if you could share more
> >>>> details
> >>>>>> of
> >>>>>>>> how
> >>>>>>>>>>>>> exactly
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> DTM
> >>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> going to fit in the current JM architecture.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 2. 99.9% of the code is generic but each RM handles
> >>>>>>>>>>>>>>>>>>>> tokens differently. A good example is YARN, which obtains
> >>>>>>>>>>>>>>>>>>>> tokens on the client side and then sets them on the newly
> >>>>>>>>>>>>>>>>>>>> created AM container launch context. This is purely YARN
> >>>>>>>>>>>>>>>>>>>> specific and can't be spared. With my current plans
> >>>>>>>>>>>>>>>>>>>> standalone can be changed to use the framework. By using
> >>>>>>>>>>>>>>>>>>>> it I mean no RM specific DTM or whatsoever is needed.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> If we have tokens available on the client
> >>> side,
> >>>>> why
> >>>>>>> do
> >>>>>>>> we
> >>>>>>>>>>>>> need to
> >>>>>>>>>>>>>>> set
> >>>>>>>>>>>>>>>>>> them
> >>>>>>>>>>>>>>>>>>> into the AM (yarn specific concept) launch
> >>>>> context?
> >>>>>>> Why
> >>>>>>>>>>> can't
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> simply
> >>>>>>>>>>>>>>>>>>> send them to the JM, eg. as a parameter of
> >>> the
> >>>>> job
> >>>>>>>>>>> submission
> >>>>>>>>>>>>> /
> >>>>>>>>>>>>>> via
> >>>>>>>>>>>>>>>>>>> separate RPC call? There might be something
> >>> I'm
> >>>>>>> missing
> >>>>>>>>>>> due to
> >>>>>>>>>>>>>>>> limited
> >>>>>>>>>>>>>>>>>>> knowledge, but handling the token on the
> >>>> "cluster
> >>>>>>>>>>> framework"
> >>>>>>>>>>>>>> level
> >>>>>>>>>>>>>>>>>> doesn't
> >>>>>>>>>>>>>>>>>>> seem necessary.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/concepts/flink-architecture/#jobmanager
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> D.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Jan 21, 2022 at 7:48 PM Gabor
> >>> Somogyi <
> >>>>>>>>>>>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Oh and one more thing. I'm planning to add this feature
> >>>>>>>>>>>>>>>>>>>> in small chunks of PRs because security is a super hairy
> >>>>>>>>>>>>>>>>>>>> area. That way reviewers can more easily grasp the
> >>>>>>>>>>>>>>>>>>>> concept.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, 21 Jan 2022, 18:03 David Morávek,
> >>> <
> >>>>>>>>>>> d...@apache.org>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>> Hi Gabor,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> thanks for drafting the FLIP, I think
> >>>> having
> >>>>> a
> >>>>>>>> solid
> >>>>>>>>>>>>> Kerberos
> >>>>>>>>>>>>>>>>> support
> >>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>> crucial for many enterprise deployments.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I have multiple questions regarding the
> >>>>>>>>> implementation
> >>>>>>>>>>>>> (note
> >>>>>>>>>>>>>>>> that I
> >>>>>>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>>>> very limited knowledge of Kerberos):
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 1) If I understand it correctly, we'll
> >>> only
> >>>>>>> obtain
> >>>>>>>>>>> tokens
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>>> manager and then we'll distribute them
> >>> via
> >>>>> RPC
> >>>>>>>> (needs
> >>>>>>>>>>> to
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>> secured).
> >>>>>>>>>>>>>>>>>>>>> Can you please outline how the communication will look?
> >>>>>>>>>>>>>> Is
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> DelegationTokenManager going to be a
> >>> part
> >>>> of
> >>>>>> the
> >>>>>>>>>>>>>>> ResourceManager?
> >>>>>>>>>>>>>>>>> Can
> >>>>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>> outline its lifecycle / how it's going to be integrated
> >>>>>>>>>>>>>>>>>>>>> there?
> >>>>>>>>>>>>>>>>>>>>> 2) Do we really need a YARN / k8s
> >>> specific
> >>>>>>>>>>>>> implementations?
> >>>>>>>>>>>>>> Is
> >>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> possible
> >>>>>>>>>>>>>>>>>>>>> to obtain / renew a token in a generic
> >>> way?
> >>>>>> Maybe
> >>>>>>>> to
> >>>>>>>>>>>>> rephrase
> >>>>>>>>>>>>>>>> that,
> >>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>> possible to implement
> >>>> DelegationTokenManager
> >>>>>> for
> >>>>>>>> the
> >>>>>>>>>>>>>> standalone
> >>>>>>>>>>>>>>>>>> Flink?
> >>>>>>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>>>>> we're able to solve this point, it
> >>> could be
> >>>>>>>> possible
> >>>>>>>>> to
> >>>>>>>>>>>>>> target
> >>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>>>> deployment scenarios with a single
> >>>>>>> implementation.
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> D.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Fri, Jan 14, 2022 at 3:47 AM Junfan
> >>>> Zhang
> >>>>> <
> >>>>>>>>>>>>>>>>>> zuston.sha...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi G
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks for your detailed explanation. I've got your
> >>>>>>>>>>>>>>>>>>>>>> point, and anyway this proposal is a great improvement.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Looking forward to your implementation and I will keep
> >>>>>>>>>>>>>>>>>>>>>> an eye on it. Thanks again.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Best
> >>>>>>>>>>>>>>>>>>>>>> JunFan.
> >>>>>>>>>>>>>>>>>>>>>> On Jan 13, 2022, 9:20 PM +0800, Gabor
> >>>>>> Somogyi <
> >>>>>>>>>>>>>>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>> ,
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>> Just to confirm: keeping
> >>>>>>>>>>>>>>>>>>>>>>> "security.kerberos.fetch.delegation-token" has been
> >>>>>>>>>>>>>>>>>>>>>>> added to the doc.
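
With that option kept, disabling token fetching stays a one-line flink-conf.yaml change, e.g.:

    # opt out of delegation token fetching entirely
    security.kerberos.fetch.delegation-token: false
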
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> BR,
> >>>>>>>>>>>>>>>>>>>>>>> G
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Jan 13, 2022 at 1:34 PM
> >>> Gabor
> >>>>>>> Somogyi <
> >>>>>>>>>>>>>>>>>>>>> gabor.g.somo...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hi JunFan,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> By the way, maybe this should be added in the
> >>>>>>>>>>>>>>>>>>>>>>>>> migration plan or integration section in FLIP-211.
> >>>>>>>>>>>>>>>>>>>>>>>> Going to add this soon.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Besides, I have a question that
> >>> the
> >>>>> KDC
> >>>>>>>> will
> >>>>>>>>>>>>> collapse
> >>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> cluster
> >>>>>>>>>>>>>>>>>>>>>>>> reached 200 nodes you described
> >>>>>>>>>>>>>>>>>>>>>>>> in the google doc. Do you have any
> >>>>>>> attachment
> >>>>>>>>> or
> >>>>>>>>>>>>>>> reference
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> prove
> >>>>>>>>>>>>>>>>>>>>> it?
> >>>>>>>>>>>>>>>>>>>>>>>> "KDC *may* collapse under some
> >>>>>>> circumstances"
> >>>>>>>>> is
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> proper
> >>>>>>>>>>>>>>>>>>>> wording.
> >>>>>>>>>>>>>>>>>>>>>>>> We have several customers who are executing
> >>>>>>>>>>>>>>>>>>>>>>>> workloads on Spark/Flink. Most of the time I'm facing
> >>>>>>>>>>>>>>>>>>>>>>>> their daily issues, which are heavily environment and
> >>>>>>>>>>>>>>>>>>>>>>>> use-case dependent. I've seen various cases:
> >>>>>>>>>>>>>>>>>>>>>>>> * where the mentioned ~1k nodes
> >>> were
> >>>>>>> working
> >>>>>>>>> fine
> >>>>>>>>>>>>>>>>>>>>>>>> * where KDC thought the number of
> >>>>>> requests
> >>>>>>>> are
> >>>>>>>>>>>>> coming
> >>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>> DDOS
> >>>>>>>>>>>>>>>>>>>>> attack
> >>>>>>>>>>>>>>>>>>>>>> so
> >>>>>>>>>>>>>>>>>>>>>>>> discontinued authentication
> >>>>>>>>>>>>>>>>>>>>>>>> * where KDC was simply not
> >>> responding
> >>>>>>> because
> >>>>>>>>> of
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> load
> >>>>>>>>>>>>>>>>>>>>>>>> * where KDC intermittently had some outages (this
> >>>>>>>>>>>>>>>>>>>>>>>> was the nastiest thing)
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Since you're managing a relatively big cluster, you
> >>>>>>>>>>>>>>>>>>>>>>>> know that KDC is not only used by Spark/Flink
> >>>>>>>>>>>>>>>>>>>>>>>> workloads but the whole company IT infrastructure is
> >>>>>>>>>>>>>>>>>>>>>>>> bombing it, so it really depends on other factors too
> >>>>>>>>>>>>>>>>>>>>>>>> whether KDC is reaching its limit or not. Not sure
> >>>>>>>>>>>>>>>>>>>>>>>> what kind of evidence you are looking for, but I'm
> >>>>>>>>>>>>>>>>>>>>>>>> not authorized to share any information about our
> >>>>>>>>>>>>>>>>>>>>>>>> clients' data.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> One thing is for sure: the more external system
> >>>>>>>>>>>>>>>>>>>>>>>> types that authenticate through KDC are used in
> >>>>>>>>>>>>>>>>>>>>>>>> workloads (for ex. HDFS, HBase, Hive, Kafka), the
> >>>>>>>>>>>>>>>>>>>>>>>> higher the possibility of reaching this threshold
> >>>>>>>>>>>>>>>>>>>>>>>> when the cluster is big enough.
> >>>>>>>>>>>>>>>>>>>>>>>> All in all this feature is here to help all users
> >>>>>>>>>>>>>>>>>>>>>>>> never reach this limitation.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> BR,
> >>>>>>>>>>>>>>>>>>>>>>>> G
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jan 13, 2022 at 1:00 PM
> >>> 张俊帆 <
> >>>>>>>>>>>>>>>> zuston.sha...@gmail.com
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi G
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your quick reply. I think reserving the
> >>>>>>>>>>>>>>>>>>>>>>>>> config *security.kerberos.fetch.delegation-token*
> >>>>>>>>>>>>>>>>>>>>>>>>> and simply disabling the token fetching is a good
> >>>>>>>>>>>>>>>>>>>>>>>>> idea. By the way, maybe this should be added in the
> >>>>>>>>>>>>>>>>>>>>>>>>> migration plan or integration section in FLIP-211.
> >>>>>>>>>>>>>>>>>>>>>>>>> Besides, I have a question that
> >>> the
> >>>>> KDC
> >>>>>>>> will
> >>>>>>>>>>>>> collapse
> >>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> cluster
> >>>>>>>>>>>>>>>>>>>>>>>>> reached 200 nodes you described
> >>>>>>>>>>>>>>>>>>>>>>>>> in the google doc. Do you have
> >>> any
> >>>>>>>> attachment
> >>>>>>>>>>> or
> >>>>>>>>>>>>>>>> reference
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> prove
> >>>>>>>>>>>>>>>>>>>>>> it?
> >>>>>>>>>>>>>>>>>>>>>>>>> Because in our internal per-cluster setup the node
> >>>>>>>>>>>>>>>>>>>>>>>>> count reaches > 1000 and KDC looks good. Did I miss
> >>>>>>>>>>>>>>>>>>>>>>>>> or misunderstand something? Please correct me.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best
> >>>>>>>>>>>>>>>>>>>>>>>>> JunFan.
> >>>>>>>>>>>>>>>>>>>>>>>>> On Jan 13, 2022, 5:26 PM +0800,
> >>>>>>>>>>>>> dev@flink.apache.org
> >>>>>>>>>>>>>> ,
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>
> https://docs.google.com/document/d/1JzMbQ1pCJsLVz8yHrCxroYMRP2GwGwvacLrGyaIx5Yc/edit?fbclid=IwAR0vfeJvAbEUSzHQAAJfnWTaX46L6o7LyXhMfBUCcPrNi-uXNgoOaI8PMDQ
>
>
>
