Hi Danny,

Please find my answers inline.
G

On Mon, Nov 21, 2022 at 9:27 PM Danny Cranmer <dannycran...@apache.org> wrote:

> Hey,
>
> > All the rest which is public can be considered accidental leak (for
> > example "DelegationTokenManager" where Java doesn't support internal
> > interfaces).
>
> If the classes are internal, then they should be annotated @Internal. This
> is a Flink thing [1]. My explicit recommendation is to ensure all
> classes/interfaces that you expect end users or developers to use are
> annotated @PublicEvolving or @Experimental and others are @Internal.

I've just created https://issues.apache.org/jira/browse/FLINK-30134 to
address this gap.

> Thanks for the explanation of how the connectors will retrieve the tokens.
> Do you intend for this to be via a static method or via the operator
> context, or something else? With regards to the AWS clients we would
> typically not rebuild the credential provider for each request. It would
> be simple enough to implement a cached credential provider that is backed
> by the TokenContainer. I suppose this is simpler than the
> hook/subscription mechanism, we can always add that later if the need
> arises.

I'm planning a simple static method if that fits the needs, so nothing
super fancy:

    byte[] TokenContainer.getTokensFor(String id)

I'm going to ping you on the PR and you'll have the possibility to shape
this. That said, the main intention is to ease things :)

> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees
>
> Thanks,
> Danny
>
> On Mon, Nov 21, 2022 at 5:45 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
> > Hi Danny,
> >
> > Thanks for spending your time to make the design better. My comments
> > added inline.
> > Please share your opinion or just correct me if I've misunderstood your
> > suggestions.
> >
> > G
> >
> > On Mon, Nov 21, 2022 at 5:10 PM Danny Cranmer <dannycran...@apache.org>
> > wrote:
> >
> > > Thanks for the FLIP Gabor.
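[Editor's sketch] The static lookup discussed earlier in this message, combined with Danny's idea of a cached wrapper on the connector side, could look roughly like the following. Everything except the `getTokensFor(String id)` signature quoted above is an assumption, not the final API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the proposed container; the backing map and the
// setter are assumptions used only to make the sketch self-contained.
class TokenContainer {
    // Provider id -> serialized token bytes, written by the framework on (re)obtain.
    private static final Map<String, byte[]> TOKENS = new ConcurrentHashMap<>();

    static void setTokensFor(String id, byte[] serialized) {
        TOKENS.put(id, serialized);
    }

    // The simple static lookup proposed in the thread:
    // byte[] TokenContainer.getTokensFor(String id)
    static byte[] getTokensFor(String id) {
        return TOKENS.get(id);
    }
}

// A connector-side cached view in the spirit of Danny's suggestion: it only
// re-deserializes when the raw bytes actually changed, instead of rebuilding
// credentials on every request.
class CachedTokenView {
    private byte[] lastRaw;
    private String lastDecoded; // stands in for a real deserialized credential object

    String credentialsFor(String id) {
        byte[] raw = TokenContainer.getTokensFor(id);
        if (raw != null && raw != lastRaw) {
            lastRaw = raw;
            lastDecoded = new String(raw); // real code would deserialize the token here
        }
        return lastDecoded;
    }
}
```

The reference comparison (`raw != lastRaw`) works here because the framework would replace the whole byte array on re-obtain, so identity is a cheap change detector.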
> > > Generally looks good, have a couple of comments.
> > >
> > > If I understand correctly we plan on pulling the current
> > > implementation apart and adding an extra layer of abstraction, to
> > > allow arbitrary implementations. I notice that most of the delegation
> > > token classes [1] are missing compatibility annotations, although a
> > > couple are @Experimental. Can we update these classes to add the
> > > compatibility annotations ASAP?
> > > The DelegationTokenManager class [2] you talk about adding already
> > > exists, without any compatibility guarantees. The recent FLIPs [3][4]
> > > do not cover what support un-annotated classes default to, unless I
> > > missed it.
> >
> > You've touched an important part which every component must define/obey
> > sooner or later when it's mature enough.
> > The feature has just been finished, so it is marked as fully
> > experimental as a whole.
> > The intended developer-facing API is "DelegationTokenProvider" (at
> > least for now).
> > All the rest which is public can be considered an accidental leak (for
> > example "DelegationTokenManager", since Java doesn't support internal
> > interfaces).
> >
> > The "DelegationTokenManager" interface came into the picture in a
> > comment, to maybe make it pluggable, but now I think that we must
> > delete/change it.
> > The reason behind this is that the "obtainDelegationTokens" API has a
> > "Credentials" class as parameter, which binds the whole thing to Hadoop.
> > I personally don't insist on any of the choices, just saying that an
> > API shouldn't contain implementation details.
> >
> > What we can do in the short term: if you think there are places where
> > additional annotations are needed, then please share your explicit
> > suggestions.
> > When it's clear I can create a jira/PR and we can get rid of those gaps.
> >
> > In the mid/long term, when we see that a fully custom provider can be
> > added to the framework without Hadoop, then API guarantees can come
> > into the picture.
> > > > External connectors need to look into the TokenContainer and when
> > > > token is found with a specific key then it must be used
> > > Will we implement some form of hook that connectors can subscribe to?
> > > Otherwise I feel we will duplicate a bunch of logic across connectors
> > > periodically polling this TokenContainer for updates.
> >
> > Good point, in short it wasn't considered until you mentioned it.
> > In detail, Hadoop components check the content of the UGI when
> > authentication is needed.
> > Of course this doesn't mean we must do the same here. We are going to
> > do whatever is more convenient from the API user's perspective.
> > My original idea is super simple, only a couple of lines of
> > implementation:
> > 1. Get a byte array from the TokenContainer map with a specific key
> > (the provider puts the serialized token byte array into the map with
> > this key).
> > 2. The byte array can be deserialized by the actual connector (the
> > reverse of what the provider did).
> >
> > Deserializing the tokens all the time can be time consuming, so I tend
> > to agree that maybe we can add listener logic. That way a connector can
> > subscribe for specific tokens. When the token arrives it receives a
> > callback where deserialization happens. The deserialized tokens can be
> > stored, which would definitely be a speed-up.
> >
> > If we intend to add a listener mechanism in this area, then my
> > suggestions would be:
> > * Define some sort of timeout for the listener execution. I know,
> > normally this is just storing the value somewhere, but if that is
> > somehow broken then it can kill whole clusters.
> > * Some sort of retry logic would be good, but implementing it is not
> > trivial.
> >
> > All in all the listener mechanism would be more brittle, but I see your
> > point and it makes sense.
> > Please share your thoughts.
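[Editor's sketch] The subscription idea with a bounded callback, as discussed above, could be shaped like this. All names and the timeout handling are assumptions; the point is only that a broken listener cannot stall token propagation:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical listener mechanism: connectors subscribe for a token key and
// get a callback with the raw bytes, which they deserialize once and cache.
class TokenUpdateBus {
    interface TokenListener {
        void onNewTokens(byte[] serializedTokens);
    }

    private final Map<String, List<TokenListener>> listeners = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // don't keep the JVM alive just for callback delivery
        return t;
    });

    void subscribe(String key, TokenListener l) {
        listeners.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(l);
    }

    // Called by the framework after a successful (re)obtain. Each callback is
    // bounded by a timeout so one broken listener cannot hold up the cluster.
    void publish(String key, byte[] tokens, long timeoutMs) {
        for (TokenListener l : listeners.getOrDefault(key, List.of())) {
            Future<?> f = executor.submit(() -> l.onNewTokens(tokens));
            try {
                f.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (Exception e) {
                f.cancel(true); // retry logic would go here; non-trivial, as noted above
            }
        }
    }
}
```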
> > > Thanks,
> > >
> > > [1]
> > > https://github.com/apache/flink/tree/0634d0cc0a401d268cc1b3788c8ee63a91ff33a6/flink-runtime/src/main/java/org/apache/flink/runtime/security/token
> > > [2]
> > > https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DelegationTokenManager.java
> > > [3]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees
> > > [4]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
> > >
> > > On Wed, Nov 9, 2022 at 4:46 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > > wrote:
> > >
> > > > OK, removed the fallback part.
> > > >
> > > > Thanks for the help!
> > > >
> > > > G
> > > >
> > > > On Wed, Nov 9, 2022 at 5:03 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
> > > > wrote:
> > > >
> > > > > looks better! no further concerns
> > > > >
> > > > > Cheers,
> > > > > Matyas
> > > > >
> > > > > On Mon, Nov 7, 2022 at 9:21 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Oh gosh, I copied the wrong config keys, so I fixed my last mail
> > > > > > in green.
> > > > > > On Mon, Nov 7, 2022 at 6:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Matyas,
> > > > > > >
> > > > > > > In the meantime I was thinking about the per-provider
> > > > > > > re-obtain feature and here are my thoughts related to that:
> > > > > > > * I think it's a good feature in general but, as mentioned,
> > > > > > > I would add it in a separate FLIP.
> > > > > > > * In case of the Hadoop providers it just wouldn't work
> > > > > > > (HBase doesn't have an end timestamp, so actually HDFS is
> > > > > > > triggering the re-obtain), but for all non-Hadoop providers
> > > > > > > it's a good idea.
> > > > > > > * Adding "security.delegation.tokens.renewal.retry.backoff"
> > > > > > > and "security.delegation.tokens.renewal.time-ratio" is needed
> > > > > > > but, as you mentioned, falling back to the kerberos configs
> > > > > > > just doesn't make sense.
> > > > > > > * In a later FLIP we can add per-provider
> > > > > > > "security.delegation.token.provider.{providerName}.renewal.retry.backoff"
> > > > > > > and/or
> > > > > > > "security.delegation.token.provider.{providerName}.renewal.time-ratio"
> > > > > > > for non-Hadoop providers.
> > > > > > > * This is an additional feature which justifies separating
> > > > > > > Hadoop and non-Hadoop providers at the API level.
> > > > > > >
> > > > > > > Waiting on your opinion.
> > > > > > >
> > > > > > > G
> > > > > > >
> > > > > > > On Mon, Nov 7, 2022 at 4:17 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Matyas,
> > > > > > >>
> > > > > > >> Thanks for your comments, answered inline.
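[Editor's sketch] The per-provider keys proposed above would naturally resolve with a fallback to the framework-wide defaults. The key patterns are quoted from the thread; the lookup helper itself is hypothetical:

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the implied key resolution: a provider-scoped key such as
// "security.delegation.token.provider.{providerName}.renewal.retry.backoff"
// wins when set, otherwise the framework-wide
// "security.delegation.tokens.{suffix}" default applies.
class ProviderConfig {
    static String resolve(Map<String, String> conf, String providerName, String suffix) {
        String perProviderKey = "security.delegation.token.provider." + providerName + "." + suffix;
        String globalKey = "security.delegation.tokens." + suffix;
        return Optional.ofNullable(conf.get(perProviderKey)).orElse(conf.get(globalKey));
    }
}
```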
> > > > > > >> G
> > > > > > >>
> > > > > > >> On Mon, Nov 7, 2022 at 2:58 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> Hi Gabor,
> > > > > > >>>
> > > > > > >>> Thanks for driving this effort! A few thoughts on the topic:
> > > > > > >>> - Could you please add a few examples of the delegation
> > > > > > >>> token providers we expect to be added in the near future?
> > > > > > >>> Ideally these providers could be configured independently
> > > > > > >>> from each other. However the configuration defaults
> > > > > > >>> mentioned in the FLIP are derived from hadoop
> > > > > > >>> configuration. I don't see the point here.
> > > > > > >>
> > > > > > >> A clear plan is to add S3 now and Kafka possibly later on.
> > > > > > >>
> > > > > > >> S3 looks straightforward, but it doesn't fit into the
> > > > > > >> existing framework.
> > > > > > >> On the Kafka side, I've added the Kafka provider to Spark,
> > > > > > >> so I can imagine a similar solution with minor differences.
> > > > > > >> Please see the Spark solution:
> > > > > > >> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security
> > > > > > >> Here the minor planned difference is that Spark handles the
> > > > > > >> Kafka token inside UGI, which is not nice but works.
> > > > > > >> I've just seen a similar generalization effort on the Spark
> > > > > > >> side too, so the Kafka part may or may not change there.
> > > > > > >>
> > > > > > >> Not sure what you mean by configs being derived from Hadoop.
> > > > > > >>
> > > > > > >> What I can think of is that
> > > > > > >> "security.delegation.tokens.renewal.retry.backoff" defaults
> > > > > > >> to "security.kerberos.tokens.renewal.retry.backoff".
> > > > > > >> This is basically happening for backward-compatibility
> > > > > > >> purposes.
> > > > > > >>
> > > > > > >> The other thing I can think of is that you miss independent
> > > > > > >> per-provider token obtain functionality.
> > > > > > >> By independent configuration I mean each provider has its
> > > > > > >> own set of keys which don't collide, but the voting system
> > > > > > >> deciding when token obtain must happen remains and is not
> > > > > > >> touched in this FLIP.
> > > > > > >> By voting system I mean each provider may send back its end
> > > > > > >> timestamp and the lowest wins (at that timestamp all tokens
> > > > > > >> are going to be re-obtained).
> > > > > > >> If that's what you mean, we can think about solutions, but
> > > > > > >> it has nothing to do with framework generalization.
> > > > > > >>
> > > > > > >>> - Are we planning to support scenarios where we need to
> > > > > > >>> read/write from different authentication realms in the same
> > > > > > >>> application? Two Hadoop clusters, Kafka clusters etc.? This
> > > > > > >>> would need an authentication provider per source/sink.
> > > > > > >>
> > > > > > >> It doesn't need 2 different providers per source/sink to do
> > > > > > >> Kafka to Kafka. In such cases cross-realm trust can be
> > > > > > >> configured with principal mapping.
> > > > > > >> Please see the details here:
> > > > > > >> https://gist.github.com/gaborgsomogyi/c636f352ccec7730ff41ac1d524cb87d
> > > > > > >> Even though the gist was originally created for Spark, I
> > > > > > >> tend to do the same here in the future.
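[Editor's sketch] The voting system described above reduces to a minimum over the end timestamps reported by the providers. A minimal sketch of that rule (names are assumptions; providers without an end time, like HBase, simply don't vote):

```java
import java.util.List;
import java.util.Optional;

// After a successful obtain, each provider may report the end timestamp of
// its tokens; the lowest one decides when ALL tokens are re-obtained.
class RenewalVote {
    static Optional<Long> nextReobtainTime(List<Optional<Long>> providerEndTimes) {
        return providerEndTimes.stream()
                .filter(Optional::isPresent)
                .map(Optional::get)
                .min(Long::compare);
    }
}
```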
> > > > > > >> Just to make an extract here, Kafka principal mapping looks
> > > > > > >> like this:
> > > > > > >> sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@DT2HOST.COMPANY.COM)s/@.*//,DEFAULT
> > > > > > >>
> > > > > > >> Providing 2 users in the Hadoop world is just impossible
> > > > > > >> because UGI is basically a singleton.
> > > > > > >> Of course everything can be hacked around (for example,
> > > > > > >> changing the current user on-the-fly), but that would be
> > > > > > >> such a headache that I think we must avoid it. It would end
> > > > > > >> up in synchronization and performance hell.
> > > > > > >> I made some experiments in my Spark era and this would be
> > > > > > >> the same here :)
> > > > > > >>
> > > > > > >>> Thanks,
> > > > > > >>> Matyas
> > > > > > >>>
> > > > > > >>> On Mon, Nov 7, 2022 at 5:10 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> > Hi team,
> > > > > > >>> >
> > > > > > >>> > The delegation token framework is going to be finished
> > > > > > >>> > soon (added in FLIP-211
> > > > > > >>> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework?src=contextnavpagetreemode>).
> > > > > > >>> > Previously there were concerns that the current
> > > > > > >>> > implementation is bound to Hadoop and Kerberos
> > > > > > >>> > authentication. This is a fair concern, and as a result
> > > > > > >>> > we've created a proposal to generalize the delegation
> > > > > > >>> > token framework (practically making it authentication
> > > > > > >>> > agnostic).
> > > > > > >>> >
> > > > > > >>> > This can open the path to add further non-Hadoop and
> > > > > > >>> > non-Kerberos based providers like S3 or many others.
> > > > > > >>> >
> > > > > > >>> > One can find the FLIP in:
> > > > > > >>> > - Wiki:
> > > > > > >>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
> > > > > > >>> > - Document:
> > > > > > >>> > https://docs.google.com/document/d/12tFdx1AZVuW9BjwBht_pMNELgrqro8Z5-hzWeaRY4pc/edit?usp=sharing
> > > > > > >>> >
> > > > > > >>> > I would like to start a discussion to make the framework
> > > > > > >>> > better.
> > > > > > >>> >
> > > > > > >>> > BR,
> > > > > > >>> > G