Hi everyone. Jun raised a good point/discovered an oversight in the KIP during the VOTE thread that we must resolve.
Regarding this statement in the KIP: "A client-side metric will be created that documents the latency imposed by re-authentication." Jun correctly asked: What's the name of this metric? Does it measure avg or max? My initial reaction is to measure both: *reauthentication-latency-ms-{avg,max}*. Any thoughts? Ron On Wed, Sep 19, 2018 at 9:08 AM Ron Dagostino <rndg...@gmail.com> wrote: > Thanks, Rajini -- I updated the KIP to fix this. > > Ron > > On Wed, Sep 19, 2018 at 4:54 AM Rajini Sivaram <rajinisiva...@gmail.com> > wrote: > >> I should have said `security configs` instead of `channel configs`. >> >> The KIP says: >> >> - The configuration option this KIP proposes to add to enable >> server-side expired-connection-kill is '*connections.max.reauth.ms >> <http://connections.max.reauth.ms>*' (not prefixed with listener >> prefix >> or SASL mechanism name – this is a single value for the cluster) >> - The '*connections.max.reauth.ms <http://connections.max.reauth.ms>*' >> configuration option will not be dynamically changeable; restarts will >> be >> required if the value is to be changed. However, if a new listener is >> dynamically added, the value could be set for that listener at that >> time. >> >> Those statements are contradictory. Perhaps the first one should say `it >> may be optionally prefixed with the listener name`? >> >> On Tue, Sep 18, 2018 at 3:55 PM, Ron Dagostino <rndg...@gmail.com> wrote: >> >> > HI Rajini. The KIP is updated as summarized below, and I will start a >> vote >> > immediately. >> > >> > <<<It may be useful to have a metric of expired connections killed >> > Ok, agreed. I called it expired-connections-killed-total >> > >> > <<<For `successful-v0-authentication-{rate,total}`, we probably want >> > <<<version as a tag rather in the name. >> > Ok, agreed. I kept existing metrics unchanged but added an additional >> tag >> > to the V0 metrics so they are separate. >> > >> > <<<Not sure if we need four of these >> > <<<(rate/total with success/failure). Perhaps just success/total is >> > sufficient? >> > Ok, agreed, just kept the successful total. >> > >> > <<<For the session lifetime config, we don't need to require a listener >> or >> > <<<mechanism prefix >> > Ok, agreed, the config is now cluster-wide. >> > >> > <<<For all channel configs, we allow an optional listener prefix, >> > <<<so we should do the same here >> > Not sure what this is referring to. We don't have channel configs here, >> > right? >> > >> > <<<The KIP says connections are terminated on requests not related to >> > <<<re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and >> > <<<SaslAuthenticateRequest). We can skip for ApiVersionsRequest for >> > <<<re-authentication, so that doesn't need to be in the list. >> > Yes, I was planning on that optimization; agreed, I removed it from the >> > list >> > >> > <<<we allow new listeners >> > <<<to be added dynamically and all configs for the listener can be added >> > <<<dynamically (with the listener prefix). I think we want to allow that >> > for >> > <<<this config >> > <<<We should mention this in the KIP, though in terms of >> implementation, I >> > <<<would leave that for a separate JIRA (it doesn't need to be >> implemented >> > at >> > <<<the same time). >> > Ok, agreed >> > >> > Thanks again for all the feedback and discussion. >> > >> > Ron >> > >> > On Tue, Sep 18, 2018 at 6:43 AM Rajini Sivaram <rajinisiva...@gmail.com >> > >> > wrote: >> > >> > > Hi Ron, >> > > >> > > Thanks for the updates. The KIP looks good. A few comments and minor >> > points >> > > below, but feel free to start vote to try and get it into 2.1.0. More >> > > community feedback will be really useful. >> > > >> > > 1) It may be useful to have a metric of expired connections killed by >> the >> > > broker. There could be a client implementation that doesn't support >> > > re-authentications, but happens to use the latest version of >> > > SaslAuthenticateRequest. Or cases where re-authentication didn't >> happen >> > on >> > > time. >> > > >> > > 2) For `successful-v0-authentication-{rate,total}`, we probably want >> > > version as a tag rather in the name. Not sure if we need four of these >> > > (rate/total with success/failure). Perhaps just success/total is >> > > sufficient? >> > > >> > > 3) For the session lifetime config, we don't need to require a >> listener >> > or >> > > mechanism prefix. In most cases, we would expect a single config on >> the >> > > broker-side. For all channel configs, we allow an optional listener >> > prefix, >> > > so we should do the same here. >> > > >> > > 4) The KIP says connections are terminated on requests not related to >> > > re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and >> > > SaslAuthenticateRequest). We can skip for ApiVersionsRequest for >> > > re-authentication, so that doesn't need to be in the list. >> > > >> > > 5) The KIP says that the new config will not be dynamically >> updatable. We >> > > have a very limited set of configs that are dynamically updatable for >> an >> > > existing listener. And we don't want to add this config to the list >> since >> > > we don't expect this value to change frequently. But we allow new >> > listeners >> > > to be added dynamically and all configs for the listener can be added >> > > dynamically (with the listener prefix). I think we want to allow that >> for >> > > this config (i.e. add a new OAuth listener with re-authentication >> > enabled). >> > > We should mention this in the KIP, though in terms of implementation, >> I >> > > would leave that for a separate JIRA (it doesn't need to be >> implemented >> > at >> > > the same time). >> > > >> > > >> > > >> > > On Tue, Sep 18, 2018 at 3:06 AM, Ron Dagostino <rndg...@gmail.com> >> > wrote: >> > > >> > > > HI again, Rajini. Would we ever want the max session time to be >> > > different >> > > > across different SASL mechanisms? I'm wondering, now that we are >> > > > supporting all SASL mechanisms via this KIP, if we still need to >> prefix >> > > > this config with the "[listener].[mechanism]." prefix. I've kept >> the >> > > > prefix in the KIP for now, but it would be easier to just set it >> once >> > for >> > > > all mechanisms, and I don't see that as being a problem. Let me >> know >> > > what >> > > > you think. >> > > > >> > > > Ron >> > > > >> > > > On Mon, Sep 17, 2018 at 9:51 PM Ron Dagostino <rndg...@gmail.com> >> > wrote: >> > > > >> > > > > Hi Rajini. The KIP is updated. Aside from a once-over to make >> sure >> > it >> > > > is >> > > > > all accurate, I think we need to confirm the metrics. The >> decision >> > to >> > > > not >> > > > > reject authentications that use tokens with too-long a lifetime >> > allowed >> > > > the >> > > > > metrics to be simpler. I decided that in addition to tracking >> these >> > > > > metrics on the broker: >> > > > > >> > > > > failed-reauthentication-{rate,total} and >> > > > > successful-reauthentication-{rate,total} >> > > > > >> > > > > we simply need one more set of broker metrics to track the subset >> of >> > > > > clients clients that are not upgraded to v2.1.0 and are still >> using a >> > > V0 >> > > > > SaslAuthenticateRequest: >> > > > > >> > > > > failed-v0-authentication-{rate,total} and >> > > > > successful-v0-authentication-{rate,total} >> > > > > >> > > > > See the Migration section of the KIP for details of how this >> would be >> > > > used. >> > > > > >> > > > > I wonder if we need a broker metric documenting the number of >> > "expired" >> > > > > sessions killed by the broker since it would be the same as >> > > > > successful-v0-authentication-total. I've eliminated that from the >> > KIP >> > > > for >> > > > > now. Thoughts? >> > > > > >> > > > > There is also a client-side metric for re-authentication latency >> > > tracking >> > > > > (still unnamed -- do you have a preference?) >> > > > > >> > > > > I think we're close to being able to put this KIP up for a vote. >> > > > > >> > > > > Ron >> > > > > >> > > > > >> > > > > On Mon, Sep 17, 2018 at 2:45 PM Ron Dagostino <rndg...@gmail.com> >> > > wrote: >> > > > > >> > > > >> <<<extra field alongside errors, not in the opaque body as >> mentioned >> > > in >> > > > >> the KIP >> > > > >> Right, that was a concern I had mentioned in a follow-up email. >> > Agree >> > > > it >> > > > >> should go alongside errors. >> > > > >> >> > > > >> <<<SCRAM for delegation token based authentication where >> credentials >> > > > have >> > > > >> <<<an expiry time, so we do need to be able to set individual >> > > credential >> > > > >> <<<lifetimes, where the server knows the expiry time but client >> may >> > > not >> > > > >> Ah, ok, I was not aware that this case existed. Agreed, for >> > > > consistency, >> > > > >> server will always send it back to the client. >> > > > >> >> > > > >> <<<I was trying to avoid this altogether [SaslClient negotiated >> > > property >> > > > >> for lifetime] >> > > > >> Since we now agree that the server will send it back, yes, there >> is >> > no >> > > > >> need for this. >> > > > >> >> > > > >> <<<wasn't entirely clear about the purpose of >> > > > >> [sasl.login.refresh.reauthenticate.enable] >> > > > >> I think this might have been more useful when we weren't >> necessarily >> > > > >> going to support all SASL mechanisms and/or when the broker was >> not >> > > > going >> > > > >> to advertise the fact that it supported re-authentication. You >> are >> > > > >> correct, now that we support it for all SASL mechanisms and we >> are >> > > > bumping >> > > > >> an API version, I think it is okay to simply enable it wherever >> both >> > > the >> > > > >> client and server meet the required versions. >> > > > >> >> > > > >> <<<wasn't entirely clear about the purpose of [ >> > > > connections.max.reauth.ms] >> > > > >> <<<wasn't expecting that we would reject >> > > > >> <<<tokens or tickets simply because they were too long-lived >> > > > >> <<<tickets are granted by a 3rd party authority >> > > > >> <<<Client re-authenticates even though token was not >> > > > >> <<<refreshed. Does this matter? >> > > > >> I was going under the assumption that it would matter, but based >> on >> > > your >> > > > >> pushback I just realized that the same functionality can be >> > > implemented >> > > > as >> > > > >> part of token validation if there is a desire to limit token >> > lifetimes >> > > > to a >> > > > >> certain max value (and the token validator has to be provided in >> > > > production >> > > > >> anyway since all we provide out-of-the-box is the unsecured >> > > > validator). So >> > > > >> I'm willing to abandon this check as part of re-authentication. >> > > > >> >> > > > >> I'll adjust the KIP accordingly a bit later. Thanks for the >> > continued >> > > > >> feedback/discussion. >> > > > >> >> > > > >> Ron >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> On Mon, Sep 17, 2018 at 2:10 PM Rajini Sivaram < >> > > rajinisiva...@gmail.com >> > > > > >> > > > >> wrote: >> > > > >> >> > > > >>> Hi Ron, >> > > > >>> >> > > > >>> >> > > > >>> *1) Is there a reason to communicate this value back to the >> client >> > > when >> > > > >>> the >> > > > >>> client already knows it? It's an extra network round-trip, and >> > since >> > > > the >> > > > >>> SASL round-tripsare defined by the spec I'm not certain adding >> an >> > > extra >> > > > >>> round-trip is acceptable.* >> > > > >>> >> > > > >>> I wasn't suggesting an extra round-trip for propagating session >> > > > >>> lifetime. I >> > > > >>> was expecting session lifetime to be added to the last >> > > > SASL_AUTHENTICATE >> > > > >>> response from the broker. Because SASL is a challenge-response >> > > > mechanism, >> > > > >>> SaslServer knows when the response being sent is the last one >> and >> > > hence >> > > > >>> can >> > > > >>> send the session lifetime in the response (in the same way as we >> > > > >>> propagate >> > > > >>> errors). I was expecting this to be added as an extra field >> > alongside >> > > > >>> errors, not in the opaque body as mentioned in the KIP. The >> opaque >> > > byte >> > > > >>> array strictly conforms to the SASL mechanism wire protocol and >> we >> > > want >> > > > >>> to >> > > > >>> keep it that way. >> > > > >>> >> > > > >>> As you have said, we don't need server to propagate session >> > lifetime >> > > > for >> > > > >>> OAUTHBEARER since client knows the token lifetime. But server >> also >> > > > knows >> > > > >>> the credential lifetime and by having the server decide the >> > lifetime, >> > > > we >> > > > >>> can use the same code path for all mechanisms. If we only have a >> > > > constant >> > > > >>> max lifetime in the server, for PLAIN and SCRAM we will end up >> > having >> > > > the >> > > > >>> same lifetime for all credentials with no ability to set actual >> > > expiry. >> > > > >>> We >> > > > >>> use SCRAM for delegation token based authentication where >> > credentials >> > > > >>> have >> > > > >>> an expiry time, so we do need to be able to set individual >> > credential >> > > > >>> lifetimes, where the server knows the expiry time but client may >> > not. >> > > > >>> >> > > > >>> *2) I also just realized that if the client is to learn the >> > > credential >> > > > >>> lifetime we wouldn't want to put special-case code in the >> > > Authenticator >> > > > >>> for >> > > > >>> GSSAPI and OAUTHBEARER; we would want to expose the value >> > > generically, >> > > > >>> probably as a negotiated property on the SaslClient instance.* >> > > > >>> >> > > > >>> I was trying to avoid this altogether. Client doesn't need to >> know >> > > > >>> credential lifetime. Server asks the client to re-authenticate >> > within >> > > > its >> > > > >>> session lifetime. >> > > > >>> >> > > > >>> 3) From the KIP, I wasn't entirely clear about the purpose of >> the >> > two >> > > > >>> configs: >> > > > >>> >> > > > >>> sasl.login.refresh.reauthenticate.enable: Do we need this? >> Client >> > > > knows >> > > > >>> if >> > > > >>> broker version supports re-authentication based on the >> > > > SASL_AUTHENTICATE >> > > > >>> version returned in ApiVersionsResponse. Client knows if broker >> is >> > > > >>> configured to enable re-authentication based on session lifetime >> > > > returned >> > > > >>> in SaslAuthenticateResponse. If broker has re-authentication >> > > configured >> > > > >>> and >> > > > >>> client supports re-authentication, you would always want >> > > > re-authenticate. >> > > > >>> So I wasn't sure why we need a config to opt in or out on the >> > > > >>> client-side. >> > > > >>> >> > > > >>> >> > > > >>> connections.max.reauth.ms: We obviously need a broker-side >> config. >> > > Not >> > > > >>> entirely sure about the semantics of the config that drives >> > > > >>> re-authentication. In particular, I wasn't expecting that we >> would >> > > > reject >> > > > >>> tokens or tickets simply because they were too long-lived. Since >> > > tokens >> > > > >>> or >> > > > >>> tickets are granted by a 3rd party authority, I am not sure if >> > > clients >> > > > >>> will >> > > > >>> always have control over the lifetime. Do we need to support any >> > more >> > > > >>> scenarios than these: >> > > > >>> >> > > > >>> >> > > > >>> A) reauth.ms=10,credential.lifetime.ms=10 : Broker sets >> > > > >>> session.lifetime=10, >> > > > >>> so this works. >> > > > >>> >> > > > >>> B) reauth.ms=10, credential.lifetime.ms=5 : Broker sets >> > > > >>> session.lifetime=5, >> > > > >>> so this works. >> > > > >>> C) reauth.ms=10, credential.lifetime.ms=20 : Broker sets >> > > > >>> session.lifetime=10. Client re-authenticates even though token >> was >> > > not >> > > > >>> refreshed. Does this matter? >> > > > >>> D) reauth.ms=Long.MAX_VALUE, credential.lifetime.ms=10: Broker >> > sets >> > > > >>> session.lifetime=10, client re-authenticates based on credential >> > > > expiry. >> > > > >>> E) reauth.ms=0 (default), credential.lifetime.ms=10 : Broker >> sets >> > > > >>> session.lifetime=0, Broker doesn't terminate sessions, client >> > doesn't >> > > > >>> re-authenticate. We generate useful metrics. >> > > > >>> F) reauth.ms=0 (default),no lifetime for credential (e.g. >> PLAIN): >> > > > Broker >> > > > >>> sets session.lifetime=0, Broker doesn't terminate sessions, >> client >> > > > >>> doesn't >> > > > >>> re-authenticate >> > > > >>> G) reauth.ms=10,no lifetime for credential (e.g. PLAIN) : >> Broker >> > > sets >> > > > >>> session.lifetime=10. Client re-authenticates. >> > > > >>> >> > > > >>> I would have thought that D) is the typical scenario for >> > > OAuth/Kerberos >> > > > >>> to >> > > > >>> respect token expiry time. G) would be typical scenario for >> PLAIN >> > to >> > > > >>> force >> > > > >>> re-authenication at regular intervals. A/B/C are useful to force >> > > > >>> re-authentication in scenarios where you might check for >> credential >> > > > >>> revocation in the server. And E/F are useful to disable >> > > > re-authentication >> > > > >>> and generate metrics (also the default behaviour useful during >> > > > >>> migration). >> > > > >>> Have I missed something? >> > > > >>> >> > > > >>> >> > > > >>> On Mon, Sep 17, 2018 at 4:27 PM, Ron Dagostino < >> rndg...@gmail.com> >> > > > >>> wrote: >> > > > >>> >> > > > >>> > Hi yet again, Rajini. I also just realized that if the >> client is >> > > to >> > > > >>> learn >> > > > >>> > the credential lifetime we wouldn't want to put special-case >> code >> > > in >> > > > >>> the >> > > > >>> > Authenticator for GSSAPI and OAUTHBEARER; we would want to >> expose >> > > the >> > > > >>> value >> > > > >>> > generically, probably as a negotiated property on the >> SaslClient >> > > > >>> instance. >> > > > >>> > We might be talking about making the negotiated property key >> part >> > > of >> > > > >>> the >> > > > >>> > public API. In other words, the SaslClient would be >> responsible >> > > for >> > > > >>> > exposing the credential (i.e. token or ticket) lifetime at a >> > > > well-known >> > > > >>> > negotiated property name, such as "Credential.Lifetime" and >> > > putting a >> > > > >>> Long >> > > > >>> > value there if there is a lifetime. That well-klnown key >> (e.g. >> > > > >>> > "Credential.Lifetime") would be part of the public API, right? >> > > > >>> > >> > > > >>> > Ron >> > > > >>> > >> > > > >>> > On Mon, Sep 17, 2018 at 11:03 AM Ron Dagostino < >> > rndg...@gmail.com> >> > > > >>> wrote: >> > > > >>> > >> > > > >>> > > Hi again, Rajini. After thinking about this a little >> while, it >> > > > >>> occurs to >> > > > >>> > > me that maybe the communication of max session lifetime >> should >> > > > occur >> > > > >>> via >> > > > >>> > > SASL_HANDSHAKE after all. Here's why. The value >> communicated >> > is >> > > > >>> the max >> > > > >>> > > session lifetime allowed, and the client can assume it is >> the >> > the >> > > > >>> session >> > > > >>> > > lifetime for that particular session unless the particular >> SASL >> > > > >>> mechanism >> > > > >>> > > could result in a shorter session that would be obvious to >> the >> > > > >>> client and >> > > > >>> > > the server. In particular, for OAUTHBEARER, the session >> > lifetime >> > > > >>> will be >> > > > >>> > > the token lifetime, which the client and server will both >> know. >> > > Is >> > > > >>> > there a >> > > > >>> > > reason to communicate this value back to the client when the >> > > client >> > > > >>> > already >> > > > >>> > > knows it? It's an extra network round-trip, and since the >> SASL >> > > > >>> > round-trips >> > > > >>> > > are defined by the spec I'm not certain adding an extra >> > > round-trip >> > > > is >> > > > >>> > > acceptable. Even if we decide we can add it, it helps with >> > > latency >> > > > >>> if we >> > > > >>> > > don't. >> > > > >>> > > >> > > > >>> > > Kerberos may be a bit different -- I don't know if the >> broker >> > can >> > > > >>> learn >> > > > >>> > > the session lifetime. If it can then the same thing holds >> -- >> > > both >> > > > >>> client >> > > > >>> > > and server will know the session lifetime and there is no >> > reason >> > > to >> > > > >>> > > communicate it back. If the server can't learn the lifetime >> > > then I >> > > > >>> don't >> > > > >>> > > think adding an extra SASL_AUTHENTICATE round trip is going >> to >> > > > help, >> > > > >>> > anyway. >> > > > >>> > > >> > > > >>> > > Also, by communicating the max session lifetime in the >> > > > SASL_HANDSHAKE >> > > > >>> > > response, both OAUTHBEARER and GSSAPI clients will be able >> to >> > > know >> > > > >>> before >> > > > >>> > > sending any SASL_AUTHENTICATE requests whether their >> credential >> > > > >>> violates >> > > > >>> > > the maximum. This allows a behaving client to give a good >> > error >> > > > >>> message. >> > > > >>> > > A malicious client would ignore the value and send a >> > longer-lived >> > > > >>> > > credential, and then that would be rejected on the server >> side. >> > > > >>> > > >> > > > >>> > > I'm still good with ExpiringCredential not needing to be >> > public. >> > > > >>> > > >> > > > >>> > > What do you think? >> > > > >>> > > >> > > > >>> > > Ron >> > > > >>> > > >> > > > >>> > > Ron >> > > > >>> > > >> > > > >>> > > Ron >> > > > >>> > > >> > > > >>> > > On Mon, Sep 17, 2018 at 9:13 AM Ron Dagostino < >> > rndg...@gmail.com >> > > > >> > > > >>> wrote: >> > > > >>> > > >> > > > >>> > >> Hi Rajini. I've updated the KIP to reflect the decision to >> > > fully >> > > > >>> > support >> > > > >>> > >> this for all SASL mechanisms and to not require the >> > > > >>> ExpiringCredential >> > > > >>> > >> interface to be public. >> > > > >>> > >> >> > > > >>> > >> Ron >> > > > >>> > >> >> > > > >>> > >> On Mon, Sep 17, 2018 at 6:55 AM Ron Dagostino < >> > > rndg...@gmail.com> >> > > > >>> > wrote: >> > > > >>> > >> >> > > > >>> > >>> Actually, I think the metric remains the same assuming the >> > > > >>> > >>> authentication fails when the token lifetime is too long. >> > > > >>> Negative >> > > > >>> > max >> > > > >>> > >>> config on server counts what would have been killed >> because >> > > maybe >> > > > >>> > client >> > > > >>> > >>> re-auth is not turned on; positive max enables the kill, >> > which >> > > is >> > > > >>> > counted >> > > > >>> > >>> by a second metric as proposed. The current proposal had >> > > already >> > > > >>> > stated >> > > > >>> > >>> that a non-zero value would disallow an authentication >> with a >> > > > token >> > > > >>> > that >> > > > >>> > >>> has too large a lifetime, and that is still the case, I >> > think. >> > > > But >> > > > >>> > let’s >> > > > >>> > >>> make sure we are on the same page about all this. >> > > > >>> > >>> >> > > > >>> > >>> Ron >> > > > >>> > >>> >> > > > >>> > >>> > On Sep 17, 2018, at 6:42 AM, Ron Dagostino < >> > > rndg...@gmail.com> >> > > > >>> > wrote: >> > > > >>> > >>> > >> > > > >>> > >>> > Hi Rajini. I see what you are saying. The background >> > login >> > > > >>> refresh >> > > > >>> > >>> thread does have to factor into the decision for >> OAUTHBEARER >> > > > >>> because >> > > > >>> > there >> > > > >>> > >>> is no new token to re-authenticate with until that refresh >> > > > succeeds >> > > > >>> > >>> (similarly with Kerberos), but I think you are right that >> the >> > > > >>> interface >> > > > >>> > >>> doesn’t necessarily have to be public. The server does >> > decide >> > > > the >> > > > >>> > time for >> > > > >>> > >>> PLAIN and the SCRAM-related mechanisms under the current >> > > > proposal, >> > > > >>> but >> > > > >>> > it >> > > > >>> > >>> is done in SaslHandshakeResponse, and at that point the >> > server >> > > > >>> won’t >> > > > >>> > have >> > > > >>> > >>> any token yet; making it happen via >> SaslAuthenticateRequest >> > at >> > > > the >> > > > >>> > very end >> > > > >>> > >>> does allow the server to know everything for all >> mechanisms. >> > > > >>> > >>> > >> > > > >>> > >>> > I see two potential issues to discuss. First, the >> server >> > > must >> > > > >>> > >>> communicate a time that exceeds the (token or ticket) >> refresh >> > > > >>> period >> > > > >>> > on the >> > > > >>> > >>> client. Assuming it communicates the expiration time of >> the >> > > > >>> > token/ticket, >> > > > >>> > >>> that’s the best it can do. So I think that’ll be fine. >> > > > >>> > >>> > >> > > > >>> > >>> > The second issue is what happens if the server is >> > configured >> > > to >> > > > >>> > accept >> > > > >>> > >>> a max value — say, an hour — and the token is good for >> > > longer. I >> > > > >>> > assumed >> > > > >>> > >>> that the client should not be allowed to authenticate in >> the >> > > > first >> > > > >>> > place >> > > > >>> > >>> because it could then simply re-authenticate with the same >> > > token >> > > > >>> after >> > > > >>> > an >> > > > >>> > >>> hour, which defeats the motivations for this KIP. So do >> we >> > > agree >> > > > >>> the >> > > > >>> > max >> > > > >>> > >>> token lifetime allowed will be enforced at authentication >> > time? >> > > > >>> > Assuming >> > > > >>> > >>> so, then we need to discuss migration. >> > > > >>> > >>> > >> > > > >>> > >>> > The current proposal includes a metric that can be used >> to >> > > > >>> identify >> > > > >>> > if >> > > > >>> > >>> an OAUTHBEARER client is misconfigured — the number of >> > > > connections >> > > > >>> that >> > > > >>> > >>> would have been killed (could be non-zero when the >> configured >> > > max >> > > > >>> is a >> > > > >>> > >>> negative number). Do we still want this type of an >> > indication, >> > > > and >> > > > >>> if >> > > > >>> > so, >> > > > >>> > >>> is it still done the same way — a negative number for max, >> > but >> > > > >>> instead >> > > > >>> > of >> > > > >>> > >>> counting the number of kills that would have been done it >> > > counts >> > > > >>> the >> > > > >>> > number >> > > > >>> > >>> of authentications that would have been failed? >> > > > >>> > >>> > >> > > > >>> > >>> > Ron >> > > > >>> > >>> > >> > > > >>> > >>> >> On Sep 17, 2018, at 6:06 AM, Rajini Sivaram < >> > > > >>> > rajinisiva...@gmail.com> >> > > > >>> > >>> wrote: >> > > > >>> > >>> >> >> > > > >>> > >>> >> Hi Ron, >> > > > >>> > >>> >> >> > > > >>> > >>> >> Thinking a bit more about other SASL mechanisms, I >> think >> > the >> > > > >>> issue >> > > > >>> > is >> > > > >>> > >>> that >> > > > >>> > >>> >> in the current proposal, clients decide the >> > > re-authentication >> > > > >>> > period. >> > > > >>> > >>> For >> > > > >>> > >>> >> mechanisms like PLAIN or SCRAM, we would actually want >> > > server >> > > > to >> > > > >>> > >>> determine >> > > > >>> > >>> >> the re-authentication interval. For example, the PLAIN >> or >> > > > SCRAM >> > > > >>> > >>> database >> > > > >>> > >>> >> could specify the expiry time for each principal, or >> the >> > > > broker >> > > > >>> > could >> > > > >>> > >>> be >> > > > >>> > >>> >> configured with a single expiry time for all >> principals of >> > > > that >> > > > >>> > >>> mechanism. >> > > > >>> > >>> >> For OAuth, brokers do know the expiry time. Haven't >> > figured >> > > > out >> > > > >>> what >> > > > >>> > >>> to do >> > > > >>> > >>> >> with Kerberos, but in any case for broker-side >> connection >> > > > >>> > termination >> > > > >>> > >>> on >> > > > >>> > >>> >> expiry, we need the broker to know/decide the expiry >> time. >> > > So >> > > > I >> > > > >>> > would >> > > > >>> > >>> like >> > > > >>> > >>> >> to suggest a slightly different approach to managing >> > > > >>> > >>> re-authentications. >> > > > >>> > >>> >> >> > > > >>> > >>> >> 1) Instead of changing SASL_HANDSHAKE version number, >> we >> > > bump >> > > > up >> > > > >>> > >>> >> SASL_AUTHENTICATE version number. >> > > > >>> > >>> >> 2) In the final SASL_AUTHENTICATE response, broker >> returns >> > > the >> > > > >>> > expiry >> > > > >>> > >>> time >> > > > >>> > >>> >> of this authenticated session. This is the interval >> after >> > > > which >> > > > >>> > >>> broker will >> > > > >>> > >>> >> terminate the connection If it wasn't re-authenticated. >> > > > >>> > >>> >> 3) We no longer need the public interface change to >> add ` >> > > > >>> > >>> >> >> > > org.apache.kafka.common.security.expiring.ExpiringCredential` >> > > > >>> for >> > > > >>> > the >> > > > >>> > >>> >> client-side. Instead we schedule the next >> > re-authentication >> > > on >> > > > >>> the >> > > > >>> > >>> client >> > > > >>> > >>> >> based on the expiry time provided by the server (some >> time >> > > > >>> earlier >> > > > >>> > >>> than the >> > > > >>> > >>> >> expiry). >> > > > >>> > >>> >> 4) If client uses SASL_AUTHENTICATE v0, broker will not >> > > return >> > > > >>> > expiry >> > > > >>> > >>> time. >> > > > >>> > >>> >> The connection will be terminated if that feature is >> > enabled >> > > > >>> (the >> > > > >>> > >>> same code >> > > > >>> > >>> >> path as client failing to re-authenticte on time). >> > > > >>> > >>> >> >> > > > >>> > >>> >> Thoughts? >> > > > >>> > >>> >> >> > > > >>> > >>> >>> On Sat, Sep 15, 2018 at 3:06 AM, Ron Dagostino < >> > > > >>> rndg...@gmail.com> >> > > > >>> > >>> wrote: >> > > > >>> > >>> >>> >> > > > >>> > >>> >>> Hi everyone. I've updated the KIP to reflect all >> > > discussion >> > > > to >> > > > >>> > date, >> > > > >>> > >>> >>> including the decision to go with the low-level >> approach. >> > > > This >> > > > >>> > >>> latest >> > > > >>> > >>> >>> version of the KIP includes the above " >> > > > >>> connections.max.reauth.ms" >> > > > >>> > >>> >>> proposal, >> > > > >>> > >>> >>> which I know has not been discussed yet. It mentions >> new >> > > > >>> metrics, >> > > > >>> > >>> some of >> > > > >>> > >>> >>> which may not have been discussed either (and names >> are >> > > > missing >> > > > >>> > from >> > > > >>> > >>> some >> > > > >>> > >>> >>> of them). Regardless, this new version is the closest >> > yet >> > > > to a >> > > > >>> > >>> version >> > > > >>> > >>> >>> that can be put to a vote next week. >> > > > >>> > >>> >>> >> > > > >>> > >>> >>> Ron >> > > > >>> > >>> >>> >> > > > >>> > >>> >>>> On Fri, Sep 14, 2018 at 8:59 PM Ron Dagostino < >> > > > >>> rndg...@gmail.com> >> > > > >>> > >>> wrote: >> > > > >>> > >>> >>>> >> > > > >>> > >>> >>>> Minor correction: I'm proposing " >> > > connections.max.reauth.ms" >> > > > >>> as >> > > > >>> > the >> > > > >>> > >>> >>>> broker-side configuration property, not " >> > > > >>> > >>> >>>> connections.max.expired.credentials.ms". >> > > > >>> > >>> >>>> >> > > > >>> > >>> >>>> Ron >> > > > >>> > >>> >>>> >> > > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 8:40 PM Ron Dagostino < >> > > > >>> rndg...@gmail.com >> > > > >>> > > >> > > > >>> > >>> wrote: >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> Hi Rajini. I'm going to choose * >> > > > >>> > >>> connections.max.expired.credentials.ms >> > > > >>> > >>> >>>>> <http://connections.max.expired.credentials.ms>* as >> > the >> > > > >>> option >> > > > >>> > >>> name >> > > > >>> > >>> >>>>> because it is consistent with the comment I made in >> the >> > > > >>> section >> > > > >>> > >>> about >> > > > >>> > >>> >>> how >> > > > >>> > >>> >>>>> to get the client and server to agree on credential >> > > > lifetime: >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> "We could add a new API call so that clients could >> ask >> > > > >>> servers >> > > > >>> > for >> > > > >>> > >>> the >> > > > >>> > >>> >>>>> lifetime they use, or we could extend the >> > > > >>> > >>> SaslHandshakeRequest/Response >> > > > >>> > >>> >>> API >> > > > >>> > >>> >>>>> call to include that information in the server's >> > > response – >> > > > >>> the >> > > > >>> > >>> client >> > > > >>> > >>> >>>>> would then adopt that value" >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> We set the config option value on the broker (with >> the >> > " >> > > > >>> > >>> >>>>> listener.name.mechanism." prefix), and we will >> return >> > the >> > > > >>> > >>> configured >> > > > >>> > >>> >>>>> value in the SaslHandshakeResponse (requiring a wire >> > > format >> > > > >>> > change >> > > > >>> > >>> in >> > > > >>> > >>> >>>>> addition to a version bump). The value (referred >> to as >> > > "X" >> > > > >>> > below) >> > > > >>> > >>> can >> > > > >>> > >>> >>> be >> > > > >>> > >>> >>>>> negative, zero, or positive and is to be >> interpreted as >> > > > >>> follows: >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> X = 0: Fully disable. The server will respond to >> > > > >>> > >>> re-authentications, >> > > > >>> > >>> >>> but >> > > > >>> > >>> >>>>> it won't kill any connections due to expiration, >> and it >> > > > won't >> > > > >>> > track >> > > > >>> > >>> >>> either >> > > > >>> > >>> >>>>> of the 2 metrics mentioned below. >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> Now, a couple of definitions for when X != 0: >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> 1) We define the *maximum allowed expiration time* >> to >> > be >> > > > the >> > > > >>> time >> > > > >>> > >>> >>>>> determined by the point when (re-)authentication >> occurs >> > > > plus >> > > > >>> |X| >> > > > >>> > >>> >>>>> milliseconds. >> > > > >>> > >>> >>>>> 2) We define the *requested expiration time* to be >> the >> > > > >>> maximum >> > > > >>> > >>> allowed >> > > > >>> > >>> >>>>> expiration time except for the case of an OAuth >> bearer >> > > > >>> token, in >> > > > >>> > >>> which >> > > > >>> > >>> >>> case >> > > > >>> > >>> >>>>> it is the point at which the token expires. >> (Kerberos >> > > > >>> presumably >> > > > >>> > >>> also >> > > > >>> > >>> >>>>> specifies a ticket lifetime, but frankly I am not >> in a >> > > > >>> position >> > > > >>> > to >> > > > >>> > >>> do >> > > > >>> > >>> >>> any >> > > > >>> > >>> >>>>> Kerberos-related coding in time for a 2.1.0 release >> and >> > > > would >> > > > >>> > >>> prefer to >> > > > >>> > >>> >>>>> ignore this piece of information for this KIP -- >> would >> > it >> > > > be >> > > > >>> > >>> acceptable >> > > > >>> > >>> >>> to >> > > > >>> > >>> >>>>> have someone else add it later?). >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> Based on these definitions, we define the behavior >> as >> > > > >>> follows: >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> *X > 0: Fully enable* >> > > > >>> > >>> >>>>> A) The server will reject any >> > > > >>> authentication/re-authentication >> > > > >>> > >>> attempt >> > > > >>> > >>> >>>>> when the requested expiration time is after the >> maximum >> > > > >>> allowed >> > > > >>> > >>> >>> expiration >> > > > >>> > >>> >>>>> time (which could only happen with an OAuth bearer >> > token, >> > > > >>> > assuming >> > > > >>> > >>> we >> > > > >>> > >>> >>> skip >> > > > >>> > >>> >>>>> dealing with Kerberos for now). >> > > > >>> > >>> >>>>> B) The server will kill connections that are used >> > beyond >> > > > the >> > > > >>> > >>> requested >> > > > >>> > >>> >>>>> expiration time. >> > > > >>> > >>> >>>>> C) A broker metric will be maintained that documents >> > the >> > > > >>> number >> > > > >>> > >>> >>>>> connections killed by the broker. This metric will >> be >> > > > >>> non-zero >> > > > >>> > if >> > > > >>> > >>> a >> > > > >>> > >>> >>> client >> > > > >>> > >>> >>>>> is connecting to the broker with re-authentication >> > either >> > > > >>> > >>> unavailable >> > > > >>> > >>> >>> (i.e. >> > > > >>> > >>> >>>>> an older client) or disabled. >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> *X < 0: Partially enable* >> > > > >>> > >>> >>>>> A) Same as above: the server will reject any >> > > > >>> > >>> >>>>> authentication/re-authentication attempt when the >> > > > requested >> > > > >>> > >>> expiration >> > > > >>> > >>> >>> time >> > > > >>> > >>> >>>>> is after the maximum allowed expiration time (which >> > could >> > > > >>> only >> > > > >>> > >>> happen >> > > > >>> > >>> >>> with >> > > > >>> > >>> >>>>> an OAuth bearer token, assuming we skip dealing with >> > > > >>> Kerberos for >> > > > >>> > >>> now). >> > > > >>> > >>> >>>>> B) The server will **not** kill connections that are >> > used >> > > > >>> beyond >> > > > >>> > >>> the >> > > > >>> > >>> >>>>> requested expiration time. >> > > > >>> > >>> >>>>> C) A broker metric will be maintained that documents >> > the >> > > > >>> number >> > > > >>> > of >> > > > >>> > >>> API >> > > > >>> > >>> >>>>> requests unrelated to re-authentication that are >> made >> > > over >> > > > a >> > > > >>> > >>> connection >> > > > >>> > >>> >>>>> beyond the requested expiration time. This metric >> will >> > > be >> > > > >>> > helpful >> > > > >>> > >>> for >> > > > >>> > >>> >>>>> operators to ensure that all clients are properly >> > > upgraded >> > > > >>> and >> > > > >>> > >>> >>>>> re-authenticating before fully enabling the >> server-side >> > > > >>> > >>> >>>>> expired-connection-kill functionality (i.e. by >> changing >> > > the >> > > > >>> value >> > > > >>> > >>> from a >> > > > >>> > >>> >>>>> negative number to a positive number): this metric >> will >> > > be >> > > > >>> zero >> > > > >>> > >>> across >> > > > >>> > >>> >>> all >> > > > >>> > >>> >>>>> brokers when it is safe to fully enable the >> server-side >> > > > >>> feature. >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> Thoughts? >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> Ron >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 11:59 AM Rajini Sivaram < >> > > > >>> > >>> >>> rajinisiva...@gmail.com> >> > > > >>> > >>> >>>>> wrote: >> > > > >>> > >>> >>>>> >> > > > >>> > >>> >>>>>> It will be good to shorten the config name. We >> have a >> > > > config >> > > > >>> > >>> named ` >> > > > >>> > >>> >>>>>> connections.max.idle.ms`. We could add something >> > like ` >> > > > >>> > >>> >>>>>> connections.max.expired.credentials.ms`. As you >> said, >> > > it >> > > > >>> would >> > > > >>> > be >> > > > >>> > >>> >>>>>> prefixed >> > > > >>> > >>> >>>>>> with listener and SASL mechanism name. We should be >> > able >> > > > to >> > > > >>> > >>> support >> > > > >>> > >>> >>>>>> connection termination in future even with SSL, so >> > > perhaps >> > > > >>> we >> > > > >>> > >>> don't >> > > > >>> > >>> >>> want >> > > > >>> > >>> >>>>>> the `sasl.server.` prefix (we just need to validate >> > > > whether >> > > > >>> the >> > > > >>> > >>> config >> > > > >>> > >>> >>> is >> > > > >>> > >>> >>>>>> supported for the protocol). You can choose >> whether a >> > > > >>> boolean >> > > > >>> > >>> flag or a >> > > > >>> > >>> >>>>>> time interval makes more sense. >> > > > >>> > >>> >>>>>> >> > > > >>> > >>> >>>>>> For the client-side, the KIP explains how to make >> it >> > > work >> > > > >>> for >> > > > >>> > >>> other >> > > > >>> > >>> >>>>>> mechanisms and we can leave it that. Not convinced >> > about >> > > > >>> > >>> server-side >> > > > >>> > >>> >>>>>> though. It seems to me that we probably would >> require >> > > API >> > > > >>> > changes >> > > > >>> > >>> to >> > > > >>> > >>> >>> make >> > > > >>> > >>> >>>>>> it work. Basically the proposed approach works only >> > for >> > > > >>> > >>> OAUTHBEARER. >> > > > >>> > >>> >>>>>> Since >> > > > >>> > >>> >>>>>> we have to bump up SaslHandshakeRequest version in >> > this >> > > > >>> KIP, it >> > > > >>> > >>> will be >> > > > >>> > >>> >>>>>> good to work out if we need to change the request >> or >> > > flow >> > > > to >> > > > >>> > make >> > > > >>> > >>> this >> > > > >>> > >>> >>>>>> work >> > > > >>> > >>> >>>>>> for other mechanisms. I haven't figured out exactly >> > what >> > > > is >> > > > >>> > >>> needed, but >> > > > >>> > >>> >>>>>> will think about it and get back next week. In the >> > > > >>> meantime, you >> > > > >>> > >>> can >> > > > >>> > >>> >>> get >> > > > >>> > >>> >>>>>> the KIP up-to-date with the config, migration path >> > etc. >> > > > and >> > > > >>> get >> > > > >>> > it >> > > > >>> > >>> >>> ready >> > > > >>> > >>> >>>>>> to >> > > > >>> > >>> >>>>>> start vote next week. >> > > > >>> > >>> >>>>>> >> > > > >>> > >>> >>>>>> >> > > > >>> > >>> >>>>>> >> > > > >>> > >>> >>>>>> >> > > > >>> > >>> >>>>>> >> > > > >>> > >>> >>>>>> On Fri, Sep 14, 2018 at 1:24 PM, Ron Dagostino < >> > > > >>> > rndg...@gmail.com >> > > > >>> > >>> > >> > > > >>> > >>> >>>>>> wrote: >> > > > >>> > >>> >>>>>> >> > > > >>> > >>> >>>>>>> Hi Rajini. >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> Agreed, we will bump the SaslHandshakeRequest API >> > > number >> > > > >>> (no >> > > > >>> > wire >> > > > >>> > >>> >>>>>> format >> > > > >>> > >>> >>>>>>> change, of course). >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> Agreed, responding to re-authentication will >> always >> > be >> > > > >>> enabled >> > > > >>> > >>> on the >> > > > >>> > >>> >>>>>>> broker since it is initiated by the client. >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> Agreed, connection termination by the broker will >> be >> > > > >>> opt-in. >> > > > >>> > I'm >> > > > >>> > >>> >>>>>> thinking >> > > > >>> > >>> >>>>>>> *sasl.server.disconnect.expired.credential. >> > connections. >> > > > >>> > enable*, >> > > > >>> > >>> >>>>>> prefixed >> > > > >>> > >>> >>>>>>> with listener and SASL mechanism name in >> lower-case >> > so >> > > it >> > > > >>> can >> > > > >>> > be >> > > > >>> > >>> >>>>>> opt-in at >> > > > >>> > >>> >>>>>>> the granularity of a SASL mechanism; for example, >> " >> > > > >>> > >>> >>>>>>> listener.name.sasl_ssl.oauthbearer.sasl.server. >> > > > >>> > >>> >>>>>>> disconnect.expired.credential. >> > connections.enable=true >> > > > >>> > >>> >>>>>>> ". It is long, so if you have a preference for a >> > > shorter >> > > > >>> name >> > > > >>> > >>> let me >> > > > >>> > >>> >>>>>> know >> > > > >>> > >>> >>>>>>> and we'll go with it instead. >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> Agreed, additional metrics will be helpful. I'll >> add >> > > > them >> > > > >>> to >> > > > >>> > the >> > > > >>> > >>> >>> KIP. >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> <<<need to work out exactly how this works with >> all >> > > SASL >> > > > >>> > >>> mechanisms >> > > > >>> > >>> >>>>>>> <<<not sure we have the data required on the >> broker >> > or >> > > a >> > > > >>> way of >> > > > >>> > >>> >>>>>>> <<< extending the [GSSAPI] mechanism >> > > > >>> > >>> >>>>>>> Not sure I agree here. The paragraph I added to >> the >> > > KIP >> > > > >>> > >>> describes >> > > > >>> > >>> >>> how >> > > > >>> > >>> >>>>>> I >> > > > >>> > >>> >>>>>>> think this can be done. Given that description, >> do >> > you >> > > > >>> still >> > > > >>> > >>> feel >> > > > >>> > >>> >>>>>> Kerberos >> > > > >>> > >>> >>>>>>> will not be possible? If Kerberos presents a >> > > significant >> > > > >>> > hurdle >> > > > >>> > >>> >>> then I >> > > > >>> > >>> >>>>>>> don't think that should prevent us from doing it >> for >> > > > other >> > > > >>> > >>> mechanisms >> > > > >>> > >>> >>>>>> -- I >> > > > >>> > >>> >>>>>>> would rather state that it isn't supported with >> > GSSAPI >> > > > due >> > > > >>> to >> > > > >>> > >>> >>>>>> limitations >> > > > >>> > >>> >>>>>>> in Kerberos itself than not have it for >> OAUTHBEARER, >> > > for >> > > > >>> > example. >> > > > >>> > >>> >>> And >> > > > >>> > >>> >>>>>>> right now I don't think we need more than a >> > high-level >> > > > >>> > >>> description of >> > > > >>> > >>> >>>>>> how >> > > > >>> > >>> >>>>>>> this could be done. In other words, I think we >> have >> > > this >> > > > >>> > >>> covered, >> > > > >>> > >>> >>>>>> unless >> > > > >>> > >>> >>>>>>> there is a fundamental problem due to Kerberos not >> > > making >> > > > >>> data >> > > > >>> > >>> (i.e. >> > > > >>> > >>> >>>>>> ticket >> > > > >>> > >>> >>>>>>> expiration) available on the server. I want to >> > submit >> > > > >>> this >> > > > >>> > KIP >> > > > >>> > >>> for >> > > > >>> > >>> >>> a >> > > > >>> > >>> >>>>>> vote >> > > > >>> > >>> >>>>>>> early next week in the hope of having it accepted >> by >> > > the >> > > > >>> > Monday, >> > > > >>> > >>> 9/24 >> > > > >>> > >>> >>>>>>> deadline for the 2.1.0 release, and if that >> happens I >> > > > >>> believe I >> > > > >>> > >>> can >> > > > >>> > >>> >>>>>> get a >> > > > >>> > >>> >>>>>>> PR into really good shape soon thereafter (I'm >> > working >> > > on >> > > > >>> it >> > > > >>> > >>> now). >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> Ron >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>> On Fri, Sep 14, 2018 at 5:57 AM Rajini Sivaram < >> > > > >>> > >>> >>>>>> rajinisiva...@gmail.com> >> > > > >>> > >>> >>>>>>> wrote: >> > > > >>> > >>> >>>>>>> >> > > > >>> > >>> >>>>>>>> Hi Ron, >> > > > >>> > >>> >>>>>>>> >> > > > >>> > >>> >>>>>>>> 1) Yes, we should update the version of >> > > > >>> > `SaslHandshakeRequest`. >> > > > >>> > >>> >>>>>> Clients >> > > > >>> > >>> >>>>>>>> would attempt to re-authenticate only if broker's >> > > > >>> > >>> >>>>>> SaslHandshakeRequest >> > > > >>> > >>> >>>>>>>> version >=2. >> > > > >>> > >>> >>>>>>>> 2) I think we should enable broker's >> > re-authentication >> > > > and >> > > > >>> > >>> >>> connection >> > > > >>> > >>> >>>>>>>> termination code regardless of client version. >> > > > >>> > Re-authentication >> > > > >>> > >>> >>>>>> could be >> > > > >>> > >>> >>>>>>>> always enabled since it is triggered only by >> clients >> > > and >> > > > >>> > broker >> > > > >>> > >>> >>> could >> > > > >>> > >>> >>>>>>> just >> > > > >>> > >>> >>>>>>>> handle it. Connection termination should be >> opt-in, >> > > but >> > > > >>> should >> > > > >>> > >>> work >> > > > >>> > >>> >>>>>> with >> > > > >>> > >>> >>>>>>>> clients of any version. If turned on and clients >> are >> > > of >> > > > an >> > > > >>> > older >> > > > >>> > >>> >>>>>> version, >> > > > >>> > >>> >>>>>>>> this would simply force reconnection, which >> should >> > be >> > > > ok. >> > > > >>> > >>> >>> Additional >> > > > >>> > >>> >>>>>>>> metrics to monitor expired connections may be >> useful >> > > > >>> anyway. >> > > > >>> > >>> >>>>>>>> >> > > > >>> > >>> >>>>>>>> We also need to work out exactly how this works >> with >> > > all >> > > > >>> SASL >> > > > >>> > >>> >>>>>> mechanisms. >> > > > >>> > >>> >>>>>>>> In particular, we have ensure that we can make it >> > work >> > > > >>> with >> > > > >>> > >>> >>> Kerberos >> > > > >>> > >>> >>>>>>> since >> > > > >>> > >>> >>>>>>>> we use the implementation from the JRE and I am >> not >> > > sure >> > > > >>> we >> > > > >>> > have >> > > > >>> > >>> >>> the >> > > > >>> > >>> >>>>>> data >> > > > >>> > >>> >>>>>>>> required on the broker or a way of extending the >> > > > >>> mechanism. >> > > > >>> > >>> >>>>>>>> >> > > > >>> > >>> >>>>>>>> Thoughts? >> > > > >>> > >>> >>>>>>>> >> > > > >>> > >>> >>>>>>>> On Thu, Sep 13, 2018 at 6:56 PM, Ron Dagostino < >> > > > >>> > >>> rndg...@gmail.com> >> > > > >>> > >>> >>>>>>> wrote: >> > > > >>> > >>> >>>>>>>> >> > > > >>> > >>> >>>>>>>>> Hi Rajini. I'm thinking about how we deal with >> > > > >>> migration. >> > > > >>> > >>> For >> > > > >>> > >>> >>>>>>> example, >> > > > >>> > >>> >>>>>>>>> let's say we have an existing 2.0.0 cluster >> using >> > > > >>> > >>> >>> SASL/OAUTHBEARER >> > > > >>> > >>> >>>>>> and >> > > > >>> > >>> >>>>>>> we >> > > > >>> > >>> >>>>>>>>> want to use this feature. The desired end >> state is >> > > to >> > > > >>> have >> > > > >>> > all >> > > > >>> > >>> >>>>>> clients >> > > > >>> > >>> >>>>>>>> and >> > > > >>> > >>> >>>>>>>>> brokers migrated to a version that supports the >> > > feature >> > > > >>> (call >> > > > >>> > >>> it >> > > > >>> > >>> >>>>>> 2.x) >> > > > >>> > >>> >>>>>>>> with >> > > > >>> > >>> >>>>>>>>> the feature turned on. We need to document the >> > > > supported >> > > > >>> > >>> path(s) >> > > > >>> > >>> >>>>>> to >> > > > >>> > >>> >>>>>>> this >> > > > >>> > >>> >>>>>>>>> end state. >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> Here are a couple of scenarios with >> implications: >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> 1) *Migrate client to 2.x and turn the feature >> on >> > but >> > > > >>> broker >> > > > >>> > is >> > > > >>> > >>> >>>>>> still >> > > > >>> > >>> >>>>>>> at >> > > > >>> > >>> >>>>>>>>> 2.0.* In this case the client is going to get >> an >> > > error >> > > > >>> when >> > > > >>> > it >> > > > >>> > >>> >>>>>> sends >> > > > >>> > >>> >>>>>>> the >> > > > >>> > >>> >>>>>>>>> SaslHandshakeRequest. We want to avoid this. >> It >> > > seems >> > > > >>> to me >> > > > >>> > >>> the >> > > > >>> > >>> >>>>>>> client >> > > > >>> > >>> >>>>>>>>> needs to know if the broker has been upgraded to >> > the >> > > > >>> > necessary >> > > > >>> > >>> >>>>>> version >> > > > >>> > >>> >>>>>>>>> before trying to re-authenticate, which makes me >> > > > believe >> > > > >>> we >> > > > >>> > >>> need >> > > > >>> > >>> >>> to >> > > > >>> > >>> >>>>>>> bump >> > > > >>> > >>> >>>>>>>>> the API version number in 2.x and the client >> has to >> > > > >>> check to >> > > > >>> > >>> see >> > > > >>> > >>> >>>>>> if the >> > > > >>> > >>> >>>>>>>> max >> > > > >>> > >>> >>>>>>>>> version the broker supports meets a minimum >> > standard >> > > > >>> before >> > > > >>> > >>> >>> trying >> > > > >>> > >>> >>>>>> to >> > > > >>> > >>> >>>>>>>>> re-authenticate. Do you agree? >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> 2) *Migrate broker to 2.x and turn the feature >> on >> > but >> > > > >>> client >> > > > >>> > is >> > > > >>> > >>> >>>>>> still >> > > > >>> > >>> >>>>>>> at >> > > > >>> > >>> >>>>>>>>> 2.0*. In this case the broker is going to end >> up >> > > > killing >> > > > >>> > >>> >>>>>> connections >> > > > >>> > >>> >>>>>>>>> periodically. Again, we want to avoid this. >> It's >> > > > >>> tempting >> > > > >>> > to >> > > > >>> > >>> >>> say >> > > > >>> > >>> >>>>>>>> "don't >> > > > >>> > >>> >>>>>>>>> do this" but I wonder if we can require >> > installations >> > > > to >> > > > >>> > >>> upgrade >> > > > >>> > >>> >>>>>> all >> > > > >>> > >>> >>>>>>>>> clients before turning the feature on at the >> > brokers. >> > > > >>> > >>> Certainly >> > > > >>> > >>> >>>>>> we can >> > > > >>> > >>> >>>>>>>> say >> > > > >>> > >>> >>>>>>>>> "don't do this" for inter-broker clients -- in >> > other >> > > > >>> words, >> > > > >>> > we >> > > > >>> > >>> >>> can >> > > > >>> > >>> >>>>>> say >> > > > >>> > >>> >>>>>>>> that >> > > > >>> > >>> >>>>>>>>> all brokers in a cluster should be upgraded >> before >> > > the >> > > > >>> > feature >> > > > >>> > >>> is >> > > > >>> > >>> >>>>>>> turned >> > > > >>> > >>> >>>>>>>> on >> > > > >>> > >>> >>>>>>>>> for any one of them -- but I don't know about >> our >> > > > >>> ability to >> > > > >>> > >>> say >> > > > >>> > >>> >>>>>> it for >> > > > >>> > >>> >>>>>>>>> non-broker clients. >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> Now we consider the cases where both the brokers >> > and >> > > > the >> > > > >>> > >>> clients >> > > > >>> > >>> >>>>>> have >> > > > >>> > >>> >>>>>>>> been >> > > > >>> > >>> >>>>>>>>> upgraded to 2.x. When and where should the >> feature >> > > be >> > > > >>> turned >> > > > >>> > >>> on? >> > > > >>> > >>> >>>>>> The >> > > > >>> > >>> >>>>>>>>> inter-broker case is interesting because I don't >> > > think >> > > > >>> think >> > > > >>> > we >> > > > >>> > >>> >>> can >> > > > >>> > >>> >>>>>>>> require >> > > > >>> > >>> >>>>>>>>> installations to restart every broker with a new >> > > config >> > > > >>> where >> > > > >>> > >>> the >> > > > >>> > >>> >>>>>>> feature >> > > > >>> > >>> >>>>>>>>> is turned on at the same time. Do you agree >> that >> > we >> > > > >>> cannot >> > > > >>> > >>> >>> require >> > > > >>> > >>> >>>>>>> this >> > > > >>> > >>> >>>>>>>>> and therefore must support installations >> restarting >> > > > >>> brokers >> > > > >>> > >>> with >> > > > >>> > >>> >>> a >> > > > >>> > >>> >>>>>> new >> > > > >>> > >>> >>>>>>>>> config at whatever pace they need -- which may >> be >> > > quite >> > > > >>> slow >> > > > >>> > >>> >>>>>> relative >> > > > >>> > >>> >>>>>>> to >> > > > >>> > >>> >>>>>>>>> token lifetimes? Assuming you do agree, then >> there >> > > is >> > > > >>> going >> > > > >>> > to >> > > > >>> > >>> >>> be >> > > > >>> > >>> >>>>>> the >> > > > >>> > >>> >>>>>>>> case >> > > > >>> > >>> >>>>>>>>> where some brokers are going to have the feature >> > > turned >> > > > >>> on >> > > > >>> > and >> > > > >>> > >>> >>> some >> > > > >>> > >>> >>>>>>>> clients >> > > > >>> > >>> >>>>>>>>> (definitely inter-broker clients at a minimum) >> are >> > > > going >> > > > >>> to >> > > > >>> > >>> have >> > > > >>> > >>> >>>>>> the >> > > > >>> > >>> >>>>>>>>> feature turned off. >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> The above discussion assumes a single config on >> the >> > > > >>> broker >> > > > >>> > side >> > > > >>> > >>> >>>>>> that >> > > > >>> > >>> >>>>>>>> turns >> > > > >>> > >>> >>>>>>>>> on both the inter-broker client >> re-authentication >> > > > >>> feature as >> > > > >>> > >>> well >> > > > >>> > >>> >>>>>> as >> > > > >>> > >>> >>>>>>> the >> > > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature, but >> > now >> > > > I'm >> > > > >>> > >>> thinking >> > > > >>> > >>> >>>>>> we >> > > > >>> > >>> >>>>>>> need >> > > > >>> > >>> >>>>>>>>> the ability to turn these features on >> > independently, >> > > > plus >> > > > >>> > maybe >> > > > >>> > >>> >>> we >> > > > >>> > >>> >>>>>>> need a >> > > > >>> > >>> >>>>>>>>> way to monitor the number of "active, expired" >> > > > >>> connections to >> > > > >>> > >>> the >> > > > >>> > >>> >>>>>>> server >> > > > >>> > >>> >>>>>>>> so >> > > > >>> > >>> >>>>>>>>> that operators can be sure that all clients have >> > been >> > > > >>> > >>> >>>>>> upgraded/enabled >> > > > >>> > >>> >>>>>>>>> before turning on the server-side >> > > > expired-connection-kill >> > > > >>> > >>> >>> feature. >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> So the migration plan would be as follows: >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> 1) Upgrade all brokers to 2.x. >> > > > >>> > >>> >>>>>>>>> 2) After all brokers are upgraded, turn on >> > > > >>> re-authentication >> > > > >>> > >>> for >> > > > >>> > >>> >>>>>> all >> > > > >>> > >>> >>>>>>>>> brokers at whatever rate is desired -- just >> > > eventually, >> > > > >>> at >> > > > >>> > some >> > > > >>> > >>> >>>>>> point, >> > > > >>> > >>> >>>>>>>> get >> > > > >>> > >>> >>>>>>>>> the client-side feature turned on for all >> brokers >> > so >> > > > that >> > > > >>> > >>> >>>>>> inter-broker >> > > > >>> > >>> >>>>>>>>> connections are re-authenticating. >> > > > >>> > >>> >>>>>>>>> 3) In parallel with (1) and (2) above, upgrade >> > > clients >> > > > >>> to 2.x >> > > > >>> > >>> and >> > > > >>> > >>> >>>>>> turn >> > > > >>> > >>> >>>>>>>>> their re-authentication feature on. Clients >> will >> > > check >> > > > >>> the >> > > > >>> > API >> > > > >>> > >>> >>>>>> version >> > > > >>> > >>> >>>>>>>> and >> > > > >>> > >>> >>>>>>>>> only re-authenticate to a broker that has also >> been >> > > > >>> upgraded >> > > > >>> > >>> >>> (note >> > > > >>> > >>> >>>>>> that >> > > > >>> > >>> >>>>>>>> the >> > > > >>> > >>> >>>>>>>>> ability of a broker to respond to a >> > re-authentication >> > > > >>> cannot >> > > > >>> > be >> > > > >>> > >>> >>>>>> turned >> > > > >>> > >>> >>>>>>>> off >> > > > >>> > >>> >>>>>>>>> -- it is always on beginning with version 2.x, >> and >> > it >> > > > >>> just >> > > > >>> > sits >> > > > >>> > >>> >>>>>> there >> > > > >>> > >>> >>>>>>>> doing >> > > > >>> > >>> >>>>>>>>> nothing if it isn't exercised by an enabled >> client) >> > > > >>> > >>> >>>>>>>>> 4) After (1), (2), and (3) are complete, check >> the >> > > 2.x >> > > > >>> broker >> > > > >>> > >>> >>>>>> metrics >> > > > >>> > >>> >>>>>>> to >> > > > >>> > >>> >>>>>>>>> confirm that there are no "active, expired" >> > > > connections. >> > > > >>> > Once >> > > > >>> > >>> >>> you >> > > > >>> > >>> >>>>>> are >> > > > >>> > >>> >>>>>>>>> satisfied that (1), (2), and (3) are indeed >> > complete >> > > > you >> > > > >>> can >> > > > >>> > >>> >>>>>> enable the >> > > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature on >> each >> > > > >>> broker >> > > > >>> > via >> > > > >>> > >>> a >> > > > >>> > >>> >>>>>>> restart >> > > > >>> > >>> >>>>>>>> at >> > > > >>> > >>> >>>>>>>>> whatever pace is desired. >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> Comments? >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> Ron >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>> On Wed, Sep 12, 2018 at 4:48 PM Ron Dagostino < >> > > > >>> > >>> rndg...@gmail.com >> > > > >>> > >>> >>>> >> > > > >>> > >>> >>>>>>> wrote: >> > > > >>> > >>> >>>>>>>>> >> > > > >>> > >>> >>>>>>>>>> Ok, I am tempted to just say we go with the >> > > low-level >> > > > >>> > approach >> > > > >>> > >>> >>>>>> since >> > > > >>> > >>> >>>>>>> it >> > > >