I should have said `security configs` instead of `channel configs`. The KIP says:
- The configuration option this KIP proposes to add to enable server-side expired-connection-kill is '*connections.max.reauth.ms <http://connections.max.reauth.ms>*' (not prefixed with listener prefix or SASL mechanism name – this is a single value for the cluster) - The '*connections.max.reauth.ms <http://connections.max.reauth.ms>*' configuration option will not be dynamically changeable; restarts will be required if the value is to be changed. However, if a new listener is dynamically added, the value could be set for that listener at that time. Those statements are contradictory. Perhaps the first one should say `it may be optionally prefixed with the listener name`? On Tue, Sep 18, 2018 at 3:55 PM, Ron Dagostino <rndg...@gmail.com> wrote: > HI Rajini. The KIP is updated as summarized below, and I will start a vote > immediately. > > <<<It may be useful to have a metric of expired connections killed > Ok, agreed. I called it expired-connections-killed-total > > <<<For `successful-v0-authentication-{rate,total}`, we probably want > <<<version as a tag rather in the name. > Ok, agreed. I kept existing metrics unchanged but added an additional tag > to the V0 metrics so they are separate. > > <<<Not sure if we need four of these > <<<(rate/total with success/failure). Perhaps just success/total is > sufficient? > Ok, agreed, just kept the successful total. > > <<<For the session lifetime config, we don't need to require a listener or > <<<mechanism prefix > Ok, agreed, the config is now cluster-wide. > > <<<For all channel configs, we allow an optional listener prefix, > <<<so we should do the same here > Not sure what this is referring to. We don't have channel configs here, > right? > > <<<The KIP says connections are terminated on requests not related to > <<<re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and > <<<SaslAuthenticateRequest). We can skip for ApiVersionsRequest for > <<<re-authentication, so that doesn't need to be in the list. > Yes, I was planning on that optimization; agreed, I removed it from the > list > > <<<we allow new listeners > <<<to be added dynamically and all configs for the listener can be added > <<<dynamically (with the listener prefix). I think we want to allow that > for > <<<this config > <<<We should mention this in the KIP, though in terms of implementation, I > <<<would leave that for a separate JIRA (it doesn't need to be implemented > at > <<<the same time). > Ok, agreed > > Thanks again for all the feedback and discussion. > > Ron > > On Tue, Sep 18, 2018 at 6:43 AM Rajini Sivaram <rajinisiva...@gmail.com> > wrote: > > > Hi Ron, > > > > Thanks for the updates. The KIP looks good. A few comments and minor > points > > below, but feel free to start vote to try and get it into 2.1.0. More > > community feedback will be really useful. > > > > 1) It may be useful to have a metric of expired connections killed by the > > broker. There could be a client implementation that doesn't support > > re-authentications, but happens to use the latest version of > > SaslAuthenticateRequest. Or cases where re-authentication didn't happen > on > > time. > > > > 2) For `successful-v0-authentication-{rate,total}`, we probably want > > version as a tag rather in the name. Not sure if we need four of these > > (rate/total with success/failure). Perhaps just success/total is > > sufficient? > > > > 3) For the session lifetime config, we don't need to require a listener > or > > mechanism prefix. In most cases, we would expect a single config on the > > broker-side. For all channel configs, we allow an optional listener > prefix, > > so we should do the same here. > > > > 4) The KIP says connections are terminated on requests not related to > > re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and > > SaslAuthenticateRequest). We can skip for ApiVersionsRequest for > > re-authentication, so that doesn't need to be in the list. > > > > 5) The KIP says that the new config will not be dynamically updatable. We > > have a very limited set of configs that are dynamically updatable for an > > existing listener. And we don't want to add this config to the list since > > we don't expect this value to change frequently. But we allow new > listeners > > to be added dynamically and all configs for the listener can be added > > dynamically (with the listener prefix). I think we want to allow that for > > this config (i.e. add a new OAuth listener with re-authentication > enabled). > > We should mention this in the KIP, though in terms of implementation, I > > would leave that for a separate JIRA (it doesn't need to be implemented > at > > the same time). > > > > > > > > On Tue, Sep 18, 2018 at 3:06 AM, Ron Dagostino <rndg...@gmail.com> > wrote: > > > > > HI again, Rajini. Would we ever want the max session time to be > > different > > > across different SASL mechanisms? I'm wondering, now that we are > > > supporting all SASL mechanisms via this KIP, if we still need to prefix > > > this config with the "[listener].[mechanism]." prefix. I've kept the > > > prefix in the KIP for now, but it would be easier to just set it once > for > > > all mechanisms, and I don't see that as being a problem. Let me know > > what > > > you think. > > > > > > Ron > > > > > > On Mon, Sep 17, 2018 at 9:51 PM Ron Dagostino <rndg...@gmail.com> > wrote: > > > > > > > Hi Rajini. The KIP is updated. Aside from a once-over to make sure > it > > > is > > > > all accurate, I think we need to confirm the metrics. The decision > to > > > not > > > > reject authentications that use tokens with too-long a lifetime > allowed > > > the > > > > metrics to be simpler. I decided that in addition to tracking these > > > > metrics on the broker: > > > > > > > > failed-reauthentication-{rate,total} and > > > > successful-reauthentication-{rate,total} > > > > > > > > we simply need one more set of broker metrics to track the subset of > > > > clients clients that are not upgraded to v2.1.0 and are still using a > > V0 > > > > SaslAuthenticateRequest: > > > > > > > > failed-v0-authentication-{rate,total} and > > > > successful-v0-authentication-{rate,total} > > > > > > > > See the Migration section of the KIP for details of how this would be > > > used. > > > > > > > > I wonder if we need a broker metric documenting the number of > "expired" > > > > sessions killed by the broker since it would be the same as > > > > successful-v0-authentication-total. I've eliminated that from the > KIP > > > for > > > > now. Thoughts? > > > > > > > > There is also a client-side metric for re-authentication latency > > tracking > > > > (still unnamed -- do you have a preference?) > > > > > > > > I think we're close to being able to put this KIP up for a vote. > > > > > > > > Ron > > > > > > > > > > > > On Mon, Sep 17, 2018 at 2:45 PM Ron Dagostino <rndg...@gmail.com> > > wrote: > > > > > > > >> <<<extra field alongside errors, not in the opaque body as mentioned > > in > > > >> the KIP > > > >> Right, that was a concern I had mentioned in a follow-up email. > Agree > > > it > > > >> should go alongside errors. > > > >> > > > >> <<<SCRAM for delegation token based authentication where credentials > > > have > > > >> <<<an expiry time, so we do need to be able to set individual > > credential > > > >> <<<lifetimes, where the server knows the expiry time but client may > > not > > > >> Ah, ok, I was not aware that this case existed. Agreed, for > > > consistency, > > > >> server will always send it back to the client. > > > >> > > > >> <<<I was trying to avoid this altogether [SaslClient negotiated > > property > > > >> for lifetime] > > > >> Since we now agree that the server will send it back, yes, there is > no > > > >> need for this. > > > >> > > > >> <<<wasn't entirely clear about the purpose of > > > >> [sasl.login.refresh.reauthenticate.enable] > > > >> I think this might have been more useful when we weren't necessarily > > > >> going to support all SASL mechanisms and/or when the broker was not > > > going > > > >> to advertise the fact that it supported re-authentication. You are > > > >> correct, now that we support it for all SASL mechanisms and we are > > > bumping > > > >> an API version, I think it is okay to simply enable it wherever both > > the > > > >> client and server meet the required versions. > > > >> > > > >> <<<wasn't entirely clear about the purpose of [ > > > connections.max.reauth.ms] > > > >> <<<wasn't expecting that we would reject > > > >> <<<tokens or tickets simply because they were too long-lived > > > >> <<<tickets are granted by a 3rd party authority > > > >> <<<Client re-authenticates even though token was not > > > >> <<<refreshed. Does this matter? > > > >> I was going under the assumption that it would matter, but based on > > your > > > >> pushback I just realized that the same functionality can be > > implemented > > > as > > > >> part of token validation if there is a desire to limit token > lifetimes > > > to a > > > >> certain max value (and the token validator has to be provided in > > > production > > > >> anyway since all we provide out-of-the-box is the unsecured > > > validator). So > > > >> I'm willing to abandon this check as part of re-authentication. > > > >> > > > >> I'll adjust the KIP accordingly a bit later. Thanks for the > continued > > > >> feedback/discussion. > > > >> > > > >> Ron > > > >> > > > >> > > > >> > > > >> > > > >> On Mon, Sep 17, 2018 at 2:10 PM Rajini Sivaram < > > rajinisiva...@gmail.com > > > > > > > >> wrote: > > > >> > > > >>> Hi Ron, > > > >>> > > > >>> > > > >>> *1) Is there a reason to communicate this value back to the client > > when > > > >>> the > > > >>> client already knows it? It's an extra network round-trip, and > since > > > the > > > >>> SASL round-tripsare defined by the spec I'm not certain adding an > > extra > > > >>> round-trip is acceptable.* > > > >>> > > > >>> I wasn't suggesting an extra round-trip for propagating session > > > >>> lifetime. I > > > >>> was expecting session lifetime to be added to the last > > > SASL_AUTHENTICATE > > > >>> response from the broker. Because SASL is a challenge-response > > > mechanism, > > > >>> SaslServer knows when the response being sent is the last one and > > hence > > > >>> can > > > >>> send the session lifetime in the response (in the same way as we > > > >>> propagate > > > >>> errors). I was expecting this to be added as an extra field > alongside > > > >>> errors, not in the opaque body as mentioned in the KIP. The opaque > > byte > > > >>> array strictly conforms to the SASL mechanism wire protocol and we > > want > > > >>> to > > > >>> keep it that way. > > > >>> > > > >>> As you have said, we don't need server to propagate session > lifetime > > > for > > > >>> OAUTHBEARER since client knows the token lifetime. But server also > > > knows > > > >>> the credential lifetime and by having the server decide the > lifetime, > > > we > > > >>> can use the same code path for all mechanisms. If we only have a > > > constant > > > >>> max lifetime in the server, for PLAIN and SCRAM we will end up > having > > > the > > > >>> same lifetime for all credentials with no ability to set actual > > expiry. > > > >>> We > > > >>> use SCRAM for delegation token based authentication where > credentials > > > >>> have > > > >>> an expiry time, so we do need to be able to set individual > credential > > > >>> lifetimes, where the server knows the expiry time but client may > not. > > > >>> > > > >>> *2) I also just realized that if the client is to learn the > > credential > > > >>> lifetime we wouldn't want to put special-case code in the > > Authenticator > > > >>> for > > > >>> GSSAPI and OAUTHBEARER; we would want to expose the value > > generically, > > > >>> probably as a negotiated property on the SaslClient instance.* > > > >>> > > > >>> I was trying to avoid this altogether. Client doesn't need to know > > > >>> credential lifetime. Server asks the client to re-authenticate > within > > > its > > > >>> session lifetime. > > > >>> > > > >>> 3) From the KIP, I wasn't entirely clear about the purpose of the > two > > > >>> configs: > > > >>> > > > >>> sasl.login.refresh.reauthenticate.enable: Do we need this? Client > > > knows > > > >>> if > > > >>> broker version supports re-authentication based on the > > > SASL_AUTHENTICATE > > > >>> version returned in ApiVersionsResponse. Client knows if broker is > > > >>> configured to enable re-authentication based on session lifetime > > > returned > > > >>> in SaslAuthenticateResponse. If broker has re-authentication > > configured > > > >>> and > > > >>> client supports re-authentication, you would always want > > > re-authenticate. > > > >>> So I wasn't sure why we need a config to opt in or out on the > > > >>> client-side. > > > >>> > > > >>> > > > >>> connections.max.reauth.ms: We obviously need a broker-side config. > > Not > > > >>> entirely sure about the semantics of the config that drives > > > >>> re-authentication. In particular, I wasn't expecting that we would > > > reject > > > >>> tokens or tickets simply because they were too long-lived. Since > > tokens > > > >>> or > > > >>> tickets are granted by a 3rd party authority, I am not sure if > > clients > > > >>> will > > > >>> always have control over the lifetime. Do we need to support any > more > > > >>> scenarios than these: > > > >>> > > > >>> > > > >>> A) reauth.ms=10,credential.lifetime.ms=10 : Broker sets > > > >>> session.lifetime=10, > > > >>> so this works. > > > >>> > > > >>> B) reauth.ms=10, credential.lifetime.ms=5 : Broker sets > > > >>> session.lifetime=5, > > > >>> so this works. > > > >>> C) reauth.ms=10, credential.lifetime.ms=20 : Broker sets > > > >>> session.lifetime=10. Client re-authenticates even though token was > > not > > > >>> refreshed. Does this matter? > > > >>> D) reauth.ms=Long.MAX_VALUE, credential.lifetime.ms=10: Broker > sets > > > >>> session.lifetime=10, client re-authenticates based on credential > > > expiry. > > > >>> E) reauth.ms=0 (default), credential.lifetime.ms=10 : Broker sets > > > >>> session.lifetime=0, Broker doesn't terminate sessions, client > doesn't > > > >>> re-authenticate. We generate useful metrics. > > > >>> F) reauth.ms=0 (default),no lifetime for credential (e.g. PLAIN): > > > Broker > > > >>> sets session.lifetime=0, Broker doesn't terminate sessions, client > > > >>> doesn't > > > >>> re-authenticate > > > >>> G) reauth.ms=10,no lifetime for credential (e.g. PLAIN) : Broker > > sets > > > >>> session.lifetime=10. Client re-authenticates. > > > >>> > > > >>> I would have thought that D) is the typical scenario for > > OAuth/Kerberos > > > >>> to > > > >>> respect token expiry time. G) would be typical scenario for PLAIN > to > > > >>> force > > > >>> re-authenication at regular intervals. A/B/C are useful to force > > > >>> re-authentication in scenarios where you might check for credential > > > >>> revocation in the server. And E/F are useful to disable > > > re-authentication > > > >>> and generate metrics (also the default behaviour useful during > > > >>> migration). > > > >>> Have I missed something? > > > >>> > > > >>> > > > >>> On Mon, Sep 17, 2018 at 4:27 PM, Ron Dagostino <rndg...@gmail.com> > > > >>> wrote: > > > >>> > > > >>> > Hi yet again, Rajini. I also just realized that if the client is > > to > > > >>> learn > > > >>> > the credential lifetime we wouldn't want to put special-case code > > in > > > >>> the > > > >>> > Authenticator for GSSAPI and OAUTHBEARER; we would want to expose > > the > > > >>> value > > > >>> > generically, probably as a negotiated property on the SaslClient > > > >>> instance. > > > >>> > We might be talking about making the negotiated property key part > > of > > > >>> the > > > >>> > public API. In other words, the SaslClient would be responsible > > for > > > >>> > exposing the credential (i.e. token or ticket) lifetime at a > > > well-known > > > >>> > negotiated property name, such as "Credential.Lifetime" and > > putting a > > > >>> Long > > > >>> > value there if there is a lifetime. That well-klnown key (e.g. > > > >>> > "Credential.Lifetime") would be part of the public API, right? > > > >>> > > > > >>> > Ron > > > >>> > > > > >>> > On Mon, Sep 17, 2018 at 11:03 AM Ron Dagostino < > rndg...@gmail.com> > > > >>> wrote: > > > >>> > > > > >>> > > Hi again, Rajini. After thinking about this a little while, it > > > >>> occurs to > > > >>> > > me that maybe the communication of max session lifetime should > > > occur > > > >>> via > > > >>> > > SASL_HANDSHAKE after all. Here's why. The value communicated > is > > > >>> the max > > > >>> > > session lifetime allowed, and the client can assume it is the > the > > > >>> session > > > >>> > > lifetime for that particular session unless the particular SASL > > > >>> mechanism > > > >>> > > could result in a shorter session that would be obvious to the > > > >>> client and > > > >>> > > the server. In particular, for OAUTHBEARER, the session > lifetime > > > >>> will be > > > >>> > > the token lifetime, which the client and server will both know. > > Is > > > >>> > there a > > > >>> > > reason to communicate this value back to the client when the > > client > > > >>> > already > > > >>> > > knows it? It's an extra network round-trip, and since the SASL > > > >>> > round-trips > > > >>> > > are defined by the spec I'm not certain adding an extra > > round-trip > > > is > > > >>> > > acceptable. Even if we decide we can add it, it helps with > > latency > > > >>> if we > > > >>> > > don't. > > > >>> > > > > > >>> > > Kerberos may be a bit different -- I don't know if the broker > can > > > >>> learn > > > >>> > > the session lifetime. If it can then the same thing holds -- > > both > > > >>> client > > > >>> > > and server will know the session lifetime and there is no > reason > > to > > > >>> > > communicate it back. If the server can't learn the lifetime > > then I > > > >>> don't > > > >>> > > think adding an extra SASL_AUTHENTICATE round trip is going to > > > help, > > > >>> > anyway. > > > >>> > > > > > >>> > > Also, by communicating the max session lifetime in the > > > SASL_HANDSHAKE > > > >>> > > response, both OAUTHBEARER and GSSAPI clients will be able to > > know > > > >>> before > > > >>> > > sending any SASL_AUTHENTICATE requests whether their credential > > > >>> violates > > > >>> > > the maximum. This allows a behaving client to give a good > error > > > >>> message. > > > >>> > > A malicious client would ignore the value and send a > longer-lived > > > >>> > > credential, and then that would be rejected on the server side. > > > >>> > > > > > >>> > > I'm still good with ExpiringCredential not needing to be > public. > > > >>> > > > > > >>> > > What do you think? > > > >>> > > > > > >>> > > Ron > > > >>> > > > > > >>> > > Ron > > > >>> > > > > > >>> > > Ron > > > >>> > > > > > >>> > > On Mon, Sep 17, 2018 at 9:13 AM Ron Dagostino < > rndg...@gmail.com > > > > > > >>> wrote: > > > >>> > > > > > >>> > >> Hi Rajini. I've updated the KIP to reflect the decision to > > fully > > > >>> > support > > > >>> > >> this for all SASL mechanisms and to not require the > > > >>> ExpiringCredential > > > >>> > >> interface to be public. > > > >>> > >> > > > >>> > >> Ron > > > >>> > >> > > > >>> > >> On Mon, Sep 17, 2018 at 6:55 AM Ron Dagostino < > > rndg...@gmail.com> > > > >>> > wrote: > > > >>> > >> > > > >>> > >>> Actually, I think the metric remains the same assuming the > > > >>> > >>> authentication fails when the token lifetime is too long. > > > >>> Negative > > > >>> > max > > > >>> > >>> config on server counts what would have been killed because > > maybe > > > >>> > client > > > >>> > >>> re-auth is not turned on; positive max enables the kill, > which > > is > > > >>> > counted > > > >>> > >>> by a second metric as proposed. The current proposal had > > already > > > >>> > stated > > > >>> > >>> that a non-zero value would disallow an authentication with a > > > token > > > >>> > that > > > >>> > >>> has too large a lifetime, and that is still the case, I > think. > > > But > > > >>> > let’s > > > >>> > >>> make sure we are on the same page about all this. > > > >>> > >>> > > > >>> > >>> Ron > > > >>> > >>> > > > >>> > >>> > On Sep 17, 2018, at 6:42 AM, Ron Dagostino < > > rndg...@gmail.com> > > > >>> > wrote: > > > >>> > >>> > > > > >>> > >>> > Hi Rajini. I see what you are saying. The background > login > > > >>> refresh > > > >>> > >>> thread does have to factor into the decision for OAUTHBEARER > > > >>> because > > > >>> > there > > > >>> > >>> is no new token to re-authenticate with until that refresh > > > succeeds > > > >>> > >>> (similarly with Kerberos), but I think you are right that the > > > >>> interface > > > >>> > >>> doesn’t necessarily have to be public. The server does > decide > > > the > > > >>> > time for > > > >>> > >>> PLAIN and the SCRAM-related mechanisms under the current > > > proposal, > > > >>> but > > > >>> > it > > > >>> > >>> is done in SaslHandshakeResponse, and at that point the > server > > > >>> won’t > > > >>> > have > > > >>> > >>> any token yet; making it happen via SaslAuthenticateRequest > at > > > the > > > >>> > very end > > > >>> > >>> does allow the server to know everything for all mechanisms. > > > >>> > >>> > > > > >>> > >>> > I see two potential issues to discuss. First, the server > > must > > > >>> > >>> communicate a time that exceeds the (token or ticket) refresh > > > >>> period > > > >>> > on the > > > >>> > >>> client. Assuming it communicates the expiration time of the > > > >>> > token/ticket, > > > >>> > >>> that’s the best it can do. So I think that’ll be fine. > > > >>> > >>> > > > > >>> > >>> > The second issue is what happens if the server is > configured > > to > > > >>> > accept > > > >>> > >>> a max value — say, an hour — and the token is good for > > longer. I > > > >>> > assumed > > > >>> > >>> that the client should not be allowed to authenticate in the > > > first > > > >>> > place > > > >>> > >>> because it could then simply re-authenticate with the same > > token > > > >>> after > > > >>> > an > > > >>> > >>> hour, which defeats the motivations for this KIP. So do we > > agree > > > >>> the > > > >>> > max > > > >>> > >>> token lifetime allowed will be enforced at authentication > time? > > > >>> > Assuming > > > >>> > >>> so, then we need to discuss migration. > > > >>> > >>> > > > > >>> > >>> > The current proposal includes a metric that can be used to > > > >>> identify > > > >>> > if > > > >>> > >>> an OAUTHBEARER client is misconfigured — the number of > > > connections > > > >>> that > > > >>> > >>> would have been killed (could be non-zero when the configured > > max > > > >>> is a > > > >>> > >>> negative number). Do we still want this type of an > indication, > > > and > > > >>> if > > > >>> > so, > > > >>> > >>> is it still done the same way — a negative number for max, > but > > > >>> instead > > > >>> > of > > > >>> > >>> counting the number of kills that would have been done it > > counts > > > >>> the > > > >>> > number > > > >>> > >>> of authentications that would have been failed? > > > >>> > >>> > > > > >>> > >>> > Ron > > > >>> > >>> > > > > >>> > >>> >> On Sep 17, 2018, at 6:06 AM, Rajini Sivaram < > > > >>> > rajinisiva...@gmail.com> > > > >>> > >>> wrote: > > > >>> > >>> >> > > > >>> > >>> >> Hi Ron, > > > >>> > >>> >> > > > >>> > >>> >> Thinking a bit more about other SASL mechanisms, I think > the > > > >>> issue > > > >>> > is > > > >>> > >>> that > > > >>> > >>> >> in the current proposal, clients decide the > > re-authentication > > > >>> > period. > > > >>> > >>> For > > > >>> > >>> >> mechanisms like PLAIN or SCRAM, we would actually want > > server > > > to > > > >>> > >>> determine > > > >>> > >>> >> the re-authentication interval. For example, the PLAIN or > > > SCRAM > > > >>> > >>> database > > > >>> > >>> >> could specify the expiry time for each principal, or the > > > broker > > > >>> > could > > > >>> > >>> be > > > >>> > >>> >> configured with a single expiry time for all principals of > > > that > > > >>> > >>> mechanism. > > > >>> > >>> >> For OAuth, brokers do know the expiry time. Haven't > figured > > > out > > > >>> what > > > >>> > >>> to do > > > >>> > >>> >> with Kerberos, but in any case for broker-side connection > > > >>> > termination > > > >>> > >>> on > > > >>> > >>> >> expiry, we need the broker to know/decide the expiry time. > > So > > > I > > > >>> > would > > > >>> > >>> like > > > >>> > >>> >> to suggest a slightly different approach to managing > > > >>> > >>> re-authentications. > > > >>> > >>> >> > > > >>> > >>> >> 1) Instead of changing SASL_HANDSHAKE version number, we > > bump > > > up > > > >>> > >>> >> SASL_AUTHENTICATE version number. > > > >>> > >>> >> 2) In the final SASL_AUTHENTICATE response, broker returns > > the > > > >>> > expiry > > > >>> > >>> time > > > >>> > >>> >> of this authenticated session. This is the interval after > > > which > > > >>> > >>> broker will > > > >>> > >>> >> terminate the connection If it wasn't re-authenticated. > > > >>> > >>> >> 3) We no longer need the public interface change to add ` > > > >>> > >>> >> > > org.apache.kafka.common.security.expiring.ExpiringCredential` > > > >>> for > > > >>> > the > > > >>> > >>> >> client-side. Instead we schedule the next > re-authentication > > on > > > >>> the > > > >>> > >>> client > > > >>> > >>> >> based on the expiry time provided by the server (some time > > > >>> earlier > > > >>> > >>> than the > > > >>> > >>> >> expiry). > > > >>> > >>> >> 4) If client uses SASL_AUTHENTICATE v0, broker will not > > return > > > >>> > expiry > > > >>> > >>> time. > > > >>> > >>> >> The connection will be terminated if that feature is > enabled > > > >>> (the > > > >>> > >>> same code > > > >>> > >>> >> path as client failing to re-authenticte on time). > > > >>> > >>> >> > > > >>> > >>> >> Thoughts? > > > >>> > >>> >> > > > >>> > >>> >>> On Sat, Sep 15, 2018 at 3:06 AM, Ron Dagostino < > > > >>> rndg...@gmail.com> > > > >>> > >>> wrote: > > > >>> > >>> >>> > > > >>> > >>> >>> Hi everyone. I've updated the KIP to reflect all > > discussion > > > to > > > >>> > date, > > > >>> > >>> >>> including the decision to go with the low-level approach. > > > This > > > >>> > >>> latest > > > >>> > >>> >>> version of the KIP includes the above " > > > >>> connections.max.reauth.ms" > > > >>> > >>> >>> proposal, > > > >>> > >>> >>> which I know has not been discussed yet. It mentions new > > > >>> metrics, > > > >>> > >>> some of > > > >>> > >>> >>> which may not have been discussed either (and names are > > > missing > > > >>> > from > > > >>> > >>> some > > > >>> > >>> >>> of them). Regardless, this new version is the closest > yet > > > to a > > > >>> > >>> version > > > >>> > >>> >>> that can be put to a vote next week. > > > >>> > >>> >>> > > > >>> > >>> >>> Ron > > > >>> > >>> >>> > > > >>> > >>> >>>> On Fri, Sep 14, 2018 at 8:59 PM Ron Dagostino < > > > >>> rndg...@gmail.com> > > > >>> > >>> wrote: > > > >>> > >>> >>>> > > > >>> > >>> >>>> Minor correction: I'm proposing " > > connections.max.reauth.ms" > > > >>> as > > > >>> > the > > > >>> > >>> >>>> broker-side configuration property, not " > > > >>> > >>> >>>> connections.max.expired.credentials.ms". > > > >>> > >>> >>>> > > > >>> > >>> >>>> Ron > > > >>> > >>> >>>> > > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 8:40 PM Ron Dagostino < > > > >>> rndg...@gmail.com > > > >>> > > > > > >>> > >>> wrote: > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> Hi Rajini. I'm going to choose * > > > >>> > >>> connections.max.expired.credentials.ms > > > >>> > >>> >>>>> <http://connections.max.expired.credentials.ms>* as > the > > > >>> option > > > >>> > >>> name > > > >>> > >>> >>>>> because it is consistent with the comment I made in the > > > >>> section > > > >>> > >>> about > > > >>> > >>> >>> how > > > >>> > >>> >>>>> to get the client and server to agree on credential > > > lifetime: > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> "We could add a new API call so that clients could ask > > > >>> servers > > > >>> > for > > > >>> > >>> the > > > >>> > >>> >>>>> lifetime they use, or we could extend the > > > >>> > >>> SaslHandshakeRequest/Response > > > >>> > >>> >>> API > > > >>> > >>> >>>>> call to include that information in the server's > > response – > > > >>> the > > > >>> > >>> client > > > >>> > >>> >>>>> would then adopt that value" > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> We set the config option value on the broker (with the > " > > > >>> > >>> >>>>> listener.name.mechanism." prefix), and we will return > the > > > >>> > >>> configured > > > >>> > >>> >>>>> value in the SaslHandshakeResponse (requiring a wire > > format > > > >>> > change > > > >>> > >>> in > > > >>> > >>> >>>>> addition to a version bump). The value (referred to as > > "X" > > > >>> > below) > > > >>> > >>> can > > > >>> > >>> >>> be > > > >>> > >>> >>>>> negative, zero, or positive and is to be interpreted as > > > >>> follows: > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> X = 0: Fully disable. The server will respond to > > > >>> > >>> re-authentications, > > > >>> > >>> >>> but > > > >>> > >>> >>>>> it won't kill any connections due to expiration, and it > > > won't > > > >>> > track > > > >>> > >>> >>> either > > > >>> > >>> >>>>> of the 2 metrics mentioned below. > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> Now, a couple of definitions for when X != 0: > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> 1) We define the *maximum allowed expiration time* to > be > > > the > > > >>> time > > > >>> > >>> >>>>> determined by the point when (re-)authentication occurs > > > plus > > > >>> |X| > > > >>> > >>> >>>>> milliseconds. > > > >>> > >>> >>>>> 2) We define the *requested expiration time* to be the > > > >>> maximum > > > >>> > >>> allowed > > > >>> > >>> >>>>> expiration time except for the case of an OAuth bearer > > > >>> token, in > > > >>> > >>> which > > > >>> > >>> >>> case > > > >>> > >>> >>>>> it is the point at which the token expires. (Kerberos > > > >>> presumably > > > >>> > >>> also > > > >>> > >>> >>>>> specifies a ticket lifetime, but frankly I am not in a > > > >>> position > > > >>> > to > > > >>> > >>> do > > > >>> > >>> >>> any > > > >>> > >>> >>>>> Kerberos-related coding in time for a 2.1.0 release and > > > would > > > >>> > >>> prefer to > > > >>> > >>> >>>>> ignore this piece of information for this KIP -- would > it > > > be > > > >>> > >>> acceptable > > > >>> > >>> >>> to > > > >>> > >>> >>>>> have someone else add it later?). > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> Based on these definitions, we define the behavior as > > > >>> follows: > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> *X > 0: Fully enable* > > > >>> > >>> >>>>> A) The server will reject any > > > >>> authentication/re-authentication > > > >>> > >>> attempt > > > >>> > >>> >>>>> when the requested expiration time is after the maximum > > > >>> allowed > > > >>> > >>> >>> expiration > > > >>> > >>> >>>>> time (which could only happen with an OAuth bearer > token, > > > >>> > assuming > > > >>> > >>> we > > > >>> > >>> >>> skip > > > >>> > >>> >>>>> dealing with Kerberos for now). > > > >>> > >>> >>>>> B) The server will kill connections that are used > beyond > > > the > > > >>> > >>> requested > > > >>> > >>> >>>>> expiration time. > > > >>> > >>> >>>>> C) A broker metric will be maintained that documents > the > > > >>> number > > > >>> > >>> >>>>> connections killed by the broker. This metric will be > > > >>> non-zero > > > >>> > if > > > >>> > >>> a > > > >>> > >>> >>> client > > > >>> > >>> >>>>> is connecting to the broker with re-authentication > either > > > >>> > >>> unavailable > > > >>> > >>> >>> (i.e. > > > >>> > >>> >>>>> an older client) or disabled. > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> *X < 0: Partially enable* > > > >>> > >>> >>>>> A) Same as above: the server will reject any > > > >>> > >>> >>>>> authentication/re-authentication attempt when the > > > requested > > > >>> > >>> expiration > > > >>> > >>> >>> time > > > >>> > >>> >>>>> is after the maximum allowed expiration time (which > could > > > >>> only > > > >>> > >>> happen > > > >>> > >>> >>> with > > > >>> > >>> >>>>> an OAuth bearer token, assuming we skip dealing with > > > >>> Kerberos for > > > >>> > >>> now). > > > >>> > >>> >>>>> B) The server will **not** kill connections that are > used > > > >>> beyond > > > >>> > >>> the > > > >>> > >>> >>>>> requested expiration time. > > > >>> > >>> >>>>> C) A broker metric will be maintained that documents > the > > > >>> number > > > >>> > of > > > >>> > >>> API > > > >>> > >>> >>>>> requests unrelated to re-authentication that are made > > over > > > a > > > >>> > >>> connection > > > >>> > >>> >>>>> beyond the requested expiration time. This metric will > > be > > > >>> > helpful > > > >>> > >>> for > > > >>> > >>> >>>>> operators to ensure that all clients are properly > > upgraded > > > >>> and > > > >>> > >>> >>>>> re-authenticating before fully enabling the server-side > > > >>> > >>> >>>>> expired-connection-kill functionality (i.e. by changing > > the > > > >>> value > > > >>> > >>> from a > > > >>> > >>> >>>>> negative number to a positive number): this metric will > > be > > > >>> zero > > > >>> > >>> across > > > >>> > >>> >>> all > > > >>> > >>> >>>>> brokers when it is safe to fully enable the server-side > > > >>> feature. > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> Thoughts? > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> Ron > > > >>> > >>> >>>>> > > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 11:59 AM Rajini Sivaram < > > > >>> > >>> >>> rajinisiva...@gmail.com> > > > >>> > >>> >>>>> wrote: > > > >>> > >>> >>>>> > > > >>> > >>> >>>>>> It will be good to shorten the config name. We have a > > > config > > > >>> > >>> named ` > > > >>> > >>> >>>>>> connections.max.idle.ms`. We could add something > like ` > > > >>> > >>> >>>>>> connections.max.expired.credentials.ms`. As you said, > > it > > > >>> would > > > >>> > be > > > >>> > >>> >>>>>> prefixed > > > >>> > >>> >>>>>> with listener and SASL mechanism name. We should be > able > > > to > > > >>> > >>> support > > > >>> > >>> >>>>>> connection termination in future even with SSL, so > > perhaps > > > >>> we > > > >>> > >>> don't > > > >>> > >>> >>> want > > > >>> > >>> >>>>>> the `sasl.server.` prefix (we just need to validate > > > whether > > > >>> the > > > >>> > >>> config > > > >>> > >>> >>> is > > > >>> > >>> >>>>>> supported for the protocol). You can choose whether a > > > >>> boolean > > > >>> > >>> flag or a > > > >>> > >>> >>>>>> time interval makes more sense. > > > >>> > >>> >>>>>> > > > >>> > >>> >>>>>> For the client-side, the KIP explains how to make it > > work > > > >>> for > > > >>> > >>> other > > > >>> > >>> >>>>>> mechanisms and we can leave it that. Not convinced > about > > > >>> > >>> server-side > > > >>> > >>> >>>>>> though. It seems to me that we probably would require > > API > > > >>> > changes > > > >>> > >>> to > > > >>> > >>> >>> make > > > >>> > >>> >>>>>> it work. Basically the proposed approach works only > for > > > >>> > >>> OAUTHBEARER. > > > >>> > >>> >>>>>> Since > > > >>> > >>> >>>>>> we have to bump up SaslHandshakeRequest version in > this > > > >>> KIP, it > > > >>> > >>> will be > > > >>> > >>> >>>>>> good to work out if we need to change the request or > > flow > > > to > > > >>> > make > > > >>> > >>> this > > > >>> > >>> >>>>>> work > > > >>> > >>> >>>>>> for other mechanisms. I haven't figured out exactly > what > > > is > > > >>> > >>> needed, but > > > >>> > >>> >>>>>> will think about it and get back next week. In the > > > >>> meantime, you > > > >>> > >>> can > > > >>> > >>> >>> get > > > >>> > >>> >>>>>> the KIP up-to-date with the config, migration path > etc. > > > and > > > >>> get > > > >>> > it > > > >>> > >>> >>> ready > > > >>> > >>> >>>>>> to > > > >>> > >>> >>>>>> start vote next week. > > > >>> > >>> >>>>>> > > > >>> > >>> >>>>>> > > > >>> > >>> >>>>>> > > > >>> > >>> >>>>>> > > > >>> > >>> >>>>>> > > > >>> > >>> >>>>>> On Fri, Sep 14, 2018 at 1:24 PM, Ron Dagostino < > > > >>> > rndg...@gmail.com > > > >>> > >>> > > > > >>> > >>> >>>>>> wrote: > > > >>> > >>> >>>>>> > > > >>> > >>> >>>>>>> Hi Rajini. > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> Agreed, we will bump the SaslHandshakeRequest API > > number > > > >>> (no > > > >>> > wire > > > >>> > >>> >>>>>> format > > > >>> > >>> >>>>>>> change, of course). > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> Agreed, responding to re-authentication will always > be > > > >>> enabled > > > >>> > >>> on the > > > >>> > >>> >>>>>>> broker since it is initiated by the client. > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> Agreed, connection termination by the broker will be > > > >>> opt-in. > > > >>> > I'm > > > >>> > >>> >>>>>> thinking > > > >>> > >>> >>>>>>> *sasl.server.disconnect.expired.credential. > connections. > > > >>> > enable*, > > > >>> > >>> >>>>>> prefixed > > > >>> > >>> >>>>>>> with listener and SASL mechanism name in lower-case > so > > it > > > >>> can > > > >>> > be > > > >>> > >>> >>>>>> opt-in at > > > >>> > >>> >>>>>>> the granularity of a SASL mechanism; for example, " > > > >>> > >>> >>>>>>> listener.name.sasl_ssl.oauthbearer.sasl.server. > > > >>> > >>> >>>>>>> disconnect.expired.credential. > connections.enable=true > > > >>> > >>> >>>>>>> ". It is long, so if you have a preference for a > > shorter > > > >>> name > > > >>> > >>> let me > > > >>> > >>> >>>>>> know > > > >>> > >>> >>>>>>> and we'll go with it instead. > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> Agreed, additional metrics will be helpful. I'll add > > > them > > > >>> to > > > >>> > the > > > >>> > >>> >>> KIP. > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> <<<need to work out exactly how this works with all > > SASL > > > >>> > >>> mechanisms > > > >>> > >>> >>>>>>> <<<not sure we have the data required on the broker > or > > a > > > >>> way of > > > >>> > >>> >>>>>>> <<< extending the [GSSAPI] mechanism > > > >>> > >>> >>>>>>> Not sure I agree here. The paragraph I added to the > > KIP > > > >>> > >>> describes > > > >>> > >>> >>> how > > > >>> > >>> >>>>>> I > > > >>> > >>> >>>>>>> think this can be done. Given that description, do > you > > > >>> still > > > >>> > >>> feel > > > >>> > >>> >>>>>> Kerberos > > > >>> > >>> >>>>>>> will not be possible? If Kerberos presents a > > significant > > > >>> > hurdle > > > >>> > >>> >>> then I > > > >>> > >>> >>>>>>> don't think that should prevent us from doing it for > > > other > > > >>> > >>> mechanisms > > > >>> > >>> >>>>>> -- I > > > >>> > >>> >>>>>>> would rather state that it isn't supported with > GSSAPI > > > due > > > >>> to > > > >>> > >>> >>>>>> limitations > > > >>> > >>> >>>>>>> in Kerberos itself than not have it for OAUTHBEARER, > > for > > > >>> > example. > > > >>> > >>> >>> And > > > >>> > >>> >>>>>>> right now I don't think we need more than a > high-level > > > >>> > >>> description of > > > >>> > >>> >>>>>> how > > > >>> > >>> >>>>>>> this could be done. In other words, I think we have > > this > > > >>> > >>> covered, > > > >>> > >>> >>>>>> unless > > > >>> > >>> >>>>>>> there is a fundamental problem due to Kerberos not > > making > > > >>> data > > > >>> > >>> (i.e. > > > >>> > >>> >>>>>> ticket > > > >>> > >>> >>>>>>> expiration) available on the server. I want to > submit > > > >>> this > > > >>> > KIP > > > >>> > >>> for > > > >>> > >>> >>> a > > > >>> > >>> >>>>>> vote > > > >>> > >>> >>>>>>> early next week in the hope of having it accepted by > > the > > > >>> > Monday, > > > >>> > >>> 9/24 > > > >>> > >>> >>>>>>> deadline for the 2.1.0 release, and if that happens I > > > >>> believe I > > > >>> > >>> can > > > >>> > >>> >>>>>> get a > > > >>> > >>> >>>>>>> PR into really good shape soon thereafter (I'm > working > > on > > > >>> it > > > >>> > >>> now). > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> Ron > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>> On Fri, Sep 14, 2018 at 5:57 AM Rajini Sivaram < > > > >>> > >>> >>>>>> rajinisiva...@gmail.com> > > > >>> > >>> >>>>>>> wrote: > > > >>> > >>> >>>>>>> > > > >>> > >>> >>>>>>>> Hi Ron, > > > >>> > >>> >>>>>>>> > > > >>> > >>> >>>>>>>> 1) Yes, we should update the version of > > > >>> > `SaslHandshakeRequest`. > > > >>> > >>> >>>>>> Clients > > > >>> > >>> >>>>>>>> would attempt to re-authenticate only if broker's > > > >>> > >>> >>>>>> SaslHandshakeRequest > > > >>> > >>> >>>>>>>> version >=2. > > > >>> > >>> >>>>>>>> 2) I think we should enable broker's > re-authentication > > > and > > > >>> > >>> >>> connection > > > >>> > >>> >>>>>>>> termination code regardless of client version. > > > >>> > Re-authentication > > > >>> > >>> >>>>>> could be > > > >>> > >>> >>>>>>>> always enabled since it is triggered only by clients > > and > > > >>> > broker > > > >>> > >>> >>> could > > > >>> > >>> >>>>>>> just > > > >>> > >>> >>>>>>>> handle it. Connection termination should be opt-in, > > but > > > >>> should > > > >>> > >>> work > > > >>> > >>> >>>>>> with > > > >>> > >>> >>>>>>>> clients of any version. If turned on and clients are > > of > > > an > > > >>> > older > > > >>> > >>> >>>>>> version, > > > >>> > >>> >>>>>>>> this would simply force reconnection, which should > be > > > ok. > > > >>> > >>> >>> Additional > > > >>> > >>> >>>>>>>> metrics to monitor expired connections may be useful > > > >>> anyway. > > > >>> > >>> >>>>>>>> > > > >>> > >>> >>>>>>>> We also need to work out exactly how this works with > > all > > > >>> SASL > > > >>> > >>> >>>>>> mechanisms. > > > >>> > >>> >>>>>>>> In particular, we have ensure that we can make it > work > > > >>> with > > > >>> > >>> >>> Kerberos > > > >>> > >>> >>>>>>> since > > > >>> > >>> >>>>>>>> we use the implementation from the JRE and I am not > > sure > > > >>> we > > > >>> > have > > > >>> > >>> >>> the > > > >>> > >>> >>>>>> data > > > >>> > >>> >>>>>>>> required on the broker or a way of extending the > > > >>> mechanism. > > > >>> > >>> >>>>>>>> > > > >>> > >>> >>>>>>>> Thoughts? > > > >>> > >>> >>>>>>>> > > > >>> > >>> >>>>>>>> On Thu, Sep 13, 2018 at 6:56 PM, Ron Dagostino < > > > >>> > >>> rndg...@gmail.com> > > > >>> > >>> >>>>>>> wrote: > > > >>> > >>> >>>>>>>> > > > >>> > >>> >>>>>>>>> Hi Rajini. I'm thinking about how we deal with > > > >>> migration. > > > >>> > >>> For > > > >>> > >>> >>>>>>> example, > > > >>> > >>> >>>>>>>>> let's say we have an existing 2.0.0 cluster using > > > >>> > >>> >>> SASL/OAUTHBEARER > > > >>> > >>> >>>>>> and > > > >>> > >>> >>>>>>> we > > > >>> > >>> >>>>>>>>> want to use this feature. The desired end state is > > to > > > >>> have > > > >>> > all > > > >>> > >>> >>>>>> clients > > > >>> > >>> >>>>>>>> and > > > >>> > >>> >>>>>>>>> brokers migrated to a version that supports the > > feature > > > >>> (call > > > >>> > >>> it > > > >>> > >>> >>>>>> 2.x) > > > >>> > >>> >>>>>>>> with > > > >>> > >>> >>>>>>>>> the feature turned on. We need to document the > > > supported > > > >>> > >>> path(s) > > > >>> > >>> >>>>>> to > > > >>> > >>> >>>>>>> this > > > >>> > >>> >>>>>>>>> end state. > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> Here are a couple of scenarios with implications: > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> 1) *Migrate client to 2.x and turn the feature on > but > > > >>> broker > > > >>> > is > > > >>> > >>> >>>>>> still > > > >>> > >>> >>>>>>> at > > > >>> > >>> >>>>>>>>> 2.0.* In this case the client is going to get an > > error > > > >>> when > > > >>> > it > > > >>> > >>> >>>>>> sends > > > >>> > >>> >>>>>>> the > > > >>> > >>> >>>>>>>>> SaslHandshakeRequest. We want to avoid this. It > > seems > > > >>> to me > > > >>> > >>> the > > > >>> > >>> >>>>>>> client > > > >>> > >>> >>>>>>>>> needs to know if the broker has been upgraded to > the > > > >>> > necessary > > > >>> > >>> >>>>>> version > > > >>> > >>> >>>>>>>>> before trying to re-authenticate, which makes me > > > believe > > > >>> we > > > >>> > >>> need > > > >>> > >>> >>> to > > > >>> > >>> >>>>>>> bump > > > >>> > >>> >>>>>>>>> the API version number in 2.x and the client has to > > > >>> check to > > > >>> > >>> see > > > >>> > >>> >>>>>> if the > > > >>> > >>> >>>>>>>> max > > > >>> > >>> >>>>>>>>> version the broker supports meets a minimum > standard > > > >>> before > > > >>> > >>> >>> trying > > > >>> > >>> >>>>>> to > > > >>> > >>> >>>>>>>>> re-authenticate. Do you agree? > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> 2) *Migrate broker to 2.x and turn the feature on > but > > > >>> client > > > >>> > is > > > >>> > >>> >>>>>> still > > > >>> > >>> >>>>>>> at > > > >>> > >>> >>>>>>>>> 2.0*. In this case the broker is going to end up > > > killing > > > >>> > >>> >>>>>> connections > > > >>> > >>> >>>>>>>>> periodically. Again, we want to avoid this. It's > > > >>> tempting > > > >>> > to > > > >>> > >>> >>> say > > > >>> > >>> >>>>>>>> "don't > > > >>> > >>> >>>>>>>>> do this" but I wonder if we can require > installations > > > to > > > >>> > >>> upgrade > > > >>> > >>> >>>>>> all > > > >>> > >>> >>>>>>>>> clients before turning the feature on at the > brokers. > > > >>> > >>> Certainly > > > >>> > >>> >>>>>> we can > > > >>> > >>> >>>>>>>> say > > > >>> > >>> >>>>>>>>> "don't do this" for inter-broker clients -- in > other > > > >>> words, > > > >>> > we > > > >>> > >>> >>> can > > > >>> > >>> >>>>>> say > > > >>> > >>> >>>>>>>> that > > > >>> > >>> >>>>>>>>> all brokers in a cluster should be upgraded before > > the > > > >>> > feature > > > >>> > >>> is > > > >>> > >>> >>>>>>> turned > > > >>> > >>> >>>>>>>> on > > > >>> > >>> >>>>>>>>> for any one of them -- but I don't know about our > > > >>> ability to > > > >>> > >>> say > > > >>> > >>> >>>>>> it for > > > >>> > >>> >>>>>>>>> non-broker clients. > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> Now we consider the cases where both the brokers > and > > > the > > > >>> > >>> clients > > > >>> > >>> >>>>>> have > > > >>> > >>> >>>>>>>> been > > > >>> > >>> >>>>>>>>> upgraded to 2.x. When and where should the feature > > be > > > >>> turned > > > >>> > >>> on? > > > >>> > >>> >>>>>> The > > > >>> > >>> >>>>>>>>> inter-broker case is interesting because I don't > > think > > > >>> think > > > >>> > we > > > >>> > >>> >>> can > > > >>> > >>> >>>>>>>> require > > > >>> > >>> >>>>>>>>> installations to restart every broker with a new > > config > > > >>> where > > > >>> > >>> the > > > >>> > >>> >>>>>>> feature > > > >>> > >>> >>>>>>>>> is turned on at the same time. Do you agree that > we > > > >>> cannot > > > >>> > >>> >>> require > > > >>> > >>> >>>>>>> this > > > >>> > >>> >>>>>>>>> and therefore must support installations restarting > > > >>> brokers > > > >>> > >>> with > > > >>> > >>> >>> a > > > >>> > >>> >>>>>> new > > > >>> > >>> >>>>>>>>> config at whatever pace they need -- which may be > > quite > > > >>> slow > > > >>> > >>> >>>>>> relative > > > >>> > >>> >>>>>>> to > > > >>> > >>> >>>>>>>>> token lifetimes? Assuming you do agree, then there > > is > > > >>> going > > > >>> > to > > > >>> > >>> >>> be > > > >>> > >>> >>>>>> the > > > >>> > >>> >>>>>>>> case > > > >>> > >>> >>>>>>>>> where some brokers are going to have the feature > > turned > > > >>> on > > > >>> > and > > > >>> > >>> >>> some > > > >>> > >>> >>>>>>>> clients > > > >>> > >>> >>>>>>>>> (definitely inter-broker clients at a minimum) are > > > going > > > >>> to > > > >>> > >>> have > > > >>> > >>> >>>>>> the > > > >>> > >>> >>>>>>>>> feature turned off. > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> The above discussion assumes a single config on the > > > >>> broker > > > >>> > side > > > >>> > >>> >>>>>> that > > > >>> > >>> >>>>>>>> turns > > > >>> > >>> >>>>>>>>> on both the inter-broker client re-authentication > > > >>> feature as > > > >>> > >>> well > > > >>> > >>> >>>>>> as > > > >>> > >>> >>>>>>> the > > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature, but > now > > > I'm > > > >>> > >>> thinking > > > >>> > >>> >>>>>> we > > > >>> > >>> >>>>>>> need > > > >>> > >>> >>>>>>>>> the ability to turn these features on > independently, > > > plus > > > >>> > maybe > > > >>> > >>> >>> we > > > >>> > >>> >>>>>>> need a > > > >>> > >>> >>>>>>>>> way to monitor the number of "active, expired" > > > >>> connections to > > > >>> > >>> the > > > >>> > >>> >>>>>>> server > > > >>> > >>> >>>>>>>> so > > > >>> > >>> >>>>>>>>> that operators can be sure that all clients have > been > > > >>> > >>> >>>>>> upgraded/enabled > > > >>> > >>> >>>>>>>>> before turning on the server-side > > > expired-connection-kill > > > >>> > >>> >>> feature. > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> So the migration plan would be as follows: > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> 1) Upgrade all brokers to 2.x. > > > >>> > >>> >>>>>>>>> 2) After all brokers are upgraded, turn on > > > >>> re-authentication > > > >>> > >>> for > > > >>> > >>> >>>>>> all > > > >>> > >>> >>>>>>>>> brokers at whatever rate is desired -- just > > eventually, > > > >>> at > > > >>> > some > > > >>> > >>> >>>>>> point, > > > >>> > >>> >>>>>>>> get > > > >>> > >>> >>>>>>>>> the client-side feature turned on for all brokers > so > > > that > > > >>> > >>> >>>>>> inter-broker > > > >>> > >>> >>>>>>>>> connections are re-authenticating. > > > >>> > >>> >>>>>>>>> 3) In parallel with (1) and (2) above, upgrade > > clients > > > >>> to 2.x > > > >>> > >>> and > > > >>> > >>> >>>>>> turn > > > >>> > >>> >>>>>>>>> their re-authentication feature on. Clients will > > check > > > >>> the > > > >>> > API > > > >>> > >>> >>>>>> version > > > >>> > >>> >>>>>>>> and > > > >>> > >>> >>>>>>>>> only re-authenticate to a broker that has also been > > > >>> upgraded > > > >>> > >>> >>> (note > > > >>> > >>> >>>>>> that > > > >>> > >>> >>>>>>>> the > > > >>> > >>> >>>>>>>>> ability of a broker to respond to a > re-authentication > > > >>> cannot > > > >>> > be > > > >>> > >>> >>>>>> turned > > > >>> > >>> >>>>>>>> off > > > >>> > >>> >>>>>>>>> -- it is always on beginning with version 2.x, and > it > > > >>> just > > > >>> > sits > > > >>> > >>> >>>>>> there > > > >>> > >>> >>>>>>>> doing > > > >>> > >>> >>>>>>>>> nothing if it isn't exercised by an enabled client) > > > >>> > >>> >>>>>>>>> 4) After (1), (2), and (3) are complete, check the > > 2.x > > > >>> broker > > > >>> > >>> >>>>>> metrics > > > >>> > >>> >>>>>>> to > > > >>> > >>> >>>>>>>>> confirm that there are no "active, expired" > > > connections. > > > >>> > Once > > > >>> > >>> >>> you > > > >>> > >>> >>>>>> are > > > >>> > >>> >>>>>>>>> satisfied that (1), (2), and (3) are indeed > complete > > > you > > > >>> can > > > >>> > >>> >>>>>> enable the > > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature on each > > > >>> broker > > > >>> > via > > > >>> > >>> a > > > >>> > >>> >>>>>>> restart > > > >>> > >>> >>>>>>>> at > > > >>> > >>> >>>>>>>>> whatever pace is desired. > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> Comments? > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> Ron > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>> On Wed, Sep 12, 2018 at 4:48 PM Ron Dagostino < > > > >>> > >>> rndg...@gmail.com > > > >>> > >>> >>>> > > > >>> > >>> >>>>>>> wrote: > > > >>> > >>> >>>>>>>>> > > > >>> > >>> >>>>>>>>>> Ok, I am tempted to just say we go with the > > low-level > > > >>> > approach > > > >>> > >>> >>>>>> since > > > >>> > >>> >>>>>>> it > > > >>> > >>> >>>>>>>>> is > > > >>> > >>> >>>>>>>>>> the quickest and seems to meet the clear > > requirements. > > > >>> We > > > >>> > can > > > >>> > >>> >>>>>> always > > > >>> > >>> >>>>>>>>> adjust > > > >>> > >>> >>>>>>>>>> later if we get to clarity on other requirements > or > > we > > > >>> > decide > > > >>> > >>> >>> we > > > >>> > >>> >>>>>> need > > > >>> > >>> >>>>>>>> to > > > >>> > >>> >>>>>>>>>> approach it differently for whatever reason. But > in > > > the > > > >>> > >>> >>>>>> meantime, > > > >>> > >>> >>>>>>>> before > > > >>> > >>> >>>>>>>>>> fully committing to this decision, I would > > appreciate > > > >>> > another > > > >>> > >>> >>>>>>>> perspective > > > >>> > >>> >>>>>>>>>> if someone has one. > > > >>> > >>> >>>>>>>>>> > > > >>> > >>> >>>>>>>>>> Ron > > > >>> > >>> >>>>>>>>>> > > > >>> > >>> >>>>>>>>>> On Wed, Sep 12, 2018 at 3:15 PM Rajini Sivaram < > > > >>> > >>> >>>>>>>> rajinisiva...@gmail.com> > > > >>> > >>> >>>>>>>>>> wrote: > > > >>> > >>> >>>>>>>>>> > > > >>> > >>> >>>>>>>>>>> Hi Ron, > > > >>> > >>> >>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>> Yes, I would leave out retries from this KIP for > > now. > > > >>> In > > > >>> > the > > > >>> > >>> >>>>>> future, > > > >>> > >>> >>>>>>>> if > > > >>> > >>> >>>>>>>>>>> there is a requirement for supporting retries, we > > can > > > >>> > >>> consider > > > >>> > >>> >>>>>> it. I > > > >>> > >>> >>>>>>>>> think > > > >>> > >>> >>>>>>>>>>> we can support retries with either approach if we > > > >>> needed > > > >>> > to, > > > >>> > >>> >>>>>> but it > > > >>> > >>> >>>>>>>>> would > > > >>> > >>> >>>>>>>>>>> be better to do it along with other changes > > required > > > to > > > >>> > >>> >>> support > > > >>> > >>> >>>>>>>>>>> authentication servers that are not highly > > available. > > > >>> > >>> >>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>> For maintainability, I am biased, so it will be > > good > > > >>> to get > > > >>> > >>> >>> the > > > >>> > >>> >>>>>>>>>>> perspective > > > >>> > >>> >>>>>>>>>>> of others in the community :-) > > > >>> > >>> >>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>> On Wed, Sep 12, 2018 at 7:47 PM, Ron Dagostino < > > > >>> > >>> >>>>>> rndg...@gmail.com> > > > >>> > >>> >>>>>>>>> wrote: > > > >>> > >>> >>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> Hi Rajini. Here is some feedback/some things I > > > >>> thought > > > >>> > of. > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> First, I just realized that from a timing > > > perspective > > > >>> > that I > > > >>> > >>> >>>>>> am > > > >>> > >>> >>>>>>> not > > > >>> > >>> >>>>>>>>> sure > > > >>> > >>> >>>>>>>>>>>> retry is going to be necessary. The background > > > login > > > >>> > >>> >>> refresh > > > >>> > >>> >>>>>>> thread > > > >>> > >>> >>>>>>>>>>>> triggers re-authentication when it refreshes the > > > >>> client's > > > >>> > >>> >>>>>>>> credential. > > > >>> > >>> >>>>>>>>>>> The > > > >>> > >>> >>>>>>>>>>>> OAuth infrastructure has to be available in > order > > > for > > > >>> this > > > >>> > >>> >>>>>> refresh > > > >>> > >>> >>>>>>>> to > > > >>> > >>> >>>>>>>>>>>> succeed (the background thread repeatedly > retries > > if > > > >>> it > > > >>> > >>> >>> can't > > > >>> > >>> >>>>>>>> refresh > > > >>> > >>> >>>>>>>>>>> the > > > >>> > >>> >>>>>>>>>>>> credential, and that retry loop handles any > > > temporary > > > >>> > >>> >>>>>> outage). If > > > >>> > >>> >>>>>>>>>>> clients > > > >>> > >>> >>>>>>>>>>>> are told to re-authenticate when the credential > is > > > >>> > refreshed > > > >>> > >>> >>>>>> **and > > > >>> > >>> >>>>>>>>> they > > > >>> > >>> >>>>>>>>>>>> actually re-authenticate at that point** then it > > is > > > >>> highly > > > >>> > >>> >>>>>>> unlikely > > > >>> > >>> >>>>>>>>> that > > > >>> > >>> >>>>>>>>>>>> the OAuth infrastructure would fail within those > > > >>> > intervening > > > >>> > >>> >>>>>>>>>>> milliseconds. > > > >>> > >>> >>>>>>>>>>>> So we don't need re-authentication retry in this > > KIP > > > >>> as > > > >>> > long > > > >>> > >>> >>>>>> as > > > >>> > >>> >>>>>>>> retry > > > >>> > >>> >>>>>>>>>>>> starts immediately. The low-level prototype as > > > >>> currently > > > >>> > >>> >>>>>> coded > > > >>> > >>> >>>>>>>>> doesn't > > > >>> > >>> >>>>>>>>>>>> actually start re-authentication until the > > > connection > > > >>> is > > > >>> > >>> >>>>>>>> subsequently > > > >>> > >>> >>>>>>>>>>> used, > > > >>> > >>> >>>>>>>>>>>> and that could take a while. But then again, if > > the > > > >>> > >>> >>>>>> connection > > > >>> > >>> >>>>>>>> isn't > > > >>> > >>> >>>>>>>>>>> used > > > >>> > >>> >>>>>>>>>>>> for some period of time, then the lost value of > > > >>> having to > > > >>> > >>> >>>>>> abandon > > > >>> > >>> >>>>>>>> the > > > >>> > >>> >>>>>>>>>>>> connection is lessened anyway. Plus, as you > > pointed > > > >>> out, > > > >>> > >>> >>>>>> OAuth > > > >>> > >>> >>>>>>>>>>>> infrastructure is assumed to be highly > available. > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> Does this makes sense, and would you be willing > to > > > say > > > >>> > that > > > >>> > >>> >>>>>> retry > > > >>> > >>> >>>>>>>>> isn't > > > >>> > >>> >>>>>>>>>>> a > > > >>> > >>> >>>>>>>>>>>> necessary requirement? I'm tempted but would > like > > > to > > > >>> hear > > > >>> > >>> >>>>>> your > > > >>> > >>> >>>>>>>>>>> perspective > > > >>> > >>> >>>>>>>>>>>> first. > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> Interleaving/latency and maintainability (more > > than > > > >>> lines > > > >>> > of > > > >>> > >>> >>>>>> code) > > > >>> > >>> >>>>>>>> are > > > >>> > >>> >>>>>>>>>>> the > > > >>> > >>> >>>>>>>>>>>> two remaining areas of comparison. I did not > add > > > the > > > >>> > >>> >>>>>>>> maintainability > > > >>> > >>> >>>>>>>>>>> issue > > > >>> > >>> >>>>>>>>>>>> to the KIP yet, but before I do I thought I > would > > > >>> address > > > >>> > it > > > >>> > >>> >>>>>> here > > > >>> > >>> >>>>>>>>> first > > > >>> > >>> >>>>>>>>>>> to > > > >>> > >>> >>>>>>>>>>>> see if we can come to consensus on it because > I'm > > > not > > > >>> > sure I > > > >>> > >>> >>>>>> see > > > >>> > >>> >>>>>>> the > > > >>> > >>> >>>>>>>>>>>> high-level approach as being hard to maintain > (yet > > > -- > > > >>> I > > > >>> > >>> >>> could > > > >>> > >>> >>>>>> be > > > >>> > >>> >>>>>>>>>>>> convinced/convince myself; we'll see). I do > want > > to > > > >>> make > > > >>> > >>> >>>>>> sure we > > > >>> > >>> >>>>>>>> are > > > >>> > >>> >>>>>>>>> on > > > >>> > >>> >>>>>>>>>>>> the same page about what is required to add > > > >>> > >>> >>> re-authentication > > > >>> > >>> >>>>>>>> support > > > >>> > >>> >>>>>>>>> in > > > >>> > >>> >>>>>>>>>>>> the high-level case. Granted, the amount is > zero > > in > > > >>> the > > > >>> > >>> >>>>>> low-level > > > >>> > >>> >>>>>>>>> case, > > > >>> > >>> >>>>>>>>>>>> and it doesn't get any better that that, but the > > > >>> amount in > > > >>> > >>> >>> the > > > >>> > >>> >>>>>>>>>>> high-level > > > >>> > >>> >>>>>>>>>>>> case is very low -- just a few lines of code. > For > > > >>> > example: > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> KafkaAdminClient: > > > >>> > >>> >>>>>>>>>>>> > > https://github.com/apache/kafka/pull/5582/commits/ > > > >>> > >>> >>> 4fa70f38b9 > > > >>> > >>> >>>>>>>>>>>> d33428ff98b64a3a2bd668f5f28c38#diff- > > > >>> > >>> >>>>>>> 6869b8fccf6b098cbcb0676e8ceb26a7 > > > >>> > >>> >>>>>>>>>>>> It is the same few lines of code for > > KafkaConsumer, > > > >>> > >>> >>>>>> KafkaProducer, > > > >>> > >>> >>>>>>>>>>>> WorkerGroupMember, and > > > TransactionMarkerChannelManager > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> The two synchronous I/O use cases are > > > >>> > >>> >>>>>> ControllerChannelManager and > > > >>> > >>> >>>>>>>>>>>> ReplicaFetcherBlockingSend (via > > > >>> ReplicaFetcherThread), and > > > >>> > >>> >>>>>> they > > > >>> > >>> >>>>>>>>> require > > > >>> > >>> >>>>>>>>>>> a > > > >>> > >>> >>>>>>>>>>>> little bit more -- but not much. > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> Thoughts? > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> Ron > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>> On Wed, Sep 12, 2018 at 1:57 PM Ron Dagostino < > > > >>> > >>> >>>>>> rndg...@gmail.com> > > > >>> > >>> >>>>>>>>>>> wrote: > > > >>> > >>> >>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>>> Thanks, Rajini. Before I digest/respond to > that, > > > >>> here's > > > >>> > >>> >>> an > > > >>> > >>> >>>>>>> update > > > >>> > >>> >>>>>>>>>>> that I > > > >>> > >>> >>>>>>>>>>>>> just completed. > > > >>> > >>> >>>>>>>>>>>>> > > > >>> > >>> >>>>>>>>>>>>> I added a commit to the PR ( > > > >>> > >>> >>>>>>>>>>> https://github.com/apache/kafka/pull/5582/) > > > >>> > >>> >>>>>>>>>>>>> that implements server-side kill of expired > > > >>> OAUTHBEARER > > > >>> > >>> >>> >