I should have said `security configs` instead of `channel configs`.

The KIP says:

   - The configuration option this KIP proposes to add to enable
   server-side expired-connection-kill is '*connections.max.reauth.ms
   <http://connections.max.reauth.ms>*' (not prefixed with listener prefix
   or SASL mechanism name – this is a single value for the cluster)
   - The '*connections.max.reauth.ms <http://connections.max.reauth.ms>*'
   configuration option will not be dynamically changeable; restarts will be
   required if the value is to be changed.  However, if a new listener is
   dynamically added, the value could be set for that listener at that time.

Those statements are contradictory. Perhaps the first one should say `it
may be optionally prefixed with the listener name`?

On Tue, Sep 18, 2018 at 3:55 PM, Ron Dagostino <rndg...@gmail.com> wrote:

> HI Rajini.  The KIP is updated as summarized below, and I will start a vote
> immediately.
>
> <<<It may be useful to have a metric of expired connections killed
> Ok, agreed.  I called it expired-connections-killed-total
>
> <<<For `successful-v0-authentication-{rate,total}`, we probably want
> <<<version as a tag rather in the name.
> Ok, agreed.  I kept existing metrics unchanged but added an additional tag
> to the V0 metrics so they are separate.
>
> <<<Not sure if we need four of these
> <<<(rate/total with success/failure). Perhaps just success/total is
> sufficient?
> Ok, agreed, just kept the successful total.
>
> <<<For the session lifetime config, we don't need to require a listener or
> <<<mechanism prefix
> Ok, agreed, the config is now cluster-wide.
>
> <<<For all channel configs, we allow an optional listener prefix,
> <<<so we should do the same here
> Not sure what this is referring to.  We don't have channel configs here,
> right?
>
> <<<The KIP says connections are terminated on requests not related to
> <<<re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and
> <<<SaslAuthenticateRequest). We can skip for ApiVersionsRequest for
> <<<re-authentication, so that doesn't need to be in the list.
> Yes, I was planning on that optimization; agreed, I removed it from the
> list
>
> <<<we allow new listeners
> <<<to be added dynamically and all configs for the listener can be added
> <<<dynamically (with the listener prefix). I think we want to allow that
> for
> <<<this config
> <<<We should mention this in the KIP, though in terms of implementation, I
> <<<would leave that for a separate JIRA (it doesn't need to be implemented
> at
> <<<the same time).
> Ok, agreed
>
>  Thanks again for all the feedback and discussion.
>
> Ron
>
> On Tue, Sep 18, 2018 at 6:43 AM Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
>
> > Hi Ron,
> >
> > Thanks for the updates. The KIP looks good. A few comments and minor
> points
> > below, but feel free to start vote to try and get it into 2.1.0. More
> > community feedback will be really useful.
> >
> > 1) It may be useful to have a metric of expired connections killed by the
> > broker. There could be a client implementation that doesn't support
> > re-authentications, but happens to use the latest version of
> > SaslAuthenticateRequest. Or cases where re-authentication didn't happen
> on
> > time.
> >
> > 2) For `successful-v0-authentication-{rate,total}`, we probably want
> > version as a tag rather in the name. Not sure if we need four of these
> > (rate/total with success/failure). Perhaps just success/total is
> > sufficient?
> >
> > 3) For the session lifetime config, we don't need to require a listener
> or
> > mechanism prefix. In most cases, we would expect a single config on the
> > broker-side. For all channel configs, we allow an optional listener
> prefix,
> > so we should do the same here.
> >
> > 4) The KIP says connections are terminated on requests not related to
> > re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and
> > SaslAuthenticateRequest). We can skip for ApiVersionsRequest for
> > re-authentication, so that doesn't need to be in the list.
> >
> > 5) The KIP says that the new config will not be dynamically updatable. We
> > have a very limited set of configs that are dynamically updatable for an
> > existing listener. And we don't want to add this config to the list since
> > we don't expect this value to change frequently. But we allow new
> listeners
> > to be added dynamically and all configs for the listener can be added
> > dynamically (with the listener prefix). I think we want to allow that for
> > this config (i.e. add a new OAuth listener with re-authentication
> enabled).
> > We should mention this in the KIP, though in terms of implementation, I
> > would leave that for a separate JIRA (it doesn't need to be implemented
> at
> > the same time).
> >
> >
> >
> > On Tue, Sep 18, 2018 at 3:06 AM, Ron Dagostino <rndg...@gmail.com>
> wrote:
> >
> > > HI again, Rajini.  Would we ever want the max session time to be
> > different
> > > across different SASL mechanisms?  I'm wondering, now that we are
> > > supporting all SASL mechanisms via this KIP, if we still need to prefix
> > > this config with the "[listener].[mechanism]." prefix.  I've kept the
> > > prefix in the KIP for now, but it would be easier to just set it once
> for
> > > all mechanisms, and I don't see that as being a problem.  Let me know
> > what
> > > you think.
> > >
> > > Ron
> > >
> > > On Mon, Sep 17, 2018 at 9:51 PM Ron Dagostino <rndg...@gmail.com>
> wrote:
> > >
> > > > Hi Rajini.  The KIP is updated.  Aside from a once-over to make sure
> it
> > > is
> > > > all accurate, I think we need to confirm the metrics.  The decision
> to
> > > not
> > > > reject authentications that use tokens with too-long a lifetime
> allowed
> > > the
> > > > metrics to be simpler.  I decided that in addition to tracking these
> > > > metrics on the broker:
> > > >
> > > > failed-reauthentication-{rate,total} and
> > > > successful-reauthentication-{rate,total}
> > > >
> > > > we simply need one more set of broker metrics to track the subset of
> > > > clients clients that are not upgraded to v2.1.0 and are still using a
> > V0
> > > > SaslAuthenticateRequest:
> > > >
> > > > failed-v0-authentication-{rate,total} and
> > > > successful-v0-authentication-{rate,total}
> > > >
> > > > See the Migration section of the KIP for details of how this would be
> > > used.
> > > >
> > > > I wonder if we need a broker metric documenting the number of
> "expired"
> > > > sessions killed by the broker since it would be the same as
> > > > successful-v0-authentication-total. I've eliminated that from the
> KIP
> > > for
> > > > now.  Thoughts?
> > > >
> > > > There is also a client-side metric for re-authentication latency
> > tracking
> > > > (still unnamed -- do you have a preference?)
> > > >
> > > > I think we're close to being able to put this KIP up for a vote.
> > > >
> > > > Ron
> > > >
> > > >
> > > > On Mon, Sep 17, 2018 at 2:45 PM Ron Dagostino <rndg...@gmail.com>
> > wrote:
> > > >
> > > >> <<<extra field alongside errors, not in the opaque body as mentioned
> > in
> > > >> the KIP
> > > >> Right, that was a concern I had mentioned in a follow-up email.
> Agree
> > > it
> > > >> should go alongside errors.
> > > >>
> > > >> <<<SCRAM for delegation token based authentication where credentials
> > > have
> > > >> <<<an expiry time, so we do need to be able to set individual
> > credential
> > > >> <<<lifetimes, where the server knows the expiry time but client may
> > not
> > > >> Ah, ok, I was not aware that this case existed.  Agreed, for
> > > consistency,
> > > >> server will always send it back to the client.
> > > >>
> > > >> <<<I was trying to avoid this altogether [SaslClient negotiated
> > property
> > > >> for lifetime]
> > > >> Since we now agree that the server will send it back, yes, there is
> no
> > > >> need for this.
> > > >>
> > > >> <<<wasn't entirely clear about the purpose of
> > > >> [sasl.login.refresh.reauthenticate.enable]
> > > >> I think this might have been more useful when we weren't necessarily
> > > >> going to support all SASL mechanisms and/or when the broker was not
> > > going
> > > >> to advertise the fact that it supported re-authentication.  You are
> > > >> correct, now that we support it for all SASL mechanisms and we are
> > > bumping
> > > >> an API version, I think it is okay to simply enable it wherever both
> > the
> > > >> client and server meet the required versions.
> > > >>
> > > >> <<<wasn't entirely clear about the purpose of [
> > > connections.max.reauth.ms]
> > > >> <<<wasn't expecting that we would reject
> > > >> <<<tokens or tickets simply because they were too long-lived
> > > >> <<<tickets are granted by a 3rd party authority
> > > >> <<<Client re-authenticates even though token was not
> > > >> <<<refreshed. Does this matter?
> > > >> I was going under the assumption that it would matter, but based on
> > your
> > > >> pushback I just realized that the same functionality can be
> > implemented
> > > as
> > > >> part of token validation if there is a desire to limit token
> lifetimes
> > > to a
> > > >> certain max value (and the token validator has to be provided in
> > > production
> > > >> anyway since all we provide out-of-the-box is the unsecured
> > > validator).  So
> > > >> I'm willing to abandon this check as part of re-authentication.
> > > >>
> > > >> I'll adjust the KIP accordingly a bit later.  Thanks for the
> continued
> > > >> feedback/discussion.
> > > >>
> > > >> Ron
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Sep 17, 2018 at 2:10 PM Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > >
> > > >> wrote:
> > > >>
> > > >>> Hi Ron,
> > > >>>
> > > >>>
> > > >>> *1) Is there a reason to communicate this value back to the client
> > when
> > > >>> the
> > > >>> client already knows it?  It's an extra network round-trip, and
> since
> > > the
> > > >>> SASL round-tripsare defined by the spec I'm not certain adding an
> > extra
> > > >>> round-trip is acceptable.*
> > > >>>
> > > >>> I wasn't suggesting an extra round-trip for propagating session
> > > >>> lifetime. I
> > > >>> was expecting session lifetime to be added to the last
> > > SASL_AUTHENTICATE
> > > >>> response from the broker. Because SASL is a challenge-response
> > > mechanism,
> > > >>> SaslServer knows when the response being sent is the last one and
> > hence
> > > >>> can
> > > >>> send the session lifetime in the response (in the same way as we
> > > >>> propagate
> > > >>> errors). I was expecting this to be added as an extra field
> alongside
> > > >>> errors, not in the opaque body as mentioned in the KIP. The opaque
> > byte
> > > >>> array strictly conforms to the SASL mechanism wire protocol and we
> > want
> > > >>> to
> > > >>> keep it that way.
> > > >>>
> > > >>> As you have said, we don't need server to propagate session
> lifetime
> > > for
> > > >>> OAUTHBEARER since client knows the token lifetime. But server also
> > > knows
> > > >>> the credential lifetime and by having the server decide the
> lifetime,
> > > we
> > > >>> can use the same code path for all mechanisms. If we only have a
> > > constant
> > > >>> max lifetime in the server, for PLAIN and SCRAM we will end up
> having
> > > the
> > > >>> same lifetime for all credentials with no ability to set actual
> > expiry.
> > > >>> We
> > > >>> use SCRAM for delegation token based authentication where
> credentials
> > > >>> have
> > > >>> an expiry time, so we do need to be able to set individual
> credential
> > > >>> lifetimes, where the server knows the expiry time but client may
> not.
> > > >>>
> > > >>> *2) I also just realized that if the client is to learn the
> > credential
> > > >>> lifetime we wouldn't want to put special-case code in the
> > Authenticator
> > > >>> for
> > > >>> GSSAPI and OAUTHBEARER; we would want to expose the value
> > generically,
> > > >>> probably as a negotiated property on the SaslClient instance.*
> > > >>>
> > > >>> I was trying to avoid this altogether. Client doesn't need to know
> > > >>> credential lifetime. Server asks the client to re-authenticate
> within
> > > its
> > > >>> session lifetime.
> > > >>>
> > > >>> 3) From the KIP, I wasn't entirely clear about the purpose of the
> two
> > > >>> configs:
> > > >>>
> > > >>> sasl.login.refresh.reauthenticate.enable: Do we need this? Client
> > > knows
> > > >>> if
> > > >>> broker version supports re-authentication based on the
> > > SASL_AUTHENTICATE
> > > >>> version returned in ApiVersionsResponse. Client knows if broker is
> > > >>> configured to enable re-authentication based on session lifetime
> > > returned
> > > >>> in SaslAuthenticateResponse. If broker has re-authentication
> > configured
> > > >>> and
> > > >>> client supports re-authentication, you would always want
> > > re-authenticate.
> > > >>> So I wasn't sure why we need a config to opt in or out on the
> > > >>> client-side.
> > > >>>
> > > >>>
> > > >>> connections.max.reauth.ms: We obviously need a broker-side config.
> > Not
> > > >>> entirely sure about the semantics of the config that drives
> > > >>> re-authentication. In particular, I wasn't expecting that we would
> > > reject
> > > >>> tokens or tickets simply because they were too long-lived. Since
> > tokens
> > > >>> or
> > > >>> tickets are granted by a 3rd party authority, I am not sure if
> > clients
> > > >>> will
> > > >>> always have control over the lifetime. Do we need to support any
> more
> > > >>> scenarios than these:
> > > >>>
> > > >>>
> > > >>> A) reauth.ms=10,credential.lifetime.ms=10 : Broker sets
> > > >>> session.lifetime=10,
> > > >>> so this works.
> > > >>>
> > > >>> B) reauth.ms=10, credential.lifetime.ms=5 : Broker sets
> > > >>> session.lifetime=5,
> > > >>> so this works.
> > > >>> C) reauth.ms=10, credential.lifetime.ms=20 : Broker sets
> > > >>> session.lifetime=10. Client re-authenticates even though token was
> > not
> > > >>> refreshed. Does this matter?
> > > >>> D) reauth.ms=Long.MAX_VALUE, credential.lifetime.ms=10: Broker
> sets
> > > >>> session.lifetime=10, client re-authenticates based on credential
> > > expiry.
> > > >>> E) reauth.ms=0 (default), credential.lifetime.ms=10 : Broker sets
> > > >>> session.lifetime=0, Broker doesn't terminate sessions, client
> doesn't
> > > >>> re-authenticate. We generate useful metrics.
> > > >>> F) reauth.ms=0 (default),no lifetime for credential (e.g. PLAIN):
> > > Broker
> > > >>> sets session.lifetime=0, Broker doesn't terminate sessions, client
> > > >>> doesn't
> > > >>> re-authenticate
> > > >>> G) reauth.ms=10,no lifetime for credential (e.g. PLAIN) : Broker
> > sets
> > > >>> session.lifetime=10. Client re-authenticates.
> > > >>>
> > > >>> I would have thought that D) is the typical scenario for
> > OAuth/Kerberos
> > > >>> to
> > > >>> respect token expiry time. G) would be typical scenario for PLAIN
> to
> > > >>> force
> > > >>> re-authenication at regular intervals. A/B/C are useful to force
> > > >>> re-authentication in scenarios where you might check for credential
> > > >>> revocation in the server. And E/F are useful to disable
> > > re-authentication
> > > >>> and generate metrics (also the default behaviour useful during
> > > >>> migration).
> > > >>> Have I missed something?
> > > >>>
> > > >>>
> > > >>> On Mon, Sep 17, 2018 at 4:27 PM, Ron Dagostino <rndg...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>> > Hi yet again, Rajini.  I also just realized that if the client is
> > to
> > > >>> learn
> > > >>> > the credential lifetime we wouldn't want to put special-case code
> > in
> > > >>> the
> > > >>> > Authenticator for GSSAPI and OAUTHBEARER; we would want to expose
> > the
> > > >>> value
> > > >>> > generically, probably as a negotiated property on the SaslClient
> > > >>> instance.
> > > >>> > We might be talking about making the negotiated property key part
> > of
> > > >>> the
> > > >>> > public API.  In other words, the SaslClient would be responsible
> > for
> > > >>> > exposing the credential (i.e. token or ticket) lifetime at a
> > > well-known
> > > >>> > negotiated property name, such as "Credential.Lifetime" and
> > putting a
> > > >>> Long
> > > >>> > value there if there is a lifetime.  That well-klnown key (e.g.
> > > >>> > "Credential.Lifetime") would be part of the public API, right?
> > > >>> >
> > > >>> > Ron
> > > >>> >
> > > >>> > On Mon, Sep 17, 2018 at 11:03 AM Ron Dagostino <
> rndg...@gmail.com>
> > > >>> wrote:
> > > >>> >
> > > >>> > > Hi again, Rajini.  After thinking about this a little while, it
> > > >>> occurs to
> > > >>> > > me that maybe the communication of max session lifetime should
> > > occur
> > > >>> via
> > > >>> > > SASL_HANDSHAKE after all.  Here's why.  The value communicated
> is
> > > >>> the max
> > > >>> > > session lifetime allowed, and the client can assume it is the
> the
> > > >>> session
> > > >>> > > lifetime for that particular session unless the particular SASL
> > > >>> mechanism
> > > >>> > > could result in a shorter session that would be obvious to the
> > > >>> client and
> > > >>> > > the server.  In particular, for OAUTHBEARER, the session
> lifetime
> > > >>> will be
> > > >>> > > the token lifetime, which the client and server will both know.
> > Is
> > > >>> > there a
> > > >>> > > reason to communicate this value back to the client when the
> > client
> > > >>> > already
> > > >>> > > knows it?  It's an extra network round-trip, and since the SASL
> > > >>> > round-trips
> > > >>> > > are defined by the spec I'm not certain adding an extra
> > round-trip
> > > is
> > > >>> > > acceptable.  Even if we decide we can add it, it helps with
> > latency
> > > >>> if we
> > > >>> > > don't.
> > > >>> > >
> > > >>> > > Kerberos may be a bit different -- I don't know if the broker
> can
> > > >>> learn
> > > >>> > > the session lifetime.  If it can then the same thing holds --
> > both
> > > >>> client
> > > >>> > > and server will know the session lifetime and there is no
> reason
> > to
> > > >>> > > communicate it back.  If the server can't learn the lifetime
> > then I
> > > >>> don't
> > > >>> > > think adding an extra SASL_AUTHENTICATE round trip is going to
> > > help,
> > > >>> > anyway.
> > > >>> > >
> > > >>> > > Also, by communicating the max session lifetime in the
> > > SASL_HANDSHAKE
> > > >>> > > response, both OAUTHBEARER and GSSAPI clients will be able to
> > know
> > > >>> before
> > > >>> > > sending any SASL_AUTHENTICATE requests whether their credential
> > > >>> violates
> > > >>> > > the maximum.  This allows a behaving client to give a good
> error
> > > >>> message.
> > > >>> > > A malicious client would ignore the value and send a
> longer-lived
> > > >>> > > credential, and then that would be rejected on the server side.
> > > >>> > >
> > > >>> > > I'm still good with ExpiringCredential not needing to be
> public.
> > > >>> > >
> > > >>> > > What do you think?
> > > >>> > >
> > > >>> > > Ron
> > > >>> > >
> > > >>> > > Ron
> > > >>> > >
> > > >>> > > Ron
> > > >>> > >
> > > >>> > > On Mon, Sep 17, 2018 at 9:13 AM Ron Dagostino <
> rndg...@gmail.com
> > >
> > > >>> wrote:
> > > >>> > >
> > > >>> > >> Hi Rajini.  I've updated the KIP to reflect the decision to
> > fully
> > > >>> > support
> > > >>> > >> this for all SASL mechanisms and to not require the
> > > >>> ExpiringCredential
> > > >>> > >> interface to be public.
> > > >>> > >>
> > > >>> > >> Ron
> > > >>> > >>
> > > >>> > >> On Mon, Sep 17, 2018 at 6:55 AM Ron Dagostino <
> > rndg...@gmail.com>
> > > >>> > wrote:
> > > >>> > >>
> > > >>> > >>> Actually, I think the metric remains the same assuming the
> > > >>> > >>> authentication fails when the token  lifetime is too long.
> > > >>> Negative
> > > >>> > max
> > > >>> > >>> config on server counts what would have been killed because
> > maybe
> > > >>> > client
> > > >>> > >>> re-auth is not turned on; positive max enables the kill,
> which
> > is
> > > >>> > counted
> > > >>> > >>> by a second metric as proposed.  The current proposal had
> > already
> > > >>> > stated
> > > >>> > >>> that a non-zero value would disallow an authentication with a
> > > token
> > > >>> > that
> > > >>> > >>> has too large a lifetime, and that is still the case, I
> think.
> > > But
> > > >>> > let’s
> > > >>> > >>> make sure we are on the same page about all this.
> > > >>> > >>>
> > > >>> > >>> Ron
> > > >>> > >>>
> > > >>> > >>> > On Sep 17, 2018, at 6:42 AM, Ron Dagostino <
> > rndg...@gmail.com>
> > > >>> > wrote:
> > > >>> > >>> >
> > > >>> > >>> > Hi Rajini.  I see what you are saying.  The background
> login
> > > >>> refresh
> > > >>> > >>> thread does have to factor into the decision for OAUTHBEARER
> > > >>> because
> > > >>> > there
> > > >>> > >>> is no new token to re-authenticate with until that refresh
> > > succeeds
> > > >>> > >>> (similarly with Kerberos), but I think you are right that the
> > > >>> interface
> > > >>> > >>> doesn’t necessarily have to be public.  The server does
> decide
> > > the
> > > >>> > time for
> > > >>> > >>> PLAIN and the SCRAM-related mechanisms under the current
> > > proposal,
> > > >>> but
> > > >>> > it
> > > >>> > >>> is done in SaslHandshakeResponse, and at that point the
> server
> > > >>> won’t
> > > >>> > have
> > > >>> > >>> any token yet; making it happen via SaslAuthenticateRequest
> at
> > > the
> > > >>> > very end
> > > >>> > >>> does allow the server to know everything for all mechanisms.
> > > >>> > >>> >
> > > >>> > >>> > I see two potential issues to discuss.  First, the server
> > must
> > > >>> > >>> communicate a time that exceeds the (token or ticket) refresh
> > > >>> period
> > > >>> > on the
> > > >>> > >>> client.  Assuming it communicates the expiration time of the
> > > >>> > token/ticket,
> > > >>> > >>> that’s the best it can do.  So I think that’ll be fine.
> > > >>> > >>> >
> > > >>> > >>> > The second issue is what happens if the server is
> configured
> > to
> > > >>> > accept
> > > >>> > >>> a max value — say, an hour — and the token is good for
> > longer.  I
> > > >>> > assumed
> > > >>> > >>> that the client should not be allowed to authenticate in the
> > > first
> > > >>> > place
> > > >>> > >>> because it could then simply re-authenticate with the same
> > token
> > > >>> after
> > > >>> > an
> > > >>> > >>> hour, which defeats the motivations for this KIP.  So do we
> > agree
> > > >>> the
> > > >>> > max
> > > >>> > >>> token lifetime allowed will be enforced at authentication
> time?
> > > >>> > Assuming
> > > >>> > >>> so, then we need to discuss migration.
> > > >>> > >>> >
> > > >>> > >>> > The current proposal includes a metric that can be used to
> > > >>> identify
> > > >>> > if
> > > >>> > >>> an OAUTHBEARER client is misconfigured — the number of
> > > connections
> > > >>> that
> > > >>> > >>> would have been killed (could be non-zero when the configured
> > max
> > > >>> is a
> > > >>> > >>> negative number). Do we still want this type of an
> indication,
> > > and
> > > >>> if
> > > >>> > so,
> > > >>> > >>> is it still done the same way — a negative number for max,
> but
> > > >>> instead
> > > >>> > of
> > > >>> > >>> counting the number of kills that would have been done it
> > counts
> > > >>> the
> > > >>> > number
> > > >>> > >>> of authentications that would have been failed?
> > > >>> > >>> >
> > > >>> > >>> > Ron
> > > >>> > >>> >
> > > >>> > >>> >> On Sep 17, 2018, at 6:06 AM, Rajini Sivaram <
> > > >>> > rajinisiva...@gmail.com>
> > > >>> > >>> wrote:
> > > >>> > >>> >>
> > > >>> > >>> >> Hi Ron,
> > > >>> > >>> >>
> > > >>> > >>> >> Thinking a bit more about other SASL mechanisms, I think
> the
> > > >>> issue
> > > >>> > is
> > > >>> > >>> that
> > > >>> > >>> >> in the current proposal, clients decide the
> > re-authentication
> > > >>> > period.
> > > >>> > >>> For
> > > >>> > >>> >> mechanisms like PLAIN or SCRAM, we would actually want
> > server
> > > to
> > > >>> > >>> determine
> > > >>> > >>> >> the re-authentication interval. For example, the PLAIN or
> > > SCRAM
> > > >>> > >>> database
> > > >>> > >>> >> could specify the expiry time for each principal, or the
> > > broker
> > > >>> > could
> > > >>> > >>> be
> > > >>> > >>> >> configured with a single expiry time for all principals of
> > > that
> > > >>> > >>> mechanism.
> > > >>> > >>> >> For OAuth, brokers do know the expiry time. Haven't
> figured
> > > out
> > > >>> what
> > > >>> > >>> to do
> > > >>> > >>> >> with Kerberos, but in any case for broker-side connection
> > > >>> > termination
> > > >>> > >>> on
> > > >>> > >>> >> expiry, we need the broker to know/decide the expiry time.
> > So
> > > I
> > > >>> > would
> > > >>> > >>> like
> > > >>> > >>> >> to suggest a slightly different approach to managing
> > > >>> > >>> re-authentications.
> > > >>> > >>> >>
> > > >>> > >>> >> 1) Instead of changing SASL_HANDSHAKE version number, we
> > bump
> > > up
> > > >>> > >>> >> SASL_AUTHENTICATE version number.
> > > >>> > >>> >> 2) In the final SASL_AUTHENTICATE response, broker returns
> > the
> > > >>> > expiry
> > > >>> > >>> time
> > > >>> > >>> >> of this authenticated session. This is the interval after
> > > which
> > > >>> > >>> broker will
> > > >>> > >>> >> terminate the connection If it wasn't re-authenticated.
> > > >>> > >>> >> 3) We no longer need the public interface change to add `
> > > >>> > >>> >>
> > org.apache.kafka.common.security.expiring.ExpiringCredential`
> > > >>> for
> > > >>> > the
> > > >>> > >>> >> client-side. Instead we schedule the next
> re-authentication
> > on
> > > >>> the
> > > >>> > >>> client
> > > >>> > >>> >> based on the expiry time provided by the server (some time
> > > >>> earlier
> > > >>> > >>> than the
> > > >>> > >>> >> expiry).
> > > >>> > >>> >> 4) If client uses SASL_AUTHENTICATE v0, broker will not
> > return
> > > >>> > expiry
> > > >>> > >>> time.
> > > >>> > >>> >> The connection will be terminated if that feature is
> enabled
> > > >>> (the
> > > >>> > >>> same code
> > > >>> > >>> >> path as client failing to re-authenticte on time).
> > > >>> > >>> >>
> > > >>> > >>> >> Thoughts?
> > > >>> > >>> >>
> > > >>> > >>> >>> On Sat, Sep 15, 2018 at 3:06 AM, Ron Dagostino <
> > > >>> rndg...@gmail.com>
> > > >>> > >>> wrote:
> > > >>> > >>> >>>
> > > >>> > >>> >>> Hi everyone.  I've updated the KIP to reflect all
> > discussion
> > > to
> > > >>> > date,
> > > >>> > >>> >>> including the decision to go with the low-level approach.
> > > This
> > > >>> > >>> latest
> > > >>> > >>> >>> version of the KIP includes the above "
> > > >>> connections.max.reauth.ms"
> > > >>> > >>> >>> proposal,
> > > >>> > >>> >>> which I know has not been discussed yet.  It mentions new
> > > >>> metrics,
> > > >>> > >>> some of
> > > >>> > >>> >>> which may not have been discussed either (and names are
> > > missing
> > > >>> > from
> > > >>> > >>> some
> > > >>> > >>> >>> of them).  Regardless, this new version is the closest
> yet
> > > to a
> > > >>> > >>> version
> > > >>> > >>> >>> that can be put to a vote next week.
> > > >>> > >>> >>>
> > > >>> > >>> >>> Ron
> > > >>> > >>> >>>
> > > >>> > >>> >>>> On Fri, Sep 14, 2018 at 8:59 PM Ron Dagostino <
> > > >>> rndg...@gmail.com>
> > > >>> > >>> wrote:
> > > >>> > >>> >>>>
> > > >>> > >>> >>>> Minor correction: I'm proposing "
> > connections.max.reauth.ms"
> > > >>> as
> > > >>> > the
> > > >>> > >>> >>>> broker-side configuration property, not "
> > > >>> > >>> >>>> connections.max.expired.credentials.ms".
> > > >>> > >>> >>>>
> > > >>> > >>> >>>> Ron
> > > >>> > >>> >>>>
> > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 8:40 PM Ron Dagostino <
> > > >>> rndg...@gmail.com
> > > >>> > >
> > > >>> > >>> wrote:
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> Hi Rajini.  I'm going to choose *
> > > >>> > >>> connections.max.expired.credentials.ms
> > > >>> > >>> >>>>> <http://connections.max.expired.credentials.ms>* as
> the
> > > >>> option
> > > >>> > >>> name
> > > >>> > >>> >>>>> because it is consistent with the comment I made in the
> > > >>> section
> > > >>> > >>> about
> > > >>> > >>> >>> how
> > > >>> > >>> >>>>> to get the client and server to agree on credential
> > > lifetime:
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> "We could add a new API call so that clients could ask
> > > >>> servers
> > > >>> > for
> > > >>> > >>> the
> > > >>> > >>> >>>>> lifetime they use, or we could extend the
> > > >>> > >>> SaslHandshakeRequest/Response
> > > >>> > >>> >>> API
> > > >>> > >>> >>>>> call to include that information in the server's
> > response –
> > > >>> the
> > > >>> > >>> client
> > > >>> > >>> >>>>> would then adopt that value"
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> We set the config option value on the broker (with the
> "
> > > >>> > >>> >>>>> listener.name.mechanism." prefix), and we will return
> the
> > > >>> > >>> configured
> > > >>> > >>> >>>>> value in the SaslHandshakeResponse (requiring a wire
> > format
> > > >>> > change
> > > >>> > >>> in
> > > >>> > >>> >>>>> addition to a version bump).  The value (referred to as
> > "X"
> > > >>> > below)
> > > >>> > >>> can
> > > >>> > >>> >>> be
> > > >>> > >>> >>>>> negative, zero, or positive and is to be interpreted as
> > > >>> follows:
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> X = 0: Fully disable.  The server will respond to
> > > >>> > >>> re-authentications,
> > > >>> > >>> >>> but
> > > >>> > >>> >>>>> it won't kill any connections due to expiration, and it
> > > won't
> > > >>> > track
> > > >>> > >>> >>> either
> > > >>> > >>> >>>>> of the 2 metrics mentioned below.
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> Now, a couple of definitions for when X != 0:
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> 1) We define the *maximum allowed expiration time* to
> be
> > > the
> > > >>> time
> > > >>> > >>> >>>>> determined by the point when (re-)authentication occurs
> > > plus
> > > >>> |X|
> > > >>> > >>> >>>>> milliseconds.
> > > >>> > >>> >>>>> 2) We define the *requested expiration time* to be the
> > > >>> maximum
> > > >>> > >>> allowed
> > > >>> > >>> >>>>> expiration time except for the case of an OAuth bearer
> > > >>> token, in
> > > >>> > >>> which
> > > >>> > >>> >>> case
> > > >>> > >>> >>>>> it is the point at which the token expires.  (Kerberos
> > > >>> presumably
> > > >>> > >>> also
> > > >>> > >>> >>>>> specifies a ticket lifetime, but frankly I am not in a
> > > >>> position
> > > >>> > to
> > > >>> > >>> do
> > > >>> > >>> >>> any
> > > >>> > >>> >>>>> Kerberos-related coding in time for a 2.1.0 release and
> > > would
> > > >>> > >>> prefer to
> > > >>> > >>> >>>>> ignore this piece of information for this KIP -- would
> it
> > > be
> > > >>> > >>> acceptable
> > > >>> > >>> >>> to
> > > >>> > >>> >>>>> have someone else add it later?).
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> Based on these definitions, we define the behavior as
> > > >>> follows:
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> *X > 0: Fully enable*
> > > >>> > >>> >>>>> A) The server will reject any
> > > >>> authentication/re-authentication
> > > >>> > >>> attempt
> > > >>> > >>> >>>>> when the requested expiration time is after the maximum
> > > >>> allowed
> > > >>> > >>> >>> expiration
> > > >>> > >>> >>>>> time (which could only happen with an OAuth bearer
> token,
> > > >>> > assuming
> > > >>> > >>> we
> > > >>> > >>> >>> skip
> > > >>> > >>> >>>>> dealing with Kerberos for now).
> > > >>> > >>> >>>>> B) The server will kill connections that are used
> beyond
> > > the
> > > >>> > >>> requested
> > > >>> > >>> >>>>> expiration time.
> > > >>> > >>> >>>>> C) A broker metric will be maintained that documents
> the
> > > >>> number
> > > >>> > >>> >>>>> connections killed by the broker.  This metric will be
> > > >>> non-zero
> > > >>> > if
> > > >>> > >>> a
> > > >>> > >>> >>> client
> > > >>> > >>> >>>>> is connecting to the broker with re-authentication
> either
> > > >>> > >>> unavailable
> > > >>> > >>> >>> (i.e.
> > > >>> > >>> >>>>> an older client) or disabled.
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> *X < 0: Partially enable*
> > > >>> > >>> >>>>> A) Same as above: the server will reject any
> > > >>> > >>> >>>>> authentication/re-authentication attempt when the
> > > requested
> > > >>> > >>> expiration
> > > >>> > >>> >>> time
> > > >>> > >>> >>>>> is after the maximum allowed expiration time (which
> could
> > > >>> only
> > > >>> > >>> happen
> > > >>> > >>> >>> with
> > > >>> > >>> >>>>> an OAuth bearer token, assuming we skip dealing with
> > > >>> Kerberos for
> > > >>> > >>> now).
> > > >>> > >>> >>>>> B) The server will **not** kill connections that are
> used
> > > >>> beyond
> > > >>> > >>> the
> > > >>> > >>> >>>>> requested expiration time.
> > > >>> > >>> >>>>> C) A broker metric will be maintained that documents
> the
> > > >>> number
> > > >>> > of
> > > >>> > >>> API
> > > >>> > >>> >>>>> requests unrelated to re-authentication that are made
> > over
> > > a
> > > >>> > >>> connection
> > > >>> > >>> >>>>> beyond the requested expiration time.  This metric will
> > be
> > > >>> > helpful
> > > >>> > >>> for
> > > >>> > >>> >>>>> operators to ensure that all clients are properly
> > upgraded
> > > >>> and
> > > >>> > >>> >>>>> re-authenticating before fully enabling the server-side
> > > >>> > >>> >>>>> expired-connection-kill functionality (i.e. by changing
> > the
> > > >>> value
> > > >>> > >>> from a
> > > >>> > >>> >>>>> negative number to a positive number): this metric will
> > be
> > > >>> zero
> > > >>> > >>> across
> > > >>> > >>> >>> all
> > > >>> > >>> >>>>> brokers when it is safe to fully enable the server-side
> > > >>> feature.
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> Thoughts?
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> Ron
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 11:59 AM Rajini Sivaram <
> > > >>> > >>> >>> rajinisiva...@gmail.com>
> > > >>> > >>> >>>>> wrote:
> > > >>> > >>> >>>>>
> > > >>> > >>> >>>>>> It will be good to shorten the config name. We have a
> > > config
> > > >>> > >>> named `
> > > >>> > >>> >>>>>> connections.max.idle.ms`. We could add something
> like `
> > > >>> > >>> >>>>>> connections.max.expired.credentials.ms`. As you said,
> > it
> > > >>> would
> > > >>> > be
> > > >>> > >>> >>>>>> prefixed
> > > >>> > >>> >>>>>> with listener and SASL mechanism name. We should be
> able
> > > to
> > > >>> > >>> support
> > > >>> > >>> >>>>>> connection termination in future even with SSL, so
> > perhaps
> > > >>> we
> > > >>> > >>> don't
> > > >>> > >>> >>> want
> > > >>> > >>> >>>>>> the `sasl.server.` prefix (we just need to validate
> > > whether
> > > >>> the
> > > >>> > >>> config
> > > >>> > >>> >>> is
> > > >>> > >>> >>>>>> supported for the protocol). You can choose whether a
> > > >>> boolean
> > > >>> > >>> flag or a
> > > >>> > >>> >>>>>> time interval makes more sense.
> > > >>> > >>> >>>>>>
> > > >>> > >>> >>>>>> For the client-side, the KIP explains how to make it
> > work
> > > >>> for
> > > >>> > >>> other
> > > >>> > >>> >>>>>> mechanisms and we can leave it that. Not convinced
> about
> > > >>> > >>> server-side
> > > >>> > >>> >>>>>> though. It seems to me that we probably would require
> > API
> > > >>> > changes
> > > >>> > >>> to
> > > >>> > >>> >>> make
> > > >>> > >>> >>>>>> it work. Basically the proposed approach works only
> for
> > > >>> > >>> OAUTHBEARER.
> > > >>> > >>> >>>>>> Since
> > > >>> > >>> >>>>>> we have to bump up SaslHandshakeRequest version in
> this
> > > >>> KIP, it
> > > >>> > >>> will be
> > > >>> > >>> >>>>>> good to work out if we need to change the request or
> > flow
> > > to
> > > >>> > make
> > > >>> > >>> this
> > > >>> > >>> >>>>>> work
> > > >>> > >>> >>>>>> for other mechanisms. I haven't figured out exactly
> what
> > > is
> > > >>> > >>> needed, but
> > > >>> > >>> >>>>>> will think about it and get back next week. In the
> > > >>> meantime, you
> > > >>> > >>> can
> > > >>> > >>> >>> get
> > > >>> > >>> >>>>>> the KIP up-to-date with the config, migration path
> etc.
> > > and
> > > >>> get
> > > >>> > it
> > > >>> > >>> >>> ready
> > > >>> > >>> >>>>>> to
> > > >>> > >>> >>>>>> start vote next week.
> > > >>> > >>> >>>>>>
> > > >>> > >>> >>>>>>
> > > >>> > >>> >>>>>>
> > > >>> > >>> >>>>>>
> > > >>> > >>> >>>>>>
> > > >>> > >>> >>>>>> On Fri, Sep 14, 2018 at 1:24 PM, Ron Dagostino <
> > > >>> > rndg...@gmail.com
> > > >>> > >>> >
> > > >>> > >>> >>>>>> wrote:
> > > >>> > >>> >>>>>>
> > > >>> > >>> >>>>>>> Hi Rajini.
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>> Agreed, we will bump the SaslHandshakeRequest API
> > number
> > > >>> (no
> > > >>> > wire
> > > >>> > >>> >>>>>> format
> > > >>> > >>> >>>>>>> change, of course).
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>> Agreed, responding to re-authentication will always
> be
> > > >>> enabled
> > > >>> > >>> on the
> > > >>> > >>> >>>>>>> broker since it is initiated by the client.
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>> Agreed, connection termination by the broker will be
> > > >>> opt-in.
> > > >>> > I'm
> > > >>> > >>> >>>>>> thinking
> > > >>> > >>> >>>>>>> *sasl.server.disconnect.expired.credential.
> connections.
> > > >>> > enable*,
> > > >>> > >>> >>>>>> prefixed
> > > >>> > >>> >>>>>>> with listener and SASL mechanism name in lower-case
> so
> > it
> > > >>> can
> > > >>> > be
> > > >>> > >>> >>>>>> opt-in at
> > > >>> > >>> >>>>>>> the granularity of a SASL mechanism; for example, "
> > > >>> > >>> >>>>>>> listener.name.sasl_ssl.oauthbearer.sasl.server.
> > > >>> > >>> >>>>>>> disconnect.expired.credential.
> connections.enable=true
> > > >>> > >>> >>>>>>> ".  It is long, so if you have a preference for a
> > shorter
> > > >>> name
> > > >>> > >>> let me
> > > >>> > >>> >>>>>> know
> > > >>> > >>> >>>>>>> and we'll go with it instead.
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>> Agreed, additional metrics will be helpful.  I'll add
> > > them
> > > >>> to
> > > >>> > the
> > > >>> > >>> >>> KIP.
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>> <<<need to work out exactly how this works with all
> > SASL
> > > >>> > >>> mechanisms
> > > >>> > >>> >>>>>>> <<<not sure we have the data required on the broker
> or
> > a
> > > >>> way of
> > > >>> > >>> >>>>>>> <<< extending the [GSSAPI] mechanism
> > > >>> > >>> >>>>>>> Not sure I agree here.  The paragraph I added to the
> > KIP
> > > >>> > >>> describes
> > > >>> > >>> >>> how
> > > >>> > >>> >>>>>> I
> > > >>> > >>> >>>>>>> think this can be done.  Given that description, do
> you
> > > >>> still
> > > >>> > >>> feel
> > > >>> > >>> >>>>>> Kerberos
> > > >>> > >>> >>>>>>> will not be possible?  If Kerberos presents a
> > significant
> > > >>> > hurdle
> > > >>> > >>> >>> then I
> > > >>> > >>> >>>>>>> don't think that should prevent us from doing it for
> > > other
> > > >>> > >>> mechanisms
> > > >>> > >>> >>>>>> -- I
> > > >>> > >>> >>>>>>> would rather state that it isn't supported with
> GSSAPI
> > > due
> > > >>> to
> > > >>> > >>> >>>>>> limitations
> > > >>> > >>> >>>>>>> in Kerberos itself than not have it for OAUTHBEARER,
> > for
> > > >>> > example.
> > > >>> > >>> >>> And
> > > >>> > >>> >>>>>>> right now I don't think we need more than a
> high-level
> > > >>> > >>> description of
> > > >>> > >>> >>>>>> how
> > > >>> > >>> >>>>>>> this could be done.  In other words, I think we have
> > this
> > > >>> > >>> covered,
> > > >>> > >>> >>>>>> unless
> > > >>> > >>> >>>>>>> there is a fundamental problem due to Kerberos not
> > making
> > > >>> data
> > > >>> > >>> (i.e.
> > > >>> > >>> >>>>>> ticket
> > > >>> > >>> >>>>>>> expiration) available on the server.   I want to
> submit
> > > >>> this
> > > >>> > KIP
> > > >>> > >>> for
> > > >>> > >>> >>> a
> > > >>> > >>> >>>>>> vote
> > > >>> > >>> >>>>>>> early next week in the hope of having it accepted by
> > the
> > > >>> > Monday,
> > > >>> > >>> 9/24
> > > >>> > >>> >>>>>>> deadline for the 2.1.0 release, and if that happens I
> > > >>> believe I
> > > >>> > >>> can
> > > >>> > >>> >>>>>> get a
> > > >>> > >>> >>>>>>> PR into really good shape soon thereafter (I'm
> working
> > on
> > > >>> it
> > > >>> > >>> now).
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>> Ron
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>> On Fri, Sep 14, 2018 at 5:57 AM Rajini Sivaram <
> > > >>> > >>> >>>>>> rajinisiva...@gmail.com>
> > > >>> > >>> >>>>>>> wrote:
> > > >>> > >>> >>>>>>>
> > > >>> > >>> >>>>>>>> Hi Ron,
> > > >>> > >>> >>>>>>>>
> > > >>> > >>> >>>>>>>> 1) Yes, we should update the version of
> > > >>> > `SaslHandshakeRequest`.
> > > >>> > >>> >>>>>> Clients
> > > >>> > >>> >>>>>>>> would attempt to re-authenticate only if broker's
> > > >>> > >>> >>>>>> SaslHandshakeRequest
> > > >>> > >>> >>>>>>>> version >=2.
> > > >>> > >>> >>>>>>>> 2) I think we should enable broker's
> re-authentication
> > > and
> > > >>> > >>> >>> connection
> > > >>> > >>> >>>>>>>> termination code regardless of client version.
> > > >>> > Re-authentication
> > > >>> > >>> >>>>>> could be
> > > >>> > >>> >>>>>>>> always enabled since it is triggered only by clients
> > and
> > > >>> > broker
> > > >>> > >>> >>> could
> > > >>> > >>> >>>>>>> just
> > > >>> > >>> >>>>>>>> handle it. Connection termination should be opt-in,
> > but
> > > >>> should
> > > >>> > >>> work
> > > >>> > >>> >>>>>> with
> > > >>> > >>> >>>>>>>> clients of any version. If turned on and clients are
> > of
> > > an
> > > >>> > older
> > > >>> > >>> >>>>>> version,
> > > >>> > >>> >>>>>>>> this would simply force reconnection, which should
> be
> > > ok.
> > > >>> > >>> >>> Additional
> > > >>> > >>> >>>>>>>> metrics to monitor expired connections may be useful
> > > >>> anyway.
> > > >>> > >>> >>>>>>>>
> > > >>> > >>> >>>>>>>> We also need to work out exactly how this works with
> > all
> > > >>> SASL
> > > >>> > >>> >>>>>> mechanisms.
> > > >>> > >>> >>>>>>>> In particular, we have ensure that we can make it
> work
> > > >>> with
> > > >>> > >>> >>> Kerberos
> > > >>> > >>> >>>>>>> since
> > > >>> > >>> >>>>>>>> we use the implementation from the JRE and I am not
> > sure
> > > >>> we
> > > >>> > have
> > > >>> > >>> >>> the
> > > >>> > >>> >>>>>> data
> > > >>> > >>> >>>>>>>> required on the broker or a way of extending the
> > > >>> mechanism.
> > > >>> > >>> >>>>>>>>
> > > >>> > >>> >>>>>>>> Thoughts?
> > > >>> > >>> >>>>>>>>
> > > >>> > >>> >>>>>>>> On Thu, Sep 13, 2018 at 6:56 PM, Ron Dagostino <
> > > >>> > >>> rndg...@gmail.com>
> > > >>> > >>> >>>>>>> wrote:
> > > >>> > >>> >>>>>>>>
> > > >>> > >>> >>>>>>>>> Hi Rajini.  I'm thinking about how we deal with
> > > >>> migration.
> > > >>> > >>>  For
> > > >>> > >>> >>>>>>> example,
> > > >>> > >>> >>>>>>>>> let's say we have an existing 2.0.0 cluster using
> > > >>> > >>> >>> SASL/OAUTHBEARER
> > > >>> > >>> >>>>>> and
> > > >>> > >>> >>>>>>> we
> > > >>> > >>> >>>>>>>>> want to use this feature.  The desired end state is
> > to
> > > >>> have
> > > >>> > all
> > > >>> > >>> >>>>>> clients
> > > >>> > >>> >>>>>>>> and
> > > >>> > >>> >>>>>>>>> brokers migrated to a version that supports the
> > feature
> > > >>> (call
> > > >>> > >>> it
> > > >>> > >>> >>>>>> 2.x)
> > > >>> > >>> >>>>>>>> with
> > > >>> > >>> >>>>>>>>> the feature turned on.  We need to document the
> > > supported
> > > >>> > >>> path(s)
> > > >>> > >>> >>>>>> to
> > > >>> > >>> >>>>>>> this
> > > >>> > >>> >>>>>>>>> end state.
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> Here are a couple of scenarios with implications:
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> 1) *Migrate client to 2.x and turn the feature on
> but
> > > >>> broker
> > > >>> > is
> > > >>> > >>> >>>>>> still
> > > >>> > >>> >>>>>>> at
> > > >>> > >>> >>>>>>>>> 2.0.*  In this case the client is going to get an
> > error
> > > >>> when
> > > >>> > it
> > > >>> > >>> >>>>>> sends
> > > >>> > >>> >>>>>>> the
> > > >>> > >>> >>>>>>>>> SaslHandshakeRequest.  We want to avoid this.  It
> > seems
> > > >>> to me
> > > >>> > >>> the
> > > >>> > >>> >>>>>>> client
> > > >>> > >>> >>>>>>>>> needs to know if the broker has been upgraded to
> the
> > > >>> > necessary
> > > >>> > >>> >>>>>> version
> > > >>> > >>> >>>>>>>>> before trying to re-authenticate, which makes me
> > > believe
> > > >>> we
> > > >>> > >>> need
> > > >>> > >>> >>> to
> > > >>> > >>> >>>>>>> bump
> > > >>> > >>> >>>>>>>>> the API version number in 2.x and the client has to
> > > >>> check to
> > > >>> > >>> see
> > > >>> > >>> >>>>>> if the
> > > >>> > >>> >>>>>>>> max
> > > >>> > >>> >>>>>>>>> version the broker supports meets a minimum
> standard
> > > >>> before
> > > >>> > >>> >>> trying
> > > >>> > >>> >>>>>> to
> > > >>> > >>> >>>>>>>>> re-authenticate.  Do you agree?
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> 2) *Migrate broker to 2.x and turn the feature on
> but
> > > >>> client
> > > >>> > is
> > > >>> > >>> >>>>>> still
> > > >>> > >>> >>>>>>> at
> > > >>> > >>> >>>>>>>>> 2.0*.  In this case the broker is going to end up
> > > killing
> > > >>> > >>> >>>>>> connections
> > > >>> > >>> >>>>>>>>> periodically.   Again, we want to avoid this.  It's
> > > >>> tempting
> > > >>> > to
> > > >>> > >>> >>> say
> > > >>> > >>> >>>>>>>> "don't
> > > >>> > >>> >>>>>>>>> do this" but I wonder if we can require
> installations
> > > to
> > > >>> > >>> upgrade
> > > >>> > >>> >>>>>> all
> > > >>> > >>> >>>>>>>>> clients before turning the feature on at the
> brokers.
> > > >>> > >>> Certainly
> > > >>> > >>> >>>>>> we can
> > > >>> > >>> >>>>>>>> say
> > > >>> > >>> >>>>>>>>> "don't do this" for inter-broker clients -- in
> other
> > > >>> words,
> > > >>> > we
> > > >>> > >>> >>> can
> > > >>> > >>> >>>>>> say
> > > >>> > >>> >>>>>>>> that
> > > >>> > >>> >>>>>>>>> all brokers in a cluster should be upgraded before
> > the
> > > >>> > feature
> > > >>> > >>> is
> > > >>> > >>> >>>>>>> turned
> > > >>> > >>> >>>>>>>> on
> > > >>> > >>> >>>>>>>>> for any one of them -- but I don't know about our
> > > >>> ability to
> > > >>> > >>> say
> > > >>> > >>> >>>>>> it for
> > > >>> > >>> >>>>>>>>> non-broker clients.
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> Now we consider the cases where both the brokers
> and
> > > the
> > > >>> > >>> clients
> > > >>> > >>> >>>>>> have
> > > >>> > >>> >>>>>>>> been
> > > >>> > >>> >>>>>>>>> upgraded to 2.x.  When and where should the feature
> > be
> > > >>> turned
> > > >>> > >>> on?
> > > >>> > >>> >>>>>> The
> > > >>> > >>> >>>>>>>>> inter-broker case is interesting because I don't
> > think
> > > >>> think
> > > >>> > we
> > > >>> > >>> >>> can
> > > >>> > >>> >>>>>>>> require
> > > >>> > >>> >>>>>>>>> installations to restart every broker with a new
> > config
> > > >>> where
> > > >>> > >>> the
> > > >>> > >>> >>>>>>> feature
> > > >>> > >>> >>>>>>>>> is turned on at the same time.  Do you agree that
> we
> > > >>> cannot
> > > >>> > >>> >>> require
> > > >>> > >>> >>>>>>> this
> > > >>> > >>> >>>>>>>>> and therefore must support installations restarting
> > > >>> brokers
> > > >>> > >>> with
> > > >>> > >>> >>> a
> > > >>> > >>> >>>>>> new
> > > >>> > >>> >>>>>>>>> config at whatever pace they need -- which may be
> > quite
> > > >>> slow
> > > >>> > >>> >>>>>> relative
> > > >>> > >>> >>>>>>> to
> > > >>> > >>> >>>>>>>>> token lifetimes?  Assuming you do agree, then there
> > is
> > > >>> going
> > > >>> > to
> > > >>> > >>> >>> be
> > > >>> > >>> >>>>>> the
> > > >>> > >>> >>>>>>>> case
> > > >>> > >>> >>>>>>>>> where some brokers are going to have the feature
> > turned
> > > >>> on
> > > >>> > and
> > > >>> > >>> >>> some
> > > >>> > >>> >>>>>>>> clients
> > > >>> > >>> >>>>>>>>> (definitely inter-broker clients at a minimum) are
> > > going
> > > >>> to
> > > >>> > >>> have
> > > >>> > >>> >>>>>> the
> > > >>> > >>> >>>>>>>>> feature turned off.
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> The above discussion assumes a single config on the
> > > >>> broker
> > > >>> > side
> > > >>> > >>> >>>>>> that
> > > >>> > >>> >>>>>>>> turns
> > > >>> > >>> >>>>>>>>> on both the inter-broker client re-authentication
> > > >>> feature as
> > > >>> > >>> well
> > > >>> > >>> >>>>>> as
> > > >>> > >>> >>>>>>> the
> > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature, but
> now
> > > I'm
> > > >>> > >>> thinking
> > > >>> > >>> >>>>>> we
> > > >>> > >>> >>>>>>> need
> > > >>> > >>> >>>>>>>>> the ability to turn these features on
> independently,
> > > plus
> > > >>> > maybe
> > > >>> > >>> >>> we
> > > >>> > >>> >>>>>>> need a
> > > >>> > >>> >>>>>>>>> way to monitor the number of "active, expired"
> > > >>> connections to
> > > >>> > >>> the
> > > >>> > >>> >>>>>>> server
> > > >>> > >>> >>>>>>>> so
> > > >>> > >>> >>>>>>>>> that operators can be sure that all clients have
> been
> > > >>> > >>> >>>>>> upgraded/enabled
> > > >>> > >>> >>>>>>>>> before turning on the server-side
> > > expired-connection-kill
> > > >>> > >>> >>> feature.
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> So the migration plan would be as follows:
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> 1) Upgrade all brokers to 2.x.
> > > >>> > >>> >>>>>>>>> 2) After all brokers are upgraded, turn on
> > > >>> re-authentication
> > > >>> > >>> for
> > > >>> > >>> >>>>>> all
> > > >>> > >>> >>>>>>>>> brokers at whatever rate is desired -- just
> > eventually,
> > > >>> at
> > > >>> > some
> > > >>> > >>> >>>>>> point,
> > > >>> > >>> >>>>>>>> get
> > > >>> > >>> >>>>>>>>> the client-side feature turned on for all brokers
> so
> > > that
> > > >>> > >>> >>>>>> inter-broker
> > > >>> > >>> >>>>>>>>> connections are re-authenticating.
> > > >>> > >>> >>>>>>>>> 3) In parallel with (1) and (2) above, upgrade
> > clients
> > > >>> to 2.x
> > > >>> > >>> and
> > > >>> > >>> >>>>>> turn
> > > >>> > >>> >>>>>>>>> their re-authentication feature on.  Clients will
> > check
> > > >>> the
> > > >>> > API
> > > >>> > >>> >>>>>> version
> > > >>> > >>> >>>>>>>> and
> > > >>> > >>> >>>>>>>>> only re-authenticate to a broker that has also been
> > > >>> upgraded
> > > >>> > >>> >>> (note
> > > >>> > >>> >>>>>> that
> > > >>> > >>> >>>>>>>> the
> > > >>> > >>> >>>>>>>>> ability of a broker to respond to a
> re-authentication
> > > >>> cannot
> > > >>> > be
> > > >>> > >>> >>>>>> turned
> > > >>> > >>> >>>>>>>> off
> > > >>> > >>> >>>>>>>>> -- it is always on beginning with version 2.x, and
> it
> > > >>> just
> > > >>> > sits
> > > >>> > >>> >>>>>> there
> > > >>> > >>> >>>>>>>> doing
> > > >>> > >>> >>>>>>>>> nothing if it isn't exercised by an enabled client)
> > > >>> > >>> >>>>>>>>> 4) After (1), (2), and (3) are complete, check the
> > 2.x
> > > >>> broker
> > > >>> > >>> >>>>>> metrics
> > > >>> > >>> >>>>>>> to
> > > >>> > >>> >>>>>>>>> confirm that there are no "active, expired"
> > > connections.
> > > >>> > Once
> > > >>> > >>> >>> you
> > > >>> > >>> >>>>>> are
> > > >>> > >>> >>>>>>>>> satisfied that (1), (2), and (3) are indeed
> complete
> > > you
> > > >>> can
> > > >>> > >>> >>>>>> enable the
> > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature on each
> > > >>> broker
> > > >>> > via
> > > >>> > >>> a
> > > >>> > >>> >>>>>>> restart
> > > >>> > >>> >>>>>>>> at
> > > >>> > >>> >>>>>>>>> whatever pace is desired.
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> Comments?
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> Ron
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>> On Wed, Sep 12, 2018 at 4:48 PM Ron Dagostino <
> > > >>> > >>> rndg...@gmail.com
> > > >>> > >>> >>>>
> > > >>> > >>> >>>>>>> wrote:
> > > >>> > >>> >>>>>>>>>
> > > >>> > >>> >>>>>>>>>> Ok, I am tempted to just say we go with the
> > low-level
> > > >>> > approach
> > > >>> > >>> >>>>>> since
> > > >>> > >>> >>>>>>> it
> > > >>> > >>> >>>>>>>>> is
> > > >>> > >>> >>>>>>>>>> the quickest and seems to meet the clear
> > requirements.
> > > >>> We
> > > >>> > can
> > > >>> > >>> >>>>>> always
> > > >>> > >>> >>>>>>>>> adjust
> > > >>> > >>> >>>>>>>>>> later if we get to clarity on other requirements
> or
> > we
> > > >>> > decide
> > > >>> > >>> >>> we
> > > >>> > >>> >>>>>> need
> > > >>> > >>> >>>>>>>> to
> > > >>> > >>> >>>>>>>>>> approach it differently for whatever reason.  But
> in
> > > the
> > > >>> > >>> >>>>>> meantime,
> > > >>> > >>> >>>>>>>> before
> > > >>> > >>> >>>>>>>>>> fully committing to this decision, I would
> > appreciate
> > > >>> > another
> > > >>> > >>> >>>>>>>> perspective
> > > >>> > >>> >>>>>>>>>> if someone has one.
> > > >>> > >>> >>>>>>>>>>
> > > >>> > >>> >>>>>>>>>> Ron
> > > >>> > >>> >>>>>>>>>>
> > > >>> > >>> >>>>>>>>>> On Wed, Sep 12, 2018 at 3:15 PM Rajini Sivaram <
> > > >>> > >>> >>>>>>>> rajinisiva...@gmail.com>
> > > >>> > >>> >>>>>>>>>> wrote:
> > > >>> > >>> >>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>> Hi Ron,
> > > >>> > >>> >>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>> Yes, I would leave out retries from this KIP for
> > now.
> > > >>> In
> > > >>> > the
> > > >>> > >>> >>>>>> future,
> > > >>> > >>> >>>>>>>> if
> > > >>> > >>> >>>>>>>>>>> there is a requirement for supporting retries, we
> > can
> > > >>> > >>> consider
> > > >>> > >>> >>>>>> it. I
> > > >>> > >>> >>>>>>>>> think
> > > >>> > >>> >>>>>>>>>>> we can support retries with either approach if we
> > > >>> needed
> > > >>> > to,
> > > >>> > >>> >>>>>> but it
> > > >>> > >>> >>>>>>>>> would
> > > >>> > >>> >>>>>>>>>>> be better to do it along with other changes
> > required
> > > to
> > > >>> > >>> >>> support
> > > >>> > >>> >>>>>>>>>>> authentication servers that are not highly
> > available.
> > > >>> > >>> >>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>> For maintainability, I am biased, so it will be
> > good
> > > >>> to get
> > > >>> > >>> >>> the
> > > >>> > >>> >>>>>>>>>>> perspective
> > > >>> > >>> >>>>>>>>>>> of others in the community :-)
> > > >>> > >>> >>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>> On Wed, Sep 12, 2018 at 7:47 PM, Ron Dagostino <
> > > >>> > >>> >>>>>> rndg...@gmail.com>
> > > >>> > >>> >>>>>>>>> wrote:
> > > >>> > >>> >>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> Hi Rajini.  Here is some feedback/some things I
> > > >>> thought
> > > >>> > of.
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> First, I just realized that from a timing
> > > perspective
> > > >>> > that I
> > > >>> > >>> >>>>>> am
> > > >>> > >>> >>>>>>> not
> > > >>> > >>> >>>>>>>>> sure
> > > >>> > >>> >>>>>>>>>>>> retry is going to be necessary.  The background
> > > login
> > > >>> > >>> >>> refresh
> > > >>> > >>> >>>>>>> thread
> > > >>> > >>> >>>>>>>>>>>> triggers re-authentication when it refreshes the
> > > >>> client's
> > > >>> > >>> >>>>>>>> credential.
> > > >>> > >>> >>>>>>>>>>> The
> > > >>> > >>> >>>>>>>>>>>> OAuth infrastructure has to be available in
> order
> > > for
> > > >>> this
> > > >>> > >>> >>>>>> refresh
> > > >>> > >>> >>>>>>>> to
> > > >>> > >>> >>>>>>>>>>>> succeed (the background thread repeatedly
> retries
> > if
> > > >>> it
> > > >>> > >>> >>> can't
> > > >>> > >>> >>>>>>>> refresh
> > > >>> > >>> >>>>>>>>>>> the
> > > >>> > >>> >>>>>>>>>>>> credential, and that retry loop handles any
> > > temporary
> > > >>> > >>> >>>>>> outage).  If
> > > >>> > >>> >>>>>>>>>>> clients
> > > >>> > >>> >>>>>>>>>>>> are told to re-authenticate when the credential
> is
> > > >>> > refreshed
> > > >>> > >>> >>>>>> **and
> > > >>> > >>> >>>>>>>>> they
> > > >>> > >>> >>>>>>>>>>>> actually re-authenticate at that point** then it
> > is
> > > >>> highly
> > > >>> > >>> >>>>>>> unlikely
> > > >>> > >>> >>>>>>>>> that
> > > >>> > >>> >>>>>>>>>>>> the OAuth infrastructure would fail within those
> > > >>> > intervening
> > > >>> > >>> >>>>>>>>>>> milliseconds.
> > > >>> > >>> >>>>>>>>>>>> So we don't need re-authentication retry in this
> > KIP
> > > >>> as
> > > >>> > long
> > > >>> > >>> >>>>>> as
> > > >>> > >>> >>>>>>>> retry
> > > >>> > >>> >>>>>>>>>>>> starts immediately.  The low-level prototype as
> > > >>> currently
> > > >>> > >>> >>>>>> coded
> > > >>> > >>> >>>>>>>>> doesn't
> > > >>> > >>> >>>>>>>>>>>> actually start re-authentication until the
> > > connection
> > > >>> is
> > > >>> > >>> >>>>>>>> subsequently
> > > >>> > >>> >>>>>>>>>>> used,
> > > >>> > >>> >>>>>>>>>>>> and that could take a while.  But then again, if
> > the
> > > >>> > >>> >>>>>> connection
> > > >>> > >>> >>>>>>>> isn't
> > > >>> > >>> >>>>>>>>>>> used
> > > >>> > >>> >>>>>>>>>>>> for some period of time, then the lost value of
> > > >>> having to
> > > >>> > >>> >>>>>> abandon
> > > >>> > >>> >>>>>>>> the
> > > >>> > >>> >>>>>>>>>>>> connection is lessened anyway.  Plus, as you
> > pointed
> > > >>> out,
> > > >>> > >>> >>>>>> OAuth
> > > >>> > >>> >>>>>>>>>>>> infrastructure is assumed to be highly
> available.
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> Does this makes sense, and would you be willing
> to
> > > say
> > > >>> > that
> > > >>> > >>> >>>>>> retry
> > > >>> > >>> >>>>>>>>> isn't
> > > >>> > >>> >>>>>>>>>>> a
> > > >>> > >>> >>>>>>>>>>>> necessary requirement?  I'm tempted but would
> like
> > > to
> > > >>> hear
> > > >>> > >>> >>>>>> your
> > > >>> > >>> >>>>>>>>>>> perspective
> > > >>> > >>> >>>>>>>>>>>> first.
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> Interleaving/latency and maintainability (more
> > than
> > > >>> lines
> > > >>> > of
> > > >>> > >>> >>>>>> code)
> > > >>> > >>> >>>>>>>> are
> > > >>> > >>> >>>>>>>>>>> the
> > > >>> > >>> >>>>>>>>>>>> two remaining areas of comparison.  I did not
> add
> > > the
> > > >>> > >>> >>>>>>>> maintainability
> > > >>> > >>> >>>>>>>>>>> issue
> > > >>> > >>> >>>>>>>>>>>> to the KIP yet, but before I do I thought I
> would
> > > >>> address
> > > >>> > it
> > > >>> > >>> >>>>>> here
> > > >>> > >>> >>>>>>>>> first
> > > >>> > >>> >>>>>>>>>>> to
> > > >>> > >>> >>>>>>>>>>>> see if we can come to consensus on it because
> I'm
> > > not
> > > >>> > sure I
> > > >>> > >>> >>>>>> see
> > > >>> > >>> >>>>>>> the
> > > >>> > >>> >>>>>>>>>>>> high-level approach as being hard to maintain
> (yet
> > > --
> > > >>> I
> > > >>> > >>> >>> could
> > > >>> > >>> >>>>>> be
> > > >>> > >>> >>>>>>>>>>>> convinced/convince myself; we'll see).  I do
> want
> > to
> > > >>> make
> > > >>> > >>> >>>>>> sure we
> > > >>> > >>> >>>>>>>> are
> > > >>> > >>> >>>>>>>>> on
> > > >>> > >>> >>>>>>>>>>>> the same page about what is required to add
> > > >>> > >>> >>> re-authentication
> > > >>> > >>> >>>>>>>> support
> > > >>> > >>> >>>>>>>>> in
> > > >>> > >>> >>>>>>>>>>>> the high-level case.  Granted, the amount is
> zero
> > in
> > > >>> the
> > > >>> > >>> >>>>>> low-level
> > > >>> > >>> >>>>>>>>> case,
> > > >>> > >>> >>>>>>>>>>>> and it doesn't get any better that that, but the
> > > >>> amount in
> > > >>> > >>> >>> the
> > > >>> > >>> >>>>>>>>>>> high-level
> > > >>> > >>> >>>>>>>>>>>> case is very low -- just a few lines of code.
> For
> > > >>> > example:
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> KafkaAdminClient:
> > > >>> > >>> >>>>>>>>>>>>
> > https://github.com/apache/kafka/pull/5582/commits/
> > > >>> > >>> >>> 4fa70f38b9
> > > >>> > >>> >>>>>>>>>>>> d33428ff98b64a3a2bd668f5f28c38#diff-
> > > >>> > >>> >>>>>>> 6869b8fccf6b098cbcb0676e8ceb26a7
> > > >>> > >>> >>>>>>>>>>>> It is the same few lines of code for
> > KafkaConsumer,
> > > >>> > >>> >>>>>> KafkaProducer,
> > > >>> > >>> >>>>>>>>>>>> WorkerGroupMember, and
> > > TransactionMarkerChannelManager
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> The two synchronous I/O use cases are
> > > >>> > >>> >>>>>> ControllerChannelManager and
> > > >>> > >>> >>>>>>>>>>>> ReplicaFetcherBlockingSend (via
> > > >>> ReplicaFetcherThread), and
> > > >>> > >>> >>>>>> they
> > > >>> > >>> >>>>>>>>> require
> > > >>> > >>> >>>>>>>>>>> a
> > > >>> > >>> >>>>>>>>>>>> little bit more -- but not much.
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> Thoughts?
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> Ron
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>> On Wed, Sep 12, 2018 at 1:57 PM Ron Dagostino <
> > > >>> > >>> >>>>>> rndg...@gmail.com>
> > > >>> > >>> >>>>>>>>>>> wrote:
> > > >>> > >>> >>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>>> Thanks, Rajini.  Before I digest/respond to
> that,
> > > >>> here's
> > > >>> > >>> >>> an
> > > >>> > >>> >>>>>>> update
> > > >>> > >>> >>>>>>>>>>> that I
> > > >>> > >>> >>>>>>>>>>>>> just completed.
> > > >>> > >>> >>>>>>>>>>>>>
> > > >>> > >>> >>>>>>>>>>>>> I added a commit to the PR (
> > > >>> > >>> >>>>>>>>>>> https://github.com/apache/kafka/pull/5582/)
> > > >>> > >>> >>>>>>>>>>>>> that implements server-side kill of expired
> > > >>> OAUTHBEARER
> > > >>> > >>> >>>
>

Reply via email to