Re: [DISCUSS] KIP 368: Allow SASL Connections to Periodically Re-Authenticate

Ron Dagostino Mon, 24 Sep 2018 18:23:58 -0700

Hi everyone.  Jun raised a good point/discovered an oversight in the KIP
during the VOTE thread that we must resolve.


Regarding this statement in the KIP:

"A client-side metric will be created that documents the latency
imposed by re-authentication."

Jun correctly asked:
What's the name of this metric? Does it measure avg or max?

My initial reaction is to measure both:
*reauthentication-latency-ms-{avg,max}*.  Any thoughts?

Ron

On Wed, Sep 19, 2018 at 9:08 AM Ron Dagostino <[email protected]> wrote:

> Thanks, Rajini -- I updated the KIP to fix this.
>
> Ron
>
> On Wed, Sep 19, 2018 at 4:54 AM Rajini Sivaram <[email protected]>
> wrote:
>
>> I should have said `security configs` instead of `channel configs`.
>>
>> The KIP says:
>>
>>    - The configuration option this KIP proposes to add to enable
>>    server-side expired-connection-kill is '*connections.max.reauth.ms
>>    <http://connections.max.reauth.ms>*' (not prefixed with listener
>> prefix
>>    or SASL mechanism name – this is a single value for the cluster)
>>    - The '*connections.max.reauth.ms <http://connections.max.reauth.ms>*'
>>    configuration option will not be dynamically changeable; restarts will
>> be
>>    required if the value is to be changed.  However, if a new listener is
>>    dynamically added, the value could be set for that listener at that
>> time.
>>
>> Those statements are contradictory. Perhaps the first one should say `it
>> may be optionally prefixed with the listener name`?
>>
>> On Tue, Sep 18, 2018 at 3:55 PM, Ron Dagostino <[email protected]> wrote:
>>
>> > HI Rajini.  The KIP is updated as summarized below, and I will start a
>> vote
>> > immediately.
>> >
>> > <<<It may be useful to have a metric of expired connections killed
>> > Ok, agreed.  I called it expired-connections-killed-total
>> >
>> > <<<For `successful-v0-authentication-{rate,total}`, we probably want
>> > <<<version as a tag rather in the name.
>> > Ok, agreed.  I kept existing metrics unchanged but added an additional
>> tag
>> > to the V0 metrics so they are separate.
>> >
>> > <<<Not sure if we need four of these
>> > <<<(rate/total with success/failure). Perhaps just success/total is
>> > sufficient?
>> > Ok, agreed, just kept the successful total.
>> >
>> > <<<For the session lifetime config, we don't need to require a listener
>> or
>> > <<<mechanism prefix
>> > Ok, agreed, the config is now cluster-wide.
>> >
>> > <<<For all channel configs, we allow an optional listener prefix,
>> > <<<so we should do the same here
>> > Not sure what this is referring to.  We don't have channel configs here,
>> > right?
>> >
>> > <<<The KIP says connections are terminated on requests not related to
>> > <<<re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and
>> > <<<SaslAuthenticateRequest). We can skip for ApiVersionsRequest for
>> > <<<re-authentication, so that doesn't need to be in the list.
>> > Yes, I was planning on that optimization; agreed, I removed it from the
>> > list
>> >
>> > <<<we allow new listeners
>> > <<<to be added dynamically and all configs for the listener can be added
>> > <<<dynamically (with the listener prefix). I think we want to allow that
>> > for
>> > <<<this config
>> > <<<We should mention this in the KIP, though in terms of
>> implementation, I
>> > <<<would leave that for a separate JIRA (it doesn't need to be
>> implemented
>> > at
>> > <<<the same time).
>> > Ok, agreed
>> >
>> >  Thanks again for all the feedback and discussion.
>> >
>> > Ron
>> >
>> > On Tue, Sep 18, 2018 at 6:43 AM Rajini Sivaram <[email protected]
>> >
>> > wrote:
>> >
>> > > Hi Ron,
>> > >
>> > > Thanks for the updates. The KIP looks good. A few comments and minor
>> > points
>> > > below, but feel free to start vote to try and get it into 2.1.0. More
>> > > community feedback will be really useful.
>> > >
>> > > 1) It may be useful to have a metric of expired connections killed by
>> the
>> > > broker. There could be a client implementation that doesn't support
>> > > re-authentications, but happens to use the latest version of
>> > > SaslAuthenticateRequest. Or cases where re-authentication didn't
>> happen
>> > on
>> > > time.
>> > >
>> > > 2) For `successful-v0-authentication-{rate,total}`, we probably want
>> > > version as a tag rather in the name. Not sure if we need four of these
>> > > (rate/total with success/failure). Perhaps just success/total is
>> > > sufficient?
>> > >
>> > > 3) For the session lifetime config, we don't need to require a
>> listener
>> > or
>> > > mechanism prefix. In most cases, we would expect a single config on
>> the
>> > > broker-side. For all channel configs, we allow an optional listener
>> > prefix,
>> > > so we should do the same here.
>> > >
>> > > 4) The KIP says connections are terminated on requests not related to
>> > > re-authentication (ApiVersionsRequest, SaslHandshakeRequest, and
>> > > SaslAuthenticateRequest). We can skip for ApiVersionsRequest for
>> > > re-authentication, so that doesn't need to be in the list.
>> > >
>> > > 5) The KIP says that the new config will not be dynamically
>> updatable. We
>> > > have a very limited set of configs that are dynamically updatable for
>> an
>> > > existing listener. And we don't want to add this config to the list
>> since
>> > > we don't expect this value to change frequently. But we allow new
>> > listeners
>> > > to be added dynamically and all configs for the listener can be added
>> > > dynamically (with the listener prefix). I think we want to allow that
>> for
>> > > this config (i.e. add a new OAuth listener with re-authentication
>> > enabled).
>> > > We should mention this in the KIP, though in terms of implementation,
>> I
>> > > would leave that for a separate JIRA (it doesn't need to be
>> implemented
>> > at
>> > > the same time).
>> > >
>> > >
>> > >
>> > > On Tue, Sep 18, 2018 at 3:06 AM, Ron Dagostino <[email protected]>
>> > wrote:
>> > >
>> > > > HI again, Rajini.  Would we ever want the max session time to be
>> > > different
>> > > > across different SASL mechanisms?  I'm wondering, now that we are
>> > > > supporting all SASL mechanisms via this KIP, if we still need to
>> prefix
>> > > > this config with the "[listener].[mechanism]." prefix.  I've kept
>> the
>> > > > prefix in the KIP for now, but it would be easier to just set it
>> once
>> > for
>> > > > all mechanisms, and I don't see that as being a problem.  Let me
>> know
>> > > what
>> > > > you think.
>> > > >
>> > > > Ron
>> > > >
>> > > > On Mon, Sep 17, 2018 at 9:51 PM Ron Dagostino <[email protected]>
>> > wrote:
>> > > >
>> > > > > Hi Rajini.  The KIP is updated.  Aside from a once-over to make
>> sure
>> > it
>> > > > is
>> > > > > all accurate, I think we need to confirm the metrics.  The
>> decision
>> > to
>> > > > not
>> > > > > reject authentications that use tokens with too-long a lifetime
>> > allowed
>> > > > the
>> > > > > metrics to be simpler.  I decided that in addition to tracking
>> these
>> > > > > metrics on the broker:
>> > > > >
>> > > > > failed-reauthentication-{rate,total} and
>> > > > > successful-reauthentication-{rate,total}
>> > > > >
>> > > > > we simply need one more set of broker metrics to track the subset
>> of
>> > > > > clients clients that are not upgraded to v2.1.0 and are still
>> using a
>> > > V0
>> > > > > SaslAuthenticateRequest:
>> > > > >
>> > > > > failed-v0-authentication-{rate,total} and
>> > > > > successful-v0-authentication-{rate,total}
>> > > > >
>> > > > > See the Migration section of the KIP for details of how this
>> would be
>> > > > used.
>> > > > >
>> > > > > I wonder if we need a broker metric documenting the number of
>> > "expired"
>> > > > > sessions killed by the broker since it would be the same as
>> > > > > successful-v0-authentication-total. I've eliminated that from the
>> > KIP
>> > > > for
>> > > > > now.  Thoughts?
>> > > > >
>> > > > > There is also a client-side metric for re-authentication latency
>> > > tracking
>> > > > > (still unnamed -- do you have a preference?)
>> > > > >
>> > > > > I think we're close to being able to put this KIP up for a vote.
>> > > > >
>> > > > > Ron
>> > > > >
>> > > > >
>> > > > > On Mon, Sep 17, 2018 at 2:45 PM Ron Dagostino <[email protected]>
>> > > wrote:
>> > > > >
>> > > > >> <<<extra field alongside errors, not in the opaque body as
>> mentioned
>> > > in
>> > > > >> the KIP
>> > > > >> Right, that was a concern I had mentioned in a follow-up email.
>> > Agree
>> > > > it
>> > > > >> should go alongside errors.
>> > > > >>
>> > > > >> <<<SCRAM for delegation token based authentication where
>> credentials
>> > > > have
>> > > > >> <<<an expiry time, so we do need to be able to set individual
>> > > credential
>> > > > >> <<<lifetimes, where the server knows the expiry time but client
>> may
>> > > not
>> > > > >> Ah, ok, I was not aware that this case existed.  Agreed, for
>> > > > consistency,
>> > > > >> server will always send it back to the client.
>> > > > >>
>> > > > >> <<<I was trying to avoid this altogether [SaslClient negotiated
>> > > property
>> > > > >> for lifetime]
>> > > > >> Since we now agree that the server will send it back, yes, there
>> is
>> > no
>> > > > >> need for this.
>> > > > >>
>> > > > >> <<<wasn't entirely clear about the purpose of
>> > > > >> [sasl.login.refresh.reauthenticate.enable]
>> > > > >> I think this might have been more useful when we weren't
>> necessarily
>> > > > >> going to support all SASL mechanisms and/or when the broker was
>> not
>> > > > going
>> > > > >> to advertise the fact that it supported re-authentication.  You
>> are
>> > > > >> correct, now that we support it for all SASL mechanisms and we
>> are
>> > > > bumping
>> > > > >> an API version, I think it is okay to simply enable it wherever
>> both
>> > > the
>> > > > >> client and server meet the required versions.
>> > > > >>
>> > > > >> <<<wasn't entirely clear about the purpose of [
>> > > > connections.max.reauth.ms]
>> > > > >> <<<wasn't expecting that we would reject
>> > > > >> <<<tokens or tickets simply because they were too long-lived
>> > > > >> <<<tickets are granted by a 3rd party authority
>> > > > >> <<<Client re-authenticates even though token was not
>> > > > >> <<<refreshed. Does this matter?
>> > > > >> I was going under the assumption that it would matter, but based
>> on
>> > > your
>> > > > >> pushback I just realized that the same functionality can be
>> > > implemented
>> > > > as
>> > > > >> part of token validation if there is a desire to limit token
>> > lifetimes
>> > > > to a
>> > > > >> certain max value (and the token validator has to be provided in
>> > > > production
>> > > > >> anyway since all we provide out-of-the-box is the unsecured
>> > > > validator).  So
>> > > > >> I'm willing to abandon this check as part of re-authentication.
>> > > > >>
>> > > > >> I'll adjust the KIP accordingly a bit later.  Thanks for the
>> > continued
>> > > > >> feedback/discussion.
>> > > > >>
>> > > > >> Ron
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> On Mon, Sep 17, 2018 at 2:10 PM Rajini Sivaram <
>> > > [email protected]
>> > > > >
>> > > > >> wrote:
>> > > > >>
>> > > > >>> Hi Ron,
>> > > > >>>
>> > > > >>>
>> > > > >>> *1) Is there a reason to communicate this value back to the
>> client
>> > > when
>> > > > >>> the
>> > > > >>> client already knows it?  It's an extra network round-trip, and
>> > since
>> > > > the
>> > > > >>> SASL round-tripsare defined by the spec I'm not certain adding
>> an
>> > > extra
>> > > > >>> round-trip is acceptable.*
>> > > > >>>
>> > > > >>> I wasn't suggesting an extra round-trip for propagating session
>> > > > >>> lifetime. I
>> > > > >>> was expecting session lifetime to be added to the last
>> > > > SASL_AUTHENTICATE
>> > > > >>> response from the broker. Because SASL is a challenge-response
>> > > > mechanism,
>> > > > >>> SaslServer knows when the response being sent is the last one
>> and
>> > > hence
>> > > > >>> can
>> > > > >>> send the session lifetime in the response (in the same way as we
>> > > > >>> propagate
>> > > > >>> errors). I was expecting this to be added as an extra field
>> > alongside
>> > > > >>> errors, not in the opaque body as mentioned in the KIP. The
>> opaque
>> > > byte
>> > > > >>> array strictly conforms to the SASL mechanism wire protocol and
>> we
>> > > want
>> > > > >>> to
>> > > > >>> keep it that way.
>> > > > >>>
>> > > > >>> As you have said, we don't need server to propagate session
>> > lifetime
>> > > > for
>> > > > >>> OAUTHBEARER since client knows the token lifetime. But server
>> also
>> > > > knows
>> > > > >>> the credential lifetime and by having the server decide the
>> > lifetime,
>> > > > we
>> > > > >>> can use the same code path for all mechanisms. If we only have a
>> > > > constant
>> > > > >>> max lifetime in the server, for PLAIN and SCRAM we will end up
>> > having
>> > > > the
>> > > > >>> same lifetime for all credentials with no ability to set actual
>> > > expiry.
>> > > > >>> We
>> > > > >>> use SCRAM for delegation token based authentication where
>> > credentials
>> > > > >>> have
>> > > > >>> an expiry time, so we do need to be able to set individual
>> > credential
>> > > > >>> lifetimes, where the server knows the expiry time but client may
>> > not.
>> > > > >>>
>> > > > >>> *2) I also just realized that if the client is to learn the
>> > > credential
>> > > > >>> lifetime we wouldn't want to put special-case code in the
>> > > Authenticator
>> > > > >>> for
>> > > > >>> GSSAPI and OAUTHBEARER; we would want to expose the value
>> > > generically,
>> > > > >>> probably as a negotiated property on the SaslClient instance.*
>> > > > >>>
>> > > > >>> I was trying to avoid this altogether. Client doesn't need to
>> know
>> > > > >>> credential lifetime. Server asks the client to re-authenticate
>> > within
>> > > > its
>> > > > >>> session lifetime.
>> > > > >>>
>> > > > >>> 3) From the KIP, I wasn't entirely clear about the purpose of
>> the
>> > two
>> > > > >>> configs:
>> > > > >>>
>> > > > >>> sasl.login.refresh.reauthenticate.enable: Do we need this?
>> Client
>> > > > knows
>> > > > >>> if
>> > > > >>> broker version supports re-authentication based on the
>> > > > SASL_AUTHENTICATE
>> > > > >>> version returned in ApiVersionsResponse. Client knows if broker
>> is
>> > > > >>> configured to enable re-authentication based on session lifetime
>> > > > returned
>> > > > >>> in SaslAuthenticateResponse. If broker has re-authentication
>> > > configured
>> > > > >>> and
>> > > > >>> client supports re-authentication, you would always want
>> > > > re-authenticate.
>> > > > >>> So I wasn't sure why we need a config to opt in or out on the
>> > > > >>> client-side.
>> > > > >>>
>> > > > >>>
>> > > > >>> connections.max.reauth.ms: We obviously need a broker-side
>> config.
>> > > Not
>> > > > >>> entirely sure about the semantics of the config that drives
>> > > > >>> re-authentication. In particular, I wasn't expecting that we
>> would
>> > > > reject
>> > > > >>> tokens or tickets simply because they were too long-lived. Since
>> > > tokens
>> > > > >>> or
>> > > > >>> tickets are granted by a 3rd party authority, I am not sure if
>> > > clients
>> > > > >>> will
>> > > > >>> always have control over the lifetime. Do we need to support any
>> > more
>> > > > >>> scenarios than these:
>> > > > >>>
>> > > > >>>
>> > > > >>> A) reauth.ms=10,credential.lifetime.ms=10 : Broker sets
>> > > > >>> session.lifetime=10,
>> > > > >>> so this works.
>> > > > >>>
>> > > > >>> B) reauth.ms=10, credential.lifetime.ms=5 : Broker sets
>> > > > >>> session.lifetime=5,
>> > > > >>> so this works.
>> > > > >>> C) reauth.ms=10, credential.lifetime.ms=20 : Broker sets
>> > > > >>> session.lifetime=10. Client re-authenticates even though token
>> was
>> > > not
>> > > > >>> refreshed. Does this matter?
>> > > > >>> D) reauth.ms=Long.MAX_VALUE, credential.lifetime.ms=10: Broker
>> > sets
>> > > > >>> session.lifetime=10, client re-authenticates based on credential
>> > > > expiry.
>> > > > >>> E) reauth.ms=0 (default), credential.lifetime.ms=10 : Broker
>> sets
>> > > > >>> session.lifetime=0, Broker doesn't terminate sessions, client
>> > doesn't
>> > > > >>> re-authenticate. We generate useful metrics.
>> > > > >>> F) reauth.ms=0 (default),no lifetime for credential (e.g.
>> PLAIN):
>> > > > Broker
>> > > > >>> sets session.lifetime=0, Broker doesn't terminate sessions,
>> client
>> > > > >>> doesn't
>> > > > >>> re-authenticate
>> > > > >>> G) reauth.ms=10,no lifetime for credential (e.g. PLAIN) :
>> Broker
>> > > sets
>> > > > >>> session.lifetime=10. Client re-authenticates.
>> > > > >>>
>> > > > >>> I would have thought that D) is the typical scenario for
>> > > OAuth/Kerberos
>> > > > >>> to
>> > > > >>> respect token expiry time. G) would be typical scenario for
>> PLAIN
>> > to
>> > > > >>> force
>> > > > >>> re-authenication at regular intervals. A/B/C are useful to force
>> > > > >>> re-authentication in scenarios where you might check for
>> credential
>> > > > >>> revocation in the server. And E/F are useful to disable
>> > > > re-authentication
>> > > > >>> and generate metrics (also the default behaviour useful during
>> > > > >>> migration).
>> > > > >>> Have I missed something?
>> > > > >>>
>> > > > >>>
>> > > > >>> On Mon, Sep 17, 2018 at 4:27 PM, Ron Dagostino <
>> [email protected]>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>> > Hi yet again, Rajini.  I also just realized that if the
>> client is
>> > > to
>> > > > >>> learn
>> > > > >>> > the credential lifetime we wouldn't want to put special-case
>> code
>> > > in
>> > > > >>> the
>> > > > >>> > Authenticator for GSSAPI and OAUTHBEARER; we would want to
>> expose
>> > > the
>> > > > >>> value
>> > > > >>> > generically, probably as a negotiated property on the
>> SaslClient
>> > > > >>> instance.
>> > > > >>> > We might be talking about making the negotiated property key
>> part
>> > > of
>> > > > >>> the
>> > > > >>> > public API.  In other words, the SaslClient would be
>> responsible
>> > > for
>> > > > >>> > exposing the credential (i.e. token or ticket) lifetime at a
>> > > > well-known
>> > > > >>> > negotiated property name, such as "Credential.Lifetime" and
>> > > putting a
>> > > > >>> Long
>> > > > >>> > value there if there is a lifetime.  That well-klnown key
>> (e.g.
>> > > > >>> > "Credential.Lifetime") would be part of the public API, right?
>> > > > >>> >
>> > > > >>> > Ron
>> > > > >>> >
>> > > > >>> > On Mon, Sep 17, 2018 at 11:03 AM Ron Dagostino <
>> > [email protected]>
>> > > > >>> wrote:
>> > > > >>> >
>> > > > >>> > > Hi again, Rajini.  After thinking about this a little
>> while, it
>> > > > >>> occurs to
>> > > > >>> > > me that maybe the communication of max session lifetime
>> should
>> > > > occur
>> > > > >>> via
>> > > > >>> > > SASL_HANDSHAKE after all.  Here's why.  The value
>> communicated
>> > is
>> > > > >>> the max
>> > > > >>> > > session lifetime allowed, and the client can assume it is
>> the
>> > the
>> > > > >>> session
>> > > > >>> > > lifetime for that particular session unless the particular
>> SASL
>> > > > >>> mechanism
>> > > > >>> > > could result in a shorter session that would be obvious to
>> the
>> > > > >>> client and
>> > > > >>> > > the server.  In particular, for OAUTHBEARER, the session
>> > lifetime
>> > > > >>> will be
>> > > > >>> > > the token lifetime, which the client and server will both
>> know.
>> > > Is
>> > > > >>> > there a
>> > > > >>> > > reason to communicate this value back to the client when the
>> > > client
>> > > > >>> > already
>> > > > >>> > > knows it?  It's an extra network round-trip, and since the
>> SASL
>> > > > >>> > round-trips
>> > > > >>> > > are defined by the spec I'm not certain adding an extra
>> > > round-trip
>> > > > is
>> > > > >>> > > acceptable.  Even if we decide we can add it, it helps with
>> > > latency
>> > > > >>> if we
>> > > > >>> > > don't.
>> > > > >>> > >
>> > > > >>> > > Kerberos may be a bit different -- I don't know if the
>> broker
>> > can
>> > > > >>> learn
>> > > > >>> > > the session lifetime.  If it can then the same thing holds
>> --
>> > > both
>> > > > >>> client
>> > > > >>> > > and server will know the session lifetime and there is no
>> > reason
>> > > to
>> > > > >>> > > communicate it back.  If the server can't learn the lifetime
>> > > then I
>> > > > >>> don't
>> > > > >>> > > think adding an extra SASL_AUTHENTICATE round trip is going
>> to
>> > > > help,
>> > > > >>> > anyway.
>> > > > >>> > >
>> > > > >>> > > Also, by communicating the max session lifetime in the
>> > > > SASL_HANDSHAKE
>> > > > >>> > > response, both OAUTHBEARER and GSSAPI clients will be able
>> to
>> > > know
>> > > > >>> before
>> > > > >>> > > sending any SASL_AUTHENTICATE requests whether their
>> credential
>> > > > >>> violates
>> > > > >>> > > the maximum.  This allows a behaving client to give a good
>> > error
>> > > > >>> message.
>> > > > >>> > > A malicious client would ignore the value and send a
>> > longer-lived
>> > > > >>> > > credential, and then that would be rejected on the server
>> side.
>> > > > >>> > >
>> > > > >>> > > I'm still good with ExpiringCredential not needing to be
>> > public.
>> > > > >>> > >
>> > > > >>> > > What do you think?
>> > > > >>> > >
>> > > > >>> > > Ron
>> > > > >>> > >
>> > > > >>> > > Ron
>> > > > >>> > >
>> > > > >>> > > Ron
>> > > > >>> > >
>> > > > >>> > > On Mon, Sep 17, 2018 at 9:13 AM Ron Dagostino <
>> > [email protected]
>> > > >
>> > > > >>> wrote:
>> > > > >>> > >
>> > > > >>> > >> Hi Rajini.  I've updated the KIP to reflect the decision to
>> > > fully
>> > > > >>> > support
>> > > > >>> > >> this for all SASL mechanisms and to not require the
>> > > > >>> ExpiringCredential
>> > > > >>> > >> interface to be public.
>> > > > >>> > >>
>> > > > >>> > >> Ron
>> > > > >>> > >>
>> > > > >>> > >> On Mon, Sep 17, 2018 at 6:55 AM Ron Dagostino <
>> > > [email protected]>
>> > > > >>> > wrote:
>> > > > >>> > >>
>> > > > >>> > >>> Actually, I think the metric remains the same assuming the
>> > > > >>> > >>> authentication fails when the token  lifetime is too long.
>> > > > >>> Negative
>> > > > >>> > max
>> > > > >>> > >>> config on server counts what would have been killed
>> because
>> > > maybe
>> > > > >>> > client
>> > > > >>> > >>> re-auth is not turned on; positive max enables the kill,
>> > which
>> > > is
>> > > > >>> > counted
>> > > > >>> > >>> by a second metric as proposed.  The current proposal had
>> > > already
>> > > > >>> > stated
>> > > > >>> > >>> that a non-zero value would disallow an authentication
>> with a
>> > > > token
>> > > > >>> > that
>> > > > >>> > >>> has too large a lifetime, and that is still the case, I
>> > think.
>> > > > But
>> > > > >>> > let’s
>> > > > >>> > >>> make sure we are on the same page about all this.
>> > > > >>> > >>>
>> > > > >>> > >>> Ron
>> > > > >>> > >>>
>> > > > >>> > >>> > On Sep 17, 2018, at 6:42 AM, Ron Dagostino <
>> > > [email protected]>
>> > > > >>> > wrote:
>> > > > >>> > >>> >
>> > > > >>> > >>> > Hi Rajini.  I see what you are saying.  The background
>> > login
>> > > > >>> refresh
>> > > > >>> > >>> thread does have to factor into the decision for
>> OAUTHBEARER
>> > > > >>> because
>> > > > >>> > there
>> > > > >>> > >>> is no new token to re-authenticate with until that refresh
>> > > > succeeds
>> > > > >>> > >>> (similarly with Kerberos), but I think you are right that
>> the
>> > > > >>> interface
>> > > > >>> > >>> doesn’t necessarily have to be public.  The server does
>> > decide
>> > > > the
>> > > > >>> > time for
>> > > > >>> > >>> PLAIN and the SCRAM-related mechanisms under the current
>> > > > proposal,
>> > > > >>> but
>> > > > >>> > it
>> > > > >>> > >>> is done in SaslHandshakeResponse, and at that point the
>> > server
>> > > > >>> won’t
>> > > > >>> > have
>> > > > >>> > >>> any token yet; making it happen via
>> SaslAuthenticateRequest
>> > at
>> > > > the
>> > > > >>> > very end
>> > > > >>> > >>> does allow the server to know everything for all
>> mechanisms.
>> > > > >>> > >>> >
>> > > > >>> > >>> > I see two potential issues to discuss.  First, the
>> server
>> > > must
>> > > > >>> > >>> communicate a time that exceeds the (token or ticket)
>> refresh
>> > > > >>> period
>> > > > >>> > on the
>> > > > >>> > >>> client.  Assuming it communicates the expiration time of
>> the
>> > > > >>> > token/ticket,
>> > > > >>> > >>> that’s the best it can do.  So I think that’ll be fine.
>> > > > >>> > >>> >
>> > > > >>> > >>> > The second issue is what happens if the server is
>> > configured
>> > > to
>> > > > >>> > accept
>> > > > >>> > >>> a max value — say, an hour — and the token is good for
>> > > longer.  I
>> > > > >>> > assumed
>> > > > >>> > >>> that the client should not be allowed to authenticate in
>> the
>> > > > first
>> > > > >>> > place
>> > > > >>> > >>> because it could then simply re-authenticate with the same
>> > > token
>> > > > >>> after
>> > > > >>> > an
>> > > > >>> > >>> hour, which defeats the motivations for this KIP.  So do
>> we
>> > > agree
>> > > > >>> the
>> > > > >>> > max
>> > > > >>> > >>> token lifetime allowed will be enforced at authentication
>> > time?
>> > > > >>> > Assuming
>> > > > >>> > >>> so, then we need to discuss migration.
>> > > > >>> > >>> >
>> > > > >>> > >>> > The current proposal includes a metric that can be used
>> to
>> > > > >>> identify
>> > > > >>> > if
>> > > > >>> > >>> an OAUTHBEARER client is misconfigured — the number of
>> > > > connections
>> > > > >>> that
>> > > > >>> > >>> would have been killed (could be non-zero when the
>> configured
>> > > max
>> > > > >>> is a
>> > > > >>> > >>> negative number). Do we still want this type of an
>> > indication,
>> > > > and
>> > > > >>> if
>> > > > >>> > so,
>> > > > >>> > >>> is it still done the same way — a negative number for max,
>> > but
>> > > > >>> instead
>> > > > >>> > of
>> > > > >>> > >>> counting the number of kills that would have been done it
>> > > counts
>> > > > >>> the
>> > > > >>> > number
>> > > > >>> > >>> of authentications that would have been failed?
>> > > > >>> > >>> >
>> > > > >>> > >>> > Ron
>> > > > >>> > >>> >
>> > > > >>> > >>> >> On Sep 17, 2018, at 6:06 AM, Rajini Sivaram <
>> > > > >>> > [email protected]>
>> > > > >>> > >>> wrote:
>> > > > >>> > >>> >>
>> > > > >>> > >>> >> Hi Ron,
>> > > > >>> > >>> >>
>> > > > >>> > >>> >> Thinking a bit more about other SASL mechanisms, I
>> think
>> > the
>> > > > >>> issue
>> > > > >>> > is
>> > > > >>> > >>> that
>> > > > >>> > >>> >> in the current proposal, clients decide the
>> > > re-authentication
>> > > > >>> > period.
>> > > > >>> > >>> For
>> > > > >>> > >>> >> mechanisms like PLAIN or SCRAM, we would actually want
>> > > server
>> > > > to
>> > > > >>> > >>> determine
>> > > > >>> > >>> >> the re-authentication interval. For example, the PLAIN
>> or
>> > > > SCRAM
>> > > > >>> > >>> database
>> > > > >>> > >>> >> could specify the expiry time for each principal, or
>> the
>> > > > broker
>> > > > >>> > could
>> > > > >>> > >>> be
>> > > > >>> > >>> >> configured with a single expiry time for all
>> principals of
>> > > > that
>> > > > >>> > >>> mechanism.
>> > > > >>> > >>> >> For OAuth, brokers do know the expiry time. Haven't
>> > figured
>> > > > out
>> > > > >>> what
>> > > > >>> > >>> to do
>> > > > >>> > >>> >> with Kerberos, but in any case for broker-side
>> connection
>> > > > >>> > termination
>> > > > >>> > >>> on
>> > > > >>> > >>> >> expiry, we need the broker to know/decide the expiry
>> time.
>> > > So
>> > > > I
>> > > > >>> > would
>> > > > >>> > >>> like
>> > > > >>> > >>> >> to suggest a slightly different approach to managing
>> > > > >>> > >>> re-authentications.
>> > > > >>> > >>> >>
>> > > > >>> > >>> >> 1) Instead of changing SASL_HANDSHAKE version number,
>> we
>> > > bump
>> > > > up
>> > > > >>> > >>> >> SASL_AUTHENTICATE version number.
>> > > > >>> > >>> >> 2) In the final SASL_AUTHENTICATE response, broker
>> returns
>> > > the
>> > > > >>> > expiry
>> > > > >>> > >>> time
>> > > > >>> > >>> >> of this authenticated session. This is the interval
>> after
>> > > > which
>> > > > >>> > >>> broker will
>> > > > >>> > >>> >> terminate the connection If it wasn't re-authenticated.
>> > > > >>> > >>> >> 3) We no longer need the public interface change to
>> add `
>> > > > >>> > >>> >>
>> > > org.apache.kafka.common.security.expiring.ExpiringCredential`
>> > > > >>> for
>> > > > >>> > the
>> > > > >>> > >>> >> client-side. Instead we schedule the next
>> > re-authentication
>> > > on
>> > > > >>> the
>> > > > >>> > >>> client
>> > > > >>> > >>> >> based on the expiry time provided by the server (some
>> time
>> > > > >>> earlier
>> > > > >>> > >>> than the
>> > > > >>> > >>> >> expiry).
>> > > > >>> > >>> >> 4) If client uses SASL_AUTHENTICATE v0, broker will not
>> > > return
>> > > > >>> > expiry
>> > > > >>> > >>> time.
>> > > > >>> > >>> >> The connection will be terminated if that feature is
>> > enabled
>> > > > >>> (the
>> > > > >>> > >>> same code
>> > > > >>> > >>> >> path as client failing to re-authenticte on time).
>> > > > >>> > >>> >>
>> > > > >>> > >>> >> Thoughts?
>> > > > >>> > >>> >>
>> > > > >>> > >>> >>> On Sat, Sep 15, 2018 at 3:06 AM, Ron Dagostino <
>> > > > >>> [email protected]>
>> > > > >>> > >>> wrote:
>> > > > >>> > >>> >>>
>> > > > >>> > >>> >>> Hi everyone.  I've updated the KIP to reflect all
>> > > discussion
>> > > > to
>> > > > >>> > date,
>> > > > >>> > >>> >>> including the decision to go with the low-level
>> approach.
>> > > > This
>> > > > >>> > >>> latest
>> > > > >>> > >>> >>> version of the KIP includes the above "
>> > > > >>> connections.max.reauth.ms"
>> > > > >>> > >>> >>> proposal,
>> > > > >>> > >>> >>> which I know has not been discussed yet.  It mentions
>> new
>> > > > >>> metrics,
>> > > > >>> > >>> some of
>> > > > >>> > >>> >>> which may not have been discussed either (and names
>> are
>> > > > missing
>> > > > >>> > from
>> > > > >>> > >>> some
>> > > > >>> > >>> >>> of them).  Regardless, this new version is the closest
>> > yet
>> > > > to a
>> > > > >>> > >>> version
>> > > > >>> > >>> >>> that can be put to a vote next week.
>> > > > >>> > >>> >>>
>> > > > >>> > >>> >>> Ron
>> > > > >>> > >>> >>>
>> > > > >>> > >>> >>>> On Fri, Sep 14, 2018 at 8:59 PM Ron Dagostino <
>> > > > >>> [email protected]>
>> > > > >>> > >>> wrote:
>> > > > >>> > >>> >>>>
>> > > > >>> > >>> >>>> Minor correction: I'm proposing "
>> > > connections.max.reauth.ms"
>> > > > >>> as
>> > > > >>> > the
>> > > > >>> > >>> >>>> broker-side configuration property, not "
>> > > > >>> > >>> >>>> connections.max.expired.credentials.ms".
>> > > > >>> > >>> >>>>
>> > > > >>> > >>> >>>> Ron
>> > > > >>> > >>> >>>>
>> > > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 8:40 PM Ron Dagostino <
>> > > > >>> [email protected]
>> > > > >>> > >
>> > > > >>> > >>> wrote:
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> Hi Rajini.  I'm going to choose *
>> > > > >>> > >>> connections.max.expired.credentials.ms
>> > > > >>> > >>> >>>>> <http://connections.max.expired.credentials.ms>* as
>> > the
>> > > > >>> option
>> > > > >>> > >>> name
>> > > > >>> > >>> >>>>> because it is consistent with the comment I made in
>> the
>> > > > >>> section
>> > > > >>> > >>> about
>> > > > >>> > >>> >>> how
>> > > > >>> > >>> >>>>> to get the client and server to agree on credential
>> > > > lifetime:
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> "We could add a new API call so that clients could
>> ask
>> > > > >>> servers
>> > > > >>> > for
>> > > > >>> > >>> the
>> > > > >>> > >>> >>>>> lifetime they use, or we could extend the
>> > > > >>> > >>> SaslHandshakeRequest/Response
>> > > > >>> > >>> >>> API
>> > > > >>> > >>> >>>>> call to include that information in the server's
>> > > response –
>> > > > >>> the
>> > > > >>> > >>> client
>> > > > >>> > >>> >>>>> would then adopt that value"
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> We set the config option value on the broker (with
>> the
>> > "
>> > > > >>> > >>> >>>>> listener.name.mechanism." prefix), and we will
>> return
>> > the
>> > > > >>> > >>> configured
>> > > > >>> > >>> >>>>> value in the SaslHandshakeResponse (requiring a wire
>> > > format
>> > > > >>> > change
>> > > > >>> > >>> in
>> > > > >>> > >>> >>>>> addition to a version bump).  The value (referred
>> to as
>> > > "X"
>> > > > >>> > below)
>> > > > >>> > >>> can
>> > > > >>> > >>> >>> be
>> > > > >>> > >>> >>>>> negative, zero, or positive and is to be
>> interpreted as
>> > > > >>> follows:
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> X = 0: Fully disable.  The server will respond to
>> > > > >>> > >>> re-authentications,
>> > > > >>> > >>> >>> but
>> > > > >>> > >>> >>>>> it won't kill any connections due to expiration,
>> and it
>> > > > won't
>> > > > >>> > track
>> > > > >>> > >>> >>> either
>> > > > >>> > >>> >>>>> of the 2 metrics mentioned below.
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> Now, a couple of definitions for when X != 0:
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> 1) We define the *maximum allowed expiration time*
>> to
>> > be
>> > > > the
>> > > > >>> time
>> > > > >>> > >>> >>>>> determined by the point when (re-)authentication
>> occurs
>> > > > plus
>> > > > >>> |X|
>> > > > >>> > >>> >>>>> milliseconds.
>> > > > >>> > >>> >>>>> 2) We define the *requested expiration time* to be
>> the
>> > > > >>> maximum
>> > > > >>> > >>> allowed
>> > > > >>> > >>> >>>>> expiration time except for the case of an OAuth
>> bearer
>> > > > >>> token, in
>> > > > >>> > >>> which
>> > > > >>> > >>> >>> case
>> > > > >>> > >>> >>>>> it is the point at which the token expires.
>> (Kerberos
>> > > > >>> presumably
>> > > > >>> > >>> also
>> > > > >>> > >>> >>>>> specifies a ticket lifetime, but frankly I am not
>> in a
>> > > > >>> position
>> > > > >>> > to
>> > > > >>> > >>> do
>> > > > >>> > >>> >>> any
>> > > > >>> > >>> >>>>> Kerberos-related coding in time for a 2.1.0 release
>> and
>> > > > would
>> > > > >>> > >>> prefer to
>> > > > >>> > >>> >>>>> ignore this piece of information for this KIP --
>> would
>> > it
>> > > > be
>> > > > >>> > >>> acceptable
>> > > > >>> > >>> >>> to
>> > > > >>> > >>> >>>>> have someone else add it later?).
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> Based on these definitions, we define the behavior
>> as
>> > > > >>> follows:
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> *X > 0: Fully enable*
>> > > > >>> > >>> >>>>> A) The server will reject any
>> > > > >>> authentication/re-authentication
>> > > > >>> > >>> attempt
>> > > > >>> > >>> >>>>> when the requested expiration time is after the
>> maximum
>> > > > >>> allowed
>> > > > >>> > >>> >>> expiration
>> > > > >>> > >>> >>>>> time (which could only happen with an OAuth bearer
>> > token,
>> > > > >>> > assuming
>> > > > >>> > >>> we
>> > > > >>> > >>> >>> skip
>> > > > >>> > >>> >>>>> dealing with Kerberos for now).
>> > > > >>> > >>> >>>>> B) The server will kill connections that are used
>> > beyond
>> > > > the
>> > > > >>> > >>> requested
>> > > > >>> > >>> >>>>> expiration time.
>> > > > >>> > >>> >>>>> C) A broker metric will be maintained that documents
>> > the
>> > > > >>> number
>> > > > >>> > >>> >>>>> connections killed by the broker.  This metric will
>> be
>> > > > >>> non-zero
>> > > > >>> > if
>> > > > >>> > >>> a
>> > > > >>> > >>> >>> client
>> > > > >>> > >>> >>>>> is connecting to the broker with re-authentication
>> > either
>> > > > >>> > >>> unavailable
>> > > > >>> > >>> >>> (i.e.
>> > > > >>> > >>> >>>>> an older client) or disabled.
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> *X < 0: Partially enable*
>> > > > >>> > >>> >>>>> A) Same as above: the server will reject any
>> > > > >>> > >>> >>>>> authentication/re-authentication attempt when the
>> > > > requested
>> > > > >>> > >>> expiration
>> > > > >>> > >>> >>> time
>> > > > >>> > >>> >>>>> is after the maximum allowed expiration time (which
>> > could
>> > > > >>> only
>> > > > >>> > >>> happen
>> > > > >>> > >>> >>> with
>> > > > >>> > >>> >>>>> an OAuth bearer token, assuming we skip dealing with
>> > > > >>> Kerberos for
>> > > > >>> > >>> now).
>> > > > >>> > >>> >>>>> B) The server will **not** kill connections that are
>> > used
>> > > > >>> beyond
>> > > > >>> > >>> the
>> > > > >>> > >>> >>>>> requested expiration time.
>> > > > >>> > >>> >>>>> C) A broker metric will be maintained that documents
>> > the
>> > > > >>> number
>> > > > >>> > of
>> > > > >>> > >>> API
>> > > > >>> > >>> >>>>> requests unrelated to re-authentication that are
>> made
>> > > over
>> > > > a
>> > > > >>> > >>> connection
>> > > > >>> > >>> >>>>> beyond the requested expiration time.  This metric
>> will
>> > > be
>> > > > >>> > helpful
>> > > > >>> > >>> for
>> > > > >>> > >>> >>>>> operators to ensure that all clients are properly
>> > > upgraded
>> > > > >>> and
>> > > > >>> > >>> >>>>> re-authenticating before fully enabling the
>> server-side
>> > > > >>> > >>> >>>>> expired-connection-kill functionality (i.e. by
>> changing
>> > > the
>> > > > >>> value
>> > > > >>> > >>> from a
>> > > > >>> > >>> >>>>> negative number to a positive number): this metric
>> will
>> > > be
>> > > > >>> zero
>> > > > >>> > >>> across
>> > > > >>> > >>> >>> all
>> > > > >>> > >>> >>>>> brokers when it is safe to fully enable the
>> server-side
>> > > > >>> feature.
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> Thoughts?
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> Ron
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>> On Fri, Sep 14, 2018 at 11:59 AM Rajini Sivaram <
>> > > > >>> > >>> >>> [email protected]>
>> > > > >>> > >>> >>>>> wrote:
>> > > > >>> > >>> >>>>>
>> > > > >>> > >>> >>>>>> It will be good to shorten the config name. We
>> have a
>> > > > config
>> > > > >>> > >>> named `
>> > > > >>> > >>> >>>>>> connections.max.idle.ms`. We could add something
>> > like `
>> > > > >>> > >>> >>>>>> connections.max.expired.credentials.ms`. As you
>> said,
>> > > it
>> > > > >>> would
>> > > > >>> > be
>> > > > >>> > >>> >>>>>> prefixed
>> > > > >>> > >>> >>>>>> with listener and SASL mechanism name. We should be
>> > able
>> > > > to
>> > > > >>> > >>> support
>> > > > >>> > >>> >>>>>> connection termination in future even with SSL, so
>> > > perhaps
>> > > > >>> we
>> > > > >>> > >>> don't
>> > > > >>> > >>> >>> want
>> > > > >>> > >>> >>>>>> the `sasl.server.` prefix (we just need to validate
>> > > > whether
>> > > > >>> the
>> > > > >>> > >>> config
>> > > > >>> > >>> >>> is
>> > > > >>> > >>> >>>>>> supported for the protocol). You can choose
>> whether a
>> > > > >>> boolean
>> > > > >>> > >>> flag or a
>> > > > >>> > >>> >>>>>> time interval makes more sense.
>> > > > >>> > >>> >>>>>>
>> > > > >>> > >>> >>>>>> For the client-side, the KIP explains how to make
>> it
>> > > work
>> > > > >>> for
>> > > > >>> > >>> other
>> > > > >>> > >>> >>>>>> mechanisms and we can leave it that. Not convinced
>> > about
>> > > > >>> > >>> server-side
>> > > > >>> > >>> >>>>>> though. It seems to me that we probably would
>> require
>> > > API
>> > > > >>> > changes
>> > > > >>> > >>> to
>> > > > >>> > >>> >>> make
>> > > > >>> > >>> >>>>>> it work. Basically the proposed approach works only
>> > for
>> > > > >>> > >>> OAUTHBEARER.
>> > > > >>> > >>> >>>>>> Since
>> > > > >>> > >>> >>>>>> we have to bump up SaslHandshakeRequest version in
>> > this
>> > > > >>> KIP, it
>> > > > >>> > >>> will be
>> > > > >>> > >>> >>>>>> good to work out if we need to change the request
>> or
>> > > flow
>> > > > to
>> > > > >>> > make
>> > > > >>> > >>> this
>> > > > >>> > >>> >>>>>> work
>> > > > >>> > >>> >>>>>> for other mechanisms. I haven't figured out exactly
>> > what
>> > > > is
>> > > > >>> > >>> needed, but
>> > > > >>> > >>> >>>>>> will think about it and get back next week. In the
>> > > > >>> meantime, you
>> > > > >>> > >>> can
>> > > > >>> > >>> >>> get
>> > > > >>> > >>> >>>>>> the KIP up-to-date with the config, migration path
>> > etc.
>> > > > and
>> > > > >>> get
>> > > > >>> > it
>> > > > >>> > >>> >>> ready
>> > > > >>> > >>> >>>>>> to
>> > > > >>> > >>> >>>>>> start vote next week.
>> > > > >>> > >>> >>>>>>
>> > > > >>> > >>> >>>>>>
>> > > > >>> > >>> >>>>>>
>> > > > >>> > >>> >>>>>>
>> > > > >>> > >>> >>>>>>
>> > > > >>> > >>> >>>>>> On Fri, Sep 14, 2018 at 1:24 PM, Ron Dagostino <
>> > > > >>> > [email protected]
>> > > > >>> > >>> >
>> > > > >>> > >>> >>>>>> wrote:
>> > > > >>> > >>> >>>>>>
>> > > > >>> > >>> >>>>>>> Hi Rajini.
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>> Agreed, we will bump the SaslHandshakeRequest API
>> > > number
>> > > > >>> (no
>> > > > >>> > wire
>> > > > >>> > >>> >>>>>> format
>> > > > >>> > >>> >>>>>>> change, of course).
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>> Agreed, responding to re-authentication will
>> always
>> > be
>> > > > >>> enabled
>> > > > >>> > >>> on the
>> > > > >>> > >>> >>>>>>> broker since it is initiated by the client.
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>> Agreed, connection termination by the broker will
>> be
>> > > > >>> opt-in.
>> > > > >>> > I'm
>> > > > >>> > >>> >>>>>> thinking
>> > > > >>> > >>> >>>>>>> *sasl.server.disconnect.expired.credential.
>> > connections.
>> > > > >>> > enable*,
>> > > > >>> > >>> >>>>>> prefixed
>> > > > >>> > >>> >>>>>>> with listener and SASL mechanism name in
>> lower-case
>> > so
>> > > it
>> > > > >>> can
>> > > > >>> > be
>> > > > >>> > >>> >>>>>> opt-in at
>> > > > >>> > >>> >>>>>>> the granularity of a SASL mechanism; for example,
>> "
>> > > > >>> > >>> >>>>>>> listener.name.sasl_ssl.oauthbearer.sasl.server.
>> > > > >>> > >>> >>>>>>> disconnect.expired.credential.
>> > connections.enable=true
>> > > > >>> > >>> >>>>>>> ".  It is long, so if you have a preference for a
>> > > shorter
>> > > > >>> name
>> > > > >>> > >>> let me
>> > > > >>> > >>> >>>>>> know
>> > > > >>> > >>> >>>>>>> and we'll go with it instead.
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>> Agreed, additional metrics will be helpful.  I'll
>> add
>> > > > them
>> > > > >>> to
>> > > > >>> > the
>> > > > >>> > >>> >>> KIP.
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>> <<<need to work out exactly how this works with
>> all
>> > > SASL
>> > > > >>> > >>> mechanisms
>> > > > >>> > >>> >>>>>>> <<<not sure we have the data required on the
>> broker
>> > or
>> > > a
>> > > > >>> way of
>> > > > >>> > >>> >>>>>>> <<< extending the [GSSAPI] mechanism
>> > > > >>> > >>> >>>>>>> Not sure I agree here.  The paragraph I added to
>> the
>> > > KIP
>> > > > >>> > >>> describes
>> > > > >>> > >>> >>> how
>> > > > >>> > >>> >>>>>> I
>> > > > >>> > >>> >>>>>>> think this can be done.  Given that description,
>> do
>> > you
>> > > > >>> still
>> > > > >>> > >>> feel
>> > > > >>> > >>> >>>>>> Kerberos
>> > > > >>> > >>> >>>>>>> will not be possible?  If Kerberos presents a
>> > > significant
>> > > > >>> > hurdle
>> > > > >>> > >>> >>> then I
>> > > > >>> > >>> >>>>>>> don't think that should prevent us from doing it
>> for
>> > > > other
>> > > > >>> > >>> mechanisms
>> > > > >>> > >>> >>>>>> -- I
>> > > > >>> > >>> >>>>>>> would rather state that it isn't supported with
>> > GSSAPI
>> > > > due
>> > > > >>> to
>> > > > >>> > >>> >>>>>> limitations
>> > > > >>> > >>> >>>>>>> in Kerberos itself than not have it for
>> OAUTHBEARER,
>> > > for
>> > > > >>> > example.
>> > > > >>> > >>> >>> And
>> > > > >>> > >>> >>>>>>> right now I don't think we need more than a
>> > high-level
>> > > > >>> > >>> description of
>> > > > >>> > >>> >>>>>> how
>> > > > >>> > >>> >>>>>>> this could be done.  In other words, I think we
>> have
>> > > this
>> > > > >>> > >>> covered,
>> > > > >>> > >>> >>>>>> unless
>> > > > >>> > >>> >>>>>>> there is a fundamental problem due to Kerberos not
>> > > making
>> > > > >>> data
>> > > > >>> > >>> (i.e.
>> > > > >>> > >>> >>>>>> ticket
>> > > > >>> > >>> >>>>>>> expiration) available on the server.   I want to
>> > submit
>> > > > >>> this
>> > > > >>> > KIP
>> > > > >>> > >>> for
>> > > > >>> > >>> >>> a
>> > > > >>> > >>> >>>>>> vote
>> > > > >>> > >>> >>>>>>> early next week in the hope of having it accepted
>> by
>> > > the
>> > > > >>> > Monday,
>> > > > >>> > >>> 9/24
>> > > > >>> > >>> >>>>>>> deadline for the 2.1.0 release, and if that
>> happens I
>> > > > >>> believe I
>> > > > >>> > >>> can
>> > > > >>> > >>> >>>>>> get a
>> > > > >>> > >>> >>>>>>> PR into really good shape soon thereafter (I'm
>> > working
>> > > on
>> > > > >>> it
>> > > > >>> > >>> now).
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>> Ron
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>> On Fri, Sep 14, 2018 at 5:57 AM Rajini Sivaram <
>> > > > >>> > >>> >>>>>> [email protected]>
>> > > > >>> > >>> >>>>>>> wrote:
>> > > > >>> > >>> >>>>>>>
>> > > > >>> > >>> >>>>>>>> Hi Ron,
>> > > > >>> > >>> >>>>>>>>
>> > > > >>> > >>> >>>>>>>> 1) Yes, we should update the version of
>> > > > >>> > `SaslHandshakeRequest`.
>> > > > >>> > >>> >>>>>> Clients
>> > > > >>> > >>> >>>>>>>> would attempt to re-authenticate only if broker's
>> > > > >>> > >>> >>>>>> SaslHandshakeRequest
>> > > > >>> > >>> >>>>>>>> version >=2.
>> > > > >>> > >>> >>>>>>>> 2) I think we should enable broker's
>> > re-authentication
>> > > > and
>> > > > >>> > >>> >>> connection
>> > > > >>> > >>> >>>>>>>> termination code regardless of client version.
>> > > > >>> > Re-authentication
>> > > > >>> > >>> >>>>>> could be
>> > > > >>> > >>> >>>>>>>> always enabled since it is triggered only by
>> clients
>> > > and
>> > > > >>> > broker
>> > > > >>> > >>> >>> could
>> > > > >>> > >>> >>>>>>> just
>> > > > >>> > >>> >>>>>>>> handle it. Connection termination should be
>> opt-in,
>> > > but
>> > > > >>> should
>> > > > >>> > >>> work
>> > > > >>> > >>> >>>>>> with
>> > > > >>> > >>> >>>>>>>> clients of any version. If turned on and clients
>> are
>> > > of
>> > > > an
>> > > > >>> > older
>> > > > >>> > >>> >>>>>> version,
>> > > > >>> > >>> >>>>>>>> this would simply force reconnection, which
>> should
>> > be
>> > > > ok.
>> > > > >>> > >>> >>> Additional
>> > > > >>> > >>> >>>>>>>> metrics to monitor expired connections may be
>> useful
>> > > > >>> anyway.
>> > > > >>> > >>> >>>>>>>>
>> > > > >>> > >>> >>>>>>>> We also need to work out exactly how this works
>> with
>> > > all
>> > > > >>> SASL
>> > > > >>> > >>> >>>>>> mechanisms.
>> > > > >>> > >>> >>>>>>>> In particular, we have ensure that we can make it
>> > work
>> > > > >>> with
>> > > > >>> > >>> >>> Kerberos
>> > > > >>> > >>> >>>>>>> since
>> > > > >>> > >>> >>>>>>>> we use the implementation from the JRE and I am
>> not
>> > > sure
>> > > > >>> we
>> > > > >>> > have
>> > > > >>> > >>> >>> the
>> > > > >>> > >>> >>>>>> data
>> > > > >>> > >>> >>>>>>>> required on the broker or a way of extending the
>> > > > >>> mechanism.
>> > > > >>> > >>> >>>>>>>>
>> > > > >>> > >>> >>>>>>>> Thoughts?
>> > > > >>> > >>> >>>>>>>>
>> > > > >>> > >>> >>>>>>>> On Thu, Sep 13, 2018 at 6:56 PM, Ron Dagostino <
>> > > > >>> > >>> [email protected]>
>> > > > >>> > >>> >>>>>>> wrote:
>> > > > >>> > >>> >>>>>>>>
>> > > > >>> > >>> >>>>>>>>> Hi Rajini.  I'm thinking about how we deal with
>> > > > >>> migration.
>> > > > >>> > >>>  For
>> > > > >>> > >>> >>>>>>> example,
>> > > > >>> > >>> >>>>>>>>> let's say we have an existing 2.0.0 cluster
>> using
>> > > > >>> > >>> >>> SASL/OAUTHBEARER
>> > > > >>> > >>> >>>>>> and
>> > > > >>> > >>> >>>>>>> we
>> > > > >>> > >>> >>>>>>>>> want to use this feature.  The desired end
>> state is
>> > > to
>> > > > >>> have
>> > > > >>> > all
>> > > > >>> > >>> >>>>>> clients
>> > > > >>> > >>> >>>>>>>> and
>> > > > >>> > >>> >>>>>>>>> brokers migrated to a version that supports the
>> > > feature
>> > > > >>> (call
>> > > > >>> > >>> it
>> > > > >>> > >>> >>>>>> 2.x)
>> > > > >>> > >>> >>>>>>>> with
>> > > > >>> > >>> >>>>>>>>> the feature turned on.  We need to document the
>> > > > supported
>> > > > >>> > >>> path(s)
>> > > > >>> > >>> >>>>>> to
>> > > > >>> > >>> >>>>>>> this
>> > > > >>> > >>> >>>>>>>>> end state.
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> Here are a couple of scenarios with
>> implications:
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> 1) *Migrate client to 2.x and turn the feature
>> on
>> > but
>> > > > >>> broker
>> > > > >>> > is
>> > > > >>> > >>> >>>>>> still
>> > > > >>> > >>> >>>>>>> at
>> > > > >>> > >>> >>>>>>>>> 2.0.*  In this case the client is going to get
>> an
>> > > error
>> > > > >>> when
>> > > > >>> > it
>> > > > >>> > >>> >>>>>> sends
>> > > > >>> > >>> >>>>>>> the
>> > > > >>> > >>> >>>>>>>>> SaslHandshakeRequest.  We want to avoid this.
>> It
>> > > seems
>> > > > >>> to me
>> > > > >>> > >>> the
>> > > > >>> > >>> >>>>>>> client
>> > > > >>> > >>> >>>>>>>>> needs to know if the broker has been upgraded to
>> > the
>> > > > >>> > necessary
>> > > > >>> > >>> >>>>>> version
>> > > > >>> > >>> >>>>>>>>> before trying to re-authenticate, which makes me
>> > > > believe
>> > > > >>> we
>> > > > >>> > >>> need
>> > > > >>> > >>> >>> to
>> > > > >>> > >>> >>>>>>> bump
>> > > > >>> > >>> >>>>>>>>> the API version number in 2.x and the client
>> has to
>> > > > >>> check to
>> > > > >>> > >>> see
>> > > > >>> > >>> >>>>>> if the
>> > > > >>> > >>> >>>>>>>> max
>> > > > >>> > >>> >>>>>>>>> version the broker supports meets a minimum
>> > standard
>> > > > >>> before
>> > > > >>> > >>> >>> trying
>> > > > >>> > >>> >>>>>> to
>> > > > >>> > >>> >>>>>>>>> re-authenticate.  Do you agree?
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> 2) *Migrate broker to 2.x and turn the feature
>> on
>> > but
>> > > > >>> client
>> > > > >>> > is
>> > > > >>> > >>> >>>>>> still
>> > > > >>> > >>> >>>>>>> at
>> > > > >>> > >>> >>>>>>>>> 2.0*.  In this case the broker is going to end
>> up
>> > > > killing
>> > > > >>> > >>> >>>>>> connections
>> > > > >>> > >>> >>>>>>>>> periodically.   Again, we want to avoid this.
>> It's
>> > > > >>> tempting
>> > > > >>> > to
>> > > > >>> > >>> >>> say
>> > > > >>> > >>> >>>>>>>> "don't
>> > > > >>> > >>> >>>>>>>>> do this" but I wonder if we can require
>> > installations
>> > > > to
>> > > > >>> > >>> upgrade
>> > > > >>> > >>> >>>>>> all
>> > > > >>> > >>> >>>>>>>>> clients before turning the feature on at the
>> > brokers.
>> > > > >>> > >>> Certainly
>> > > > >>> > >>> >>>>>> we can
>> > > > >>> > >>> >>>>>>>> say
>> > > > >>> > >>> >>>>>>>>> "don't do this" for inter-broker clients -- in
>> > other
>> > > > >>> words,
>> > > > >>> > we
>> > > > >>> > >>> >>> can
>> > > > >>> > >>> >>>>>> say
>> > > > >>> > >>> >>>>>>>> that
>> > > > >>> > >>> >>>>>>>>> all brokers in a cluster should be upgraded
>> before
>> > > the
>> > > > >>> > feature
>> > > > >>> > >>> is
>> > > > >>> > >>> >>>>>>> turned
>> > > > >>> > >>> >>>>>>>> on
>> > > > >>> > >>> >>>>>>>>> for any one of them -- but I don't know about
>> our
>> > > > >>> ability to
>> > > > >>> > >>> say
>> > > > >>> > >>> >>>>>> it for
>> > > > >>> > >>> >>>>>>>>> non-broker clients.
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> Now we consider the cases where both the brokers
>> > and
>> > > > the
>> > > > >>> > >>> clients
>> > > > >>> > >>> >>>>>> have
>> > > > >>> > >>> >>>>>>>> been
>> > > > >>> > >>> >>>>>>>>> upgraded to 2.x.  When and where should the
>> feature
>> > > be
>> > > > >>> turned
>> > > > >>> > >>> on?
>> > > > >>> > >>> >>>>>> The
>> > > > >>> > >>> >>>>>>>>> inter-broker case is interesting because I don't
>> > > think
>> > > > >>> think
>> > > > >>> > we
>> > > > >>> > >>> >>> can
>> > > > >>> > >>> >>>>>>>> require
>> > > > >>> > >>> >>>>>>>>> installations to restart every broker with a new
>> > > config
>> > > > >>> where
>> > > > >>> > >>> the
>> > > > >>> > >>> >>>>>>> feature
>> > > > >>> > >>> >>>>>>>>> is turned on at the same time.  Do you agree
>> that
>> > we
>> > > > >>> cannot
>> > > > >>> > >>> >>> require
>> > > > >>> > >>> >>>>>>> this
>> > > > >>> > >>> >>>>>>>>> and therefore must support installations
>> restarting
>> > > > >>> brokers
>> > > > >>> > >>> with
>> > > > >>> > >>> >>> a
>> > > > >>> > >>> >>>>>> new
>> > > > >>> > >>> >>>>>>>>> config at whatever pace they need -- which may
>> be
>> > > quite
>> > > > >>> slow
>> > > > >>> > >>> >>>>>> relative
>> > > > >>> > >>> >>>>>>> to
>> > > > >>> > >>> >>>>>>>>> token lifetimes?  Assuming you do agree, then
>> there
>> > > is
>> > > > >>> going
>> > > > >>> > to
>> > > > >>> > >>> >>> be
>> > > > >>> > >>> >>>>>> the
>> > > > >>> > >>> >>>>>>>> case
>> > > > >>> > >>> >>>>>>>>> where some brokers are going to have the feature
>> > > turned
>> > > > >>> on
>> > > > >>> > and
>> > > > >>> > >>> >>> some
>> > > > >>> > >>> >>>>>>>> clients
>> > > > >>> > >>> >>>>>>>>> (definitely inter-broker clients at a minimum)
>> are
>> > > > going
>> > > > >>> to
>> > > > >>> > >>> have
>> > > > >>> > >>> >>>>>> the
>> > > > >>> > >>> >>>>>>>>> feature turned off.
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> The above discussion assumes a single config on
>> the
>> > > > >>> broker
>> > > > >>> > side
>> > > > >>> > >>> >>>>>> that
>> > > > >>> > >>> >>>>>>>> turns
>> > > > >>> > >>> >>>>>>>>> on both the inter-broker client
>> re-authentication
>> > > > >>> feature as
>> > > > >>> > >>> well
>> > > > >>> > >>> >>>>>> as
>> > > > >>> > >>> >>>>>>> the
>> > > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature, but
>> > now
>> > > > I'm
>> > > > >>> > >>> thinking
>> > > > >>> > >>> >>>>>> we
>> > > > >>> > >>> >>>>>>> need
>> > > > >>> > >>> >>>>>>>>> the ability to turn these features on
>> > independently,
>> > > > plus
>> > > > >>> > maybe
>> > > > >>> > >>> >>> we
>> > > > >>> > >>> >>>>>>> need a
>> > > > >>> > >>> >>>>>>>>> way to monitor the number of "active, expired"
>> > > > >>> connections to
>> > > > >>> > >>> the
>> > > > >>> > >>> >>>>>>> server
>> > > > >>> > >>> >>>>>>>> so
>> > > > >>> > >>> >>>>>>>>> that operators can be sure that all clients have
>> > been
>> > > > >>> > >>> >>>>>> upgraded/enabled
>> > > > >>> > >>> >>>>>>>>> before turning on the server-side
>> > > > expired-connection-kill
>> > > > >>> > >>> >>> feature.
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> So the migration plan would be as follows:
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> 1) Upgrade all brokers to 2.x.
>> > > > >>> > >>> >>>>>>>>> 2) After all brokers are upgraded, turn on
>> > > > >>> re-authentication
>> > > > >>> > >>> for
>> > > > >>> > >>> >>>>>> all
>> > > > >>> > >>> >>>>>>>>> brokers at whatever rate is desired -- just
>> > > eventually,
>> > > > >>> at
>> > > > >>> > some
>> > > > >>> > >>> >>>>>> point,
>> > > > >>> > >>> >>>>>>>> get
>> > > > >>> > >>> >>>>>>>>> the client-side feature turned on for all
>> brokers
>> > so
>> > > > that
>> > > > >>> > >>> >>>>>> inter-broker
>> > > > >>> > >>> >>>>>>>>> connections are re-authenticating.
>> > > > >>> > >>> >>>>>>>>> 3) In parallel with (1) and (2) above, upgrade
>> > > clients
>> > > > >>> to 2.x
>> > > > >>> > >>> and
>> > > > >>> > >>> >>>>>> turn
>> > > > >>> > >>> >>>>>>>>> their re-authentication feature on.  Clients
>> will
>> > > check
>> > > > >>> the
>> > > > >>> > API
>> > > > >>> > >>> >>>>>> version
>> > > > >>> > >>> >>>>>>>> and
>> > > > >>> > >>> >>>>>>>>> only re-authenticate to a broker that has also
>> been
>> > > > >>> upgraded
>> > > > >>> > >>> >>> (note
>> > > > >>> > >>> >>>>>> that
>> > > > >>> > >>> >>>>>>>> the
>> > > > >>> > >>> >>>>>>>>> ability of a broker to respond to a
>> > re-authentication
>> > > > >>> cannot
>> > > > >>> > be
>> > > > >>> > >>> >>>>>> turned
>> > > > >>> > >>> >>>>>>>> off
>> > > > >>> > >>> >>>>>>>>> -- it is always on beginning with version 2.x,
>> and
>> > it
>> > > > >>> just
>> > > > >>> > sits
>> > > > >>> > >>> >>>>>> there
>> > > > >>> > >>> >>>>>>>> doing
>> > > > >>> > >>> >>>>>>>>> nothing if it isn't exercised by an enabled
>> client)
>> > > > >>> > >>> >>>>>>>>> 4) After (1), (2), and (3) are complete, check
>> the
>> > > 2.x
>> > > > >>> broker
>> > > > >>> > >>> >>>>>> metrics
>> > > > >>> > >>> >>>>>>> to
>> > > > >>> > >>> >>>>>>>>> confirm that there are no "active, expired"
>> > > > connections.
>> > > > >>> > Once
>> > > > >>> > >>> >>> you
>> > > > >>> > >>> >>>>>> are
>> > > > >>> > >>> >>>>>>>>> satisfied that (1), (2), and (3) are indeed
>> > complete
>> > > > you
>> > > > >>> can
>> > > > >>> > >>> >>>>>> enable the
>> > > > >>> > >>> >>>>>>>>> server-side expired-connection-kill feature on
>> each
>> > > > >>> broker
>> > > > >>> > via
>> > > > >>> > >>> a
>> > > > >>> > >>> >>>>>>> restart
>> > > > >>> > >>> >>>>>>>> at
>> > > > >>> > >>> >>>>>>>>> whatever pace is desired.
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> Comments?
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> Ron
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>> On Wed, Sep 12, 2018 at 4:48 PM Ron Dagostino <
>> > > > >>> > >>> [email protected]
>> > > > >>> > >>> >>>>
>> > > > >>> > >>> >>>>>>> wrote:
>> > > > >>> > >>> >>>>>>>>>
>> > > > >>> > >>> >>>>>>>>>> Ok, I am tempted to just say we go with the
>> > > low-level
>> > > > >>> > approach
>> > > > >>> > >>> >>>>>> since
>> > > > >>> > >>> >>>>>>> it
>> >
>
>

Re: [DISCUSS] KIP 368: Allow SASL Connections to Periodically Re-Authenticate

Reply via email to