Re: [DISCUSS] KIP 368: Allow SASL Connections to Periodically Re-Authenticate

Ron Dagostino Wed, 05 Sep 2018 14:02:30 -0700

Hi again, Rajini,  I realized a couple of potential concerns with using the
TransportLayer directly during re-authentication.   First, in the blocking
I/O use case, the owner of the NetworkClient instance calls
NetworkClientUtils.sendAndReceive() to send requests.   This method assumes
the caller has exclusive access to the NetworkClient, so it does not check
to see if the node is ready; it just sends the request and repeatedly calls
poll() until the response arrives.  if we were to take the connection
temporarily offline that method currently has no mechanism for checking for
such a state before it sends the request; we could add it, but we would
have to put in some kind of sleep loop to keep checking for it to come back
"online" before it could send.  Adding such a sleep loop could be done, of
course, but it doesn't sound ideal.


A similar sleep loop situation exists in the async use case.  The owner
repeatedly calls poll() in a loop, but if the connection to the node is
temporarily offline then the poll() method would have to enter a sleep loop
until either the connection comes back online or the timeout elapses
(whichever comes first).

I don't know if there is an aversion to adding sleep loops like that, so
maybe it isn't a big issue, but I wanted to raise it as a potential concern
with this approach.

Also, is the import of SASL-specific classes in KafkaChannel a major
objection to the current implementation?  I could eliminate that by
replacing the 2 offending methods in KafkaChannel with this one and having
the implementation delegate to the authenticator:

    /**
     * Respond to a re-authentication request. This occurs on the
     * Server side of the re-authentication dance (i.e. on the broker).
     *
     * @param requestHeader
     *            the request header
     * @param request
     *            the request to process
     * @return the response to return to the client
     */
    public AbstractResponse respondToReauthenticationRequest(RequestHeader
requestHeader,
            AbstractRequest request)

There is practically no work being done in the KafkaChannel instance anyway
-- it does some sanity checking but otherwise delegates to the
authenticator.  We could just add a method to the Authenticator interface
and delegate the whole thing.

Ron

On Wed, Sep 5, 2018 at 2:07 PM Ron Dagostino <[email protected]> wrote:

> Hi Rajini.  I'm now skeptical of my "ConnectionState.REAUTHENTICATING" idea.
> The concept of a connection being "READY" or not can impact
> ConsumerCoordinator (see, for example,
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L352).
> The semantics of a connection being "READY" and the
> implications/assumptions are not clear, and I suspect there will be some
> unintended consequences of this approach that may not be immediately
> apparent.
>
> I will confess that when I was implementing re-authentication I thought it
> might be worthwhile to unify the authentication-related code bases --
> except I suspected it would be good to go in the other direction: have the
> current code that directly uses the TransportLayer instead use
> AuthenticationSuccessOrFailureReceiver and AuthenticationRequestEnqueuer.
> I'm not advocating that we do it -- I decided to not go there when creating
> the PR, after all -- but I did get a strong feeling that directly using the
> TransportLayer as is currently done is really only viable before anybody
> else starts using the connection.  If we want to use the TransportLayer again
> after that point then it is up to us to somehow take the connection
> "temporarily offline" so that we have exclusive rights to it again, and I
> wonder if the concept of a connection being "temporarily offline" is
> something the existing code is able to handle -- probably not, and I
> suspect there are unstated assumptions that would be invalidated.
>
> Do you think this particular "ConnectionState.REAUTHENTICATING" idea is
> worth pursuing?  How about the general idea of trying to use the
> TransportLayer directly -- are you still feeling like it is viable?
>
> Ron
>
>
>
> Ron
>
> On Wed, Sep 5, 2018 at 11:40 AM Ron Dagostino <[email protected]> wrote:
>
>> <<<in favor of implementing server-side kill in addition to
>> re-authentication, not as a replacement.
>> <<<I think Rajini suggested the same thing
>> Oh, ok, I misunderstood.  Then I think we are on the same page: if we are
>> going to do re-authentication, then we should also do
>> server-side-kill-upon-expiration as part of the same implementation.  I'm
>> good with that.
>>
>> I am also looking into Rajini's idea of doing re-authentication at the
>> NetworkClient level and reusing the existing authentication code path.  I
>> was skeptical when she suggested it, but now that I look closer I see
>> something that I can try.  NetworkClient has logic to recognize the
>> state "ConnectionState.CONNECTING" as meaning "you can't do anything
>> with this connection at the moment; please wait."  I'm going to try adding
>> a new state "ConnectionState.REAUTHENTICATING" that would be recognized
>> in a similar way.  Then the challenge becomes inserting myself into any
>> existing flow that might be going on.  I'll probably add the request to set
>> the state to "REAUTHENTICATING" to a queue if I can't grab the state
>> immediately and have the network client's poll() method check at the end
>> to see if any such requests can be granted; there would be a callback
>> associated with the request, and that way I can be assured I would be
>> granted the request in a reasonable amount of time (assuming the connection
>> doesn't close in the meantime).  Then it would be up the callback
>> implementation to perform the re-authentication dance and set the state
>> back to "ConnectionState.READY".  I don't know if this will work, and
>> I'm probably missing some subtleties at the moment, but I'll give it a shot
>> and see what happens.
>>
>> Ron
>>
>> On Wed, Sep 5, 2018 at 11:23 AM Colin McCabe <[email protected]> wrote:
>>
>>> On Wed, Sep 5, 2018, at 07:34, Ron Dagostino wrote:
>>> > I added a "How To Support Re-Authentication for Other SASL Mechanisms"
>>> > section to the KIP as Rajini suggested.  I also added a "Rejected
>>> > Alternative" for the idea of forcibly closing connections on the client
>>> > side upon token refresh or on the server side upon token expiration.
>>> It
>>> > may be a bit premature to reject the server-side kill scenario given
>>> that
>>> > Colin and Rajini are partial to it, but see below for what I said
>>> about it,
>>> > and I think it makes sense -- server-side kill without an ability for
>>> the
>>> > client to re-authenticate to avoid it may be useful in certain specific
>>> > cases, but as a general feature it doesn't really work.  I would be
>>> willing
>>> > to add server-side-kill to the scope of this KIP if that is desired.
>>>
>>> Hi Ron,
>>>
>>> To clarify, I am in favor of implementing server-side kill in addition
>>> to re-authentication, not as a replacement.  I think Rajini suggested the
>>> same thing.
>>>
>>> It seems clear that server-side kill is needed to provide security.
>>> Otherwise a bad client can simply decide not to re-authenticate, and
>>> continue using server resources indefinitely.  Neither authentication nor
>>> re-authentication should be optional, or else the bad guys will simply take
>>> the option not to authenticate.
>>>
>>> best,
>>> Colin
>>>
>>>
>>> >
>>> > A brute-force alternative is to simply kill the connection on the
>>> client
>>> > > side when the background login thread refreshes the credential.  The
>>> > > advantage is that we don't need a code path for re-authentication –
>>> the
>>> > > client simply connects again to replace the connection that was
>>> killed.
>>> > > There are many disadvantages, though.  The approach is harsh – having
>>> > > connections pulled out from underneath the client will introduce
>>> latency
>>> > > while the client reconnects; it introduces non-trivial resource
>>> utilization
>>> > > on both the client and server as TLS is renegotiated; and it forces
>>> the
>>> > > client to periodically "recover" from what essentially looks like a
>>> failure
>>> > > scenario.  While these are significant disadvantages, the most
>>> significant
>>> > > disadvantage of all is that killing connections on the client side
>>> adds no
>>> > > security – trusting the client to kill its connection in a timely
>>> fashion
>>> > > is a blind and unjustifiable trust.
>>> > >
>>> >
>>> >
>>> > > We could kill the connection from the server side instead, when the
>>> token
>>> > > expires.  But in this case, if there is no ability for the client to
>>> > > re-authenticate to avoid the killing of the connection in the first
>>> place,
>>> > > then we still have all of the harsh approach disadvantages mentioned
>>> above.
>>> >
>>> >
>>> > Ron
>>> >
>>> > On Wed, Sep 5, 2018 at 10:25 AM Colin McCabe <[email protected]>
>>> wrote:
>>> >
>>> > > On Wed, Sep 5, 2018, at 01:41, Rajini Sivaram wrote:
>>> > > > *Re-authentication vs disconnection:*
>>> > > > In a vast number of secure Kafka deployments, SASL_SSL is the
>>> security
>>> > > > protocol (this is the recommended config for OAUTHBEARER). If we
>>> require
>>> > > > disconnections on token expiry, we would need new connections to be
>>> > > > established with an expensive SSL handshake. This adds load on the
>>> broker
>>> > > > and will result in a latency spike. For OAUTHBEARER in particular,
>>> when
>>> > > > tokens are used to make authorisation decisions, we want to be a
>>> able to
>>> > > > support short-lived tokens where token lifetime (granting
>>> authorisation)
>>> > > is
>>> > > > small. To make this usable in practice, I believe we need to
>>> support
>>> > > > re-authentication of existing connections.
>>> > >
>>> > > Hi Rajini,
>>> > >
>>> > > Thanks for the explanation.  That makes sense.
>>> > >
>>> > > >
>>> > > > *Also explicitly out-of-scope for this proposal is the ability for
>>> > > brokers
>>> > > > to close connections that continue to use expired credentials.
>>> This
>>> > > > ability is a natural next step, but it will be addressed via a
>>> separate
>>> > > KIP
>>> > > > if/when this one is adopted.*
>>> > > > Perhaps we could do this the other way round? I don't think we
>>> would ever
>>> > > > want to close connections on the client-side to support expired
>>> > > credentials
>>> > > > because that doesn't add any security guarantees. But we do
>>> require the
>>> > > > ability for brokers to close connections in order to enforce
>>> credential
>>> > > > expiry. Disconnection on the broker-side may be sufficient for some
>>> > > > deployments and could be useful on its own. It would also be the
>>> easier
>>> > > > implementation. So maybe that could be the first step?
>>> > >
>>> > > +1 for doing disconnection first.  Otherwise, as you noted, there
>>> are no
>>> > > security guarantees -- the client can just decide not to
>>> re-authenticate
>>> > > and keep using the old credentials.  You don't even need to modify
>>> the
>>> > > source code -- older clients would behave this way.
>>> > >
>>> > > best,
>>> > > Colin
>>> > >
>>> > > >
>>> > > > *The implementation is designed in such a way that it does not
>>> preclude
>>> > > > adding support for re-authentication of other SASL mechanism (e.g.
>>> PLAIN,
>>> > > > SCRAM-related, and GSSAPI), but doing so is explicitly
>>> out-of-scope for
>>> > > > this proposal. *
>>> > > > Isn't re-authentication driven by ExpiringCredential? We don't
>>> need to
>>> > > > support re-authentication by default for the other mechanisms in
>>> this
>>> > > KIP,
>>> > > > but any mechanism could enable this by adding a custom login
>>> callback
>>> > > > handler to provide an ExpiringCredential? For disconnection as
>>> well as
>>> > > > re-authentication, it will be good if we can specify exactly how
>>> it can
>>> > > be
>>> > > > implemented for each of the SASL mechanisms, even if we actually
>>> > > implement
>>> > > > it only for OAUTHBEARER.
>>> > > >
>>> > > >
>>> > > > On Wed, Sep 5, 2018 at 2:43 AM, Colin McCabe <[email protected]>
>>> wrote:
>>> > > >
>>> > > > > On Tue, Sep 4, 2018, at 17:43, Ron Dagostino wrote:
>>> > > > > > Hi Colin.  Different organizations will rely on different token
>>> > > > > lifetimes,
>>> > > > > > but anything shorter than an hour feels like it would be pretty
>>> > > > > > aggressive.  An hour or more will probably be most common.
>>> > > > >
>>> > > > > Thanks.  That's helpful to give me a sense of what the
>>> performance
>>> > > impact
>>> > > > > might be.
>>> > > > >
>>> > > > > >
>>> > > > > > <<<alternate solution of terminating connections when the
>>> bearer
>>> > > token
>>> > > > > > changed
>>> > > > > > I may be mistaken, but I think you are suggesting here that we
>>> > > forcibly
>>> > > > > > kill connections from the client side whenever the background
>>> Login
>>> > > > > refresh
>>> > > > > > thread refreshes the token (which it currently does so that the
>>> > > client
>>> > > > > can
>>> > > > > > continue to make new connections).  Am I correct?
>>> > > > >
>>> > > > > Yes, this is what I'm thinking about.  We could also terminate
>>> the
>>> > > > > connection on the server, if that is more convenient.
>>> > > > >
>>> > > > > >  If that is what you are
>>> > > > > > referring to, my sense is that it would be a very crude way of
>>> > > dealing
>>> > > > > with
>>> > > > > > the issue that would probably lead to dissatisfaction in some
>>> sense
>>> > > > > (though
>>> > > > > > I can't be sure).
>>> > > > >
>>> > > > > What information should we gather so that we can be sure?
>>> > > > >
>>> > > > > >  I do know that when I implemented SASL/OAUTHBEARER it
>>> > > > > > was communicated that leaving existing connections intact --
>>> as is
>>> > > done
>>> > > > > for
>>> > > > > > GSSAPI -- was the appropriate path forward.
>>> > > > >
>>> > > > > Thanks, that's good background information.  Can someone chime
>>> in with
>>> > > the
>>> > > > > reasoning behind this?
>>> > > > >
>>> > > > > My best guess is that terminating connections might cause a
>>> temporary
>>> > > > > increase in latency as they are re-established.
>>> > > > >
>>> > > > > In any case, we should figure out what the reasoning is so that
>>> we can
>>> > > > > make a decision.  It seems worthwhile including this as a
>>> "rejected
>>> > > > > alternative," at least.
>>> > > > >
>>> > > > > thanks,
>>> > > > > Colin
>>> > > > >
>>> > > > >
>>> > > > > >
>>> > > > > > Ron
>>> > > > > >
>>> > > > > > On Tue, Sep 4, 2018 at 8:31 PM Colin McCabe <
>>> [email protected]>
>>> > > wrote:
>>> > > > > >
>>> > > > > > > Hi Ron,
>>> > > > > > >
>>> > > > > > > Thanks for the KIP.
>>> > > > > > >
>>> > > > > > > What is the frequency at which you envision bearer tokens
>>> changing
>>> > > at?
>>> > > > > > >
>>> > > > > > > Did you consider the alternate solution of terminating
>>> connections
>>> > > when
>>> > > > > > > the bearer token changed?
>>> > > > > > >
>>> > > > > > > best,
>>> > > > > > > Colin
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Tue, Aug 28, 2018, at 07:28, Ron Dagostino wrote:
>>> > > > > > > > Hi everyone. I created KIP 368: Allow SASL Connections to
>>> > > > > Periodically
>>> > > > > > > > Re-Authenticate
>>> > > > > > > > <
>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> > > > > 368%3A+Allow+SASL+Connections+to+Periodically+Re-Authenticate
>>> > > > > > > >
>>> > > > > > > > (
>>> > > > > > > >
>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> > > > > 368%3A+Allow+SASL+Connections+to+Periodically+Re-Authenticate
>>> > > > > > > ).
>>> > > > > > > > The motivation for this KIP is as follows:
>>> > > > > > > >
>>> > > > > > > > The adoption of KIP-255: OAuth Authentication via
>>> > > SASL/OAUTHBEARER
>>> > > > > > > > <
>>> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.
>>> > > > > action?pageId=75968876>
>>> > > > > > >
>>> > > > > > > > in
>>> > > > > > > > release 2.0.0 creates the possibility of using information
>>> in the
>>> > > > > bearer
>>> > > > > > > > token to make authorization decisions.  Unfortunately,
>>> however,
>>> > > Kafka
>>> > > > > > > > connections are long-lived, so there is no ability to
>>> change the
>>> > > > > bearer
>>> > > > > > > > token associated with a particular connection.  Allowing
>>> SASL
>>> > > > > > > > connections
>>> > > > > > > > to periodically re-authenticate would resolve this.  In
>>> addition
>>> > > to
>>> > > > > this
>>> > > > > > > > motivation there are two others that are security-related.
>>> > > First, to
>>> > > > > > > > eliminate access to Kafka for connected clients, the
>>> current
>>> > > > > requirement
>>> > > > > > > > is
>>> > > > > > > > to remove all authorizations (i.e. remove all ACLs).  This
>>> is
>>> > > > > necessary
>>> > > > > > > > because of the long-lived nature of the connections.  It is
>>> > > > > > > > operationally
>>> > > > > > > > simpler to shut off access at the point of authentication,
>>> and
>>> > > with
>>> > > > > the
>>> > > > > > > > release of KIP-86: Configurable SASL Callback Handlers
>>> > > > > > > > <
>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> > > > > 86%3A+Configurable+SASL+callback+handlers
>>> > > > > > > >
>>> > > > > > > > it
>>> > > > > > > > is going to become more and more likely that installations
>>> will
>>> > > > > > > > authenticate users against external directories (e.g. via
>>> > > LDAP).  The
>>> > > > > > > > ability to stop Kafka access by simply disabling an
>>> account in an
>>> > > > > LDAP
>>> > > > > > > > directory (for example) is desirable.  The second
>>> motivating
>>> > > factor
>>> > > > > for
>>> > > > > > > > re-authentication related to security is that the use of
>>> > > short-lived
>>> > > > > > > > tokens
>>> > > > > > > > is a common OAuth security recommendation, but issuing a
>>> > > short-lived
>>> > > > > > > > token
>>> > > > > > > > to a Kafka client (or a broker when OAUTHBEARER is the
>>> > > inter-broker
>>> > > > > > > > protocol) currently has no benefit because once a client is
>>> > > > > connected to
>>> > > > > > > > a
>>> > > > > > > > broker the client is never challenged again and the
>>> connection
>>> > > may
>>> > > > > > > > remain
>>> > > > > > > > intact beyond the token expiration time (and may remain
>>> intact
>>> > > > > > > > indefinitely
>>> > > > > > > > under perfect circumstances).  This KIP proposes adding the
>>> > > ability
>>> > > > > for
>>> > > > > > > > clients (and brokers when OAUTHBEARER is the inter-broker
>>> > > protocol)
>>> > > > > to
>>> > > > > > > > re-authenticate their connections to brokers and have the
>>> new
>>> > > bearer
>>> > > > > > > > token
>>> > > > > > > > appear on their session rather than the old one.
>>> > > > > > > >
>>> > > > > > > > The description of this KIP is actually quite
>>> straightforward
>>> > > from a
>>> > > > > > > > functionality perspective; from an implementation
>>> perspective,
>>> > > > > though,
>>> > > > > > > the
>>> > > > > > > > KIP is not so straightforward, so it includes a pull
>>> request
>>> > > with a
>>> > > > > > > > proposed implementation.  See https://github.com/apache/
>>> > > > > kafka/pull/5582.
>>> > > > > > > >
>>> > > > > > > > Ron
>>> > > > > > >
>>> > > > >
>>> > >
>>>
>>

Re: [DISCUSS] KIP 368: Allow SASL Connections to Periodically Re-Authenticate

Reply via email to