Re: Two open issues on Kafka security

Michael Herstine Thu, 02 Oct 2014 09:02:07 -0700

Hi Jay,

Yup, in both SASL & (non-blocking) SSL the runtime libs provide an
“engine” abstraction that just takes in & produces buffers of bytes
containing the authentication messages. The application is responsible for
transmitting them… somehow. I was picturing a simple length-prefixed
packet.
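
A minimal sketch of such a length-prefixed packet, assuming a 4-byte
big-endian length prefix followed by the opaque token bytes. The class
name, method names, and size cap are hypothetical; nothing in this thread
fixes the wire format.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper for illustration only; not part of Kafka or ZooKeeper.
final class TokenFraming {
    // Assumed upper bound on one token, chosen arbitrarily for this sketch.
    static final int MAX_TOKEN_BYTES = 1 << 20;

    private TokenFraming() {}

    // Writes a 4-byte big-endian length, then the token bytes.
    static void writeToken(DataOutputStream out, byte[] token) throws IOException {
        if (token.length > MAX_TOKEN_BYTES) {
            throw new IOException("Token too large: " + token.length);
        }
        out.writeInt(token.length);
        out.write(token);
        out.flush();
    }

    // Reads one length-prefixed token, rejecting negative or oversized lengths.
    static byte[] readToken(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_TOKEN_BYTES) {
            throw new IOException("Invalid token length: " + length);
        }
        byte[] token = new byte[length];
        in.readFully(token);
        return token;
    }
}
```

Each side would hand the bytes returned by readToken to its SASL or SSL
engine and send whatever the engine produces back through writeToken,
until the engine reports that the exchange is complete.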


Thanks for the pointer to the ZK code. I spent yesterday morning reading
the server side & now see how it’s being done (interesting side note: SASL
is only used for Kerberos; other authentication schemes go through a
different mechanism).

I’m all for going with the original proposal & not introducing a second
(albeit trivial) protocol… I was laboring under the impression that we
wanted to avoid adding new request/response types, that’s all.

On 10/1/14, 9:52 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:

>Here is the client side in ZK:
>https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java
>
>Note how they have a special Zookeeper request API that is used to
>send the SASL bytes (e.g. see ZooKeeperSaslClient.sendSaslPacket).
>
>This API follows the same protocol and rpc mechanism all their other
>request/response types follow but it just has a simple byte[] entry
>for the SASL token in both the request and response.
>
>-Jay
>
>On Wed, Oct 1, 2014 at 9:46 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>> Hey Michael,
>>
>> WRT question 2, I think for SASL you do need the mechanism information
>> but what I was talking about was the challenge/response byte[] that is
>> sent back and forth from the client to the server. My understanding is
>> that SASL gives you an api for the client and server to use to produce
>> these byte[]'s but doesn't actually specify any way of exchanging them
>> (that is protocol specific). I could be wrong here since my knowledge
>> of this stuff is pretty weak. But according to my understanding you
>> must be imagining some protocol for exchanging challenge/response
>> information. This protocol would have to be clearly documented for
>> client implementors. What is that protocol?
>>
>> -Jay
>>
>> On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine
>> <mherst...@linkedin.com.invalid> wrote:
>>> Regarding question #1, I’m not sure I follow you, Joe: you’re proposing
>>> (I think) that the API take a byte[], but what will be in that array? A
>>> serialized certificate if the client authenticated via SSL and the
>>> principal name (perhaps normalized) if the client authenticated via
>>> Kerberos?
>>>
>>> Regarding question #2, I think I was unclear in the meeting yesterday: I
>>> was proposing a separate port for each authentication method (including
>>> none). That is, if a client wants no authentication, then they would
>>> connect to port N on the broker. If they wanted to talk over SSL, then
>>> they connect to port N+1 (say). Kerberos: N+2. This would remove the
>>> need for a new request, since the authentication type would be implicit
>>> in the port on which the client connected (and it was my understanding
>>> that it was desirable to not introduce any new messages).
>>>
>>> Perhaps the confusion comes from the fact, correctly pointed out by Jay,
>>> that when you want to use SASL on a single port, there does of course
>>> need to be a way for the incoming client to signal which mechanism it
>>> wants to use, and that’s out of scope of the SASL spec. I didn’t see
>>> there being a desire to add new SASL mechanisms going forward, but
>>> perhaps I was incorrect?
>>>
>>> In any event, I’d like to suggest we keep the “open” or “no auth” port
>>> separate, both to make it easy for admins to force the use of security
>>> (by shutting down that port) and to avoid downgrade attacks (where an
>>> attacker intercepts the opening packet from a client requesting security
>>> & alters it to request none).
>>>
>>> I’ll update the Wiki with my notes from yesterday’s meeting this
>>> afternoon.
>>>
>>> Thanks,
>>>
>>> On 10/1/14, 9:35 AM, "Jonathan Creasy" <jonathan.cre...@turn.com>
>>>wrote:
>>>
>>>>This is not nearly as deep as the discussion so far, but I did want to
>>>>throw this idea out there to make sure we've thought about it.
>>>>
>>>>The Kafka project should make sure that when deployed alongside a Hadoop
>>>>cluster from any major distribution it can tie seamlessly into the
>>>>authentication and authorization used within that cluster. For example,
>>>>Apache Sentry.
>>>>
>>>>This may present additional difficulties that mean a decision is made to
>>>>not do that, or alternatively the Kerberos authentication and the
>>>>authorization schemes we are already working on may be sufficient.
>>>>
>>>>I'm not sure that anything I've read so far in this discussion actually
>>>>poses a problem, but I'm an Ops guy and being able to more easily
>>>>integrate more things makes my life better. :)
>>>>
>>>>-Jonathan
>>>>
>>>>On 9/30/14, 11:26 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
>>>>
>>>>>inline
>>>>>
>>>>>On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kr...@gmail.com>
>>>>>wrote:
>>>>>
>>>>>> Hey Joe,
>>>>>>
>>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>>
>>>>>> The way I see it, the first question we have to answer is whether it
>>>>>> is possible to make authentication and authorization independent.
>>>>>> What I mean by that is whether I can write an authorization library
>>>>>> that will work the same whether you authenticate with ssl or kerberos.
>>>>>
>>>>>
>>>>>To me that is a requirement. We can't tie them together. We have to
>>>>>provide the ability for authorization to work regardless of the
>>>>>authentication. One *VERY* important use case is level of trust in
>>>>>authentication from the authorization perspective, e.g. I authorize an
>>>>>"identity" based on how you authenticated.... Alice is able to view
>>>>>topic X if Alice authenticated over kerberos. Bob isn't allowed to view
>>>>>topic X no matter what. Alice can authenticate over something other
>>>>>than kerberos (there are use cases for that) and in that case Alice
>>>>>wouldn't see topic X. A concrete use case for this with Kafka would be
>>>>>a third party bank consuming data to a broker. The service provider
>>>>>would have some kerberos local auth for that bank to do back up that
>>>>>would also have access to other topics related to that bank's data....
>>>>>the bank itself over SSL wants a stream of events (some specific topic)
>>>>>and that bank's identity only sees that topic. It is important to not
>>>>>confuse identity, authentication and authorization.
>>>>>
>>>>>
>>>>>> If so then we need to pick some subset of identity information that
>>>>>> we can extract from both and have this constitute the identity we
>>>>>> pass into the authorization interface. The original proposal had just
>>>>>> the username/subject. But maybe we should add the ip address as well
>>>>>> as that is useful. What I would prefer not to do is add everything in
>>>>>> the certificate. I think the assumption is that you are generating
>>>>>> these certificates for Kafka so you can put whatever identity info
>>>>>> you want in the Subject Alternative Name. If that is true then just
>>>>>> using that should be okay, right?
>>>>>>
>>>>>
>>>>>I think we should just push the byte[] and let the plugin deal with it.
>>>>>So, if we have a certificate object then pass that along with whatever
>>>>>other meta data (e.g. IP address of client) we can. I don't think we
>>>>>should do any parsing whatsoever; let the plugin deal with that. Any
>>>>>parsing we do on the identity information for the "security object"
>>>>>forces us into specific implementations and I don't see any reason to do
>>>>>that... If plug-ins want an "easier" time dealing with certs and parsing
>>>>>and blah blah blah then we can implement some way they can do this
>>>>>without much fuss.... we also need to make sure that the crypto library
>>>>>is pluggable too (so we can expose an API for them to call) so that an
>>>>>HSM can be easily dropped in without Kafka caring... so in the plugin we
>>>>>could provide an identity.getAlternativeAttribute() and then that use
>>>>>case is solved (and we can use bouncy castle or whatever to parse it for
>>>>>them to make it easier).... and always give them raw bytes so they could
>>>>>do it themselves.
>>>>>
>>>>>
>>>>>>
>>>>>> -Jay
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.st...@stealth.ly>
>>>>>>wrote:
>>>>>> > 1) We need to support the most flexibility we can and make this
>>>>>> > transparent to kafka (to use Gwen's term). Any specific
>>>>>> > implementation is going to make it not work with some solution,
>>>>>> > stopping people from using Kafka. That is a reality because everyone
>>>>>> > does it just slightly differently. If we have an "identity" byte
>>>>>> > structure (let's not use string because some security objects are
>>>>>> > bytes) this should just fall through to the implementor. For certs
>>>>>> > this is the entire x509 object (not just the certificate part as it
>>>>>> > could contain an ASN.1 timestamp) and inside you parse and do what
>>>>>> > you want with it.
>>>>>> >
>>>>>> > 2) While I think there are many benefits to just the handshake
>>>>>> > approach I don't think they outweigh the cons Jay expressed. a) We
>>>>>> > can't lead the client libraries down a new path of interacting with
>>>>>> > Kafka. By incrementally adding to the wire protocol we are directing
>>>>>> > a very clear and expected approach. We already have issues with
>>>>>> > implementation even with the wire protocol in place and are trying
>>>>>> > to improve that aspect of the community as a whole. Let's not take a
>>>>>> > step backwards with this... also we need to not add more/different
>>>>>> > hoops to debugging/administering/monitoring kafka so taking
>>>>>> > advantage (as Jay says) of built in logging (etc) is important...
>>>>>> > also for the client library developers too :)
>>>>>> >
>>>>>> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira
>>>>>><gshap...@cloudera.com>
>>>>>> wrote:
>>>>>> >
>>>>>> >> Re #1:
>>>>>> >>
>>>>>> >> Since the auth_to_local is a kerberos config, it's up to the admin
>>>>>> >> to decide how he likes the user names and set it up properly (or
>>>>>> >> leave empty) and make sure the ACLs match. Simplified names may be
>>>>>> >> needed if the authorization system integrates with LDAP to get
>>>>>> >> groups or something fancy like that.
>>>>>> >>
>>>>>> >> Note that it's completely transparent to Kafka - if the admin sets
>>>>>> >> up auth_to_local rules, we simply see a different principal name.
>>>>>> >> No need to do anything different.
>>>>>> >>
>>>>>> >> Gwen
>>>>>> >>
>>>>>> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kr...@gmail.com>
>>>>>>wrote:
>>>>>> >> > Current proposal is here:
>>>>>> >> >
>>>>>> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>> >> >
>>>>>> >> > Here are the two open questions I am aware of:
>>>>>> >> >
>>>>>> >> > 1. We want to separate authentication and authorization. This
>>>>>> >> > means permissions will be assigned to some user-like
>>>>>> >> > subject/entity/person string that is independent of the
>>>>>> >> > authentication mechanism. It sounds like we agreed this could be
>>>>>> >> > done and we had in mind some krb-specific mangling that Gwen knew
>>>>>> >> > about and I think the plan was to use whatever the user chose to
>>>>>> >> > put in the Subject Alternative Name of the cert for ssl. So in
>>>>>> >> > both cases these would translate to a string denoting the entity
>>>>>> >> > whom we are granting permissions to in the authorization layer.
>>>>>> >> > We should document these in the wiki to get feedback on them.
>>>>>> >> >
>>>>>> >> > The Hadoop approach to extraction was something like this:
>>>>>> >> >
>>>>>> >>
>>>>>>
>>>>>> >> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>>>>> >> >
>>>>>> >> > But actually I'm not sure if just using the full kerberos
>>>>>> >> > principal is so bad? I.e. having the user be
>>>>>> >> > jenni...@athena.mit.edu versus just jennifer. Where this would
>>>>>> >> > make a difference would be in a case where you wanted the same
>>>>>> >> > user/entity to be able to authenticate via different mechanisms
>>>>>> >> > (Hadoop auth, kerberos, ssl) and have a single set of
>>>>>> >> > permissions.
>>>>>> >> >
>>>>>> >> > 2. For SASL/Kerberos we need to figure out how the communication
>>>>>> >> > between client and server will be handled to pass the
>>>>>> >> > challenge/response byte[]. I.e.
>>>>>> >> >
>>>>>> >> >
>>>>>> >>
>>>>>>
>>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>>>>>> >> >
>>>>>> >>
>>>>>>
>>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>>>>>> >> >
>>>>>> >> > I am not super expert in this area but I will try to give my
>>>>>> >> > understanding and I'm sure someone can correct me if I am
>>>>>> >> > confused.
>>>>>> >> >
>>>>>> >> > Unlike SSL the transmission of this is actually outside the scope
>>>>>> >> > of SASL so we have to specify this. Two proposals:
>>>>>> >> >
>>>>>> >> > Original Proposal: Add a new "authenticate" request/response
>>>>>> >> >
>>>>>> >> > The proposal in the original wiki was to add a new "authenticate"
>>>>>> >> > request/response to pass this information. This matches what was
>>>>>> >> > done in the kerberos implementation for zookeeper. The intention
>>>>>> >> > is that the client would send this request immediately after
>>>>>> >> > establishing a connection, in which case it acts much like a
>>>>>> >> > "handshake", however there is no requirement that they do so.
>>>>>> >> >
>>>>>> >> > Whether the authentication happens via SSL or via Kerberos, the
>>>>>> >> > effect will just be to set the username in their session. This
>>>>>> >> > will default to the "anybody" user. So in the default non-secure
>>>>>> >> > case we will just be defaulting "anybody" to have full
>>>>>> >> > permission. So to answer the question about whether changing user
>>>>>> >> > is required or not, I don't think it is but I think we kind of
>>>>>> >> > get it for free in this approach.
>>>>>> >> >
>>>>>> >> > In this approach there is no particular need or advantage to
>>>>>> >> > having a separate port for kerberos I don't think.
>>>>>> >> >
>>>>>> >> > Alternate Proposal: Create a Handshake
>>>>>> >> >
>>>>>> >> > The alternative I think Michael was proposing was to create a
>>>>>> >> > handshake that would happen at connection time on connections
>>>>>> >> > coming in on the SASL port. This would require a separate port
>>>>>> >> > for SASL since otherwise you wouldn't be able to tell if the
>>>>>> >> > bytes you were getting were for SASL or were the first request of
>>>>>> >> > an unauthenticated connection.
>>>>>> >> >
>>>>>> >> > Michael it would be good to work out the details of how this
>>>>>> >> > works. Are we just sending size-delimited byte arrays back and
>>>>>> >> > forth until the challenge response terminates?
>>>>>> >> >
>>>>>> >> > My Take
>>>>>> >> >
>>>>>> >> > The pro I see for Michael's proposal is that it keeps the
>>>>>> >> > authentication logic more localized in the socket server.
>>>>>> >> >
>>>>>> >> > I see two cons:
>>>>>> >> > 1. Since the handshake won't go through the normal api layer it
>>>>>> >> > won't go through the normal logging (e.g. request log), jmx
>>>>>> >> > monitoring, client trace token, correlation id, etc that we get
>>>>>> >> > for other requests. This could make operations a little confusing
>>>>>> >> > and make debugging a little harder since the client will be
>>>>>> >> > blocking on network requests without the normal logging.
>>>>>> >> > 2. This part of the protocol will be inconsistent with the rest
>>>>>> >> > of the Kafka protocol so it will be a little odd for client
>>>>>> >> > implementors as this will effectively be a request/response that
>>>>>> >> > they will have to implement that will be different from all the
>>>>>> >> > other request/responses they implement.
>>>>>> >> >
>>>>>> >> > In practice these two alternatives are not very different except
>>>>>> >> > that in the original proposal the bytes you send are prefixed by
>>>>>> >> > the normal request header fields such as the client id,
>>>>>> >> > correlation id, etc. Overall I would prefer this as I think it is
>>>>>> >> > a bit more consistent from the client's point of view.
>>>>>> >> >
>>>>>> >> > Cheers,
>>>>>> >> >
>>>>>> >> > -Jay
>>>>>> >>
>>>>>>
>>>>
>>>
