Two open issues on Kafka security

Jay Kreps Tue, 30 Sep 2014 15:32:01 -0700

Current proposal is here:

https://cwiki.apache.org/confluence/display/KAFKA/Security

Here are the two open questions I am aware of:

1. We want to separate authentication and authorization. This means
permissions will be assigned to some user-like subject/entity/person
string that is independent of the authentication mechanism. It sounds
like we agreed this could be done. For Kerberos we had in mind some
krb-specific mangling that Gwen knew about, and for SSL I think the
plan was to use whatever the user chose to put in the Subject
Alternative Name of the cert. In both cases these would translate to a
string denoting the entity to whom we are granting permissions in the
authorization layer. We should document these in the wiki to get
feedback on them.

The Hadoop approach to extraction was something like this:
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html

But actually I'm not sure if just using the full kerberos principal is
so bad? I.e. having the user be jenni...@athena.mit.edu versus just
jennifer. Where this would make a difference would be in a case where
you wanted the same user/entity to be able to authenticate via
different mechanisms (Hadoop auth, kerberos, ssl) and have a single
set of permissions.
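
To make the trade-off concrete, here is a rough sketch of the kind of
mapping in question. It is not Hadoop's auth_to_local rule syntax; the
helper name to_short_name and the example principals are made up for
illustration, and it only handles the common primary/instance@REALM
shape.

```python
def to_short_name(principal: str) -> str:
    """Reduce a Kerberos principal 'primary[/instance][@REALM]' to 'primary'.

    Hypothetical helper, not an existing Kafka or Hadoop API.
    """
    without_realm = principal.split("@", 1)[0]   # drop '@REALM' if present
    primary = without_realm.split("/", 1)[0]     # drop '/instance' if present
    if not primary:
        raise ValueError(f"cannot derive a short name from {principal!r}")
    return primary


# ACLs keyed by the short name apply regardless of how the user authenticated.
acls = {"alice": {"read", "write"}}

kerberos_user = to_short_name("alice/host1.example.com@EXAMPLE.COM")
ssl_user = "alice"  # e.g. taken verbatim from the cert's Subject Alternative Name
```

With the full principal as the key instead, the Kerberos and SSL
identities above would be two different users and would each need
their own permissions.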

2. For SASL/Kerberos we need to figure out how the communication
between client and server will be handled to pass the
challenge/response byte[]. I.e.

http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])

I am not super expert in this area but I will try to give my
understanding and I'm sure someone can correct me if I am confused.

Unlike with SSL, the transmission of these bytes is actually outside
the scope of SASL, so we have to specify it ourselves. There are two
proposals:

Original Proposal: Add a new "authenticate" request/response

The proposal in the original wiki was to add a new "authenticate"
request/response to pass this information. This matches what was done
in the kerberos implementation for zookeeper. The intention is that
the client would send this request immediately after establishing a
connection, in which case it acts much like a "handshake". However,
there is no requirement that the client do so.

Whether the authentication happens via SSL or via Kerberos, the effect
will just be to set the username in the connection's session. The
username will default to "anybody", so in the default non-secure case
we will just be granting "anybody" full permission. As for the
question of whether changing user is required or not, I don't think it
is, but I think we kind of get it for free in this approach.
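
A minimal sketch of that session idea, just to pin down what I mean.
The Session class, the is_allowed function and the operation names are
all hypothetical, not existing Kafka code.

```python
from dataclasses import dataclass

ANYBODY = "anybody"  # default principal for unauthenticated connections


@dataclass
class Session:
    """Per-connection state; hypothetical, not an existing Kafka class."""
    user: str = ANYBODY

    def authenticated_as(self, user: str) -> None:
        # Called once SSL or SASL/Kerberos authentication succeeds.
        # Calling it again later simply switches the session's user.
        self.user = user


def is_allowed(session: Session, operation: str, acls: dict) -> bool:
    """Authorization only ever looks at the session's user string."""
    return operation in acls.get(session.user, set())


# Non-secure default: "anybody" is granted everything.
default_acls = {ANYBODY: {"produce", "fetch"}}
```

Because authentication only overwrites the user field, a second
authenticate request on the same connection would change the user
without any extra machinery.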

In this approach I don't think there is any particular need for, or
advantage to, having a separate port for kerberos.

Alternate Proposal: Create a Handshake

The alternative I think Michael was proposing was to create a
handshake that would happen at connection time on connections coming
in on the SASL port. This would require a separate port for SASL since
otherwise you wouldn't be able to tell if the bytes you were getting
were for SASL or were the first request of an unauthenticated
connection.

Michael, it would be good to work out the details of how this works.
Are we just sending size-delimited byte arrays back and forth until
the challenge/response exchange terminates?
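
If the answer is yes, the framing could be as simple as the sketch
below. This is my guess at what "size-delimited" would mean (a 4-byte
big-endian length followed by the token), not something the proposal
specifies, and write_frame/read_frame are hypothetical names. An
in-memory buffer stands in for the socket.

```python
import io
import struct


def write_frame(stream, token: bytes) -> None:
    """Write one SASL token as a 4-byte big-endian length followed by the bytes."""
    stream.write(struct.pack(">i", len(token)))
    stream.write(token)


def read_frame(stream) -> bytes:
    """Read one length-prefixed token; raises EOFError on a truncated stream."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("connection closed before frame header")
    (size,) = struct.unpack(">i", header)
    if size < 0:
        raise ValueError("negative frame size")
    token = stream.read(size)
    if len(token) < size:
        raise EOFError("connection closed mid-frame")
    return token


# Stand-in for a socket: two tokens written, then read back in order.
wire = io.BytesIO()
write_frame(wire, b"client-initial-response")
write_frame(wire, b"")  # a zero-length token still gets its own frame
wire.seek(0)
first, second = read_frame(wire), read_frame(wire)
```

On a real socket a single read can return fewer bytes than requested,
so read_frame would need to loop until it has the full frame. Each side
would keep exchanging frames until its SaslClient/SaslServer reports
that the exchange is complete.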

My Take

The pro I see for Michael's proposal is that it keeps the
authentication logic more localized in the socket server.

I see two cons:
1. Since the handshake won't go through the normal api layer it won't
go through the normal logging (e.g. request log), jmx monitoring,
client trace token, correlation id, etc. that we get for other
requests. This could make operations a little confusing and make
debugging a little harder, since the client will be blocking on network
requests without the normal logging.
2. This part of the protocol will be inconsistent with the rest of the
Kafka protocol, so it will be a little odd for client implementors: it
will effectively be a request/response they have to implement that is
different from all the other request/responses they implement.

In practice these two alternatives are not very different, except that
in the original proposal the bytes you send are prefixed by the normal
request header fields such as the client id, correlation id, etc.
Overall I would prefer the original proposal, as I think it is a bit
more consistent from the client's point of view.
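
To show how small the difference on the wire is, here is a sketch of
both encodings for the same SASL token. It assumes the usual request
header layout (api key, api version, correlation id, client id) and
assumes the handshake would use bare size-delimited tokens, which has
not been settled. The api key value 99 and the function names are
made up; no key has been assigned for an "authenticate" request.

```python
import struct

AUTHENTICATE_API_KEY = 99  # hypothetical; no such key was assigned in the proposal


def handshake_frame(token: bytes) -> bytes:
    """Alternate proposal (as assumed here): bare size-delimited token."""
    return struct.pack(">i", len(token)) + token


def authenticate_request(token: bytes, correlation_id: int, client_id: str) -> bytes:
    """Original proposal: the same token behind the usual request header."""
    cid = client_id.encode("utf-8")
    # api key (int16), api version (int16), correlation id (int32)
    header = struct.pack(">hhi", AUTHENTICATE_API_KEY, 0, correlation_id)
    # client id as a length-prefixed string (int16 length + bytes)
    header += struct.pack(">h", len(cid)) + cid
    # token as length-prefixed bytes (int32 length + bytes)
    body = struct.pack(">i", len(token)) + token
    payload = header + body
    # whole request is itself size-delimited, like every other request
    return struct.pack(">i", len(payload)) + payload


token = b"sasl-token"
bare = handshake_frame(token)
wrapped = authenticate_request(token, correlation_id=1, client_id="demo")
```

The wrapped form ends with exactly the bytes of the bare form; the only
addition is the header, which is what lets the request show up in the
request log and be matched by correlation id like any other request.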

Cheers,

-Jay
