Hi Jonathan, "Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks running in the Hadoop environment to access Kafka" https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list, yup!
/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy <jonathan.cre...@turn.com>
wrote:

> This is not nearly as deep as the discussion so far, but I did want to
> throw this idea out there to make sure we've thought about it.
>
> The Kafka project should make sure that, when deployed alongside a Hadoop
> cluster from any major distribution, it can tie seamlessly into the
> authentication and authorization used within that cluster, for example
> Apache Sentry.
>
> This may present additional difficulties that mean we decide not to do
> that, or alternatively the Kerberos authentication and the authorization
> schemes we are already working on may be sufficient.
>
> I'm not sure that anything I've read so far in this discussion actually
> poses a problem, but I'm an Ops guy, and being able to more easily
> integrate more things makes my life better. :)
>
> -Jonathan
>
> On 9/30/14, 11:26 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
>
> >inline
> >
> >On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> >> Hey Joe,
> >>
> >> For (1) what are you thinking for the PermissionManager api?
> >>
> >> The way I see it, the first question we have to answer is whether it
> >> is possible to make authentication and authorization independent. What
> >> I mean by that is whether I can write an authorization library that
> >> will work the same whether you authenticate with ssl or kerberos.
> >
> >To me that is a requirement. We can't tie them together; we have to
> >provide the ability for authorization to work regardless of the
> >authentication. One *VERY* important use case is the level of trust in
> >the authentication from the authorization perspective, i.e. I authorize
> >an "identity" based on how it authenticated. Alice is able to view
> >topic X if Alice authenticated over Kerberos. Bob isn't allowed to view
> >topic X no matter what. Alice can also authenticate over something other
> >than Kerberos (there are use cases for that), and in that case Alice
> >wouldn't see topic X. A concrete use case for this with Kafka would be a
> >third-party bank consuming data from a broker. The service provider
> >would have some local Kerberos auth for that bank to do backups, which
> >would also have access to other topics related to that bank's data; the
> >bank itself, over SSL, wants a stream of events (some specific topic),
> >and that bank's identity only sees that topic. It is important to not
> >confuse identity, authentication and authorization.
> >
> >> If so then we need to pick some subset of identity information that we
> >> can extract from both and have this constitute the identity we pass
> >> into the authorization interface. The original proposal had just the
> >> username/subject. But maybe we should add the ip address as well, as
> >> that is useful. What I would prefer not to do is add everything in the
> >> certificate. I think the assumption is that you are generating these
> >> certificates for Kafka, so you can put whatever identity info you want
> >> in the Subject Alternative Name. If that is true then just using that
> >> should be okay, right?
> >
> >I think we should just push the byte[] and let the plugin deal with it.
> >So, if we have a certificate object then pass that along with whatever
> >other metadata (e.g. the IP address of the client) we can. I don't think
> >we should do any parsing whatsoever; let the plugin deal with that. Any
> >parsing we do on the identity information for the "security object"
> >forces us into specific implementations, and I don't see any reason to
> >do that. If plug-ins want an "easier" time dealing with certs and
> >parsing and blah blah blah, then we can implement some way they can do
> >this without much fuss. We also need to make sure that the crypto
> >library is pluggable (so we can expose an API for them to call) so that
> >an HSM can be easily dropped in without Kafka caring. So in the plugin
> >we could provide an identity.getAlternativeAttribute() and then that use
> >case is solved (and we can use Bouncy Castle or whatever to parse it for
> >them to make it easier), while always giving them the raw bytes so they
> >could do it themselves.
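
To make the raw-bytes idea concrete, here is a minimal sketch of what such
a plugin hook could look like. Every name in it (PermissionManager,
isAllowed, SslPermissionManager) is hypothetical; nothing like this exists
in Kafka yet, and the real PermissionManager api is exactly what is under
discussion:

    import java.io.ByteArrayInputStream;
    import java.net.InetAddress;
    import java.security.cert.CertificateException;
    import java.security.cert.CertificateFactory;
    import java.security.cert.X509Certificate;

    // Hypothetical hook: the broker passes the identity bytes exactly as
    // the authentication layer produced them (e.g. a DER-encoded X.509
    // certificate), plus side metadata such as the client IP, and the
    // plugin does all parsing itself.
    interface PermissionManager {
        boolean isAllowed(byte[] rawIdentity, InetAddress clientAddress,
                          String operation, String topic);
    }

    // A plugin that wants structured access to an SSL identity can parse
    // the raw bytes with the standard JDK certificate factory (or Bouncy
    // Castle).
    class SslPermissionManager implements PermissionManager {
        public boolean isAllowed(byte[] rawIdentity, InetAddress clientAddress,
                                 String operation, String topic) {
            try {
                X509Certificate cert = (X509Certificate) CertificateFactory
                        .getInstance("X.509")
                        .generateCertificate(new ByteArrayInputStream(rawIdentity));
                // A real plugin would check the Subject Alternative Name
                // against its ACLs here.
                return cert.getSubjectAlternativeNames() != null;
            } catch (CertificateException e) {
                return false; // unparseable identity: deny
            }
        }
    }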
> >
> >> -Jay
> >>
> >> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.st...@stealth.ly> wrote:
> >> > 1) We need to support the most flexibility we can and make this
> >> > transparent to kafka (to use Gwen's term). Any specific implementation
> >> > is going to fail to work with some solution, stopping people from
> >> > using Kafka. That is a reality, because everyone just does it slightly
> >> > differently enough. If we have an "identity" byte structure (let's not
> >> > use a string, because some security objects are bytes) this should
> >> > just fall through to the implementor. For certs this is the entire
> >> > x509 object (not just the certificate part, as it could contain an
> >> > ASN.1 timestamp), and inside you parse and do what you want with it.
> >> >
> >> > 2) While I think there are many benefits to just the handshake
> >> > approach, I don't think they outweigh the cons Jay expressed. a) We
> >> > can't lead the client libraries down a new path of interacting with
> >> > Kafka. By incrementally adding to the wire protocol we are directing a
> >> > very clear and expected approach. We already have issues with
> >> > implementations even with the wire protocol in place and are trying to
> >> > improve that aspect of the community as a whole. Let's not take a step
> >> > backwards there. b) We also need to not add more/different hoops to
> >> > debugging/administering/monitoring kafka, so taking advantage (as Jay
> >> > says) of built-in logging (etc.) is important, for the client library
> >> > developers too :)
> >> >
> >> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshap...@cloudera.com>
> >> > wrote:
> >> >
> >> >> Re #1:
> >> >>
> >> >> Since auth_to_local is a Kerberos config, it's up to the admin to
> >> >> decide how he likes the user names, set it up properly (or leave it
> >> >> empty), and make sure the ACLs match. Simplified names may be needed
> >> >> if the authorization system integrates with LDAP to get groups or
> >> >> something fancy like that.
> >> >>
> >> >> Note that it's completely transparent to Kafka - if the admin sets up
> >> >> auth_to_local rules, we simply see a different principal name. No
> >> >> need to do anything different.
> >> >>
> >> >> Gwen
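
For readers who haven't seen it, the auth_to_local mapping Gwen mentions
lives in the admin's krb5.conf, not in Kafka. A sketch, with the realm and
rule invented for illustration:

    [realms]
        ATHENA.MIT.EDU = {
            # Strip the realm so jennifer@ATHENA.MIT.EDU becomes "jennifer",
            # letting ACLs be written against the short name.
            auth_to_local = RULE:[1:$1@$0](.*@ATHENA\.MIT\.EDU)s/@.*//
            auth_to_local = DEFAULT
        }

If the admin leaves the rules out, the broker simply sees the full
principal; either way Kafka itself does nothing special.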
> >> >>
> >> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kr...@gmail.com>
> >> >> wrote:
> >> >> > Current proposal is here:
> >> >> >
> >> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
> >> >> >
> >> >> > Here are the two open questions I am aware of:
> >> >> >
> >> >> > 1. We want to separate authentication and authorization. This means
> >> >> > permissions will be assigned to some user-like subject/entity/person
> >> >> > string that is independent of the authentication mechanism. It
> >> >> > sounds like we agreed this could be done, and we had in mind some
> >> >> > krb-specific mangling that Gwen knew about; I think the plan was to
> >> >> > use whatever the user chose to put in the Subject Alternative Name
> >> >> > of the cert for ssl. So in both cases these would translate to a
> >> >> > string denoting the entity whom we are granting permissions to in
> >> >> > the authorization layer. We should document these in the wiki to
> >> >> > get feedback on them.
> >> >> >
> >> >> > The Hadoop approach to extraction was something like this:
> >> >> >
> >> >> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
> >> >> >
> >> >> > But actually I'm not sure if just using the full kerberos principal
> >> >> > is so bad? I.e. having the user be jenni...@athena.mit.edu versus
> >> >> > just jennifer. Where this would make a difference would be in a
> >> >> > case where you wanted the same user/entity to be able to
> >> >> > authenticate via different mechanisms (Hadoop auth, kerberos, ssl)
> >> >> > and have a single set of permissions.
> >> >> >
> >> >> > 2. For SASL/Kerberos we need to figure out how the communication
> >> >> > between client and server will be handled to pass the
> >> >> > challenge/response byte[]. I.e.
> >> >> >
> >> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
> >> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
> >> >> >
> >> >> > I am not a super expert in this area, but I will try to give my
> >> >> > understanding and I'm sure someone can correct me if I am confused.
> >> >> >
> >> >> > Unlike SSL, the transmission of this is actually outside the scope
> >> >> > of SASL, so we have to specify it. Two proposals:
> >> >> >
> >> >> > Original Proposal: Add a new "authenticate" request/response
> >> >> >
> >> >> > The proposal in the original wiki was to add a new "authenticate"
> >> >> > request/response to pass this information. This matches what was
> >> >> > done in the kerberos implementation for zookeeper. The intention is
> >> >> > that the client would send this request immediately after
> >> >> > establishing a connection, in which case it acts much like a
> >> >> > "handshake"; however, there is no requirement that it do so.
> >> >> >
> >> >> > Whether the authentication happens via SSL or via Kerberos, the
> >> >> > effect will just be to set the username in the session. This will
> >> >> > default to the "anybody" user. So in the default non-secure case we
> >> >> > will just be defaulting "anybody" to have full permission. So to
> >> >> > answer the question about whether changing user is required or not:
> >> >> > I don't think it is, but I think we kind of get it for free in this
> >> >> > approach.
> >> >> >
> >> >> > In this approach there is no particular need or advantage to having
> >> >> > a separate port for kerberos, I don't think.
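
As a concrete reading of the original proposal, the client would drive the
standard JDK SASL loop from the javadocs above and ship each token inside
the new request. In this sketch sendAuthenticate() is a placeholder for
the proposed "authenticate" request/response, which does not exist in the
protocol today:

    import java.util.HashMap;
    import javax.security.sasl.Sasl;
    import javax.security.sasl.SaslClient;
    import javax.security.sasl.SaslException;

    class SaslLoopSketch {
        // Placeholder: wrap the token in a normal Kafka request (with
        // client id, correlation id, etc.) and return the broker's
        // challenge bytes from the response.
        static byte[] sendAuthenticate(byte[] token) {
            throw new UnsupportedOperationException("wire format TBD");
        }

        static void authenticate() throws SaslException {
            SaslClient sasl = Sasl.createSaslClient(
                    new String[] {"GSSAPI"},       // the Kerberos v5 mechanism
                    null,                          // authorization id
                    "kafka",                       // service name (assumed)
                    "broker1.example.com",         // server FQDN (made up)
                    new HashMap<String, Object>(), // mechanism properties
                    null);                         // callback handler
            byte[] token = sasl.hasInitialResponse()
                    ? sasl.evaluateChallenge(new byte[0])
                    : new byte[0];
            while (!sasl.isComplete()) {
                token = sasl.evaluateChallenge(sendAuthenticate(token));
            }
            // On success the broker would set the session's user;
            // otherwise the session stays "anybody".
        }
    }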
> >> >> >
> >> >> > Alternate Proposal: Create a Handshake
> >> >> >
> >> >> > The alternative I think Michael was proposing was to create a
> >> >> > handshake that would happen at connection time on connections
> >> >> > coming in on the SASL port. This would require a separate port for
> >> >> > SASL, since otherwise you wouldn't be able to tell whether the
> >> >> > bytes you were getting were for SASL or were the first request of
> >> >> > an unauthenticated connection.
> >> >> >
> >> >> > Michael, it would be good to work out the details of how this
> >> >> > works. Are we just sending size-delimited byte arrays back and
> >> >> > forth until the challenge/response terminates?
> >> >> >
> >> >> > My Take
> >> >> >
> >> >> > The pro I see for Michael's proposal is that it keeps the
> >> >> > authentication logic more localized in the socket server.
> >> >> >
> >> >> > I see two cons:
> >> >> > 1. Since the handshake won't go through the normal api layer, it
> >> >> > won't go through the normal logging (e.g. the request log), jmx
> >> >> > monitoring, client trace token, correlation id, etc. that we get
> >> >> > for other requests. This could make operations a little confusing
> >> >> > and debugging a little harder, since the client will be blocking
> >> >> > on network requests without the normal logging.
> >> >> > 2. This part of the protocol will be inconsistent with the rest of
> >> >> > the Kafka protocol, so it will be a little odd for client
> >> >> > implementors: it will effectively be a request/response they have
> >> >> > to implement that is different from all the other request/responses
> >> >> > they implement.
> >> >> >
> >> >> > In practice these two alternatives are not very different, except
> >> >> > that in the original proposal the bytes you send are prefixed by
> >> >> > the normal request header fields such as the client id, correlation
> >> >> > id, etc. Overall I would prefer this, as I think it is a bit more
> >> >> > consistent from the client's point of view.
> >> >> >
> >> >> > Cheers,
> >> >> >
> >> >> > -Jay
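
On Jay's size-delimited question: presumably yes, and that framing is easy
to write down. A sketch (illustrative only; the alternate proposal never
specified a format) of length-prefixed tokens with no Kafka request header
around them:

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    final class SaslTokenFraming {
        // Each SASL token travels as a 4-byte big-endian length followed
        // by the token bytes. Because there is no request header, these
        // bytes are invisible to the normal request log, trace tokens,
        // etc., which is exactly con #1 above.
        static void writeToken(DataOutputStream out, byte[] token)
                throws IOException {
            out.writeInt(token.length);
            out.write(token);
            out.flush();
        }

        static byte[] readToken(DataInputStream in) throws IOException {
            byte[] token = new byte[in.readInt()];
            in.readFully(token);
            return token;
        }
    }

Under the original proposal the same token bytes instead ride inside a
normal request body, picking up the correlation id, client id, and request
logging for free.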