I agree, username+IP would be sufficient. I assume that when authentication is turned off or doesn’t exist but an authorization plugin is enabled, the username would be empty or passed as “nobody”, along with a valid IP (if available).
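To make the unauthenticated case concrete, here is a rough Java sketch built around the authorize(Context, Entity, Action) shape Jarcec proposed. Every name in it (AuthzSketch, Context, Entity, Action, Authorizer, the NOBODY constant, the attribute map) is made up for illustration and is not taken from the wiki or from any existing Kafka API. The only point is that an empty username falls back to “nobody”, the IP is optional, and extra properties can be added later without changing the method signature.

```java
import java.net.InetAddress;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AuthzSketch {
    // Hypothetical placeholder used when no authentication took place.
    static final String NOBODY = "nobody";

    // Hypothetical set of actions; the real list would come from the proposal.
    enum Action { READ, WRITE, CREATE }

    // Hypothetical entity: just a topic name for now.
    static final class Entity {
        final String topic;
        Entity(String topic) { this.topic = topic; }
    }

    // Container for user/session information. New properties go into the
    // attribute map, so existing Authorizer implementations keep compiling.
    static final class Context {
        private final String username;
        private final InetAddress clientAddress; // may be null if unknown
        private final Map<String, Object> attributes;

        Context(String username, InetAddress clientAddress, Map<String, Object> attributes) {
            // An empty or missing username is normalized to "nobody".
            this.username = (username == null || username.isEmpty()) ? NOBODY : username;
            this.clientAddress = clientAddress;
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        }

        String username() { return username; }
        Optional<InetAddress> clientAddress() { return Optional.ofNullable(clientAddress); }
        Optional<Object> attribute(String key) { return Optional.ofNullable(attributes.get(key)); }
    }

    // The pluggable piece: it sees only the Context, never the authn mechanism.
    interface Authorizer {
        boolean authorize(Context context, Entity entity, Action action);
    }

    public static void main(String[] args) throws Exception {
        // Example policy: anyone except "nobody" may act on any topic.
        Authorizer denyNobody = (ctx, entity, action) -> !NOBODY.equals(ctx.username());

        InetAddress ip = InetAddress.getByName("127.0.0.1");
        Context anonymous = new Context(null, ip, new HashMap<>());
        Context jarcec = new Context("jarcec", ip, new HashMap<>());

        System.out.println(denyNobody.authorize(anonymous, new Entity("logs"), Action.WRITE)); // false
        System.out.println(denyNobody.authorize(jarcec, new Entity("logs"), Action.WRITE));    // true
    }
}
```

Whether “nobody” should be denied, or given full permission as in the non-secure default, is a policy choice for the plugin; the sketch only shows that the plugin can tell the two cases apart.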
> The name “context” is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.

+1. Makes the design scalable.

Thanks
Bosco

> ----- Original message -----
> From: Jarek Jarcec Cecho <jar...@apache.org>
> To: dev@kafka.apache.org
> Subject: Re: Two open issues on Kafka security
> Date: Thu, 2 Oct 2014 08:33:45 -0700
>
> Thanks for getting back Jay!
>
> For the interface - Looking at Sentry and other authorization libraries
> in the Hadoop ecosystem it seems that “username” is primarily used to
> perform authorization these days. And then IP for auditing. Hence I feel
> that username+IP would be sufficient, at least for now. However I would
> assume that in the future we might need more than just those two, so
> what about defining the API in a way that we can easily extend in the
> future, something like?
>
> authorize(Context, Entity, Action), where
>
> * Action - is the action that user is trying to do (write to topic, read
>   from topic, create topic, …)
> * Entity - given entity that user is trying to perform that action on
>   (topic, …)
> * Context - container with user/session information - user name, IP
>   address or perhaps entire certificate as was suggested early on the
>   email thread.
>
> The name “context” is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
>
> The hierarchy is interesting topic - I’m not familiar enough with Kafka
> internals so I can’t really talk about how much more complex it would
> be. I can speak about Sentry and the way we designed security model for
> Hive and Search where introducing the hierarchy wasn’t complex at all
> and actually led to a cleaner model.
> The biggest user visible benefit is that you don’t have to deal with
> special rules such as “give READ privilege to user jarcec to ALL topics”.
> If you have a singleton parent entity (service or whatever name seems more
> accurate), you can easily say that you have the READ access on this root
> entity and then all topics will simply inherit that.
>
> Jarcec
>
> On Oct 1, 2014, at 9:33 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Hey Jarek,
>>
>> I agree with the importance of separating authentication and
>> authorization. The question is what concept of identity is sufficient
>> to pass through to the authorization layer? Just a "user name"? Or
>> perhaps you also need the ip the request originated from? Whatever
>> these would be it would be nice to enumerate them so the authz portion
>> can be written in a way that ignores the authn part.
>>
>> So if no one else proposes anything different maybe we can just say
>> user name + ip?
>>
>> With respect to hierarchy, it would be nice to have topic hierarchies
>> but we don't have them now so seems overkill to try to think them
>> through wrt security now, right?
>>
>> -Jay
>>
>> On Wed, Oct 1, 2014 at 1:13 PM, Jarek Jarcec Cecho <jar...@apache.org> wrote:
>>> I’m following the security proposal wiki page [1] and this discussion and I
>>> would like to jump in with few points if I might :) Let me start by saying
>>> that I like the material and the discussion here, good work!
>>>
>>> I was part of the team who originally designed and worked on Sentry and I
>>> wanted to share few to see how it will resonate with people. My first and
>>> probably biggest point would be to separate authorization and
>>> authentication as two separate systems. I believe that Jao has already
>>> stressed that in the email thread, but I wanted to reiterate on that point.
>>> In my experience users don’t care that much about how the user has been
>>> authenticated if they trust that mechanism, what they care more about is
>>> that the authorization model is consistent and behaves the same way. E.g.
>>> if I configured that user jarcec can write into topic “logs”, he should be
>>> able to do that no matter where the connection came from - whether he has
>>> been authorized from Kerberos as he is directly exploring the data from his
>>> computer, he is authorized through delegation token because he is running
>>> map reduce jobs calculating statistics or he is authorized through SSL
>>> certificated because … (well I’m missing good example here, but you’re
>>> probably following my point).
>>>
>>> I’ve also noticed that we are planning to have no hierarchy in the authz
>>> object model per the wiki [1] with the reasoning that Kafka do not supports
>>> topic hierarchy. I see that point, but at the same time it got me thinking
>>> - are we sure that Kafka will never have hierarchic topics? Seems as a nice
>>> feature that might be usable for some use cases and something that we might
>>> want to add in the future. But regardless of that I would suggest to
>>> introduce a hierarchy anyway, even though if it would be just two levels.
>>> In sentry (for Hive) we’ve introduced concept of “Service” where all the
>>> databases are children of the service. In Kafka I would imagine that we
>>> would have “service” and “topics” as the children. Having this is much
>>> easier to model general privileges where you need to grant access to all
>>> topics - you will just grant access to the entire service and all topics
>>> will get “inherited”.
>>>
>>> I’m wondering what are other people thoughts?
>>>
>>> Jarcec
>>>
>>> Links:
>>> 1: https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>
>>> On Oct 1, 2014, at 9:44 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>
>>>> Hi Jonathan,
>>>>
>>>> "Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
>>>> running in the Hadoop environment to access Kafka"
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
>>>> yup!
>>>>
>>>> /*******************************************
>>>> Joe Stein
>>>> Founder, Principal Consultant
>>>> Big Data Open Source Security LLC
>>>> http://www.stealth.ly
>>>> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>> ********************************************/
>>>>
>>>> On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy <jonathan.cre...@turn.com>
>>>> wrote:
>>>>
>>>>> This is not nearly as deep as the discussion so far, but I did want to
>>>>> throw this idea out there to make sure we've thought about it.
>>>>>
>>>>> The Kafka project should make sure that when deployed alongside a Hadoop
>>>>> cluster from any major distributions that it can tie seamlessly into the
>>>>> authentication and authorization used within that cluster. For example,
>>>>> Apache Sentry.
>>>>>
>>>>> This may present additional difficulties that means a decision is made to
>>>>> not do that or alternatively the Kerberos authentication and the
>>>>> authorization schemes we are already working on may be sufficient.
>>>>>
>>>>> I'm not sure that anything I've read so far in this discussion actually
>>>>> poses a problem, but I'm an Ops guy and being able to more easily
>>>>> integrate more things, makes my life better. :)
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> On 9/30/14, 11:26 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
>>>>>
>>>>>> inline
>>>>>>
>>>>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey Joe,
>>>>>>>
>>>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>>>
>>>>>>> The way I see it, the first question we have to answer is whether it
>>>>>>> is possible to make authentication and authorization independent. What
>>>>>>> I mean by that is whether I can write an authorization library that
>>>>>>> will work the same whether you authenticate with ssl or kerberos.
>>>>>>
>>>>>> To me that is a requirement. We can't tie them together. We have to
>>>>>> provide the ability for authorization to work regardless of the
>>>>>> authentication. One *VERY* important use case is level of trust in
>>>>>> authentication from the authorization perspective. e.g. I authorize
>>>>>> "identity" based on the how you authenticated.... Alice is able to view
>>>>>> topic X if Alice authenticated over kerberos. Bob isn't allowed to view
>>>>>> topic X no matter what. Alice can authenticate over not kerberos (uses
>>>>>> cases for that) and in that case Alice wouldn't see topic X. A concrete
>>>>>> use case for this with Kafka would be a third party bank consuming data
>>>>>> to a broker. The service provider would have some kerberos local auth for
>>>>>> that bank to-do back up that would also have access to other topics
>>>>>> related to that banks data.... the bank itself over SSL wants a stream of
>>>>>> events (some specific topic) and that banks identity only sees that
>>>>>> topic. It is important to not confuse identity, authentication and
>>>>>> authorization.
>>>>>>
>>>>>>> If so then we need to pick some subset of identity information that we
>>>>>>> can extract from both and have this constitute the identity we pass
>>>>>>> into the authorization interface. The original proposal had just the
>>>>>>> username/subject. But maybe we should add the ip address as well as
>>>>>>> that is useful. What I would prefer not to do is add everything in the
>>>>>>> certificate.
>>>>>>> I think the assumption is that you are generating these
>>>>>>> certificates for Kafka so you can put whatever identity info you want
>>>>>>> in the Subject Alternative Name. If that is true then just using that
>>>>>>> should be okay, right?
>>>>>>
>>>>>> I think we should just push the byte[] and let the plugin deal with it.
>>>>>> So, if we have a certificate object then pass that along with whatever
>>>>>> other meta data (e.g. IP address of client) we can. I don't think we
>>>>>> should do any parsing whatsoever and let the plugin deal with that. Any
>>>>>> parsing we do on the identity information for the "security object"
>>>>>> forces us into specific implementations and I don't see any reason to-do
>>>>>> that... If plug-ins want an "easier" time to deal with certs and parsing
>>>>>> and blah blah blah then we can implement some way they can do this
>>>>>> without much fuss.... we also need to make sure that crypto library is
>>>>>> pluggable too (so we can expose an API for them to call) so that HSM can
>>>>>> be easily dropped in without Kafka caring... so in the plugin we could
>>>>>> provide a indentity.getAlternativeAttribute() and then that use case is
>>>>>> solved (and we can use bouncy castle or whatever to parse it for them to
>>>>>> make it easier).... and always give them raw bytes so they could do it
>>>>>> themselves.
>>>>>>
>>>>>>> -Jay
>>>>>>>
>>>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>>>>>> 1) We need to support the most flexibility we can and make this
>>>>>>>> transparent to kafka (to use Gwen's term). Any specific implementation
>>>>>>>> is going to make it not work with some solution stopping people from
>>>>>>>> using Kafka. That is a reality because everyone just does it slightly
>>>>>>>> differently enough.
>>>>>>>> If we have an "identity" byte structure (lets not use string because
>>>>>>>> some security objects are bytes) this should just fall through to the
>>>>>>>> implementor. For certs this is the entire x509 object (not just the
>>>>>>>> certificate part as it could contain an ASN.1 timestamp) and inside
>>>>>>>> you parse and do what you want with it.
>>>>>>>>
>>>>>>>> 2) While I think there are many benefits to just the handshake
>>>>>>>> approach I don't think it outweighs the cons Jay expressed. a) We
>>>>>>>> can't lead the client libraries down a new path of interacting with
>>>>>>>> Kafka. By incrementally adding to the wire protocol we are directing a
>>>>>>>> very clear and expected approach. We already have issues with
>>>>>>>> implementation even with the wire protocol in place and are trying to
>>>>>>>> improve that aspect of the community as a whole. Lets not take a step
>>>>>>>> backwards with this there... also we need to not add more/different
>>>>>>>> hoops to debugging/administering/monitoring kafka so taking advantage
>>>>>>>> (as Jay says) of built in logging (etc) is important... also for the
>>>>>>>> client library developers too :)
>>>>>>>>
>>>>>>>> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshap...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Re #1:
>>>>>>>>>
>>>>>>>>> Since the auth_to_local is a kerberos config, its up to the admin to
>>>>>>>>> decide how he likes the user names and set it up properly (or leave
>>>>>>>>> empty) and make sure the ACLs match. Simplified names may be needed
>>>>>>>>> if the authorization system integrates with LDAP to get groups or
>>>>>>>>> something fancy like that.
>>>>>>>>>
>>>>>>>>> Note that its completely transparent to Kafka - if the admin sets up
>>>>>>>>> auth_to_local rules, we simply see a different principal name. No
>>>>>>>>> need to do anything different.
>>>>>>>>>
>>>>>>>>> Gwen
>>>>>>>>>
>>>>>>>>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Current proposal is here:
>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>>>>>
>>>>>>>>>> Here are the two open questions I am aware of:
>>>>>>>>>>
>>>>>>>>>> 1. We want to separate authentication and authorization. This means
>>>>>>>>>> permissions will be assigned to some user-like subject/entity/person
>>>>>>>>>> string that is independent of the authentication mechanism. It
>>>>>>>>>> sounds like we agreed this could be done and we had in mind some
>>>>>>>>>> krb-specific mangling that Gwen knew about and I think the plan was
>>>>>>>>>> to use whatever the user chose to put in the Subject Alternative
>>>>>>>>>> Name of the cert for ssl. So in both cases these would translate to
>>>>>>>>>> a string denoting the entity whom we are granting permissions to in
>>>>>>>>>> the authorization layer. We should document these in the wiki to get
>>>>>>>>>> feedback on them.
>>>>>>>>>>
>>>>>>>>>> The Hadoop approach to extraction was something like this:
>>>>>>>>>>
>>>>>>>>>> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>>>>>>>>>
>>>>>>>>>> But actually I'm not sure if just using the full kerberos principal
>>>>>>>>>> is so bad? I.e. having the user be jenni...@athena.mit.edu versus
>>>>>>>>>> just jennifer. Where this would make a difference would be in a case
>>>>>>>>>> where you wanted the same user/entity to be able to authenticate via
>>>>>>>>>> different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>>>>>>>>>> set of permissions.
>>>>>>>>>>
>>>>>>>>>> 2. For SASL/Kerberos we need to figure out how the communication
>>>>>>>>>> between client and server will be handled to pass the
>>>>>>>>>> challenge/response byte[]. I.e.
>>>>>>>>>>
>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>>>>>>>>>>
>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>>>>>>>>>>
>>>>>>>>>> I am not super expert in this area but I will try to give my
>>>>>>>>>> understanding and I'm sure someone can correct me if I am confused.
>>>>>>>>>>
>>>>>>>>>> Unlike SSL the transmission of this is actually outside the scope of
>>>>>>>>>> SASL so we have to specify this. Two proposals
>>>>>>>>>>
>>>>>>>>>> Original Proposal: Add a new "authenticate" request/response
>>>>>>>>>>
>>>>>>>>>> The proposal in the original wiki was to add a new "authenticate"
>>>>>>>>>> request/response to pass this information. This matches what was
>>>>>>>>>> done in the kerberos implementation for zookeeper. The intention is
>>>>>>>>>> that the client would send this request immediately after
>>>>>>>>>> establishing a connection, in which case it acts much like a
>>>>>>>>>> "handshake", however there is no requirement that they do so.
>>>>>>>>>>
>>>>>>>>>> Whether the authentication happens via SSL or via Kerberos, the
>>>>>>>>>> effect will just be to set the username in their session. This will
>>>>>>>>>> default to the "anybody" user. So in the default non-secure case we
>>>>>>>>>> will just be defaulting "anybody" to have full permission. So to
>>>>>>>>>> answer the question about whether changing user is required or not,
>>>>>>>>>> I don't think it is but I think we kind of get it for free in this
>>>>>>>>>> approach.
>>>>>>>>>>
>>>>>>>>>> In this approach there is no particular need or advantage to having
>>>>>>>>>> a separate port for kerberos I don't think.
>>>>>>>>>>
>>>>>>>>>> Alternate Proposal: Create a Handshake
>>>>>>>>>>
>>>>>>>>>> The alternative I think Michael was proposing was to create a
>>>>>>>>>> handshake that would happen at connection time on connections coming
>>>>>>>>>> in on the SASL port. This would require a separate port for SASL
>>>>>>>>>> since otherwise you wouldn't be able to tell if the bytes you were
>>>>>>>>>> getting were for SASL or were the first request of an
>>>>>>>>>> unauthenticated connection.
>>>>>>>>>>
>>>>>>>>>> Michael it would be good to work out the details of how this works.
>>>>>>>>>> Are we just sending size-delimited byte arrays back and forth until
>>>>>>>>>> the challenge response terminates?
>>>>>>>>>>
>>>>>>>>>> My Take
>>>>>>>>>>
>>>>>>>>>> The pro I see for Michael's proposal is that it keeps the
>>>>>>>>>> authentication logic more localized in the socket server.
>>>>>>>>>>
>>>>>>>>>> I see two cons:
>>>>>>>>>> 1. Since the handshake won't go through the normal api layer it
>>>>>>>>>> won't go through the normal logging (e.g. request log), jmx
>>>>>>>>>> monitoring, client trace token, correlation id, etc that we get for
>>>>>>>>>> other requests. This could make operations a little confusing and
>>>>>>>>>> make debugging a little harder since the client will be blocking on
>>>>>>>>>> network requests without the normal logging.
>>>>>>>>>> 2. This part of the protocol will be inconsistent with the rest of
>>>>>>>>>> the Kafka protocol so it will be a little odd for client
>>>>>>>>>> implementors as this will effectively be a request/response that
>>>>>>>>>> they will have to implement that will be different from all the
>>>>>>>>>> other request/responses they implement.
>>>>>>>>>>
>>>>>>>>>> In practice these two alternatives are not very different except
>>>>>>>>>> that in the original proposal the bytes you send are prefixed by the
>>>>>>>>>> normal request header fields such as the client id, correlation id,
>>>>>>>>>> etc. Overall I would prefer this as I think it is a bit more
>>>>>>>>>> consistent from the client's point of view.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> -Jay
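
On the question in Jay's original mail of whether the handshake would just be size-delimited byte arrays going back and forth, here is a minimal Java sketch of what such framing could look like. It is only an illustration under assumptions: the class name SaslFramingSketch, the helpers writeToken/readToken, the 4-byte big-endian length prefix and the 64 KB cap are all made up here, not part of any Kafka protocol. In a real exchange the tokens would come from SaslClient.evaluateChallenge(byte[]) and SaslServer.evaluateResponse(byte[]); the sketch uses placeholder bytes so that it runs without a Kerberos setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SaslFramingSketch {
    // Hypothetical upper bound so a corrupt size field cannot trigger a huge allocation.
    static final int MAX_TOKEN_SIZE = 64 * 1024;

    // Writes one token as a 4-byte big-endian length followed by the raw bytes.
    static void writeToken(DataOutputStream out, byte[] token) throws IOException {
        out.writeInt(token.length);
        out.write(token);
        out.flush();
    }

    // Reads one size-delimited token; rejects sizes outside the allowed range.
    static byte[] readToken(DataInputStream in) throws IOException {
        int size = in.readInt();
        if (size < 0 || size > MAX_TOKEN_SIZE) {
            throw new IOException("Invalid token size: " + size);
        }
        byte[] token = new byte[size];
        in.readFully(token);
        return token;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the socket.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);

        // Placeholder tokens; real ones would be opaque SASL challenge/response bytes.
        writeToken(out, "challenge-1".getBytes(StandardCharsets.UTF_8));
        writeToken(out, new byte[0]); // an empty token is still a valid frame

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(new String(readToken(in), StandardCharsets.UTF_8)); // challenge-1
        System.out.println(readToken(in).length);                              // 0
    }
}
```

With the "authenticate" request/response approach the same token bytes would simply sit inside a normal request body, after the usual header fields such as client id and correlation id.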