Hi Jay,

Yup, in both SASL & (non-blocking) SSL the runtime libs provide an “engine” abstraction that just takes in & produces buffers of bytes containing the authentication messages. The application is responsible for transmitting them… somehow. I was picturing a simple length-prefixed packet.
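To make the "simple length-prefixed packet" idea concrete, here is a minimal sketch. It assumes a 4-byte big-endian length prefix; the prefix width and the helper names (`write_token`, `read_token`) are illustrative assumptions, not anything agreed in this thread. The token itself is treated as opaque bytes, as produced by the SASL or SSL engine.

```python
import io
import struct

# Assumption: a 4-byte big-endian length prefix; the thread does not fix a width.
_LEN = struct.Struct(">I")


def write_token(stream, token: bytes) -> None:
    """Write one opaque authentication token as <length><bytes>."""
    stream.write(_LEN.pack(len(token)))
    stream.write(token)


def _read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed mid-packet")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_token(stream) -> bytes:
    """Read one length-prefixed token written by write_token."""
    (length,) = _LEN.unpack(_read_exact(stream, _LEN.size))
    return _read_exact(stream, length)


# Round trip through an in-memory buffer standing in for a socket.
buf = io.BytesIO()
write_token(buf, b"client-first-token")
write_token(buf, b"")  # a zero-length token is still a well-formed packet
buf.seek(0)
first = read_token(buf)
second = read_token(buf)
```

Each side would loop, handing the bytes it reads to its engine and writing back whatever the engine produces, until the engine reports that the exchange is complete.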
Thanks for the pointer to the ZK code. I spent yesterday morning reading the server side & see how it’s being done (interesting side note: SASL is only used for Kerberos; other authentication schemes go through a different mechanism). I’m all for going with the original proposal & not introducing a second (albeit trivial) protocol… I was laboring under the impression that we wanted to avoid adding new request/response types, that’s all.

On 10/1/14, 9:52 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:

>Here is the client side in ZK:
>https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java
>
>Note how they have a special Zookeeper request API that is used to send the SASL bytes (e.g. see ZooKeeperSaslClient.sendSaslPacket).
>
>This API follows the same protocol and rpc mechanism all their other request/response types follow but it just has a simple byte[] entry for the SASL token in both the request and response.
>
>-Jay
>
>On Wed, Oct 1, 2014 at 9:46 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>> Hey Michael,
>>
>> WRT question 2, I think for SASL you do need the mechanism information but what I was talking about was the challenge/response byte[] that is sent back and forth from the client to the server. My understanding is that SASL gives you an api for the client and server to use to produce these byte[]'s but doesn't actually specify any way of exchanging them (that is protocol specific). I could be wrong here since my knowledge of this stuff is pretty weak. But according to my understanding you must be imagining some protocol for exchanging challenge/response information. This protocol would have to be clearly documented for client implementors. What is that protocol?
>>
>> -Jay
>>
>> On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine <mherst...@linkedin.com.invalid> wrote:
>>> Regarding question #1, I’m not sure I follow you, Joe: you’re proposing (I think) that the API take a byte[], but what will be in that array? A serialized certificate if the client authenticated via SSL and the principal name (perhaps normalized) if the client authenticated via Kerberos?
>>>
>>> Regarding question #2, I think I was unclear in the meeting yesterday: I was proposing a separate port for each authentication method (including none). That is, if a client wants no authentication, then they would connect to port N on the broker. If they wanted to talk over SSL, then they connect to port N+1 (say). Kerberos: N+2. This would remove the need for a new request, since the authentication type would be implicit in the port on which the client connected (and it was my understanding that it was desirable to not introduce any new messages).
>>>
>>> Perhaps the confusion comes from the fact, correctly pointed out by Jay, that when you want to use SASL on a single port, there does of course need to be a way for the incoming client to signal which mechanism it wants to use, and that’s out of scope of the SASL spec. I didn’t see there being a desire to add new SASL mechanisms going forward, but perhaps I was incorrect?
>>>
>>> In any event, I’d like to suggest we keep the “open” or “no auth” port separate, both to make it easy for admins to force the use of security (by shutting down that port) and to avoid downgrade attacks (where an attacker intercepts the opening packet from a client requesting security & alters it to request none).
>>>
>>> I’ll update the Wiki with my notes from yesterday’s meeting this afternoon.
>>>
>>> Thanks,
>>>
>>> On 10/1/14, 9:35 AM, "Jonathan Creasy" <jonathan.cre...@turn.com> wrote:
>>>
>>>>This is not nearly as deep as the discussion so far, but I did want to throw this idea out there to make sure we’ve thought about it.
>>>>
>>>>The Kafka project should make sure that when deployed alongside a Hadoop cluster from any major distribution it can tie seamlessly into the authentication and authorization used within that cluster. For example, Apache Sentry.
>>>>
>>>>This may present additional difficulties that mean a decision is made to not do that, or alternatively the Kerberos authentication and the authorization schemes we are already working on may be sufficient.
>>>>
>>>>I’m not sure that anything I’ve read so far in this discussion actually poses a problem, but I’m an Ops guy and being able to more easily integrate more things makes my life better. :)
>>>>
>>>>-Jonathan
>>>>
>>>>On 9/30/14, 11:26 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
>>>>
>>>>>inline
>>>>>
>>>>>On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>
>>>>>> Hey Joe,
>>>>>>
>>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>>
>>>>>> The way I see it, the first question we have to answer is whether it is possible to make authentication and authorization independent. What I mean by that is whether I can write an authorization library that will work the same whether you authenticate with ssl or kerberos.
>>>>>
>>>>>
>>>>>To me that is a requirement. We can't tie them together. We have to provide the ability for authorization to work regardless of the authentication. One *VERY* important use case is level of trust in authentication from the authorization perspective. e.g. I authorize "identity" based on how you authenticated.... Alice is able to view topic X if Alice authenticated over kerberos.
>>>>>Bob isn't allowed to view topic X no matter what. Alice can authenticate over not kerberos (use cases for that) and in that case Alice wouldn't see topic X. A concrete use case for this with Kafka would be a third party bank consuming data to a broker. The service provider would have some kerberos local auth for that bank to do backup that would also have access to other topics related to that bank's data.... the bank itself over SSL wants a stream of events (some specific topic) and that bank's identity only sees that topic. It is important to not confuse identity, authentication and authorization.
>>>>>
>>>>>
>>>>>> If so then we need to pick some subset of identity information that we can extract from both and have this constitute the identity we pass into the authorization interface. The original proposal had just the username/subject. But maybe we should add the ip address as well as that is useful. What I would prefer not to do is add everything in the certificate. I think the assumption is that you are generating these certificates for Kafka so you can put whatever identity info you want in the Subject Alternative Name. If that is true then just using that should be okay, right?
>>>>>>
>>>>>
>>>>>I think we should just push the byte[] and let the plugin deal with it. So, if we have a certificate object then pass that along with whatever other meta data (e.g. IP address of client) we can. I don't think we should do any parsing whatsoever and let the plugin deal with that. Any parsing we do on the identity information for the "security object" forces us into specific implementations and I don't see any reason to do that...
>>>>>If plug-ins want an "easier" time to deal with certs and parsing and blah blah blah then we can implement some way they can do this without much fuss.... we also need to make sure that crypto library is pluggable too (so we can expose an API for them to call) so that HSM can be easily dropped in without Kafka caring... so in the plugin we could provide an identity.getAlternativeAttribute() and then that use case is solved (and we can use bouncy castle or whatever to parse it for them to make it easier).... and always give them raw bytes so they could do it themselves.
>>>>>
>>>>>
>>>>>>
>>>>>> -Jay
>>>>>>
>>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>>>> > 1) We need to support the most flexibility we can and make this transparent to kafka (to use Gwen's term). Any specific implementation is going to make it not work with some solution stopping people from using Kafka. That is a reality because everyone just does it slightly differently enough. If we have an "identity" byte structure (let's not use string because some security objects are bytes) this should just fall through to the implementor. For certs this is the entire x509 object (not just the certificate part as it could contain an ASN.1 timestamp) and inside you parse and do what you want with it.
>>>>>> >
>>>>>> > 2) While I think there are many benefits to just the handshake approach I don't think it outweighs the cons Jay expressed. a) We can't lead the client libraries down a new path of interacting with Kafka. By incrementally adding to the wire protocol we are directing a very clear and expected approach.
>>>>>> > We already have issues with implementation even with the wire protocol in place and are trying to improve that aspect of the community as a whole. Let's not take a step backwards with this there... also we need to not add more/different hoops to debugging/administering/monitoring kafka so taking advantage (as Jay says) of built in logging (etc) is important... also for the client library developers too :)
>>>>>> >
>>>>>> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>>>>> >
>>>>>> >> Re #1:
>>>>>> >>
>>>>>> >> Since the auth_to_local is a kerberos config, it's up to the admin to decide how he likes the user names and set it up properly (or leave empty) and make sure the ACLs match. Simplified names may be needed if the authorization system integrates with LDAP to get groups or something fancy like that.
>>>>>> >>
>>>>>> >> Note that it's completely transparent to Kafka - if the admin sets up auth_to_local rules, we simply see a different principal name. No need to do anything different.
>>>>>> >>
>>>>>> >> Gwen
>>>>>> >>
>>>>>> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>> >> > Current proposal is here:
>>>>>> >> >
>>>>>> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>> >> >
>>>>>> >> > Here are the two open questions I am aware of:
>>>>>> >> >
>>>>>> >> > 1. We want to separate authentication and authorization. This means permissions will be assigned to some user-like subject/entity/person string that is independent of the authentication mechanism.
>>>>>> >> > It sounds like we agreed this could be done and we had in mind some krb-specific mangling that Gwen knew about and I think the plan was to use whatever the user chose to put in the Subject Alternative Name of the cert for ssl. So in both cases these would translate to a string denoting the entity whom we are granting permissions to in the authorization layer. We should document these in the wiki to get feedback on them.
>>>>>> >> >
>>>>>> >> > The Hadoop approach to extraction was something like this:
>>>>>> >> >
>>>>>> >> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>>>>> >> >
>>>>>> >> > But actually I'm not sure if just using the full kerberos principal is so bad? I.e. having the user be jenni...@athena.mit.edu versus just jennifer. Where this would make a difference would be in a case where you wanted the same user/entity to be able to authenticate via different mechanisms (Hadoop auth, kerberos, ssl) and have a single set of permissions.
>>>>>> >> >
>>>>>> >> > 2. For SASL/Kerberos we need to figure out how the communication between client and server will be handled to pass the challenge/response byte[]. I.e.
>>>>>> >> >
>>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>>>>>> >> >
>>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>>>>>> >> >
>>>>>> >> > I am not super expert in this area but I will try to give my understanding and I'm sure someone can correct me if I am confused.
>>>>>> >> >
>>>>>> >> > Unlike SSL the transmission of this is actually outside the scope of SASL so we have to specify this. Two proposals
>>>>>> >> >
>>>>>> >> > Original Proposal: Add a new "authenticate" request/response
>>>>>> >> >
>>>>>> >> > The proposal in the original wiki was to add a new "authenticate" request/response to pass this information. This matches what was done in the kerberos implementation for zookeeper. The intention is that the client would send this request immediately after establishing a connection, in which case it acts much like a "handshake", however there is no requirement that they do so.
>>>>>> >> >
>>>>>> >> > Whether the authentication happens via SSL or via Kerberos, the effect will just be to set the username in their session. This will default to the "anybody" user. So in the default non-secure case we will just be defaulting "anybody" to have full permission. So to answer the question about whether changing user is required or not, I don't think it is but I think we kind of get it for free in this approach.
>>>>>> >> >
>>>>>> >> > In this approach there is no particular need or advantage to having a separate port for kerberos I don't think.
>>>>>> >> >
>>>>>> >> > Alternate Proposal: Create a Handshake
>>>>>> >> >
>>>>>> >> > The alternative I think Michael was proposing was to create a handshake that would happen at connection time on connections coming in on the SASL port. This would require a separate port for SASL since otherwise you wouldn't be able to tell if the bytes you were getting were for SASL or were the first request of an unauthenticated connection.
>>>>>> >> >
>>>>>> >> > Michael it would be good to work out the details of how this works. Are we just sending size-delimited byte arrays back and forth until the challenge response terminates?
>>>>>> >> >
>>>>>> >> > My Take
>>>>>> >> >
>>>>>> >> > The pro I see for Michael's proposal is that it keeps the authentication logic more localized in the socket server.
>>>>>> >> >
>>>>>> >> > I see two cons:
>>>>>> >> > 1. Since the handshake won't go through the normal api layer it won't go through the normal logging (e.g. request log), jmx monitoring, client trace token, correlation id, etc that we get for other requests. This could make operations a little confusing and make debugging a little harder since the client will be blocking on network requests without the normal logging.
>>>>>> >> > 2. This part of the protocol will be inconsistent with the rest of the Kafka protocol so it will be a little odd for client implementors as this will effectively be a request/response that they will have to implement that will be different from all the other request/responses they implement.
>>>>>> >> >
>>>>>> >> > In practice these two alternatives are not very different except that in the original proposal the bytes you send are prefixed by the normal request header fields such as the client id, correlation id, etc. Overall I would prefer this as I think it is a bit more consistent from the client's point of view.
>>>>>> >> >
>>>>>> >> > Cheers,
>>>>>> >> >
>>>>>> >> > -Jay
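
The closing observation in the quoted message, that the two alternatives differ mainly in the request header that prefixes the bytes, can be sketched as follows. This is only an illustration under stated assumptions: the header layout (size, api key, api version, correlation id, length-prefixed client id) follows my understanding of the Kafka wire protocol of that era, and `AUTHENTICATE_API_KEY` is a hypothetical placeholder, since no key was assigned in this thread.

```python
import struct

# Hypothetical value: the thread does not assign an API key to "authenticate".
AUTHENTICATE_API_KEY = 999


def handshake_frame(token: bytes) -> bytes:
    """Alternate proposal: a bare size-delimited SASL token."""
    return struct.pack(">i", len(token)) + token


def authenticate_request(token: bytes, correlation_id: int, client_id: str) -> bytes:
    """Original proposal: the same token placed behind a request header."""
    cid = client_id.encode("utf-8")
    # Assumed header layout: api_key (int16), api_version (int16),
    # correlation_id (int32), client_id (int16 length + UTF-8 bytes).
    header = struct.pack(">hhi", AUTHENTICATE_API_KEY, 0, correlation_id)
    header += struct.pack(">h", len(cid)) + cid
    body = struct.pack(">i", len(token)) + token
    payload = header + body
    # The whole request is itself size-delimited.
    return struct.pack(">i", len(payload)) + payload


token = b"sasl-challenge-response"
bare = handshake_frame(token)
wrapped = authenticate_request(token, correlation_id=7, client_id="demo")
overhead = len(wrapped) - len(bare)  # bytes added by the size field and header
```

In this sketch the wrapped request ends with exactly the bytes of the bare frame; the extra leading bytes are what would let the exchange show up in the request log with a client id and correlation id like any other request.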