It would be nice to have Alcatraz on-disk security for the discriminating client.
Thanks,
Rob

> On Jun 6, 2014, at 11:51 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>
> I'm actually not convinced that encryption needs to be handled server side
> in Kafka. I think the best solution for encryption is to handle it
> producer/consumer side, just like compression. This will offload key
> management to the users and we'll still be able to leverage the sendfile
> optimization for better performance.
>
>
> On Fri, Jun 6, 2014 at 10:48 AM, Rob Withers <robert.w.with...@gmail.com>
> wrote:
>
>> On consideration, if we have 3 different access groups (1 for production
>> WRITE and 2 consumers), they all need to decode the same encryption and
>> so all need the same public/private key. Certs won't work, unless you
>> write a CertAuthority to build multiple certs with the same keys. It
>> seems better not to use certs and instead to wrap the encryption
>> specification with ACL capabilities for each access group.
>>
>>
>> On Jun 6, 2014, at 11:43 AM, Rob Withers wrote:
>>
>>> This is quite interesting to me, and it is an excellent opportunity to
>>> promote a slightly different security scheme. Object capabilities are
>>> perfect for online security and would use ACL-style authentication to
>>> gain capabilities filtered to the allowed resources for allowed actions
>>> (READ/WRITE/DELETE/LIST/SCAN). Erights.org has the quintessential
>>> object-capabilities model, and capnproto is implementing this for C++.
>>> I have a Java implementation at http://github.com/pauwau/pauwau but the
>>> master is broken; 0.2 basically works. It is basically a TLS connection
>>> with no certificate server; it is peer to peer. It has some advanced
>>> features, but the linking of capabilities with authorization, so that
>>> you can only invoke the correct services, is extended to the secure
>>> user.
>>>
>>> Regarding non-repudiation, on disk, why not prepend a CRC?
>>>
>>> Regarding on-disk encryption, multiple users/groups may need access,
>>> with different capabilities.
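[Editor's sketch] Rob's idea of prepending a CRC can be illustrated with Java's built-in `CRC32`, which Kafka already uses per message. One caveat worth noting: a CRC only detects accidental corruption, since anyone who can alter the data can recompute the checksum, so it provides integrity checking rather than non-repudiation. The class and payload below are illustrative, not from the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcCheck {
    // Compute a CRC32 over a message payload, to be prepended on disk.
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] msg = "hello kafka".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(msg);
        // On read, recompute and compare to detect accidental corruption.
        System.out.println(checksum(msg) == stored); // prints "true"
    }
}
```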
>>> Sounds like zookeeper needs to store a cert for each class of access
>>> so that a group member can access the decrypted data from disk. Use
>>> cert-based asymmetric decryption. The only issue is storing the
>>> private key in zookeeper. Perhaps some hash magic could be used.
>>>
>>> Thanks for kafka,
>>> Rob
>>>
>>> On Jun 5, 2014, at 3:01 PM, Jay Kreps wrote:
>>>
>>>> Hey Joe,
>>>>
>>>> I don't really understand the sections you added to the wiki. Can you
>>>> clarify them?
>>>>
>>>> Is non-repudiation what SASL would call integrity checks? If so,
>>>> don't SSL and many of the SASL schemes already support this, as well
>>>> as on-the-wire encryption?
>>>>
>>>> Or are you proposing an on-disk encryption scheme? Is this actually
>>>> needed? Isn't on-the-wire encryption, when combined with mutual
>>>> authentication and permissions, sufficient for most uses?
>>>>
>>>> On-disk encryption seems unnecessary because if attackers can get
>>>> root on the kafka boxes they can potentially modify Kafka to do
>>>> anything they want with the data. So this seems to break any security
>>>> model.
>>>>
>>>> I understand the problem of a large organization not really having a
>>>> trusted network and wanting to secure data transfer and limit and
>>>> audit data access. The uses for these other things I don't totally
>>>> understand.
>>>>
>>>> Also, it would be worth understanding the state of other messaging
>>>> and storage systems (Hadoop, dbs, etc.). What features do they
>>>> support? I think there is a sense in which you don't have to run
>>>> faster than the bear, but only faster than your friends. :-)
>>>>
>>>> -Jay
>>>>
>>>>
>>>> On Wed, Jun 4, 2014 at 5:57 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>>
>>>>> I like the idea of working on the spec and prioritizing. I will
>>>>> update the wiki.
>>>>>
>>>>> - Joe Stein
>>>>>
>>>>>
>>>>> On Wed, Jun 4, 2014 at 1:11 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>
>>>>>> Hey Joe,
>>>>>>
>>>>>> Thanks for kicking this discussion off! I totally agree that for
>>>>>> something that acts as a central message broker, security is a
>>>>>> critical feature. I think a number of people have been interested
>>>>>> in this topic and several people have put effort into
>>>>>> special-purpose security efforts.
>>>>>>
>>>>>> Since most of the LinkedIn folks are working on the consumer right
>>>>>> now, I think this would be a great project for any other interested
>>>>>> people to take on. There are some challenges in doing these things
>>>>>> distributed, but it can also be a lot of fun.
>>>>>>
>>>>>> I think a good first step would be to get a written plan we can all
>>>>>> agree on for how things should work. Then we can break things down
>>>>>> into chunks that can be done independently while still aiming at a
>>>>>> good end state.
>>>>>>
>>>>>> I had tried to write up some notes that summarized at least the
>>>>>> thoughts I had had on security:
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>
>>>>>> What do you think of that?
>>>>>>
>>>>>> One assumption I had (which may be incorrect) is that although we
>>>>>> want all the things in your list, the two most pressing would be
>>>>>> authentication and authorization, and that was all that write-up
>>>>>> covered. You have more experience in this domain, so I wonder how
>>>>>> you would prioritize?
>>>>>>
>>>>>> Those notes are really sketchy, so I think the first goal I would
>>>>>> have would be to get to a real spec we can all agree on and discuss.
>>>>>> A lot of the security stuff has a high human-interaction element
>>>>>> and needs to work in pretty different domains and different
>>>>>> companies, so getting this kind of review is important.
>>>>>>
>>>>>> -Jay
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 3, 2014 at 12:57 PM, Joe Stein <joe.st...@stealth.ly>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, I wanted to re-ignite the discussion around Apache Kafka
>>>>>>> Security. This is a huge bottleneck (a non-starter in some cases)
>>>>>>> for a lot of organizations (due to regulatory, compliance and
>>>>>>> other requirements). Below are my suggestions for specific changes
>>>>>>> in Kafka to accommodate security requirements. This comes from
>>>>>>> what folks are doing "in the wild" to work around and implement
>>>>>>> security with Kafka as it is today, and also what I have
>>>>>>> discovered from organizations about their blockers. It also picks
>>>>>>> up from the wiki (which I should have time to update later in the
>>>>>>> week based on the below and feedback from this thread).
>>>>>>>
>>>>>>> 1) Transport Layer Security (i.e. SSL)
>>>>>>>
>>>>>>> This also includes client authentication in addition to the
>>>>>>> in-transit security layer. This work has been picked up here
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1477 and I do
>>>>>>> appreciate any thoughts, comments, feedback, tomatoes, whatever
>>>>>>> for this patch. It is a pickup from the fork of the work first
>>>>>>> done here https://github.com/relango/kafka/tree/kafka_security.
>>>>>>>
>>>>>>> 2) Data encryption at rest.
>>>>>>>
>>>>>>> This is very important and something that can be facilitated
>>>>>>> within the wire protocol. It requires an additional map data
>>>>>>> structure for the "encrypted [data encryption key]".
>>>>>>> With this map (either in your object or in the wire protocol) you
>>>>>>> can store the dynamically generated symmetric key (for each
>>>>>>> message) and then encrypt the data using that dynamically
>>>>>>> generated key. You then encrypt the encryption key using the
>>>>>>> public key of each party expected to be able to decrypt the
>>>>>>> encryption key and thus decrypt the message. Each
>>>>>>> public-key-encrypted symmetric key (which is now the "encrypted
>>>>>>> [data encryption key]") is stored along with the public key it was
>>>>>>> encrypted with (so a map of [publicKey] =
>>>>>>> encryptedDataEncryptionKey), as a chain. Other patterns can be
>>>>>>> implemented, but this is a pretty standard digital enveloping [0]
>>>>>>> pattern with only 1 field added. Other patterns should be able to
>>>>>>> use that field to do their implementation too.
>>>>>>>
>>>>>>> 3) Non-repudiation and long-term non-repudiation.
>>>>>>>
>>>>>>> Non-repudiation is proving data hasn't changed. This is often (if
>>>>>>> not always) done with x509 public certificates (chained to a
>>>>>>> certificate authority).
>>>>>>>
>>>>>>> Long-term non-repudiation is what happens when the certificates of
>>>>>>> the certificate authority are expired (or revoked) and everything
>>>>>>> ever signed (ever) with that certificate's public key then becomes
>>>>>>> "no longer provable as ever being authentic". That is where
>>>>>>> RFC 3126 [1] and RFC 3161 [2] come in (or WORM drives [hardware],
>>>>>>> etc.).
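[Editor's sketch] The digital enveloping pattern described in (2) might look roughly like this in Java's standard crypto API. The reader names and freshly generated key pairs here are stand-ins (in a real deployment the readers' public keys would come from their own certs), and the cipher modes are simplified for brevity: production code would use AES/GCM with an IV and RSA-OAEP wrapping.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class Envelope {
    // Wrap (encrypt) the per-message data encryption key under one reader's public key.
    static byte[] wrapDek(SecretKey dek, PublicKey readerKey) throws Exception {
        Cipher rsa = Cipher.getInstance("RSA"); // OAEP padding preferable in practice
        rsa.init(Cipher.WRAP_MODE, readerKey);
        return rsa.wrap(dek);
    }

    public static void main(String[] args) throws Exception {
        // 1. Dynamically generate a symmetric key (DEK) for this message.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey dek = kg.generateKey();

        // 2. Encrypt the message payload with the DEK.
        Cipher aes = Cipher.getInstance("AES"); // use AES/GCM with an IV in practice
        aes.init(Cipher.ENCRYPT_MODE, dek);
        byte[] ciphertext =
            aes.doFinal("secret payload".getBytes(StandardCharsets.UTF_8));

        // 3. Build the proposed map of [publicKey] = encryptedDataEncryptionKey,
        //    one entry per party allowed to decrypt (key pairs generated here
        //    as stand-ins for the readers' real keys).
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        Map<String, byte[]> encryptedDeks = new HashMap<>();
        for (String reader : new String[] {"producer-group", "consumer-group"}) {
            KeyPair kp = kpg.generateKeyPair();
            encryptedDeks.put(reader, wrapDek(dek, kp.getPublic()));
        }
        System.out.println(encryptedDeks.size()); // one wrapped DEK per reader
    }
}
```

Each reader then unwraps its own entry with its private key and uses the recovered DEK to decrypt the payload, so only one extra map field travels with the message.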
>>>>>>>
>>>>>>> For either (or both) of these, it is an operation of the encryptor
>>>>>>> to sign/hash the data (with or without a third-party trusted
>>>>>>> timestamp of the signing event), encrypt that with their own
>>>>>>> private key, and distribute the results (before and after
>>>>>>> encrypting, if required) along with their public key. This
>>>>>>> structure is a bit more complex but feasible; it is a map of
>>>>>>> digital signature formats and the chain of dig sig attestations.
>>>>>>> The map's key is the method (i.e. CRC32, PKCS7 [3], XmlDigSig [4])
>>>>>>> and the value a list of maps where the key is the "purpose" of the
>>>>>>> signature (what you're attesting to). As a sibling field to the
>>>>>>> list, another field for "the attester" as bytes (e.g. their PKCS12
>>>>>>> [5] for the map of PKCS7 signatures).
>>>>>>>
>>>>>>> 4) Authorization
>>>>>>>
>>>>>>> We should have a policy of "404" for data, topics, partitions
>>>>>>> (etc.) if authenticated connections do not have access. In "secure
>>>>>>> mode" any non-authenticated connections should get a "404"-type
>>>>>>> message on everything. Knowing "something is there" is a security
>>>>>>> risk in many use cases. So if you don't have access, you don't
>>>>>>> even see it. Baking "that" into Kafka, along with some interface
>>>>>>> for entitlement (access management) systems (pretty standard), is
>>>>>>> all that I think needs to be done to the core project. I want to
>>>>>>> tackle this item later in the year, after summer, once the other
>>>>>>> three are complete.
>>>>>>>
>>>>>>> I look forward to thoughts on this and to anyone else interested
>>>>>>> in working with us on these items.
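[Editor's sketch] The sign/verify half of (3) can be illustrated with raw SHA256withRSA from `java.security`; a real implementation per the proposal would carry PKCS7 or XmlDigSig structures with certificate chains instead, and the key pair generated here is only a stand-in for the encryptor's real keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class SignedMessage {
    // Producer side: hash and sign the payload with the encryptor's private key.
    static byte[] sign(byte[] data, PrivateKey priv) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(priv);
        s.update(data);
        return s.sign();
    }

    // Any holder of the distributed public key can check the attestation.
    static boolean verify(byte[] data, byte[] sig, PublicKey pub) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(pub);
        s.update(data);
        return s.verify(sig);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair kp = kpg.generateKeyPair();
        byte[] msg = "message payload".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(msg, kp.getPrivate());
        System.out.println(verify(msg, sig, kp.getPublic())); // prints "true"
    }
}
```

In the proposed map structure, the signature bytes would sit under a method key such as "PKCS7", with the purpose and attester fields alongside.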
>>>>>>>
>>>>>>> [0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
>>>>>>> [1] http://tools.ietf.org/html/rfc3126
>>>>>>> [2] http://tools.ietf.org/html/rfc3161
>>>>>>> [3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
>>>>>>> [4] http://en.wikipedia.org/wiki/XML_Signature
>>>>>>> [5] http://en.wikipedia.org/wiki/PKCS_12
>>>>>>>
>>>>>>> /*******************************************
>>>>>>> Joe Stein
>>>>>>> Founder, Principal Consultant
>>>>>>> Big Data Open Source Security LLC
>>>>>>> http://www.stealth.ly
>>>>>>> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>>>>> ********************************************/