It would be nice to have Alcatraz on-disk security for the discriminating 
client.

Thanks,
Rob

> On Jun 6, 2014, at 11:51 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> 
> I'm actually not convinced that encryption needs to be handled server side
> in Kafka. I think the best solution for encryption is to handle it
> producer/consumer side just like compression. This will offload key
> management to the users and we'll still be able to leverage the sendfile
> optimization for better performance.
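For what it's worth, producer/consumer-side payload encryption of the kind Pradeep describes might look roughly like the sketch below. This is illustrative only, using plain JDK crypto; none of these names are a real Kafka API, and key distribution is left to the user:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical client-side encryption: the broker only ever sees ciphertext,
// so the sendfile optimization on the broker is unaffected, and key
// management stays entirely with the producer/consumer owners.
public class PayloadCrypto {
    private static final SecureRandom RNG = new SecureRandom();

    // Encrypt a message payload before handing it to the producer.
    public static byte[] seal(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[12];                      // 96-bit GCM nonce
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = new byte[iv.length + ct.length];  // layout: iv || ciphertext
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Decrypt on the consumer side after polling.
    public static byte[] open(SecretKey key, byte[] sealed) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key,
               new GCMParameterSpec(128, Arrays.copyOfRange(sealed, 0, 12)));
        return c.doFinal(Arrays.copyOfRange(sealed, 12, sealed.length));
    }

    public static SecretKey newKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }
}
```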
> 
> 
> On Fri, Jun 6, 2014 at 10:48 AM, Rob Withers <robert.w.with...@gmail.com>
> wrote:
> 
>> On consideration, if we have 3 different access groups (1 for production
>> WRITE and 2 consumers) they all need to decode the same encryption, and so
>> all need the same public/private key....certs won't work, unless you write
>> a CertAuthority to build multiple certs with the same keys.  It seems
>> better not to use certs and instead wrap the encryption specification with
>> ACL capabilities for each group of access.
>> 
>> 
>> On Jun 6, 2014, at 11:43 AM, Rob Withers wrote:
>> 
>>> This is quite interesting to me and it is an excellent opportunity to
>>> promote a slightly different security scheme.  Object-capabilities are
>>> perfect for online security and would use ACL-style authentication to gain
>>> capabilities filtered to those allowed resources for allowed actions
>>> (READ/WRITE/DELETE/LIST/SCAN).  Erights.org has the quintessential
>>> object-capabilities model and capnproto is implementing this for C++.  I
>>> have a java implementation at http://github.com/pauwau/pauwau but the
>>> master is broken.  0.2 works, basically.  Basically a TLS connection with
>>> no certificate server, it is peer to peer.  It has some advanced features,
>>> but the linking of capabilities with authorization, so that you can only
>>> invoke correct services, is extended to the secure user.
>>> 
>>> Regarding non-repudiation, on disk, why not prepend a CRC?
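Prepending a CRC to each on-disk record might look like the sketch below (illustrative names, not a Kafka proposal). Worth noting: a CRC only detects accidental corruption; it does not give non-repudiation, since anyone who can modify the data can also recompute the checksum.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Frame a record as [4-byte CRC32][payload] and verify the CRC on read.
public class CrcFraming {
    public static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return ByteBuffer.allocate(4 + payload.length)
                .putInt((int) crc.getValue())   // 4-byte CRC header
                .put(payload)
                .array();
    }

    public static byte[] unframe(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        int stored = buf.getInt();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        if ((int) crc.getValue() != stored)
            throw new IllegalStateException("CRC mismatch: record corrupted");
        return payload;
    }
}
```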
>>> 
>>> Regarding on-disk encryption, multiple users/groups may need access,
>>> with different capabilities.  Sounds like zookeeper needs to store a cert
>>> for each class of access so that a group member can access the decrypted
>>> data from disk.  Use cert-based asymmetric decryption.  The only issue is
>>> storing the private key in zookeeper.  Perhaps some hash magic could be
>>> used.
>>> 
>>> Thanks for kafka,
>>> Rob
>>> 
>>> On Jun 5, 2014, at 3:01 PM, Jay Kreps wrote:
>>> 
>>> Hey Joe,
>>>> 
>>>> I don't really understand the sections you added to the wiki. Can you
>>>> clarify them?
>>>> 
>>>> Is non-repudiation what SASL would call integrity checks? If so, don't
>>>> SSL and many of the SASL schemes already support this, as well as
>>>> on-the-wire encryption?
>>>> 
>>>> Or are you proposing an on-disk encryption scheme? Is this actually
>>>> needed? Isn't on-the-wire encryption, when combined with mutual
>>>> authentication and permissions, sufficient for most uses?
>>>> 
>>>> On-disk encryption seems unnecessary because if an attacker can get
>>>> root on the Kafka boxes they can potentially modify Kafka to do
>>>> anything they want with the data. So this seems to break any security
>>>> model.
>>>> 
>>>> I understand the problem of a large organization not really having a
>>>> trusted network and wanting to secure data transfer and limit and audit
>>>> data access. The uses for these other things I don't totally understand.
>>>> 
>>>> Also it would be worth understanding the state of other messaging and
>>>> storage systems (Hadoop, dbs, etc). What features do they support? I
>>>> think there is a sense in which you don't have to run faster than the
>>>> bear, but only faster than your friends. :-)
>>>> 
>>>> -Jay
>>>> 
>>>> 
>>>> On Wed, Jun 4, 2014 at 5:57 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>> 
>>>> I like the idea of working on the spec and prioritizing. I will update
>>>>> the
>>>>> wiki.
>>>>> 
>>>>> - Joestein
>>>>> 
>>>>> 
>>>>> On Wed, Jun 4, 2014 at 1:11 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>> 
>>>>> Hey Joe,
>>>>>> 
>>>>>> Thanks for kicking this discussion off! I totally agree that for
>>>>>> something that acts as a central message broker, security is a
>>>>>> critical feature. I think a number of people have been interested in
>>>>>> this topic and several people have put effort into special-purpose
>>>>>> security efforts.
>>>>>> 
>>>>>> Since most of the LinkedIn folks are working on the consumer right
>>>>>> now, I think this would be a great project for any other interested
>>>>>> people to take on. There are some challenges in doing these things
>>>>>> distributed but it can also be a lot of fun.
>>>>>> 
>>>>>> I think a good first step would be to get a written plan we can all
>>>>>> agree
>>>>>> on for how things should work. Then we can break things down into
>>>>>> chunks
>>>>>> that can be done independently while still aiming at a good end state.
>>>>>> 
>>>>>> I had tried to write up some notes that summarized at least the
>>>>>> thoughts I had had on security:
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>> 
>>>>>> What do you think of that?
>>>>>> 
>>>>>> One assumption I had (which may be incorrect) is that although we
>>>>>> want all the things in your list, the two most pressing would be
>>>>>> authentication and authorization, and that was all that write-up
>>>>>> covered. You have more experience in this domain, so I wonder how you
>>>>>> would prioritize?
>>>>>> 
>>>>>> Those notes are really sketchy, so I think the first goal I would
>>>>>> have would be to get to a real spec we can all agree on and discuss.
>>>>>> A lot of the security stuff has a high human-interaction element and
>>>>>> needs to work in pretty different domains and different companies, so
>>>>>> getting this kind of review is important.
>>>>>> 
>>>>>> -Jay
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jun 3, 2014 at 12:57 PM, Joe Stein <joe.st...@stealth.ly>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi, I wanted to re-ignite the discussion around Apache Kafka
>>>>>>> Security.  This is a huge bottleneck (a non-starter in some cases)
>>>>>>> for a lot of organizations (due to regulatory, compliance and other
>>>>>>> requirements). Below are my suggestions for specific changes in
>>>>>>> Kafka to accommodate security requirements.  This comes from what
>>>>>>> folks are doing "in the wild" to work around and implement security
>>>>>>> with Kafka as it is today, and also what I have discovered from
>>>>>>> organizations about their blockers. It also picks up from the wiki
>>>>>>> (which I should have time to update later in the week based on the
>>>>>>> below and feedback from this thread).
>>>>>>> 
>>>>>>> 1) Transport Layer Security (i.e. SSL)
>>>>>>> 
>>>>>>> This also includes client authentication, in addition to the
>>>>>>> in-transit security layer.  This work has been picked up here
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1477 and I do appreciate
>>>>>>> any thoughts, comments, feedback, tomatoes, whatever for this patch.
>>>>>>> It is a pickup from the fork of the work first done here
>>>>>>> https://github.com/relango/kafka/tree/kafka_security.
>>>>>>> 
>>>>>>> 2) Data encryption at rest.
>>>>>>> 
>>>>>>> This is very important and something that can be facilitated within
>>>>>>> the wire protocol. It requires an additional map data structure for
>>>>>>> the "encrypted [data encryption key]". With this map (either in your
>>>>>>> object or in the wire protocol) you can store the dynamically
>>>>>>> generated symmetric key (for each message) and then encrypt the data
>>>>>>> using that dynamically generated key.  You then encrypt the
>>>>>>> encryption key using each public key of whoever is expected to be
>>>>>>> able to decrypt the encryption key and then decrypt the message.
>>>>>>> Each public-key-encrypted symmetric key (which is now the "encrypted
>>>>>>> [data encryption key]") is stored along with the public key it was
>>>>>>> encrypted with (so a map of [publicKey] =
>>>>>>> encryptedDataEncryptionKey) as a chain.  Other patterns can be
>>>>>>> implemented, but this is a pretty standard digital enveloping [0]
>>>>>>> pattern with only 1 field added. Other patterns should be able to
>>>>>>> use that field to do their implementation too.
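The digital-envelope pattern described above can be sketched as follows. This is a stand-alone illustrative example using only JDK crypto, not a proposed Kafka structure; the default ECB/PKCS1 cipher modes are used purely for brevity (a real implementation would want AES-GCM with an IV and RSA-OAEP):

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.HashMap;
import java.util.Map;

// One random data encryption key (DEK) per message, wrapped once per
// recipient public key. The extra wire-protocol field would carry the
// wrappedKeys map ([publicKey] = encryptedDataEncryptionKey).
public class Envelope {
    public final byte[] ciphertext;                   // AES-encrypted payload
    public final Map<PublicKey, byte[]> wrappedKeys;  // publicKey -> encrypted DEK

    Envelope(byte[] ct, Map<PublicKey, byte[]> wk) { ciphertext = ct; wrappedKeys = wk; }

    public static Envelope seal(byte[] plaintext, PublicKey... recipients) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey dek = kg.generateKey();             // per-message symmetric key
        Cipher aes = Cipher.getInstance("AES");       // default mode, brevity only
        aes.init(Cipher.ENCRYPT_MODE, dek);
        byte[] ct = aes.doFinal(plaintext);
        Map<PublicKey, byte[]> wrapped = new HashMap<>();
        Cipher rsa = Cipher.getInstance("RSA");
        for (PublicKey pk : recipients) {             // wrap the DEK for each reader
            rsa.init(Cipher.ENCRYPT_MODE, pk);
            wrapped.put(pk, rsa.doFinal(dek.getEncoded()));
        }
        return new Envelope(ct, wrapped);
    }

    // A recipient unwraps the DEK with their private key, then decrypts.
    public byte[] open(PublicKey id, PrivateKey sk) throws Exception {
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.DECRYPT_MODE, sk);
        SecretKey dek = new SecretKeySpec(rsa.doFinal(wrappedKeys.get(id)), "AES");
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.DECRYPT_MODE, dek);
        return aes.doFinal(ciphertext);
    }
}
```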
>>>>>>> 
>>>>>>> 3) Non-repudiation and long-term non-repudiation.
>>>>>>> 
>>>>>>> Non-repudiation is proving data hasn't changed.  This is often (if
>>>>>>> not always) done with x509 public certificates (chained to a
>>>>>>> certificate authority).
>>>>>>> 
>>>>>>> Long-term non-repudiation is what happens when the certificates of
>>>>>>> the certificate authority are expired (or revoked) and everything
>>>>>>> ever signed (ever) with that certificate's public key then becomes
>>>>>>> "no longer provable as ever being authentic".  That is where RFC
>>>>>>> 3126 [1] and RFC 3161 [2] come in (or WORM drives [hardware], etc).
>>>>>>> 
>>>>>>> For either (or both) of these it is an operation of the encryptor
>>>>>>> to sign/hash the data (with or without a third-party trusted
>>>>>>> timestamp of the signing event), encrypt that with their own private
>>>>>>> key, and distribute the results (before and after encrypting, if
>>>>>>> required) along with their public key. This structure is a bit more
>>>>>>> complex but feasible: it is a map of digital signature formats and
>>>>>>> the chain of dig-sig attestations.  The map's key is the method
>>>>>>> (i.e. CRC32, PKCS7 [3], XmlDigSig [4]) and its value a list of maps
>>>>>>> where the key is the "purpose" of the signature (what you are
>>>>>>> attesting to).  As a sibling field to the list there is another
>>>>>>> field for "the attester" as bytes (e.g. their PKCS12 [5] for the map
>>>>>>> of PKCS7 signatures).
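The sign/hash step at the heart of this could be sketched with the JDK as below. Illustrative only; the class and method names are not from any Kafka proposal:

```java
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hash-and-sign with the producer's private key: anyone holding the public
// key can verify, and only the private-key holder could have produced the
// signature -- which is the basis of the non-repudiation claim.
public class DigSig {
    public static byte[] sign(PrivateKey sk, byte[] data) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(sk);
        s.update(data);
        return s.sign();
    }

    public static boolean verify(PublicKey pk, byte[] data, byte[] sig) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(pk);
        s.update(data);
        return s.verify(sig);
    }
}
```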
>>>>>>> 
>>>>>>> 4) Authorization
>>>>>>> 
>>>>>>> We should have a policy of "404" for data, topics, partitions
>>>>>>> (etc.) if authenticated connections do not have access.  In "secure
>>>>>>> mode" any non-authenticated connection should get a "404"-type
>>>>>>> message on everything.  Knowing "something is there" is a security
>>>>>>> risk in many use cases.  So if you don't have access, you don't even
>>>>>>> see it.  Baking "that" into Kafka, along with some interface for
>>>>>>> entitlement (access management) systems (pretty standard), is all
>>>>>>> that I think needs to be done to the core project.
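The "404 for everything you can't see" policy might be sketched as follows (a hypothetical interface, not actual Kafka code): the point is that an unknown topic and a hidden topic raise the identical error, so an unauthorized caller learns nothing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical authorizer: unauthorized and nonexistent topics are
// indistinguishable to the caller, preventing existence leaks.
public class TopicAuthorizer {
    private final Map<String, Set<String>> readers = new HashMap<>(); // topic -> principals

    public void grant(String topic, String principal) {
        readers.computeIfAbsent(topic, t -> new HashSet<>()).add(principal);
    }

    // Returns metadata, or throws the same "not found" error whether the
    // topic is missing or merely hidden from this principal.
    public String describe(String topic, String principal) {
        Set<String> allowed = readers.get(topic);
        if (allowed == null || !allowed.contains(principal))
            throw new NoSuchElementException("unknown topic: " + topic);
        return "metadata for " + topic;
    }
}
```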
>>>>>> 
>>>>>>> I want to tackle this item later in the year, after summer, once the
>>>>>>> other three are complete.
>>>>>>> 
>>>>>>> I look forward to thoughts on this and anyone else interested in
>>>>>>> working with us on these items.
>>>>>>> 
>>>>>>> [0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
>>>>>>> [1] http://tools.ietf.org/html/rfc3126
>>>>>>> [2] http://tools.ietf.org/html/rfc3161
>>>>>>> [3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
>>>>>>> [4] http://en.wikipedia.org/wiki/XML_Signature
>>>>>>> [5] http://en.wikipedia.org/wiki/PKCS_12
>>>>>>> 
>>>>>>> /*******************************************
>>>>>>> Joe Stein
>>>>>>> Founder, Principal Consultant
>>>>>>> Big Data Open Source Security LLC
>>>>>>> http://www.stealth.ly
>>>>>>> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>>>>> ********************************************/
>> 
