What strikes me as an opportunity is to define a pluggable at-rest encryption module interface that supports both of our security needs.
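As a straw man, the module surface could be quite small. Here is a minimal Java sketch; every name in it is hypothetical, and nothing here is an existing Kafka API:

import java.util.Map;

/**
 * Straw-man pluggable at-rest encryption module (hypothetical names).
 */
public interface AtRestEncryptionModule {

    /** One-time setup, e.g. wiring in whatever key manager is deployed. */
    void configure(Map<String, ?> config);

    /**
     * Encrypt a serialized message before it is written to the log.
     * The result is an envelope: ciphertext plus key metadata (key id,
     * algorithm, per-recipient encrypted session keys, and so on).
     */
    byte[] encrypt(String topic, byte[] plaintext);

    /** Inverse of encrypt: unwrap the envelope and decrypt the payload. */
    byte[] decrypt(String topic, byte[] envelope);
}

An implementation would be selected by configuration, which keeps key management a separate, swappable component, as discussed below.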
Thanks,
Rob

> On Jun 10, 2014, at 4:01 PM, Todd Palino <tpal...@linkedin.com.INVALID> wrote:
>
> The situation of production before having the consumer is definitely a good one. That’s why I wanted to take a little time before responding. Had to think about it.
>
> I think that while we may certainly produce data before the consumer is ready, that doesn’t mean that the consumer can’t have a key pair generated for it already, so the producer could start encrypting for that consumer before it exists. This would probably work fine for lower retention periods (a week or two), but could be a little more difficult to manage if you are keeping data in Kafka longer than that. My gut reaction is that it’s better to handle it that way and keep the key pair and session key handling simple. The more we can do that, the more we can leave key management as a separate component that can be swapped out, so the user can decide how it should be done.
>
> -Todd
>
>> On 6/9/14, 8:16 AM, "Robert Withers" <robert.w.with...@gmail.com> wrote:
>>
>> Yes, that sounds familiar, as I helped write (minimally) S/MIME in Squeak (an open source Smalltalk environment). This is what I was thinking in my alternative here, though I have a concern...
>>
>> Production may occur before the consumer is coded and executed. In the analogy of mail, the mail is sent before the complete recipient list is known.
>>
>> This seems to mean that the private key (cert or OTP) must be stored and interacted with. My feeling is that key metadata belong in a system-encrypted HBase store (session key store), for low-latency reads, rather than in a topic requiring scanning. Store the private keys and then give clients (producers/consumers) access with the hash of the OTP. When a new consumer comes along, create a new cert encoding the OTP hash.
>>
>> On write, use the producer cert to send a topic hash with the msg, which would allow the broker to reuse or generate an OTP, stored in the session key store.
>>
>> On read (consumer), if we have a previously run reader, use the encrypted hash. If new, create a consumer cert and encrypt the hash for that session.
>>
>> The reader/writer will pass a cert-encrypted session hash. The trick seems to be converting the hash to a PK to encrypt/decrypt. Given Kafka resource distribution, we need system encryption for metadata and cert-based key exchange. This seems to mean triple encryption:
>> 1) client to/from broker
>> 2) system key/hash mgmt/translation
>> 3) at-rest encryption
>>
>> Thanks,
>> Rob
>>
>>> On Jun 9, 2014, at 7:57 AM, Todd Palino <tpal...@linkedin.com.INVALID> wrote:
>>>
>>> It’s the same method used by S/MIME and many other encryption specifications with the potential for multiple recipients. The sender generates a session key, and uses that key to encrypt the message. The session key is then encrypted once for each recipient with that recipient’s public key. All of the encrypted copies of the session key are then included with the encrypted message. This way, you avoid having to encrypt the message multiple times (this assumes, of course, that the message itself is larger than the key).
>>>
>>> In our case, we have some options available to us. We could do that, and put all the encrypted keys in the message metadata. Or we could treat it more like a session and have the encrypted session keys in a special topic (e.g. __session_keys), much like offsets are now.
>>> When the producer starts up, it creates a session key and encrypts it for each consumer with the current consumer key. The producer publishes the bundle of encrypted keys into __session_keys as a single message. The producer then publishes messages to the normal topic, encrypted with the session key. The metadata for each of those messages would contain something like the offset into __session_keys to identify the bundle. This has the added benefit of not increasing the per-message data size too much.
>>>
>>> Whenever a consumer key is invalidated, or however often the session key should be rotated, the producer would publish a new bundle. This maintains a history of session keys that can be used to decrypt any messages, so the retention on __session_keys must be at least as long as any topic which may potentially contain encrypted data. Past that point, it’s up to the consumer what they want to do with the data. A consumer like Hadoop might re-encrypt it for local storage, or store it in plaintext (depending on the security requirements of that system).
>>>
>>> -Todd
>>>
>>>> On 6/8/14, 2:33 PM, "Rob Withers" <robert.w.with...@gmail.com> wrote:
>>>>
>>>> I like the use of meta envelopes. We did this recently, on the job, as we have an envelope that specifies the type for decoding. We discussed adding the encodingType, and you are suggesting adding encryption metadata for that msg. All good.
>>>>
>>>> I don't see your OTP example. Could you delve deeper for me, please? The model I envision is internal OTP, with access to decryption controlled by cert. A double layer of security, with the internal at-rest encryption being an unchanging OTP, with ACL access to it as the upper layer. Are you saying it is possible to re-encrypt with new keys, or that there is a chain of keys over time?
>>>>
>>>> Thanks,
>>>> Rob
>>>>
>>>>> On Jun 8, 2014, at 3:06 PM, Todd Palino wrote:
>>>>>
>>>>> I’ll agree that perhaps the “absolutely not” is not quite right. There are certainly some uses for a simpler solution, but I would still say it cannot only be encryption at the broker. This would leave many use cases for at-rest encryption out of the loop (most auditing cases for SOX, PCI, HIPAA, and other PII standards). Yes, it does add external overhead that must be managed, but it’s just the nature of the beast. We can’t solve all of the external infrastructure needed for this, but we can make it easier to use for consumers and producers by adding metadata.
>>>>>
>>>>> There’s no need for unchanging encryption, and that’s specifically why I want to see a message envelope that will help consumers determine the encryption used for a particular message. You can definitely still expire keys, you just have to keep the expired keys around as long as the encrypted data stays around, and your endpoints need to know when they are decrypting data with an expired key (you might want to throw up a warning, or do something else to let the users know that it’s happening). And as someone else mentioned, there are solutions for encrypting data for multiple consumers. You can encrypt the data with an OTP, and then multiply encrypt the OTP once for each consumer and store those encrypted strings in the envelope.
>>>>>
>>>>> -Todd
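To make the session-key flow above concrete, the producer side might look roughly like the following. This is only a sketch: the names are hypothetical, it uses JDK crypto defaults, and a real implementation would need an authenticated cipher mode, IVs, and actual key distribution:

import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class SessionKeyBundleExample {
    public static void main(String[] args) throws Exception {
        // Producer generates a fresh symmetric session key.
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey sessionKey = gen.generateKey();

        // Stand-ins for the current public keys of two consumers.
        Map<String, PublicKey> consumerKeys = new HashMap<String, PublicKey>();
        KeyPairGenerator rsa = KeyPairGenerator.getInstance("RSA");
        rsa.initialize(2048);
        consumerKeys.put("consumer-hadoop", rsa.generateKeyPair().getPublic());
        consumerKeys.put("consumer-audit", rsa.generateKeyPair().getPublic());

        // Wrap the session key once per consumer. This bundle is what would
        // be published to __session_keys as a single message.
        Map<String, byte[]> bundle = new HashMap<String, byte[]>();
        for (Map.Entry<String, PublicKey> e : consumerKeys.entrySet()) {
            Cipher wrap = Cipher.getInstance("RSA");
            wrap.init(Cipher.WRAP_MODE, e.getValue());
            bundle.put(e.getKey(), wrap.wrap(sessionKey));
        }

        // Messages on the normal topic are encrypted with the session key;
        // their metadata would carry the offset of the bundle message.
        Cipher aes = Cipher.getInstance("AES");
        aes.init(Cipher.ENCRYPT_MODE, sessionKey);
        byte[] ciphertext = aes.doFinal("example payload".getBytes("UTF-8"));
        System.out.println("bundle entries: " + bundle.size()
                + ", ciphertext bytes: " + ciphertext.length);
    }
}

Each consumer would then fetch the bundle by offset, unwrap its own entry with its private key, and cache the session key until a new bundle appears.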
>>>>>
>>>>>> On 6/7/14, 12:25 PM, "Rob Withers" <robert.w.with...@gmail.com> wrote:
>>>>>>
>>>>>> At one level this makes sense to me, to externalize the security issue to producers and consumers. On consideration, I realized that this adds a lot of coordination requirements to the app layer, across teams or even companies. Another issue I feel is that you want a specific unchanging encryption for the data, and the clients (producers/consumers) will need to be able to decode frozen data. If certs are used they cannot expire. Also, different clients would need to use the same cert.
>>>>>>
>>>>>> So, your statement that it should ABSOLUTELY not include internal encryption rings seems misplaced. There are some customers of Kafka that would opt to encrypt the on-disk data, and key management is a significant issue. This is best handled internally, with key management stored in either ZK or in a topic. Truly, perhaps leveraging Hadoop/HBase as a metadata store seems applicable.
>>>>>>
>>>>>> Thanks, another 2 cents,
>>>>>> Rob
>>>>>>
>>>>>>> On Jun 6, 2014, at 12:15 PM, Todd Palino wrote:
>>>>>>>
>>>>>>> Yes, I realized last night that I needed to be clearer in what I was saying. Encryption should ABSOLUTELY not be handled server-side. I think it’s a good idea to enable use of it in the consumer/producer, but doing it server side will not solve many use cases for needing encryption, because the server then has access to all the keys. You could say that this eliminates the need for TLS, but TLS is pretty low-hanging fruit, and there’s definitely a need for encryption of the traffic across the network even if you don’t need at-rest encryption as well.
>>>>>>>
>>>>>>> And as you mentioned, something needs to be done about key management. Storing information with the message about which key(s) was used is a good idea, because it allows you to know when a producer has switched keys. There are definitely some alternative solutions to that as well. But storing the keys in the broker, Zookeeper, or other systems like that is not. There needs to be a system where the keys are only available to the producers and consumers that need them, and they only get access to the appropriate part of the key pair. Even as the guy running Kafka and Zookeeper, I should not have access to the keys being used, and if data is encrypted I should not be able to see the cleartext.
>>>>>>>
>>>>>>> And even if we decide not to put anything about at-rest encryption in the consumer/producer clients directly, and leave it as an exercise above that level (you have to pass the ciphertext as the message to the client), I still think there is a good case for implementing a message envelope that can store the information about which key was used, and other pertinent metadata, and have the ability for special applications like mirror maker to preserve it across clusters. This still helps to enable the use of encryption and other features (like auditing) even if we decide it’s too large a scope to fully implement.
>>>>>>>
>>>>>>> -Todd
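For what it is worth, the envelope described above could be as small as the following sketch; the field names are made up purely for illustration:

/**
 * Per-message envelope metadata: enough for a consumer (or for mirror
 * maker, which would preserve it verbatim across clusters) to find and
 * unwrap the right key. All field names are hypothetical.
 */
public class MessageEnvelope {
    public final String cipher;          // e.g. "AES/CBC/PKCS5Padding"
    public final String keyId;           // identifies the producer/session key used
    public final long sessionKeyOffset;  // offset of the key bundle in __session_keys
    public final byte[] payload;         // the encrypted message body

    public MessageEnvelope(String cipher, String keyId,
                           long sessionKeyOffset, byte[] payload) {
        this.cipher = cipher;
        this.keyId = keyId;
        this.sessionKeyOffset = sessionKeyOffset;
        this.payload = payload;
    }
}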
>>>>>>>
>>>>>>> On 6/6/14, 10:51 AM, "Pradeep Gollakota" <pradeep...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'm actually not convinced that encryption needs to be handled server side in Kafka. I think the best solution for encryption is to handle it producer/consumer side, just like compression. This will offload key management to the users, and we'll still be able to leverage the sendfile optimization for better performance.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 6, 2014 at 10:48 AM, Rob Withers <robert.w.with...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On consideration, if we have 3 different access groups (1 for production WRITE and 2 consumers), they all need to decode the same encryption and so all need the same public/private key. Certs won't work, unless you write a CertAuthority to build multiple certs with the same keys. It seems better not to use certs and instead wrap the encryption specification with ACL capabilities for each group of access.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jun 6, 2014, at 11:43 AM, Rob Withers wrote:
>>>>>>>>>
>>>>>>>>>> This is quite interesting to me, and it is an excellent opportunity to promote a slightly different security scheme. Object-capabilities are perfect for online security and would use ACL style authentication to gain capabilities filtered to those allowed resources for allowed actions (READ/WRITE/DELETE/LIST/SCAN). Erights.org has the quintessential object capabilities model, and capnproto is implementing this for C++. I have a java implementation at http://github.com/pauwau/pauwau but the master is broken. 0.2 works, basically. It is essentially a TLS connection with no certificate server; it is peer to peer. It has some advanced features, but the linking of capabilities with authorization, so that you can only invoke correct services, is extended to the secure user.
>>>>>>>>>>
>>>>>>>>>> Regarding non-repudiation, on disk, why not prepend a CRC?
>>>>>>>>>>
>>>>>>>>>> Regarding on-disk encryption, multiple users/groups may need access, with different capabilities. Sounds like zookeeper needs to store a cert for each class of access, so that a group member can access the decrypted data from disk. Use cert-based asymmetric decryption. The only issue is storing the private key in zookeeper. Perhaps some hash magic could be used.
>>>>>>>>>>
>>>>>>>>>> Thanks for kafka,
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>> On Jun 5, 2014, at 3:01 PM, Jay Kreps wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Joe,
>>>>>>>>>>>
>>>>>>>>>>> I don't really understand the sections you added to the wiki. Can you clarify them?
>>>>>>>>>>>
>>>>>>>>>>> Is non-repudiation what SASL would call integrity checks? If so, don't SSL and many of the SASL schemes already support this, as well as on-the-wire encryption?
>>>>>>>>>>>
>>>>>>>>>>> Or are you proposing an on-disk encryption scheme? Is this actually needed?
>>>>>>>>>>> Isn't on-the-wire encryption, when combined with mutual authentication and permissions, sufficient for most uses?
>>>>>>>>>>>
>>>>>>>>>>> On-disk encryption seems unnecessary because if an attacker can get root on the kafka boxes they can potentially modify Kafka to do anything they want with the data. So this seems to break any security model.
>>>>>>>>>>>
>>>>>>>>>>> I understand the problem of a large organization not really having a trusted network and wanting to secure data transfer and limit and audit data access. The uses for these other things I don't totally understand.
>>>>>>>>>>>
>>>>>>>>>>> Also it would be worth understanding the state of other messaging and storage systems (Hadoop, dbs, etc). What features do they support? I think there is a sense in which you don't have to run faster than the bear, but only faster than your friends. :-)
>>>>>>>>>>>
>>>>>>>>>>> -Jay
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 4, 2014 at 5:57 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I like the idea of working on the spec and prioritizing. I will update the wiki.
>>>>>>>>>>>>
>>>>>>>>>>>> - Joestein
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 4, 2014 at 1:11 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Joe,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for kicking this discussion off! I totally agree that for something that acts as a central message broker, security is a critical feature. I think a number of people have been interested in this topic, and several people have put effort into special purpose security efforts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since most of the LinkedIn folks are working on the consumer right now, I think this would be a great project for any other interested people to take on. There are some challenges in doing these things distributed, but it can also be a lot of fun.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think a good first step would be to get a written plan we can all agree on for how things should work. Then we can break things down into chunks that can be done independently while still aiming at a good end state.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had tried to write up some notes that summarized at least the thoughts I had had on security:
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think of that?
>>>>>>>>>>>>>
>>>>>>>>>>>>> One assumption I had (which may be incorrect) is that although we want all the things in your list, the two most pressing would be authentication and authorization, and that was all that write-up covered. You have more experience in this domain, so I wonder how you would prioritize?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Those notes are really sketchy, so I think the first goal I would have would be to get to a real spec we can all agree on and discuss. A lot of the security stuff has a high human interaction element and needs to work in pretty different domains and different companies, so getting this kind of review is important.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 3, 2014 at 12:57 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, I wanted to re-ignite the discussion around Apache Kafka Security. This is a huge bottleneck (non-starter in some cases) for a lot of organizations (due to regulatory, compliance and other requirements). Below are my suggestions for specific changes in Kafka to accommodate security requirements. This comes from what folks are doing "in the wild" to work around and implement security with Kafka as it is today, and also what I have discovered from organizations about their blockers. It also picks up from the wiki (which I should have time to update later in the week based on the below and feedback from the thread).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Transport Layer Security (i.e. SSL)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This also includes client authentication, in addition to the in-transit security layer. This work has been picked up here: https://issues.apache.org/jira/browse/KAFKA-1477 and I do appreciate any thoughts, comments, feedback, tomatoes, whatever for this patch. It is a pickup from the fork of the work first done here: https://github.com/relango/kafka/tree/kafka_security.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Data encryption at rest.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is very important, and something that can be facilitated within the wire protocol. It requires an additional map data structure for the "encrypted [data encryption key]".
>>>>>>>>>>>>>> With this map (either in your object or in the wire protocol) you can store the dynamically generated symmetric key (for each message) and then encrypt the data using that dynamically generated key. You then encrypt the encryption key using the public key of each party who is expected to be able to decrypt the encryption key and, in turn, the message. Each public-key-encrypted symmetric key (which is now the "encrypted [data encryption key]") is stored along with the public key it was encrypted with (so a map of [publicKey] = encryptedDataEncryptionKey), as a chain. Other patterns can be implemented, but this is a pretty standard digital enveloping [0] pattern with only 1 field added. Other patterns should be able to use that field to do their implementation too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) Non-repudiation and long term non-repudiation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Non-repudiation is proving data hasn't changed. This is often (if not always) done with x509 public certificates (chained to a certificate authority).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Long term non-repudiation is what happens when the certificates of the certificate authority are expired (or revoked) and everything ever signed (ever) with that certificate's public key then becomes "no longer provable as ever being authentic". That is where RFC3126 [1] and RFC3161 [2] come in (or worm drives [hardware], etc).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For either (or both) of these, it is an operation of the encryptor to sign/hash the data (with or without a trusted third-party timestamp of the signing event), encrypt that with their own private key, and distribute the results (before and after encrypting, if required) along with their public key. This structure is a bit more complex, but feasible; it is a map of digital signature formats and the chain of dig sig attestations. The map's key is the method (i.e. CRC32, PKCS7 [3], XmlDigSig [4]), and the value is a list of maps where the key is the "purpose" of the signature (what you're attesting to). As a sibling field to the list, another field for "the attester" as bytes (e.g.
>>>>>>>>>>>>>> their PKCS12 [5] for the map of PKCS7 signatures).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4) Authorization
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should have a policy of "404" for data, topics, partitions (etc) if authenticated connections do not have access. In "secure mode" any non-authenticated connections should get a "404" type message on everything. Knowing "something is there" is a security risk in many use cases. So if you don't have access, you don't even see it. Baking "that" into Kafka, along with some interface for entitlement (access management) systems (pretty standard), is all that I think needs to be done to the core project.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I want to tackle this item later in the year, after summer, once the other three are complete.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I look forward to thoughts on this, and to anyone else interested in working with us on these items.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
>>>>>>>>>>>>>> [1] http://tools.ietf.org/html/rfc3126
>>>>>>>>>>>>>> [2] http://tools.ietf.org/html/rfc3161
>>>>>>>>>>>>>> [3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
>>>>>>>>>>>>>> [4] http://en.wikipedia.org/wiki/XML_Signature
>>>>>>>>>>>>>> [5] http://en.wikipedia.org/wiki/PKCS_12
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /*******************************************
>>>>>>>>>>>>>> Joe Stein
>>>>>>>>>>>>>> Founder, Principal Consultant
>>>>>>>>>>>>>> Big Data Open Source Security LLC
>>>>>>>>>>>>>> http://www.stealth.ly
>>>>>>>>>>>>>> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>>>>>>>>>>>> ********************************************/
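To illustrate the "404" policy from point 4: the request handler answers identically whether a topic is missing or merely forbidden, so an unauthorized client cannot probe for the existence of resources. A sketch with hypothetical names, not actual Kafka request-handling code:

/** Sketch of the "404" authorization policy (hypothetical names). */
public class AuthorizationFilter {

    interface Authorizer {
        boolean mayRead(String principal, String topic);
    }

    private final Authorizer authorizer;

    public AuthorizationFilter(Authorizer authorizer) {
        this.authorizer = authorizer;
    }

    /** Returns topic metadata, or the same generic error for "missing"
     *  and "hidden", so the two cases are indistinguishable. */
    public String describeTopic(String principal, String topic,
                                boolean topicExists) {
        if (!topicExists || !authorizer.mayRead(principal, topic)) {
            return "UNKNOWN_TOPIC_OR_PARTITION";
        }
        return "metadata for " + topic;
    }
}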