Hi,

I wanted to reignite the discussion around Apache Kafka security. Security is a huge bottleneck (a non-starter in some cases) for a lot of organizations because of regulatory, compliance and other requirements. Below are my suggestions for specific changes in Kafka to accommodate those requirements. They come from what folks are doing "in the wild" to work around the gaps and implement security with Kafka as it is today, and from what organizations have told me about their blockers. This also picks up from the wiki, which I should have time to update later in the week based on the below and on feedback from this thread.
1) Transport Layer Security (i.e. SSL). This covers client authentication in addition to the in-transit security layer. The work has been picked up in https://issues.apache.org/jira/browse/KAFKA-1477 and I would appreciate any thoughts, comments, feedback, tomatoes, whatever for this patch. It picks up from the fork where this work was first done: https://github.com/relango/kafka/tree/kafka_security.

2) Data encryption at rest. This is very important and is something the wire protocol can facilitate. It requires one additional map data structure for the "encrypted [data encryption key]". With this map (either in your own object or in the wire protocol) you dynamically generate a symmetric key for each message and encrypt the data with that key. You then encrypt the symmetric key with the public key of each party that is expected to be able to decrypt the message. Each resulting public-key-encrypted symmetric key (the "encrypted [data encryption key]") is stored together with the public key it was encrypted for, so the map is [publicKey] = encryptedDataEncryptionKey. Other patterns can be implemented, but this is a pretty standard digital enveloping [0] pattern with only one field added, and other patterns should be able to use that field for their implementations too.

3) Non-repudiation and long-term non-repudiation. Non-repudiation is proving that data has not changed since it was signed, and by whom it was signed. This is often (if not always) done with X.509 public certificates chained to a certificate authority. Long-term non-repudiation addresses what happens when the certificate authority's certificates expire (or are revoked): everything ever signed under that certificate then becomes no longer provable as ever having been authentic. That is where RFC 3126 [1] and RFC 3161 [2] come in (or WORM drives [hardware], etc.).
For either (or both) of these, it is the job of the encryptor to sign/hash the data (with or without a trusted third-party timestamp of the signing event), encrypt that with their own private key, and distribute the results (before and after encrypting, if required) along with their public key. This structure is a bit more complex but feasible: it is a map of digital signature formats to the chain of digital signature attestations. The map's key is the method (e.g. CRC32, PKCS7 [3], XmlDigSig [4]) and its value is a list of maps whose key is the "purpose" of the signature (what you are attesting to). As a sibling field to that list there is another field for "the attester" as bytes (e.g. their PKCS12 [5] for the map of PKCS7 signatures).

4) Authorization. We should have a policy of "404" for data, topics, partitions (etc.) when an authenticated connection does not have access. In "secure mode" any non-authenticated connection should get a "404" type message on everything. Knowing "something is there" is a security risk in many use cases, so if you don't have access you don't even see it. Baking that into Kafka, along with an interface for entitlement (access management) systems (pretty standard), is all that I think needs to be done in the core project. I want to tackle this item later in the year, after the summer, once the other three are complete.

I look forward to thoughts on this and to hearing from anyone else interested in working with us on these items.
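For item 1, the client-authentication half can be illustrated with nothing but the JDK's JSSE classes. This is a sketch, not the KAFKA-1477 patch, and the class name MutualTlsListener is hypothetical: the context is built from a key store (the broker's own certificate) and a trust store (the CAs whose client certificates it accepts), and setNeedClientAuth(true) makes the TLS handshake fail for any client that does not present a certificate the trust store accepts.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLServerSocket;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: a TLS listener that also authenticates clients (mutual TLS). */
final class MutualTlsListener {

    /** Builds a context from the broker's own key material and the CAs it trusts for client certificates. */
    static SSLContext contextFrom(KeyStore keyStore, char[] keyPassword, KeyStore trustStore)
            throws GeneralSecurityException {
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, keyPassword); // what the broker presents to clients
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore); // which client certificates the broker accepts
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }

    /** Opens a server socket whose handshakes require a client certificate. */
    static SSLServerSocket open(SSLContext ctx, int port) throws IOException {
        SSLServerSocket server = (SSLServerSocket) ctx.getServerSocketFactory().createServerSocket(port);
        server.setNeedClientAuth(true); // handshake fails if the client sends no trusted certificate
        return server;
    }
}
```

Once a handshake completes, the authenticated client identity is available from the accepted socket via getSession().getPeerPrincipal(), which is the natural input for an authorization layer such as the one in item 4.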
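A minimal sketch of the digital envelope from item 2, using only the JDK's javax.crypto classes. The algorithm choices are assumptions, not part of the proposal: AES-256-GCM for the per-message data key, RSA-OAEP to encrypt that key for each recipient, and the Base64 of each recipient's encoded public key as the map key. Envelope, DigitalEnvelope, seal and open are hypothetical names; in Kafka the encryptedKeys map would be the one extra field carried in the message or the wire protocol.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sealed message: ciphertext plus the map of [publicKey] = encryptedDataEncryptionKey. */
final class Envelope {
    final byte[] iv;
    final byte[] ciphertext;
    final Map<String, byte[]> encryptedKeys; // Base64(public key) -> encrypted AES key

    Envelope(byte[] iv, byte[] ciphertext, Map<String, byte[]> encryptedKeys) {
        this.iv = iv;
        this.ciphertext = ciphertext;
        this.encryptedKeys = encryptedKeys;
    }
}

final class DigitalEnvelope {
    private static final String WRAP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final SecureRandom RANDOM = new SecureRandom();

    static String keyId(PublicKey key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    /** Encrypts the payload once with a fresh AES key, then encrypts that key per recipient. */
    static Envelope seal(byte[] payload, List<PublicKey> recipients) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        SecretKey dataKey = gen.generateKey(); // new symmetric key for every message

        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = aes.doFinal(payload);

        Map<String, byte[]> encryptedKeys = new HashMap<>();
        for (PublicKey recipient : recipients) {
            Cipher rsa = Cipher.getInstance(WRAP);
            rsa.init(Cipher.ENCRYPT_MODE, recipient);
            encryptedKeys.put(keyId(recipient), rsa.doFinal(dataKey.getEncoded()));
        }
        return new Envelope(iv, ciphertext, encryptedKeys);
    }

    /** Finds this recipient's entry, decrypts the AES key, then decrypts the payload. */
    static byte[] open(Envelope env, PublicKey me, PrivateKey myPrivate) throws GeneralSecurityException {
        byte[] encryptedKey = env.encryptedKeys.get(keyId(me));
        if (encryptedKey == null) {
            throw new GeneralSecurityException("no data key was encrypted for this public key");
        }
        Cipher rsa = Cipher.getInstance(WRAP);
        rsa.init(Cipher.DECRYPT_MODE, myPrivate);
        SecretKeySpec dataKey = new SecretKeySpec(rsa.doFinal(encryptedKey), "AES");

        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, env.iv));
        return aes.doFinal(env.ciphertext);
    }
}
```

Because the payload is encrypted only once, adding a recipient costs one RSA operation on a 32-byte key rather than re-encrypting the message, and a party with no entry in the map simply cannot open it.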
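For item 3, a sketch of the signature map under some simplifying assumptions: a single method, SHA256withRSA from java.security.Signature, stands in for formats such as PKCS7 or XmlDigSig, and the attester field holds the signer's X.509-encoded public key rather than a certificate bundle. There is one attester per method, matching the single sibling field described in item 3; AttestationChain, SignedPayload, attest and verify are hypothetical names. It covers only the short-term integrity check; trusted timestamps (RFC 3161) for long-term non-repudiation are out of scope here.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Value of the method map: the attestation chain plus "the attester" as a sibling field. */
final class AttestationChain {
    final List<Map<String, byte[]>> attestations = new ArrayList<>(); // purpose -> signature
    byte[] attester; // assumption: signer's X.509-encoded public key
}

final class SignedPayload {
    private static final String METHOD = "SHA256withRSA";
    final byte[] data;
    final Map<String, AttestationChain> signatures = new HashMap<>(); // method -> chain

    SignedPayload(byte[] data) {
        this.data = data;
    }

    /** Hashes and signs the data with the signer's private key, recording what is attested. */
    void attest(String purpose, PrivateKey signerKey, PublicKey signerPublic) throws GeneralSecurityException {
        AttestationChain chain = signatures.computeIfAbsent(METHOD, m -> new AttestationChain());
        byte[] attester = signerPublic.getEncoded();
        if (chain.attester != null && !Arrays.equals(chain.attester, attester)) {
            throw new GeneralSecurityException("this sketch allows one attester per method");
        }
        Signature signer = Signature.getInstance(METHOD);
        signer.initSign(signerKey);
        signer.update(data); // SHA-256 digest of the payload, signed with RSA
        Map<String, byte[]> entry = new HashMap<>();
        entry.put(purpose, signer.sign());
        chain.attester = attester;
        chain.attestations.add(entry);
    }

    /** Checks every recorded signature against the stored attester key; false if data changed. */
    boolean verify() throws GeneralSecurityException {
        AttestationChain chain = signatures.get(METHOD);
        if (chain == null || chain.attester == null) {
            return false;
        }
        PublicKey attester = KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(chain.attester));
        for (Map<String, byte[]> entry : chain.attestations) {
            for (byte[] sig : entry.values()) {
                Signature verifier = Signature.getInstance(METHOD);
                verifier.initVerify(attester);
                verifier.update(data);
                if (!verifier.verify(sig)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

In practice the attester bytes would be a certificate chained to a CA, so that a consumer can check who the key belongs to and not just that the signatures match it.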
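For item 4, a sketch of the "404" policy. EntitlementService stands in for the pluggable hook to an external entitlement system and TopicDirectory for broker metadata lookups; both are hypothetical names, not existing Kafka interfaces. The point it illustrates is that a forbidden topic and a nonexistent topic produce the same empty answer, and that in secure mode an unauthenticated caller (modelled here as a null principal) gets that answer for everything.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical hook to an external entitlement (access management) system. */
interface EntitlementService {
    boolean canRead(String principal, String topic);
}

/** Hypothetical metadata lookup where "forbidden" and "does not exist" look identical. */
final class TopicDirectory {
    private final Map<String, Integer> partitionsByTopic; // topic -> partition count
    private final EntitlementService entitlements;
    private final boolean secureMode;

    TopicDirectory(Map<String, Integer> partitionsByTopic, EntitlementService entitlements, boolean secureMode) {
        this.partitionsByTopic = partitionsByTopic;
        this.entitlements = entitlements;
        this.secureMode = secureMode;
    }

    /** Partition count, or empty ("404") if the topic is missing or the caller may not see it. */
    Optional<Integer> partitionCount(String principal, String topic) {
        if (secureMode && principal == null) {
            return Optional.empty(); // unauthenticated connection: "404" on everything
        }
        Integer count = partitionsByTopic.get(topic);
        if (count == null) {
            return Optional.empty(); // genuinely missing
        }
        if (secureMode && !entitlements.canRead(principal, topic)) {
            return Optional.empty(); // forbidden: same answer as missing
        }
        return Optional.of(count);
    }

    /** Listings only ever include topics the caller is allowed to see. */
    Set<String> visibleTopics(String principal) {
        Set<String> visible = new TreeSet<>();
        for (String topic : partitionsByTopic.keySet()) {
            if (partitionCount(principal, topic).isPresent()) {
                visible.add(topic);
            }
        }
        return visible;
    }
}
```

A real implementation would also need identical error codes and response timing for the two cases, otherwise the difference could still be inferred.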
[0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
[1] http://tools.ietf.org/html/rfc3126
[2] http://tools.ietf.org/html/rfc3161
[3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
[4] http://en.wikipedia.org/wiki/XML_Signature
[5] http://en.wikipedia.org/wiki/PKCS_12

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
********************************************/