Tom, good point, I've done exactly that -- hashing record keys -- but it's unclear to me what should happen when the hash key must be rotated. In my case the (external) solution involved rainbow tables, versioned keys, and custom materializers that were aware of older keys for each record.
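To sketch the kind of thing I mean by versioned key hashing (just a rough illustration, not anything from the KIP -- the class name and the header name are made up): the record key becomes an HMAC of the original key under the current hashing secret, and the secret's version travels with the record so materializers can recognize keys hashed under older secrets.

    import java.nio.charset.StandardCharsets;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Rough sketch, not from the KIP: deterministic HMAC of the record key
    // plus a version header, so downstream materializers can tell which
    // hashing secret produced a given key.
    public class VersionedKeyHasher {

        private final SecretKeySpec secret; // current hashing secret
        private final int version;          // bumped on every rotation

        public VersionedKeyHasher(byte[] secretBytes, int version) {
            this.secret = new SecretKeySpec(secretBytes, "HmacSHA256");
            this.version = version;
        }

        public ProducerRecord<byte[], byte[]> hashed(String topic, byte[] key, byte[] value)
                throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(secret);
            byte[] hashedKey = mac.doFinal(key); // deterministic, so key equality survives
            ProducerRecord<byte[], byte[]> record =
                    new ProducerRecord<>(topic, hashedKey, value);
            // carry the hash-key version as metadata for downstream materializers
            record.headers().add("key-hash-version",
                    Integer.toString(version).getBytes(StandardCharsets.UTF_8));
            return record;
        }
    }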
In particular I had a pipeline that would re-key records and re-ingest them, while opportunistically overwriting records materialized with the old key. For a native solution I think maybe we'd need to carry around any old versions of each record key, perhaps as metadata. Then brokers and materializers can compact records based on _any_ overlapping key, maybe? Not sure.
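Something like this, maybe, with the old hashed keys riding along as headers (again a purely hypothetical sketch -- compacting on header values would need broker-side support that doesn't exist today, and the header name is made up):

    import org.apache.kafka.clients.producer.ProducerRecord;

    // Hypothetical: after rotating the hashing secret, publish under the new
    // hashed key but attach the previous hashed key(s) as headers, so a
    // (future, broker-aware) compactor could match on any overlapping key.
    public class RekeyedRecords {

        public static ProducerRecord<byte[], byte[]> rekeyed(
                String topic, byte[] newHashedKey, byte[] value, byte[]... oldHashedKeys) {
            ProducerRecord<byte[], byte[]> record =
                    new ProducerRecord<>(topic, newHashedKey, value);
            for (byte[] oldKey : oldHashedKeys) {
                record.headers().add("previous-key-hash", oldKey); // illustrative header name
            }
            return record;
        }
    }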
Ryanne

On Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com> wrote:

> Hi Ryanne,
>
> You raise some good points there.
>
> > Similarly, if the whole record is encrypted, it becomes impossible to do
> > joins, group bys etc, which just need the record key and maybe don't
> > have access to the encryption key. Maybe only record _values_ should be
> > encrypted, and maybe Kafka Streams could defer decryption until the
> > actual value is inspected. That way joins etc are possible without the
> > encryption key, and RocksDB would not need to decrypt values before
> > materializing to disk.
>
> It's getting a bit late here, so maybe I overlooked something, but
> wouldn't the natural thing to do be to make the "encrypted" key a hash of
> the original key, and let the encrypted value be the cipher text of the
> (original key, original value) pair? A scheme like this would preserve
> equality of the key (strictly speaking there's a chance of collision, of
> course). I guess this could also be a solution for the compacted topic
> issue Sönke mentioned.
>
> Cheers,
>
> Tom
>
> On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> > Thanks Sönke, this is an area in which Kafka is really, really far
> > behind.
> >
> > I've built secure systems around Kafka as laid out in the KIP. One issue
> > that is not addressed in the KIP is re-encryption of records after a key
> > rotation. When a key is compromised, it's important that any data
> > encrypted using that key is immediately destroyed or re-encrypted with a
> > new key. Ideally, first-class support for end-to-end encryption in Kafka
> > would make this possible natively, or else I'm not sure what the point
> > would be. It seems to me that the brokers would need to be involved in
> > this process, so perhaps a client-first approach will be painting
> > ourselves into a corner. Not sure.
> >
> > Another issue is whether materialized tables, e.g. in Kafka Streams,
> > would see unencrypted or encrypted records. If we implemented the KIP as
> > written, it would still result in a bunch of plain-text data in RocksDB
> > everywhere. Again, I'm not sure what the point would be. Perhaps using
> > custom serdes would actually be a more holistic approach, since Kafka
> > Streams etc could leverage these as well.
> >
> > Similarly, if the whole record is encrypted, it becomes impossible to do
> > joins, group bys etc, which just need the record key and maybe don't
> > have access to the encryption key. Maybe only record _values_ should be
> > encrypted, and maybe Kafka Streams could defer decryption until the
> > actual value is inspected. That way joins etc are possible without the
> > encryption key, and RocksDB would not need to decrypt values before
> > materializing to disk.
> >
> > This is why I've implemented encryption on a per-field basis, not at the
> > record level, when addressing Kafka security in the past. And I've had
> > to build external pipelines that purge, re-encrypt, and re-ingest
> > records when keys are compromised.
> >
> > This KIP might be a step in the right direction, not sure. But I'm
> > hesitant to support the idea of end-to-end encryption without a plan to
> > address the myriad other problems.
> >
> > That said, we need this badly and I hope something shakes out.
> >
> > Ryanne
> >
> > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
> > <soenke.lie...@opencore.com.invalid> wrote:
> >
> > > All,
> > >
> > > I've asked for comments on this KIP in the past, but since I didn't
> > > really get any feedback, I've decided to reduce the initial scope of
> > > the KIP a bit and try again.
> > >
> > > I have reworked the KIP to provide a limited but useful set of
> > > features for this initial KIP and laid out a very rough roadmap of
> > > what I'd envision this looking like in a final version.
> > >
> > > I am aware that the KIP is currently light on implementation details,
> > > but I would like to get some feedback on the general approach before
> > > fully speccing everything.
> > >
> > > The KIP can be found at
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > >
> > > I would very much appreciate any feedback!
> > >
> > > Best regards,
> > > Sönke