Hi Ryanne,

You raise some good points there.
> Similarly, if the whole record is encrypted, it becomes impossible to do
> joins, group bys etc, which just need the record key and maybe don't have
> access to the encryption key. Maybe only record _values_ should be
> encrypted, and maybe Kafka Streams could defer decryption until the actual
> value is inspected. That way joins etc are possible without the encryption
> key, and RocksDB would not need to decrypt values before materializing to
> disk.

It's getting a bit late here, so maybe I overlooked something, but wouldn't
the natural thing be to make the "encrypted" key a hash of the original
key, and let the encrypted value be the ciphertext of the (original key,
original value) pair? A scheme like this would preserve equality of the key
(strictly speaking there's a chance of collision, of course). I guess this
could also be a solution for the compacted topic issue Sönke mentioned. (A
rough sketch of such a scheme follows the quoted thread below.)

Cheers,

Tom

On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Thanks Sönke, this is an area in which Kafka is really, really far
> behind.
>
> I've built secure systems around Kafka as laid out in the KIP. One issue
> that is not addressed in the KIP is re-encryption of records after a key
> rotation. When a key is compromised, it's important that any data
> encrypted using that key is immediately destroyed or re-encrypted with a
> new key. Ideally, first-class support for end-to-end encryption in Kafka
> would make this possible natively, or else I'm not sure what the point
> would be. It seems to me that the brokers would need to be involved in
> this process, so perhaps a client-first approach will be painting
> ourselves into a corner. Not sure.
>
> Another issue is whether materialized tables, e.g. in Kafka Streams,
> would see unencrypted or encrypted records. If we implemented the KIP as
> written, it would still result in a bunch of plaintext data in RocksDB
> everywhere. Again, I'm not sure what the point would be. Perhaps using
> custom serdes would actually be a more holistic approach, since Kafka
> Streams etc. could leverage these as well.
>
> Similarly, if the whole record is encrypted, it becomes impossible to do
> joins, group bys etc, which just need the record key and maybe don't
> have access to the encryption key. Maybe only record _values_ should be
> encrypted, and maybe Kafka Streams could defer decryption until the
> actual value is inspected. That way joins etc are possible without the
> encryption key, and RocksDB would not need to decrypt values before
> materializing to disk.
>
> This is why I've implemented encryption on a per-field basis, not at the
> record level, when addressing Kafka security in the past. And I've had
> to build external pipelines that purge, re-encrypt, and re-ingest
> records when keys are compromised.
>
> This KIP might be a step in the right direction, not sure. But I'm
> hesitant to support the idea of end-to-end encryption without a plan to
> address the myriad other problems.
>
> That said, we need this badly and I hope something shakes out.
>
> Ryanne
>
> On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
> <soenke.lie...@opencore.com.invalid> wrote:
>
> > All,
> >
> > I've asked for comments on this KIP in the past, but since I didn't
> > really get any feedback, I've decided to reduce the initial scope of
> > the KIP a bit and try again.
> >
> > I have reworked the KIP to provide a limited but useful set of features
> > for this initial KIP and laid out a very rough roadmap of what I'd
> > envision this looking like in a final version.
> >
> > I am aware that the KIP is currently light on implementation details,
> > but would like to get some feedback on the general approach before
> > fully speccing everything.
> >
> > The KIP can be found at
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> >
> > I would very much appreciate any feedback!
> >
> > Best regards,
> > Sönke
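
A minimal sketch of the hashed-key scheme Tom describes, assuming AES-GCM
via the standard javax.crypto API. The class name, the length-prefixed
(key, value) layout of the encrypted value, and the choice of SHA-256 are
illustrative only and are not part of KIP-317:

    import java.nio.ByteBuffer;
    import java.security.MessageDigest;
    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;

    /**
     * Illustrative only: publish a hash of the original key as the record
     * key (preserving key equality for joins, group-bys and compaction)
     * and encrypt the (original key, original value) pair into the value.
     */
    public class HashedKeyEnvelope {

        private static final int GCM_TAG_BITS = 128;
        private static final int IV_BYTES = 12;

        private final SecretKey dek;              // data encryption key, e.g. AES-256
        private final SecureRandom random = new SecureRandom();

        public HashedKeyEnvelope(SecretKey dek) {
            this.dek = dek;
        }

        /** The "encrypted" record key: a stable hash of the original key. */
        public byte[] recordKey(byte[] originalKey) throws Exception {
            // Note: an unkeyed hash lets anyone confirm a guessed key; an
            // HMAC under a separate key would leak less.
            return MessageDigest.getInstance("SHA-256").digest(originalKey);
        }

        /** The record value: IV || AES-GCM ciphertext of (original key, original value). */
        public byte[] recordValue(byte[] originalKey, byte[] originalValue) throws Exception {
            // Length-prefix the original key so the pair can be split apart
            // again after decryption.
            byte[] plaintext = ByteBuffer.allocate(4 + originalKey.length + originalValue.length)
                    .putInt(originalKey.length)
                    .put(originalKey)
                    .put(originalValue)
                    .array();
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(GCM_TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
        }
    }

With a layout like this, brokers, log compaction and key-based operations
only ever see the hashed key, while recovering the original key or value
requires the decryption key; as Tom notes, key equality is preserved up to
the (small) chance of hash collisions.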
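
Along the same lines, here is a sketch of the value-only encryption Ryanne
suggests, expressed as a custom Kafka Serializer (one way to read his
remark about custom serdes). The class name and wire layout (IV prepended
to the AES-GCM ciphertext) are assumptions, key distribution and rotation
are left out entirely, and a Kafka version where Serializer's configure()
and close() have default implementations is assumed:

    import java.nio.ByteBuffer;
    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import org.apache.kafka.common.serialization.Serializer;

    /**
     * Illustrative only: encrypts the already-serialized value bytes so
     * that record keys stay in the clear for joins, group-bys and
     * compaction, while values are opaque without the decryption key.
     */
    public class EncryptingValueSerializer implements Serializer<byte[]> {

        private final SecretKey dek;
        private final SecureRandom random = new SecureRandom();

        public EncryptingValueSerializer(SecretKey dek) {
            this.dek = dek;
        }

        @Override
        public byte[] serialize(String topic, byte[] plainValue) {
            if (plainValue == null) {
                return null; // preserve tombstones for compacted topics
            }
            try {
                byte[] iv = new byte[12];
                random.nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
                byte[] ciphertext = cipher.doFinal(plainValue);
                return ByteBuffer.allocate(iv.length + ciphertext.length)
                        .put(iv).put(ciphertext).array();
            } catch (Exception e) {
                throw new RuntimeException("value encryption failed", e);
            }
        }
    }

A matching Deserializer would reverse this, and deferring the call to it
until a value is actually inspected would keep the bytes materialized into
state stores such as RocksDB encrypted, along the lines Ryanne describes.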