Tbh Tom is right, it is entirely possible to support end-to-end encryption today, without broker or client changes, by using serializers. In fact I know many companies doing this. As such, maybe a good approach would be to provide a default encryption and decryption serde that can be used, rather than any client or broker changes at all. This way those who already have a working solution do not need to change anything, and you are basically providing a default solution for those who have not already built one, so that it is easier to adopt.

Sent from my Samsung Galaxy smartphone.
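For what it's worth, such a default serde could be quite small. Below is a minimal sketch of what the encrypting half might look like, assuming AES-GCM and key material coming from a made-up configuration entry; the class name, the config key and the key handling are illustrative only, not something any existing client provides:

    import java.security.SecureRandom;
    import java.util.Map;
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.kafka.common.serialization.Serializer;

    /**
     * Hypothetical sketch of a default encrypting serializer: it wraps the
     * already-serialized byte[] payload in AES-GCM before the producer sends it.
     */
    public class EncryptingSerializer implements Serializer<byte[]> {

        private SecretKeySpec key;
        private final SecureRandom random = new SecureRandom();

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            // In practice the key would come from a KMS or keystore; here we
            // simply read raw key bytes from a made-up config entry.
            byte[] keyBytes = (byte[]) configs.get("encryption.key.bytes");
            key = new SecretKeySpec(keyBytes, "AES");
        }

        @Override
        public byte[] serialize(String topic, byte[] data) {
            if (data == null) {
                return null;
            }
            try {
                byte[] iv = new byte[12];
                random.nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
                byte[] cipherText = cipher.doFinal(data);
                // Prepend the IV so a matching deserializer can recover it.
                byte[] out = new byte[iv.length + cipherText.length];
                System.arraycopy(iv, 0, out, 0, iv.length);
                System.arraycopy(cipherText, 0, out, iv.length, cipherText.length);
                return out;
            } catch (Exception e) {
                throw new RuntimeException("Encryption failed", e);
            }
        }
    }

The decrypting counterpart would do the reverse: strip the IV prefix and decrypt, returning the plaintext bytes to the application.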
-------- Original message --------
From: Sönke Liebau <soenke.lie...@opencore.com.INVALID>
Date: 08/05/2020 10:05 (GMT+00:00)
To: dev <dev@kafka.apache.org>
Subject: Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

Hey everybody,

thanks a lot for reading and giving feedback!! I'll try and answer all points that I found going through the thread in this mail, but if I miss something please feel free to let me know! I've added a running number to the discussed topics for ease of reference down the road.

I'll go through the KIP and update it with everything that I have written below after sending this mail.
@Tom:

(1) If I understand your concerns correctly, you feel that this functionality would have a hard time getting approved into Apache Kafka because it can be achieved with custom serializers in the same way, and that we should maybe develop this outside of Apache Kafka at first.

I feel like it is precisely the fact that this is not part of core Apache Kafka that makes people think twice about doing end-to-end encryption. I may be working in a market (Germany) that is a bit special when compared to the rest of the world where encryption and things like that are concerned, but I've personally sat in multiple meetings where this feature was discussed. It is not necessarily the end-to-end encryption itself, but the at-rest encryption that you get with it. When people hear that this is not part of Apache Kafka itself, but that they would need to develop something themselves, that is more often than not the end of that discussion. Using something that is not "stock" is quite often simply not an option.

Even if they decide to go forward with it, they'll find Hendrik's blog post from 4 years ago on this, probably the whitepapers from Confluent and Lenses, and maybe a few implementations on GitHub - all of which just serve to further muddy the waters. Not because any of these resources are bad or wrong, but just because information and implementations are spread out over a lot of different places. Developing this outside of Apache Kafka would simply serve to add one more item to this list, which would not really matter, I'm afraid.

I strongly feel that this is a needed feature in Kafka and that there is a large number of people out there who would want to use it - but I may very well be mistaken; responses to this thread have not exactly been plentiful this last year and a half.
@Mike:

(2) Regarding the encryption of headers, my current idea is to keep this configurable. I have seen customers use headers for stuff like account numbers, which under the GDPR are considered to be personal data that should be encrypted wherever possible. So in some instances it might be useful to encrypt header fields as well. My current PoC implementation allows specifying a regex for headers that should be encrypted, which would allow having encrypted and unencrypted headers in the same record, to hopefully suit most use cases.
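To make the regex idea a bit more concrete, the selection logic could look roughly like this. Everything here is illustrative only - the class name and the encrypt() placeholder are made up, and repeated header keys are not handled:

    import java.util.regex.Pattern;
    import org.apache.kafka.common.header.Header;
    import org.apache.kafka.common.header.Headers;
    import org.apache.kafka.common.header.internals.RecordHeader;

    /**
     * Illustrative sketch: encrypt the values of all headers whose key matches
     * a configurable pattern, leaving all other headers untouched.
     */
    public class HeaderEncryptor {

        private final Pattern headerPattern;

        public HeaderEncryptor(String regex) {
            this.headerPattern = Pattern.compile(regex);
        }

        public void encryptMatchingHeaders(Headers headers) {
            // Iterate over a copy of the headers so we can modify them as we go.
            for (Header header : headers.toArray()) {
                if (headerPattern.matcher(header.key()).matches()) {
                    byte[] cipherText = encrypt(header.value());
                    headers.remove(header.key());
                    headers.add(new RecordHeader(header.key(), cipherText));
                }
            }
        }

        private byte[] encrypt(byte[] plain) {
            // Placeholder: delegate to whatever cipher the serde is configured with.
            return plain;
        }
    }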
(3) Also, my plan is to not change the message format, but to "encrypt-in-place" and add a header field with the necessary information for decryption, which would then be removed by the decrypting consumer. There may be some out-of-date intentions still in the KIP, I'll go through it and update.
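To illustrate that encrypt-in-place idea with the extra decryption-info header: on the consumer side a deserializer could read that header, decrypt the value and strip the header again before the record reaches the application. A very rough sketch, with the header name, its layout and the decrypt() placeholder all made up for the example:

    import org.apache.kafka.common.header.Header;
    import org.apache.kafka.common.header.Headers;
    import org.apache.kafka.common.serialization.Deserializer;

    /**
     * Rough sketch of a decrypting deserializer: it looks for a (hypothetical)
     * metadata header carrying key id and IV, decrypts the value and strips
     * the header before the record is handed to the application.
     */
    public class DecryptingDeserializer implements Deserializer<byte[]> {

        private static final String METADATA_HEADER = "encryption.metadata"; // made-up name

        @Override
        public byte[] deserialize(String topic, byte[] data) {
            // No headers available in this overload, so assume plaintext.
            return data;
        }

        @Override
        public byte[] deserialize(String topic, Headers headers, byte[] data) {
            Header metadata = headers.lastHeader(METADATA_HEADER);
            if (metadata == null || data == null) {
                return data; // record was not encrypted
            }
            byte[] plaintext = decrypt(metadata.value(), data);
            headers.remove(METADATA_HEADER); // the application never sees the bookkeeping header
            return plaintext;
        }

        private byte[] decrypt(byte[] metadata, byte[] cipherText) {
            // Placeholder: parse key id and IV from the metadata bytes and
            // decrypt with the configured key material.
            return cipherText;
        }
    }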
@Ryanne:

First off, I fully agree that we should avoid painting ourselves into a corner with an early client-only implementation. I scaled down this KIP from earlier attempts that included things like key rollover and broker-side implementations because I could not get any feedback from the community on those for a long time and felt that maybe there was no appetite for the full-blown solution. So I decided to try with a more limited scope. I am very happy to discuss/go for the fully featured version again :)

(4) Regarding plaintext data in RocksDB instances, I am a bit torn to be honest. On the one hand, I feel like this scenario is not something that we can fully control. Kafka Streams in this case is a client that takes data from Kafka, decrypts it and then puts it somewhere in plaintext. To me this scenario differs only slightly from, for example, someone writing a backup job that reads a topic and writes it to a text file - not much we can do about it. That being said, Kafka Streams is part of Apache Kafka, so it does merit special consideration. I'll have to dig into how StateStores are used a bit (I am not the world's largest expert - or any kind of expert - on that) to try and come up with an idea.
(5) On key encryption and hashing, this is definitely an issue that we need a solution for. I currently have key encryption configurable in my implementation. When encryption is enabled, an option would of course be to hash the original key and store the key data together with the value in an encrypted form. Any salt added to the key before hashing could be encrypted along with the data. This would allow all key-based functionality like compaction, joins etc. to keep working without having to know the cleartext key.
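A rough sketch of how that could fit together (the hash function, salt handling and payload layout are assumptions made for illustration, not something the KIP specifies): the key written to Kafka would be a salted hash of the cleartext key, while the cleartext key itself travels inside the encrypted value.

    import java.nio.ByteBuffer;
    import java.security.MessageDigest;

    /**
     * Illustrative helpers for the key-hashing idea described above.
     */
    public final class KeyHashing {

        /** Deterministically hash the cleartext key together with a per-topic salt. */
        public static byte[] hashedKey(byte[] clearKey, byte[] salt) throws Exception {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(salt);
            digest.update(clearKey);
            return digest.digest();
        }

        /** Bundle cleartext key and value into one payload that is then encrypted. */
        public static byte[] keyValuePayload(byte[] clearKey, byte[] value) {
            ByteBuffer buffer = ByteBuffer.allocate(4 + clearKey.length + value.length);
            buffer.putInt(clearKey.length); // length prefix so the consumer can split them again
            buffer.put(clearKey);
            buffer.put(value);
            return buffer.array();
        }
    }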
I've also considered deterministic encryption, which would keep the encrypted key the same, but I am fairly certain that we will want to allow regular key rotation (more on this in the next paragraph) without re-encrypting older data, and that would then change the encrypted key and break all these things.

Regarding re-encrypting existing keys when a crypto key is compromised, I think we need to be very careful with this if we do it in-place on the broker. If we add functionality along the lines of compaction, which reads, re-encrypts and rewrites segment files, we have to make sure that producers choose partitions based on the cleartext value, otherwise all records starting from the key change may go to a different partition of the topic.

(6) Key rollover would be a cool feature to have. I was up until now only thinking about supporting regular key rollover functionality that would change keys for all records going forward, tbh - mostly for complexity reasons - I think there was actually a sentence in the original KIP to this regard. But if you and others feel this is needed then I am happy to discuss this.

If we implement this on the broker we could use topic compaction for inspiration: read all segment files and check records one by one; if the key used for a record has been "retired/compromised/..." re-encrypt it with the new key and write a new segment file. There are lots of things to consider around this regarding performance, how to trigger it etc., but in principle this could work, I think.

One issue I can see with this is that if we use envelope encryption for the keys to address the rogue admin issue, so that the broker doesn't have access to the actual key encrypting the data, this would make that operation impossible.

I hope I got to all items that were raised, but I may very well have overlooked something; please let me know if I did - and of course your thoughts on what I wrote! I'll update the KIP today as well.

Best regards,
Sönke

On Thu, 7 May 2020 at 19:54, Ryanne Dolan <ryannedo...@gmail.com> wrote:
> Tom, good point, I've done exactly that -- hashing record keys -- but it's
> unclear to me what should happen when the hash key must be rotated. In my
> case the (external) solution involved rainbow tables, versioned keys, and
> custom materializers that were aware of older keys for each record.
>
> In particular I had a pipeline that would re-key records and re-ingest
> them, while opportunistically overwriting records materialized with the old
> key.
>
> For a native solution I think maybe we'd need to carry around any old
> versions of each record key, perhaps as metadata. Then brokers and
> materializers can compact records based on _any_ overlapping key, maybe?
> Not sure.
>
> Ryanne
>
> On Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com> wrote:
>
> > Hi Ryanne,
> >
> > You raise some good points there.
> >
> > > Similarly, if the whole record is encrypted, it becomes impossible to do
> > > joins, group bys etc, which just need the record key and maybe don't have
> > > access to the encryption key. Maybe only record _values_ should be
> > > encrypted, and maybe Kafka Streams could defer decryption until the actual
> > > value is inspected. That way joins etc are possible without the encryption
> > > key, and RocksDB would not need to decrypt values before materializing to
> > > disk.
> >
> > It's getting a bit late here, so maybe I overlooked something, but wouldn't
> > the natural thing to do be to make the "encrypted" key a hash of the
> > original key, and let the value of the encrypted record be the cipher text
> > of the (original key, original value) pair? A scheme like this would
> > preserve equality of the key (strictly speaking there's a chance of
> > collision, of course). I guess this could also be a solution for the
> > compacted topic issue Sönke mentioned.
> >
> > Cheers,
> >
> > Tom
> >
> > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> >
> > > Thanks Sönke, this is an area in which Kafka is really, really far behind.
> > >
> > > I've built secure systems around Kafka as laid out in the KIP. One issue
> > > that is not addressed in the KIP is re-encryption of records after a key
> > > rotation. When a key is compromised, it's important that any data encrypted
> > > using that key is immediately destroyed or re-encrypted with a new key.
> > > Ideally first-class support for end-to-end encryption in Kafka would make
> > > this possible natively, or else I'm not sure what the point would be. It
> > > seems to me that the brokers would need to be involved in this process, so
> > > perhaps a client-first approach will be painting ourselves into a corner.
> > > Not sure.
> > >
> > > Another issue is whether materialized tables, e.g. in Kafka Streams, would
> > > see unencrypted or encrypted records. If we implemented the KIP as written,
> > > it would still result in a bunch of plain text data in RocksDB everywhere.
> > > Again, I'm not sure what the point would be. Perhaps using custom serdes
> > > would actually be a more holistic approach, since Kafka Streams etc could
> > > leverage these as well.
> > >
> > > Similarly, if the whole record is encrypted, it becomes impossible to do
> > > joins, group bys etc, which just need the record key and maybe don't have
> > > access to the encryption key. Maybe only record _values_ should be
> > > encrypted, and maybe Kafka Streams could defer decryption until the actual
> > > value is inspected. That way joins etc are possible without the encryption
> > > key, and RocksDB would not need to decrypt values before materializing to
> > > disk.
> > >
> > > This is why I've implemented encryption on a per-field basis, not at the
> > > record level, when addressing Kafka security in the past. And I've had to
> > > build external pipelines that purge, re-encrypt, and re-ingest records when
> > > keys are compromised.
> > >
> > > This KIP might be a step in the right direction, not sure. But I'm hesitant
> > > to support the idea of end-to-end encryption without a plan to address the
> > > myriad other problems.
> > >
> > > That said, we need this badly and I hope something shakes out.
> > >
> > > Ryanne
> > >
> > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
> > > <soenke.lie...@opencore.com.invalid> wrote:
> > >
> > > > All,
> > > >
> > > > I've asked for comments on this KIP in the past, but since I didn't really
> > > > get any feedback I've decided to reduce the initial scope of the KIP a bit
> > > > and try again.
> > > >
> > > > I have reworked the KIP to provide a limited but useful set of features for
> > > > this initial KIP and laid out a very rough roadmap of what I'd envision
> > > > this looking like in a final version.
> > > >
> > > > I am aware that the KIP is currently light on implementation details, but
> > > > would like to get some feedback on the general approach before fully
> > > > speccing everything.
> > > >
> > > > The KIP can be found at
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > > >
> > > > I would very much appreciate any feedback!
> > > >
> > > > Best regards,
> > > > Sönke

-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany