Hi Sonke

Thanks for bringing this up for discussion. There are a lot of
considerations even if we assume end-to-end encryption is done. For
example, depending on a company's setup, there could be restrictions on how
and which encryption keys are shared. An environment could have multiple
security and network boundaries beyond which keys are not allowed to be
shared. That would mean consumers may not be able to decrypt the messages
at all if the data is moved from one zone to another. If we have mirroring,
are the mirror-makers supposed to decrypt and re-encrypt, or would they
stay on the bytes-in, bytes-out paradigm they follow today? Also, a
polyglot Kafka client base will force you to support encryption/decryption
libraries that work for all the languages, and that may not be feasible
depending on the scope of the team owning the Kafka infrastructure.

Combining disk encryption with TLS+ACLs could be enough instead of having
end-to-end message level encryption. What is your opinion on that?

We have experimented with end-to-end encryption using custom
serializers/deserializers, and I felt that was good enough, because the
other challenges I mentioned above may not be easy to address with a
generic solution.
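
To make the serializer/deserializer idea concrete, here is a minimal sketch
of the encrypt/decrypt core such a pair could wrap. It is an illustration
only, not the code we ran: the class name EncryptingSerdeSketch is
hypothetical, it assumes a single static AES key handed to the client (no
key lookup, rotation, or envelope encryption), and it uses only the JDK so
it is not wired into Kafka's Serializer/Deserializer interfaces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: the crypto core a custom serializer pair could delegate to. */
final class EncryptingSerdeSketch {
    private static final int IV_BYTES = 12;   // commonly used nonce size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key;

    // Assumption: the raw AES key (16, 24 or 32 bytes) is provided by the caller.
    EncryptingSerdeSketch(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Producer side: encrypt already-serialized bytes, output is IV || ciphertext+tag. */
    byte[] encrypt(byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh random nonce for every record
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(IV_BYTES + ciphertext.length)
                .put(iv).put(ciphertext).array();
    }

    /** Consumer side: split off the IV, then decrypt and verify the tag. */
    byte[] decrypt(byte[] payload) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, payload, 0, IV_BYTES));
        return cipher.doFinal(payload, IV_BYTES, payload.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        byte[] demoKey = new byte[16]; // all-zero key, for demonstration only
        EncryptingSerdeSketch serde = new EncryptingSerdeSketch(demoKey);
        byte[] value = "account=12345".getBytes(StandardCharsets.UTF_8);
        byte[] onTheWire = serde.encrypt(value);
        System.out.println(Arrays.equals(value, serde.decrypt(onTheWire)));
    }
}
```

In a real client the encrypt call would sit at the end of a custom
serializer and the decrypt call at the start of the matching deserializer.
Brokers and mirror-makers only ever see the opaque bytes, which is also why
every consumer in every zone and language needs the key and a compatible
library.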

Thanks
Maulin




On Sat, May 9, 2020 at 2:05 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Adam, I agree, seems reasonable to limit the broker's responsibility to
> encrypting only data at rest. I guess whole segment files could be
> encrypted with the same key, and rotating keys would just involve
> re-encrypting entire segments. Maybe a key rotation would involve closing
> all affected segments and kicking off a background task to re-encrypt them.
> Certainly that would not impede ingestion of new records, and seems
> consumers could use the old segments until they are replaced with the newly
> encrypted ones.
>
> Seems that could still get us per-topic keys (vs encrypting the entire
> volume), which would be my main requirement.
>
> Not really "end-to-end", but combined with TLS or something, seems
> reasonable.
>
> Ryanne
>
> On Sat, May 9, 2020, 11:00 AM Adam Bellemare <adam.bellem...@gmail.com>
> wrote:
>
> > Hi All
> >
> > I typed up a number of replies which I have below, but I have one major
> > overriding question: Is there a reason we aren't implementing
> > encryption-at-rest almost exactly the same way that most relational
> > databases do? ie:
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> >
> > I ask this because it seems like we're going to end up with something
> > similar to what they did in terms of requirements, plus...
> >
> > "For the *past 16 months*, there has been discussion about whether and
> > how to implement Transparent Data Encryption (tde) in Postgres. Many
> > other relational databases support tde, and *some security standards
> > require* it. However, it is also debatable how much security value tde
> > provides. The tde *400-email thread* became difficult for people to
> > follow..."
> >
> > What still isn't clear to me is the scope that we're trying to cover
> > here. Encryption at rest suggests that we need to have the data
> > encrypted on the brokers, and *only* on the brokers, since they're the
> > durable units of storage. Any encryption over the wire should be covered
> > by TLS. I think that our goals for this should be (from
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models
> > )
> >
> > > TDE protects data from theft when file system access controls are
> > > compromised:
> > >
> > >    - Malicious user steals storage devices and reads database files
> > >    directly.
> > >    - Malicious backup operator takes backup.
> > >    - Protecting data at rest (persistent data)
> > >
> > > This does not protect from users who can read system memory, e.g.,
> > > shared buffers, which root users can do.
> > >
> >
> > I am not a security expert nor am I an expert on relational databases.
> > However, I can't identify any reason why the approach outlined by
> > PostgreSQL, which is very similar to MySQL/InnoDB and IBM (from my
> > understanding), wouldn't work for data-at-rest encryption. In addition,
> > we'd get the added benefit of being consistent with other solutions,
> > which is an easier sell when discussing security with management
> > (Kafka? Oh yeah, their encryption solution is just like the one we
> > already have in place for our Postgres solutions), and may let us avoid
> > reinventing a good part of the wheel.
> >
> >
> > ------------------
> >
> > @Ryanne
> > One more complicating factor, regarding joins - the foreign key joiner
> > requires access to the value to extract the foreign key - if it's
> > encrypted, the FKJ would need to decrypt it to apply the value extractor.
> >
> > @Soenk re (1)
> > > When people hear that this is not part of Apache Kafka itself, but
> > > that they would need to develop something themselves, that more often
> > > than not is the end of that discussion. Using something that is not
> > > "stock" is quite often simply not an option.
> >
> > > I strongly feel that this is a needed feature in Kafka and that there
> > > is a large number of people out there that would want to use it - but
> > > I may very well be mistaken, responses to this thread have not exactly
> > > been plentiful this last year and a half..
> >
> > I agree with you on the default vs. non-default points made. We must all
> > note that this mailing list is *not* representative of the typical users
> > of Kafka, and that many organizations are predominantly looking to use
> > out-of-the-box solutions. This will only become more common as hosted
> > Kafka solutions (think AWS hosted Kafka) gain more traction. I think the
> > goal of this KIP to provide that out-of-the-box experience is extremely
> > important, especially for all the reasons noted so far (GDPR, privacy,
> > financials, interest by many parties but no default solution).
> >
> > re: (4)
> > >> Regarding plaintext data in RocksDB instances, I am a bit torn to be
> > >> honest. On the one hand, I feel like this scenario is not something
> > >> that we can fully control.
> >
> > I agree with this in principle. I think that our responsibility to
> > encrypt data at rest ends the moment that data leaves the broker. That
> > being said, it isn't unreasonable. I am going to think more about this
> > and see if I can come up with something.
> >
> >
> >
> >
> >
> > On Fri, May 8, 2020 at 5:05 AM Sönke Liebau
> > <soenke.lie...@opencore.com.invalid> wrote:
> >
> > > Hey everybody,
> > >
> > > thanks a lot for reading and giving feedback!! I'll try and answer all
> > > points that I found going through the thread in this mail, but if I
> miss
> > > something please feel free to let me know! I've added a running number
> to
> > > the discussed topics for ease of reference down the road.
> > >
> > > I'll go through the KIP and update it with everything that I have
> written
> > > below after sending this mail.
> > >
> > > @Tom:
> > > (1) If I understand your concerns correctly you feel that this
> > > functionality would have a hard time getting approved into Apache Kafka
> > > because it can be achieved with custom Serializers in the same way and
> > that
> > > we should maybe develop this outside of Apache Kafka at first.
> > > I feel like it is precisely the fact that this is not part of core
> Apache
> > > Kafka that makes people think twice about doing end-to-end encryption.
> I
> > > may be working in a market (Germany) that is a bit special when
> compared
> > to
> > > the rest of the world where encryption and things like that are
> > concerned,
> > > but I've personally sat in multiple meetings where this feature was
> > > discussed. It is not necessarily the end-to-end encryption itself, but
> > the
> > > at-rest encryption that you get with it.
> > > When people hear that this is not part of Apache Kafka itself, but
> > > that they would need to develop something themselves, that more often
> > > than not is the end of that discussion. Using something that is not
> > > "stock" is quite often simply not an option.
> > > Even if they decide to go forward with it, they'll find Hendrik's blog
> > post
> > > from 4 years ago on this, probably the Whitepapers from Confluent and
> > > Lenses and maybe a few implementations on github - all of which just
> > serve
> > > to further muddy the waters. Not because any of these resources are bad
> > or
> > > wrong, but just because information and implementations are spread out
> > over
> > > a lot of different places. Developing this outside of Apache Kafka
> would
> > > simply serve to add one more item to this list that would not really
> > matter
> > > I'm afraid.
> > >
> > > I strongly feel that this is a needed feature in Kafka and that there
> is
> > a
> > > large number of people out there that would want to use it - but I may
> > very
> > > well be mistaken, responses to this thread have not exactly been
> > plentiful
> > > this last year and a half..
> > >
> > > @Mike:
> > > (2) Regarding the encryption of headers, my current idea is to keep
> this
> > > configurable. I have seen customers use headers for stuff like account
> > > numbers which under the GDPR are considered to be personal data that
> > should
> > > be encrypted wherever possible. So in some instances it might be useful
> > to
> > > encrypt header fields as well.
> > > My current PoC implementation allows specifying a Regex for headers
> that
> > > should be encrypted, which would allow having encrypted and unencrypted
> > > headers in the same record to hopefully suit most use cases.
> > >
> > > (3) Also, my plan is to not change the message format, but to
> > > "encrypt-in-place" and add a header field with the necessary
> information
> > > for decryption, which would then be removed by the decrypting consumer.
> > > There may be some out-of-date intentions still in the KIP, I'll go
> > through
> > > it and update.
> > >
> > > @Ryanne:
> > > First off, I fully agree that we should avoid painting ourselves into a
> > > corner with an early client-only implementation. I scaled down this KIP
> > > from earlier attempts that included things like key rollover and
> > > broker-side implementations because I could not get any feedback from
> > > the community on those for a long time and felt that maybe there was no
> > > appetite for the full-blown solution. So I decided to try with a more
> > > limited scope. I am very happy to discuss/go for the fully featured
> > > version again :)
> > >
> > > (4) Regarding plaintext data in RocksDB instances, I am a bit torn to
> be
> > > honest. On the one hand, I feel like this scenario is not something
> that
> > we
> > > can fully control. Kafka Streams in this case is a client that takes
> data
> > > from Kafka, decrypts it and then puts it somewhere in plaintext. To me
> > this
> > > scenario differs only slightly from for example someone writing a
> backup
> > > job that reads a topic and writes it to a textfile - not much we can do
> > > about it.
> > > That being said, Kafka Streams is part of Apache Kafka, so does merit
> > > special consideration. I'll have to dig into how StateStores are used a
> > > bit (I am not the world's largest expert - or any kind of expert on
> > > that) to try and come up with an idea.
> > >
> > >
> > > (5) On key encryption and hashing, this is definitely an issue that we
> > need
> > > a solution for. I currently have key encryption configurable in my
> > > implementation. When encryption is enabled, an option would of course
> be
> > to
> > > hash the original key and store the key data together with the value in
> > an
> > > encrypted form. Any salt added to the key before hashing could be
> > encrypted
> > > along with the data. This would allow all key-based functionality like
> > > compaction, joins etc. to keep working without having to know the
> > cleartext
> > > key.
> > >
> > > I've also considered deterministic encryption which would keep the
> > > encrypted key the same, but I am fairly certain that we will want to
> > allow
> > > regular key rotation (more on this in next paragraph) without
> > re-encrypting
> > > older data and that would then change the encrypted key and break all
> > these
> > > things.
> > > Regarding re-encrypting existing keys when a crypto key is compromised,
> > > I think we need to be very careful with this if we do it in-place on
> > > the broker. If we add functionality along the lines of compaction,
> > > which reads, re-encrypts and rewrites segment files, we have to make
> > > sure that producers choose partitions on the cleartext value, otherwise
> > > all records starting from the key change may go to a different
> > > partition of the topic..
> > >
> > > (6) Key rollover would be a cool feature to have. I was up until now
> only
> > > thinking about supporting regular key rollover functionality that would
> > > change keys for all records going forward tbh - mostly for complexity
> > > reasons - I think there was actually a sentence in the original KIP to
> > this
> > > regard. But if you and others feel this is needed then I am happy to
> > > discuss this.
> > > If we implement this on the broker we could use topic compaction for
> > > inspiration, read all segment files and check records one by one, if
> the
> > > key used for that record has been "retired/compromised/..." re-encrypt
> > with
> > > new key and write a new segment file. Lots of things to consider around
> > > this regarding performance, how to trigger etc. but in principle this
> > could
> > > work I think.
> > > One issue I can see with this is if we use envelope encryption for the
> > keys
> > > to address the rogue admin issue, so the broker doesn't have access to
> > the
> > > actual key encrypting the data, this would make that operation
> > impossible.
> > >
> > >
> > >
> > > I hope I got to all items that were raised, but may very well have
> > > overlooked something, please let me know if I did - and of course your
> > > thoughts on what I wrote!
> > >
> > > I'll update the KIP today as well.
> > >
> > > Best regards,
> > > Sönke
> > >
> > >
> > >
> > >
> > > On Thu, 7 May 2020 at 19:54, Ryanne Dolan <ryannedo...@gmail.com>
> wrote:
> > >
> > > > Tom, good point, I've done exactly that -- hashing record keys -- but
> > > it's
> > > > unclear to me what should happen when the hash key must be rotated.
> In
> > my
> > > > case the (external) solution involved rainbow tables, versioned keys,
> > and
> > > > custom materializers that were aware of older keys for each record.
> > > >
> > > > In particular I had a pipeline that would re-key records and
> re-ingest
> > > > them, while opportunistically overwriting records materialized with
> the
> > > old
> > > > key.
> > > >
> > > > For a native solution I think maybe we'd need to carry around any old
> > > > versions of each record key, perhaps as metadata. Then brokers and
> > > > materializers can compact records based on _any_ overlapping key,
> > maybe?
> > > > Not sure.
> > > >
> > > > Ryanne
> > > >
> > > > On Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com>
> wrote:
> > > >
> > > > > > Hi Ryanne,
> > > > >
> > > > > You raise some good points there.
> > > > >
> > > > > > > Similarly, if the whole record is encrypted, it becomes
> > > > > > > impossible to do joins, group bys etc, which just need the
> > > > > > > record key and maybe don't have access to the encryption key.
> > > > > > > Maybe only record _values_ should be encrypted, and maybe Kafka
> > > > > > > Streams could defer decryption until the actual value is
> > > > > > > inspected. That way joins etc are possible without the
> > > > > > > encryption key, and RocksDB would not need to decrypt values
> > > > > > > before materializing to disk.
> > > > > >
> > > > >
> > > > > It's getting a bit late here, so maybe I overlooked something, but
> > > > wouldn't
> > > > > the natural thing to do be to make the "encrypted" key a hash of
> the
> > > > > original key, and let the value of the encrypted value be the
> cipher
> > > text
> > > > > of the (original key, original value) pair. A scheme like this
> would
> > > > > preserve equality of the key (strictly speaking there's a chance of
> > > > > collision of course). I guess this could also be a solution for the
> > > > > compacted topic issue Sönke mentioned.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Tom
> > > > >
> > > > >
> > > > >
> > > > > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Thanks Sönke, this is an area in which Kafka is really, really
> far
> > > > > behind.
> > > > > >
> > > > > > I've built secure systems around Kafka as laid out in the KIP.
> One
> > > > issue
> > > > > > that is not addressed in the KIP is re-encryption of records
> after
> > a
> > > > key
> > > > > > rotation. When a key is compromised, it's important that any data
> > > > > encrypted
> > > > > > using that key is immediately destroyed or re-encrypted with a
> new
> > > key.
> > > > > > Ideally first-class support for end-to-end encryption in Kafka
> > would
> > > > make
> > > > > > this possible natively, or else I'm not sure what the point would
> > be.
> > > > It
> > > > > > seems to me that the brokers would need to be involved in this
> > > process,
> > > > > so
> > > > > > perhaps a client-first approach will be painting ourselves into a
> > > > corner.
> > > > > > Not sure.
> > > > > >
> > > > > > Another issue is whether materialized tables, e.g. in Kafka
> > Streams,
> > > > > would
> > > > > > see unencrypted or encrypted records. If we implemented the KIP
> as
> > > > > written,
> > > > > > it would still result in a bunch of plain text data in RocksDB
> > > > > everywhere.
> > > > > > Again, I'm not sure what the point would be. Perhaps using custom
> > > > serdes
> > > > > > would actually be a more holistic approach, since Kafka Streams
> etc
> > > > could
> > > > > > leverage these as well.
> > > > > >
> > > > > > Similarly, if the whole record is encrypted, it becomes
> impossible
> > to
> > > > do
> > > > > > joins, group bys etc, which just need the record key and maybe
> > don't
> > > > have
> > > > > > access to the encryption key. Maybe only record _values_ should
> be
> > > > > > encrypted, and maybe Kafka Streams could defer decryption until
> the
> > > > > actual
> > > > > > value is inspected. That way joins etc are possible without the
> > > > > encryption
> > > > > > key, and RocksDB would not need to decrypt values before
> > > materializing
> > > > to
> > > > > > disk.
> > > > > >
> > > > > > This is why I've implemented encryption on a per-field basis, not
> > at
> > > > the
> > > > > > record level, when addressing kafka security in the past. And
> I've
> > > had
> > > > to
> > > > > > build external pipelines that purge, re-encrypt, and re-ingest
> > > records
> > > > > when
> > > > > > keys are compromised.
> > > > > >
> > > > > > This KIP might be a step in the right direction, not sure. But
> I'm
> > > > > hesitant
> > > > > > to support the idea of end-to-end encryption without a plan to
> > > address
> > > > > the
> > > > > > myriad other problems.
> > > > > >
> > > > > > That said, we need this badly and I hope something shakes out.
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
> > > > > > <soenke.lie...@opencore.com.invalid> wrote:
> > > > > >
> > > > > > > All,
> > > > > > >
> > > > > > > I've asked for comments on this KIP in the past, but since I
> > didn't
> > > > > > really
> > > > > > > get any feedback I've decided to reduce the initial scope of
> the
> > > KIP
> > > > a
> > > > > > bit
> > > > > > > and try again.
> > > > > > >
> > > > > > > > I have reworked the KIP to provide a limited, but useful set of
> > > > > > > > features for this initial KIP and laid out a very rough roadmap
> > > > > > > > of what I'd envision this looking like in a final version.
> > > > > > >
> > > > > > > I am aware that the KIP is currently light on implementation
> > > details,
> > > > > but
> > > > > > > would like to get some feedback on the general approach before
> > > fully
> > > > > > > speccing everything.
> > > > > > >
> > > > > > > > The KIP can be found at
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > > > > > >
> > > > > > >
> > > > > > > I would very much appreciate any feedback!
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Sönke
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sönke Liebau
> > > Partner
> > > Tel. +49 179 7940878
> > > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
> > >
> >
>
