Re: GDPR appliance

2018-01-26 Thread David Espinosa
Thanks a lot. I think that's the only way that ensures GDPR compliance. In a second iteration, my plan is to anonymize instead of removing, maybe identifying PII fields using Avro custom types. Thanks again, 2017-11-28 15:54 GMT+01:00 Ben Stopford: > You should also be able to manage this

Re: GDPR appliance

2017-11-28 Thread Ben Stopford
You should also be able to manage this with a compacted topic. If you give each message a unique key, you'd then be able to delete or overwrite specific records. Kafka will delete them from disk when compaction runs. If you need to partition for ordering purposes you'd need to use a custom partitio
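As a rough illustration of the compacted-topic idea in the message above, here is a minimal pure-Python simulation of compaction semantics. It does not talk to Kafka; the `compact` function and the key format `user-1:evt-1` are hypothetical names chosen for this sketch. It assumes the usual convention that a record with a null value acts as a tombstone for its key. On a real broker, compaction runs asynchronously and tombstones linger for a configurable period before they disappear.

```python
from typing import Dict, List, Optional, Tuple

# One log entry: (key, value). A value of None stands for a tombstone.
Record = Tuple[str, Optional[bytes]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the newest record per key; drop keys whose newest record is a tombstone."""
    latest: Dict[str, Tuple[int, Optional[bytes]]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    # Preserve log order of the surviving records.
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (_, value) in survivors if value is not None]


log: List[Record] = [
    ("user-1:evt-1", b"alice@example.com"),
    ("user-2:evt-1", b"bob@example.com"),
    ("user-2:evt-1", b"<redacted>"),  # overwrite a specific record
    ("user-1:evt-1", None),           # tombstone: delete a specific record
]

compacted = compact(log)
print(compacted)  # [('user-2:evt-1', b'<redacted>')]
```

Because each message has its own key, deleting or overwriting one record leaves every other record untouched.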

Re: GDPR appliance

2017-11-26 Thread Wim Van Leuven
Thanks, Lars, for the most interesting read! On Sun, 26 Nov 2017 at 00:38 Lars Albertsson wrote: > Hi David, > > You might find this presentation useful: > https://www.slideshare.net/lallea/protecting-privacy-in-practice > > It explains privacy building blocks primarily in a batch processing >

Re: GDPR appliance

2017-11-25 Thread Lars Albertsson
Hi David, You might find this presentation useful: https://www.slideshare.net/lallea/protecting-privacy-in-practice It explains privacy building blocks primarily in a batch processing context, but most of the principles are applicable for stream processing as well, e.g. splitting non-PII and PII

Re: GDPR appliance

2017-11-23 Thread Wim Van Leuven
Sounds nice! I'm in discussions with a customer about creating a fully anonymized stream for future analytical purposes. The remaining question: which anonymization algorithm/strategy maintains statistical relevance while being resilient against brute force? Thoughts? -wim On Thu, 23 Nov 2017 at 19:03 Sc

Re: GDPR appliance

2017-11-23 Thread Scott Reynolds
Our legal department's interpretation is that when an account is deleted, any data that is kept longer than K days must be deleted. We set up our unredacted Kafka topics so that their retention is never greater than K days. This simplifies the problem greatly. Our solution is designed to limit the ability of services to see

Re: GDPR appliance

2017-11-23 Thread Wim Van Leuven
I think the best way to implement this is via envelope encryption: your system manages a key encryption key (KEK), which is used to encrypt per-user/customer data encryption keys (DEKs), which in turn are used to encrypt the user's/customer's data. If the user/customer walks away, simply drop the DEK. His da
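The envelope-encryption scheme described in the message above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `EnvelopeStore` class, its method names, and especially the `toy_encrypt`/`toy_decrypt` helpers, which are a stdlib-only stand-in cipher (SHA-256 keystream plus HMAC tag) and not something to use in production. A real system would use a vetted authenticated cipher such as AES-GCM and keep the KEK in a key management service.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce + tag + ciphertext. Stand-in for a real authenticated cipher."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted data")
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class EnvelopeStore:
    """Holds one KEK and, per user, a DEK stored only in wrapped (encrypted) form."""

    def __init__(self) -> None:
        self._kek = secrets.token_bytes(32)
        self._wrapped_deks: Dict[str, bytes] = {}

    def _dek_for(self, user_id: str, create: bool = False) -> bytes:
        if user_id not in self._wrapped_deks:
            if not create:
                raise KeyError(f"no data encryption key for {user_id!r}")
            new_dek = secrets.token_bytes(32)
            self._wrapped_deks[user_id] = toy_encrypt(self._kek, new_dek)
        return toy_decrypt(self._kek, self._wrapped_deks[user_id])

    def encrypt(self, user_id: str, data: bytes) -> bytes:
        return toy_encrypt(self._dek_for(user_id, create=True), data)

    def decrypt(self, user_id: str, blob: bytes) -> bytes:
        return toy_decrypt(self._dek_for(user_id), blob)

    def forget(self, user_id: str) -> None:
        """Drop the user's DEK; ciphertext already written becomes unreadable."""
        self._wrapped_deks.pop(user_id, None)
```

With this shape, the messages in the topic never need to be rewritten: calling `forget("user-42")` removes the only key that can decrypt that user's payloads, and a later `decrypt` call for that user raises `KeyError`.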

Re: GDPR appliance

2017-11-23 Thread David Espinosa
Hi Scott, and thanks for your reply. From what you say, I guess that when you are asked to delete some user's data (that's the "right to be forgotten" in GDPR), what you are really doing is blocking access to it. I had a similar approach, based on the idea of Greg Young's solution of encrypting a

Re: GDPR appliance

2017-11-22 Thread Scott Reynolds
We are using Kafka Connect consumers that consume from the raw unredacted topic, apply transformations, and produce to a redacted topic. Using Kafka Connect allows us to set it all up with an HTTP request and doesn't require additional infrastructure. Then we wrote a KafkaPrincipal builder to au
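The raw-to-redacted flow described in the message above could look roughly like this in miniature. This is a plain-Python stand-in, not the Kafka Connect API and not the poster's actual transformation: the `redact` and `redact_stream` functions, the field names, and the mask value are all hypothetical.

```python
import copy
from typing import Any, Dict, Iterable, Iterator

PII_FIELDS = ("email", "phone")  # hypothetical field names


def redact(record: Dict[str, Any], pii_fields: Iterable[str],
           mask: str = "<redacted>") -> Dict[str, Any]:
    """Return a copy of the record with the listed top-level fields masked."""
    redacted = copy.deepcopy(record)
    for field in pii_fields:
        if field in redacted:
            redacted[field] = mask
    return redacted


def redact_stream(raw_records: Iterable[Dict[str, Any]],
                  pii_fields: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Read from the 'raw' side and yield what would go to the 'redacted' side."""
    fields = tuple(pii_fields)
    for record in raw_records:
        yield redact(record, fields)


raw_topic = [
    {"account_id": 1, "email": "alice@example.com", "plan": "pro"},
    {"account_id": 2, "phone": "555-0100", "plan": "free"},
]
redacted_topic = list(redact_stream(raw_topic, PII_FIELDS))
```

The raw records are left unmodified, which mirrors the idea of keeping an unredacted topic with restricted access alongside a redacted one that most services read.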

GDPR appliance

2017-11-22 Thread David Espinosa
Hi all, I would like to double-check with you how we want to apply GDPR to my Kafka topics. Specifically, the "right to be forgotten", which forces us to delete some data contained in the messages. So not deleting the message, but editing it. To do that, my intention is to replicate the top