Sounds nice! I'm discussing with a customer about creating a fully anonymized stream for future analytical purposes.

Remaining question: an anonymization algorithm/strategy that maintains statistical relevance while being resilient against brute force. Thoughts?

-wim

On Thu, 23 Nov 2017 at 19:03 Scott Reynolds <sreyno...@twilio.com.invalid> wrote:

> Our legal department's interpretation is that when an account is deleted,
> any data kept longer than K days must be deleted. We set up our unredacted
> Kafka topics to never retain data for more than K days. This simplifies
> the problem greatly.
>
> Our solution is designed to limit the ability of services to see parts of
> the data they do not require to operate. It simplifies the technical
> requirements (no key management, no library implementations in multiple
> languages, etc.), requires little coordination with other teams (they just
> change the topic they read from, which is just a string), and fits cleanly
> within the Kafka ecosystem, allowing teams to use new or old streaming
> technologies without requiring our data infrastructure team to support
> them.
>
> I am really proud of our solution because it doesn't try to boil the ocean.
>
> On Thu, Nov 23, 2017 at 9:31 AM Wim Van Leuven <
> wim.vanleu...@highestpoint.biz> wrote:
>
> > I think the best way to implement this is via envelope encryption: your
> > system manages a key encryption key (KEK), which is used to encrypt data
> > encryption keys (DEKs) per user/customer, which in turn are used to
> > encrypt the user's/customer's data.
> >
> > If the user/customer walks away, simply drop the DEK. Their data becomes
> > undecryptable.
> >
> > You do have to implement re-encryption in case KEKs or DEKs become
> > compromised.
> >
> > If you run in the cloud, AWS and GCloud offer Key Management Services
> > (KMS) to manage the KEKs, especially access control and versioning.
> >
> > Their docs explain such a setup very well.
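The KEK/DEK scheme described above can be sketched roughly as follows. This is a minimal, self-contained Python illustration; the class and function names are hypothetical, and the toy HMAC-based XOR stream cipher only stands in for a real AEAD cipher (e.g. AES-GCM from a proper crypto library or a cloud KMS):

```python
import hashlib
import hmac
import secrets

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy XOR stream cipher keyed via HMAC-SHA256. Illustration only:
    # in production, use an AEAD cipher (e.g. AES-GCM) from a real library.
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = offset // 32
        ks = hmac.new(key, nonce + block.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], ks))
    return bytes(out)

class EnvelopeStore:
    def __init__(self):
        self.kek = secrets.token_bytes(32)  # key encryption key; normally lives in a KMS
        self.wrapped_deks = {}              # user_id -> (nonce, DEK encrypted under the KEK)

    def _dek(self, user_id, create=True):
        if user_id not in self.wrapped_deks:
            if not create:
                raise KeyError(f"no DEK for {user_id!r} (user forgotten?)")
            nonce = secrets.token_bytes(16)
            dek = secrets.token_bytes(32)
            self.wrapped_deks[user_id] = (nonce, keystream_xor(self.kek, nonce, dek))
        nonce, wrapped = self.wrapped_deks[user_id]
        return keystream_xor(self.kek, nonce, wrapped)  # unwrap the DEK with the KEK

    def encrypt(self, user_id, plaintext: bytes):
        nonce = secrets.token_bytes(16)
        return nonce, keystream_xor(self._dek(user_id), nonce, plaintext)

    def decrypt(self, user_id, record) -> bytes:
        nonce, ciphertext = record
        return keystream_xor(self._dek(user_id, create=False), nonce, ciphertext)

    def forget_user(self, user_id):
        # "Right to be forgotten": dropping the DEK leaves every record
        # encrypted under it permanently undecryptable.
        self.wrapped_deks.pop(user_id, None)
```

Note that only the small per-user DEK table needs deleting; the Kafka topics themselves stay immutable, which is what makes the approach attractive (and also why a legal team may classify it as "blocking" rather than deleting, as discussed below).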
> >
> > https://cloud.google.com/kms/docs/envelope-encryption
> >
> > HTH
> > -wim
> >
> > On Thu, Nov 23, 2017, 09:55 David Espinosa <espi...@gmail.com> wrote:
> >
> > > Hi Scott, and thanks for your reply.
> > > From what you say, I guess that when you are asked to delete some user
> > > data (the "right to be forgotten" in GDPR), what you are really doing
> > > is blocking access to it. I had a similar approach, based on Greg
> > > Young's idea of encrypting any private data and forgetting the key
> > > when the data has to be deleted.
> > > Sadly, after some checking our legal department has concluded that
> > > this approach "blocks" the data but does not delete it, and as a
> > > consequence it could cause us problems. If my guess about your
> > > solution is right, you could have the same problems.
> > >
> > > Thanks
> > >
> > > 2017-11-22 19:59 GMT+01:00 Scott Reynolds <sreyno...@twilio.com.invalid>:
> > >
> > > > We are using Kafka Connect consumers that consume from the raw
> > > > unredacted topic, apply transformations, and produce to a redacted
> > > > topic. Using Kafka Connect allows us to set it all up with an HTTP
> > > > request and doesn't require additional infrastructure.
> > > >
> > > > Then we wrote a KafkaPrincipal builder to authenticate each consumer
> > > > to their service name. The KafkaPrincipal class is specified in the
> > > > server.properties file on the brokers.
> > > > To provide topic-level access control, we just configured
> > > > SimpleAclAuthorizer. The net result is that some consumers can only
> > > > read the redacted topic, and very few consumers can read the
> > > > unredacted one.
> > > >
> > > > On Wed, Nov 22, 2017 at 10:47 AM David Espinosa <espi...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > > I would like to double-check with you how to apply some GDPR
> > > > > requirements to my Kafka topics, in concrete the "right to be
> > > > > forgotten", which forces us to delete some data contained in the
> > > > > messages. So not deleting the message, but editing it.
> > > > > To do that, my intention is to replicate the topic and apply a
> > > > > transformation over it, using a framework like Kafka Streams or
> > > > > Apache Storm.
> > > > >
> > > > > Did anybody have to solve this problem?
> > > > >
> > > > > Thanks in advance.
> > > > >
> > > > --
> > > > Scott Reynolds
> > > > Principal Engineer, Twilio
> > > > MOBILE (630) 254-2474
> > > > EMAIL sreyno...@twilio.com
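Both David's plan (replicate the topic and apply a transformation) and Scott's Connect-based pipeline reduce to one per-message transform. A minimal sketch of that step in plain Python, with hypothetical field names and placeholder; in practice this function body would live in a Kafka Streams mapValues() or a Kafka Connect Single Message Transform:

```python
# Hypothetical PII field names; real topics would use a schema (Avro, JSON Schema, ...).
PII_FIELDS = {"email", "phone", "full_name"}

def redact(record: dict) -> dict:
    # The per-message transformation applied while copying the unredacted
    # topic to the redacted topic: mask PII fields, pass everything else through.
    return {k: "<REDACTED>" if k in PII_FIELDS else v for k, v in record.items()}

# Simulated pipeline: unredacted topic -> transform -> redacted topic
unredacted_topic = [
    {"order_id": 1, "email": "alice@example.com", "amount": 42},
    {"order_id": 2, "phone": "+32 470 12 34 56", "amount": 7},
]
redacted_topic = [redact(msg) for msg in unredacted_topic]
```

Combined with a short retention (K days) on the unredacted topic and ACLs restricting who may read it, the redacted stream is the only long-lived copy, which is what sidesteps per-record deletion.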