Sounds nice!

I'm discussing with a customer how to create a fully anonymized stream for
future analytical purposes.

The remaining question is which anonymization algorithm/strategy maintains
statistical relevance while being resilient against brute force.

Thoughts?
-wim

On Thu, 23 Nov 2017 at 19:03 Scott Reynolds <sreyno...@twilio.com.invalid>
wrote:

> Our legal department's interpretation is that when an account is deleted, any
> data kept longer than K days must be deleted. We set up our un-redacted Kafka
> topics to never retain data longer than K days. This simplifies the problem
> greatly.
>
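> As an illustration only, a minimal sketch of pinning that retention with the
> Java AdminClient; the topic name, sizing, broker address and the K value are
> placeholders:
>
> import java.util.List;
> import java.util.Map;
> import java.util.Properties;
> import java.util.concurrent.TimeUnit;
> import org.apache.kafka.clients.admin.Admin;
> import org.apache.kafka.clients.admin.NewTopic;
>
> public class UnredactedTopicSetup {
>     public static void main(String[] args) throws Exception {
>         long kDays = 30;  // placeholder for the legally agreed K
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
>         try (Admin admin = Admin.create(props)) {
>             // Broker-side retention guarantees the un-redacted data ages out after K days.
>             NewTopic topic = new NewTopic("events-unredacted", 12, (short) 3)
>                     .configs(Map.of(
>                             "retention.ms", String.valueOf(TimeUnit.DAYS.toMillis(kDays)),
>                             "cleanup.policy", "delete"));
>             admin.createTopics(List.of(topic)).all().get();
>         }
>     }
> }
>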
> Our solution is designed to limit the ability of services to see parts of
> the data they do not require to operate. It simplifies the technical
> requirements (no key management, no library implementations in multiple
> languages, etc.), requires little coordination with other teams (they change
> the topic they read from, which is just a string), and fits cleanly within
> the Kafka ecosystem, allowing teams to use new streaming technologies, older
> technologies, etc. without requiring our data infrastructure team to support
> them.
>
> I am really proud of our solution because it doesn't try to boil the ocean.
>
> On Thu, Nov 23, 2017 at 9:31 AM Wim Van Leuven <
> wim.vanleu...@highestpoint.biz> wrote:
>
> > I think the best way to implement this is via envelope encryption: your
> > system manages a key encryption key (KEK), which is used to encrypt a data
> > encryption key (DEK) per user/customer, which in turn is used to encrypt
> > that user's/customer's data.
> >
> > If the user/customer walks away, simply drop the DEK. Their data becomes
> > undecryptable.
> >
> > You do have to implement re-encryption in case KEKs or DEKs become
> > compromised.
> >
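> > As a minimal sketch of those KEK/DEK mechanics with plain JCE (the algorithm
> > choices, class and method names here are illustrative, not a finished design):
> >
> > import java.security.SecureRandom;
> > import javax.crypto.Cipher;
> > import javax.crypto.KeyGenerator;
> > import javax.crypto.SecretKey;
> > import javax.crypto.spec.GCMParameterSpec;
> >
> > public class EnvelopeCryptoSketch {
> >     // Encrypt one customer's payload: the data is encrypted with a fresh DEK,
> >     // and only the KEK-wrapped DEK is stored next to the ciphertext.
> >     static byte[][] encrypt(SecretKey kek, byte[] plaintext) throws Exception {
> >         KeyGenerator kg = KeyGenerator.getInstance("AES");
> >         kg.init(256);
> >         SecretKey dek = kg.generateKey();
> >
> >         byte[] iv = new byte[12];
> >         new SecureRandom().nextBytes(iv);
> >         Cipher data = Cipher.getInstance("AES/GCM/NoPadding");
> >         data.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
> >         byte[] ciphertext = data.doFinal(plaintext);
> >
> >         Cipher wrap = Cipher.getInstance("AESWrap");  // RFC 3394 key wrapping
> >         wrap.init(Cipher.WRAP_MODE, kek);
> >         byte[] wrappedDek = wrap.wrap(dek);
> >
> >         // Persist wrappedDek per customer; deleting it later "forgets" the data.
> >         return new byte[][] { iv, ciphertext, wrappedDek };
> >     }
> >
> >     static byte[] decrypt(SecretKey kek, byte[] iv, byte[] ciphertext,
> >                           byte[] wrappedDek) throws Exception {
> >         Cipher unwrap = Cipher.getInstance("AESWrap");
> >         unwrap.init(Cipher.UNWRAP_MODE, kek);
> >         SecretKey dek = (SecretKey) unwrap.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
> >
> >         Cipher data = Cipher.getInstance("AES/GCM/NoPadding");
> >         data.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, iv));
> >         return data.doFinal(ciphertext);
> >     }
> > }
> >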
> > If you run in the cloud, AWS and Google Cloud offer Key Management
> > Services (KMS) to manage the KEKs, in particular access control and
> > key versioning.
> >
> > Their docs explain such a setup very well.
> >
> >
> > https://cloud.google.com/kms/docs/envelope-encryption
> >
> > HTH
> > -wim
> >
> > On Thu, Nov 23, 2017, 09:55 David Espinosa <espi...@gmail.com> wrote:
> >
> > > Hi Scott and thanks for your reply.
> > > From what you say, I guess that when you are asked to delete some user's
> > > data (that's the "right to be forgotten" in GDPR), what you are really
> > > doing is blocking access to it. I had a similar approach, based on
> > > Greg Young's idea of encrypting any private data and forgetting the key
> > > when the data has to be deleted.
> > > Sadly, after some checks our legal department has concluded that this
> > > approach "blocks" the data but does not delete it, and as a consequence it
> > > could cause us problems. If my guess about your solution is right, you
> > > could have the same problems.
> > >
> > > Thanks
> > >
> > > 2017-11-22 19:59 GMT+01:00 Scott Reynolds <sreyno...@twilio.com.invalid
> > >:
> > >
> > > > We are using Kafka Connect consumers that consume from the raw unredacted
> > > > topic, apply transformations, and produce to a redacted topic. Using Kafka
> > > > Connect allows us to set it all up with an HTTP request and doesn't
> > > > require additional infrastructure.
> > > >
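> > > > To illustrate the transformation side, a rough sketch of a custom Single
> > > > Message Transform; the class and field names are made up for the example:
> > > >
> > > > import java.util.HashMap;
> > > > import java.util.Map;
> > > > import org.apache.kafka.common.config.ConfigDef;
> > > > import org.apache.kafka.connect.connector.ConnectRecord;
> > > > import org.apache.kafka.connect.transforms.Transformation;
> > > >
> > > > // Masks a personal field in schemaless (Map-valued) records.
> > > > public class RedactEmail<R extends ConnectRecord<R>> implements Transformation<R> {
> > > >
> > > >     @Override
> > > >     @SuppressWarnings("unchecked")
> > > >     public R apply(R record) {
> > > >         if (!(record.value() instanceof Map)) {
> > > >             return record;  // pass through anything we don't understand
> > > >         }
> > > >         Map<String, Object> redacted = new HashMap<>((Map<String, Object>) record.value());
> > > >         redacted.computeIfPresent("email", (k, v) -> "<redacted>");  // illustrative field
> > > >         return record.newRecord(record.topic(), record.kafkaPartition(),
> > > >                 record.keySchema(), record.key(),
> > > >                 record.valueSchema(), redacted, record.timestamp());
> > > >     }
> > > >
> > > >     @Override
> > > >     public ConfigDef config() { return new ConfigDef(); }
> > > >
> > > >     @Override
> > > >     public void configure(Map<String, ?> configs) { }
> > > >
> > > >     @Override
> > > >     public void close() { }
> > > > }
> > > >
> > > > A transform like that is attached to a connector through its transforms
> > > > setting, which is part of the JSON posted to the Connect REST API.
> > > >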
> > > > Then we wrote a KafkaPrincipal builder to authenticate each consumer as
> > > > its service name. The principal builder class is specified in the
> > > > server.properties file on the brokers. To provide topic-level access
> > > > control we just configured SimpleAclAuthorizer. The net result is that
> > > > some consumers can only read the redacted topic and very few consumers
> > > > can read the unredacted one.
> > > >
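> > > > For reference, a sketch of what such a principal builder can look like
> > > > (the CN-to-service-name convention is an assumption); it is wired in via
> > > > principal.builder.class in server.properties:
> > > >
> > > > import javax.net.ssl.SSLPeerUnverifiedException;
> > > > import org.apache.kafka.common.security.auth.AuthenticationContext;
> > > > import org.apache.kafka.common.security.auth.KafkaPrincipal;
> > > > import org.apache.kafka.common.security.auth.KafkaPrincipalBuilder;
> > > > import org.apache.kafka.common.security.auth.SslAuthenticationContext;
> > > >
> > > > // Maps the client certificate's CN to a per-service principal, which the
> > > > // ACL authorizer can then grant per-topic Read access.
> > > > public class ServiceNamePrincipalBuilder implements KafkaPrincipalBuilder {
> > > >
> > > >     @Override
> > > >     public KafkaPrincipal build(AuthenticationContext context) {
> > > >         if (context instanceof SslAuthenticationContext) {
> > > >             try {
> > > >                 String dn = ((SslAuthenticationContext) context)
> > > >                         .session().getPeerPrincipal().getName();
> > > >                 // e.g. "CN=billing-service,OU=..." -> "billing-service" (assumed naming)
> > > >                 String serviceName = dn.replaceFirst("^CN=([^,]+).*$", "$1");
> > > >                 return new KafkaPrincipal(KafkaPrincipal.USER_TYPE, serviceName);
> > > >             } catch (SSLPeerUnverifiedException e) {
> > > >                 // no verified client certificate: fall through to anonymous
> > > >             }
> > > >         }
> > > >         return KafkaPrincipal.ANONYMOUS;
> > > >     }
> > > > }
> > > >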
> > > > On Wed, Nov 22, 2017 at 10:47 AM David Espinosa <espi...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > > I would like to double-check with you how we should apply some GDPR
> > > > > requirements to my Kafka topics, in particular the "right to be
> > > > > forgotten", which forces us to delete some data contained in the
> > > > > messages. So not deleting the message, but editing it.
> > > > > To do that, my intention is to replicate the topic and apply a
> > > > > transformation over it, using a framework like Kafka Streams or
> > > > > Apache Storm.
> > > > >
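> > > > > A minimal Kafka Streams sketch of that replicate-and-transform idea
> > > > > (the topic names and the redact() rule are placeholders):
> > > > >
> > > > > import java.util.Properties;
> > > > > import org.apache.kafka.common.serialization.Serdes;
> > > > > import org.apache.kafka.streams.KafkaStreams;
> > > > > import org.apache.kafka.streams.StreamsBuilder;
> > > > > import org.apache.kafka.streams.StreamsConfig;
> > > > > import org.apache.kafka.streams.kstream.KStream;
> > > > >
> > > > > public class RedactionStream {
> > > > >     public static void main(String[] args) {
> > > > >         Properties props = new Properties();
> > > > >         props.put(StreamsConfig.APPLICATION_ID_CONFIG, "gdpr-redaction");
> > > > >         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
> > > > >         props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
> > > > >         props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
> > > > >
> > > > >         StreamsBuilder builder = new StreamsBuilder();
> > > > >         KStream<String, String> source = builder.stream("events");        // original topic
> > > > >         source.mapValues(RedactionStream::redact).to("events-redacted");  // edited copy
> > > > >
> > > > >         new KafkaStreams(builder.build(), props).start();
> > > > >     }
> > > > >
> > > > >     // Placeholder rule: blank out an "email" field in a JSON payload.
> > > > >     private static String redact(String json) {
> > > > >         return json.replaceAll("\"email\"\\s*:\\s*\"[^\"]*\"", "\"email\":\"<redacted>\"");
> > > > >     }
> > > > > }
> > > > >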
> > > > > Has anybody had to solve this problem?
> > > > >
> > > > > Thanks in advance.
> > > > >
> > > > --
> > > >
> > > > Scott Reynolds
> > > > Principal Engineer
> > > > MOBILE (630) 254-2474
> > > > EMAIL sreyno...@twilio.com
> > > >
> > >
> >
> --
>
> Scott Reynolds
> Principal Engineer
> MOBILE (630) 254-2474
> EMAIL sreyno...@twilio.com
>
