Eno,

For us as well, the requirement is around compacted topics, because they are the topics that already facilitate selective deletes. Currently they allow specifying a minimum lifetime, but lack the ability to specify a maximum lifetime.
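To make that gap concrete, here is a minimal sketch of the cleaning decision being discussed. It is not Kafka's actual log cleaner code, only a simplified model. The names `LogState`, `should_compact` and `max_compaction_lag_ms` are hypothetical, chosen for illustration. The first two thresholds mirror the existing `min.compaction.lag.ms` and `min.cleanable.dirty.ratio` topic configs, while the maximum-lag parameter stands in for the proposed time-based setting, whose final name is not given in this thread.

```python
from dataclasses import dataclass
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class LogState:
    """Simplified view of one compacted partition (illustrative only)."""
    clean_bytes: int          # bytes already compacted
    dirty_bytes: int          # bytes written since the last compaction
    oldest_dirty_age_ms: int  # age of the oldest not-yet-compacted record


def should_compact(log: LogState,
                   min_dirty_ratio: float,
                   min_compaction_lag_ms: int = 0,
                   max_compaction_lag_ms: Optional[int] = None) -> bool:
    """Decide whether this log should be picked up for compaction."""
    if log.dirty_bytes == 0:
        return False  # nothing to clean
    # Minimum lifetime: if even the oldest dirty record is too young,
    # nothing in the dirty section is eligible yet.
    if log.oldest_dirty_age_ms < min_compaction_lag_ms:
        return False
    # Existing behaviour: clean once enough of the log is dirty.
    total = log.clean_bytes + log.dirty_bytes
    if log.dirty_bytes / total >= min_dirty_ratio:
        return True
    # Assumed new behaviour: clean regardless of the dirty ratio once the
    # oldest dirty record has waited longer than the maximum lag.
    if max_compaction_lag_ms is not None:
        return log.oldest_dirty_age_ms >= max_compaction_lag_ms
    return False


# A mostly clean log holding a three-day-old delete marker.
log = LogState(clean_bytes=900, dirty_bytes=100, oldest_dirty_age_ms=3 * DAY_MS)
print(should_compact(log, min_dirty_ratio=0.5))
print(should_compact(log, min_dirty_ratio=0.5, max_compaction_lag_ms=2 * DAY_MS))
```

Without a maximum lag, the only way to force the first case to compact is to lower the dirty ratio, which also makes every other log compact more often.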
For non-compacted topics there's no ability to delete individual messages; they're immutable logs. We treat those with hard rules: set a max retention time on the topic; accept that the topic may get truncated; or don't store information that may be subject to GDPR. (And I've read that others use tricks with encryption and forgetting the decryption key.)

Enhancing compaction to support a max compaction time makes compacted topics more useful, especially in that it allows the dirty ratio to be used for its intended purpose while allowing automatic cleaning based on a new time config.

On Tue, Aug 14, 2018 at 9:00 PM Eno Thereska <eno.there...@gmail.com> wrote:

> Adding to this, what about topics that are not log compacted? As Dong
> says, "one of the GDPR requirement is that we can not keep messages
> longer than e.g. 30 days in storage (e.g. Kafka)". The GDPR requirement
> must hold irrespective of the low level details, on whether the topic
> is compacted or not, right?
>
> Thanks
> Eno
>
> On Mon, Aug 13, 2018 at 6:58 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Xiongqi,
> >
> > Thanks for the KIP. I have two questions regarding the use-case for
> > meeting GDPR requirement.
> >
> > 1) If I recall correctly, one of the GDPR requirement is that we can
> > not keep messages longer than e.g. 30 days in storage (e.g. Kafka).
> > Say there exists a partition p0 which contains message1 with key1 and
> > message2 with key2. And then user keeps producing messages with
> > key=key2 to this partition. Since message1 with key1 is never
> > overridden, sooner or later we will want to delete message1 and keep
> > the latest message with key=key2. But currently it looks like log
> > compact logic in Kafka will always put these messages in the same
> > segment. Will this be an issue?
> >
> > 2) The current KIP intends to provide the capability to delete a
> > given message in log compacted topic. Does such use-case also require
> > Kafka to keep the messages produced before the given message? If yes,
> > then we can probably just use AdminClient.deleteRecords() or
> > time-based log retention to meet the use-case requirement. If no, do
> > you know what is the GDPR's requirement on time-to-deletion after
> > user explicitly requests the deletion (e.g. 1 hour, 1 day, 7 day)?
> >
> > Thanks,
> > Dong
> >
> > On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongq...@gmail.com> wrote:
> >
> > > Hi Eno,
> > >
> > > The GDPR request we are getting here at linkedin is if we get a
> > > request to delete a record through a null key on a log compacted
> > > topic, we want to delete the record via compaction in a given time
> > > period like 2 days (whatever is required by the policy).
> > >
> > > There might be other issues (such as orphan log segments under
> > > certain conditions) that lead to GDPR problem but they are more
> > > like something we need to fix anyway regardless of GDPR.
> > >
> > > -- Xiongqi (Wesley) Wu
> > >
> > > On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Thanks for the KIP. I'd like to see a more precise definition of
> > > > what part of GDPR you are targeting as well as some sort of
> > > > verification that this KIP actually addresses the problem. Right
> > > > now I find this a bit vague:
> > > >
> > > > "Ability to delete a log message through compaction in a timely
> > > > manner has become an important requirement in some use cases
> > > > (e.g., GDPR)"
> > > >
> > > > Is there any guarantee that after this KIP the GDPR problem is
> > > > solved or do we need to do something else as well, e.g., more
> > > > KIPs?
> > > >
> > > > Thanks
> > > > Eno
> > > >
> > > > On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongq...@gmail.com> wrote:
> > > >
> > > > > Hi Kafka,
> > > > >
> > > > > This KIP tries to address GDPR concern to fulfill deletion
> > > > > request on time through time-based log compaction on a
> > > > > compaction enabled topic:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
> > > > >
> > > > > Any feedback will be appreciated.
> > > > >
> > > > > Xiongqi (Wesley) Wu

--
Brett Rann
Senior DevOps Engineer
Zendesk International Ltd
395 Collins Street, Melbourne VIC 3000 Australia
Mobile: +61 (0) 418 826 017