Might also be worth moving to a vote thread? Discussion seems to have gone as far as it can.
On 4 Sep 2018, at 12:08, xiongqi wu <xiongq...@gmail.com> wrote:

Brett,

Yes, I will post a PR tomorrow.

Xiongqi (Wesley) Wu

On Sun, Sep 2, 2018 at 6:28 PM Brett Rann <br...@zendesk.com.invalid> wrote:

+1 (non-binding) from me on the interface. I'd like to see someone familiar with the code comment on the approach. Note that there are a couple of different approaches: what's documented in the KIP, and what Xiaohe Dong was working on here:

https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0

If you have code working already, Xiongqi Wu, could you share a PR? I'd be happy to start testing.

On Tue, Aug 28, 2018 at 5:57 AM xiongqi wu <xiongq...@gmail.com> wrote:

Hi All,

Do you have any additional comments on this KIP?

On Thu, Aug 16, 2018 at 9:17 PM, xiongqi wu <xiongq...@gmail.com> wrote:

On 2):
The offset map is built starting from the dirty segments.
The compaction starts from the beginning of the log partition. That's how it ensures the deletion of tombstoned keys.
I will double-check tomorrow.

Xiongqi (Wesley) Wu

On Thu, Aug 16, 2018 at 6:46 PM Brett Rann <br...@zendesk.com.invalid> wrote:

To clarify point 1 a bit: whether there's an external storage/DB isn't relevant here. Compacted topics allow a tombstone record to be sent (a null value for a key). Currently this results in old values for that key being deleted if some conditions are met. There are existing controls to make sure the old values stay around for at least a minimum time. There is no dedicated control to ensure the tombstone deletes them within a maximum time.
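A toy Python model makes the tombstone mechanics concrete. It is a sketch under simplifying assumptions, not Kafka's log cleaner:

- `Record` and `compact` are hypothetical names.
- `min_lag_ms` stands in for `min.compaction.lag.ms`.
- Segments and `delete.retention.ms` are ignored.

The model shows that an old value disappears only when a compaction pass actually runs after the tombstone has aged past the minimum lag. Nothing in the model bounds when that pass happens.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    offset: int
    key: str
    value: Optional[str]  # None models a tombstone
    timestamp_ms: int


def compact(log: list[Record], now_ms: int, min_lag_ms: int) -> list[Record]:
    """Toy key-based compaction pass (hypothetical, not Kafka's LogCleaner).

    Records newer than min_lag_ms are left untouched. Among older records only the
    latest one per key survives, and a key whose latest older record is a
    tombstone is dropped entirely (delete.retention.ms is ignored for brevity).
    """
    cutoff = now_ms - min_lag_ms
    eligible = [r for r in log if r.timestamp_ms <= cutoff]
    too_recent = [r for r in log if r.timestamp_ms > cutoff]

    latest: dict[str, Record] = {}
    for r in eligible:  # later offsets overwrite earlier ones for the same key
        latest[r.key] = r
    survivors = [r for r in latest.values() if r.value is not None]
    return sorted(survivors + too_recent, key=lambda r: r.offset)


log = [
    Record(0, "user-1", "email=a@example.com", 1_000),
    Record(1, "user-2", "x", 2_000),
    Record(2, "user-1", None, 3_000),  # tombstone for user-1
]
print([r.offset for r in compact(log, now_ms=4_000, min_lag_ms=2_000)])   # [0, 1, 2]
print([r.offset for r in compact(log, now_ms=10_000, min_lag_ms=5_000)])  # [1]
```

The first pass runs too early, so the PII value at offset 0 is still present. The second runs late enough, and only `user-2` survives.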
One popular reason a maximum time for deletion is desirable right now is GDPR with PII. But we're not proposing any GDPR awareness in Kafka. We just want the ability to guarantee a maximum time within which a tombstoned key will be removed from the compacted topic.

On 2): huh, I thought it kept track of the first dirty segment and didn't recompact older "clean" ones. But I didn't look at the code or test for that.

On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu <xiongq...@gmail.com> wrote:

1) The owner of the data should keep track of the data's lifecycle in some external storage/DB. (In this sense, Kafka is not the owner of the data.) The owner determines when to delete the data and sends the delete request to Kafka. Kafka doesn't know about the content of the data; it only provides a means for deletion.

2) Each time compaction runs, it starts from the first segment, whether or not that segment has been compacted. The time estimation here is only used to decide whether we should run compaction on this log partition. So we only need to estimate it for uncompacted segments.

On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Xiongqi,

Thanks for the update. I have two questions about the latest KIP.

1) The motivation section says that one use case is to delete PII (Personally Identifiable Information) within 7 days while keeping non-PII indefinitely in compacted format. I suppose the use case depends on the application to determine when to delete that PII.
Could you explain how the application can reliably determine the set of keys that should be deleted? Is the application required to always read messages from the topic after every restart and determine the keys to be deleted by looking at message timestamps? Or is the application supposed to persist the key->timestamp information in a separate persistent storage system?

2) It is mentioned in the KIP that "we only need to estimate earliest message timestamp for un-compacted log segments because the deletion requests that belong to compacted segments have already been processed". I'm not sure that is correct. Suppose a segment is compacted before the user sends a message to delete a key in this segment. It seems that we still need to ensure that the segment will be compacted again within the given time after the deletion is requested, right?

Thanks,
Dong

On Thu, Aug 16, 2018 at 10:27 AM, xiongqi wu <xiongq...@gmail.com> wrote:

Hi Xiaohe,

Quick notes:

1) Use the minimum of segment.ms and max.compaction.lag.ms.

2) I am not sure I understand your second question. First, we have jitter when we roll the active segment. Second, on each compaction, we compact as far as the offset map allows. Those will not lead to a perfect compaction storm over time.
In addition, I expect we will set max.compaction.lag.ms on the order of days.

3) I don't have access to the Confluent community Slack for now. I am reachable via Google Hangouts.
To avoid double effort, here is my plan:
a) Collect more feedback and feature requirements on the KIP.
b) Wait until this KIP is approved.
c) Address any additional requirements in the implementation. (My current implementation only complies with what is described in the KIP now.)
d) Share the code with you and the community to see if you want to add anything.
e) Submit through the committers.

On Wed, Aug 15, 2018 at 11:42 PM, XIAOHE DONG <dannyriv...@gmail.com> wrote:

Hi Xiongqi,

Thanks for thinking about implementing this as well. :)

I was thinking about using `segment.ms` to trigger the segment roll. Its value is then also the largest time bias for record deletion. For example, if `segment.ms` is 1 day and `max.compaction.lag.ms` is 30 days, the compaction may happen after around 31 days.

Out of curiosity, is there a way we can do some performance testing for this? Are there any tools you can recommend? As you know, logs were previously cleaned up according to the dirty ratio. Now cleaning may happen at any time once the max lag has passed for a message.
I wonder what would happen if clients send a huge number of tombstone records at the same time.

I am looking forward to a quick chat with you to avoid double effort on this. I am on the Confluent community Slack during work hours. My name there is Xiaohe Dong. :)

Rgds
Xiaohe Dong

On 2018/08/16 01:22:22, xiongqi wu <xiongq...@gmail.com> wrote:

Brett,

Thank you for your comments.
We already have an immediate-compaction setting (setting the min dirty ratio to 0), so I decided to use "0" as the disabled state.
I am OK to go with the -1 (disabled), 0 (immediate) options.

For the implementation, there are a few differences between mine and Xiaohe Dong's:
1) I used the estimated creation time of a log segment to determine compaction eligibility, instead of the largest timestamp of a log segment. This is because a log segment might stay as the active segment for up to "max compaction lag" (see the KIP for details).
2) I measure how many bytes we must clean to follow the "max compaction lag" rule, and use that to determine the order of compaction.
3) I force the active segment to roll to follow the "max compaction lag".

I can share my code so we can coordinate.
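Points 1 to 3 can be sketched as follows, under stated assumptions:

- A segment's estimated creation time is taken as a given input. The KIP describes how it is estimated.
- "Overdue" means a dirty, non-active segment created more than the max lag ago.
- Partitions are cleaned in descending order of overdue bytes.

`Segment`, `Partition`, `must_clean_bytes`, `should_force_roll` and `cleaning_order` are hypothetical names. This is one reading of the approach, not the code in either branch.

```python
from __future__ import annotations

from dataclasses import dataclass

DAY_MS = 86_400_000


@dataclass
class Segment:
    created_ms: int   # estimated creation time (estimation itself is out of scope here)
    size_bytes: int
    dirty: bool       # True if the segment has not been compacted yet


@dataclass
class Partition:
    name: str
    segments: list[Segment]  # oldest first; the last entry is the active segment


def must_clean_bytes(p: Partition, now_ms: int, max_lag_ms: int) -> int:
    """Bytes in dirty, non-active segments that have already exceeded the max lag."""
    deadline = now_ms - max_lag_ms
    return sum(
        s.size_bytes for s in p.segments[:-1] if s.dirty and s.created_ms <= deadline
    )


def should_force_roll(p: Partition, now_ms: int, max_lag_ms: int) -> bool:
    """Roll the active segment once it has been open for max_lag_ms."""
    return now_ms - p.segments[-1].created_ms >= max_lag_ms


def cleaning_order(parts: list[Partition], now_ms: int, max_lag_ms: int) -> list[str]:
    """Names of partitions with overdue data, most overdue bytes first."""
    scored = [(must_clean_bytes(p, now_ms, max_lag_ms), p.name) for p in parts]
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    return [name for overdue, name in ranked if overdue > 0]


a = Partition("a", [Segment(0, 100, True), Segment(5 * DAY_MS, 50, True)])
b = Partition("b", [Segment(0, 300, True), Segment(1 * DAY_MS, 200, False),
                    Segment(9 * DAY_MS, 10, True)])
c = Partition("c", [Segment(8 * DAY_MS, 500, True), Segment(9 * DAY_MS, 1, True)])
print(cleaning_order([a, b, c], now_ms=10 * DAY_MS, max_lag_ms=7 * DAY_MS))  # ['b', 'a']
print(should_force_roll(a, now_ms=10 * DAY_MS, max_lag_ms=7 * DAY_MS))       # False
```

Creation time matters most for the active segment. A segment that stays open for days keeps advancing its largest timestamp, so it would presumably never look old enough. Its creation time stays fixed.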
I haven't thought about a new API to force a compaction. What is the use case for this one?

On Wed, Aug 15, 2018 at 5:33 PM, Brett Rann <br...@zendesk.com.invalid> wrote:

We've been looking into this too.

Mailing list: https://lists.apache.org/thread.html/ed7f6a6589f94e8c2a705553f364ef599cb6915e4c3ba9b561e610e4@%3Cdev.kafka.apache.org%3E
Jira wish: https://issues.apache.org/jira/browse/KAFKA-7137
Confluent Slack discussion: https://confluentcommunity.slack.com/archives/C49R61XMM/p1530760121000039

A person on my team has started on code, so you might want to coordinate:
https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0

He's been working with Jason Gustafson and James Chen on the changes. You can ping him on Confluent Slack as Xiaohe Dong.

It's great to know others are thinking about it as well.

You've added the requirement to force a segment roll, which is great; we hadn't gotten to that yet. I was content with it not including the active segment.

> Adding topic level configuration "max.compaction.lag.ms", and corresponding broker configuration "log.cleaner.max.compaction.lag.ms", which is set to 0 (disabled) by default.

Glancing at some other settings, the convention seems to me to be -1 for disabled (or infinite, which is more meaningful here). 0 to me implies instant, a little quicker than 1.

We've been trying to think about a way to trigger compaction through an API call as well. It would need to be flagged somewhere (ZK admin/ space?), but we're struggling to think how that would be coordinated across brokers and partitions. Have you given any thought to that?
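Brett's suggested `-1`/`0` convention could be validated roughly as below. The sketch also applies the rule adopted later in the thread: `max.compaction.lag.ms` must not be less than `min.compaction.lag.ms`. `parse_max_compaction_lag` and `DISABLED` are hypothetical names. This reflects the proposal under discussion, not necessarily the semantics Kafka eventually shipped.

```python
from __future__ import annotations

from typing import Optional

DISABLED = -1  # sentinel for "disabled (or infinite)", per the suggested convention


def parse_max_compaction_lag(max_lag_ms: int, min_lag_ms: int) -> Optional[int]:
    """Interpret a max.compaction.lag.ms value (hypothetical helper).

    -1  -> None: no upper bound on how long a record may stay uncompacted
     0  -> 0: compact as soon as possible
    > 0 -> the bound in ms, which must not be below min.compaction.lag.ms
    """
    if max_lag_ms == DISABLED:
        return None
    if max_lag_ms < 0:
        raise ValueError(f"invalid max.compaction.lag.ms: {max_lag_ms}")
    if max_lag_ms < min_lag_ms:
        raise ValueError("max.compaction.lag.ms must not be less than min.compaction.lag.ms")
    return max_lag_ms


print(parse_max_compaction_lag(-1, min_lag_ms=0))                      # None
print(parse_max_compaction_lag(0, min_lag_ms=0))                       # 0
print(parse_max_compaction_lag(7 * 86_400_000, min_lag_ms=3_600_000))  # 604800000
```

Combining the two rules has one consequence: `0` (immediate) only passes validation when `min.compaction.lag.ms` is also `0`.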
On Thu, Aug 16, 2018 at 8:44 AM xiongqi wu <xiongq...@gmail.com> wrote:

Eno, Dong,

I have updated the KIP. We decided not to address the issue we might have for topics with both compaction and time retention enabled (see item 2 of the rejected alternatives). This KIP will only ensure that a log can be compacted after a specified time interval.

As suggested by Dong, we will also enforce that "max.compaction.lag.ms" is not less than "min.compaction.lag.ms".

https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 Time-based log compaction policy

On Tue, Aug 14, 2018 at 5:01 PM, xiongqi wu <xiongq...@gmail.com> wrote:
Per discussion with Dong: he made a very good point that if compaction and time-based retention are both enabled on a topic, compaction might prevent records from being deleted on time. When multiple segments are compacted into one single segment, the newly created segment has the same lastModified timestamp as the latest original segment. We lose the timestamps of all original segments except the last one. As a result, records might not be deleted through time-based retention when they should be.

With the current KIP proposal, if we want to ensure timely deletion, we have the following configurations:
1) Enable time-based log compaction only: deletion is done through overriding the same key.
2) Enable time-based log retention only: deletion is done through time-based retention.
3) Enable both log compaction and time-based retention: deletion is not guaranteed.

I'm not sure whether we have use case 3 and also want deletion to happen on time.
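The lost-timestamp problem can be reproduced with a small model. The assumptions are all hypothetical:

- Time-based retention deletes a whole segment once its lastModified time is older than the retention window. This is the clock the message above describes.
- Compaction merges segments by concatenating their records and keeping the newest lastModified.

`Segment`, `merge` and `apply_time_retention` are illustrative names, not Kafka classes.

```python
from __future__ import annotations

from dataclasses import dataclass

DAY_MS = 86_400_000


@dataclass
class Segment:
    last_modified_ms: int
    record_timestamps: list[int]


def merge(segments: list[Segment]) -> Segment:
    """Group several segments into one, as compaction may do.

    Only the newest lastModified time survives; the older segments' times are lost.
    """
    return Segment(
        last_modified_ms=max(s.last_modified_ms for s in segments),
        record_timestamps=[t for s in segments for t in s.record_timestamps],
    )


def apply_time_retention(segs: list[Segment], now_ms: int, retention_ms: int) -> list[Segment]:
    """Keep only segments whose lastModified time is inside the retention window."""
    return [s for s in segs if now_ms - s.last_modified_ms <= retention_ms]


old = Segment(1 * DAY_MS, [1 * DAY_MS])
new = Segment(9 * DAY_MS, [9 * DAY_MS])
now, retention = 10 * DAY_MS, 7 * DAY_MS

separate = apply_time_retention([old, new], now, retention)
merged = apply_time_retention([merge([old, new])], now, retention)
print(len(separate), len(merged))                      # 1 1
print(min(merged[0].record_timestamps) == 1 * DAY_MS)  # True: a 9-day-old record outlives 7-day retention
```

Kept separate, the old segment expires on schedule. Once merged, its 9-day-old record rides along with the newer segment's timestamp. Option A below avoids this by looking at record timestamps instead of segment times.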
There are several options to address the deletion issue when both compaction and retention are enabled:
A) During log compaction, look at record timestamps to delete expired records. This can be done in the compaction logic itself or by using AdminClient.deleteRecords(). But this assumes we have record timestamps.
B) Retain the lastModified time of the original segments during log compaction. This requires extra metadata to record the information, or not grouping multiple segments into one during compaction.

If we have use case 3 in general, I would prefer solution A and relying on record timestamps.

Two questions:
Do we have use case 3? Is it nice-to-have or must-have?
If we have use case 3 and want to go with solution A, should we introduce a new configuration to enforce deletion by timestamp?

On Tue, Aug 14, 2018 at 1:52 PM, xiongqi wu <xiongq...@gmail.com> wrote:

Dong,

Thanks for the comment.

There are two retention policies: log compaction and time-based retention.
Log compaction:

We have use cases that keep infinite retention on a topic (compaction only). GDPR cares about deletion of PII (personally identifiable information).
Since Kafka doesn't know which records contain PII, it relies on the upper layer to delete those records.
For those infinite-retention use cases, Kafka needs to provide a way to enforce compaction on time. This is what we try to address in this KIP.

Time-based retention:

There are also use cases where users of Kafka might want to expire all their data.
In those cases, they can use time-based retention on their topics.

Regarding your first question: if a user wants to delete a key in a log-compacted topic, the user has to send a deletion using the same key.
Kafka only makes sure the deletion will happen within a certain time period (like 2 days/7 days).

Regarding your second question:
In most cases, we might want to delete all duplicated keys at the same time.
Compaction might be more efficient, since we need to scan the log and find all duplicates. However, the expected use case is to set the time-based compaction interval on the order of days, larger than "min compaction lag". We don't want log compaction to happen frequently, since it is expensive. The purpose is to help low-production-rate topics get compacted on time. For topics with a "normal" incoming message rate, the "min dirty ratio" might have triggered compaction before this time-based compaction policy takes effect.

Eno,

For your question: as I mentioned, we have a long-retention use case for log-compacted topics, but we want to provide the ability to delete certain PII records on time.
Kafka itself doesn't know whether a record contains sensitive information and relies on the user for deletion.
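The triggering logic described here might look like the sketch below. The dirty ratio still drives compaction for busy topics, and the time-based rule catches low-rate topics. `needs_compaction` is a hypothetical helper. `min_dirty_ratio` plays the role of the existing `min.cleanable.dirty.ratio` topic config, and `max_lag_ms` the proposed `max.compaction.lag.ms` (`None` meaning disabled).

```python
from __future__ import annotations

from typing import Optional

DAY_MS = 86_400_000


def needs_compaction(
    dirty_bytes: int,
    total_bytes: int,
    earliest_dirty_ts_ms: Optional[int],
    now_ms: int,
    min_dirty_ratio: float,
    max_lag_ms: Optional[int],
) -> bool:
    """Toy trigger: the existing dirty-ratio rule OR the proposed max-lag rule."""
    if total_bytes == 0 or dirty_bytes == 0:
        return False
    if dirty_bytes / total_bytes >= min_dirty_ratio:
        return True  # "normal"-rate topics usually trip this first
    if max_lag_ms is None or earliest_dirty_ts_ms is None:
        return False  # time-based rule disabled, or no timestamp estimate available
    return now_ms - earliest_dirty_ts_ms >= max_lag_ms  # catches low-rate topics


now = 10 * DAY_MS
print(needs_compaction(600, 1_000, now - 3_600_000, now, 0.5, 7 * DAY_MS))  # True (ratio 0.6)
print(needs_compaction(10, 1_000, 9 * DAY_MS, now, 0.5, 7 * DAY_MS))        # False (1 day old)
print(needs_compaction(10, 1_000, 2 * DAY_MS, now, 0.5, 7 * DAY_MS))        # True (8 days old)
print(needs_compaction(10, 1_000, 2 * DAY_MS, now, 0.5, None))              # False (disabled)
```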
On Mon, Aug 13, 2018 at 6:58 PM, Dong Lin <lindon...@gmail.com> wrote:

Hey Xiongqi,

Thanks for the KIP. I have two questions regarding the use case for meeting the GDPR requirement.

1) If I recall correctly, one of the GDPR requirements is that we cannot keep messages in storage (e.g. Kafka) longer than, e.g., 30 days. Say there exists a partition p0 which contains message1 with key1 and message2 with key2. Then the user keeps producing messages with key=key2 to this partition. Since message1 with key1 is never overridden, sooner or later we will want to delete message1 and keep the latest message with key=key2. But currently it looks like the log compaction logic in Kafka will always put these messages in the same segment. Will this be an issue?

2) The current KIP intends to provide the capability to delete a given message in a log-compacted topic.
Does such a use case also require Kafka to keep the messages produced before the given message? If yes, then we can probably just use AdminClient.deleteRecords() or time-based log retention to meet the use-case requirement. If no, do you know what GDPR's requirement is on time-to-deletion after a user explicitly requests the deletion (e.g. 1 hour, 1 day, 7 days)?

Thanks,
Dong

On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongq...@gmail.com> wrote:

Hi Eno,

Here is the GDPR request we are getting at LinkedIn. Suppose we get a request to delete a record on a log-compacted topic, made by sending a null value (a tombstone) for its key. We then want to delete the record via compaction within a given time period, like 2 days (whatever is required by the policy).
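Whether a fixed window such as "2 days" can be met depends on how segment rolling interacts with the max lag. Xiaohe and Xiongqi discuss this earlier in the thread: a 1-day `segment.ms` plus a 30-day max lag gives roughly 31 days, versus rolling at the minimum of the two.

Here is a back-of-the-envelope sketch under one reading of those two approaches. It ignores cleaner scheduling delays, and both function names are hypothetical.

```python
DAY_MS = 86_400_000


def worst_case_delay_ms(max_lag_ms: int, segment_ms: int, force_roll: bool) -> int:
    """Rough upper bound from a record's arrival to its compaction (hypothetical).

    force_roll=False: the active segment rolls only on segment.ms and the max lag
    is counted from the segment's newest record, so the two add up
    (Xiaohe's 1 + 30 = 31 days).
    force_roll=True: the active segment is rolled after min(segment.ms, max lag)
    and eligibility is measured from the segment's estimated creation time, so
    the record becomes eligible within max_lag_ms.
    """
    return max_lag_ms if force_roll else segment_ms + max_lag_ms


def largest_max_lag_for(policy_ms: int, segment_ms: int, force_roll: bool) -> int:
    """Largest max.compaction.lag.ms whose worst case still fits a deletion policy."""
    lag = policy_ms if force_roll else policy_ms - segment_ms
    if lag <= 0:
        raise ValueError("segment.ms alone exceeds the deletion policy")
    return lag


print(worst_case_delay_ms(30 * DAY_MS, 1 * DAY_MS, force_roll=False) // DAY_MS)  # 31
print(worst_case_delay_ms(30 * DAY_MS, 1 * DAY_MS, force_roll=True) // DAY_MS)   # 30
print(largest_max_lag_for(2 * DAY_MS, 1 * DAY_MS, force_roll=False) // DAY_MS)   # 1
```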
There might be other issues (such as orphan log segments under certain conditions) that lead to GDPR problems. But those are more like things we need to fix anyway, regardless of GDPR.

-- Xiongqi (Wesley) Wu

On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.there...@gmail.com> wrote:

Hello,

Thanks for the KIP. I'd like to see a more precise definition of what part of GDPR you are targeting. I'd also like some sort of verification that this KIP actually addresses the problem.
Right now I find this a bit vague:

"Ability to delete a log message through compaction in a timely manner has become an important requirement in some use cases (e.g., GDPR)"

Is there any guarantee that after this KIP the GDPR problem is solved? Or do we need to do something else as well, e.g., more KIPs?

Thanks
Eno

On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongq...@gmail.com> wrote:

Hi Kafka,

This KIP tries to address the GDPR concern of fulfilling deletion requests on time. It does so through time-based log compaction on a compaction-enabled topic:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy

Any feedback will be appreciated.

Xiongqi (Wesley) Wu

-- 
Brett Rann
Senior DevOps Engineer
Zendesk International Ltd
395 Collins Street, Melbourne VIC 3000 Australia
Mobile: +61 (0) 418 826 017