+1 (binding) On Wed, Oct 5, 2016 at 1:55 PM, Bill Warshaw <wdwars...@gmail.com> wrote: > Bumping for visibility. KIP is here: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy > > On Wed, Aug 24, 2016 at 2:32 PM Bill Warshaw <wdwars...@gmail.com> wrote: > >> Hello Guozhang, >> >> KIP-71 seems unrelated to this KIP. KIP-47 is just adding a new deletion >> policy (minimum timestamp), while KIP-71 is allowing deletion and >> compaction to coexist. >> >> They both will touch LogManager, but the change for KIP-47 is very >> isolated. >> >> On Wed, Aug 24, 2016 at 2:21 PM Guozhang Wang <wangg...@gmail.com> wrote: >> >> Hi Bill, >> >> I would like to reason if there is any correlation between this KIP and >> KIP-71 >> >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist >> >> I feel they are orthogonal but would like to double check with you. >> >> >> Guozhang >> >> >> On Wed, Aug 24, 2016 at 11:05 AM, Bill Warshaw <wdwars...@gmail.com> >> wrote: >> >> > I'd like to re-awaken this voting thread now that KIP-33 has merged. >> This >> > KIP is now completely unblocked. I have a working branch off of trunk >> with >> > my proposed fix, including testing. >> > >> > On Mon, May 9, 2016 at 8:30 PM Guozhang Wang <wangg...@gmail.com> wrote: >> > >> > > Jay, Bill: >> > > >> > > I'm thinking of one general use case of using timestamp rather than >> > offset >> > > for log deletion, which is that for expiration handling in data >> > > replication, when the source data store decides to expire some data >> > records >> > > based on their timestamps, today we need to configure the corresponding >> > > Kafka changelog topic for compaction, and actively send a tombstone for >> > > each expired record. Since expiration usually happens with a bunch of >> > > records, this could generate large tombstone traffic. For example I >> think >> > > LI's data replication for Espresso is seeing similar issues and they >> are >> > > just not sending tombstone at all. >> > > >> > > With timestamp based log deletion policy, this can be easily handled by >> > > simply setting the current expiration timestamp; but ideally one would >> > > prefer to configure this topic to be both log compaction enabled as >> well >> > as >> > > log deletion enabled. From that point of view, I feel that current KIP >> > > still has value to be accepted. >> > > >> > > Guozhang >> > > >> > > >> > > On Mon, May 2, 2016 at 2:37 PM, Bill Warshaw <wdwars...@gmail.com> >> > wrote: >> > > >> > > > Yes, I'd agree that offset is a more precise configuration than >> > > timestamp. >> > > > If there was a way to set a partition-level configuration, I would >> > rather >> > > > use log.retention.min.offset than timestamp. If you have an approach >> > in >> > > > mind I'd be open to investigating it. >> > > > >> > > > On Mon, May 2, 2016 at 5:33 PM, Jay Kreps <j...@confluent.io> wrote: >> > > > >> > > > > Gotcha, good point. But barring that limitation, you agree that >> that >> > > > makes >> > > > > more sense? >> > > > > >> > > > > -Jay >> > > > > >> > > > > On Mon, May 2, 2016 at 2:29 PM, Bill Warshaw <wdwars...@gmail.com> >> > > > wrote: >> > > > > >> > > > > > The problem with offset as a config option is that offsets are >> > > > > > partition-specific, so we'd need a per-partition config. This >> > would >> > > > work >> > > > > > for our particular use case, where we have single-partition >> topics, >> > > but >> > > > > for >> > > > > > multiple-partition topics it would delete from all partitions >> based >> > > on >> > > > a >> > > > > > global topic-level offset. >> > > > > > >> > > > > > On Mon, May 2, 2016 at 4:32 PM, Jay Kreps <j...@confluent.io> >> > wrote: >> > > > > > >> > > > > > > I think you are saying you considered a kind of trim() api that >> > > would >> > > > > > > synchronously chop off the tail of the log starting from a >> given >> > > > > offset. >> > > > > > > That would be one option, but what I was saying was slightly >> > > > different: >> > > > > > in >> > > > > > > the proposal you have where there is a config that controls >> > > retention >> > > > > > that >> > > > > > > the user would update, wouldn't it make more sense for this >> > config >> > > to >> > > > > be >> > > > > > > based on offset rather than timestamp? >> > > > > > > >> > > > > > > -Jay >> > > > > > > >> > > > > > > On Mon, May 2, 2016 at 12:53 PM, Bill Warshaw < >> > wdwars...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > >> > > > > > > > 1. Initially I looked at using the actual offset, by adding >> a >> > > call >> > > > > to >> > > > > > > > AdminUtils to just delete anything in a given topic/partition >> > to >> > > a >> > > > > > given >> > > > > > > > offset. I ran into a lot of trouble here trying to work out >> > how >> > > > the >> > > > > > > system >> > > > > > > > would recognize that every broker had successfully deleted >> that >> > > > range >> > > > > > > from >> > > > > > > > the partition before returning to the client. If we were ok >> > > > treating >> > > > > > > this >> > > > > > > > as a completely asynchronous operation I would be open to >> > > > revisiting >> > > > > > this >> > > > > > > > approach. >> > > > > > > > >> > > > > > > > 2. For our use case, we would be updating the config every >> few >> > > > hours >> > > > > > > for a >> > > > > > > > given topic, and there would not a be a sizable amount of >> > > > > consumers. I >> > > > > > > > imagine that this would not scale well if someone was >> adjusting >> > > > this >> > > > > > > config >> > > > > > > > very frequently on a large system, but I don't know if there >> > are >> > > > any >> > > > > > use >> > > > > > > > cases where that would occur. I imagine most use cases would >> > > > involve >> > > > > > > > truncating the log after taking a snapshot or doing some >> other >> > > > > > expensive >> > > > > > > > operation that didn't occur very frequently. >> > > > > > > > >> > > > > > > > On Mon, May 2, 2016 at 2:23 PM, Jay Kreps <j...@confluent.io> >> > > > wrote: >> > > > > > > > >> > > > > > > > > Two comments: >> > > > > > > > > >> > > > > > > > > 1. Is there a reason to use physical time rather than >> > > offset? >> > > > > The >> > > > > > > idea >> > > > > > > > > is for the consumer to say when it has consumed >> something >> > so >> > > > it >> > > > > > can >> > > > > > > be >> > > > > > > > > deleted, right? It seems like offset would be a much >> more >> > > > > precise >> > > > > > > way >> > > > > > > > > to do >> > > > > > > > > this--i.e. the consumer says "I have checkpointed state >> up >> > > to >> > > > > > > offset X >> > > > > > > > > you >> > > > > > > > > can get rid of anything prior to that". Doing this by >> > > > timestamp >> > > > > > > seems >> > > > > > > > > like >> > > > > > > > > it is just more error prone... >> > > > > > > > > 2. Is this mechanism practical to use at scale? It >> > requires >> > > > > > several >> > > > > > > ZK >> > > > > > > > > writes per config change, so I guess that depends on how >> > > > > > frequently >> > > > > > > > the >> > > > > > > > > consumers would update the value and how many consumers >> > > there >> > > > > > > > are...any >> > > > > > > > > thoughts on this? >> > > > > > > > > >> > > > > > > > > -Jay >> > > > > > > > > >> > > > > > > > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw < >> > > > wdwars...@gmail.com >> > > > > > >> > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > I'd like to re-initiate the vote for KIP-47 now that >> KIP-33 >> > > has >> > > > > > been >> > > > > > > > > > accepted and is in-progress. I've updated the KIP ( >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >> > 47+-+Add+timestamp-based+log+deletion+policy >> > > > > > > > > > ). >> > > > > > > > > > I have a commit with the functionality for KIP-47 ready >> to >> > go >> > > > > once >> > > > > > > > KIP-33 >> > > > > > > > > > is complete; it's a fairly minor change. >> > > > > > > > > > >> > > > > > > > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira < >> > > > g...@confluent.io> >> > > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > For convenience, the KIP is here: >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >> > 47+-+Add+timestamp-based+log+deletion+policy >> > > > > > > > > > > >> > > > > > > > > > > Do you mind updating the KIP with time formats we plan >> > on >> > > > > > > supporting >> > > > > > > > > > > in the configuration? >> > > > > > > > > > > >> > > > > > > > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw < >> > > > > > wdwars...@gmail.com >> > > > > > > > >> > > > > > > > > > wrote: >> > > > > > > > > > > > Hello, >> > > > > > > > > > > > >> > > > > > > > > > > > I'd like to initiate the vote for KIP-47. >> > > > > > > > > > > > >> > > > > > > > > > > > Thanks, >> > > > > > > > > > > > Bill Warshaw >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >> > > >> > > -- >> > > -- Guozhang >> > > >> > >> >> >> >> -- >> -- Guozhang >> >>
-- Gwen Shapira Product Manager | Confluent 650.450.2760 | @gwenshap Follow us: Twitter | blog