+1.

On Fri, Oct 7, 2016 at 3:35 PM, Gwen Shapira <g...@confluent.io> wrote:

> +1 (binding)
>
> On Wed, Oct 5, 2016 at 1:55 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
>
> > Bumping for visibility. KIP is here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> >
> > On Wed, Aug 24, 2016 at 2:32 PM Bill Warshaw <wdwars...@gmail.com> wrote:
> >
> > > Hello Guozhang,
> > >
> > > KIP-71 seems unrelated to this KIP. KIP-47 is just adding a new deletion policy (minimum timestamp), while KIP-71 is allowing deletion and compaction to coexist.
> > >
> > > They both will touch LogManager, but the change for KIP-47 is very isolated.
> > >
> > > On Wed, Aug 24, 2016 at 2:21 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Hi Bill,
> > > >
> > > > I would like to reason about whether there is any correlation between this KIP and KIP-71:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > >
> > > > I feel they are orthogonal but would like to double check with you.
> > > >
> > > > Guozhang
> > > >
> > > > On Wed, Aug 24, 2016 at 11:05 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > >
> > > > > I'd like to re-awaken this voting thread now that KIP-33 has merged. This KIP is now completely unblocked. I have a working branch off of trunk with my proposed fix, including testing.
> > > > >
> > > > > On Mon, May 9, 2016 at 8:30 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > > >
> > > > > > Jay, Bill:
> > > > > >
> > > > > > I'm thinking of one general use case for using timestamp rather than offset for log deletion: expiration handling in data replication. When the source data store decides to expire some data records based on their timestamps, today we need to configure the corresponding Kafka changelog topic for compaction and actively send a tombstone for each expired record. Since expiration usually happens for a batch of records at once, this can generate large tombstone traffic. For example, I think LinkedIn's data replication for Espresso is seeing similar issues, and they are just not sending tombstones at all.
> > > > > >
> > > > > > With a timestamp-based log deletion policy, this can be handled by simply setting the current expiration timestamp; but ideally one would prefer to configure this topic with both log compaction and log deletion enabled. From that point of view, I feel the current KIP still has value to be accepted.
> > > > > >
> > > > > > Guozhang
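To make the tombstone-traffic pattern above concrete: with compaction alone, each expired record has to be removed by producing a tombstone (a record with a null value) for its key. Below is a minimal sketch of that pattern, assuming a byte-array-keyed changelog topic and kafka-clients on the classpath; the topic name and the source of expired keys are hypothetical.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.ByteArraySerializer

    object ExpireByTombstone {
      // Hypothetical source of expired keys; in practice these would come from
      // the source data store's expiration pass.
      def loadExpiredKeys(): Seq[Array[Byte]] = Seq.empty

      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", classOf[ByteArraySerializer].getName)
        props.put("value.serializer", classOf[ByteArraySerializer].getName)
        val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)

        val changelogTopic = "espresso-changelog" // hypothetical topic name

        // One tombstone (null value) per expired key: this is the per-record
        // traffic that a timestamp-based deletion policy would avoid.
        loadExpiredKeys().foreach { key =>
          producer.send(new ProducerRecord[Array[Byte], Array[Byte]](changelogTopic, key, null))
        }
        producer.flush()
        producer.close()
      }
    }

Under the KIP's proposal, the same expiration pass reduces to a single per-topic config update instead of one tombstone per key.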
> > > > > > On Mon, May 2, 2016 at 2:37 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > >
> > > > > > > Yes, I'd agree that offset is a more precise configuration than timestamp. If there was a way to set a partition-level configuration, I would rather use log.retention.min.offset than timestamp. If you have an approach in mind I'd be open to investigating it.
> > > > > > >
> > > > > > > On Mon, May 2, 2016 at 5:33 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Gotcha, good point. But barring that limitation, you agree that that makes more sense?
> > > > > > > >
> > > > > > > > -Jay
> > > > > > > >
> > > > > > > > On Mon, May 2, 2016 at 2:29 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > The problem with offset as a config option is that offsets are partition-specific, so we'd need a per-partition config. This would work for our particular use case, where we have single-partition topics, but for multiple-partition topics it would delete from all partitions based on a global topic-level offset.
> > > > > > > > >
> > > > > > > > > On Mon, May 2, 2016 at 4:32 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > > > I think you are saying you considered a kind of trim() API that would synchronously chop off the tail of the log starting from a given offset. That would be one option, but what I was saying was slightly different: in the proposal you have, where there is a config that controls retention that the user would update, wouldn't it make more sense for this config to be based on offset rather than timestamp?
> > > > > > > > > >
> > > > > > > > > > -Jay
> > > > > > > > > >
> > > > > > > > > > On Mon, May 2, 2016 at 12:53 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > 1. Initially I looked at using the actual offset, by adding a call to AdminUtils to just delete anything in a given topic/partition up to a given offset. I ran into a lot of trouble trying to work out how the system would recognize that every broker had successfully deleted that range from the partition before returning to the client. If we were OK treating this as a completely asynchronous operation I would be open to revisiting this approach.
> > > > > > > > > > >
> > > > > > > > > > > 2. For our use case, we would be updating the config every few hours for a given topic, and there would not be a sizable number of consumers. I imagine that this would not scale well if someone was adjusting this config very frequently on a large system, but I don't know if there are any use cases where that would occur. I imagine most use cases would involve truncating the log after taking a snapshot or doing some other expensive operation that didn't occur very frequently.
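Bill's "update the config every few hours" pattern can be sketched with the ZooKeeper-backed AdminUtils path that topic config changes went through at the time. The per-topic key name below (log.retention.min.timestamp) follows the KIP's proposed naming and is an assumption, as is the topic name; the sketch also assumes a broker that already understands the new policy.

    import java.util.Properties
    import kafka.admin.AdminUtils
    import kafka.utils.ZkUtils

    object AdvanceRetentionTimestamp {
      def main(args: Array[String]): Unit = {
        // Topic configs in 0.10.x are altered through ZooKeeper; each update is
        // a ZK write plus a config-change notification to the brokers.
        val zkUtils = ZkUtils("localhost:2181", 30000, 30000, false)
        try {
          val topic = "tx-journal" // hypothetical single-partition topic
          val props = new Properties()
          // Assumed key from the KIP: records with an earlier timestamp become
          // eligible for deletion on the next cleanup pass.
          props.put("log.retention.min.timestamp", System.currentTimeMillis().toString)
          AdminUtils.changeTopicConfig(zkUtils, topic, props)
        } finally {
          zkUtils.close()
        }
      }
    }

Each such update is one ZooKeeper write plus a notification, which is the cost Jay's second question below is about; at once every few hours per topic it is negligible.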
> > > > > > > > > > > On Mon, May 2, 2016 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Two comments:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Is there a reason to use physical time rather than offset? The idea is for the consumer to say when it has consumed something so it can be deleted, right? It seems like offset would be a much more precise way to do this--i.e. the consumer says "I have checkpointed state up to offset X you can get rid of anything prior to that". Doing this by timestamp seems like it is just more error prone...
> > > > > > > > > > > >
> > > > > > > > > > > > 2. Is this mechanism practical to use at scale? It requires several ZK writes per config change, so I guess that depends on how frequently the consumers would update the value and how many consumers there are...any thoughts on this?
> > > > > > > > > > > >
> > > > > > > > > > > > -Jay
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been accepted and is in-progress. I've updated the KIP (https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy). I have a commit with the functionality for KIP-47 ready to go once KIP-33 is complete; it's a fairly minor change.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > For convenience, the KIP is here:
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you mind updating the KIP with time formats we plan on supporting in the configuration?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'd like to initiate the vote for KIP-47.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Bill Warshaw
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > >
> > > > --
> > > > -- Guozhang
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog

--
-- Guozhang
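For comparison, the offset-based alternative Jay raises above ("I have checkpointed state up to offset X, you can get rid of anything prior to that") is essentially a trim-by-offset request. No such client API existed when this thread was written; the sketch below uses the deleteRecords call that later versions of the Java AdminClient expose, purely to make the shape of the idea concrete. The topic, partition, and checkpoint offset are hypothetical.

    import java.util.Properties
    import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, RecordsToDelete}
    import org.apache.kafka.common.TopicPartition
    import scala.collection.JavaConverters._

    object TrimToCheckpointedOffset {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        val admin = AdminClient.create(props)
        try {
          val checkpointedOffset = 123456L             // hypothetical checkpoint taken by the consumer
          val tp = new TopicPartition("tx-journal", 0) // hypothetical topic/partition
          // "Get rid of anything prior to offset X", expressed per partition.
          val request = Map(tp -> RecordsToDelete.beforeOffset(checkpointedOffset)).asJava
          admin.deleteRecords(request).all().get()
        } finally {
          admin.close()
        }
      }
    }

Note that the request is expressed per partition, which also addresses Bill's point that offsets are partition-specific.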