Bumping this thread so Wes can reply to it. Ignore this mail.

2016-02-24 0:36 GMT+01:00 Joel Koshy <jjkosh...@gmail.com>:
> Great - thanks for clarifying.
>
> Joel
>
> On Tue, Feb 23, 2016 at 1:47 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
>
> > Sorry that I didn't see this comment before the meeting Joel. I'll try to
> > clarify what I said at the meeting:
> >
> > - The KIP currently states that timestamp-based log deletion will only
> >   work with LogAppendTime. I need to update the KIP to reflect that,
> >   after the work is done for KIP-33, it will work with both
> >   LogAppendTime and CreateTime.
> > - To use the existing time-based retention mechanism to delete a precise
> >   range of messages, a client application would need to do the following:
> >   - by default, turn off these retention mechanisms
> >   - when the application wishes to delete a range of messages which were
> >     sent before a certain time, compute an approximate value to set
> >     "log.retention.minutes" to, to create a window of messages based on
> >     that timestamp that are ok to delete. There is some degree of
> >     imprecision implied here.
> >   - wait until we are confident that the log retention mechanism has been
> >     run and deleted any stale segments
> >   - reset "log.retention.minutes" to turn off time-based log retention
> >     until the next time the client application wants to delete something
> > - To use the proposed timestamp-based retention mechanism, there is only
> >   one step: the application just has to set "log.retention.min.timestamp"
> >   to whatever time boundary it deems fit. It doesn't need to compute any
> >   fuzzy windows, try to wait until asynchronous processes have been
> >   completed or continually flip settings between enabled and disabled.
> >
> > I will update the KIP to reflect the discussion around LogAppendTime vs
> > CreateTime and the work being done in KIP-33.
> >
> > Thanks,
> > Bill
> >
> > On Tue, Feb 23, 2016 at 1:22 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >
> > > I'm having some trouble reconciling the current proposal with your
> > > original requirement which was essentially being able to purge log data
> > > up to a precise point (an offset). The KIP currently suggests that
> > > timestamp-based deletion would only work with LogAppendTime, so it does
> > > not seem significantly different from time-based retention (after
> > > KIP-32/33) - IOW to me it appears that you would need to use CreateTime
> > > and not LogAppendTime. Also one of the rejected alternatives observes
> > > that changing the existing configuration settings to try to flush
> > > ranges of a given partition's log are problematic, but it seems to me
> > > you would have to do this with timestamp-based deletion as well right?
> > > I think it would be useful for me if you or anyone else can go over the
> > > exact mechanics/workflow for accomplishing precise purges at today's
> > > KIP meeting.
> > >
> > > Thanks,
> > >
> > > Joel
> > >
> > > On Monday, February 22, 2016, Bill Warshaw <wdwars...@gmail.com> wrote:
> > >
> > > > Sounds good. I'll hold off on sending out a VOTE thread until after
> > > > the KIP meeting tomorrow.
> > > >
> > > > On Mon, Feb 22, 2016 at 12:56 PM, Becket Qin <becket....@gmail.com> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > I think it makes sense to implement KIP-47 after KIP-33 so we can
> > > > > make it work for both LogAppendTime and CreateTime.
> > > > >
> > > > > And yes, I'm actively working on KIP-33. I had a voting thread on
> > > > > KIP-33 before and I'll bump it up.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Mon, Feb 22, 2016 at 9:11 AM, Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > > > Becket,
> > > > > >
> > > > > > Since you submitted KIP-33, are you actively working on that? If
> > > > > > so, it would make sense to implement KIP-47 after KIP-33 so that
> > > > > > it works for both CreateTime and LogAppendTime.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Feb 19, 2016 at 6:25 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > 1. I thought more about Andrew's comment about LogAppendTime.
> > > > > > > The time-based index you are referring to is associated with
> > > > > > > KIP-33, correct? Currently my implementation is just checking
> > > > > > > the last message in a segment, so we're restricted to
> > > > > > > LogAppendTime. When the work for KIP-33 is completed, it sounds
> > > > > > > like CreateTime would also be valid. Do you happen to know if
> > > > > > > anyone is currently working on KIP-33?
> > > > > > >
> > > > > > > 2. I did update the wiki after reading your original comment,
> > > > > > > but reading over it again I realize I could word a couple
> > > > > > > things more clearly. I will do that tonight.
> > > > > > >
> > > > > > > Bill
> > > > > > >
> > > > > > > On Fri, Feb 19, 2016 at 7:02 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Hi, Bill,
> > > > > > > >
> > > > > > > > I replied with the following comments earlier to the thread.
> > > > > > > > Did you see that?
> > > > > > > >
> > > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > > >
> > > > > > > > 1. It seems that this new policy should work for CreateTime
> > > > > > > > as well. If a topic is configured with CreateTime, messages
> > > > > > > > may not be added in strict order in the log. However, to
> > > > > > > > build a time-based index, we will be maintaining the largest
> > > > > > > > timestamp for all messages in a log segment. We can delete a
> > > > > > > > segment if its largest timestamp is less than
> > > > > > > > log.retention.min.timestamp. This guarantees that no messages
> > > > > > > > newer than log.retention.min.timestamp will be deleted, which
> > > > > > > > is probably what the user wants.
> > > > > > > >
> > > > > > > > 2. Right now, the user can specify "delete" as the retention
> > > > > > > > policy and a log segment will be deleted either when the size
> > > > > > > > of a partition exceeds a threshold or the timestamp of a
> > > > > > > > segment is older than a relative period of time (say 7 days)
> > > > > > > > from now. What you are proposing is not a new retention
> > > > > > > > policy, but an additional check that will cause a segment to
> > > > > > > > be deleted when the timestamp of a segment is older than an
> > > > > > > > absolute timestamp? If so, could you update the wiki
> > > > > > > > accordingly?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Fri, Feb 19, 2016 at 2:57 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > What is the next step with this proposal? The work for
> > > > > > > > > KIP-32 that it was based off merged earlier today
> > > > > > > > > (https://github.com/apache/kafka/pull/764, thank you
> > > > > > > > > Becket). I have an implementation with tests, and I've
> > > > > > > > > confirmed that it actually works in a live system. Is there
> > > > > > > > > more discussion that needs to be had about this KIP, or
> > > > > > > > > should I start a VOTE thread?
> > > > > > > > >
> > > > > > > > > On Tue, Feb 16, 2016 at 5:06 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > > > Bill,
> > > > > > > > > >
> > > > > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > > > > >
> > > > > > > > > > 1. It seems that this new policy should work for
> > > > > > > > > > CreateTime as well. If a topic is configured with
> > > > > > > > > > CreateTime, messages may not be added in strict order in
> > > > > > > > > > the log. However, to build a time-based index, we will be
> > > > > > > > > > maintaining the largest timestamp for all messages in a
> > > > > > > > > > log segment. We can delete a segment if its largest
> > > > > > > > > > timestamp is less than log.retention.min.timestamp. This
> > > > > > > > > > guarantees that no messages newer than
> > > > > > > > > > log.retention.min.timestamp will be deleted, which is
> > > > > > > > > > probably what the user wants.
> > > > > > > > > >
> > > > > > > > > > 2. Right now, the user can specify "delete" as the
> > > > > > > > > > retention policy and a log segment will be deleted either
> > > > > > > > > > when the size of a partition exceeds a threshold or the
> > > > > > > > > > timestamp of a segment is older than a relative period of
> > > > > > > > > > time (say 7 days) from now. What you are proposing is not
> > > > > > > > > > a new retention policy, but an additional check that will
> > > > > > > > > > cause a segment to be deleted when the timestamp of a
> > > > > > > > > > segment is older than an absolute timestamp? If so, could
> > > > > > > > > > you update the wiki accordingly?
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Sat, Feb 13, 2016 at 3:23 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > That is a good catch, thanks for pointing it out. If
> > > > > > > > > > > this KIP is accepted, we'd need to document this and
> > > > > > > > > > > make the log cleaner not run timestamp-based deletion
> > > > > > > > > > > unless message.timestamp.type=LogAppendTime.
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Feb 13, 2016 at 5:38 AM, Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > This KIP is related to KIP-32, but it strikes me that
> > > > > > > > > > > > it only makes sense with one of the two proposed
> > > > > > > > > > > > message timestamp types. If I understand correctly,
> > > > > > > > > > > > message timestamps are only certain to be
> > > > > > > > > > > > monotonically increasing in the log if
> > > > > > > > > > > > message.timestamp.type=LogAppendTime.
> > > > > > > > > > > >
> > > > > > > > > > > > Does timestamp-based auto-expiration require use of
> > > > > > > > > > > > message.timestamp.type=LogAppendTime?
> > > > > > > > > > > >
> > > > > > > > > > > > I think this KIP is a good idea, but I think it
> > > > > > > > > > > > relies on strict ordering of timestamps to be
> > > > > > > > > > > > workable.
> > > > > > > > > > > >
> > > > > > > > > > > > Andrew Schofield
> > > > > > > > > > > >
> > > > > > > > > > > > > Date: Fri, 12 Feb 2016 10:38:46 -0800
> > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy
> > > > > > > > > > > > > From: n...@confluent.io
> > > > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > Adding a timestamp based auto-expiration is useful
> > > > > > > > > > > > > and this proposal makes sense. Thx!
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Feb 10, 2016 at 3:35 PM, Jay Kreps wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> I think this makes a lot of sense and won't be
> > > > > > > > > > > > >> hard to implement and doesn't create too much in
> > > > > > > > > > > > >> the way of new interfaces.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> -Jay
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Tue, Feb 9, 2016 at 8:13 AM, Bill Warshaw wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>> Hello,
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> I just submitted KIP-47 for adding a new log
> > > > > > > > > > > > >>> deletion policy based on a minimum timestamp of
> > > > > > > > > > > > >>> messages to retain.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> I'm open to any comments or suggestions.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Thanks,
> > > > > > > > > > > > >>> Bill Warshaw
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Neha
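
For anyone skimming the thread, the segment-level check Jun describes (delete a segment only if its largest timestamp is less than log.retention.min.timestamp) can be sketched roughly as below. This is only an illustration under stated assumptions, not Kafka's code: `Segment`, `largest_timestamp_ms`, and `deletable_segments` are hypothetical names made up for the example, and details such as protecting the active segment are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a log segment (not a Kafka class)."""
    base_offset: int
    largest_timestamp_ms: int  # largest message timestamp seen in the segment


def deletable_segments(segments: List[Segment], min_timestamp_ms: int) -> List[Segment]:
    """Return the segments whose largest timestamp is below the boundary.

    min_timestamp_ms plays the role of log.retention.min.timestamp from the
    thread. Because the comparison uses the largest timestamp in a segment,
    no message at or after the boundary is ever selected for deletion, even
    if CreateTime timestamps arrive out of order.
    """
    return [s for s in segments if s.largest_timestamp_ms < min_timestamp_ms]


# Out-of-order CreateTime example: the middle segment holds a newer message.
segments = [Segment(0, 100), Segment(100, 250), Segment(200, 180)]
doomed = deletable_segments(segments, min_timestamp_ms=200)
print([s.base_offset for s in doomed])
```

Note that the middle segment is kept whole even though some of its messages may be older than the boundary, which matches the guarantee discussed above: nothing newer than the boundary is deleted, but some older messages can survive until their segment qualifies.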