Sorry that I didn't see this comment before the meeting, Joel. I'll try to clarify what I said at the meeting:
- The KIP currently states that timestamp-based log deletion will only work with LogAppendTime. I need to update the KIP to reflect that, after the work is done for KIP-33, it will work with both LogAppendTime and CreateTime.
- To use the existing time-based retention mechanism to delete a precise range of messages, a client application would need to do the following:
  - by default, turn off these retention mechanisms
  - when the application wishes to delete a range of messages which were sent before a certain time, compute an approximate value to set "log.retention.minutes" to, to create a window of messages based on that timestamp that are ok to delete. There is some degree of imprecision implied here.
  - wait until we are confident that the log retention mechanism has been run and deleted any stale segments
  - reset "log.retention.minutes" to turn off time-based log retention until the next time the client application wants to delete something
- To use the proposed timestamp-based retention mechanism, there is only one step: the application just has to set "log.retention.min.timestamp" to whatever time boundary it deems fit. It doesn't need to compute any fuzzy windows, try to wait until asynchronous processes have been completed or continually flip settings between enabled and disabled.

I will update the KIP to reflect the discussion around LogAppendTime vs CreateTime and the work being done in KIP-33.

Thanks,
Bill

On Tue, Feb 23, 2016 at 1:22 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> I'm having some trouble reconciling the current proposal with your original
> requirement which was essentially being able to purge log data up to a
> precise point (an offset). The KIP currently suggests that timestamp-based
> deletion would only work with LogAppendTime, so it does not seem
> significantly different from time-based retention (after KIP-32/33) - IOW
> to me it appears that you would need to use CreateTime and not
> LogAppendTime.
> Also one of the rejected alternatives observes that changing
> the existing configuration settings to try to flush ranges of a given
> partition's log is problematic, but it seems to me you would have to do
> this with timestamp-based deletion as well, right? I think it would be
> useful for me if you or anyone else can go over the exact
> mechanics/workflow for accomplishing precise purges at today's KIP meeting.
>
> Thanks,
>
> Joel
>
> On Monday, February 22, 2016, Bill Warshaw <wdwars...@gmail.com> wrote:
>
> > Sounds good. I'll hold off on sending out a VOTE thread until after the
> > KIP meeting tomorrow.
> >
> > On Mon, Feb 22, 2016 at 12:56 PM, Becket Qin <becket....@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > I think it makes sense to implement KIP-47 after KIP-33 so we can make it
> > > work for both LogAppendTime and CreateTime.
> > >
> > > And yes, I'm actively working on KIP-33. I had a voting thread on KIP-33
> > > before and I'll bump it up.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Feb 22, 2016 at 9:11 AM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Becket,
> > > >
> > > > Since you submitted KIP-33, are you actively working on that? If so, it
> > > > would make sense to implement KIP-47 after KIP-33 so that it works for
> > > > both CreateTime and LogAppendTime.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Feb 19, 2016 at 6:25 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > 1. I thought more about Andrew's comment about LogAppendTime. The
> > > > > time-based index you are referring to is associated with KIP-33, correct?
> > > > > Currently my implementation is just checking the last message in a segment,
> > > > > so we're restricted to LogAppendTime. When the work for KIP-33 is
> > > > > completed, it sounds like CreateTime would also be valid.
> > > > > Do you happen to know if anyone is currently working on KIP-33?
> > > > >
> > > > > 2. I did update the wiki after reading your original comment, but reading
> > > > > over it again I realize I could word a couple things more clearly. I will
> > > > > do that tonight.
> > > > >
> > > > > Bill
> > > > >
> > > > > On Fri, Feb 19, 2016 at 7:02 PM, Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Bill,
> > > > > >
> > > > > > I replied with the following comments earlier to the thread. Did you see
> > > > > > that?
> > > > > >
> > > > > > Thanks for the proposal. A couple of comments.
> > > > > >
> > > > > > 1. It seems that this new policy should work for CreateTime as well. If a
> > > > > > topic is configured with CreateTime, messages may not be added in strict
> > > > > > order in the log. However, to build a time-based index, we will be
> > > > > > maintaining the largest timestamp for all messages in a log segment. We can
> > > > > > delete a segment if its largest timestamp is less than
> > > > > > log.retention.min.timestamp. This guarantees that no messages newer than
> > > > > > log.retention.min.timestamp will be deleted, which is probably what the
> > > > > > user wants.
> > > > > >
> > > > > > 2. Right now, the user can specify "delete" as the retention policy and a
> > > > > > log segment will be deleted either when the size of a partition exceeds a
> > > > > > threshold or the timestamp of a segment is older than a relative period of
> > > > > > time (say 7 days) from now. What you are proposing is not a new retention
> > > > > > policy, but an additional check that will cause a segment to be deleted
> > > > > > when the timestamp of a segment is older than an absolute timestamp? If so,
> > > > > > could you update the wiki accordingly?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Feb 19, 2016 at 2:57 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > >
> > > > > > > Hello all,
> > > > > > >
> > > > > > > What is the next step with this proposal? The work for KIP-32 that it was
> > > > > > > based off merged earlier today (https://github.com/apache/kafka/pull/764,
> > > > > > > thank you Becket). I have an implementation with tests, and I've confirmed
> > > > > > > that it actually works in a live system. Is there more discussion that
> > > > > > > needs to be had about this KIP, or should I start a VOTE thread?
> > > > > > >
> > > > > > > On Tue, Feb 16, 2016 at 5:06 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Bill,
> > > > > > > >
> > > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > > >
> > > > > > > > 1. It seems that this new policy should work for CreateTime as well. If a
> > > > > > > > topic is configured with CreateTime, messages may not be added in strict
> > > > > > > > order in the log. However, to build a time-based index, we will be
> > > > > > > > maintaining the largest timestamp for all messages in a log segment. We can
> > > > > > > > delete a segment if its largest timestamp is less than
> > > > > > > > log.retention.min.timestamp. This guarantees that no messages newer than
> > > > > > > > log.retention.min.timestamp will be deleted, which is probably what the
> > > > > > > > user wants.
> > > > > > > >
> > > > > > > > 2. Right now, the user can specify "delete" as the retention policy and a
> > > > > > > > log segment will be deleted either when the size of a partition exceeds a
> > > > > > > > threshold or the timestamp of a segment is older than a relative period of
> > > > > > > > time (say 7 days) from now. What you are proposing is not a new retention
> > > > > > > > policy, but an additional check that will cause a segment to be deleted
> > > > > > > > when the timestamp of a segment is older than an absolute timestamp? If so,
> > > > > > > > could you update the wiki accordingly?
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Sat, Feb 13, 2016 at 3:23 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > That is a good catch, thanks for pointing it out. If this KIP is accepted,
> > > > > > > > > we'd need to document this and make the log cleaner not run timestamp-based
> > > > > > > > > deletion unless message.timestamp.type=LogAppendTime.
> > > > > > > > >
> > > > > > > > > On Sat, Feb 13, 2016 at 5:38 AM, Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:
> > > > > > > > >
> > > > > > > > > > This KIP is related to KIP-32, but it strikes me that it only makes sense
> > > > > > > > > > with one of the two proposed message timestamp types. If I understand
> > > > > > > > > > correctly, message timestamps are only certain to be monotonically
> > > > > > > > > > increasing in the log if message.timestamp.type=LogAppendTime.
> > > > > > > > > >
> > > > > > > > > > Does timestamp-based auto-expiration require use of
> > > > > > > > > > message.timestamp.type=LogAppendTime?
> > > > > > > > > >
> > > > > > > > > > I think this KIP is a good idea, but I think it relies on strict ordering
> > > > > > > > > > of timestamps to be workable.
> > > > > > > > > >
> > > > > > > > > > Andrew Schofield
> > > > > > > > > >
> > > > > > > > > > > Date: Fri, 12 Feb 2016 10:38:46 -0800
> > > > > > > > > > > Subject: Re: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy
> > > > > > > > > > > From: n...@confluent.io
> > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > >
> > > > > > > > > > > Adding a timestamp based auto-expiration is useful and this proposal makes
> > > > > > > > > > > sense. Thx!
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Feb 10, 2016 at 3:35 PM, Jay Kreps wrote:
> > > > > > > > > > >
> > > > > > > > > > >> I think this makes a lot of sense and won't be hard to implement and
> > > > > > > > > > >> doesn't create too much in the way of new interfaces.
> > > > > > > > > > >>
> > > > > > > > > > >> -Jay
> > > > > > > > > > >>
> > > > > > > > > > >> On Tue, Feb 9, 2016 at 8:13 AM, Bill Warshaw wrote:
> > > > > > > > > > >>
> > > > > > > > > > >>> Hello,
> > > > > > > > > > >>>
> > > > > > > > > > >>> I just submitted KIP-47 for adding a new log deletion policy based on a
> > > > > > > > > > >>> minimum timestamp of messages to retain.
> > > > > > > > > > >>>
> > > > > > > > > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > > > >>>
> > > > > > > > > > >>> I'm open to any comments or suggestions.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thanks,
> > > > > > > > > > >>> Bill Warshaw
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Neha
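
For anyone following the thread, the segment-level check Jun describes (a segment can be deleted if its largest timestamp is less than log.retention.min.timestamp) could be sketched roughly as below. This is only an illustration in Python, not Kafka's implementation: the Segment class, its fields, and the deletable_segments function are hypothetical names made up for this example, and the thread does not say whether a real implementation would also stop at the first segment that is not eligible.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a log segment (not a Kafka class)."""
    base_offset: int
    largest_timestamp_ms: int  # largest message timestamp seen in the segment


def deletable_segments(segments: List[Segment], min_timestamp_ms: int) -> List[Segment]:
    """Return the segments whose largest timestamp is strictly below the boundary.

    min_timestamp_ms plays the role of "log.retention.min.timestamp" from the thread.
    """
    return [s for s in segments if s.largest_timestamp_ms < min_timestamp_ms]


# Three segments with made-up offsets and timestamps.
segments = [Segment(0, 1_000), Segment(100, 2_500), Segment(200, 4_000)]

# The application sets a single absolute boundary, as in the one-step workflow.
eligible = deletable_segments(segments, min_timestamp_ms=3_000)
print([s.base_offset for s in eligible])
```

Because the comparison uses each segment's largest timestamp, a segment holding even one message at or after the boundary is kept, so no message newer than the boundary is removed even if CreateTime timestamps are out of order within the log.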