Great - thanks for clarifying.

Joel

On Tue, Feb 23, 2016 at 1:47 PM, Bill Warshaw <wdwars...@gmail.com> wrote:

> Sorry that I didn't see this comment before the meeting, Joel.  I'll try to
> clarify what I said at the meeting:
>
> - The KIP currently states that timestamp-based log deletion will only work
> with LogAppendTime.  I need to update the KIP to reflect that, after the
> work is done for KIP-33, it will work with both LogAppendTime and
> CreateTime.
> - To use the existing time-based retention mechanism to delete a precise
> range of messages, a client application would need to do the following:
>   - by default, turn off these retention mechanisms
>   - when the application wishes to delete a range of messages which were
> sent before a certain time, compute an approximate value to set
> "log.retention.minutes" to, to create a window of messages based on that
> timestamp that are ok to delete.  There is some degree of imprecision
> implied here.
>   - wait until we are confident that the log retention mechanism has been
> run and deleted any stale segments
>   - reset "log.retention.minutes" to turn off time-based log retention
> until the next time the client application wants to delete something
>
> - To use the proposed timestamp-based retention mechanism, there is only
> one step: the application just has to set "log.retention.min.timestamp" to
> whatever time boundary it deems fit.  It doesn't need to compute any fuzzy
> windows, try to wait until asynchronous processes have been completed or
> continually flip settings between enabled and disabled.
>
> I will update the KIP to reflect the discussion around LogAppendTime vs
> CreateTime and the work being done in KIP-33.
>
> Thanks,
> Bill
>
>
> On Tue, Feb 23, 2016 at 1:22 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
> > I'm having some trouble reconciling the current proposal with your original
> > requirement, which was essentially being able to purge log data up to a
> > precise point (an offset). The KIP currently suggests that timestamp-based
> > deletion would only work with LogAppendTime, so it does not seem
> > significantly different from time-based retention (after KIP-32/33). In
> > other words, it appears to me that you would need to use CreateTime and not
> > LogAppendTime. Also, one of the rejected alternatives observes that changing
> > the existing configuration settings to try to flush ranges of a given
> > partition's log is problematic, but it seems to me you would have to do
> > this with timestamp-based deletion as well, right? I think it would be
> > useful for me if you or anyone else could go over the exact
> > mechanics/workflow for accomplishing precise purges at today's KIP meeting.
> >
> > Thanks,
> >
> > Joel
> >
> > On Monday, February 22, 2016, Bill Warshaw <wdwars...@gmail.com> wrote:
> >
> > > > Sounds good.  I'll hold off on sending out a VOTE thread until after the
> > > > KIP meeting tomorrow.
> > > >
> > > > On Mon, Feb 22, 2016 at 12:56 PM, Becket Qin <becket....@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > > I think it makes sense to implement KIP-47 after KIP-33 so we can make it
> > > > > work for both LogAppendTime and CreateTime.
> > > > >
> > > > > And yes, I'm actively working on KIP-33. I had a voting thread on KIP-33
> > > > > before and I'll bump it up.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > > On Mon, Feb 22, 2016 at 9:11 AM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Becket,
> > > > >
> > > > > > Since you submitted KIP-33, are you actively working on that? If so, it
> > > > > > would make sense to implement KIP-47 after KIP-33 so that it works for both
> > > > > > CreateTime and LogAppendTime.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Fri, Feb 19, 2016 at 6:25 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > 1.  I thought more about Andrew's comment about LogAppendTime.  The
> > > > > > > time-based index you are referring to is associated with KIP-33, correct?
> > > > > > > Currently my implementation is just checking the last message in a segment,
> > > > > > > so we're restricted to LogAppendTime.  When the work for KIP-33 is
> > > > > > > completed, it sounds like CreateTime would also be valid.  Do you happen to
> > > > > > > know if anyone is currently working on KIP-33?
> > > > > > >
> > > > > > > 2. I did update the wiki after reading your original comment, but reading
> > > > > > > over it again I realize I could word a couple things more clearly.  I will
> > > > > > > do that tonight.
> > > > > >
> > > > > > Bill
> > > > > >
> > > > > > > On Fri, Feb 19, 2016 at 7:02 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Hi, Bill,
> > > > > > > >
> > > > > > > > I replied with the following comments earlier to the thread. Did you see
> > > > > > > > that?
> > > > > > > >
> > > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > > >
> > > > > > > > 1. It seems that this new policy should work for CreateTime as well. If a
> > > > > > > > topic is configured with CreateTime, messages may not be added in strict
> > > > > > > > order in the log. However, to build a time-based index, we will be
> > > > > > > > maintaining the largest timestamp for all messages in a log segment. We can
> > > > > > > > delete a segment if its largest timestamp is less than
> > > > > > > > log.retention.min.timestamp. This guarantees that no messages newer than
> > > > > > > > log.retention.min.timestamp will be deleted, which is probably what the
> > > > > > > > user wants.
> > > > > > > >
> > > > > > > > 2. Right now, the user can specify "delete" as the retention policy and a
> > > > > > > > log segment will be deleted either when the size of a partition exceeds a
> > > > > > > > threshold or the timestamp of a segment is older than a relative period of
> > > > > > > > time (say 7 days) from now. What you are proposing is not a new retention
> > > > > > > > policy, but an additional check that will cause a segment to be deleted
> > > > > > > > when the timestamp of a segment is older than an absolute timestamp? If so,
> > > > > > > > could you update the wiki accordingly?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > > On Fri, Feb 19, 2016 at 2:57 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > What is the next step with this proposal?  The work for KIP-32 that it was
> > > > > > > > > based off merged earlier today (https://github.com/apache/kafka/pull/764,
> > > > > > > > > thank you Becket).  I have an implementation with tests, and I've confirmed
> > > > > > > > > that it actually works in a live system.  Is there more discussion that
> > > > > > > > > needs to be had about this KIP, or should I start a VOTE thread?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Tue, Feb 16, 2016 at 5:06 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > > > Bill,
> > > > > > > > > >
> > > > > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > > > > >
> > > > > > > > > > 1. It seems that this new policy should work for CreateTime as well. If a
> > > > > > > > > > topic is configured with CreateTime, messages may not be added in strict
> > > > > > > > > > order in the log. However, to build a time-based index, we will be
> > > > > > > > > > maintaining the largest timestamp for all messages in a log segment. We can
> > > > > > > > > > delete a segment if its largest timestamp is less than
> > > > > > > > > > log.retention.min.timestamp. This guarantees that no messages newer than
> > > > > > > > > > log.retention.min.timestamp will be deleted, which is probably what the
> > > > > > > > > > user wants.
> > > > > > > > > >
> > > > > > > > > > 2. Right now, the user can specify "delete" as the retention policy and a
> > > > > > > > > > log segment will be deleted either when the size of a partition exceeds a
> > > > > > > > > > threshold or the timestamp of a segment is older than a relative period of
> > > > > > > > > > time (say 7 days) from now. What you are proposing is not a new retention
> > > > > > > > > > policy, but an additional check that will cause a segment to be deleted
> > > > > > > > > > when the timestamp of a segment is older than an absolute timestamp? If so,
> > > > > > > > > > could you update the wiki accordingly?
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Sat, Feb 13, 2016 at 3:23 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > That is a good catch, thanks for pointing it out.  If this KIP is accepted,
> > > > > > > > > > > we'd need to document this and make the log cleaner not run timestamp-based
> > > > > > > > > > > deletion unless message.timestamp.type=LogAppendTime.
> > > > > > > > > >
> > > > > > > > > > On Sat, Feb 13, 2016 at 5:38 AM, Andrew Schofield <
> > > > > > > > > > andrew_schofield_j...@outlook.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > This KIP is related to KIP-32, but it strikes me that it only makes sense
> > > > > > > > > > > > with one of the two proposed message timestamp types. If I understand
> > > > > > > > > > > > correctly, message timestamps are only certain to be monotonically
> > > > > > > > > > > > increasing in the log if message.timestamp.type=LogAppendTime.
> > > > > > > > > > > >
> > > > > > > > > > > > Does timestamp-based auto-expiration require use of
> > > > > > > > > > > > message.timestamp.type=LogAppendTime?
> > > > > > > > > > > >
> > > > > > > > > > > > I think this KIP is a good idea, but I think it relies on strict ordering
> > > > > > > > > > > > of timestamps to be workable.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Andrew Schofield
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Date: Fri, 12 Feb 2016 10:38:46 -0800
> > > > > > > > > > > > > Subject: Re: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy
> > > > > > > > > > > > From: n...@confluent.io
> > > > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > > Adding a timestamp based auto-expiration is useful and this proposal makes
> > > > > > > > > > > > > sense. Thx!
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Feb 10, 2016 at 3:35 PM, Jay Kreps  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >> I think this makes a lot of sense and won't be hard to implement and
> > > > > > > > > > > > >> doesn't create too much in the way of new interfaces.
> > > > > > > > > > > >>
> > > > > > > > > > > >> -Jay
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Tue, Feb 9, 2016 at 8:13 AM, Bill Warshaw  wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >>> Hello,
> > > > > > > > > > > >>>
> > > > > > > > > > > > >>> I just submitted KIP-47 for adding a new log deletion policy based on a
> > > > > > > > > > > > >>> minimum timestamp of messages to retain.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> I'm open to any comments or suggestions.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Thanks,
> > > > > > > > > > > >>> Bill Warshaw
> > > > > > > > > > > >>>
> > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Neha
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
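
For readers following the thread, the two mechanisms discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Kafka code: the Segment class and both function names are hypothetical, and the sketch assumes each segment tracks the largest timestamp of its messages, as Jun describes for the time-based index. The first function applies the absolute check (delete a segment only if its largest timestamp is less than log.retention.min.timestamp); the second shows the approximate relative window a client would have to compute to get a similar effect from the existing time-based retention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for a log segment; not a Kafka class."""
    base_offset: int
    largest_timestamp_ms: int  # largest timestamp of any message in the segment


def deletable_segments(segments: List[Segment], min_timestamp_ms: int) -> List[Segment]:
    """Absolute check: keep only segments whose largest timestamp is strictly
    below the boundary, so no message at or after the boundary is selected."""
    return [s for s in segments if s.largest_timestamp_ms < min_timestamp_ms]


def approximate_retention_minutes(now_ms: int, boundary_ms: int) -> int:
    """Relative window a client would compute for the existing mechanism.
    Rounds up to whole minutes, so it errs on the side of retaining data."""
    delta_ms = now_ms - boundary_ms
    if delta_ms < 0:
        raise ValueError("boundary_ms must not be in the future")
    return -(-delta_ms // 60_000)  # ceiling division on integers


if __name__ == "__main__":
    # With CreateTime, segments need not be ordered by timestamp.
    segs = [Segment(0, 1_000), Segment(100, 5_000), Segment(200, 3_000)]
    print([s.base_offset for s in deletable_segments(segs, 3_001)])
    print(approximate_retention_minutes(now_ms=600_001, boundary_ms=0))
```

The relative window depends on when it is computed and when the retention check actually runs, which is the imprecision Bill mentions; the absolute check depends only on the boundary itself.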
