The sentence ` (KAFKA-4545 <https://issues.apache.org/jira/browse/>) `
could be change to `KAFKA-4545 <
https://issues.apache.org/jira/browse/KAFKA-4545
<https://issues.apache.org/jira/browse/KAFKA-8522>>` ?

On Sun, Oct 13, 2019 at 2:08 AM Richard Yu <yohan.richard...@gmail.com>
wrote:

> Hi Jun, Jason,
>
> I've updated the KIP accordingly.
> Sorry for taking a while to get back to you guys.
>
> We should've ironed out the general approach fairly well now.
> So if there isn't any other comments, I will get the vote started then. :)
>
> Cheers,
> Richard
>
> On Thu, Oct 10, 2019 at 3:41 PM Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Jason,
> >
> > I agree that your approach is better since it's more general, more
> accurate
> > and simpler. The only thing is that it may not work for the old message
> > format. I am not sure how important it is since most users are probably
> > already on the new message format. Perhaps we can just document this
> > limitation.
> >
> > Hi, Richard,
> >
> > Could you update the KIP with Jason's approach? Also, it seems that
> KIP-515
> > is already taken by another KIP. Could you use a new KIP number for this?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Sep 27, 2019 at 3:59 PM Richard Yu <yohan.richard...@gmail.com>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > That actually sounds like a pretty good idea to me. No doubt if we use
> > this
> > > approach, then some comments need to be added that indicates this.
> > > But all things considered, I think its not bad at all.
> > >
> > > I definitely agree with you on that its a little hacky, but it works.
> > >
> > > Cheers,
> > > Richard
> > >
> > > On Tue, Sep 24, 2019 at 10:44 AM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hi Richard,
> > > >
> > > > It would be unsatisfying to make a big change to the checkpointing
> > logic
> > > in
> > > > order to handle only one case of this problem, right?
> > > >
> > > > I did have one idea about how to do this. It's a bit of a hack, but
> > keep
> > > an
> > > > open mind ;). The basic problem is having somewhere to embed the
> delete
> > > > horizon for each batch. In the v2 format, each batch header contains
> > two
> > > > timestamps: the base timestamp and the max timestamp. Each record in
> > the
> > > > batch contains a timestamp delta which is relative to the base
> > timestamp.
> > > > In other words, to get the record timestamp, you add the record delta
> > to
> > > > the base timestamp.
> > > >
> > > > Typically there is no reason for the base timestamp to be different
> > from
> > > > the timestamp of the first message, but this is not a strict
> > requirement.
> > > > As long as you can get to the record timestamp by adding the base
> > > timestamp
> > > > and delta, then we are good. So the idea is to set the base timestamp
> > to
> > > > the delete horizon and adjust the deltas accordingly. We could then
> use
> > > one
> > > > bit from the batch attributes to indicate when the base timestamp had
> > > been
> > > > set to the delete horizon. There would be no change to the batch max
> > > > timestamp, so indexing would not be affected by this change.
> > > >
> > > > So the logic would look something like this when cleaning the log.
> > > >
> > > > Case 1: Normal batch
> > > >
> > > > a. If delete horizon flag is set, then retain tombstones as long as
> the
> > > > current time is before the horizon.
> > > > b. If no delete horizon is set, then retain tombstones and set the
> > delete
> > > > horizon in the cleaned batch to current time +
> > > > log.cleaner.delete.retention.ms.
> > > >
> > > > Case 2: Control batch
> > > >
> > > > a. If delete horizon flag is set, then retain the batch and the
> marker
> > > > as long as the current time is before the horizon.
> > > > b. If no delete horizon is set and there are no records remaining
> from
> > > the
> > > > transaction, then retain the marker and set the delete horizon in the
> > > > cleaned batch to current time + log.cleaner.delete.retention.ms.
> > > >
> > > > What do you think?
> > > >
> > > > -Jason
> > > >
> > > >
> > > >
> > > > On Thu, Sep 19, 2019 at 3:21 PM Richard Yu <
> yohan.richard...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Jason,
> > > > >
> > > > > That hadn't occurred to me.
> > > > >
> > > > > I think I missed your comment in the discussion, so I created this
> > KIP
> > > > only
> > > > > with resolving the problem regarding tombstones.
> > > > > Whats your thoughts? If the problem regarding transaction markers
> is
> > a
> > > > > little too complex, then we can we just leave it out of the KIP and
> > fix
> > > > the
> > > > > tombstones issue.
> > > > >
> > > > > Cheers,
> > > > > Richard
> > > > >
> > > > > On Thu, Sep 19, 2019 at 8:47 AM Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Richard,
> > > > > >
> > > > > > Just reposting my comment from the JIRA:
> > > > > >
> > > > > > The underlying problem here also impacts the cleaning of
> > transaction
> > > > > > markers. We use the same delete horizon in order to tell when it
> is
> > > > safe
> > > > > to
> > > > > > remove the marker. If all the data from a transaction has been
> > > cleaned
> > > > > and
> > > > > > the delete horizon has passed enough time, then the marker is
> > > eligible
> > > > > for
> > > > > > deletion.
> > > > > >
> > > > > > However, I don't think the same approach that we're proposing to
> > fix
> > > > the
> > > > > > problem for tombstones will work transaction markers. What we
> need
> > to
> > > > > track
> > > > > > is the timestamp when all the records from a transaction have
> been
> > > > > removed.
> > > > > > That is when we start the timer for deletion. But this would be
> > > > different
> > > > > > for every transaction and there is no guarantee that earlier
> > > > transactions
> > > > > > will be eligible for deletion before later ones. It all depends
> on
> > > the
> > > > > keys
> > > > > > written in the transaction. I don't see an obvious way to solve
> > this
> > > > > > problem without some record-level bookkeeping, but I might be
> > missing
> > > > > > something.
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Mon, Sep 9, 2019 at 7:21 PM Richard Yu <
> > > yohan.richard...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Thanks for chipping in. :)
> > > > > > >
> > > > > > > The description you provided is pretty apt in describing the
> > > > motivation
> > > > > > of
> > > > > > > the KIP, so I will add it. I've made some changes to the KIP
> and
> > > > > outlined
> > > > > > > the basic approaches of what we have so far (basically changing
> > the
> > > > > > > checkpoint file organization or incorporating an extra internal
> > > > header
> > > > > > > field for a record). I will expand on them shortly.
> > > > > > >
> > > > > > > Any comments are appreciated!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Richard
> > > > > > >
> > > > > > > On Mon, Sep 9, 2019 at 3:10 PM Jun Rao <j...@confluent.io>
> wrote:
> > > > > > >
> > > > > > > > Hi, Richard,
> > > > > > > >
> > > > > > > > Thanks for drafting the KIP. A few comments below.
> > > > > > > >
> > > > > > > > 1. We need to provide a better motivation for the KIP. The
> goal
> > > of
> > > > > the
> > > > > > > KIP
> > > > > > > > is not to reorganize the checkpoint for log cleaning. It's
> just
> > > an
> > > > > > > > implementation detail. I was thinking that we could add sth
> > like
> > > > the
> > > > > > > > following in the Motivation/Problem section.
> > > > > > > >
> > > > > > > > "The idea of the configuration delete.retention.ms for
> > compacted
> > > > > > topics
> > > > > > > is
> > > > > > > > to prevent an application that has read a key to not see a
> > > > subsequent
> > > > > > > > deletion of the key because it's physically removed too
> early.
> > To
> > > > > solve
> > > > > > > > this problem, from the latest possible time (deleteHorizonMs)
> > > that
> > > > an
> > > > > > > > application could have read a non tombstone key before a
> > > tombstone,
> > > > > we
> > > > > > > > preserve that tombstone for at least delete.retention.ms and
> > > > require
> > > > > > the
> > > > > > > > application to complete the reading of the tombstone by then.
> > > > > > > >
> > > > > > > > deleteHorizonMs is no later than the time when the cleaner
> has
> > > > > cleaned
> > > > > > up
> > > > > > > > to the tombstone. After that time, no application can read a
> > > > > > > non-tombstone
> > > > > > > > key before the tombstone because they have all been cleaned
> > away
> > > > > > through
> > > > > > > > compaction. Since currently we don't explicitly store the
> time
> > > > when a
> > > > > > > round
> > > > > > > > of cleaning completes, deleteHorizonMs is estimated by the
> last
> > > > > > modified
> > > > > > > > time of the segment containing firstDirtyOffset. When merging
> > > > > multiple
> > > > > > > log
> > > > > > > > segments into a single one, the last modified time is
> inherited
> > > > from
> > > > > > the
> > > > > > > > last merged segment. So the last modified time of the newly
> > > merged
> > > > > > > segment
> > > > > > > > is actually not an accurate estimate of deleteHorizonMs. It
> > could
> > > > be
> > > > > > > > arbitrarily before (KAFKA-4545 <
> > > > > https://issues.apache.org/jira/browse/
> > > > > > >)
> > > > > > > > or
> > > > > > > > after (KAFKA-8522 <
> > > > https://issues.apache.org/jira/browse/KAFKA-8522
> > > > > >).
> > > > > > > The
> > > > > > > > former causes the tombstone to be deleted too early, which
> can
> > > > cause
> > > > > an
> > > > > > > > application to miss the deletion of a key. The latter causes
> > the
> > > > > > > tombstone
> > > > > > > > to be retained longer than needed and potentially forever."
> > > > > > > >
> > > > > > > > We probably want to change the title of the KIP accordingly.
> > > > > > > >
> > > > > > > > 2. The proposed implementation of the KIP is to remember the
> > > > > > > > firstDirtyOffset offset and the corresponding cleaning time
> in
> > a
> > > > > > > checkpoint
> > > > > > > > file per partition and then use them to estimate
> > deleteHorizonMs.
> > > > It
> > > > > > > would
> > > > > > > > be useful to document the format of the new checkpoint file
> and
> > > how
> > > > > it
> > > > > > > will
> > > > > > > > be used during cleaning. Some examples will be helpful.
> > > > > > > >
> > > > > > > > 3. Thinking about this more. There is another way to solve
> this
> > > > > > problem.
> > > > > > > We
> > > > > > > > could write the deleteHorizonMs for each tombstone as an
> > internal
> > > > > > header
> > > > > > > > field of the record (e.g., __deleteHorizonMs). That timestamp
> > > could
> > > > > be
> > > > > > > the
> > > > > > > > starting time of the log cleaner when the tombstone's offset
> is
> > > <=
> > > > > > > > firstDirtyOffset. We could use this timestamp to determine
> > > whether
> > > > > the
> > > > > > > > tombstone should be removed in subsequent rounds of cleaning.
> > > This
> > > > > way,
> > > > > > > we
> > > > > > > > can still keep the current per disk checkpoint file, which is
> > > more
> > > > > > > > efficient. Personally, I think this approach may be better.
> > Could
> > > > you
> > > > > > > > document this approach in the wiki as well so that we can
> > discuss
> > > > > which
> > > > > > > one
> > > > > > > > to pick?
> > > > > > > >
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Sep 1, 2019 at 7:45 PM Richard Yu <
> > > > > yohan.richard...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > A KIP has been written that wishes to upgrade the
> checkpoint
> > > file
> > > > > > > system
> > > > > > > > in
> > > > > > > > > log cleaner.
> > > > > > > > > If anybody wishes to comment, feel free to do so. :)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Reorganize+checkpoint+file+system+in+log+cleaner+to+be+per+partition
> > > > > > > > > Above is the link for reference.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Richard
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to