Re: [DISCUSS] KIP-515: Reorganize checkpoint system in log cleaner to be per partition

Richard Yu Thu, 19 Sep 2019 15:22:14 -0700

Hi Jason,

That hadn't occurred to me.


I think I missed your comment in the discussion, so I created this KIP only
with resolving the problem regarding tombstones.
Whats your thoughts? If the problem regarding transaction markers is a
little too complex, then we can we just leave it out of the KIP and fix the
tombstones issue.

Cheers,
Richard

On Thu, Sep 19, 2019 at 8:47 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hi Richard,
>
> Just reposting my comment from the JIRA:
>
> The underlying problem here also impacts the cleaning of transaction
> markers. We use the same delete horizon in order to tell when it is safe to
> remove the marker. If all the data from a transaction has been cleaned and
> the delete horizon has passed enough time, then the marker is eligible for
> deletion.
>
> However, I don't think the same approach that we're proposing to fix the
> problem for tombstones will work transaction markers. What we need to track
> is the timestamp when all the records from a transaction have been removed.
> That is when we start the timer for deletion. But this would be different
> for every transaction and there is no guarantee that earlier transactions
> will be eligible for deletion before later ones. It all depends on the keys
> written in the transaction. I don't see an obvious way to solve this
> problem without some record-level bookkeeping, but I might be missing
> something.
>
> Thanks,
> Jason
>
> On Mon, Sep 9, 2019 at 7:21 PM Richard Yu <yohan.richard...@gmail.com>
> wrote:
>
> > Hi Jun,
> >
> > Thanks for chipping in. :)
> >
> > The description you provided is pretty apt in describing the motivation
> of
> > the KIP, so I will add it. I've made some changes to the KIP and outlined
> > the basic approaches of what we have so far (basically changing the
> > checkpoint file organization or incorporating an extra internal header
> > field for a record). I will expand on them shortly.
> >
> > Any comments are appreciated!
> >
> > Cheers,
> > Richard
> >
> > On Mon, Sep 9, 2019 at 3:10 PM Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, Richard,
> > >
> > > Thanks for drafting the KIP. A few comments below.
> > >
> > > 1. We need to provide a better motivation for the KIP. The goal of the
> > KIP
> > > is not to reorganize the checkpoint for log cleaning. It's just an
> > > implementation detail. I was thinking that we could add sth like the
> > > following in the Motivation/Problem section.
> > >
> > > "The idea of the configuration delete.retention.ms for compacted
> topics
> > is
> > > to prevent an application that has read a key to not see a subsequent
> > > deletion of the key because it's physically removed too early. To solve
> > > this problem, from the latest possible time (deleteHorizonMs) that an
> > > application could have read a non tombstone key before a tombstone, we
> > > preserve that tombstone for at least delete.retention.ms and require
> the
> > > application to complete the reading of the tombstone by then.
> > >
> > > deleteHorizonMs is no later than the time when the cleaner has cleaned
> up
> > > to the tombstone. After that time, no application can read a
> > non-tombstone
> > > key before the tombstone because they have all been cleaned away
> through
> > > compaction. Since currently we don't explicitly store the time when a
> > round
> > > of cleaning completes, deleteHorizonMs is estimated by the last
> modified
> > > time of the segment containing firstDirtyOffset. When merging multiple
> > log
> > > segments into a single one, the last modified time is inherited from
> the
> > > last merged segment. So the last modified time of the newly merged
> > segment
> > > is actually not an accurate estimate of deleteHorizonMs. It could be
> > > arbitrarily before (KAFKA-4545 <https://issues.apache.org/jira/browse/
> >)
> > > or
> > > after (KAFKA-8522 <https://issues.apache.org/jira/browse/KAFKA-8522>).
> > The
> > > former causes the tombstone to be deleted too early, which can cause an
> > > application to miss the deletion of a key. The latter causes the
> > tombstone
> > > to be retained longer than needed and potentially forever."
> > >
> > > We probably want to change the title of the KIP accordingly.
> > >
> > > 2. The proposed implementation of the KIP is to remember the
> > > firstDirtyOffset offset and the corresponding cleaning time in a
> > checkpoint
> > > file per partition and then use them to estimate deleteHorizonMs. It
> > would
> > > be useful to document the format of the new checkpoint file and how it
> > will
> > > be used during cleaning. Some examples will be helpful.
> > >
> > > 3. Thinking about this more. There is another way to solve this
> problem.
> > We
> > > could write the deleteHorizonMs for each tombstone as an internal
> header
> > > field of the record (e.g., __deleteHorizonMs). That timestamp could be
> > the
> > > starting time of the log cleaner when the tombstone's offset is <=
> > > firstDirtyOffset. We could use this timestamp to determine whether the
> > > tombstone should be removed in subsequent rounds of cleaning. This way,
> > we
> > > can still keep the current per disk checkpoint file, which is more
> > > efficient. Personally, I think this approach may be better. Could you
> > > document this approach in the wiki as well so that we can discuss which
> > one
> > > to pick?
> > >
> > >
> > > Jun
> > >
> > >
> > > On Sun, Sep 1, 2019 at 7:45 PM Richard Yu <yohan.richard...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > A KIP has been written that wishes to upgrade the checkpoint file
> > system
> > > in
> > > > log cleaner.
> > > > If anybody wishes to comment, feel free to do so. :)
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Reorganize+checkpoint+file+system+in+log+cleaner+to+be+per+partition
> > > > Above is the link for reference.
> > > >
> > > > Cheers,
> > > > Richard
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-515: Reorganize checkpoint system in log cleaner to be per partition

Reply via email to