Thanks for the reply Jason.

On Thu, Oct 17, 2019 at 10:40 AM Jason Gustafson <ja...@confluent.io> wrote:
> Hi Guozhang,
>
> It's a fair point. For control records, I think it's a non-issue since they are tiny and not batched. So the only case where this might matter is large batch deletions. I think the size difference is not a major issue in itself, but I think it's worth mentioning in the KIP the risk of exceeding the max message size. The code should probably treat this as more of a soft limit when cleaning. We have run into scenarios in the past as well where recompression has actually increased message size. We may also want to be able to upconvert messages to the new format in the cleaner in the future.
>
> -Jason
>
> On Thu, Oct 17, 2019 at 9:08 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Here's my understanding: when log compaction kicks in, the system time at that moment would be larger than the timestamps of the messages being compacted, so modifying the batch timestamp would in practice increase its value, and hence the deltas for each inner message would become negative in order to preserve their actual timestamps. Depending on the gap between a message's actual timestamp and the time when log compaction happens, this negative delta can be large or small, since it depends not only on the cleaner thread's wakeup frequency but also on the dirty ratio, etc.
> >
> > With varint encoding, the number of bytes needed to encode an int varies from 1 to 5; before compaction, the deltas should be relatively small positive values compared with the base timestamp, and hence most likely need only 1 or 2 bytes to encode, while after compaction the deltas could be relatively large negative values that take more bytes to encode. With a record batch of 512 records in practice, and supposing each record needs 2 more bytes for its delta after compaction, that would be about 1 KB more per batch. Usually this would not be too big an issue with reasonably sized messages, but I just wanted to point it out as a potential regression.
> >
> > Guozhang
> >
> > On Wed, Oct 16, 2019 at 9:36 PM Richard Yu <yohan.richard...@gmail.com> wrote:
> >
> > > Hi Guozhang,
> > >
> > > Your understanding is basically on point.
> > >
> > > I haven't looked into the details of what happens if we change the base timestamp and how it's calculated, so I'm not clear on how small (or big) the delta is. To be fair, would the delta size pose a big problem if it takes up more bytes to encode?
> > >
> > > Cheers,
> > > Richard
> > >
> > > On Wed, Oct 16, 2019 at 7:36 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Hello Richard,
> > > >
> > > > Thanks for the KIP. I just have one clarification regarding "So the idea is to set the base timestamp to the delete horizon and adjust the deltas accordingly." My understanding is that during compaction, for each compacted new segment, we would set the base timestamp of each batch to the delete horizon, which is the "current system time that the cleaner has seen so far", and adjust the delta timestamps of each of the inner records of the batch (and in practice the deltas will all be negative)?
> > > >
> > > > If that's the case, could we do some back-of-the-envelope calculation on what the smallest possible deltas are? Note that since we use varints for the per-record delta values, the more negative the delta, the more bytes it takes to encode.
> > > >
> > > > Guozhang
> > > >
> > > > On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson <ja...@confluent.io> wrote:
> > > >
> > > > > +1. Thanks Richard.
> > > > >
> > > > > On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <yohan.richard...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Want to try to get this KIP wrapped up, so it would be great if we can get some votes.
> > > > > >
> > > > > > Cheers,
> > > > > > Richard
> > > > > >
> > > > > > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <j...@confluent.io> wrote:
> > > > > >
> > > > > > > Hi, Richard,
> > > > > > >
> > > > > > > Thanks for the updated KIP. +1 from me.
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <yohan.richard...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > I've updated the link accordingly. :)
> > > > > > > > Here is the updated KIP link:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Richard
> > > > > > > >
> > > > > > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <j...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > Hi, Richard,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Looks good to me overall. A few minor comments below.
> > > > > > > > >
> > > > > > > > > 1. Could you change the title from "Retain tombstones" to "Retain tombstones and transaction markers" to make it more general?
> > > > > > > > >
> > > > > > > > > 2. Could you document which bit in the batch attributes will be used for the new flag? The current format of the batch attributes is the following.
> > > > > > > > >
> > > > > > > > >  -------------------------------------------------------------------------------------------------
> > > > > > > > > | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > > > > > > >  -------------------------------------------------------------------------------------------------
> > > > > > > > >
> > > > > > > > > 3. Could you provide the reasons for the rejected proposals? For proposal 1, one reason is that it doesn't cover the transaction markers. For proposal 2, one reason is that the internal record header could be exposed to the clients.
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <yohan.richard...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > The discussion for KIP-534 seems to have concluded, so I wish to vote it in so that we can get it done. It's a small bug fix.
> > > > > > > > > > :)
> > > > > > > > > >
> > > > > > > > > > Below is the KIP link:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Richard
> > > >
> > > > --
> > > > -- Guozhang
> >
> > --
> > -- Guozhang

--
-- Guozhang
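To make Guozhang's sizing concern above concrete, here is a minimal, self-contained Java sketch; it is not Kafka's cleaner code, and the class name, helper, and timestamps are made up for illustration. It sizes a per-record timestamp delta under the zig-zag variable-length encoding the thread discusses, before and after the batch's base timestamp is rebased to a delete horizon roughly a day later:

public class DeltaSizeSketch {

    // Number of bytes a zig-zag varlong encoding of v would occupy (1..10).
    static int varlongSize(long v) {
        long zigzag = (v << 1) ^ (v >> 63);  // map signed to unsigned; small |v| stays small
        int bytes = 1;
        while ((zigzag & ~0x7FL) != 0) {     // 7 payload bits per encoded byte
            bytes++;
            zigzag >>>= 7;
        }
        return bytes;
    }

    public static void main(String[] args) {
        long baseTimestamp = 1_571_300_000_000L;       // example batch base timestamp at append time
        long recordTimestamp = baseTimestamp + 250;    // a record written 250 ms after the base

        long deltaBefore = recordTimestamp - baseTimestamp;         // small positive delta
        long deleteHorizon = baseTimestamp + 24L * 60 * 60 * 1000;  // cleaner runs ~1 day later
        long deltaAfter = recordTimestamp - deleteHorizon;          // large negative delta after rebasing

        System.out.printf("before compaction: delta=%d -> %d byte(s)%n",
                deltaBefore, varlongSize(deltaBefore));
        System.out.printf("after rebasing to delete horizon: delta=%d -> %d byte(s)%n",
                deltaAfter, varlongSize(deltaAfter));
        // With ~512 records per batch, 2 extra bytes per delta is roughly 1 KB more per batch.
    }
}

For these example values the encoded delta grows from 2 bytes to 4, which at about 2 extra bytes for each of ~512 records is the roughly 1 KB per batch that Guozhang estimates.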
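Similarly, for Jun's point 2 above (documenting which attributes bit carries the new flag), here is a hypothetical sketch assuming the flag were assigned bit 6, the lowest currently unused bit in the layout he pasted. The constant and method names are invented, and the actual bit is whatever the KIP ends up specifying:

public class BatchAttributesSketch {

    // Masks mirroring the layout in Jun's table (illustrative, not Kafka's real constants).
    static final int COMPRESSION_CODEC_MASK   = 0x07; // bits 0-2
    static final int TIMESTAMP_TYPE_MASK      = 0x08; // bit 3
    static final int TRANSACTIONAL_FLAG_MASK  = 0x10; // bit 4
    static final int CONTROL_FLAG_MASK        = 0x20; // bit 5
    static final int DELETE_HORIZON_FLAG_MASK = 0x40; // bit 6 (assumed new flag)

    static boolean hasDeleteHorizon(short attributes) {
        return (attributes & DELETE_HORIZON_FLAG_MASK) != 0;
    }

    static short withDeleteHorizon(short attributes) {
        return (short) (attributes | DELETE_HORIZON_FLAG_MASK);
    }

    public static void main(String[] args) {
        short attrs = 0x12;                          // e.g. transactional + snappy compression
        short rewritten = withDeleteHorizon(attrs);  // set by the cleaner when it rebases the batch
        System.out.printf("before=0x%04x after=0x%04x flag=%b%n",
                attrs, rewritten, hasDeleteHorizon(rewritten));
    }
}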