It's true that the size of a tombstone or a control marker could increase
after a round of cleaning. This can already happen with re-compression. The
log cleaner already has logic to auto-grow the read/write buffer during
cleaning, so it can accommodate an oversized message. The only potential
issue is that the estimation of the output segment size assumes that a
batch doesn't grow during cleaning, which could cause the output segment to
overflow the 2GB size limit. However, since the default log segment size is
1GB, this is unlikely to be an issue in practice.
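
For intuition, here is a back-of-the-envelope sketch (hypothetical names,
not the actual LogCleaner code) of that estimation caveat: segments are
sized for cleaning from their pre-clean bytes, so post-clean batch growth
is invisible to the estimate and could, in principle, push an output
segment past the 2GB (Integer.MAX_VALUE) cap.

    // Returns true if post-clean growth could push the output segment
    // past the 2GB segment size limit.
    static boolean mightOverflow(long preCleanEstimateBytes,
                                 long batchCount,
                                 long growthPerBatchBytes) {
        long worstCase = preCleanEstimateBytes + batchCount * growthPerBatchBytes;
        return worstCase > Integer.MAX_VALUE;
    }

    // With the default 1GB segment and, say, 2 extra bytes per batch, the
    // worst case stays far below the cap, hence "unlikely in practice":
    //   mightOverflow(1L << 30, 1_000_000, 2) == false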
Thanks,

Jun

On Thu, Oct 17, 2019 at 3:55 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks for the reply, Jason.
>
> On Thu, Oct 17, 2019 at 10:40 AM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi Guozhang,
> >
> > It's a fair point. For control records, I think it's a non-issue since
> > they are tiny and not batched. So the only case where this might matter
> > is large batch deletions. I think the size difference is not a major
> > issue in itself, but it's worth mentioning in the KIP the risk of
> > exceeding the max message size. I think the code should probably treat
> > this as more of a soft limit when cleaning. We have run into scenarios
> > in the past as well where recompression has actually increased message
> > size. We may also want to be able to up-convert messages to the new
> > format in the cleaner in the future.
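> >
> > As a rough sketch of what a soft limit could look like (a hypothetical
> > method, not Kafka's actual validation code): an oversized batch is
> > rejected on the produce path but tolerated when it is the cleaner itself
> > that grew the batch, e.g. via recompression.
> >
> >     static void validateBatchSize(int batchSizeBytes, int maxMessageBytes,
> >                                   boolean fromCleaner) {
> >         // Hard limit for producers; soft limit for the cleaner, which
> >         // grows its read/write buffers instead of failing the round.
> >         if (batchSizeBytes > maxMessageBytes && !fromCleaner)
> >             throw new IllegalArgumentException("batch of " + batchSizeBytes
> >                 + " bytes exceeds max.message.bytes=" + maxMessageBytes);
> >     }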
> >
> > -Jason
> >
> >
> >
> > On Thu, Oct 17, 2019 at 9:08 AM Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > Here's my understanding: when log compaction kicks in, the system
> > > time at that moment would be larger than the timestamps of the
> > > messages being compacted, so the modification of the batch timestamp
> > > would practically increase its value, and hence the deltas for the
> > > inner messages would become negative to maintain their actual
> > > timestamps. Depending on the time difference between a message's
> > > actual timestamp and the time when log compaction happens, this
> > > negative delta can be large or small, since it depends not only on
> > > the cleaner thread's wakeup frequency but also on the dirty ratio,
> > > etc.
> > >
> > > With varint encoding, the number of bytes needed to encode an int
> > > varies from 1 to 5. Before compaction, the deltas should be
> > > relatively small positive values compared with the base timestamp,
> > > and hence most likely need 1 or 2 bytes to encode; after compaction,
> > > the deltas could be relatively large negative values that take more
> > > bytes to encode. With a batch of 512 records, if after compaction
> > > each record takes 2 more bytes to encode its delta, that is about 1KB
> > > more per batch. Usually this would not be too big an issue with
> > > reasonably sized messages, but I just wanted to point it out as a
> > > potential regression.
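> > >
> > > To make the back-of-the-envelope concrete, here is a small sketch of
> > > zigzag varint sizing (the scheme the record format uses for deltas;
> > > an int version for simplicity, since the actual timestamp delta is a
> > > varlong, and the method name is illustrative, not Kafka's API):
> > >
> > >     // Bytes needed to encode v as a zigzag varint (1 to 5 for an int).
> > >     static int zigZagVarintSize(int v) {
> > >         int zz = (v << 1) ^ (v >> 31);   // zigzag: small magnitude -> small code
> > >         int bytes = 1;
> > >         while ((zz & 0xFFFFFF80) != 0) { // 7 payload bits per byte
> > >             zz >>>= 7;
> > >             bytes++;
> > >         }
> > >         return bytes;
> > >     }
> > >
> > >     // Before compaction: a small positive delta, e.g. 500 ms:
> > >     //   zigZagVarintSize(500)         == 2
> > >     // After compaction: a record a day older than the delete horizon:
> > >     //   zigZagVarintSize(-86_400_000) == 4
> > >     // 2 extra bytes x 512 records is the ~1KB per batch noted above.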
> > >
> > >
> > > Guozhang
> > >
> > > On Wed, Oct 16, 2019 at 9:36 PM Richard Yu <yohan.richard...@gmail.com>
> > > wrote:
> > >
> > > > Hi Guozhang,
> > > >
> > > > Your understanding is basically on point.
> > > >
> > > > I haven't looked into the details of what happens if we change the
> > > > base timestamp and how it's calculated, so I'm not clear on how
> > > > small (or big) the delta is.
> > > > To be fair, would the delta size pose a big problem if it takes up
> > > > more bytes to encode?
> > > >
> > > > Cheers,
> > > > Richard
> > > >
> > > > On Wed, Oct 16, 2019 at 7:36 PM Guozhang Wang <wangg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Richard,
> > > > >
> > > > > Thanks for the KIP, I just have one clarification regarding "So
> > > > > the idea is to set the base timestamp to the delete horizon and
> > > > > adjust the deltas accordingly." My understanding is that during
> > > > > compaction, for each compacted new segment, we would set the base
> > > > > timestamp of each batch to the delete horizon, which is the
> > > > > "current system time that the cleaner has seen so far", and adjust
> > > > > the delta timestamps of each of the inner records of the batch
> > > > > (and practically the deltas will all be negative)?
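> > > > >
> > > > > If it helps, a minimal sketch of that rewrite as I understand it
> > > > > (hypothetical names, not the actual cleaner code): the invariant
> > > > > is that base + delta stays the record's actual timestamp, so
> > > > > moving the base forward to the delete horizon makes the deltas
> > > > > negative.
> > > > >
> > > > >     static long[] adjustDeltas(long oldBase, long[] oldDeltas,
> > > > >                                long deleteHorizonMs) {
> > > > >         long[] newDeltas = new long[oldDeltas.length];
> > > > >         for (int i = 0; i < oldDeltas.length; i++)
> > > > >             // oldBase + oldDeltas[i] == deleteHorizonMs + newDeltas[i]
> > > > >             newDeltas[i] = oldBase + oldDeltas[i] - deleteHorizonMs;
> > > > >         return newDeltas;
> > > > >     }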
> > > > >
> > > > > If that's the case, could we do some back-of-the-envelope
> > > > > calculation on the smallest possible deltas? Note that since we
> > > > > use varints for the delta values of each record, the more negative
> > > > > the delta, the more bytes it would take to encode.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > > On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson <ja...@confluent.io>
> > > > > > wrote:
> > > > >
> > > > > > +1. Thanks, Richard.
> > > > > >
> > > > > > On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <yohan.richard...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I want to get this KIP wrapped up, so it would be great if we
> > > > > > > could get some votes.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Richard
> > > > > > >
> > > > > > > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <j...@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi, Richard,
> > > > > > > >
> > > > > > > > Thanks for the updated KIP. +1 from me.
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <yohan.richard...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > I've updated the link accordingly. :)
> > > > > > > > > Here is the updated KIP link:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Richard
> > > > > > > > >
> > > > > > > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <j...@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Richard,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP. Looks good to me overall. A few
> > > > > > > > > > minor comments below.
> > > > > > > > > >
> > > > > > > > > > 1. Could you change the title from "Retain tombstones" to
> > > > > > > > > > "Retain tombstones and transaction markers" to make it
> > > > > > > > > > more general?
> > > > > > > > > >
> > > > > > > > > > 2. Could you document which bit in the batch attribute
> > > > > > > > > > will be used for the new flag? The current format of the
> > > > > > > > > > batch attribute is the following.
> > > > > > > > > >
> > > > > > > > > > -------------------------------------------------------------------------------------------------
> > > > > > > > > > | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > > > > > > > > -------------------------------------------------------------------------------------------------
> > > > > > > > > >
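> > > > > > > > > > For illustration, the masks for the existing bits, plus a
> > > > > > > > > > hypothetical mask for the new flag at bit 6 (the first
> > > > > > > > > > unused bit; the KIP itself should pin down the exact bit):
> > > > > > > > > >
> > > > > > > > > >     static final int COMPRESSION_CODEC_MASK   = 0x07; // bits 0-2
> > > > > > > > > >     static final int TIMESTAMP_TYPE_MASK      = 0x08; // bit 3
> > > > > > > > > >     static final int TRANSACTIONAL_FLAG_MASK  = 0x10; // bit 4
> > > > > > > > > >     static final int CONTROL_FLAG_MASK        = 0x20; // bit 5
> > > > > > > > > >     // Hypothetical placement for the new delete-horizon flag:
> > > > > > > > > >     static final int DELETE_HORIZON_FLAG_MASK = 0x40; // bit 6
> > > > > > > > > >
> > > > > > > > > >     static boolean hasDeleteHorizon(int attributes) {
> > > > > > > > > >         return (attributes & DELETE_HORIZON_FLAG_MASK) != 0;
> > > > > > > > > >     }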
> > > > > > > > > > 3. Could you provide the reasons for the rejected
> > > > > > > > > > proposals? For proposal 1, one reason is that it doesn't
> > > > > > > > > > cover the transaction markers. For proposal 2, one reason
> > > > > > > > > > is that the internal record header could be exposed to
> > > > > > > > > > the clients.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <yohan.richard...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > The discussion for KIP-534 seems to have concluded.
> > > > > > > > > > > So I wish to vote this in so that we can get it done.
> > > > > > > > > > > It's a small bug fix. :)
> > > > > > > > > > >
> > > > > > > > > > > Below is the KIP link:
> > > > > > > > > > >
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Richard
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>
