Thanks for the reply Jason.

On Thu, Oct 17, 2019 at 10:40 AM Jason Gustafson <ja...@confluent.io> wrote:
> Hi Guozhang,
>
> It's a fair point. For control records, I think it's a non-issue since they are tiny and not batched. So the only case where this might matter is large batch deletions. I think the size difference is not a major issue in itself, but I think it's worth mentioning in the KIP the risk of exceeding the max message size. The code should probably treat this as more of a soft limit when cleaning. We have run into scenarios in the past as well where recompression has actually increased message size. We may also want to be able to upconvert messages to the new format in the cleaner in the future.
>
> -Jason
>
> On Thu, Oct 17, 2019 at 9:08 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Here's my understanding: when log compaction kicks in, the system time at that moment would be larger than the timestamps of the messages being compacted, so modifying the batch timestamp would in practice increase its value, and hence the deltas for each inner message would become negative in order to preserve their actual timestamps. Depending on the gap between a message's actual timestamp and the time when log compaction happens, this negative delta can be large or small, since it depends not only on the cleaner thread's wakeup frequency but also on the dirty ratio, etc.
> >
> > With varint encoding, the number of bytes needed to encode an int varies from 1 to 5; before compaction, the deltas should be relatively small positive values compared with the base timestamp, and hence most likely need only 1 or 2 bytes to encode, while after compaction the deltas could be relatively large negative values that take more bytes to encode. With a record batch of 512 records in practice, and supposing each record needs 2 more bytes for its delta after compaction, that would be about 1 KB more per batch. Usually this would not be too big an issue with reasonably sized messages, but I just wanted to point it out as a potential regression.
> >
> > Guozhang
> >
> > On Wed, Oct 16, 2019 at 9:36 PM Richard Yu <yohan.richard...@gmail.com> wrote:
> >
> > > Hi Guozhang,
> > >
> > > Your understanding is basically on point.
> > >
> > > I haven't looked into the details of what happens if we change the base timestamp and how it's calculated, so I'm not clear on how small (or big) the delta is. To be fair, would the delta size pose a big problem if it takes up more bytes to encode?
> > >
> > > Cheers,
> > > Richard
> > >
> > > On Wed, Oct 16, 2019 at 7:36 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Hello Richard,
> > > >
> > > > Thanks for the KIP. I just have one clarification regarding "So the idea is to set the base timestamp to the delete horizon and adjust the deltas accordingly." My understanding is that during compaction, for each compacted new segment, we would set the base timestamp of each batch to the delete horizon, which is the "current system time that the cleaner has seen so far", and adjust the delta timestamps of each of the inner records of the batch (and in practice the deltas will all be negative)?
> > > >
> > > > If that's the case, could we do some back-of-the-envelope calculation on what the smallest possible deltas are? Note that since we use varints for the per-record delta values, the more negative the delta, the more bytes it takes to encode.
> > > >
> > > > Guozhang
> > > >
> > > > On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson <ja...@confluent.io> wrote:
> > > >
> > > > > +1. Thanks Richard.
> > > > >
> > > > > On Wed, Oct 16, 2019 at 10:04 AM Richard Yu <yohan.richard...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Want to try to get this KIP wrapped up, so it would be great if we can get some votes.
> > > > > >
> > > > > > Cheers,
> > > > > > Richard
> > > > > >
> > > > > > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <j...@confluent.io> wrote:
> > > > > >
> > > > > > > Hi, Richard,
> > > > > > >
> > > > > > > Thanks for the updated KIP. +1 from me.
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu <yohan.richard...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > I've updated the link accordingly. :)
> > > > > > > > Here is the updated KIP link:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Richard
> > > > > > > >
> > > > > > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <j...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > Hi, Richard,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Looks good to me overall. A few minor comments below.
> > > > > > > > >
> > > > > > > > > 1. Could you change the title from "Retain tombstones" to "Retain tombstones and transaction markers" to make it more general?
> > > > > > > > >
> > > > > > > > > 2. Could you document which bit in the batch attributes will be used for the new flag? The current format of the batch attributes is the following.
> > > > > > > > >
> > > > > > > > >  -------------------------------------------------------------------------------------------------
> > > > > > > > > | Unused (6-15) | Control (5) | Transactional (4) | Timestamp Type (3) | Compression Type (0-2) |
> > > > > > > > >  -------------------------------------------------------------------------------------------------
> > > > > > > > >
> > > > > > > > > 3. Could you provide the reasons for the rejected proposals? For proposal 1, one reason is that it doesn't cover the transaction markers. For proposal 2, one reason is that the internal record header could be exposed to the clients.
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu <yohan.richard...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > The discussion for KIP-534 seems to have concluded, so I wish to vote it in so that we can get it done. It's a small bug fix.
> > > > > > > > > > :)
> > > > > > > > > >
> > > > > > > > > > Below is the KIP link:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Richard
> > > >
> > > > --
> > > > -- Guozhang
> >
> > --
> > -- Guozhang

--
-- Guozhang
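To make Guozhang's sizing concern above concrete, here is a minimal, self-contained Java sketch; it is not Kafka's cleaner code, and the class name, helper, and timestamps are made up for illustration. It sizes a per-record timestamp delta under the zig-zag variable-length encoding the thread discusses, before and after the batch's base timestamp is rebased to a delete horizon roughly a day later:

public class DeltaSizeSketch {

    // Number of bytes a zig-zag varlong encoding of v would occupy (1..10).
    static int varlongSize(long v) {
        long zigzag = (v << 1) ^ (v >> 63);  // map signed to unsigned; small |v| stays small
        int bytes = 1;
        while ((zigzag & ~0x7FL) != 0) {     // 7 payload bits per encoded byte
            bytes++;
            zigzag >>>= 7;
        }
        return bytes;
    }

    public static void main(String[] args) {
        long baseTimestamp = 1_571_300_000_000L;       // example batch base timestamp at append time
        long recordTimestamp = baseTimestamp + 250;    // a record written 250 ms after the base

        long deltaBefore = recordTimestamp - baseTimestamp;         // small positive delta
        long deleteHorizon = baseTimestamp + 24L * 60 * 60 * 1000;  // cleaner runs ~1 day later
        long deltaAfter = recordTimestamp - deleteHorizon;          // large negative delta after rebasing

        System.out.printf("before compaction: delta=%d -> %d byte(s)%n",
                deltaBefore, varlongSize(deltaBefore));
        System.out.printf("after rebasing to delete horizon: delta=%d -> %d byte(s)%n",
                deltaAfter, varlongSize(deltaAfter));
        // With ~512 records per batch, 2 extra bytes per delta is roughly 1 KB more per batch.
    }
}

For these example values the encoded delta grows from 2 bytes to 4, which at about 2 extra bytes for each of ~512 records is the roughly 1 KB per batch that Guozhang estimates.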
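Similarly, for Jun's point 2 above (documenting which attributes bit carries the new flag), here is a hypothetical sketch assuming the flag were assigned bit 6, the lowest currently unused bit in the layout he pasted. The constant and method names are invented, and the actual bit is whatever the KIP ends up specifying:

public class BatchAttributesSketch {

    // Masks mirroring the layout in Jun's table (illustrative, not Kafka's real constants).
    static final int COMPRESSION_CODEC_MASK   = 0x07; // bits 0-2
    static final int TIMESTAMP_TYPE_MASK      = 0x08; // bit 3
    static final int TRANSACTIONAL_FLAG_MASK  = 0x10; // bit 4
    static final int CONTROL_FLAG_MASK        = 0x20; // bit 5
    static final int DELETE_HORIZON_FLAG_MASK = 0x40; // bit 6 (assumed new flag)

    static boolean hasDeleteHorizon(short attributes) {
        return (attributes & DELETE_HORIZON_FLAG_MASK) != 0;
    }

    static short withDeleteHorizon(short attributes) {
        return (short) (attributes | DELETE_HORIZON_FLAG_MASK);
    }

    public static void main(String[] args) {
        short attrs = 0x12;                          // e.g. transactional + snappy compression
        short rewritten = withDeleteHorizon(attrs);  // set by the cleaner when it rebases the batch
        System.out.printf("before=0x%04x after=0x%04x flag=%b%n",
                attrs, rewritten, hasDeleteHorizon(rewritten));
    }
}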