Thanks Richard, I'm +1 on the KIP On Thu, Oct 17, 2019 at 3:51 PM Richard Yu <yohan.richard...@gmail.com> wrote:
> Hi Guozhang, Jason, > > I've updated the KIP to include this warning as well. > If there is anything else that we need for it, let me know. :) > Otherwise, we should vote this KIP in. > > Cheers, > Richard > > On Thu, Oct 17, 2019 at 10:41 AM Jason Gustafson <ja...@confluent.io> > wrote: > > > Hi Guozhang, > > > > It's a fair point. For control records, I think it's a non-issue since > they > > are tiny and not batched. So the only case where this might matter is > large > > batch deletions. I think the size difference is not a major issue itself, > > but I think it's worth mentioning in the KIP the risk of exceeding the > max > > message size. I think the code should probably make this more of a soft > > limit when cleaning. We have run into scenarios in the past as well where > > recompression has actually increased message size. We may also want to be > > able to upconvert messages to the new format in the future in the > cleaner. > > > > -Jason > > > > > > > > On Thu, Oct 17, 2019 at 9:08 AM Guozhang Wang <wangg...@gmail.com> > wrote: > > > > > Here's my understanding: when log compaction kicks in, the system time > at > > > the moment would be larger than the message timestamp to be compacted, > so > > > the modification on the batch timestamp would practically be increasing > > its > > > value, and hence the deltas for each inner message would be negative to > > > maintain their actual timestamp. Depending on the time diff between the > > > actual timestamp of the message and the time when log compaction > happens, > > > this negative delta can be large or small since it not long depends on > > the > > > cleaner thread wakeup frequency but also dirty ratio etc. > > > > > > With varInt encoding, the num.bytes needed for encode an int varies > from > > 1 > > > to 5 bytes; before compaction, the deltas should be relatively small > > > positive values compared with the base timestamp, and hence most > likely 1 > > > or 2 bytes needed to encode, after compaction, the deltas could be > > > relatively large negative values that may take more bytes to encode. > > With a > > > record batch of 512 in practice, and suppose after compaction each > record > > > would take 2 more byte for encoding deltas, that would be 1K more per > > > batch. Usually it would not be too big of an issue with reasonable > sized > > > message, but I just wanted to point out this as a potential regression. > > > > > > > > > Guozhang > > > > > > On Wed, Oct 16, 2019 at 9:36 PM Richard Yu <yohan.richard...@gmail.com > > > > > wrote: > > > > > > > Hi Guozhang, > > > > > > > > Your understanding basically is on point. > > > > > > > > I haven't looked into the details for what happens if we change the > > base > > > > timestamp and how its calculated, so I'm not clear on how small the > > delta > > > > (or big) is. > > > > To be fair, would the the delta size pose a big problem if it takes > up > > > more > > > > bytes to encode? > > > > > > > > Cheers, > > > > Richard > > > > > > > > On Wed, Oct 16, 2019 at 7:36 PM Guozhang Wang <wangg...@gmail.com> > > > wrote: > > > > > > > > > Hello Richard, > > > > > > > > > > Thanks for the KIP, I just have one clarification regarding "So the > > > idea > > > > is > > > > > to set the base timestamp to the delete horizon and adjust the > deltas > > > > > accordingly." My understanding is that during compaction, for each > > > > > compacted new segment, we would set its base offset of each batch > as > > > the > > > > > delete horizon, which is the "current system time that cleaner has > > seen > > > > so > > > > > far", and adjust the delta timestamps of each of the inner records > of > > > the > > > > > batch (and practically the deltas will be all negative)? > > > > > > > > > > If that's case, could we do some back of the envelope calculation > on > > > > what's > > > > > the possible smallest case of deltas? Note that since we use varInt > > for > > > > > delta values for each record, the smaller the negative delta, that > > > would > > > > > take more bytes to encode. > > > > > > > > > > Guozhang > > > > > > > > > > On Wed, Oct 16, 2019 at 6:48 PM Jason Gustafson < > ja...@confluent.io> > > > > > wrote: > > > > > > > > > > > +1. Thanks Richard. > > > > > > > > > > > > On Wed, Oct 16, 2019 at 10:04 AM Richard Yu < > > > > yohan.richard...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > Want to try to get this KIP wrapped up. So it would be great if > > we > > > > can > > > > > > get > > > > > > > some votes. > > > > > > > > > > > > > > Cheers, > > > > > > > Richard > > > > > > > > > > > > > > On Tue, Oct 15, 2019 at 12:58 PM Jun Rao <j...@confluent.io> > > wrote: > > > > > > > > > > > > > > > Hi, Richard, > > > > > > > > > > > > > > > > Thanks for the updated KIP. +1 from me. > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > On Tue, Oct 15, 2019 at 12:46 PM Richard Yu < > > > > > > yohan.richard...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Jun, > > > > > > > > > > > > > > > > > > I've updated the link accordingly. :) > > > > > > > > > Here is the updated KIP link: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Richard > > > > > > > > > > > > > > > > > > On Mon, Oct 14, 2019 at 5:12 PM Jun Rao <j...@confluent.io> > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, Richard, > > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good to me overall. A few minor > > > > > comments > > > > > > > > below. > > > > > > > > > > > > > > > > > > > > 1. Could you change the title from "Retain tombstones" to > > > > "Retain > > > > > > > > > > tombstones and transaction markers" to make it more > > general? > > > > > > > > > > > > > > > > > > > > 2. Could you document which bit in the batch attribute > will > > > be > > > > > used > > > > > > > for > > > > > > > > > the > > > > > > > > > > new flag? The current format of the batch attribute is > the > > > > > > following. > > > > > > > > > > > > > > > > > > > > * > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------------------------- > > > > > > > > > > * | Unused (6-15) | Control (5) | Transactional (4) | > > > > Timestamp > > > > > > Type > > > > > > > > > > (3) | Compression Type (0-2) | > > > > > > > > > > > > > > > > > > > > * > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > 3. Could you provide the reasons for the rejected > > proposals? > > > > For > > > > > > > > > > proposal 1, one reason is that it doesn't cover the > > > transaction > > > > > > > > > > markers. For proposal 2, one reason is that the interval > > > record > > > > > > > header > > > > > > > > > > could be exposed to the clients. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 14, 2019 at 4:42 PM Richard Yu < > > > > > > > yohan.richard...@gmail.com > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > The discussion for KIP-534 seems to have concluded. > > > > > > > > > > > So I wish to vote this in so that we can get it done. > > Its a > > > > > small > > > > > > > bug > > > > > > > > > > fix. > > > > > > > > > > > :) > > > > > > > > > > > > > > > > > > > > > > Below is the KIP link: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+for+approximately+delete.retention.ms+milliseconds > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > Richard > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > -- Guozhang > > > > > > > > > > > > > > > > > > -- > > > -- Guozhang > > > > > > -- -- Guozhang