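Hi Michael,

"That whilst sending tombstone and non null value, the consumer can expect only to receive the non-null message only in step (3) is this correct?"

---> I do agree with you here. When a producer sends a tombstone together with a non-null value, a consumer will receive the non-null value only once the broker is at step (3); before that, down conversion sets the value to null.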
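To make the step (3) point concrete, here is a minimal sketch of the down conversion as described in the plan quoted in this thread. It is only an illustration under assumptions: `DownConversionSketch`, `MessageV2`, `downConvertedValue` and `valueSeenByConsumer` are hypothetical names, not Kafka classes or APIs, and the real conversion would operate on message sets rather than single objects.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical types for illustration only; none of these are real Kafka classes.
final class DownConversionSketch {

    /** A message as a magic-byte = 2 producer would send it (assumed shape). */
    static final class MessageV2 {
        final byte[] key;
        final byte[] value;      // may be non-null even when the tombstone bit is set
        final boolean tombstone; // the proposed tombstone bit

        MessageV2(byte[] key, byte[] value, boolean tombstone) {
            this.key = key;
            this.value = value;
            this.tombstone = tombstone;
        }
    }

    /** Down conversion to magic-byte = 1: a set tombstone bit becomes a null value. */
    static byte[] downConvertedValue(MessageV2 m) {
        return m.tombstone ? null : m.value;
    }

    /**
     * Value a consumer would observe, given the broker's message.format.version
     * and the highest magic byte the consumer understands.
     */
    static byte[] valueSeenByConsumer(int brokerFormat, int consumerMagic, MessageV2 sent) {
        boolean downConvert = brokerFormat < 2 || consumerMagic < 2;
        return downConvert ? downConvertedValue(sent) : sent.value;
    }

    public static void main(String[] args) {
        MessageV2 sent = new MessageV2(
                "k".getBytes(StandardCharsets.UTF_8),
                "payload".getBytes(StandardCharsets.UTF_8),
                true); // tombstone + non-null value

        // Steps (1)-(2): broker still on format 1, so the value is dropped.
        System.out.println(valueSeenByConsumer(1, 2, sent) == null);
        // Step (3): broker and consumer both on format 2, so the value survives.
        System.out.println(valueSeenByConsumer(2, 2, sent) != null);
        // Step (3) with an older consumer: down converted on fetch.
        System.out.println(valueSeenByConsumer(2, 1, sent) == null);
    }
}
```

The third case is the one where zero copy would be lost, since the broker has to rewrite the message for the older consumer.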
Becket, Ismael: can you guys review the migration plan listed above using magic byte?

Thanks,

Mayuresh

On Fri, Nov 18, 2016 at 8:58 AM, Michael Pearce <michael.pea...@ig.com> wrote:

> Many thanks for this Mayuresh. I don't have any objections.
>
> I assume we should state:
>
> That whilst sending tombstone and non null value, the consumer can expect
> only to receive the non-null message only in step (3) is this correct?
>
> Cheers
> Mike
>
>
>
> Sent using OWA for iPhone
> ________________________________________
> From: Mayuresh Gharat <gharatmayures...@gmail.com>
> Sent: Thursday, November 17, 2016 5:18:41 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
>
> Hi Ismael,
>
> Thanks for the explanation.
> I especially like the part where you mentioned that we can get rid of the
> older null value support for log compaction later on, here:
>
> "We can't change semantics of the message format without having a long
> transition period. And we can't rely on people reading documentation or
> acting on a warning for something so fundamental. As such, my take is that
> we need to bump the magic byte. The good news is that we don't have to
> support all versions forever. We have said that we will support direct
> upgrades for 2 years. That means that message format version n could, in
> theory, be removed 2 years after it's introduced."
>
> Just a heads up, I would like to mention that even without bumping the
> magic byte, we will *NOT* lose zero copy, because client(x+1) in my
> explanation above will internally convert a null value to have the
> tombstone bit set, and a set tombstone bit to have a null value, and by
> the time we move to version (x+2) the clients will have upgraded.
> Obviously if we support a request from consumer(x), we will lose zero
> copy, but that is the same case with magic byte.
>
> But if the magic byte bump makes life easier for the transition, for the
> reasons that you explained above, I am OK with it since we are going to
> meet the end goal down the road :)
>
> On a side note can we update the doc here on magic byte to say that "*it
> should be bumped whenever the message format is changed or the
> interpretation of message format (usage of the reserved bits as well) is
> changed*".
>
>
> Hi Michael,
>
> Here is the update plan that we discussed offline yesterday:
>
> Currently the magic-byte which corresponds to the "message.format.version"
> is set to 1.
>
> 1) On the broker it will be set to 1 initially.
>
> 2) When a producer client sends a message with magic-byte = 2, since the
> broker is on magic-byte = 1, we will down convert it, which means if the
> tombstone bit is set, the value will be set to null. A consumer
> understanding magic-byte = 1 will still work with this. A consumer working
> with magic-byte = 2 will also be able to understand this, since it
> understands the tombstone.
> Now there is still the question of supporting a non-tombstone and null
> value from a producer client with magic-byte = 2. *(I am not sure if we
> should support this. Ismael/Becket can comment here)*
>
> 3) When almost all the clients have upgraded, the message.format.version on
> the broker can be changed to 2, so the down conversion in the above step
> will no longer happen. If at this point we get a consumer request from an
> older consumer, we might have to down convert, in which case we lose zero
> copy, but these cases should be rare.
>
> Becket can you review this plan and add more details if I have
> missed/wronged something, before we put it on KIP.
>
> Thanks,
>
> Mayuresh
>
> On Wed, Nov 16, 2016 at 11:07 PM, Michael Pearce <michael.pea...@ig.com>
> wrote:
>
> > Thanks guys, for discussing this offline and getting some consensus.
> >
> > So it's clear for myself and others what is proposed now (I think I
> > understand, but want to make sure):
> >
> > Could I ask that you either directly update the KIP to detail the
> > migration strategy, or (re-)state your offline discussed and agreed
> > migration strategy based on a magic byte in this thread?
> >
> >
> > The main original driver for the KIP was to support compaction where the
> > value isn't null, based off the discussions on the KIP-82 thread.
> >
> > We should be able to support non-tombstone + null value by the completion
> > of the KIP since, as we noted when discussing this KIP, having logic
> > based on a null value isn't very clean, and a dedicated flag also
> > separates the concerns.
> >
> > As discussed already though we can split this into KIP-87a and KIP-87b
> >
> > Where we look to deliver KIP-87a on a compacted topic (to address the
> > immediate issues)
> > * tombstone + null value
> > * tombstone + non-null value
> > * non-tombstone + non-null value
> >
> > Then, once KIP-87a is completed, we can discuss options for how we
> > support the second part, KIP-87b, to deliver:
> > * non-tombstone + null value
> >
> > Cheers
> > Mike
> >
> >
> >
> > ________________________________________
> > From: Becket Qin <becket....@gmail.com>
> > Sent: Thursday, November 17, 2016 1:43 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
> >
> > Renu, Mayuresh and I had an offline discussion, and following is a brief
> > summary.
> >
> > 1. We agreed that not bumping up the magic value may result in losing
> > zero copy during migration.
> > 2. Given that bumping up the magic value is almost free and has the
> > benefit of avoiding a potential performance issue, it is probably worth
> > doing.
> >
> > One issue we still need to think about is whether we want to support a
> > non-tombstone message with null value.
> > Currently it is not supported by Kafka. Suppose we allow a non-tombstone
> > null value message to exist after KIP-87.
> > The problem is that such a message will not be supported by consumers
> > prior to KIP-87, because a null value will always be interpreted as a
> > tombstone.
> >
> > One option is that we keep the current way, i.e. do not support such a
> > message. It would be good to know if there is a concrete use case for
> > such a message. If there is not, we can probably just not support it.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Wed, Nov 16, 2016 at 1:28 PM, Mayuresh Gharat
> > <gharatmayures...@gmail.com> wrote:
> >
> > > Hi Ismael,
> > >
> > > This is something I can think of for the migration plan.
> > > The migration plan can look something like this, with up conversion:
> > >
> > > 1) Currently let's say we have Broker at version x.
> > > 2) Currently we have clients at version x.
> > > 3) a) We move the version to Broker(x+1) : supports both tombstone and
> > >    null for log compaction.
> > >    b) We upgrade the client to version client(x+1) : if in the producer
> > >    client(x+1) the value is set to null, we will automatically set the
> > >    Tombstone bit internally. If the producer client(x+1) sets the
> > >    tombstone itself, well and good. For producer client(x), the broker
> > >    will up convert to have the tombstone bit. Broker(x+1) is supporting
> > >    both. Consumer client(x+1) will be aware of this and should be able
> > >    to handle this. For consumer client(x) we will down convert the
> > >    message on the broker side.
> > >    c) At this point we will have to issue a warning or clearly specify
> > >    in docs that this behavior is about to be changed for log compaction.
> > > 4) a) In the next release of the Broker(x+2), we say that only Tombstone
> > >    is used for log compaction on the Broker side. Clients(x+1) are
> > >    still supported.
> > >    b) We upgrade the client to version client(x+2) : if value is set to
> > >    null, tombstone will not be set automatically.
> > >    The client will have to call setTombstone() to actually set the
> > >    tombstone.
> > >
> > > We should compare this migration plan with the migration plan for magic
> > > byte bump and do whatever looks good.
> > > I am just worried that if we go down the magic byte route, unless I am
> > > missing something, it sounds like Kafka will be stuck with supporting
> > > both null value and tombstone bit for log compaction forever, which
> > > does not look like a good end state.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > >
> > >
> > > On Wed, Nov 16, 2016 at 9:32 AM, Mayuresh Gharat
> > > <gharatmayures...@gmail.com> wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > That's a very good point which I might not have considered earlier.
> > > >
> > > > Here is a plan that I can think of:
> > > >
> > > > Stage 1) The broker from now on up converts the message to have the
> > > > tombstone marker. The log compaction thread does log compaction based
> > > > on both null and tombstone marker. This is our transition period.
> > > > Stage 2) In the next release we say that log compaction is based only
> > > > on the tombstone marker. (Open source Kafka makes this a policy.) By
> > > > this time, the organization which is moving to this release will be
> > > > sure that they have gone through the entire transition period.
> > > >
> > > > My only goal of doing this is that Kafka clearly specifies the end
> > > > state about what log compaction means (is it null value or a
> > > > tombstone marker, but not both).
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > > On Wed, Nov 16, 2016 at 9:17 AM, Ismael Juma <ism...@juma.me.uk>
> > > > wrote:
> > > >
> > > >> One comment below.
> > > >>
> > > >> On Wed, Nov 16, 2016 at 5:08 PM, Mayuresh Gharat
> > > >> <gharatmayures...@gmail.com> wrote:
> > > >>
> > > >> >    - If we don't bump up the magic byte, on the broker side, the
> > > >> >    broker will always have to look at both the tombstone bit and
> > > >> >    the value when doing the compaction. Assuming we do not bump up
> > > >> >    the magic byte, imagine the broker sees a message which does
> > > >> >    not have a tombstone bit set. The broker does not know when the
> > > >> >    message was produced (i.e. whether the message has been up
> > > >> >    converted or not), so it has to take a further look at the
> > > >> >    value to see if it is null or not in order to determine if it
> > > >> >    is a tombstone. The same logic has to be put on the consumer as
> > > >> >    well because the consumer does not know if the message has been
> > > >> >    up converted or not.
> > > >> >       - If we upconvert while appending, this is not the case,
> > > >> >       right?
> > > >>
> > > >>
> > > >> If I understand you correctly, this is not sufficient because the
> > > >> log may have messages appended before it was upgraded to include
> > > >> KIP-87.
> > > >>
> > > >> Ismael
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > -Regards,
> > > > Mayuresh R. Gharat
> > > > (862) 250-7125
> > > >
> > >
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> > >
> > The information contained in this email is strictly confidential and for
> > the use of the addressee only, unless otherwise indicated. If you are not
> > the intended recipient, please do not read, copy, use or disclose to
> > others this message or any attachment. Please also notify the sender by
> > replying to this email or by telephone (+44(020 7896 0011) and then
> > delete the email and any copies of it.
> > Opinions, conclusion (etc) that do not relate to the official business
> > of this company shall be understood as neither given nor endorsed by it.
> > IG is a trading name of IG Markets Limited (a company registered in
> > England and Wales, company number 04008957) and IG Index Limited (a
> > company registered in England and Wales, company number 01190902).
> > Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R
> > 2YA. Both IG Markets Limited (register number 195355) and IG Index
> > Limited (register number 114059) are authorised and regulated by the
> > Financial Conduct Authority.
> >
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
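
For reference, the compaction check debated in the quoted thread above (whether the broker must look at both the tombstone bit and the null value) can be sketched roughly as follows. This is only a sketch under assumptions: `TombstoneCheckSketch`, its method names and the constant `MAGIC_WITH_TOMBSTONE_BIT` are hypothetical and not part of Kafka, and the value 2 simply follows the magic-byte = 2 used in the plan.

```java
// Hypothetical helper for illustration only; not part of Kafka.
final class TombstoneCheckSketch {

    // Assumed: the first magic byte whose format carries a tombstone bit.
    static final byte MAGIC_WITH_TOMBSTONE_BIT = 2;

    /**
     * No magic byte bump: an unset bit may just mean the message was written
     * before the upgrade, so the value has to be checked as well.
     */
    static boolean isTombstoneWithoutBump(boolean tombstoneBit, byte[] value) {
        return tombstoneBit || value == null;
    }

    /** With a magic byte bump: the magic byte tells us which rule applies. */
    static boolean isTombstoneWithBump(byte magic, boolean tombstoneBit, byte[] value) {
        if (magic >= MAGIC_WITH_TOMBSTONE_BIT) {
            return tombstoneBit; // new format: only the bit matters
        }
        return value == null;    // old format: a null value means delete
    }

    public static void main(String[] args) {
        // Non-tombstone + null value, the KIP-87b case.
        System.out.println(isTombstoneWithoutBump(false, null));
        System.out.println(isTombstoneWithBump((byte) 2, false, null));
        System.out.println(isTombstoneWithBump((byte) 1, false, null));
    }
}
```

Under these assumptions, only the check keyed on the magic byte can tell a non-tombstone message with a null value apart from a tombstone, which is the case Becket raised for consumers prior to KIP-87.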