Hi Michael,

I have updated the migration section of the KIP. Can you please take a look?

Thanks,

Mayuresh

On Fri, Nov 18, 2016 at 9:07 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

> Hi Michael,
>
> While sending a tombstone with a non-null value, the consumer can expect to
> receive the non-null value only in step (3). Is this correct?
> ---> I do agree with you here.
>
> Becket, Ismael: can you guys review the migration plan listed above using
> the magic byte?
>
> Thanks,
>
> Mayuresh
>
> On Fri, Nov 18, 2016 at 8:58 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>
>> Many thanks for this, Mayuresh. I don't have any objections.
>>
>> I assume we should state:
>>
>> While sending a tombstone with a non-null value, the consumer can expect to
>> receive the non-null value only in step (3). Is this correct?
>>
>> Cheers
>> Mike
>>
>> ________________________________________
>> From: Mayuresh Gharat <gharatmayures...@gmail.com>
>> Sent: Thursday, November 17, 2016 5:18:41 PM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
>>
>> Hi Ismael,
>>
>> Thanks for the explanation.
>> I especially like the part where you mentioned that we can get rid of the
>> older null-value support for log compaction later on:
>>
>> We can't change semantics of the message format without having a long
>> transition period. And we can't rely on people reading documentation or
>> acting on a warning for something so fundamental. As such, my take is that
>> we need to bump the magic byte. The good news is that we don't have to
>> support all versions forever. We have said that we will support direct
>> upgrades for 2 years. That means that message format version n could, in
>> theory, be removed 2 years after it's introduced.
>>
>> Just a heads up: even without bumping the magic byte, we will *NOT* lose
>> zero copy, because client(x+1) in my explanation above will automatically
>> convert a null value into a set tombstone bit, and a set tombstone bit into
>> a null value, internally, and by the time we move to version (x+2), the
>> clients will have upgraded. Obviously if we support a request from
>> consumer(x), we will lose zero copy, but that is the same case with the
>> magic byte.
>>
>> But if a magic byte bump makes the transition easier for the reasons you
>> explained above, I am OK with it, since we are going to meet the end goal
>> down the road :)
>>
>> On a side note, can we update the doc here on the magic byte to say that
>> "*it should be bumped whenever the message format is changed or the
>> interpretation of the message format (usage of the reserved bits as well)
>> is changed*"?
>>
>>
>> Hi Michael,
>>
>> Here is the updated plan that we discussed offline yesterday:
>>
>> Currently the magic byte, which corresponds to "message.format.version",
>> is set to 1.
>>
>> 1) On the broker it will be set to 1 initially.
>>
>> 2) When a producer client sends a message with magic byte = 2, since the
>> broker is on magic byte = 1, we will down convert it, which means that if
>> the tombstone bit is set, the value will be set to null. A consumer that
>> understands magic byte = 1 will still work with this. A consumer working
>> with magic byte = 2 will also be able to understand this, since it
>> understands the tombstone.
>> There is still the question of supporting a non-tombstone with a null
>> value from a producer client with magic byte = 2. *(I am not sure if we
>> should support this. Ismael/Becket can comment here.)*
>>
>> 3) When almost all the clients have upgraded, message.format.version on
>> the broker can be changed to 2, at which point the down conversion in the
>> above step will no longer happen.
>> If at this point we get a request from an older consumer, we might have to
>> down convert, in which case we lose zero copy, but these cases should be
>> rare.
>>
>> Becket, can you review this plan and add more details if I have missed
>> something or gotten something wrong, before we put it in the KIP?
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Wed, Nov 16, 2016 at 11:07 PM, Michael Pearce <michael.pea...@ig.com> wrote:
>>
>> > Thanks guys, for discussing this offline and getting some consensus.
>> >
>> > So that it's clear to me and others what is now proposed (I think I
>> > understand, but want to make sure), could I ask you to either update the
>> > KIP directly with the migration strategy, or (re-)state in this thread
>> > the magic-byte-based migration strategy that you discussed and agreed
>> > offline?
>> >
>> >
>> > The main original driver for the KIP was to support compaction where the
>> > value isn't null, based on the discussions in the KIP-82 thread.
>> >
>> > We should be able to support non-tombstone + null value by the
>> > completion of the KIP. As we noted when discussing this KIP, having logic
>> > based on a null value isn't very clean, and a separate flag also
>> > separates the concerns.
>> >
>> > As discussed already, though, we can split this into KIP-87a and KIP-87b,
>> > where we look to deliver KIP-87a on a compacted topic (to address the
>> > immediate issues):
>> > * tombstone + null value
>> > * tombstone + non-null value
>> > * non-tombstone + non-null value
>> >
>> > Then, once KIP-87a is completed, we can discuss the options for how we
>> > support the second part, KIP-87b, to deliver:
>> > * non-tombstone + null value
>> >
>> > Cheers
>> > Mike
>> >
>> >
>> >
>> > ________________________________________
>> > From: Becket Qin <becket....@gmail.com>
>> > Sent: Thursday, November 17, 2016 1:43 AM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
>> >
>> > Renu, Mayuresh and I had an offline discussion; the following is a brief
>> > summary.
>> >
>> > 1. We agreed that not bumping up the magic value may result in losing
>> > zero copy during migration.
>> > 2. Given that bumping up the magic value is almost free and has the
>> > benefit of avoiding a potential performance issue, it is probably worth
>> > doing.
>> >
>> > One issue we still need to think about is whether we want to support a
>> > non-tombstone message with a null value.
>> > Currently this is not supported by Kafka. If we allow a non-tombstone
>> > message with a null value to exist after KIP-87, the problem is that such
>> > a message will not be supported by consumers prior to KIP-87, because a
>> > null value will always be interpreted as a tombstone.
>> >
>> > One option is to keep the current behavior, i.e. not support such a
>> > message. It would be good to know if there is a concrete use case for
>> > such a message. If there is not, we can probably just not support it.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> >
>> > On Wed, Nov 16, 2016 at 1:28 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>> >
>> > > Hi Ismael,
>> > >
>> > > This is something I can think of for the migration plan, with up
>> > > conversion:
>> > >
>> > > 1) Currently, let's say we have the broker at version x.
>> > > 2) Currently we have clients at version x.
>> > > 3) a) We move the broker to version Broker(x+1): it supports both the
>> > > tombstone and null for log compaction.
>> > > b) We upgrade the client to version client(x+1): if the value is set to
>> > > null in the producer client(x+1), we will automatically set the
>> > > tombstone bit internally. If the producer client(x+1) sets the tombstone
>> > > itself, well and good. For producer client(x), the broker will up
>> > > convert to have the tombstone bit. Broker(x+1) supports both. Consumer
>> > > client(x+1) will be aware of this and should be able to handle it. For
>> > > consumer client(x), we will down convert the message on the broker side.
>> > > c) At this point we will have to give a warning, or clearly state in
>> > > the docs, that this behavior is about to change for log compaction.
>> > > 4) a) In the next broker release, Broker(x+2), we say that only the
>> > > tombstone is used for log compaction on the broker side. Clients(x+1)
>> > > are still supported.
>> > > b) We upgrade the client to version client(x+2): if the value is set to
>> > > null, the tombstone will not be set automatically. The client will have
>> > > to call setTombstone() to actually set the tombstone.
>> > >
>> > > We should compare this migration plan with the migration plan for the
>> > > magic byte bump and do whatever looks good.
>> > > I am just worried that if we go down the magic byte route, unless I am
>> > > missing something, it sounds like Kafka will be stuck supporting both a
>> > > null value and the tombstone bit for log compaction forever, which does
>> > > not look like a good end state.
>> > >
>> > > Thanks,
>> > >
>> > > Mayuresh
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Nov 16, 2016 at 9:32 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>> > >
>> > > > Hi Ismael,
>> > > >
>> > > > That's a very good point which I might not have considered earlier.
>> > > >
>> > > > Here is a plan that I can think of:
>> > > >
>> > > > Stage 1) From now on, the broker up converts the message to have the
>> > > > tombstone marker. The log compaction thread does log compaction based
>> > > > on both null and the tombstone marker. This is our transition period.
>> > > > Stage 2) In the next release we say that log compaction is based only
>> > > > on the tombstone marker (open source Kafka makes this a policy). By
>> > > > this time, an organization moving to this release will be sure that it
>> > > > has gone through the entire transition period.
>> > > >
>> > > > My only goal in doing this is that Kafka clearly specifies the end
>> > > > state of what log compaction means (is it a null value or a tombstone
>> > > > marker, but not both).
>> > > >
>> > > > What do you think?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Mayuresh
>> > > >
>> > > > On Wed, Nov 16, 2016 at 9:17 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>> > > >
>> > > >> One comment below.
>> > > >>
>> > > >> On Wed, Nov 16, 2016 at 5:08 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>> > > >>
>> > > >> > - If we don't bump up the magic byte, on the broker side, the
>> > > >> > broker will always have to look at both the tombstone bit and the
>> > > >> > value when doing the compaction.
>> > > >> > Assuming we do not bump up the magic byte, imagine the broker
>> > > >> > sees a message which does not have the tombstone bit set. The
>> > > >> > broker does not know when the message was produced (i.e. whether
>> > > >> > the message has been up converted or not), so it has to take a
>> > > >> > further look at the value to see whether it is null in order to
>> > > >> > determine if it is a tombstone. The same logic has to be put on the
>> > > >> > consumer as well, because the consumer does not know whether the
>> > > >> > message has been up converted or not.
>> > > >> > - If we up convert while appending, this is not the case, right?
>> > > >>
>> > > >>
>> > > >> If I understand you correctly, this is not sufficient because the
>> > > >> log may have messages appended before it was upgraded to include
>> > > >> KIP-87.
>> > > >>
>> > > >> Ismael
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -Regards,
>> > > > Mayuresh R. Gharat
>> > > > (862) 250-7125
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > -Regards,
>> > > Mayuresh R. Gharat
>> > > (862) 250-7125
>> > >
>> > The information contained in this email is strictly confidential and for
>> > the use of the addressee only, unless otherwise indicated. If you are not
>> > the intended recipient, please do not read, copy, use or disclose to others
>> > this message or any attachment. Please also notify the sender by replying
>> > to this email or by telephone (+44(020 7896 0011) and then delete the email
>> > and any copies of it. Opinions, conclusion (etc) that do not relate to the
>> > official business of this company shall be understood as neither given nor
>> > endorsed by it. IG is a trading name of IG Markets Limited (a company
>> > registered in England and Wales, company number 04008957) and IG Index
>> > Limited (a company registered in England and Wales, company number
>> > 01190902).
>> > Registered address at Cannon Bridge House, 25 Dowgate Hill,
>> > London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
>> > Index Limited (register number 114059) are authorised and regulated by the
>> > Financial Conduct Authority.
>> >
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
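A minimal Java sketch of the magic-byte migration plan discussed in this thread (steps 1 to 3), for illustration only. `Message`, `MAGIC_V1`, `MAGIC_V2`, `isTombstone`, `downConvertToV1` and `serveFetch` are hypothetical names. They are not Kafka's actual classes, APIs or wire format. The sketch also rejects a non-tombstone message with a null value. That is an assumption based on one option Becket raised, and the thread leaves the case open.

```java
// Hypothetical sketch of the KIP-87 magic-byte migration plan from this thread.
// Message, MAGIC_V1, MAGIC_V2 and all method names are illustrative only;
// they are not Kafka's actual classes, APIs or wire format.
public final class TombstoneMigrationSketch {

    static final byte MAGIC_V1 = 1; // current format: a null value means "tombstone"
    static final byte MAGIC_V2 = 2; // proposed format: explicit tombstone bit

    // Illustrative message; the tombstone flag only carries meaning when magic >= 2.
    record Message(byte magic, boolean tombstone, byte[] key, byte[] value) {}

    // Each message carries its own magic byte, so a reader (log cleaner or consumer)
    // knows which rule applies, even for messages appended before the upgrade.
    static boolean isTombstone(Message m) {
        return m.magic() >= MAGIC_V2 ? m.tombstone() : m.value() == null;
    }

    // Step 2: a broker still on message.format.version = 1 down-converts a v2 message.
    // A tombstone has its value set to null so that v1 consumers treat it as a delete.
    static Message downConvertToV1(Message m) {
        if (m.magic() < MAGIC_V2) {
            return m; // already in the old format, nothing to do
        }
        if (!m.tombstone() && m.value() == null) {
            // Open question in the thread: v1 cannot express "null value, not a tombstone".
            // ASSUMPTION: this sketch rejects such messages (the "do not support it" option).
            throw new IllegalArgumentException("non-tombstone null value is not representable in v1");
        }
        return new Message(MAGIC_V1, false, m.key(), m.tombstone() ? null : m.value());
    }

    // Step 3: a broker on message.format.version = 2 serves stored messages as-is and
    // only down-converts (losing zero copy) when an older consumer fetches them.
    static Message serveFetch(Message stored, byte consumerMagic) {
        return consumerMagic >= stored.magic() ? stored : downConvertToV1(stored);
    }

    public static void main(String[] args) {
        byte[] key = "key".getBytes();
        Message deleteWithValue = new Message(MAGIC_V2, true, key, "last-state".getBytes());

        // Step 2: the value is dropped, so consumers see the non-null value only in step 3.
        Message v1 = downConvertToV1(deleteWithValue);
        System.out.println(isTombstone(v1) + " " + (v1.value() == null)); // true true

        // Step 3: a v2 consumer receives the message unchanged, value included.
        Message v2 = serveFetch(deleteWithValue, MAGIC_V2);
        System.out.println(isTombstone(v2) + " " + new String(v2.value())); // true last-state
    }
}
```

The per-message magic byte is what lets `isTombstone` choose the right rule. This addresses Ismael's point that the log may still contain messages appended before the upgrade. Without the bump, every reader would have to check both the tombstone bit and the value.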