Hi David. Thanks for the feedback. I have updated the KIP as you suggested.
On Mon, Feb 13, 2023 at 7:34 AM David Jacot <dja...@confluent.io.invalid> wrote: > Hi Calvin, > > 01. Should we update the KIP to mention this? > > 09. The upgrade part is still not very clear in my opinion. It would be > good to clearly mention how we are going to handle the upgrade of the > AlterPartition API and the Fetch API. For the former, the broker uses the > ApiVersions API to discover the supported versions of the controller so it > will use the appropriate version based on this. For the later, we use the > IBP/metadata version to gate the version of the fetch request used. > Therefore, the version 14 will be used only when the IBP/metadata version > is upgraded. > > Besides those two points, the KIP looks good to me. > > Thanks, > David > > On Fri, Feb 10, 2023 at 7:47 PM Calvin Liu <ca...@confluent.io.invalid> > wrote: > > > Hi David, > > 01. It is a good idea to fence some delayed Fetch requests from the > reboot > > followers. > > 06. Ack > > 09. Rephrased the sentence to let the ZK controller ignore the broker > > epoch. > > > > Thanks, > > Calvin. > > > > On Fri, Feb 10, 2023 at 6:11 AM David Jacot <dja...@confluent.io.invalid > > > > wrote: > > > > > Hi Calvin, > > > > > > 01. Yeah, this should generally not happen. I was wondering if it could > > > happen in the case of a partial failure of a broker for instance. > > > Generally, I think that being defensive on the leader side does not > hurt > > > here. I am perhaps too extreme here. > > > > > > 03. NewIsr does not need to be ignorable because we will only set it > when > > > an old version is used. > > > > > > 06. It is a bit weird to have ReplicaId and BrokerId in the final > > schema. I > > > would remove ReplicaId. The comment is enough to explain the renaming. > > > > > > 09. Could you elaborate on "As the AlterPartition and Fetch requests > are > > > shared between ZK and Kraft mode, the related field will keep empty in > ZK > > > mode and will not be used."? I would do it the other way around, I > think > > > that the followers and the leader should always populare all the fields > > but > > > the ZK controller should not take it into account. It will be easier > like > > > this vs having to put gates everywhere. > > > > > > Best, > > > David > > > > > > On Thu, Feb 9, 2023 at 7:27 PM Calvin Liu <ca...@confluent.io.invalid> > > > wrote: > > > > > > > Hi Alexandre, > > > > 104: I am not aware of any proposals around the up-to-date replica > > > > election. It can be a good place to dig deeper. > > > > 105: Yes, it almost puts nothing on the performance. > > > > > > > > Thanks, > > > > Calvin > > > > > > > > On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez < > > > > alexandre.dupr...@gmail.com> wrote: > > > > > > > > > Hi, Calvin, > > > > > > > > > > Many thanks for the replies. > > > > > > > > > > 103. Thanks for confirming the testing strategy. Exercising the > right > > > > > sequence of requests is a good idea. > > > > > > > > > > 104. Out of scope here, but the KIP reminds that currently, the > > > > > controller does not attempt to detect data loss on the latest > replica > > > > > in the ISR and to choose the most up-to-date replica as a leader. > Is > > > > > this an area where the community wants to invest in building a > > > > > potential solution? > > > > > > > > > > 105. I assume any performance impact of the change in Fetch > requests > > > > > is marginal if even noticeable at all. > > > > > > > > > > Thanks, > > > > > Alexandre > > > > > > > > > > Le mer. 8 févr. 2023 à 23:57, Calvin Liu > <ca...@confluent.io.invalid > > > > > > a > > > > > écrit : > > > > > > > > > > > > For Jun: > > > > > > --- Since this KIP changes the inter-broker protocol, should we > > bump > > > up > > > > > the > > > > > > metadata version (the equivalent of IBP) during the upgrade? > > > > > > Yes, we can. > > > > > > > > > > > > For Artem: > > > > > > --- Could you elaborate on the behavior during rolls in the > > > > Compatibility > > > > > > section? > > > > > > We can update the metadata version(IBP) to make sure the brokers > > can > > > > have > > > > > > the latest code. > > > > > > Also, during the IBP upgrade, the leader can just skip setting > > broker > > > > > epoch > > > > > > or set -1 for the brokers that have not started using the new > APIs. > > > The > > > > > > controller can bypass the check for such brokers so that the > > cluster > > > > can > > > > > > still change ISR during the upgrade. > > > > > > > > > > > > --- Also for compatibility it's probably going to be easier to > just > > > > add a > > > > > > new > > > > > > array of epochs in addition to the existing array of broker ids, > > > > instead > > > > > of > > > > > > removing one field and adding another. > > > > > > We can do it both ways. Just prefer to replace it with a new > field > > > for > > > > > > better readability. > > > > > > > > > > > > --- The KIP mentions that we would explicitly do something > special > > in > > > > ZK > > > > > > mode > > > > > > in order to not implement new functionality. > > > > > > It can be as simple as skipping setting the broker epoch if it is > > in > > > ZK > > > > > > mode. > > > > > > > > > > > > For Alexandre: > > > > > > 100: Yes, you are right, it does not require a controller > > movement. I > > > > > > removed the controller movement part when quoting the scenario > from > > > the > > > > > > KAFKA-14139. > > > > > > 101: In the current code, you can't remove the last replica in > the > > > ISR. > > > > > > But, any damage to this replica will result in data loss. > > > > > > 102: Yeah, I understand your concern. It is definitely great to > > have > > > a > > > > > fix > > > > > > in ZK, but considering the KAFKA-14139 is a rare case, it has > > higher > > > > ROE > > > > > > for applying to just Kraft mode. > > > > > > 103: I found that it is much easier to repro the bug in Kraft > mode > > by > > > > > > feeding the controller with a given sequence of events/requests. > So > > > we > > > > > may > > > > > > just need a unit test to cover the case. > > > > > > > > > > > > For David: > > > > > > 01: > > > > > > Can you help explain why the follower can have a stale broker > > epoch? > > > Is > > > > > it > > > > > > because the registration request has any delay? But the broker > will > > > not > > > > > > start fetching before the registration success. > > > > > > On the other hand, if the follower fetches with the stale broker > > > epoch, > > > > > is > > > > > > it good enough to ask the leader to hold to include this follower > > > until > > > > > the > > > > > > fetch is consistent with the metadata cache? > > > > > > 02: Ack > > > > > > 03: I think NewIsrWithEpochs is not ignorable. For the deprecated > > > > > > field NewIsr, what is the right way to make it ignorable? Just > mark > > > it > > > > > > ignorable in this change? > > > > > > 04: Ack > > > > > > 05: Ack > > > > > > 06: Ack > > > > > > 07: Yes, will add it. > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 8, 2023 at 6:52 AM David Jacot > > > <dja...@confluent.io.invalid > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Calvin, > > > > > > > > > > > > > > Thanks for the KIP! The overall approach looks reasonable to > me. > > I > > > > > have a > > > > > > > few questions/comments: > > > > > > > > > > > > > > 01. I wonder if the leader should also verify the broker epochs > > > based > > > > > on > > > > > > > its metadata cache before sending the AlterPartition request to > > the > > > > > > > controller. Imagine the case where a follower not in the ISR > > would > > > > keep > > > > > > > fetching with a stale broker epoch but with a valid leader > epoch. > > > In > > > > > this > > > > > > > case, the leader would try to add it back to the ISR when the > > > > follower > > > > > > > catches up but the controller would reject it because its > broker > > > > epoch > > > > > is > > > > > > > stale. Until this condition resolves itself, the leader can't > add > > > any > > > > > other > > > > > > > followers back to the ISR because the ISR validation is all or > > > > nothing > > > > > at > > > > > > > the moment. Letting the leader check before sending the > > > > AlterPartition > > > > > > > request would mitigate this. This condition must be rare but > > could > > > > > happen. > > > > > > > We did this in KIP-841. > > > > > > > > > > > > > > 02. In AlterPartitionRequest/AlterPartitionResponse, you need > to > > > > > update the > > > > > > > `validVersions` to `0-3`. > > > > > > > > > > > > > > 03. Personally, I like the `NewIsrWithEpochs` field in > > > > > > > the AlterPartitionRequest. We can handle the backward > > compatibility > > > > in > > > > > the > > > > > > > request builder. Basically, we would always populate > > > NewIsrWithEpochs > > > > > and > > > > > > > the request builder can translate it to NewIsr if an earlier > > > version > > > > is > > > > > > > used. Should the field be ignorable? > > > > > > > > > > > > > > 04. Should NewIsrWithEpochs.BrokerEpoch have -1 as default? > > > > > > > > > > > > > > 05. In FetchRequest/FetchResponse, you need to update the > > > > > `validVersions` > > > > > > > as well. > > > > > > > > > > > > > > 06. It is a little weird to have ReplicaId and BrokerEpoch in > the > > > > > > > FetchRequest. I wonder if we should rename ReplicaId to > BrokerId > > > > > because it > > > > > > > is actually the broker id (even the documentation of the field > > says > > > > > it). > > > > > > > > > > > > > > 07. On the followers, the fetch request version is derived from > > the > > > > > > > metadata version/IBP. As we add a new fetch version, we need to > > > add a > > > > > new > > > > > > > metadata version/IBP as well. > > > > > > > > > > > > > > 08. Regarding KRaft vs ZK, I don't have a strong opinion. ZK is > > on > > > > the > > > > > way > > > > > > > out so not doing it seems fine. If we do this, we could > basically > > > > just > > > > > > > ignore the broker epoch in ZK and it will keep working as > today. > > > > > > > > > > > > > > Best, > > > > > > > David > > > > > > > > > > > > > > On Wed, Feb 8, 2023 at 3:01 PM Alexandre Dupriez < > > > > > > > alexandre.dupr...@gmail.com> wrote: > > > > > > > > > > > > > > > Hi, Calvin, > > > > > > > > > > > > > > > > Thanks for the KIP and fast follow-up. A few questions. > > > > > > > > > > > > > > > > 100. The scenario illustrated in the KIP involves a > controller > > > > > > > > movement. Is this really required? Cannot the scenario occur > > > with a > > > > > > > > similar stale AlterPartition request and the same controller > > > > > > > > throughout? > > > > > > > > > > > > > > > > 101. In the case where card(ISR) = 1 and the last replica > > leaves, > > > > it > > > > > > > > will be re-elected as the leader upon reconnection. If the > > > replica > > > > is > > > > > > > > empty, all data for the partition will be lost. Is this a > > correct > > > > > > > > understanding of the scenario? > > > > > > > > > > > > > > > > 102. I understand that ZK is going to be unsupported soon. > > > However > > > > > for > > > > > > > > as long as it is available to end users, is there any reason > > not > > > to > > > > > > > > support the fix in ZK mode? Arguably, the implementation for > > the > > > > > logic > > > > > > > > to AlterPartition is duplicated for both controller types, > and > > it > > > > may > > > > > > > > be more work than is worth if ZK is fully decommissioned in > the > > > > next > > > > > > > > few months. (Alternatively, is there a plan to back port the > > fix > > > to > > > > > > > > older minor versions?). > > > > > > > > > > > > > > > > 103. The KIP mentions system tests to be used to simulate the > > > race > > > > > > > > condition. Would it be possible to provide more details about > > it? > > > > Do > > > > > > > > we think it worth having this scenario be exercised in the > > > > functional > > > > > > > > tests of the core/server module? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Alexandre > > > > > > > > > > > > > > > > Le mer. 8 févr. 2023 à 03:31, Artem Livshits > > > > > > > > <alivsh...@confluent.io.invalid> a écrit : > > > > > > > > > > > > > > > > > > Hi Calvin, > > > > > > > > > > > > > > > > > > Thank you for the KIP. I have a similar question -- we > need > > to > > > > > support > > > > > > > > > rolling upgrades (when we have some old brokers and some > new > > > > > brokers), > > > > > > > so > > > > > > > > > there could be combinations of old leader - new follower, > new > > > > > leader - > > > > > > > > old > > > > > > > > > follower, new leader - old controller, old leader - new > > > > controller. > > > > > > > > Could > > > > > > > > > you elaborate on the behavior during rolls in the > > Compatibility > > > > > > > section? > > > > > > > > > > > > > > > > > > Also for compatibility it's probably going to be easier to > > just > > > > > add a > > > > > > > new > > > > > > > > > array of epochs in addition to the existing array of broker > > > ids, > > > > > > > instead > > > > > > > > of > > > > > > > > > removing one field and adding another. > > > > > > > > > > > > > > > > > > The KIP mentions that we would explicitly do something > > special > > > in > > > > > ZK > > > > > > > mode > > > > > > > > > in order to not implement new functionality. I think it > may > > be > > > > > easier > > > > > > > to > > > > > > > > > implement functionality for both ZK and KRraft mode than > > adding > > > > > code to > > > > > > > > > disable it in ZK mode. > > > > > > > > > > > > > > > > > > -Artem > > > > > > > > > > > > > > > > > > On Tue, Feb 7, 2023 at 4:58 PM Jun Rao > > > <j...@confluent.io.invalid > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, Calvin, > > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good to me overall. > > > > > > > > > > > > > > > > > > > > Since this KIP changes the inter-broker protocol, should > we > > > > bump > > > > > up > > > > > > > the > > > > > > > > > > metadata version (the equivalent of IBP) during upgrade? > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu > > > > > > > <ca...@confluent.io.invalid > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > I'd like to discuss the fix for the broker reboot data > > loss > > > > > > > > KAFKA-14139 > > > > > > > > > > > <https://issues.apache.org/jira/browse/KAFKA-14139>. > > > > > > > > > > > It changes the Fetch and AlterPartition requests to > > include > > > > the > > > > > > > > broker > > > > > > > > > > > epochs. Then the controller can use these epochs to > help > > > > > reject the > > > > > > > > stale > > > > > > > > > > > AlterPartition request. > > > > > > > > > > > Please take a look. Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >