Hi Alexandre,

104: I am not aware of any proposals around the up-to-date replica election. It could be a good area to dig deeper into.
105: Yes, it has almost no performance impact.
Thanks,
Calvin

On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <alexandre.dupr...@gmail.com> wrote:

> Hi, Calvin,
>
> Many thanks for the replies.
>
> 103. Thanks for confirming the testing strategy. Exercising the right
> sequence of requests is a good idea.
>
> 104. Out of scope here, but the KIP reminds us that currently, the
> controller does not attempt to detect data loss on the latest replica
> in the ISR and to choose the most up-to-date replica as a leader. Is
> this an area where the community wants to invest in building a
> potential solution?
>
> 105. I assume any performance impact of the change in Fetch requests
> is marginal, if noticeable at all.
>
> Thanks,
> Alexandre
>
> On Wed, Feb 8, 2023 at 11:57 PM Calvin Liu <ca...@confluent.io.invalid> wrote:
> >
> > For Jun:
> > --- Since this KIP changes the inter-broker protocol, should we bump up
> > the metadata version (the equivalent of IBP) during the upgrade?
> > Yes, we can.
> >
> > For Artem:
> > --- Could you elaborate on the behavior during rolls in the
> > Compatibility section?
> > We can update the metadata version (IBP) to make sure the brokers have
> > the latest code.
> > Also, during the IBP upgrade, the leader can simply skip setting the
> > broker epoch, or set it to -1, for brokers that have not started using
> > the new APIs. The controller can bypass the check for such brokers so
> > that the cluster can still change the ISR during the upgrade.
> >
> > --- Also for compatibility it's probably going to be easier to just add
> > a new array of epochs in addition to the existing array of broker ids,
> > instead of removing one field and adding another.
> > We can do it either way; I just prefer replacing it with a new field
> > for better readability.
> >
> > --- The KIP mentions that we would explicitly do something special in
> > ZK mode in order to not implement new functionality.
> > It can be as simple as skipping setting the broker epoch if it is in ZK
> > mode.
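[Editor's note: the mid-upgrade handling Calvin describes (the leader sends -1 as the epoch for brokers that have not adopted the new API, and the controller bypasses the epoch check for those entries) can be sketched as follows. This is an illustrative sketch only; the names `validate_alter_partition` and `UNKNOWN_BROKER_EPOCH` are assumptions, not the actual Kafka controller code.]

```python
# Hypothetical sketch of the controller-side epoch check with the
# rolling-upgrade bypass described in the thread. Names are illustrative.

UNKNOWN_BROKER_EPOCH = -1  # sentinel sent for brokers not yet on the new API

def validate_alter_partition(new_isr_with_epochs, registered_epochs):
    """Reject the request if any ISR entry carries a stale broker epoch.

    new_isr_with_epochs: list of (broker_id, broker_epoch) from the request.
    registered_epochs: broker_id -> latest epoch known to the controller.
    """
    for broker_id, broker_epoch in new_isr_with_epochs:
        if broker_epoch == UNKNOWN_BROKER_EPOCH:
            # Broker has not started using the new API yet (mid-upgrade):
            # bypass the check so the ISR can still be changed.
            continue
        if registered_epochs.get(broker_id) != broker_epoch:
            return "INELIGIBLE_REPLICA"  # stale epoch -> reject
    return "NONE"
```

The -1 sentinel preserves availability during the roll: ISR changes still go through for not-yet-upgraded brokers, and the stricter check only applies once a broker starts reporting real epochs.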
> >
> > For Alexandre:
> > 100: Yes, you are right, it does not require a controller movement. I
> > removed the controller movement part when quoting the scenario from
> > KAFKA-14139.
> > 101: In the current code, you can't remove the last replica in the ISR.
> > But any damage to this replica will result in data loss.
> > 102: Yeah, I understand your concern. It would definitely be great to
> > have a fix in ZK, but considering that KAFKA-14139 is a rare case, the
> > ROE is higher if we apply the fix to KRaft mode only.
> > 103: I found that it is much easier to repro the bug in KRaft mode by
> > feeding the controller a given sequence of events/requests. So we may
> > just need a unit test to cover the case.
> >
> > For David:
> > 01:
> > Can you help explain why the follower can have a stale broker epoch? Is
> > it because the registration request can be delayed? The broker will not
> > start fetching before the registration succeeds.
> > On the other hand, if the follower fetches with a stale broker epoch,
> > is it good enough to ask the leader to hold off on including this
> > follower until its fetch is consistent with the metadata cache?
> > 02: Ack
> > 03: I think NewIsrWithEpochs is not ignorable. For the deprecated
> > field NewIsr, what is the right way to make it ignorable? Just mark it
> > ignorable in this change?
> > 04: Ack
> > 05: Ack
> > 06: Ack
> > 07: Yes, will add it.
> >
> >
> > On Wed, Feb 8, 2023 at 6:52 AM David Jacot <dja...@confluent.io.invalid>
> > wrote:
> >
> > > Hi Calvin,
> > >
> > > Thanks for the KIP! The overall approach looks reasonable to me. I
> > > have a few questions/comments:
> > >
> > > 01. I wonder if the leader should also verify the broker epochs based
> > > on its metadata cache before sending the AlterPartition request to
> > > the controller. Imagine the case where a follower not in the ISR
> > > would keep fetching with a stale broker epoch but with a valid leader
> > > epoch.
> > > In this case, the leader would try to add it back to the ISR when the
> > > follower catches up, but the controller would reject it because its
> > > broker epoch is stale. Until this condition resolves itself, the
> > > leader can't add any other followers back to the ISR because the ISR
> > > validation is all-or-nothing at the moment. Letting the leader check
> > > before sending the AlterPartition request would mitigate this. This
> > > condition must be rare but could happen. We did this in KIP-841.
> > >
> > > 02. In AlterPartitionRequest/AlterPartitionResponse, you need to
> > > update the `validVersions` to `0-3`.
> > >
> > > 03. Personally, I like the `NewIsrWithEpochs` field in
> > > the AlterPartitionRequest. We can handle the backward compatibility
> > > in the request builder. Basically, we would always populate
> > > NewIsrWithEpochs and the request builder can translate it to NewIsr
> > > if an earlier version is used. Should the field be ignorable?
> > >
> > > 04. Should NewIsrWithEpochs.BrokerEpoch have -1 as default?
> > >
> > > 05. In FetchRequest/FetchResponse, you need to update the
> > > `validVersions` as well.
> > >
> > > 06. It is a little weird to have ReplicaId and BrokerEpoch in the
> > > FetchRequest. I wonder if we should rename ReplicaId to BrokerId
> > > because it is actually the broker id (even the documentation of the
> > > field says so).
> > >
> > > 07. On the followers, the fetch request version is derived from the
> > > metadata version/IBP. As we add a new fetch version, we need to add a
> > > new metadata version/IBP as well.
> > >
> > > 08. Regarding KRaft vs ZK, I don't have a strong opinion. ZK is on
> > > the way out, so not doing it seems fine. If we do this, we could
> > > basically just ignore the broker epoch in ZK and it will keep working
> > > as it does today.
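[Editor's note: the request-builder translation David suggests in 03 (always populate NewIsrWithEpochs, and down-convert to the plain NewIsr array when an older request version is negotiated) can be sketched as below. `build_alter_partition` and the version-3 cutoff are illustrative assumptions standing in for the real `AlterPartitionRequest.Builder`.]

```python
# Illustrative sketch of down-converting the new epoch-carrying ISR field
# to the legacy broker-id array for older AlterPartition versions.

def build_alter_partition(new_isr_with_epochs, target_version):
    """Return the request fields to serialize for the given version.

    new_isr_with_epochs: list of (broker_id, broker_epoch) tuples.
    target_version: negotiated AlterPartition request version.
    """
    if target_version >= 3:  # versions that carry broker epochs
        return {"NewIsrWithEpochs": new_isr_with_epochs}
    # Older versions only know the plain broker-id array; drop the epochs.
    return {"NewIsr": [broker_id for broker_id, _ in new_isr_with_epochs]}
```

Callers always supply the richer form, so only one serialization path in the builder needs to know about the legacy field.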
> > >
> > > Best,
> > > David
> > >
> > > On Wed, Feb 8, 2023 at 3:01 PM Alexandre Dupriez
> > > <alexandre.dupr...@gmail.com> wrote:
> > >
> > > > Hi, Calvin,
> > > >
> > > > Thanks for the KIP and the fast follow-up. A few questions.
> > > >
> > > > 100. The scenario illustrated in the KIP involves a controller
> > > > movement. Is this really required? Cannot the scenario occur with a
> > > > similar stale AlterPartition request and the same controller
> > > > throughout?
> > > >
> > > > 101. In the case where card(ISR) = 1 and the last replica leaves,
> > > > it will be re-elected as the leader upon reconnection. If the
> > > > replica is empty, all data for the partition will be lost. Is this
> > > > a correct understanding of the scenario?
> > > >
> > > > 102. I understand that ZK is going to be unsupported soon. However,
> > > > for as long as it is available to end users, is there any reason
> > > > not to support the fix in ZK mode? Arguably, the implementation of
> > > > the AlterPartition logic is duplicated for both controller types,
> > > > and it may be more work than it is worth if ZK is fully
> > > > decommissioned in the next few months. (Alternatively, is there a
> > > > plan to backport the fix to older minor versions?)
> > > >
> > > > 103. The KIP mentions system tests to be used to simulate the race
> > > > condition. Would it be possible to provide more details about them?
> > > > Do we think it is worth having this scenario exercised in the
> > > > functional tests of the core/server module?
> > > >
> > > > Thanks,
> > > > Alexandre
> > > >
> > > > On Wed, Feb 8, 2023 at 3:31 AM Artem Livshits
> > > > <alivsh...@confluent.io.invalid> wrote:
> > > > >
> > > > > Hi Calvin,
> > > > >
> > > > > Thank you for the KIP.
> > > > > I have a similar question -- we need to support rolling upgrades
> > > > > (when we have some old brokers and some new brokers), so there
> > > > > could be combinations of old leader - new follower, new leader -
> > > > > old follower, new leader - old controller, old leader - new
> > > > > controller. Could you elaborate on the behavior during rolls in
> > > > > the Compatibility section?
> > > > >
> > > > > Also for compatibility it's probably going to be easier to just
> > > > > add a new array of epochs in addition to the existing array of
> > > > > broker ids, instead of removing one field and adding another.
> > > > >
> > > > > The KIP mentions that we would explicitly do something special in
> > > > > ZK mode in order to not implement new functionality. I think it
> > > > > may be easier to implement the functionality for both ZK and
> > > > > KRaft mode than to add code to disable it in ZK mode.
> > > > >
> > > > > -Artem
> > > > >
> > > > > On Tue, Feb 7, 2023 at 4:58 PM Jun Rao <j...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi, Calvin,
> > > > > >
> > > > > > Thanks for the KIP. Looks good to me overall.
> > > > > >
> > > > > > Since this KIP changes the inter-broker protocol, should we
> > > > > > bump up the metadata version (the equivalent of IBP) during the
> > > > > > upgrade?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu
> > > > > > <ca...@confluent.io.invalid> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > > I'd like to discuss the fix for the broker reboot data loss,
> > > > > > > KAFKA-14139
> > > > > > > <https://issues.apache.org/jira/browse/KAFKA-14139>.
> > > > > > > It changes the Fetch and AlterPartition requests to include
> > > > > > > the broker epochs. The controller can then use these epochs
> > > > > > > to help reject stale AlterPartition requests.
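[Editor's note: a toy walk-through of the KAFKA-14139 race the KIP fixes, under invented names: a broker reboots (losing unflushed data) and re-registers with a bumped epoch, and the controller rejects the in-flight AlterPartition that still carries the pre-reboot epoch. This is an illustrative model, not Kafka's actual controller code.]

```python
# Toy model: the controller tracks each broker's current registration epoch
# and rejects ISR changes that reference an outdated epoch.

class Controller:
    def __init__(self):
        self.registered = {}  # broker_id -> current broker epoch

    def register(self, broker_id, epoch):
        # Called when a broker (re-)registers; a reboot bumps its epoch.
        self.registered[broker_id] = epoch

    def alter_partition(self, isr_with_epochs):
        # Accept only if every (broker_id, epoch) matches the registry.
        ok = all(self.registered.get(b) == e for b, e in isr_with_epochs)
        return "NONE" if ok else "INELIGIBLE_REPLICA"

ctl = Controller()
ctl.register(1, 100)
ctl.register(2, 200)

# The leader saw follower 2 catch up while it was on epoch 200; broker 2
# then rebooted, lost unflushed data, and re-registered with epoch 201.
ctl.register(2, 201)

# The in-flight AlterPartition still carries the pre-reboot epoch, so it
# is rejected instead of re-adding a possibly empty replica to the ISR.
print(ctl.alter_partition([(1, 100), (2, 200)]))  # prints "INELIGIBLE_REPLICA"
```

Without the epoch in the request, the controller cannot tell the pre-reboot replica apart from the post-reboot one, which is exactly the data-loss window described in KAFKA-14139.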
> > > > > > > Please take a look. Thanks!
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR