Hi Colin, Thanks for your feedback! Please find my answers below.
> However, I wonder if this will be feasible for ZK-based brokers to do. We > still support pre-topic-id IBP versions there, right? This might end up being > more complex than you were hoping. That's right. We support pre-topic-id IBP versions for ZK-based brokers. We will only use version 2 if all the topics have an ID when the request is constructed. Version 1 is used otherwise. I have already implemented this part in the draft PR if you want to see how it looks. It brings a little more complexity in the request building/handling but that works. I think that it is definitely worth it. > We should add a comment in AlterPartitionResponse about the new error code > INELIGIBLE_REPLICA. This is very important for error codes so we can track > which ones are returned in which RPC version. Totally. I have those comments in the PR but I forgot to add them in the KIP. I will fix this. > I also wonder if we should add a special error code for the case where the > AlterPartitions call completes a reassignment AND the completed reassignment > no longer includes the previous leader. We have been overloading > FENCED_LEADER_EPOCH for this case, but this is coonfusing to operators > (especially since this RPC does not support error messages, as opposed to > codes) Interesting. I was not aware of this one. We don't do this in the ZK controller, right? I do agree that using FENCED_LEADER_EPOCH is confusing here. We could introduce a NEW_LEADER_ELECTED error code for this purpose. > One thing, though, is that we should define how this interacts with replica > placement. It seems to me that replicas should not be able to be placed on > these inControlledShutdown nodes (unless done manually via the explicit > placement API). Yeah, I agree that we need to clarify this. If not mistaken, replicas can be placed on fenced nodes so I don't see why it should be different for inControlledShutdown nodes. Otherwise we will again have that create topic issue when the cluster has only 3 nodes. However I think that we have to guarantee two other invariants: 1) an inControlledShutdown node should not be added to the ISR when a partition is created. 2) an inControlledShutdown node should not be picked as a leader for a partition. (e.g. KAFKA-13944) Let me add this part to the KIP. > I also think we should spell out the fact that once you go into controlled > shutdown, you don't come out except by creating a new broker instance. (new > incarnation ID). This also makes me wonder if we need to support the shutting > down -> not shutting down transition in BrokerRegistrationChangeRecord, since > we don't plan on using that transition. That makes sense. I will add this and remove that transition from BrokerRegistrationChangeRecord. > Finally, RegisterBrokerRecord / BrokerRegistrationChangeRecord should be > bumped to the next RPC version since we have added a new field. You will also > need to assign yourself a new IBP / MetadataVersion. (For > BrokerRegistrationChangeRecord it would be possible to avoid the version > bump, since we're using tagged fields, but it's better to have it for > consistency, I think.) Noted. Let me update the KIP to incorporate all your feedback. Cheers, David On Tue, May 31, 2022 at 9:42 PM Colin McCabe <cmcc...@apache.org> wrote: > > > We should add a comment in AlterPartitionResponse about the new error > > code INELIGIBLE_REPLICA. This is very important for error codes so we > > can track which ones are returned in which RPC version. I also wonder > > Here I'm referring to AlterPartitionResponse.json > > cheers, > Colin > > > > > > > On Tue, May 31, 2022, at 08:36, David Jacot wrote: > >> Hi folks, > >> > >> I'd like to start a vote for KIP-841: > >> https://cwiki.apache.org/confluence/x/phmhD. > >> > >> Thanks, > >> David