hachikuji commented on pull request #9100: URL: https://github.com/apache/kafka/pull/9100#issuecomment-729316632
@jacky1193610322 I missed this comment before. It's a good question.

In general, the leader will continue in its current state for as long as possible. As you say, as soon as it needs to shrink or expand the ISR, it grabs the lock guarding the leader and ISR state and attempts to update that state in ZooKeeper synchronously. If ZooKeeper can't be reached, the thread gets stuck. Eventually this causes the broker to effectively deadlock, which has the side effect of preventing Produce requests (and every other request) from getting through. I think it's a fair point that this affords some protection for acks=1 requests: a leader that can no longer reach ZooKeeper may already have been replaced, so acks=1 writes it accepts could be lost. But we tend to view deadlocking the broker as worse than any such benefit.

In KIP-500, we have an alternative approach to self-fencing. The analogous case is when the leader cannot reach the controller. We use a heartbeat mechanism to maintain liveness in the cluster. Unlike with ZooKeeper, we do not rely on the session expiration event to learn that a broker has been declared dead. Instead, if we do not get a heartbeat response from the controller before some timeout, we stop accepting Produce requests.

I have been thinking a bit about your suggestion to self-fence after getting an invalid version error from AlterIsr. It might help in the interim before KIP-500 is complete. Our expectation was that if we get an invalid version error, the LeaderAndIsr request carrying the updated state should soon be on its way. I suppose we could come up with reasons why that assumption might fail, so it might make sense to be a little more defensive. I will file a JIRA about this and we can see what others think.

Thanks for the suggestion!
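The Python code below is a rough sketch of both fencing rules discussed above. The first is the KIP-500-style rule: stop accepting Produce requests once the last controller heartbeat response is older than a timeout. The second is the suggested defensive fence after an AlterIsr invalid-version error. Everything in it is hypothetical and is not Kafka's implementation or API. That includes the `LeaderFence` class, its method names, the caller-supplied millisecond timestamps, and the integer `isr_version` standing in for whatever version the real AlterIsr/LeaderAndIsr path compares.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderFence:
    """Hypothetical broker-side self-fencing state (illustrative only).

    Timestamps are plain millisecond values passed in by the caller,
    which keeps the logic deterministic and easy to test.
    """
    heartbeat_timeout_ms: float
    last_heartbeat_ack_ms: float = 0.0
    # Hypothetical: set after an AlterIsr-style "invalid version" error,
    # holding the version we sent that the controller rejected as stale.
    stale_isr_version: Optional[int] = None

    def on_heartbeat_response(self, now_ms: float) -> None:
        # Treat a controller response as evidence that this broker is
        # still considered live, so restart the timeout window.
        self.last_heartbeat_ack_ms = now_ms

    def on_alter_isr_invalid_version(self, our_version: int) -> None:
        # Defensive choice: fence instead of assuming the LeaderAndIsr
        # request with the updated state is already on its way.
        self.stale_isr_version = our_version

    def on_leader_and_isr(self, new_version: int) -> None:
        # Lift the defensive fence only once strictly newer state arrives.
        if self.stale_isr_version is not None and new_version > self.stale_isr_version:
            self.stale_isr_version = None

    def can_accept_produce(self, now_ms: float) -> bool:
        # Accept writes only while heartbeats are fresh and no
        # invalid-version error is outstanding.
        heartbeat_fresh = now_ms - self.last_heartbeat_ack_ms < self.heartbeat_timeout_ms
        return heartbeat_fresh and self.stale_isr_version is None


if __name__ == "__main__":
    fence = LeaderFence(heartbeat_timeout_ms=2000)
    fence.on_heartbeat_response(now_ms=1000)
    print(fence.can_accept_produce(now_ms=2500))  # True: last ack 1500 ms ago
    print(fence.can_accept_produce(now_ms=3500))  # False: last ack 2500 ms ago
```

Passing the clock in explicitly, rather than reading it inside the class, is just a convenience for the sketch. A real broker would also need to decide what happens to in-flight requests at the moment it fences itself, which this ignores.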