hachikuji commented on pull request #9100:
URL: https://github.com/apache/kafka/pull/9100#issuecomment-729316632


   @jacky1193610322 I missed this comment before. It's a good question. In 
general, the leader will continue in its current state as long as possible. As 
you say, as soon as it needs to shrink/expand the ISR, it grabs the 
leaderAndIsr update lock and attempts to synchronously update the state. If 
Zookeeper can't be reached, then the thread gets stuck. Eventually this causes 
the broker to effectively deadlock, which has the side effect of preventing any 
Produce requests (and any other requests) from getting through.
   
   It's a fair point that this affords some protection for acks=1 
requests, but we tend to view the side effect of deadlocking the broker 
as worse than any benefit. In KIP-500, we have an alternative approach for 
self-fencing. The analogous case is when the leader cannot reach the 
controller. We use a heartbeating mechanism to maintain liveness in the 
cluster. Unlike with Zookeeper, we do not rely on the session expiration event 
in order to tell that a broker has been declared dead. Instead, if we do not get 
a heartbeat response from the controller before some timeout, then we will stop 
accepting Produce requests.
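
   To make the heartbeat-based approach concrete, here is a minimal sketch in 
Java. It is only an illustration of the idea, not the KIP-500 implementation: 
the names `HeartbeatFence`, `onHeartbeatResponse`, `isFenced`, and 
`tryAcceptProduce` are hypothetical, the clock is injected so the behavior can 
be exercised without sleeping, and it assumes that a single heartbeat response 
is enough to resume accepting requests, which is a simplification.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of heartbeat-based self-fencing. The broker tracks the
 * time of the last heartbeat response from the controller and refuses
 * Produce requests once that response is older than the timeout.
 * None of these names come from the Kafka code base.
 */
final class HeartbeatFence {
    private final long timeoutMs;
    private final LongSupplier clockMs; // injected clock (assumption, for testability)
    private long lastResponseMs;

    HeartbeatFence(long timeoutMs, LongSupplier clockMs) {
        this.timeoutMs = timeoutMs;
        this.clockMs = clockMs;
        // Treat construction time as the first successful heartbeat.
        this.lastResponseMs = clockMs.getAsLong();
    }

    /** Called whenever the controller answers a heartbeat. */
    synchronized void onHeartbeatResponse() {
        lastResponseMs = clockMs.getAsLong();
    }

    /** True once no heartbeat response has arrived within the timeout. */
    synchronized boolean isFenced() {
        return clockMs.getAsLong() - lastResponseMs >= timeoutMs;
    }

    /** A Produce request is accepted only while the broker is not fenced. */
    boolean tryAcceptProduce() {
        return !isFenced();
    }
}
```

   The point of the design is that the decision is purely local: the broker 
does not wait for a session expiration event or block on a remote call, so a 
lost controller connection leads to rejected Produce requests rather than a 
stuck thread.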
   
   I have been thinking a little bit about your suggestion to self-fence after 
getting an invalid version error from AlterIsr. It might help in the interim 
before KIP-500 is complete. I think our expectation here was that if we get an 
invalid version error, then the LeaderAndIsr with the updated state should soon 
be on the way. I suppose we could come up with reasons why that assumption 
might fail, so it might make sense to be a little more defensive. I will file a 
jira about this and we can see what others think. Thanks for the suggestion!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

