Zhanxiang (Patrick) Huang created KAFKA-8667:
------------------------------------------------
Summary: Improve leadership transition time
Key: KAFKA-8667
URL: https://issues.apache.org/jira/browse/KAFKA-8667
Project: Kafka
Issue Type: Improvement
Reporter: Zhanxiang (Patrick) Huang
Assignee: Zhanxiang (Patrick) Huang

When a replica fetcher thread processes a fetch response, it holds the {{partitionMapLock}}. If a LeaderAndIsr request arrives at the same time, it is blocked at the end of its processing in {{shutdownIdleFetcherThread}}, because the request handler thread must acquire the {{partitionMapLock}} of each replica fetcher thread to check whether any partition is still assigned to that fetcher, and it performs this check sequentially across the fetcher threads.

For example, in a cluster with 20 brokers and num.replica.fetchers set to 32, if each fetcher thread holds its lock only slightly longer, the total time for the request handler thread to finish {{shutdownIdleFetcherThread}} grows by far more, because the extra wait for the {{partitionMapLock}} is paid once per fetcher thread.

If the LeaderAndIsr request is blocked in the broker for longer than request.timeout.ms (default 30s), the request send thread on the controller side times out while waiting for the response, establishes a new connection to the broker, and re-sends the request. This breaks in-order delivery, because more than one channel is then talking to the broker. Moreover, it can worsen the lock contention or saturate the request handler threads, because duplicate control requests are sent to the broker multiple times.

In our own testing, we saw up to *8 duplicate LeaderAndIsrRequest* being sent to the broker during a bounce, and the 99th percentile LeaderAndIsr local time went up to ~500s.
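
To put rough numbers on the 20-broker, 32-fetcher example, here is a back-of-the-envelope sketch in Python. It is not taken from the Kafka code base. It assumes that a broker may run one pool of fetcher threads per remote source broker, so up to 19 x 32 threads, and that in the worst case the request handler waits one full lock hold time per fetcher thread. The function names and the hold times (10, 50, 100 ms) are hypothetical and chosen only for illustration.

```python
REQUEST_TIMEOUT_MS = 30_000  # request.timeout.ms default cited in the description


def fetcher_thread_count(num_brokers: int, fetchers_per_source: int) -> int:
    """Upper bound on replica fetcher threads on one broker.

    Assumption: one pool of `fetchers_per_source` threads per remote source broker.
    """
    return (num_brokers - 1) * fetchers_per_source


def worst_case_wait_ms(num_threads: int, lock_hold_ms: int) -> int:
    """Sequential worst case: wait one full hold time on each thread's lock in turn."""
    return num_threads * lock_hold_ms


threads = fetcher_thread_count(num_brokers=20, fetchers_per_source=32)

for hold_ms in (10, 50, 100):  # hypothetical per-thread lock hold times
    wait_ms = worst_case_wait_ms(threads, hold_ms)
    verdict = "exceeds" if wait_ms > REQUEST_TIMEOUT_MS else "within"
    print(f"{threads} threads x {hold_ms} ms = {wait_ms} ms ({verdict} request timeout)")
```

Under these assumptions, a hold time of about 50 ms per fetcher thread is already enough to push the sequential check past the 30s request timeout, which is the point at which the controller would re-send the request.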