Rajini Sivaram created KAFKA-6916:
-------------------------------------

             Summary: AdminClient does not refresh metadata on broker failure
                 Key: KAFKA-6916
                 URL: https://issues.apache.org/jira/browse/KAFKA-6916
             Project: Kafka
          Issue Type: Task
          Components: admin
    Affects Versions: 1.0.1, 1.1.0
            Reporter: Rajini Sivaram
            Assignee: Rajini Sivaram
             Fix For: 2.0.0


There are intermittent test failures in DynamicBrokerReconfigurationTest when 
brokers are restarted. The test uses ephemeral ports, so the ports after a 
server restart are not the same as the ports before the restart. The tests rely 
on metadata refresh in producers, consumers and admin clients to obtain the new 
server ports when connections fail. This works for producers and consumers, but 
results in intermittent failures with the admin client because the refresh is 
never triggered.
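
For reference, a minimal sketch (not taken from the test itself) of the kind of 
admin client call that gets stuck in this scenario; the bootstrap address, topic 
name and timeout are illustrative only:

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DescribeAfterRestart {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Illustrative address; in the test the broker listens on an ephemeral
        // port that changes when the broker is restarted.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Relevant defaults: request.timeout.ms=120000 (2 minutes),
        // metadata.max.age.ms=300000 (5 minutes).

        try (AdminClient adminClient = AdminClient.create(props)) {
            // If the broker/controller has moved to a new port, this call keeps
            // retrying the stale address until the request times out, because
            // no metadata refresh is requested when the connection fails.
            adminClient.describeTopics(Collections.singleton("test-topic"))
                       .all()
                       .get(150, TimeUnit.SECONDS);
        }
    }
}
{code}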

There are a couple of issues in AdminClient:
 # Unlike producers and consumers, the admin client does not request a metadata 
update when a connection to a broker fails. This is particularly bad if the 
controller goes down, since the controller is used for requests such as 
createTopics and describeTopics. If the controller goes down and 
adminClient.describeTopics() is invoked, the admin client sends the request to 
the old controller. When the connection fails, it keeps retrying the same 
address and a metadata refresh is never triggered. The request times out after 
2 minutes by default, and metadata is not refreshed for 5 minutes by default. 
We should refresh metadata whenever a connection to a broker fails.
 # Admin client requests are always retried on the same node. In the example 
above, if the controller goes down and a new controller is elected, the retried 
request should be sent to the new controller (sketched below). Otherwise we 
just block the call for 2 minutes with retries that can never succeed.
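
To illustrate the direction of the fix rather than the actual KafkaAdminClient 
internals, a rough sketch of the desired disconnect handling; AdminMetadata, 
PendingCall and the method names below are hypothetical stand-ins:

{code:java}
// Hypothetical types, for illustration only.
interface AdminMetadata {
    void requestUpdate();               // mark cached broker/controller info stale
    String currentControllerAddress();  // re-resolved after the next refresh
}

interface PendingCall {
    void retryOn(String brokerAddress);
}

final class DisconnectHandler {
    private final AdminMetadata metadata;

    DisconnectHandler(AdminMetadata metadata) {
        this.metadata = metadata;
    }

    // Invoked when the connection used by a pending call fails:
    // 1) request a metadata refresh so new broker/controller ports are learned
    //    (producers and consumers already do the equivalent on disconnect);
    // 2) re-resolve the destination before retrying, so a controller-bound
    //    request such as createTopics goes to the newly elected controller
    //    instead of retrying the dead address for the full request timeout.
    void onConnectionFailure(PendingCall call) {
        metadata.requestUpdate();
        call.retryOn(metadata.currentControllerAddress());
    }
}
{code}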

 


