sciclon2 opened a new pull request, #15030:
URL: https://github.com/apache/kafka/pull/15030

   ### Summary
   Basically when executing `kafka-leader-election.sh` the custom timeouts 
passed by the config file (--admin.config option) is not honoured. SO after 
[this investigation](https://issues.apache.org/jira/browse/KAFKA-16015) I came 
across  with the issue.
   
   Using the kafka-leader-election.sh I was getting random timeouts like these:
   `Error completing leader election (PREFERRED) for partition: 
sebatestemptytopic-4: org.apache.kafka.common.errors.TimeoutException: The 
request timed out.
   Error completing leader election (PREFERRED) for partition: 
__CruiseControlMetrics-3: org.apache.kafka.common.errors.TimeoutException: The 
request timed out.
   Error completing leader election (PREFERRED) for partition: 
__KafkaCruiseControlModelTrainingSamples-18: 
org.apache.kafka.common.errors.TimeoutException: The request timed out.
   Error completing leader election (PREFERRED) for partition: 
__KafkaCruiseControlPartitionMetricSamples-8: 
org.apache.kafka.common.errors.TimeoutException: The request timed out. `
   
   
   These timeouts were raised from the client side as the controller always 
finished with all the Kafka leader elections.
   One pattern I detected was always the timeouts were raised after about 15 
seconds.
   So i checked this command has an option to pass configurations
   `Option                                  Description
   ------                                  -----------
   --admin.config <String: config file>    Configuration properties files to 
pass
                                             to the admin client `
   
   I created the file in order to increment the values of request.timeout.ms  
and default.api.timeout.ms. So even after increasing these values  I got the 
same result, timeouts were happening, like the new values were not having any 
effect. 
   
   So I checked the source code and I came across with a bug, no matter the 
value we pass to the timeouts the default values were ALWAYS overwriting them.
   This is the code in the[ trunk 
branch](https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/LeaderElectionCommand.java#L102)
   `    static void run(Duration timeoutMs, String... args) throws Exception {
           LeaderElectionCommandOptions commandOptions = new 
LeaderElectionCommandOptions(args);
   
           commandOptions.maybePrintHelpOrVersion();
   
           commandOptions.validate();
           ElectionType electionType = commandOptions.getElectionType();
           Optional<Set<TopicPartition>> jsonFileTopicPartitions =
               Optional.ofNullable(commandOptions.getPathToJsonFile())
                   .map(path -> parseReplicaElectionData(path));
   
           Optional<String> topicOption = 
Optional.ofNullable(commandOptions.getTopic());
           Optional<Integer> partitionOption = 
Optional.ofNullable(commandOptions.getPartition());
           final Optional<Set<TopicPartition>> singleTopicPartition =
               (topicOption.isPresent() && partitionOption.isPresent()) ?
                   Optional.of(Collections.singleton(new 
TopicPartition(topicOption.get(), partitionOption.get()))) :
                   Optional.empty();
   
           /* Note: No need to look at --all-topic-partitions as we want this 
to be null if it is use.
            * The validate function should be checking that this option is 
required if the --topic and --path-to-json-file
            * are not specified.
            */
           Optional<Set<TopicPartition>> topicPartitions = 
jsonFileTopicPartitions.map(Optional::of).orElse(singleTopicPartition);
   
           Properties props = new Properties();
           if (commandOptions.hasAdminClientConfig()) {
               
props.putAll(Utils.loadProps(commandOptions.getAdminClientConfig()));
           }
           props.setProperty(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, 
commandOptions.getBootstrapServer());
           props.setProperty(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, 
Long.toString(timeoutMs.toMillis()));
           props.setProperty(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 
Long.toString(timeoutMs.toMillis() / 2));`
   
   As we can see the default timeout is 30 seconds, and the request timeout is 
30/2 which validates the 15 seconds timeout.
   Also we can see in the code how the custom values passed by the config file 
are overwritten by the defaults.
   The proposal is easy, we need to use the defaults values only when the 
timeouts were not defined by the config file, for example like this:
   `      if 
(!props.containsKey(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG)) {
             props.setProperty(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, 
timeout.toMillis.toString)
         }
         if (!props.containsKey(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG)) {
             props.setProperty(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 
(timeout.toMillis / 2).toString)
         } `
   
   I tested it and now I am able to modify the timeouts and make my application 
to catch the result of the command properly.
   
   
   
   ### Committer Checklist (excluded from commit message)
   - [x] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to