Chia-Ping Tsai created KAFKA-10225: -------------------------------------- Summary: Increase default zk session timeout for system test Key: KAFKA-10225 URL: https://issues.apache.org/jira/browse/KAFKA-10225 Project: Kafka Issue Type: Improvement Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai
I'm digging in the flaky system tests and then I noticed there are many flaky caused by following check. {code} with node.account.monitor_log(KafkaService.STDOUT_STDERR_CAPTURE) as monitor: node.account.ssh(cmd) # Kafka 1.0.0 and higher don't have a space between "Kafka" and "Server" monitor.wait_until("Kafka\s*Server.*started", timeout_sec=timeout_sec, backoff_sec=.25, err_msg="Kafka server didn't finish startup in %d seconds" % timeout_sec) {code} And the error message in broker log is shown below. {quote} kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:262) at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:119) at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1880) at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:430) at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:455) at kafka.server.KafkaServer.startup(KafkaServer.scala:227) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44) at kafka.Kafka$.main(Kafka.scala:82) at kafka.Kafka.main(Kafka.scala) {quote} I'm surprised the default timeout of zk connection in system test is only 2 seconds as the default timeout in production is increased to 18s (see https://github.com/apache/kafka/commit/4bde9bb3ccaf5571be76cb96ea051dadaeeaf5c7) {code} config_property.ZOOKEEPER_CONNECTION_TIMEOUT_MS: 2000 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)