Hi all,

We have a cluster with 10 brokers running Kafka 0.9.0.1. To monitor the cluster, we poll each broker every 2 minutes for metrics such as the offlinePartition metric. Occasionally the request to some of the brokers times out, which leads to false alarms. For example, we may get an exception like the one below:

error: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.11.12.13; nested exception is: java.net.ConnectException: Connection timed out]

We also see that our TcpExt.TCPBacklogDrop counter fluctuates repeatedly; maybe this is the root cause. If that is the problem, how can I optimize it?
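
For context, the kind of poll involved can be sketched as below. This is only a minimal illustration and not our exact monitoring code: the class name, the method names, the retry count, the delay and the JMX port are all hypothetical, and the MBean name assumes the standard Kafka controller metric OfflinePartitionsCount. Retrying a failed read before raising an alarm is one possible way to suppress false alarms caused by a single timeout; it does not address the underlying connection problem.

```java
import java.util.concurrent.Callable;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not taken from any existing monitoring tool.
class OfflinePartitionPoller {

    // Run the call up to maxAttempts times, sleeping delayMs between failed attempts.
    // Rethrows the last exception if every attempt fails.
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long delayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                }
            }
        }
        throw last;
    }

    // Read the offline partition count from one broker over JMX.
    // Assumes the broker exposes JMX over RMI on the given port.
    public static Object readOfflinePartitions(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "kafka.controller:type=KafkaController,name=OfflinePartitionsCount");
            return conn.getAttribute(name, "Value");
        }
    }

    public static void main(String[] args) throws Exception {
        String host = args[0];                   // broker host, e.g. 10.11.12.13
        int port = Integer.parseInt(args[1]);    // JMX port (deployment-specific)
        // Alarm only if three consecutive attempts fail (assumed policy).
        Object value = withRetry(() -> readOfflinePartitions(host, port), 3, 5000L);
        System.out.println("OfflinePartitionsCount=" + value);
    }
}
```

With a policy like this, a single "Connection timed out" would be retried, and only repeated failures within the same polling cycle would be reported.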
Any suggestions are appreciated. Thanks in advance.