absolute8511 opened a new issue, #7056: URL: https://github.com/apache/rocketmq/issues/7056
### Before Creating the Bug Report

- [X] I found a bug, not just asking a question, which should be created in [GitHub Discussions](https://github.com/apache/rocketmq/discussions).
- [X] I have searched the [GitHub Issues](https://github.com/apache/rocketmq/issues) and [GitHub Discussions](https://github.com/apache/rocketmq/discussions) of this repository and believe that this is not a duplicate.
- [X] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

### Runtime platform environment

Linux

### RocketMQ version

4.9.x

### JDK Version

_No response_

### Describe the Bug

The client calls `invokeSync` with a 3000 ms timeout. When two nameservers are configured and the first one is unreachable, the call fails on every attempt.

### Steps to Reproduce

Assume two nameservers are configured and the connection to the first one fails. The connect attempt times out after 3000 ms. As a result, `invokeSync` has always spent more than 3000 ms inside `getAndCreateChannel` by the time that method connects to the second nameserver:

https://github.com/apache/rocketmq/blob/804f2d85f22d9ee52573b9c6ee6abae248c9b387/remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingClient.java#L531

The elapsed time already exceeds the 3000 ms timeout. A `RemotingTimeoutException` is therefore thrown, and the newly established channel to the second nameserver is closed. On the next call, `getAndCreateChannel` picks the first nameserver again and fails the same way. From then on, every subsequent `invokeSync` call fails.

For example, see the logs below:

```
2023-07-19 17:03:47 WARN MQClientFactoryScheduledThread11%2381103787518 - createChannel: connect remote host[xxx-nameserver-0.rocketmq.svc.xxx:9876] timeout 3000ms, AbstractBootstrap$PendingRegistrationPromise@2bf472e1(uncancellable)
2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - new name server is chosen. OLD: xxx-nameserver-1.rocketmq.svc.xxx:9876 , NEW: xxx-nameserver-1.rocketmq.svc.xxx:9876. namesrvIndex = 87
2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - createChannel: begin to connect remote host[xxx-nameserver-1.rocketmq.svc.xxx:9876] asynchronously
2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - NETTY CLIENT PIPELINE: CLOSE
2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - closeChannel: the channel[xxx-nameserver-0.rocketmq.svc.xxx:9876] was removed from channel table
2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - NETTY CLIENT PIPELINE: CLOSE
2023-07-19 17:03:47 INFO NettyClientWorkerThread_1 - eventCloseChannel: the channel[null] has been removed from the channel table before
2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - NETTY CLIENT PIPELINE: CONNECT UNKNOWN => xxx-nameserver-1.rocketmq.svc.xxx.org/172.20.x.x:9876
2023-07-19 17:03:47 INFO 11%2381103787518_NettyClientSelector_1 - closeChannel: close the connection to remote address[] result: true
2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - createChannel: connect remote host[xxx-nameserver-1.rocketmq.svc.xxx:9876] success, AbstractBootstrap$PendingRegistrationPromise@a3c936(success)
2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - closeChannel: begin close the channel[172.20.x.x:9876] Found: false
2023-07-19 17:03:47 INFO MQClientFactoryScheduledThread11%2381103787518 - closeChannel: the channel[172.20.x.x:9876] has been removed from the channel table before
2023-07-19 17:03:47 WARN MQClientFactoryScheduledThread11%2381103787518 - invokeSync: close socket because of timeout, 3000ms, null
2023-07-19 17:03:47 WARN MQClientFactoryScheduledThread11%2381103787518 - invokeSync: wait response timeout exception, the channel[null]
2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - NETTY CLIENT PIPELINE: CLOSE 172.20.x.x:9876
2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - closeChannel: the channel[xxx-1.rocketmq.svc.xxx:9876] was removed from channel table
2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - NETTY CLIENT PIPELINE: CLOSE 172.20.x.x:9876
2023-07-19 17:03:47 INFO NettyClientWorkerThread_2 - eventCloseChannel: the channel[null] has been removed from the channel table before
2023-07-19 17:03:47 INFO 11%2381103787518_NettyClientSelector_1 - closeChannel: close the connection to remote address[172.20.x.x:9876] result: true
```

### What Did You Expect to See?

The next `invokeSync` call should succeed, since the second nameserver is reachable.

### What Did You See Instead?

`invokeSync` kept failing for a long time.

### Additional Context

_No response_
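The failure sequence described under Steps to Reproduce can be reduced to a small, hypothetical Java model. This is not RocketMQ code. The following are all illustrative assumptions:

- the fake clock;
- the connect costs (a full 3000 ms connect timeout for the unreachable `nameserver-0`, 5 ms for `nameserver-1`);
- names such as `TimeoutBudgetDemo`, `SimulatedTimeoutException` and `runCalls`.

The model assumes, as the report describes, three things:

- time spent creating the channel is charged against the caller's `invokeSync` timeout;
- the freshly opened channel is closed when that budget is exceeded;
- with no open channel, the next call starts again from the first nameserver.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the reported failure; not RocketMQ's implementation. */
public class TimeoutBudgetDemo {

    /** Stand-in for RemotingTimeoutException. */
    static class SimulatedTimeoutException extends Exception {
        SimulatedTimeoutException(String message) {
            super(message);
        }
    }

    static final long CONNECT_TIMEOUT_MILLIS = 3000;
    /** Marker for an unreachable nameserver: its connect attempt always times out. */
    static final long UNREACHABLE = -1;

    // Order in which nameservers are tried when no channel is open (assumed: first to last).
    private final List<String> nameservers = List.of("nameserver-0", "nameserver-1");
    // Simulated connect cost in ms; nameserver-0 is unreachable.
    private final Map<String, Long> connectCost =
            Map.of("nameserver-0", UNREACHABLE, "nameserver-1", 5L);
    // Open channels by address (stand-in for the client's channel table).
    private final Set<String> channelTable = new HashSet<>();
    // Fake clock so the demo runs instantly and deterministically.
    private long nowMillis = 0;

    /** Reuses an open channel if there is one; otherwise connects in order. */
    private String getAndCreateChannel() {
        for (String addr : nameservers) {
            if (channelTable.contains(addr)) {
                return addr;
            }
        }
        for (String addr : nameservers) {
            long cost = connectCost.get(addr);
            if (cost == UNREACHABLE) {
                nowMillis += CONNECT_TIMEOUT_MILLIS; // failed connect burns the full connect timeout
                continue;
            }
            nowMillis += cost;
            channelTable.add(addr);
            return addr;
        }
        return null;
    }

    /** Charges channel creation against the caller's timeout before sending anything. */
    String invokeSync(long timeoutMillis) throws SimulatedTimeoutException {
        long begin = nowMillis;
        String addr = getAndCreateChannel();
        long costTime = nowMillis - begin;
        if (timeoutMillis < costTime) {
            // The channel that was just opened is closed again because of the timeout.
            channelTable.remove(addr);
            throw new SimulatedTimeoutException(
                    "invokeSync to " + addr + " timed out after " + costTime + "ms");
        }
        return "response from " + addr;
    }

    /** Runs n consecutive calls and records the outcome of each one. */
    List<String> runCalls(int n, long timeoutMillis) {
        List<String> outcomes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            try {
                outcomes.add(invokeSync(timeoutMillis));
            } catch (SimulatedTimeoutException e) {
                outcomes.add("timeout: " + e.getMessage());
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        new TimeoutBudgetDemo().runCalls(3, 3000).forEach(System.out::println);
    }
}
```

With `timeoutMillis = 3000`, every call spends 3000 ms on the failed connect plus 5 ms on the successful one. It therefore exceeds its budget and closes the channel it just opened, so the next call starts over. In this model, a budget larger than the combined connect cost (for example `runCalls(2, 5000)`) lets the first call succeed and the second reuse the open channel.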
