zhuyuemufeng opened a new issue, #7927: URL: https://github.com/apache/rocketmq/issues/7927
### Before Creating the Bug Report

- [X] I found a bug, not just asking a question, which should be created in [GitHub Discussions](https://github.com/apache/rocketmq/discussions).
- [X] I have searched the [GitHub Issues](https://github.com/apache/rocketmq/issues) and [GitHub Discussions](https://github.com/apache/rocketmq/discussions) of this repository and believe that this is not a duplicate.
- [X] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

### Runtime platform environment

linux

### RocketMQ version

5.1.x

### JDK Version

jdk 1.8

### Describe the Bug

When the `enableSlaveActingMaster` switch is turned on and a master node goes down, the slave node attempts to deliver scheduled messages to other master nodes, with a maximum of four retries. I find this retry mechanism somewhat unreasonable. For instance, if a temporary network interruption makes the remote master node unreachable, it may take up to eight retries for messages to select another available master node. During this process, some messages may be lost.

Code:

![image](https://github.com/apache/rocketmq/assets/51144340/d2c8eaf9-35a7-4beb-9d5c-fcbc0c7f65d0)
![image](https://github.com/apache/rocketmq/assets/51144340/59e213be-4080-4614-afec-496aeb14fa55)

My approach is to keep looping until a remote delivery succeeds. This ensures that no messages are lost; I believe the severity of message loss outweighs the inconvenience of temporarily blocked delivery.

### Steps to Reproduce

1. Set up a cluster with 3 masters and 3 slaves, and enable the `enableSlaveActingMaster` feature.
2. Send 100 scheduled messages to the cluster, with delivery times between 1 and 3 minutes.
3. Start consumption and, while consuming, shut down one of the master nodes.
4. While a slave is delivering scheduled messages, disconnect the network connection to a specific master for a period of time, then restore it.

By comparing the sent messages with the consumed messages, you may observe message loss and Broker errors.

### What Did You Expect to See?

If remote delivery fails, continue looping until a viable node is found.

### What Did You See Instead?

Returns `PUT_NEED_RETRY`.

### Additional Context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@rocketmq.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
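
To make the difference between the bounded retry described under "Describe the Bug" and the proposed "keep looping" behavior concrete, here is a minimal, self-contained sketch. It is not RocketMQ code: `DeliverySketch`, `Status`, `deliverBounded`, `deliverUntilSuccess`, and the `send` predicate are hypothetical names used only for illustration, and the capped exponential backoff is an assumption added to avoid a busy loop (the issue itself only asks to keep retrying).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only; none of these names are RocketMQ APIs.
class DeliverySketch {

    // Hypothetical stand-in for the broker's put status.
    // Only PUT_NEED_RETRY is mentioned in the issue.
    public enum Status { PUT_OK, PUT_NEED_RETRY }

    // Bounded retry, as described in the issue: try up to maxAttempts
    // times (cycling through the masters), then give up.
    public static Status deliverBounded(List<String> masters,
                                        Predicate<String> send,
                                        int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String target = masters.get(i % masters.size());
            if (send.test(target)) {
                return Status.PUT_OK;
            }
        }
        return Status.PUT_NEED_RETRY;
    }

    // Proposed behavior: keep cycling through the masters until one
    // accepts the message. Returns the number of attempts it took.
    // The backoff between attempts is an assumption, not part of the issue.
    public static int deliverUntilSuccess(List<String> masters,
                                          Predicate<String> send,
                                          long initialBackoffMs,
                                          long maxBackoffMs) {
        long backoff = initialBackoffMs;
        int attempts = 0;
        while (true) {
            String target = masters.get(attempts % masters.size());
            attempts++;
            if (send.test(target)) {
                return attempts;
            }
            try {
                Thread.sleep(backoff);
            } catch (InterruptedException e) {
                // Preserve the interrupt so a shutdown can stop the loop.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", e);
            }
            backoff = Math.min(backoff * 2, maxBackoffMs);
        }
    }

    public static void main(String[] args) {
        List<String> masters = Arrays.asList("broker-a", "broker-b");
        int[] calls = {0};
        // Simulated outage: the first five send attempts fail.
        Predicate<String> flakySend = target -> ++calls[0] > 5;

        // Four attempts are not enough to outlast the outage.
        System.out.println(deliverBounded(masters, flakySend, 4)); // PUT_NEED_RETRY

        calls[0] = 0;
        // Unbounded retry succeeds on the sixth attempt.
        System.out.println(deliverUntilSuccess(masters, flakySend, 1, 4)); // 6
    }
}
```

The trade-off is the one stated in the report: the unbounded variant blocks delivery for as long as the outage lasts, in exchange for never dropping the message after a fixed number of failures.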