suyanhj opened a new issue, #8948:
URL: https://github.com/apache/rocketmq/issues/8948

   ### Before Creating the Bug Report
   
   - [X] I found a bug, not just asking a question, which should be created in [GitHub Discussions](https://github.com/apache/rocketmq/discussions).
   
   - [X] I have searched the [GitHub Issues](https://github.com/apache/rocketmq/issues) and [GitHub Discussions](https://github.com/apache/rocketmq/discussions) of this repository and believe that this is not a duplicate.
   
   - [X] I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.
   
   
   ### Runtime platform environment
   
   Rocky Linux 9.3
   docker 27.0.3
   
   RocketMQ cluster running in Docker
   
   ### RocketMQ version
   
   rocketmq:5.2.0
   
   ### JDK Version
   
   OpenJDK 11 (latest)
   
   ### Describe the Bug
   
   
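   Delayed messages that never arrive: this only happens in the production environment. We changed the maximum message body size from the default 4 MB to 128 MB. The official documentation recommends 4 MB, but the business needs the larger limit, because one service sends fairly large message bodies, and sends them frequently.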
   
   
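   I wrote a Python test script that sends delayed messages, and sent messages at most of the delay levels. Messages at delay levels 3-6 and 14-16 are never received, while the other levels are received normally. In rocketmq-console I can see that the delayed messages do enter SCHEDULE_TOPIC_XXXX, but they are not forwarded to their target topic. This puzzles me: if delayed delivery were broken, I would expect messages at every delay level to go missing, not just some of them.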
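To make the "which levels fail" result easy to re-check on each run, the test script could keep one entry per sent and per received message and diff the two. The helper below is a minimal sketch; `missing_levels` and its inputs are hypothetical bookkeeping, not part of any RocketMQ client API.

```python
from collections import Counter


def missing_levels(sent, received):
    """Return the sorted delay levels for which fewer messages arrived than were sent.

    sent, received: iterables of delay levels, one entry per message
    (hypothetical bookkeeping kept by the test script).
    """
    sent_counts = Counter(sent)
    received_counts = Counter(received)  # missing keys count as 0
    return sorted(level for level, n in sent_counts.items()
                  if received_counts[level] < n)


# Example shaped like the report: levels 3-6 and 14-16 never arrive.
sent = list(range(1, 19))
received = [lvl for lvl in sent if lvl not in (3, 4, 5, 6, 14, 15, 16)]
print(missing_levels(sent, received))
```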
   
   
   ### Steps to Reproduce
   
   I have not yet been able to reproduce it reliably.
   
   ### What Did You Expect to See?
   
   Delayed messages are received normally.
   
   ### What Did You See Instead?
   
   Messages at some delay levels are never received.
   
   ### Additional Context
   
   # Environment
   **Version:** rocketmq v5.2.0
   **Mode:** controller cluster mode, 3 nameservers + 3 brokers (1 master, 2 slaves)
   **Machines:** 4 cores, 12 GB RAM
   
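   # Cluster configuration (every node uses a copy of the same file with only the IP addresses changed, so there are no configuration differences)
   ## Name server configuration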
   ```conf
   #Namesrv config
   listenPort = 9876
   
   #controller config
   enableControllerInNamesrv = true
   controllerDLegerGroup = group1
   controllerDLegerPeers = n0-10.218.0.31:9878;n1-10.218.0.69:9878;n2-10.218.0.44:9878
   controllerDLegerSelfId = n1
   controllerStorePath = /data/rocketmq/namesrv/data
   enableElectUncleanMaster = false
   notifyBrokerRoleChanged = true
   ```
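Because the file is copied between nodes with only the IPs edited, one thing worth confirming is that `controllerDLegerSelfId` is also changed per node and points at that node's own address in `controllerDLegerPeers`. The sketch below assumes the `id-host:port;...` format shown above; `parse_dledger_peers` and `check_self_id` are hypothetical helpers, not RocketMQ code.

```python
def parse_dledger_peers(peers):
    """Split 'n0-host:port;n1-host:port' into {'n0': 'host:port', ...}."""
    table = {}
    for entry in peers.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        peer_id, _, addr = entry.partition("-")  # split on the first '-' only
        table[peer_id] = addr
    return table


def check_self_id(peers, self_id, local_ip):
    """Raise if self_id is not a peer or does not point at this node's IP."""
    table = parse_dledger_peers(peers)
    if self_id not in table:
        raise ValueError(f"{self_id!r} is not one of {sorted(table)}")
    host = table[self_id].rsplit(":", 1)[0]
    if host != local_ip:
        raise ValueError(f"{self_id!r} points at {host}, not {local_ip}")
    return table[self_id]


PEERS = "n0-10.218.0.31:9878;n1-10.218.0.69:9878;n2-10.218.0.44:9878"
print(check_self_id(PEERS, "n1", "10.218.0.69"))
```

For the node shown above (10.218.0.69 with `controllerDLegerSelfId = n1`) the check passes.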
   
   ## Broker configuration
   ```conf
   brokerClusterName = DefaultCluster
   brokerName = broker-a
   brokerId = -1
   brokerRole = SLAVE
   deleteWhen = 04
   fileReservedTime = 48
   listenPort=10911
   brokerIP1=10.218.0.69
   brokerIP2=10.218.0.69
   flushDiskType=ASYNC_FLUSH
   storePathRootDir=/data/rocketmq/broker/data
   autoCreateSubscriptionGroup=true
   traceTopicEnable=false
   autoCreateTopicEnable=true
   #defaultTopicQueueNums=4
   mapedFileSizeConsumeQueue=300000
   diskMaxUsedSpaceRatio=88
   maxMessageSize=134217728
   ##sendMessageThreadPoolNums=128
   ##pullMessageThreadPoolNums=128
   useEpollNativeSelector=true
   ##highSpeedMode=true
   messageDelayLevel = 1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 15m 30m 1h 2h
   
   ###### controller #####
   enableControllerMode = true
   controllerAddr = 10.218.0.31:9878;10.218.0.69:9878;10.218.0.44:9878
   ##syncBrokerMetadataPeriod=5000
   ##checkSyncStateSetPeriod=5000
   ##syncControllerMetadataPeriod=10000
   ##haMaxTimeSlaveNotCatchup=60000
   ##allAckInSyncStateSet=true
   ##syncFromLastFile=false
   ##asyncLearner=false
   ##1 master, 2 slaves
   ##inSyncReplicas=0
   ##minInSyncReplicas=0
   
   ###### m-s #####
   slaveReadEnable=true
   offsetCheckInSlave=true
   ```
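The broker reads `messageDelayLevel` as a space-separated list in which delay level N is the Nth entry (1-based), with `s`, `m`, `h` and `d` units. The hypothetical helper below (a sketch, not the broker's own parser) turns the configured string into seconds, so the failing levels can be read as concrete delays; it also spells out the `maxMessageSize` value.

```python
# Time units accepted in messageDelayLevel, converted to seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# maxMessageSize from the broker config above: 128 MiB instead of the 4 MiB default.
MAX_MESSAGE_SIZE = 134217728
assert MAX_MESSAGE_SIZE == 128 * 1024 * 1024


def parse_delay_levels(spec):
    """Map each 1-based delay level to its delay in seconds."""
    table = {}
    for level, token in enumerate(spec.split(), start=1):
        value, unit = token[:-1], token[-1]
        table[level] = int(value) * UNIT_SECONDS[unit]
    return table


levels = parse_delay_levels("1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 15m 30m 1h 2h")
for level in (3, 4, 5, 6, 14, 15, 16):
    print(level, levels[level])
```

With this configuration the failing levels 3-6 are 10s, 30s, 1m and 2m, and levels 14-16 are 10m, 15m and 30m.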
   
   # Cluster status check
   
![image](https://github.com/user-attachments/assets/0c8cb901-e775-4964-9a3d-21a887b5eb4f)
   
   
![image](https://github.com/user-attachments/assets/dc06050e-4305-40e1-adbe-71e9fac2618c)
   
   # Logs
   Contents of storeerror.log:
   
![image](https://github.com/user-attachments/assets/599b54fe-af47-4407-9e59-cba46d66c1f0)
   Contents of delayOffset.json:
   
![image](https://github.com/user-attachments/assets/967915da-47d1-4ca5-823f-8814f15182e7)
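delayOffset.json records, per delay level, how far the broker has delivered within that level's SCHEDULE_TOPIC_XXXX queue. One way to spot a stuck level is to copy those offsets twice, at least the level's delay apart, and compare them with the queue max offsets shown in the console. The sketch below assumes the values were copied by hand into dicts; `stuck_levels` is a hypothetical helper and does not parse the file itself.

```python
def stuck_levels(before, after, max_offsets):
    """Return {level: lag} for levels whose delivery offset did not move between
    two snapshots although SCHEDULE_TOPIC_XXXX still holds messages for them.

    before, after: {delay_level: delivered offset}, copied from delayOffset.json
    max_offsets:   {queue_id: max offset} of SCHEDULE_TOPIC_XXXX (queue_id = level - 1)
    """
    stuck = {}
    for queue_id, max_offset in max_offsets.items():
        level = queue_id + 1
        start, end = before.get(level, 0), after.get(level, 0)
        lag = max_offset - end
        if lag > 0 and end == start:  # backlog exists but no progress was made
            stuck[level] = lag
    return dict(sorted(stuck.items()))


# Made-up numbers: level 1 is caught up, level 4 is progressing, level 3 is stuck.
print(stuck_levels({1: 5, 3: 2, 4: 1}, {1: 10, 3: 2, 4: 3}, {0: 10, 2: 7, 3: 9}))
```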
   
   # Observed behavior
   Producer side:
   
![image](https://github.com/user-attachments/assets/f252cff0-158e-43d0-8008-a1353ccb5ead)
   Consumer side:
   
![image](https://github.com/user-attachments/assets/17fe79e5-5d5f-4286-8a6c-5282f4eb9c6b)
   
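   Console: looking at the SCHEDULE_TOPIC_XXXX topic in the console, queues 3, 4, 5 and 6 all receive messages normally, but once the delay has elapsed the messages are not forwarded to their original topic.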
   
![image](https://github.com/user-attachments/assets/88a1bc3d-0975-4eaa-a3d5-54f48250a256)
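When reading SCHEDULE_TOPIC_XXXX in the console it may help to remember that the broker stores each delay level in queue `level - 1` (this is what `delayLevel2QueueId` in the broker's `ScheduleMessageService` returns). The small sketch below re-implements that mapping as a hypothetical helper for the failing levels; if the console numbers above are queue ids, they would correspond to levels one higher, which may be worth double-checking.

```python
def delay_level_to_queue_id(level):
    """SCHEDULE_TOPIC_XXXX queue that holds messages for a delay level (queueId = level - 1)."""
    if level < 1:
        raise ValueError("delay levels start at 1")
    return level - 1


failing_levels = [3, 4, 5, 6, 14, 15, 16]
print({level: delay_level_to_queue_id(level) for level in failing_levels})
# → {3: 2, 4: 3, 5: 4, 6: 5, 14: 13, 15: 14, 16: 15}
```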
   
   
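   Additional notes:
   - Using the production configuration, I built an identical cluster on local machines (8 cores, 16 GB), but could not reproduce the problem: messages at every delay level were sent and received normally.
   - I then started a single-node MQ instance and raised the maximum message size; no problem there either.
   - The only thing that might trigger the problem: on the local cluster, while the bundled benchmark tool was sending 383 KB message bodies taken from production, my Python script did fail to receive delayed messages. However, system load was also high (load 17 on the 8-core machine), so the cause may simply be overload; memory was sufficient. After I stopped the bundled benchmark, the missing messages eventually arrived.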
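Since the messages eventually arrived once the benchmark stopped, they may have been delivered late rather than lost. Logging a send and receive timestamp per message would separate the two cases. The sketch below is hypothetical: `classify` and the 5-second `tolerance` are assumptions, and `delay_table` maps each level to its configured delay in seconds.

```python
def classify(send_ts, receive_ts, level, delay_table, tolerance=5.0):
    """Return 'missing', 'on_time', or 'late' for one delayed message.

    send_ts, receive_ts: epoch seconds; receive_ts is None if never received.
    delay_table: {delay_level: configured delay in seconds}.
    tolerance: allowed extra latency in seconds (hypothetical default).
    """
    if receive_ts is None:
        return "missing"
    lateness = receive_ts - send_ts - delay_table[level]
    return "on_time" if lateness <= tolerance else "late"


delays = {3: 10, 14: 600}
print(classify(0.0, 12.0, 3, delays))    # 2 s after the 10 s delay
print(classify(0.0, 900.0, 14, delays))  # 300 s after the 10 min delay
print(classify(0.0, None, 3, delays))
```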

