Kris20030907 opened a new issue, #9900:
URL: https://github.com/apache/rocketmq/issues/9900
### Before Creating the Enhancement Request
- [x] I have confirmed that this should be classified as an enhancement
rather than a bug/feature.
### Summary
In RocketMQ clusters with many broker nodes (e.g., 177 brokers in our
production environment), consumer startup is significantly delayed because
the client sends heartbeats to the brokers serially.
### Motivation
Currently, RocketMQ clients send heartbeats to broker nodes serially, so
startup time grows linearly with the broker count. In our
production environment with 177 brokers:
- **Current startup time**: >15 seconds (measured from consumer start to
first message consumption)
- **Impact on containerized deployments**: Causes Kubernetes
readiness/liveness probe failures, leading to multiple restarts during
deployment
- **Business impact**: Significantly slows down service deployment and
rollback processes
While heartbeat V2 reduces data size, it doesn't address the fundamental
serial execution bottleneck.
### Describe the Solution You'd Like
Introduce concurrent heartbeat sending with the following design:
1. **New configuration parameters**:
- `enableConcurrentHeartbeat`: Boolean flag to enable/disable concurrent
mode (default: false for backward compatibility)
- `concurrentHeartbeatThreadPoolSize`: Thread pool size for concurrent
heartbeats (default: number of available CPU cores)
2. **Implementation approach**:
- Create a fixed thread pool for heartbeat sending when concurrent mode
is enabled
- Submit heartbeat tasks to all brokers in parallel
- Use `CountDownLatch` to wait for all tasks to complete
- Maintain the same error handling and logging as the serial
implementation
3. **Performance target**: Reduce consumer startup time from >15 seconds to
<1 second for 177-broker clusters
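
The approach above could look roughly like the following minimal sketch. It is not the RocketMQ client code: the class `ConcurrentHeartbeatSketch`, the method `sendToAll`, the `sender` callback, the timeout parameter, and the broker addresses are all hypothetical names used for illustration. Only the two proposed settings, the fixed thread pool, and the `CountDownLatch` come from the design described here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch of the proposal; not taken from the RocketMQ code base.
final class ConcurrentHeartbeatSketch {

    /**
     * Sends one heartbeat per broker address on a fixed thread pool and waits
     * until every task has finished or the timeout expires.
     * Returns the number of heartbeats that completed without throwing.
     */
    static int sendToAll(List<String> brokerAddrs, int poolSize,
                         Consumer<String> sender, long timeoutMillis)
            throws InterruptedException {
        if (brokerAddrs.isEmpty()) {
            return 0;
        }
        // For brevity the pool is created per call; a real client would
        // more likely create it once and reuse it for every heartbeat round.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, poolSize));
        CountDownLatch latch = new CountDownLatch(brokerAddrs.size());
        AtomicInteger succeeded = new AtomicInteger();
        try {
            for (String addr : brokerAddrs) {
                pool.execute(() -> {
                    try {
                        sender.accept(addr);
                        succeeded.incrementAndGet();
                    } catch (RuntimeException e) {
                        // Same policy as a serial loop: log and continue with the other brokers.
                        System.err.println("heartbeat to " + addr + " failed: " + e);
                    } finally {
                        latch.countDown();
                    }
                });
            }
            // Block until all heartbeat tasks are done, bounded by the timeout.
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();
        }
        return succeeded.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-ins for the two proposed settings.
        boolean enableConcurrentHeartbeat = true;
        int concurrentHeartbeatThreadPoolSize = Runtime.getRuntime().availableProcessors();

        // Made-up addresses and a no-op sender in place of the real heartbeat RPC.
        List<String> brokers = Arrays.asList("broker-a:10911", "broker-b:10911", "broker-c:10911");
        Consumer<String> sender = addr -> { };

        int ok = 0;
        if (enableConcurrentHeartbeat) {
            ok = sendToAll(brokers, concurrentHeartbeatThreadPoolSize, sender, 3000L);
        } else {
            // Existing behaviour: one broker after another.
            for (String addr : brokers) {
                sender.accept(addr);
                ok++;
            }
        }
        System.out.println(ok + " of " + brokers.size() + " heartbeats sent");
    }
}
```

Counting down in `finally` means a failed heartbeat cannot leave the caller waiting on the latch, and the timeout on `await` bounds the wait if a broker is unreachable.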
### Describe Alternatives You've Considered
1. **Batch heartbeats**: Sending one heartbeat that covers multiple brokers
would require protocol changes and broker-side support
### Additional Context
_No response_