Mooshua opened a new issue, #24676: URL: https://github.com/apache/pulsar/issues/24676
### Search before reporting

- [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Motivation

Suppose we have three nodes, each receiving updates for some resources and producing their values to Pulsar consumers. We want the load balanced evenly across the three nodes, to reduce load on both the brokers and the nodes themselves. Essentially, instead of using `WaitForExclusive` for *leader elections*, we use it for load balancing.

Currently, I assign each topic a primary node. That node creates a producer for the topic with the `ExclusiveWithFencing` access mode, and the two "standby" nodes for that topic establish `WaitForExclusive` producers. The connection process is therefore:

- Iterate all watched resources
- For each resource we are the primary for, establish an `ExclusiveWithFencing` producer for that topic
- For each resource we are a standby for, establish a `WaitForExclusive` producer for that topic
- When any of our producers is created, connect to the external resource update channel and stream updates into Pulsar, effectively taking on the load for that resource

For my use case, I want nodes to be able to drop out regularly for rolling updates or other maintenance. **However, when a node drops, its producer channels tend to all go to *one* node: the first node to register `WaitForExclusive` access with the brokers.** This leaves one node running two thirds of the cluster's work while the other carries only one third. I'm fine with this sort of failure state, but I feel this change would be a simple way to improve the failure scenario for use cases like mine.

### Solution

When a producer requests `WaitForExclusive` access from the broker and is not immediately granted exclusivity, it should be placed at a *random* position in the waiting queue, rather than the same position every time.
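To make the effect concrete, here is a small stdlib-only sketch (not Pulsar code; the queue model is my simplified assumption of the broker's `WaitForExclusive` wait list) contrasting today's first-registrant-wins behavior with the proposed random queue position:

```python
import random
from collections import Counter, deque

def fifo_takeover(topics, standbys):
    """Current behavior (as described above): every standby's WaitForExclusive
    request lands in the same relative order for every topic, so the first
    registrant takes over every released topic."""
    assigned = Counter()
    for _topic in topics:
        queue = deque(standbys)      # identical registration order each time
        assigned[queue[0]] += 1      # head of the queue gains exclusivity
    return assigned

def random_takeover(topics, standbys, seed=42):
    """Proposed behavior: each waiting producer is inserted at a random
    position in the queue, spreading takeover across the standbys."""
    rng = random.Random(seed)
    assigned = Counter()
    for _topic in topics:
        queue = []
        for node in standbys:
            queue.insert(rng.randrange(len(queue) + 1), node)
        assigned[queue[0]] += 1
    return assigned

topics = [f"topic-{i}" for i in range(300)]   # topics owned by the failed primary
standbys = ["node-b", "node-c"]               # remaining nodes

print(fifo_takeover(topics, standbys))    # all 300 topics pile onto node-b
print(random_takeover(topics, standbys))  # roughly even split between the two
```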
This would ensure that, should a primary node fail, its work is randomly distributed among the remaining nodes.

### Alternatives

One alternative is to sort the queue by a producer-provided priority. This would allow producers to establish the order in which waiting producers gain exclusivity when the exclusive producer closes. The exact functionality above could then be implemented on the producer side by having each producer pick a random priority, creating a random order.

This could also serve use cases far beyond randomized producer rebalancing. For example, a priority queue would make it possible to run a whole separate cluster of failover producers in a backup region and de-prioritize them so they gain control only when the entire "primary" cluster is taken down.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
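As a footnote, the priority-based alternative could be sketched as follows. Everything here is hypothetical, not an existing Pulsar API: the assumption is that each `WaitForExclusive` producer attaches a numeric priority and the broker hands exclusivity to the lowest-priority waiter, breaking ties by arrival order.

```python
import random

def next_owner(waiting):
    """Pick the waiting producer that gains exclusivity when the current
    exclusive producer closes: lowest priority wins, then earliest arrival."""
    return min(waiting, key=lambda p: (p["priority"], p["arrival"]))

rng = random.Random(7)

# Primary-region standbys pick random priorities in a low band, which
# reproduces the randomized rebalancing from the Solution section...
primary_region = [
    {"node": name, "priority": rng.randrange(100), "arrival": i}
    for i, name in enumerate(["node-b", "node-c"])
]
# ...while a backup-region cluster registers in a strictly higher band,
# so it only wins once every primary-region producer is gone.
backup_region = [
    {"node": name, "priority": 1000 + rng.randrange(100), "arrival": i + 10}
    for i, name in enumerate(["backup-a", "backup-b"])
]

waiting = primary_region + backup_region
print(next_owner(waiting)["node"])         # always a primary-region node
print(next_owner(backup_region)["node"])   # backup wins only once primaries are gone
```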
