Hi Linlin,

> This is an incompatible modification, so the entire cluster needs to be upgraded, not just a part of the nodes
Appreciate your contribution to the new feature in PIP-255. I have a question regarding the load-balancing aspect of this feature. You mentioned that this is an incompatible modification, and the entire cluster needs to be upgraded, not just a part of the nodes. I was wondering why we can only have one load-balancing strategy. Would it be possible to abstract the logic here and make it an optional choice? This way, we could have multiple load-balancing strategies, such as hash-based, round-robin, etc., available for users to choose from. I'd love to hear your thoughts on this.

Best regards,
Xiangying

On Mon, Apr 10, 2023 at 8:23 PM PengHui Li <peng...@apache.org> wrote:
> Hi Lin,
>
> > The load managed by each Bundle is not even. Even if the number of partitions managed by each bundle is the same, there is no guarantee that the sum of the loads of these partitions will be the same.
>
> Do we expect that the bundles should have the same loads? The bundle is the base unit of the load balancer; we can set the high watermark of the bundle, e.g., the maximum topics and throughput. But bundles can have different real loads, and if one bundle exceeds the high watermark, the bundle will be split. Users can tune the high watermark to distribute the loads evenly across brokers.
>
> For example, suppose there are 4 bundles with loads 1, 3, 2, 4, the maximum load of a bundle is 5, and there are 2 brokers. We can assign bundle 0 and bundle 3 to broker-0, and bundle 1 and bundle 2 to broker-1.
>
> Of course, this is the ideal situation. If bundle 0 has already been assigned to broker-0 and bundle 1 to broker-1, then bundle 2 will go to broker-0 and bundle 3 to broker-1, and the loads for the two brokers are 3 and 7. Dynamic programming can help to find an optimized solution, at the cost of more bundle unloads.
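To make the arithmetic in the example above concrete, here is a minimal sketch (illustrative only, not Pulsar code). It replays the scenario where bundles 0 and 1 are already placed, assigns the remaining bundles to the currently least-loaded broker, and compares the result with the balanced split:

```java
public class BundlePlacementExample {
    public static void main(String[] args) {
        int[] bundleLoads = {1, 3, 2, 4};   // loads of bundle 0..3
        int[] brokerLoad = {1, 3};          // bundle 0 already on broker-0, bundle 1 on broker-1

        // Greedy: each remaining bundle goes to the currently least-loaded broker
        // (ties go to broker-1 here, matching the example in the mail).
        for (int b = 2; b < bundleLoads.length; b++) {
            int target = brokerLoad[0] < brokerLoad[1] ? 0 : 1;
            brokerLoad[target] += bundleLoads[b];
        }
        System.out.printf("greedy:   broker-0=%d broker-1=%d%n", brokerLoad[0], brokerLoad[1]); // 3 and 7

        // Balanced split {bundle 0, bundle 3} / {bundle 1, bundle 2}: 5 and 5,
        // but reaching it after the fact means unloading already-assigned bundles.
        System.out.printf("balanced: broker-0=%d broker-1=%d%n",
                bundleLoads[0] + bundleLoads[3], bundleLoads[1] + bundleLoads[2]);
    }
}
```

The point is that once bundles 0 and 1 are pinned, no greedy choice for bundles 2 and 3 can reach the 5/5 split without unloading something first.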
> So, should we design the bundle to have even loads? It is difficult to achieve in reality. And the proposal says, "Let each bundle carry the same load as possible". Is it the correct direction for the load balancer?
>
> > Doesn't shed loads very well. The existing default policy ThresholdShedder has a relatively high usage threshold, and various traffic thresholds need to be set. Many clusters with high TPS and small message bodies may have high CPU but low traffic; and for many small-scale clusters, the threshold needs to be modified according to the actual business.
>
> Can it be resolved by introducing the entry write/read rate to the bundle stats?
>
> > The removed Bundle cannot be well distributed to other Brokers. The load information of each Broker will be reported at regular intervals, so the judgment of the Leader Broker when allocating Bundles cannot be guaranteed to be completely correct. Secondly, if there are a large number of Bundles to be redistributed, the Leader may make the low-load Broker a new high-load node when the load information is not up-to-date.
>
> Can we try to force-sync the load data of the brokers before performing the distribution of a large number of bundles?
>
> Regarding the Goal section in the proposal: it doesn't seem to map to the issues mentioned in the Motivation section. IMO, the proposal should clearly describe the Goal, i.e., which problems will be resolved with this proposal (all of the above 3 issues, or part of them), what the high-level solution to resolve them is, and what the pros and cons are compared with the existing solution, without diving into the implementation section.
>
> Another consideration is that the default max bundles of a namespace is 128. I don't think the common case needs 128 partitions for a topic. If the partitions < the bundle count, will the new solution basically be equivalent to the current way?
>
> If this is not a general solution for common scenarios, I support making the topic-bundle assigner pluggable without introducing the implementation into the Pulsar repo. Users can implement their own assigner based on the business requirement. Pulsar's general solution may not be good for all scenarios, but it is better for scalability (bundle split) and enough for most common scenarios. We can keep improving the general solution for the general requirement of the most common scenarios.
>
> Regards,
> Penghui
>
> On Wed, Mar 22, 2023 at 9:52 AM Lin Lin <lin...@apache.org> wrote:
> >
> > > This appears to be the "round-robin topic-to-bundle mapping" option in the `findBundle` function. Is this the only place that needs an update? Can you list what change is required?
> >
> > In this PIP, we only discuss topic-to-bundle mapping. The required changes are:
> >
> > 1) When a lookup happens, the partition is assigned to a bundle:
> > Lookup -> NamespaceService#getBrokerServiceUrlAsync -> NamespaceService#getBundleAsync -> NamespaceBundles#findBundle
> > Consistent hashing is currently used to assign a partition to a bundle in NamespaceBundles#findBundle. We should add a configuration item partitionAssignerClassName, so that different partition assignment algorithms can be dynamically configured. The existing algorithm will be used as the default (partitionAssignerClassName=ConsistentHashingPartitionAssigner).
> >
> > 2) Implement a new partition assignment class RoundRobinPartitionAssigner. The new partition assignment will be implemented in this class.
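For illustration, here is a rough sketch of the pluggable assigner described in (1) and (2) above. The interface shape and method signature are assumptions on my side; only the configuration item and the two class names come from the mail:

```java
// Hypothetical interface; the actual Pulsar API may differ.
public interface PartitionAssigner {
    /**
     * Pick the index of the bundle that should own the given partition.
     * topicName is assumed here to be the base topic name without the
     * "-partition-N" suffix.
     */
    int selectBundle(String topicName, int partitionIndex, int bundleCount);
}

/** Default, matching today's behavior: hash the topic onto a bundle. */
class ConsistentHashingPartitionAssigner implements PartitionAssigner {
    @Override
    public int selectBundle(String topicName, int partitionIndex, int bundleCount) {
        // Stand-in for the hashing done in NamespaceBundles#findBundle.
        return Math.floorMod(topicName.hashCode(), bundleCount);
    }
}

/** Proposed: spread the partitions of one topic round-robin across bundles. */
class RoundRobinPartitionAssigner implements PartitionAssigner {
    @Override
    public int selectBundle(String topicName, int partitionIndex, int bundleCount) {
        // Hash the base topic name once to pick a starting bundle,
        // then step through the bundles partition by partition.
        int start = Math.floorMod(topicName.hashCode(), bundleCount);
        return (start + partitionIndex) % bundleCount;
    }
}
```

The broker would then instantiate whichever implementation `partitionAssignerClassName` points to, keeping the consistent-hashing assigner as the default.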
> > > How do we enable this "round-robin topic-to-bundle mapping option" (by namespace policy and broker.conf)?
> >
> > In broker.conf, with a new option called `partitionAssignerClassName`.
> >
> > > Can we apply this option to existing namespaces? (what's the admin operation to enable this option)?
> >
> > The cluster must ensure that all nodes use the same algorithm. Broker-level configuration can be made effective by restarting the brokers or via the admin API BrokersBase#updateDynamicConfiguration.
> >
> > > I assume the "round-robin topic-to-bundle mapping option" works with a single partitioned topic, because other topics might show different load per partition. Is this intentional? (so users need to ensure not to put other topics in the namespace, if this option is configured)
> >
> > For single-partition topics, the starting bundle is determined using a consistent hash, so single-partition topics will be spread out to different bundles as much as possible. For high-load single-partition topics, the current algorithm cannot solve this problem, and neither can this PIP. If it is just a low-load single-partition topic, the impact on the entire bundle is very small. However, in real scenarios, high-load businesses share the load through multiple partitions.
> >
> > > Some brokers might have more bundles than other brokers. Do we have different logic for bundle balancing across brokers? Or do we rely on the existing assign/unload/split logic to balance bundles among brokers?
> >
> > This PIP does not involve the mapping between bundles and brokers; the existing algorithm works well with this PIP. However, we will also contribute our mapping algorithm in a subsequent PIP. For example, bundles under the same namespace can be assigned to brokers in a round-robin manner.
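As for the bundle-to-broker round-robin idea mentioned for a follow-up PIP, a toy sketch of the intent (names and shape are my assumptions, not Pulsar code):

```java
import java.util.ArrayList;
import java.util.List;

// Assign the bundles of one namespace to brokers in round-robin order.
class RoundRobinBundleAssignmentExample {
    static List<String> assign(List<String> bundles, List<String> brokers) {
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < bundles.size(); i++) {
            owners.add(brokers.get(i % brokers.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> bundles = List.of("0x00000000_0x40000000", "0x40000000_0x80000000",
                                       "0x80000000_0xc0000000", "0xc0000000_0xffffffff");
        List<String> brokers = List.of("broker-0", "broker-1");
        // -> [broker-0, broker-1, broker-0, broker-1]
        System.out.println(assign(bundles, brokers));
    }
}
```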