Hi Linlin,

> This is an incompatible modification, so the entire cluster needs to be upgraded, not just a part of the nodes
Appreciate your contribution to the new feature in PIP-255. I have a question regarding the load-balancing aspect of this feature. You mentioned that this is an incompatible modification, and the entire cluster needs to be upgraded, not just a part of the nodes. I was wondering why we can only have one load-balancing strategy. Would it be possible to abstract the logic here and make it an optional choice? This way, we could have multiple load-balancing strategies, such as hash-based, round-robin, etc., available for users to choose from. I'd love to hear your thoughts on this.

Best regards,
Xiangying

On Mon, Apr 10, 2023 at 8:23 PM PengHui Li <peng...@apache.org> wrote:
> Hi Lin,
>
> > The load managed by each Bundle is not even. Even if the number of partitions managed by each bundle is the same, there is no guarantee that the sum of the loads of these partitions will be the same.
>
> Do we expect that the bundles should have the same loads? The bundle is the base unit of the load balancer; we can set the high watermark of the bundle, e.g., the maximum topics and throughput. But bundles can have different real loads, and if one bundle exceeds the high watermark, the bundle will be split. Users can tune the high watermark to distribute the loads evenly across brokers.
>
> For example, suppose there are 4 bundles with loads 1, 3, 2, 4, the maximum load of a bundle is 5, and there are 2 brokers. We can assign bundle 0 and bundle 3 to broker-0, and bundle 1 and bundle 2 to broker-1.
>
> Of course, this is the ideal situation. If bundle 0 has already been assigned to broker-0 and bundle 1 to broker-1, then bundle 2 will go to broker-0 and bundle 3 to broker-1, and the loads for the two brokers are 3 and 7. Dynamic programming can help to find an optimized solution, at the cost of more bundle unloads.
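To make the arithmetic in the example above concrete, here is a minimal sketch (illustrative only, not Pulsar code). It replays the scenario where bundles 0 and 1 are already placed, assigns the remaining bundles to the currently least-loaded broker, and compares the result with the balanced split:

```java
public class BundlePlacementExample {
    public static void main(String[] args) {
        int[] bundleLoads = {1, 3, 2, 4};   // loads of bundle 0..3
        int[] brokerLoad = {1, 3};          // bundle 0 already on broker-0, bundle 1 on broker-1

        // Greedy: each remaining bundle goes to the currently least-loaded broker
        // (ties go to broker-1 here, matching the example in the mail).
        for (int b = 2; b < bundleLoads.length; b++) {
            int target = brokerLoad[0] < brokerLoad[1] ? 0 : 1;
            brokerLoad[target] += bundleLoads[b];
        }
        System.out.printf("greedy:   broker-0=%d broker-1=%d%n", brokerLoad[0], brokerLoad[1]); // 3 and 7

        // Balanced split {bundle 0, bundle 3} / {bundle 1, bundle 2}: 5 and 5,
        // but reaching it after the fact means unloading already-assigned bundles.
        System.out.printf("balanced: broker-0=%d broker-1=%d%n",
                bundleLoads[0] + bundleLoads[3], bundleLoads[1] + bundleLoads[2]);
    }
}
```

The point is that once bundles 0 and 1 are pinned, no greedy choice for bundles 2 and 3 can reach the 5/5 split without unloading something first.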
> So, should we design the bundle to have even loads? It is difficult to achieve in reality. And the proposal says, "Let each bundle carry the same load as possible". Is it the correct direction for the load balancer?
>
> > Doesn't shed loads very well. The existing default policy ThresholdShedder has a relatively high usage threshold, and various traffic thresholds need to be set. Many clusters with high TPS and small message bodies may have high CPU but low traffic; and for many small-scale clusters, the threshold needs to be modified according to the actual business.
>
> Can it be resolved by introducing the entry write/read rate to the bundle stats?
>
> > The removed Bundle cannot be well distributed to other Brokers. The load information of each Broker will be reported at regular intervals, so the judgment of the Leader Broker when allocating Bundles cannot be guaranteed to be completely correct. Secondly, if there are a large number of Bundles to be redistributed, the Leader may make the low-load Broker a new high-load node when the load information is not up-to-date.
>
> Can we try to force-sync the load data of the brokers before performing the distribution of a large number of bundles?
>
> Regarding the Goal section in the proposal: it doesn't seem to map to the issues mentioned in the Motivation section. IMO, the proposal should clearly describe the Goal, i.e., which problems will be resolved with this proposal (all of the above 3 issues, or part of them), what the high-level solution to resolve them is, and what the pros and cons are compared with the existing solution, without diving into the implementation section.
>
> Another consideration is that the default max bundles of a namespace is 128. I don't think the common case needs 128 partitions for a topic. If the partitions < the bundle count, will the new solution basically be equivalent to the current way?
>
> If this is not a general solution for common scenarios, I support making the topic-bundle assigner pluggable without introducing the implementation into the Pulsar repo. Users can implement their own assigner based on the business requirement. Pulsar's general solution may not be good for all scenarios, but it is better for scalability (bundle split) and enough for most common scenarios. We can keep improving the general solution for the general requirement of the most common scenarios.
>
> Regards,
> Penghui
>
> On Wed, Mar 22, 2023 at 9:52 AM Lin Lin <lin...@apache.org> wrote:
> >
> > > This appears to be the "round-robin topic-to-bundle mapping" option in the `findBundle` function. Is this the only place that needs an update? Can you list what change is required?
> >
> > In this PIP, we only discuss topic-to-bundle mapping. The required changes are:
> >
> > 1) When a lookup happens, the partition is assigned to a bundle:
> > Lookup -> NamespaceService#getBrokerServiceUrlAsync -> NamespaceService#getBundleAsync -> NamespaceBundles#findBundle
> > Consistent hashing is currently used to assign a partition to a bundle in NamespaceBundles#findBundle. We should add a configuration item partitionAssignerClassName, so that different partition assignment algorithms can be dynamically configured. The existing algorithm will be used as the default (partitionAssignerClassName=ConsistentHashingPartitionAssigner).
> >
> > 2) Implement a new partition assignment class RoundRobinPartitionAssigner. The new partition assignment will be implemented in this class.
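For illustration, here is a rough sketch of the pluggable assigner described in (1) and (2) above. The interface shape and method signature are assumptions on my side; only the configuration item and the two class names come from the mail:

```java
// Hypothetical interface; the actual Pulsar API may differ.
public interface PartitionAssigner {
    /**
     * Pick the index of the bundle that should own the given partition.
     * topicName is assumed here to be the base topic name without the
     * "-partition-N" suffix.
     */
    int selectBundle(String topicName, int partitionIndex, int bundleCount);
}

/** Default, matching today's behavior: hash the topic onto a bundle. */
class ConsistentHashingPartitionAssigner implements PartitionAssigner {
    @Override
    public int selectBundle(String topicName, int partitionIndex, int bundleCount) {
        // Stand-in for the hashing done in NamespaceBundles#findBundle.
        return Math.floorMod(topicName.hashCode(), bundleCount);
    }
}

/** Proposed: spread the partitions of one topic round-robin across bundles. */
class RoundRobinPartitionAssigner implements PartitionAssigner {
    @Override
    public int selectBundle(String topicName, int partitionIndex, int bundleCount) {
        // Hash the base topic name once to pick a starting bundle,
        // then step through the bundles partition by partition.
        int start = Math.floorMod(topicName.hashCode(), bundleCount);
        return (start + partitionIndex) % bundleCount;
    }
}
```

The broker would then instantiate whichever implementation `partitionAssignerClassName` points to, keeping the consistent-hashing assigner as the default.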
> > > How do we enable this "round-robin topic-to-bundle mapping option" (by namespace policy and broker.conf)?
> >
> > In broker.conf, with a new option called `partitionAssignerClassName`.
> >
> > > Can we apply this option to existing namespaces? (what's the admin operation to enable this option)?
> >
> > The cluster must ensure that all nodes use the same algorithm. Broker-level configuration can be made effective by restarting the brokers or via the admin API BrokersBase#updateDynamicConfiguration.
> >
> > > I assume the "round-robin topic-to-bundle mapping option" works with a single partitioned topic, because other topics might show different load per partition. Is this intentional? (so users need to ensure not to put other topics in the namespace, if this option is configured)
> >
> > For single-partition topics, the starting bundle is determined using a consistent hash, so single-partition topics will be spread out to different bundles as much as possible. For high-load single-partition topics, the current algorithm cannot solve this problem, and neither can this PIP. If it is just a low-load single-partition topic, the impact on the entire bundle is very small. However, in real scenarios, high-load businesses share the load through multiple partitions.
> >
> > > Some brokers might have more bundles than other brokers. Do we have different logic for bundle balancing across brokers? Or do we rely on the existing assign/unload/split logic to balance bundles among brokers?
> >
> > This PIP does not involve the mapping between bundles and brokers; the existing algorithm works well with this PIP. However, we will also contribute our mapping algorithm in a subsequent PIP. For example, bundles under the same namespace can be assigned to brokers in a round-robin manner.
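As for the bundle-to-broker round-robin idea mentioned for a follow-up PIP, a toy sketch of the intent (names and shape are my assumptions, not Pulsar code):

```java
import java.util.ArrayList;
import java.util.List;

// Assign the bundles of one namespace to brokers in round-robin order.
class RoundRobinBundleAssignmentExample {
    static List<String> assign(List<String> bundles, List<String> brokers) {
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < bundles.size(); i++) {
            owners.add(brokers.get(i % brokers.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> bundles = List.of("0x00000000_0x40000000", "0x40000000_0x80000000",
                                       "0x80000000_0xc0000000", "0xc0000000_0xffffffff");
        List<String> brokers = List.of("broker-0", "broker-1");
        // -> [broker-0, broker-1, broker-0, broker-1]
        System.out.println(assign(bundles, brokers));
    }
}
```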