Hi Heesung,

For 2.10, I would like to suggest fixing the issue instead of cherry-picking the PR. The problem that https://github.com/apache/pulsar/pull/388 resolved will happen again if `loadBalancerDistributeBundlesEvenlyEnabled` is disabled.

We should try to remove the configuration in the future, because it is difficult for users to decide whether to enable or disable it. Both settings have problems, just different ones.

> I think we also need to consider the namespace anti-affinity-group logic too.

+1, it should be fixed to avoid an infinite bundle unloading loop.

Thanks,
Penghui

On Sat, Jul 8, 2023 at 4:07 AM Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote:

> Hi dev,
>
> I think we also need to consider the namespace anti-affinity-group logic
> too. These logics seem to do similar things.
>
> https://pulsar.apache.org/docs/3.0.x/administration-load-balance/#distribute-anti-affinity-namespaces-across-failure-domains
>
> PengHui,
> We got three binding votes here. Do you think we should proceed to
> cherry-pick the PR to 2.10, then?
>
> Thanks,
> Heesung
>
> On Sun, Jul 2, 2023 at 5:22 PM PengHui Li <peng...@apache.org> wrote:
>
> > > `removeMostServicingBrokersForNamespace` is introduced by [1] to
> > > solve the problem that when all bundles in a particular namespace
> > > belong to 1 or a few machines, customers owning that namespace will be
> > > heavily impacted if that broker goes down. Of course, this PR caused
> > > the infinite unloading issue and we need to fix it.
> >
> > Thanks for the context.
> >
> > It looks like we can also try to fix the infinite unloading issue.
> >
> > Now, the broker is unloading the bundles without checking the
> > distribution of the bundles under a namespace, but it will check when
> > finding a new owner. Is it possible to check the bundle distribution
> > before unloading the bundles to avoid infinite unloading?
> >
> > Regards,
> > Penghui
> >
> > On Sun, Jul 2, 2023 at 3:28 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> >
> > > +1
> > >
> > > Enrico
> > >
> > > On Sun, Jul 2, 2023 at 06:19, Hang Chen <chenh...@apache.org> wrote:
> > >
> > > > +1 for cherry-picking it to branch-2.10. We have a flag to control
> > > > whether to enable or disable it.
> > > >
> > > > `removeMostServicingBrokersForNamespace` is introduced by [1] to
> > > > solve the problem that when all bundles in a particular namespace
> > > > belong to 1 or a few machines, customers owning that namespace will be
> > > > heavily impacted if that broker goes down. Of course, this PR caused
> > > > the infinite unloading issue and we need to fix it.
> > > >
> > > > > I agree with making it false for the next major version release by
> > > > > default.
> > > >
> > > > We'd better remove the config in the next version instead of changing
> > > > the default value to `false`; keeping it would make Pulsar's
> > > > configuration keep growing.
> > > >
> > > > Thanks,
> > > > Hang
> > > >
> > > > [1] https://github.com/apache/pulsar/pull/388
> > > >
> > > > On Sat, Jul 1, 2023 at 09:38, PengHui Li <peng...@apache.org> wrote:
> > > >
> > > > > +1 for cherry-picking to branch-2.10 since users don't have a
> > > > > workaround for this issue, and the change is well understood and
> > > > > low risk.
> > > > >
> > > > > I agree with making it false for the next major version release by
> > > > > default.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Sat, Jul 1, 2023 at 9:26 AM Heesung Sohn
> > > > > <heesung.s...@streamnative.io.invalid> wrote:
> > > > >
> > > > > > Hi dev,
> > > > > >
> > > > > > I realized that the `removeMostServicingBrokersForNamespace` func
> > > > > > in the broker selection logic can cause infinite unloading.
> > > > > >
> > > > > > Suppose an overloaded broker unloaded a bundle and only has the
> > > > > > minimum number of bundles (in that namespace) among brokers. In
> > > > > > that case, the selection logic
> > > > > > (`removeMostServicingBrokersForNamespace`) will filter out other
> > > > > > brokers and always reassign the bundle to the previous broker.
> > > > > > This will cause infinite unloading (like a boomerang).
> > > > > >
> > > > > > To mitigate this issue, we need to cherry-pick this PR to disable
> > > > > > this logic by the config.
> > > > > > https://github.com/apache/pulsar/pull/16059
> > > > > >
> > > > > > And we probably want to disable this
> > > > > > `removeMostServicingBrokersForNamespace` logic by default.
> > > > > >
> > > > > > Regards,
> > > > > > Heesung
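
For readers following the thread, the boomerang effect described above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Pulsar's load manager code: the broker names, load percentages, threshold, and the helpers `pick_owner` and `simulate` are all hypothetical, and `remove_most_servicing_brokers` is a simplified stand-in for the filtering idea behind `removeMostServicingBrokersForNamespace`. The `check_before_unload` option sketches the suggestion to look at the bundle distribution before unloading.

```python
from collections import Counter

OVERLOAD_THRESHOLD = 80  # percent; assumed value for this toy model
BUNDLE_LOAD = 20         # percent of load that moves with one bundle (assumed)


def remove_most_servicing_brokers(candidates, bundles_per_broker):
    """Keep only the candidates that own the fewest bundles of the namespace."""
    fewest = min(bundles_per_broker.get(b, 0) for b in candidates)
    return {b for b in candidates if bundles_per_broker.get(b, 0) == fewest}


def pick_owner(brokers, ownership, load, filter_enabled):
    """Choose an owner for a bundle that is not listed in `ownership`."""
    candidates = set(brokers)
    if filter_enabled:
        counts = Counter(ownership.values())
        candidates = remove_most_servicing_brokers(candidates, counts)
    # Among the remaining candidates, prefer the least loaded broker.
    return min(sorted(candidates), key=lambda b: load[b])


def simulate(rounds, filter_enabled, check_before_unload=False):
    """Run a few load-shedding rounds and return the list of bundle moves."""
    brokers = ["broker-a", "broker-b"]
    # broker-a is overloaded, mostly by traffic outside this namespace.
    load = {"broker-a": 90, "broker-b": 30}
    ownership = {
        "bundle-1": "broker-a",
        "bundle-2": "broker-b",
        "bundle-3": "broker-b",
    }
    moves = []
    for _ in range(rounds):
        overloaded = [b for b in brokers if load[b] > OVERLOAD_THRESHOLD]
        if not overloaded:
            break
        source = overloaded[0]
        owned = sorted(k for k, v in ownership.items() if v == source)
        if not owned:
            break
        bundle = owned[0]
        # Ownership as the selection logic sees it once the bundle is unloaded.
        remaining = {k: v for k, v in ownership.items() if k != bundle}
        target = pick_owner(brokers, remaining, load, filter_enabled)
        if check_before_unload and target == source:
            break  # unloading would only bounce the bundle straight back
        ownership[bundle] = target
        load[source] -= BUNDLE_LOAD
        load[target] += BUNDLE_LOAD
        moves.append((bundle, source, target))
    return moves


if __name__ == "__main__":
    print(simulate(3, filter_enabled=True))
    print(simulate(3, filter_enabled=False))
    print(simulate(3, filter_enabled=True, check_before_unload=True))
```

In this model, with the filter enabled every round moves `bundle-1` from `broker-a` back to `broker-a`, so the load never drops and the unloading repeats. With the filter disabled the bundle lands on `broker-b` after one move and the shedding stops. With the pre-unload check the broker skips the pointless unload, although it stays overloaded, which is why the thread treats both settings as having problems.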