For the sake of the VOTE, I am classifying this as a bug and plan to cherry pick it to all active release branches.
Thanks, Michael On Tue, Oct 11, 2022 at 2:59 PM Michael Marshall <mmarsh...@apache.org> wrote: > > Hi All, > > I created PIP 212: https://github.com/apache/pulsar/issues/18012. It > already has a PR https://github.com/apache/pulsar/pull/15640. It is a > minor change that I view as more of a bug fix, but because it changes > a default value, it needs a PIP. > > The main point for discussion is whether we can cherry pick this to > active release branches. My view is that it should be cherry picked > because the feature doesn't work out of the box. > > If there isn't much discussion, I'll start a vote later this week. > > Thanks, > Michael > > > ### Motivation > > The current Bookkeeper configuration defaults to using > `org.apache.bookkeeper.net.ScriptBasedMapping` for the > `DNSToSwitchMapping` implementation. However, this default > configuration does not align with the Broker's default configuration, > which is `org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`. As > such, the default configuration for a Pulsar cluster does not lead to > ideal rack awareness when ledgers need to be recovered. The result is > that a user can configure a cluster for rack awareness and the brokers > will honor that configuration, but the autorecovery process will not > because it does not have the correct bookkeeper cluster topology view. > > I propose we configure bookkeeper to use the broker's > `ZkBookieRackAffinityMapping` class. That way, autorecovery will honor > the operator's configured rack awareness policies out of the box. > > ### Goal > > Ensure consistent rack awareness policies. > > I propose this is a bug fix that requires patching all active versions > of Pulsar. > > ### API Changes > > None > > ### Implementation > > See https://github.com/apache/pulsar/pull/15640. > > Add default value for `reppDnsResolverClass` to the > `conf/bookkeeper.conf` configuration. This change effectively switches > the default from `org.apache.bookkeeper.net.ScriptBasedMapping` to > `org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`. > > ### Alternatives > > The tradeoff is that a user relying on the `ScriptBasedMapping` > default might accidentally get switched to using the > `ZkBookieRackAffinityMapping` implementation. Given that > `ScriptBasedMapping` doesn't work out of the box, and that the > broker's default to `ZkBookieRackAffinityMapping`, I think this is an > acceptable tradeoff. > > ### Anything else? > > I manually verified that the `ZkBookieRackAffinityMapping` works by > running some tests in a minikube cluster deployed with the DataStax > helm chart for Apache Pulsar. I set up 3 racks, 4 bookies, and a topic > with a E=2, Qw=2, and Qa=2. I then verified that the autorecovery pod > correctly discovered the racks and then identified when an ensemble > was not following the rack placement policy after two bookies were > removed. I documented my testing a bit more here: > https://github.com/datastax/pulsar-helm-chart/pull/214.