Background

Pulsar has two related features:
- The first feature lets users assign bookies to groups (and racks) with `pulsar-admin bookies set-bookie-rack`. In this example, users put bookie1 to bookie5 in the default group and bookie6 to bookie10 in the share group. They don't care about rack information; they only care about which group each bookie belongs to. The resulting configuration is: default={bookie1:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie2:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie3:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie4:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie5:3181=BookieInfoImpl(rack=default-rack, hostname=null)} _shared_={bookie6:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie7:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie8:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie9:3181=BookieInfoImpl(rack=default-rack, hostname=null), bookie10:3181=BookieInfoImpl(rack=default-rack, hostname=null)}
- The second feature lets users set the traffic priority for a namespace: traffic goes to the primary group first and then to the secondary group. Users set this priority with `pulsar-admin ns-isolation-policy set --namespaces public/default --primary "group" --secondary "group"`. In this example, users set the primary group of the public/default namespace to "share": { "bookkeeperAffinityGroupPrimary" : "share" }

Once this is configured, all traffic under the public/default namespace is expected to go to bookie6 to bookie10 in the "share" group.

Drawbacks

Some time later, users added new bookies [bk11, bk12, bk13, bk14, bk15] to the cluster and found that part of the traffic under the public/default namespace was going to the new machines. After investigating, we found that this was a defect in how bookkeeperAffinityGroupPrimary works.

*bookkeeperAffinityGroupPrimary work mechanism*

All bookies in the cluster: bk1 to bk15.
The broker picks bookies as follows:

1. Read the bookie rack configuration: default: [bk1, bk2, bk3, bk4, bk5]; share: [bk6, bk7, bk8, bk9, bk10].
2. Exclude the bookies that belong to a group other than the bookkeeperAffinityGroupPrimary group (share).
3. In this example, that excludes the default group bookies [bk1, bk2, bk3, bk4, bk5].
4. Pick bookies from the remaining bookies [bk6, bk7, bk8, bk9, bk10, bk11, bk12, bk13, bk14, bk15].

Therefore, some traffic may go to bk11 to bk15, which is not what users expect. The reason is that the new bookies bk11 to bk15 had no rack information set and did not belong to any group, so step 2 never excluded them.

As a workaround, we asked users to set the rack information for bk11 to bk15 with `pulsar-admin bookies set-bookie-rack` before starting them. After users adopted this workaround, traffic was routed as expected. However, this is inconvenient: rack information must be set in advance before any new bookie goes online. Where traffic placement is strictly constrained, an operator who overlooks this step can cause problems.

Improvement

I would like to introduce a new `strict` configuration for bookkeeperAffinityGroupPrimary. It defaults to false, so for users upgrading from an old version the logic stays the same and bookies without rack information are not constrained. If users explicitly set strict to true with `pulsar-admin ns-isolation-policy set --namespaces public/default --primary "group" --secondary "group" --strict true`, the broker selects bookies only from the primary group. If the primary group does not have enough bookies, it also selects from the secondary group. If the primary and secondary groups together still do not have enough bookies, an exception is thrown. Bookies that have no rack information set with `pulsar-admin bookies set-bookie-rack` are never selected.
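The difference between the current selection and the proposed strict mode can be sketched in a few lines of Python. This is a simplified model, not Pulsar's placement-policy code: the function names, the `RackConfig` dict shape and the deterministic "take the first N" selection are assumptions for illustration (the real broker delegates the final choice to BookKeeper's placement policy), and "not enough bookies" is read as the primary and secondary groups together being too small.

```python
from typing import Dict, List, Optional

# Hypothetical model of the rack config written by `pulsar-admin bookies set-bookie-rack`:
# group name -> bookie addresses in that group (insertion order is kept).
RackConfig = Dict[str, List[str]]


def eligible_bookies_lenient(all_bookies: List[str], racks: RackConfig,
                             primary: str) -> List[str]:
    """Current behaviour (strict=false): drop bookies that belong to some
    group other than the primary group. Bookies in no group survive."""
    in_other_groups = {b for group, members in racks.items()
                       if group != primary for b in members}
    return [b for b in all_bookies if b not in in_other_groups]


def select_bookies_strict(racks: RackConfig, primary: str,
                          secondary: Optional[str], ensemble_size: int) -> List[str]:
    """Proposed behaviour (strict=true): only bookies registered in the
    primary group, then the secondary group, are ever considered."""
    pool = list(racks.get(primary, []))
    if len(pool) < ensemble_size and secondary is not None:
        # Not enough primary bookies: fall back to the secondary group.
        pool += [b for b in racks.get(secondary, []) if b not in pool]
    if len(pool) < ensemble_size:
        raise RuntimeError(
            f"need {ensemble_size} bookies, only {len(pool)} in primary/secondary groups")
    return pool[:ensemble_size]


if __name__ == "__main__":
    racks = {"default": [f"bk{i}" for i in range(1, 6)],
             "share": [f"bk{i}" for i in range(6, 11)]}
    cluster = [f"bk{i}" for i in range(1, 16)]  # bk11-bk15 have no rack info
    print(eligible_bookies_lenient(cluster, racks, "share"))  # bk6 .. bk15
    print(select_bookies_strict(racks, "share", None, 3))  # ['bk6', 'bk7', 'bk8']
```

With bk11 to bk15 missing from the rack configuration, the lenient function keeps them as candidates, while the strict function can only return bookies listed under share (or under the secondary group, if one is given).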
Compatibility

When users upgrade from an old version to the new version, bookkeeperAffinityGroupPrimary keeps working as before, because strict defaults to false. If users upgrade, set strict to true with `pulsar-admin ns-isolation-policy set --namespaces public/default --primary "group" --secondary "group" --strict true`, and then roll back to the old version, the old broker, which does not know about the `strict` field, should still be able to parse the ns-isolation-policy configuration correctly.
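The rollback requirement boils down to an old broker tolerating a field it does not know. A minimal Python sketch, assuming the policy is stored as JSON; the class names, the `parse_tolerant` helper and the exact field set are hypothetical stand-ins, not Pulsar's actual types, and whether Pulsar's real deserializer already ignores unknown keys would still need to be checked in the code.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class AffinityGroupV1:
    """Hypothetical shape of the policy as understood by an old broker."""
    bookkeeperAffinityGroupPrimary: Optional[str] = None
    bookkeeperAffinityGroupSecondary: Optional[str] = None


@dataclass
class AffinityGroupV2(AffinityGroupV1):
    """Hypothetical new shape: adds `strict`, defaulting to False so that
    policies written by old brokers keep the old (lenient) behaviour."""
    strict: bool = False


def parse_tolerant(cls, raw: str):
    """Keep only the fields `cls` knows about; unknown keys (e.g. `strict`
    when an old broker reads a policy written by a new one) are ignored."""
    data: Dict[str, Any] = json.loads(raw)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


if __name__ == "__main__":
    written_by_new = json.dumps({"bookkeeperAffinityGroupPrimary": "share", "strict": True})
    print(parse_tolerant(AffinityGroupV1, written_by_new))  # old broker after rollback
    print(parse_tolerant(AffinityGroupV2, written_by_new))  # new broker
```

Defaulting `strict` to false covers the upgrade direction, and ignoring unknown keys covers the rollback direction; a parser that rejected unknown keys would fail on the policy written by the new broker.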