I'd add just one step to your plan: if this flag is introduced in Y.X, then in Y.(X+1) let's remove the flag and keep the "true" value as the only behavior.
On Mon, Jun 19, 2023 at 4:57 AM horizonzy <horizo...@apache.org> wrote:
> Background
>
> Pulsar has two related features:
>
> - The first feature allows users to set group and rack information for
>   bookies using pulsar-admin bookies set-bookie-rack.
>
>   Here, users assign bookie1 to bookie5 to the default group and bookie6
>   to bookie10 to the share group using commands. They don't care about
>   rack information; they only care about which group a bookie belongs to.
>
>   default={bookie1:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie2:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie3:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie4:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie5:3181=BookieInfoImpl(rack=default-rack, hostname=null)}
>
>   _shared_={bookie6:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie7:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie8:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie9:3181=BookieInfoImpl(rack=default-rack, hostname=null),
>   bookie10:3181=BookieInfoImpl(rack=default-rack, hostname=null)}
>
> - The second feature allows users to set the traffic priority for a
>   namespace: traffic is directed to the primary group first and then to
>   the secondary group. Users can set this priority using pulsar-admin
>   ns-isolation-policy set --namespaces public/default --primary "group"
>   --secondary "group".
>
>   Here, users set the primary group of the /public/default namespace to
>   "share" using a command.
>
>   {
>     "bookkeeperAffinityGroupPrimary" : "share"
>   }
>
>   After this is done, all traffic under the /public/default namespace
>   will be directed to bookie6-10 in the "share" group.
>
> Drawbacks
>
> After a period of time, users added some new bookies [bk11, bk12, bk13,
> bk14, bk15] to the bookie cluster and found that some traffic under the
> /public/default namespace was directed to the newly added machines. After
> investigation, we found that this was a defect in the working mechanism
> of bookkeeperAffinityGroupPrimary.
>
> *bookkeeperAffinityGroupPrimary work mechanism*
>
> All bookies in the cluster: bk1-bk15.
>
> Here are the steps the broker follows to pick bookies:
>
> 1. Get the bookie rack info config:
>    default: [bk1, bk2, bk3, bk4, bk5]; share: [bk6, bk7, bk8, bk9, bk10]
> 2. Exclude the bookies that belong to a group other than the
>    bookkeeperAffinityGroupPrimary (share).
> 3. Here, that means excluding the default group bookies
>    [bk1, bk2, bk3, bk4, bk5].
> 4. Pick bookies from the remaining bookies
>    [bk6, bk7, bk8, bk9, bk10, bk11, bk12, bk13, bk14, bk15].
>
> Therefore, some traffic may go to bk11-bk15, which is not what the users
> expect. The reason is that the new bookies, bk11 to bk15, did not have
> rack information set and were not part of any group, so they were never
> excluded.
>
> We provided a workaround: users set the rack information for bk11 to
> bk15 in advance, using the command pulsar-admin bookies set-bookie-rack,
> before starting them. After users adopted this workaround, the traffic
> behaved as expected.
>
> For users, this is somewhat inconvenient, as they need to set rack
> information before bringing new bookies online. In scenarios with strict
> limits on traffic, if the bookie operations and maintenance staff
> overlook this step, it could cause problems.
>
> Improvement
>
> I would like to introduce a new configuration, strict, for
> bookkeeperAffinityGroupPrimary. Its default value is false, which means
> that for existing users upgrading to the new version, the logic will
> remain the same and bookies without rack information will not be
> constrained.
>
> If users manually set strict to true using the command pulsar-admin
> ns-isolation-policy set --namespaces public/default --primary "group"
> --secondary "group" --strict true, then when the broker selects a bookie
> it will only choose from the bookies in the primary group. If there are
> not enough bookies in the primary group, it will choose from the bookies
> in the secondary group. If there are not enough bookies in either group,
> an exception will be thrown. Bookies that have not had rack information
> set using pulsar-admin bookies set-bookie-rack will not be selected.
>
> Compatibility
>
> When users upgrade from the old version to the new version, the working
> mechanism of bookkeeperAffinityGroupPrimary remains the same as before.
> When users upgrade to the new version, set strict to true using the
> command pulsar-admin ns-isolation-policy set --namespaces public/default
> --primary "group" --secondary "group" --strict true, and then roll back
> to the old version, the broker should be able to correctly parse the
> ns-isolation-policy configuration.
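
To make the difference between the two modes concrete, here is a minimal Python sketch of the selection logic as I read it from the proposal above. It is not Pulsar's actual code: the function name candidate_bookies and its parameters are hypothetical. I am also assuming that, in strict mode, "choose from the secondary group" means topping up the primary candidates with secondary ones, and that the non-strict path ignores the secondary group because the four steps above do not mention it.

```python
from typing import Dict, List, Optional, Set


def candidate_bookies(
    all_bookies: List[str],
    groups: Dict[str, Set[str]],
    primary: str,
    secondary: Optional[str] = None,
    strict: bool = False,
    needed: int = 1,
) -> List[str]:
    """Return the bookies the broker may pick from (illustrative only)."""
    if not strict:
        # Current behaviour: drop only bookies that belong to some group
        # other than the primary. Bookies in no group at all survive.
        excluded: Set[str] = set()
        for name, members in groups.items():
            if name != primary:
                excluded |= members
        return [b for b in all_bookies if b not in excluded]

    # Proposed strict behaviour: start from the primary group only.
    primary_members = groups.get(primary, set())
    candidates = [b for b in all_bookies if b in primary_members]

    # Assumption: fall back to the secondary group only when the primary
    # group cannot supply enough bookies on its own.
    if len(candidates) < needed and secondary is not None:
        secondary_members = groups.get(secondary, set())
        candidates += [
            b for b in all_bookies
            if b in secondary_members and b not in candidates
        ]

    if len(candidates) < needed:
        raise ValueError(
            f"not enough bookies in primary/secondary groups: "
            f"need {needed}, found {len(candidates)}"
        )
    return candidates


# The cluster from the proposal: bk1-bk15, of which bk11-bk15 are ungrouped.
ALL = [f"bk{i}" for i in range(1, 16)]
GROUPS = {
    "default": {f"bk{i}" for i in range(1, 6)},
    "share": {f"bk{i}" for i in range(6, 11)},
}

non_strict = candidate_bookies(ALL, GROUPS, primary="share")
strict_only = candidate_bookies(ALL, GROUPS, primary="share", strict=True)
```

With the example cluster, the non-strict call returns bk6 through bk15 (the ungrouped bookies leak in), while the strict call returns only bk6 through bk10.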