I'd like to add just one step to your plan.
If you introduce this flag in Y.X, then let's remove it in Y.(X+1)
and keep the "true" value as the only behavior.


On Mon, Jun 19, 2023 at 4:57 AM horizonzy <horizo...@apache.org> wrote:

> Background
>
> Pulsar has two features:
>
>    - The first feature allows users to set group and rack information for
>    bookies using pulsar-admin bookies set-bookie-rack.
>
> Here, users assign bookie1 to bookie5 to the default group and bookie6 to
> bookie10 to the share group using commands. They don't care about rack
> information; they only care about which group each bookie belongs to.
>
> default={bookie1:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie2:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie3:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie4:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie5:3181=BookieInfoImpl(rack=default-rack,
> hostname=null)}
>
> _shared_={bookie6:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie7:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie8:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie9:3181=BookieInfoImpl(rack=default-rack,
> hostname=null), bookie10:3181=BookieInfoImpl(rack=default-rack,
> hostname=null)}
>
>
>    - The second feature allows users to set the priority of traffic for a
>    namespace, where traffic is directed to the primary group first and then
>    to the secondary group. Users can set this priority using pulsar-admin
>    ns-isolation-policy set --namespaces public/default --primary "group"
>    --secondary "group".
>
> Here, users set the primary group of the /public/default namespace to
> "share" using a command.
>
> {
>   "bookkeeperAffinityGroupPrimary" : "share"
> }
>
> Once this policy is set, all traffic under the /public/default
> namespace will be directed to bookie6-10 in the "share" group.
>
> Drawbacks
>
> After a period of time, users added some new bookies [bk11, bk12, bk13,
> bk14, bk15] to the bookie cluster and found that some traffic under the
> /public/default namespace was directed to the newly added machines. After
> investigation, we eventually found that this was a defect in the working
> mechanism of bookkeeperAffinityGroupPrimary.
>
> *bookkeeperAffinityGroupPrimary work mechanism*
>
> All bookies in the cluster: bk1-bk15.
>
> Here are the steps the broker follows to pick bookies:
>
>    1. Get the bookie rack info config: default: [bk1, bk2, bk3, bk4, bk5];
>    share: [bk6, bk7, bk8, bk9, bk10].
>    2. Exclude the bookies which are not in the
>    bookkeeperAffinityGroupPrimary (share).
>    3. Exclude the default group bookies [bk1, bk2, bk3, bk4, bk5].
>    4. Pick bookies from the remaining bookies [bk6, bk7, bk8, bk9, bk10,
>    bk11, bk12, bk13, bk14, bk15].
>
> Therefore, some traffic may go to bk11-bk15, which is not what the users
> expect. The reason is that the new bookies, bk11 to bk15, did not have rack
> information set and were not part of any group.
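
To make the exclusion logic above concrete, here is a minimal sketch in Python (not the broker's actual Java code). The function name pick_candidates_legacy and the dict layout are assumptions made for illustration only. It shows why bookies that belong to no group slip through: the broker only removes bookies known to be in other groups.

```python
def pick_candidates_legacy(all_bookies, groups, primary):
    """Hypothetical model of the current behavior: exclude every bookie
    that is registered in a group other than the primary one, and keep
    everything else, including bookies with no group at all."""
    excluded = set()
    for group, members in groups.items():
        if group != primary:
            excluded.update(members)
    return [b for b in all_bookies if b not in excluded]


# Cluster from the example: bk1-bk15, of which only bk1-bk10 have a group.
all_bookies = [f"bk{i}" for i in range(1, 16)]
groups = {
    "default": [f"bk{i}" for i in range(1, 6)],
    "share": [f"bk{i}" for i in range(6, 11)],
}

candidates = pick_candidates_legacy(all_bookies, groups, "share")
print(candidates)  # bk6..bk10 plus the ungrouped bk11..bk15
```
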
>
> We provided a workaround: users set the rack information for bk11 to
> bk15 in advance using the command pulsar-admin bookies set-bookie-rack
> before starting them. After users adopted this workaround, the traffic
> was routed as expected.
>
> For users, this is inconvenient because they need to set rack information
> in advance before bringing new bookies online. In scenarios where there are
> strict limitations on traffic, if the bookie operation and maintenance
> personnel overlook this step, it could cause problems.
>
> Improvement
>
> I would like to introduce a new configuration, strict, for
> bookkeeperAffinityGroupPrimary. The default value for this configuration is
> false, which means that for existing users upgrading to the new version, the
> logic will remain the same and bookies without rack information will not be
> constrained.
>
> Users can manually set strict to true using the command pulsar-admin
> ns-isolation-policy set --namespaces public/default --primary "group"
> --secondary "group" --strict true. Then, when the broker selects a bookie,
> it will only choose from the bookies in the primary group. If there are not
> enough bookies in the primary group, it will choose from the bookies in the
> secondary group. If there are not enough bookies in either group, an
> exception will be thrown. Bookies without rack information set using
> pulsar-admin bookies set-bookie-rack will not be selected.
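
A rough Python sketch of the proposed strict selection, again only as an illustration and not the broker implementation. The names pick_candidates_strict and NotEnoughBookiesError are hypothetical, and I am assuming that the fallback pool is the primary group plus the secondary group; the proposal text does not say whether primary bookies stay in the pool once the secondary group is used.

```python
class NotEnoughBookiesError(Exception):
    """Hypothetical error for the 'not enough bookies in either group' case."""


def pick_candidates_strict(groups, primary, secondary, needed):
    """Return the candidate pool under strict mode.

    Only bookies registered in a group are ever considered, so bookies
    without rack/group information can never be selected.
    """
    primary_members = list(groups.get(primary, []))
    if len(primary_members) >= needed:
        return primary_members

    # Assumption: fall back to primary + secondary combined.
    combined = primary_members + [
        b for b in groups.get(secondary, []) if b not in primary_members
    ]
    if len(combined) >= needed:
        return combined

    raise NotEnoughBookiesError(
        f"need {needed} bookies, only {len(combined)} available "
        f"in groups {primary!r} and {secondary!r}"
    )


groups = {
    "default": [f"bk{i}" for i in range(1, 6)],
    "share": [f"bk{i}" for i in range(6, 11)],
}

# bk11-bk15 are not in any group, so they never appear in the result.
print(pick_candidates_strict(groups, "share", "default", needed=3))
```

With needed=3 the pool is just the share group; with a larger value such as 7 it widens to both groups, and anything above 10 raises the error.
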
>
> Compatibility
>
> When users upgrade from the old version to the new version, the working
> mechanism of bookkeeperAffinityGroupPrimary remains the same as before.
> When users upgrade to the new version and set strict to true using the
> command pulsar-admin ns-isolation-policy set --namespaces public/default
> --primary "group" --secondary "group" --strict true, and then roll back to
> the old version, the broker should be able to correctly parse the
> ns-isolation-policy configuration.
>
