Hi Penghui, I put my answers inline.
On Thu, Oct 20, 2022 at 5:11 PM PengHui Li <peng...@apache.org> wrote:

> Hi Heesung.
>
> Is it possible to send the promoted value to the topic again to achieve
> eventual consistency?

Yes, as long as the state change is valid, BSC will accept it and
broadcast it to all brokers.

> For example:
>
> We have 3 brokers: broker-a, broker-b, and broker-c.
> The messages for leader election could be "own: broker-b",
> "own: broker-c", "own: broker-a".
> The broker-b will win in the end.
> The broker-b can write a new message "own: broker-b" to the topic.
> After the topic compaction, only the broker-b will be present in the
> topic. Does it work?

The proposal does not use a topic for leader election because of the
circular dependency. The proposal uses the metadata store, ZooKeeper, to
elect the leader broker(s) of BSC. This part is explained in the "Bundle
State Channel Owner Selection and Discovery" section of PIP-192:

*Bundle State Channel Owner Selection and Discovery*

*The Bundle State Channel (BSC) is another topic, and because of this
circular dependency, we can't use the BundleStateChannel to find the
owner broker of the BSC topic. For example, when a cluster starts, each
broker needs to initiate a BSC TopicLookUp (to find the owner broker) in
order to consume the messages in the BSC. However, initially, each
broker does not know which broker owns the BSC.*

*The ZK leader election can be a good option to break this circular
dependency, like the following.*

*Channel Owner Selection*

*The cluster can use the ZK leader election to select the owner broker.
If the owner becomes unavailable, one of the followers will become the
new owner. We can elect an owner for each bundle state channel
partition.*

*Channel Owner Discovery*

*Then, in the brokers' TopicLookUp logic, we will add a special case to
return the current leader (the elected BSC owner) for the BSC topics.*
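To make the selection/discovery flow concrete, here is a minimal sketch
of a per-partition owner election. It uses Apache Curator's LeaderLatch
purely for illustration (the broker would go through Pulsar's metadata
store abstraction instead, and the znode path and broker id below are
made up):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class BscOwnerElectionSketch {
        public static void main(String[] args) throws Exception {
            CuratorFramework zk = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            zk.start();

            // One election path per BSC partition (path is hypothetical).
            String path = "/loadbalance/bsc-channel-owner/partition-0";

            // Channel owner selection: every broker joins the election.
            // If the current owner dies, one of the followers takes over.
            LeaderLatch latch = new LeaderLatch(zk, path, "broker-a:8080");
            latch.start();

            // Channel owner discovery: the TopicLookUp special case for
            // BSC topics would return the current leader's id.
            System.out.println("BSC partition-0 owner: "
                    + latch.getLeader().getId());
        }
    }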
> Maybe I missed something.
>
> Thanks,
> Penghui
>
> On Thu, Oct 20, 2022 at 1:30 AM Heesung Sohn
> <heesung.s...@streamnative.io.invalid> wrote:
>
> > Oops.
> > I forgot to mention another important item. I added it below (in
> > bold).
> >
> > Pros:
> > - It supports more distributed load balance operations (bundle
> > assignment) in a sequentially consistent manner.
> > - For really large clusters, with a partitioned system topic, BSC
> > can be more scalable than the current single-leader coordination
> > solution.
> > - The load balance commands (across brokers) are sent via event
> > sourcing (more reliable/transparent/easy to track) instead of RPC
> > with retries.
> > *- Bundle ownerships can be cached in the topic table-view from BSC
> > (no longer any need to store bundle ownership in the metadata store
> > (ZK)).*
> >
> > Cons:
> > - It is a new implementation and will require significant effort to
> > stabilize.
> > (Based on our PoC code, I think the event sourcing handlers make the
> > logic easier to understand and follow.
> > Also, this new load balancer will be pluggable (implemented in new
> > classes), so it should not break the existing load balance logic.
> > Users will be able to configure the old/new broker load balancer.)
> >
> > On Wed, Oct 19, 2022 at 10:17 AM Heesung Sohn
> > <heesung.s...@streamnative.io> wrote:
> >
> > > Hi,
> > >
> > > On Wed, Oct 19, 2022 at 2:06 AM 丛搏 <congbobo...@gmail.com> wrote:
> > >
> > >> Hi, Heesung:
> > >> I have some doubts.
> > >> I reviewed your PIP-192: New Pulsar Broker Load Balancer. I found
> > >> that unloading a topic uses the leader broker, while (Assigning,
> > >> Return) uses the broker that receives the lookup request. Why
> > >> does (Assigning, Return) not use the leader broker?
> > >> I can think of a few reasons:
> > >> 1. reduce leader broker pressure
> > >> 2. does not strongly depend on the leader broker
> > >>
> > > Yes, one of the goals of PIP-192 is to distribute the load balance
> > > logic to individual brokers (bundle assignment and bundle split).
> > >
> > >> If (Assigning, Return) does not depend on the leader, it will
> > >> bring the following problems:
> > >
> > > I assume what you meant by `(Assigning, Return) does not depend on
> > > the leader` is the distributed topic assignment here (concurrent
> > > bundle assignment across brokers).
> > >
> > >> 1. leader clear bundle op and (Assigning, Return) will do at the
> > >> same time, It will cause many requests to be retried, and the
> > >> broker will be in chaos for a long time.
> > >
> > > I assume `leader clear bundle op` means bundle unloading, and
> > > `many requests` means topic lookup requests (bundle assignment
> > > requests).
> > > The leader unloads only high-loaded bundles in the "Owned" state,
> > > so the leader does not unload bundles that are still going through
> > > the assignment states.
> > > Even if there are conflicting state changes, only the first valid
> > > state change will be accepted in BSC (as explained in the Conflict
> > > State Resolution (Race Conditions) section in the PIP).
> > >
> > > Also, another goal of PIP-192 is to reduce client lookup retries.
> > > In BSC, the client lookup response will be deferred (up to x secs)
> > > until the bundle state finally becomes "Owned".
> > >
> > >> 2. bundle State Channel (BSC) owner depends on the leader broker;
> > >> this also makes topic transfer strongly dependent on the leader.
> > >
> > > BSC will use separate leader znodes to decide the owner brokers of
> > > the internal BSC system topic. As described in the "Bundle State
> > > and Load Data TableView Scalability" section in PIP-192, we could
> > > use a partitioned topic (configurable) for this BSC system topic.
> > > Then, there could be a separate owner broker for each partition
> > > (e.g. ZK leader znodes /loadbalance/leader/part-1-owner,
> > > part-2-owner, etc.).
> > >
> > >> 3. the code becomes more complex and harder to maintain
> > >>
> > >> What tradeoffs are the current implementations based on?
> > >
> > > Here are some pros and cons of BSC I can think of.
> > >
> > > Pros:
> > > - It supports more distributed load balance operations (bundle
> > > assignment) in a sequentially consistent manner.
> > > - For really large clusters, with a partitioned system topic, BSC
> > > can be more scalable than the current single-leader coordination
> > > solution.
> > > - The load balance commands (across brokers) are sent via event
> > > sourcing (more reliable/transparent/easy to track) instead of RPC
> > > with retries.
> > >
> > > Cons:
> > > - It is a new implementation and will require significant effort
> > > to stabilize.
> > > (Based on our PoC code, I think the event sourcing handlers make
> > > the logic easier to understand and follow.
> > > Also, this new load balancer will be pluggable (implemented in new
> > > classes), so it should not break the existing load balance logic.
> > > Users will be able to configure the old/new broker load balancer.)
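Regarding the bold item above (bundle ownerships cached in the topic
table-view from BSC): conceptually, each broker can materialize the
ownership map locally with the Pulsar client's TableView API. A rough
sketch, with a made-up topic name and plain strings standing in for the
real bundle-state schema:

    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.Schema;
    import org.apache.pulsar.client.api.TableView;

    public class BundleOwnershipCacheSketch {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .build();

            // Key = bundle name, value = owner broker id. The topic
            // name is hypothetical; BSC is an internal system topic.
            TableView<String> ownership = client
                    .newTableViewBuilder(Schema.STRING)
                    .topic("persistent://pulsar/system/bundle-state-channel")
                    .create();

            // Lookups hit this local cache instead of the metadata
            // store (ZK). The bundle key below is made up.
            String owner =
                    ownership.get("my-tenant/my-ns/0x00000000_0x40000000");
            System.out.println("owner: " + owner);

            // The view replays existing entries and keeps updating as
            // new state-change messages arrive on the channel.
            ownership.forEachAndListen((bundle, broker) ->
                    System.out.println(bundle + " -> " + broker));
        }
    }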
> > > Thank you for sharing your questions about PIP-192 here. But I
> > > think PIP-215 is independent of PIP-192 (though PIP-192 needs some
> > > of the features in PIP-215).
> > >
> > > Thanks,
> > > Heesung
> > >
> > >> Thanks,
> > >> bo
> > >>
> > >> Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote on
> > >> Wed, Oct 19, 2022 at 07:54:
> > >> >
> > >> > Hi pulsar-dev community,
> > >> >
> > >> > I raised a PIP to discuss: PIP-215: Configurable Topic
> > >> > Compaction Strategy
> > >> >
> > >> > PIP link: https://github.com/apache/pulsar/issues/18099
> > >> >
> > >> > Regards,
> > >> > Heesung
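As context for why PIP-192 needs PIP-215: plain topic compaction keeps
the latest message per key, but BSC needs the first valid state change
to win. A rough sketch of the kind of pluggable strategy the PIP
proposes (the interface shape is paraphrased for illustration; names
and signatures may not match the final API in the PIP):

    import org.apache.pulsar.client.api.Schema;

    // Paraphrased shape of a configurable compaction strategy; the
    // compactor consults it instead of always keeping the newer value.
    interface TopicCompactionStrategy<T> {
        Schema<T> getSchema();

        // Return true to keep the earlier (left) value for a key and
        // drop the newer one during compaction.
        boolean shouldKeepLeft(T prev, T cur);
    }

    // Hypothetical strategy for the "own: broker-X" race discussed
    // above: once a key holds a valid "Owned:" value, a later
    // competing "Owned:" claim for the same key is dropped.
    class BundleStateCompactionStrategy
            implements TopicCompactionStrategy<String> {

        @Override
        public Schema<String> getSchema() {
            return Schema.STRING;
        }

        @Override
        public boolean shouldKeepLeft(String prev, String cur) {
            return prev != null
                    && prev.startsWith("Owned:")
                    && cur.startsWith("Owned:");
        }
    }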