Sorry, Heesung, I think I used a confusing name, "leader election". Actually, I meant to say "topic owner".
From my understanding, the issue is that if we use the table view on a compacted topic, we will always get the last value of a key, which does not work for the "broker ownership conflicts handling". First, we need to change the table view so that it keeps only the first value of a key. Even without compaction, the "broker ownership conflicts handling" will then still work correctly, right? But if the table view works on a compacted topic, it will show the last value of a key after compaction. So you also want to change the topic compaction to make sure the table view will always show the first value of a key. Maybe I missed something here.

My point is: can we just write the owner (final, i.e. the first value of the key) broker back to the topic, so that the table view will always show the first value of the key both before and after topic compaction?

Thanks,
Penghui

On Sat, Oct 22, 2022 at 12:23 AM Heesung Sohn
<heesung.s...@streamnative.io.invalid> wrote:

> Hi Penghui,
>
> I put my answers inline.
>
> On Thu, Oct 20, 2022 at 5:11 PM PengHui Li <peng...@apache.org> wrote:
>
> > Hi Heesung.
> >
> > Is it possible to send the promoted value to the topic again to achieve
> > eventual consistency?
> >
> Yes, as long as the state change is valid, BSC will accept it and broadcast
> it to all brokers.
>
> > For example:
> >
> > We have 3 brokers, broker-a, broker-b, and broker-c
> > The message for leader election could be "own: broker-b", "own: broker-c",
> > "own: broker-a"
> > The broker-b will win in the end.
> > The broker-b can write a new message "own: broker-b" to the topic. After
> > the topic compaction,
> > only the broker-b will be present in the topic. Does it work?
>
> The proposal does not use a topic for leader election because of the
> circular dependency. The proposal uses the metadata store, ZooKeeper, to
> elect the leader broker(s) of BSC.
> This part is explained in the "Bundle State Channel Owner Selection and
> Discovery" section in PIP-192.
>
> *Bundle State Channel Owner Selection and Discovery*
>
> *Bundle State Channel (BSC) is another topic, and because of its circular
> dependency, we can't use the BundleStateChannel to find the owner broker of
> the BSC topic. For example, when a cluster starts, each broker needs to
> initiate a BSC TopicLookUp (to find the owner broker) in order to consume
> the messages in BSC. However, initially, each broker does not know which
> broker owns the BSC.*
>
> *The ZK leader election can be a good option to break this circular
> dependency, as follows.*
>
> *Channel Owner Selection*
>
> *The cluster can use the ZK leader election to select the owner broker. If
> the owner becomes unavailable, one of the followers will become the new
> owner. We can elect the owner for each bundle state channel partition.*
>
> *Channel Owner Discovery*
>
> *Then, in brokers' TopicLookUp logic, we will add a special case to return
> the current leader (the elected BSC owner) for the BSC topics.*
>
> > Maybe I missed something.
> >
> > Thanks,
> > Penghui
> >
> > On Thu, Oct 20, 2022 at 1:30 AM Heesung Sohn
> > <heesung.s...@streamnative.io.invalid> wrote:
> >
> > > Oops.
> > > I forgot to mention another important item. I added it below (in bold).
> > >
> > > Pros:
> > > - It supports more distributed load balance operations (bundle assignment)
> > > in a sequentially consistent manner
> > > - For really large clusters, by a partitioned system topic, BSC can be more
> > > scalable than the current single-leader coordination solution.
> > > - The load balance commands (across brokers) are sent via event
> > > sourcing (more reliable/transparent/easy-to-track) instead of RPC with
> > > retries.
> > > *- Bundle ownerships can be cached in the topic table-view from BSC. (no
> > > longer needs to store bundle ownership in the metadata store (ZK))*
> > >
> > > Cons:
> > > - It is a new implementation and will require significant effort to
> > > stabilize.
> > > (Based on our PoC code, I think the event sourcing handlers are easier to
> > > understand and follow the logic.
> > > Also, this new load balancer will be pluggable (implemented in new
> > > classes), so it should not break the existing load balance logic.
> > > Users will be able to configure the old/new broker load balancer.)
> > >
> > > On Wed, Oct 19, 2022 at 10:17 AM Heesung Sohn
> > > <heesung.s...@streamnative.io> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Wed, Oct 19, 2022 at 2:06 AM 丛搏 <congbobo...@gmail.com> wrote:
> > > >
> > > >> Hi, Heesung:
> > > >> I have some doubts.
> > > >> I reviewed your PIP-192: New Pulsar Broker Load Balancer. I found that
> > > >> topic unloading uses the leader broker, while (Assigning, Return) uses
> > > >> the lookup request broker. Why does (Assigning, Return) not use the
> > > >> leader broker?
> > > >> I can think of a few reasons:
> > > >> 1. reduce leader broker pressure
> > > >> 2. does not strongly depend on the leader broker
> > > >>
> > > > Yes, one of the goals of PIP-192 is to distribute the load balance
> > > > logic to individual brokers (bundle assignment and bundle split).
> > > >
> > > >> If (Assigning, Return) does not depend on the leader, it will bring
> > > >> the following problems:
> > > >
> > > > I assume what you meant by `(Assigning, Return) does not depend on the
> > > > leader` is the distributed topic assignment here (concurrent bundle
> > > > assignment across brokers).
> > > >
> > > >> 1. leader clear bundle op and (Assigning, Return) will do at the same
> > > >> time, it will cause many requests to be retried, and the broker will
> > > >> be in chaos for a long time.
> > > >
> > > > I assume `leader clear bundle op` means bundle unloading, and `many
> > > > requests` means topic lookup requests (bundle assignment requests).
> > > > The leader unloads only high-loaded bundles in the "Owned" state. So,
> > > > the leader does not unload bundles that are in the process of
> > > > assignment states.
> > > > Even if there are conflicting state changes, only the first valid state
> > > > change will be accepted in BSC (as explained in the Conflict State
> > > > Resolution (Race Conditions) section in the PIP).
> > > >
> > > > Also, another goal of PIP-192 is to reduce client lookup retries. In
> > > > BSC, the client lookup response will be deferred (max x secs) until the
> > > > bundle state finally becomes "Owned".
> > > >
> > > >> 2. bundle State Channel (BSC) owner depends on the leader broker; this
> > > >> also makes topic transfer strongly dependent on the leader.
> > > >>
> > > > BSC will use separate leader znodes to decide the owner brokers of the
> > > > internal BSC system topic. As described in the
> > > > "Bundle State and Load Data TableView Scalability" section in PIP-192,
> > > > we could use a partitioned topic (configurable) for this BSC system
> > > > topic. Then, there could be a separate owner broker for each partition
> > > > (e.g. zk leader znodes, /loadbalance/leader/part-1-owner, part-2-owner,
> > > > ..etc).
> > > >
> > > >> 3. the code becomes more complex and harder to maintain
> > > >>
> > > >> What tradeoffs are the current implementations based on?
> > > >>
> > > > Here are some Pros and Cons of BSC I can think of.
> > > >
> > > > Pros:
> > > > - It supports more distributed load balance operations (bundle assignment)
> > > > in a sequentially consistent manner
> > > > - For really large clusters, by a partitioned system topic, BSC can be
> > > > more scalable than the current single-leader coordination solution.
> > > > - The load balance commands (across brokers) are sent via event
> > > > sourcing (more reliable/transparent/easy-to-track) instead of RPC with
> > > > retries.
> > > >
> > > > Cons:
> > > > - It is a new implementation and will require significant effort to
> > > > stabilize.
> > > > (Based on our PoC code, I think the event sourcing handlers are easier
> > > > to understand and follow the logic.
> > > > Also, this new load balancer will be pluggable (implemented in new
> > > > classes), so it should not break the existing load balance logic.
> > > > Users will be able to configure the old/new broker load balancer.)
> > > >
> > > > Thank you for sharing your questions about PIP-192 here. But I think
> > > > this PIP-215 is independent of PIP-192 (though PIP-192 needs some of
> > > > the features in PIP-215).
> > > >
> > > > Thanks,
> > > > Heesung
> > > >
> > > >> Thanks,
> > > >> bo
> > > >>
> > > >> Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote on
> > > >> Wed, Oct 19, 2022 at 07:54:
> > > >>
> > > >> > Hi pulsar-dev community,
> > > >> >
> > > >> > I raised a PIP to discuss: PIP-215: Configurable Topic Compaction
> > > >> > Strategy
> > > >> >
> > > >> > PIP link: https://github.com/apache/pulsar/issues/18099
> > > >> >
> > > >> > Regards,
> > > >> > Heesung
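
For context, here is a minimal, self-contained sketch of the "first value wins" idea discussed in this thread: a pluggable strategy, rather than the fixed "last value wins" rule, decides which value the table view (and, ideally, the compactor) keeps per key. This is plain Java with hypothetical names for illustration only; it is not the actual PIP-215 API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical strategy interface: decides whether a newly read value
// may replace the value currently kept for a key.
interface CompactionStrategy<V> {
    boolean shouldReplace(V current, V incoming);
}

// "Last value wins" is the current table-view / compaction behavior.
class LastValueWins<V> implements CompactionStrategy<V> {
    public boolean shouldReplace(V current, V incoming) {
        return true;
    }
}

// "First value wins": once a key has a value, later values are ignored.
// This is the behavior discussed above for broker ownership conflict handling.
class FirstValueWins<V> implements CompactionStrategy<V> {
    public boolean shouldReplace(V current, V incoming) {
        return current == null;
    }
}

// Tiny stand-in for a table-view cache that consults the strategy.
class StrategyTableView<V> {
    private final Map<String, V> cache = new HashMap<>();
    private final CompactionStrategy<V> strategy;

    StrategyTableView(CompactionStrategy<V> strategy) {
        this.strategy = strategy;
    }

    void accept(String key, V value) {
        V current = cache.get(key);
        if (strategy.shouldReplace(current, value)) {
            cache.put(key, value);
        }
    }

    V get(String key) {
        return cache.get(key);
    }
}

public class FirstValueWinsDemo {
    public static void main(String[] args) {
        StrategyTableView<String> ownerships =
                new StrategyTableView<>(new FirstValueWins<>());

        // Three brokers race to claim the same bundle.
        ownerships.accept("bundle-1", "own: broker-b");
        ownerships.accept("bundle-1", "own: broker-c");
        ownerships.accept("bundle-1", "own: broker-a");

        // With "first value wins", broker-b keeps the ownership.
        System.out.println(ownerships.get("bundle-1")); // prints: own: broker-b
    }
}
```

If the same strategy is applied both when the table view reads the topic and when the compactor rewrites it, the view shows the same winner before and after compaction, which is the consistency Penghui's question is about. Alternatively, as Penghui suggests, the eventually agreed owner could be written back to the topic so that even a plain "last value wins" view converges to the same answer.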
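Similarly, the "Channel Owner Discovery" special case described above (return the elected BSC partition owner from the broker's topic lookup instead of the normal bundle-ownership path) could look roughly like the sketch below. All names, the topic prefix, and the znode layout are illustrative assumptions, not PIP-192's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: lookups for BSC system-topic partitions are answered
// from the per-partition ZK leader-election result; other topics use the
// regular lookup path.
public class BscLookupSketch {

    // Illustrative system-topic name prefix.
    static final String BSC_TOPIC_PREFIX =
            "persistent://pulsar/system/bundle-state-channel-partition-";

    // Populated by a (hypothetical) ZK leader-election listener, e.g. one
    // znode per partition: /loadbalance/leader/part-0-owner -> broker URL.
    private final Map<Integer, String> electedPartitionOwners = new ConcurrentHashMap<>();

    // Called when a broker wins the election for a BSC partition.
    void onLeaderElected(int partition, String brokerUrl) {
        electedPartitionOwners.put(partition, brokerUrl);
    }

    String lookup(String topic) {
        if (topic.startsWith(BSC_TOPIC_PREFIX)) {
            int partition = Integer.parseInt(topic.substring(BSC_TOPIC_PREFIX.length()));
            // Special case: return the elected owner of this BSC partition.
            return electedPartitionOwners.get(partition);
        }
        // Normal topics keep the regular lookup path (placeholder here).
        return lookupViaBundleOwnership(topic);
    }

    String lookupViaBundleOwnership(String topic) {
        return "broker-resolved-by-normal-lookup";
    }

    public static void main(String[] args) {
        BscLookupSketch lookup = new BscLookupSketch();
        lookup.onLeaderElected(0, "pulsar://broker-a:6650");
        System.out.println(lookup.lookup(BSC_TOPIC_PREFIX + "0"));              // pulsar://broker-a:6650
        System.out.println(lookup.lookup("persistent://tenant/ns/regular"));    // normal lookup path
    }
}
```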