Hi Heesung,

Thanks for the explanation; I get your point now. We don't know when the topic compaction task will be triggered, so if we are not able to write the final decision message to the topic, we will lose the first message of the key. It makes sense to me.
+1

Penghui

On Tue, Oct 25, 2022 at 12:16 PM Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote:

> Hi,
>
> Regarding the release plan of this strategic compaction, I think we can take a conservative approach.
>
> First, since this is required for the system topic introduced in PIP-192, we can make this strategic compaction internal-only(the PIP-192, new broker load balancer will be the only use-case initially).
>
> Once this strategic compaction is proven to be stable enough, and there is demand from the customer topics, then we can expose the following admin APIs to enable strategic compaction to customer topics.
>
> pulsar-admin topicPolicies set-compaction-strategy options
> pulsar-admin topicPolicies get-compaction-strategy options
>
> Regards,
> Heesung
>
> On Mon, Oct 24, 2022 at 1:41 PM Heesung Sohn <heesung.s...@streamnative.io> wrote:
>
> > Hi, please find my answers inline.
> >
> > On Sun, Oct 23, 2022 at 7:11 PM PengHui Li <peng...@apache.org> wrote:
> >
> >> Sorry, heesung,
> >>
> >> I think I used a confusing name "leader election".
> >> Actually, I meant to say "topic owner".
> >>
> >> From my understanding, the issue is if we are using the table view for a compacted topic.
> >> We will always get the last value of a key. But it will not work for the "broker ownership conflicts handling".
> >> First, we need to change the table view that is able to keep only the first value of a key.
> >> Even without compaction, the "broker ownership conflicts handling" will still work correctly, right?
> >>
> > Yes, the BSC conflict resolution needs to take the first valid value(state change) per key, instead of just the latest value. For non-compacted topic, only this table-view update(taking a strategic cache update) will serve its purpose.
> >
> >> But if the table view works on a compaction topic. The table view will show the last value of a key after the compaction.
> >> So you want also to change the topic compaction to make sure the table view will always show the first value of a key.
> >>
> > Yes.
> >
> >> Maybe I missed something here.
> >>
> >> My point is if we can just write the owner(final, the first value of the key) broker back to the topic.
> >> So that the table view will always show the first value of the key before the topic compaction or after the topic compaction.
> >>
> > But how do we conflict-resolve if the tail messages of the topic are non-terminal states?
> >
> > 1. bundle 1 assigned by broker 1 // in the process of assignment
> > 2. bundle 1 assigned by broker 2
> > 3. bundle 2 released by broker 1 // in the process of transfer
> > 4. bundle 2 assigned by broker 1
> > 5. bundle 3 splitting by broker 1 // in the process of split
> > 6. bundle 3 assigned by broker 2
> >
> > Regards,
> > Heesung
> >
> >> Thanks,
> >> Penghui
> >>
> >> On Sat, Oct 22, 2022 at 12:23 AM Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote:
> >>
> >> > Hi Penghui,
> >> >
> >> > I put my answers inline.
> >> >
> >> > On Thu, Oct 20, 2022 at 5:11 PM PengHui Li <peng...@apache.org> wrote:
> >> >
> >> > > Hi Heesung.
> >> > >
> >> > > Is it possible to send the promoted value to the topic again to achieve eventual consistency?
> >> > >
> >> > Yes, as long as the state change is valid, BSC will accept it and broadcast it to all brokers.
> >> >
> >> > > For example:
> >> > >
> >> > > We have 3 brokers, broker-a, broker-b, and broker-c
> >> > > The message for leader election could be "own: broker-b", "own: broker-c", "own: broker-a"
> >> > > The broker-b will win in the end.
> >> > > The broker-b can write a new message "own: broker-b" to the topic. After the topic compaction.
> >> > > Only the broker-b will be present in the topic. Does it work?
> >> >
> >> > The proposal does not use a topic for leader election because of the circular dependency. The proposal uses the metadata store, zookeeper, to elect the leader broker(s) of BSC.
> >> > This part is explained in the "Bundle State Channel Owner Selection and Discovery" section in pip-192.
> >> >
> >> > *Bundle State Channel Owner Selection and Discovery*
> >> >
> >> > *Bundle State Channel(BSC) is another topic, and because of its circular dependency, we can't use the BundleStateChannel to find the owner broker of the BSC topic. For example, when a cluster starts, each broker needs to initiate BSC TopicLookUp(to find the owner broker) in order to consume the messages in BSC. However, initially, each broker does not know which broker owns the BSC.*
> >> >
> >> > *The ZK leader election can be a good option to break this circular dependency, like the followings.*
> >> > *Channel Owner Selection*
> >> >
> >> > *The cluster can use the ZK leader election to select the owner broker. If the owner becomes unavailable, one of the followers will become the new owner. We can elect the owner for each bundle state channel partition.*
> >> > *Channel Owner Discovery*
> >> >
> >> > *Then, in brokers’ TopicLookUp logic, we will add a special case to return the current leader(the elected BSC owner) for the BSC topics.*
> >> >
> >> > > Maybe I missed something.
> >> > >
> >> > > Thanks,
> >> > > Penghui
> >> > >
> >> > > On Thu, Oct 20, 2022 at 1:30 AM Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote:
> >> > >
> >> > > > Oops.
> >> > > > I forgot to mention another important item. I added it below(in bold).
> >> > > >
> >> > > > Pros:
> >> > > > - It supports more distributed load balance operations(bundle assignment) in a sequentially consistent manner
> >> > > > - For really large clusters, by a partitioned system topic, BSC can be more scalable than the current single-leader coordination solution.
> >> > > > - The load balance commands(across brokers) are sent via event sourcing(more reliable/transparent/easy-to-track) instead of RPC with retries.
> >> > > > *- Bundle ownerships can be cached in the topic table-view from BSC. (no longer needs to store bundle ownership in metadata store(ZK))*
> >> > > >
> >> > > > Cons:
> >> > > > - It is a new implementation and will require significant effort to stabilize the new implementation.
> >> > > > (Based on our PoC code, I think the event sourcing handlers are easier to understand and follow the logic.
> >> > > > Also, this new load balancer will be pluggable(will be implemented in new classes), so it should not break the existing load balance logic.
> >> > > > Users will be able to configure old/new broker load balancer.)
> >> > > >
> >> > > > On Wed, Oct 19, 2022 at 10:17 AM Heesung Sohn <heesung.s...@streamnative.io> wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > On Wed, Oct 19, 2022 at 2:06 AM 丛搏 <congbobo...@gmail.com> wrote:
> >> > > > >
> >> > > > >> Hi, Heesung:
> >> > > > >> I have some doubts.
> >> > > > >> I review your PIP-192: New Pulsar Broker Load Balancer. I found that unload topic uses the leader broker to do, (Assigning, Return) uses the lookup request broker. why (Assigning, Return) not use a leader broker?
> >> > > > >> I can think of a few reasons:
> >> > > > >> 1. reduce leader broker pressure
> >> > > > >> 2. does not strongly depend on the leader broker
> >> > > > >>
> >> > > > > Yes, one of the goals of the PIP-192 is to distribute the load balance logic to individual brokers (bundle assignment and bundle split).
> >> > > > >
> >> > > > >> If (Assigning, Return) does not depend on the leader, it will bring the following problems:
> >> > > > >
> >> > > > > I assume what you meant by `(Assigning, Return) does not depend on the leader` is the distributed topic assignment here(concurrent bundle assignment across brokers).
> >> > > > >
> >> > > > >> 1. leader clear bundle op and (Assigning, Return) will do at the same time, It will cause many requests to be retried, and the broker will be in chaos for a long time.
> >> > > > >
> >> > > > > I assume `leader clear bundle op` means bundle unloading, and `many requests` means topic lookup requests(bundle assignment requests).
> >> > > > > The leader unloads only high-loaded bundles in the "Owned" state. So, the leader does not unload bundles that are in the process of assignment states.
> >> > > > > Even if there are conflict state changes, only the first valid state change will be accepted(as explained in Conflict State Resolution(Race Conditions section in the PIP)) in BSC.
> >> > > > >
> >> > > > > Also, another goal of this PIP-192 is to reduce client lookup retries. In BSC, the client lookup response will be deferred(max x secs) until the bundle state becomes finally "Owned".
> >> > > > >
> >> > > > >> 2. bundle State Channel(BSC) owner depends on the leader broker, this also makes topic transfer strongly dependent on the leader.
> >> > > > >>
> >> > > > > BSC will use separate leader znodes to decide the owner brokers of the internal BSC system topic. As described in this section in the PIP-192, "Bundle State and Load Data TableView Scalability",
> >> > > > > We could use a partitioned topic(configurable) for this BSC system topic.
> >> > > > > Then, there could be a separate owner broker for each partition
> >> > > > > (e.g. zk leader znodes, /loadbalance/leader/part-1-owner, part-2-owner, ..etc).
> >> > > > >
> >> > > > >> 3. the code becomes more complex and harder to maintain
> >> > > > >>
> >> > > > >> What tradeoffs are the current implementations based on?
> >> > > > >>
> >> > > > > Here are some Pros and Cons of BSC I can think of.
> >> > > > >
> >> > > > > Pros:
> >> > > > > - It supports more distributed load balance operations(bundle assignment) in a sequentially consistent manner
> >> > > > > - For really large clusters, by a partitioned system topic, BSC can be more scalable than the current single-leader coordination solution.
> >> > > > > - The load balance commands(across brokers) are sent via event sourcing(more reliable/transparent/easy-to-track) instead of RPC with retries.
> >> > > > >
> >> > > > > Cons:
> >> > > > > - It is a new implementation and will require significant effort to stabilize the new implementation.
> >> > > > > (Based on our PoC code, I think the event sourcing handlers are easier to understand and follow the logic.
> >> > > > > Also, this new load balancer will be pluggable(will be implemented in new classes), so it should not break the existing load balance logic.
> >> > > > > Users will be able to configure old/new broker load balancer.)
> >> > > > >
> >> > > > > Thank you for sharing your questions about PIP-192 here. But I think this PIP-215 is independent of PIP-192(though PIP-192 needs some of the features in PIP-215).
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Heesung
> >> > > > >
> >> > > > >> Thanks,
> >> > > > >> bo
> >> > > > >>
> >> > > > >> Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote on Wed, Oct 19, 2022 at 07:54:
> >> > > > >>
> >> > > > >> > Hi pulsar-dev community,
> >> > > > >> >
> >> > > > >> > I raised a pip to discuss : PIP-215: Configurable Topic Compaction Strategy
> >> > > > >> >
> >> > > > >> > PIP link: https://github.com/apache/pulsar/issues/18099
> >> > > > >> >
> >> > > > >> > Regards,
> >> > > > >> > Heesung
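
For readers following the thread, here is a rough sketch of the difference between the default "last value per key" table view and the "first value wins" behavior discussed above, using the `own: broker-b` / `own: broker-c` / `own: broker-a` example. It is plain Python, not Pulsar code: `build_view`, `latest_wins`, `first_wins`, and the key `bundle-1` are hypothetical names made up for illustration, and "first valid state change" is simplified here to "first value seen", since the actual validity rules live in PIP-192.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Message = Tuple[str, str]                        # (key, value)
Strategy = Callable[[Optional[str], str], bool]  # accept `new` given `current`?


def build_view(messages: Iterable[Message], accepts: Strategy) -> Dict[str, str]:
    """Replay messages in order, keeping per key only the values the strategy accepts."""
    view: Dict[str, str] = {}
    for key, value in messages:
        if accepts(view.get(key), value):
            view[key] = value
    return view


def latest_wins(current: Optional[str], new: str) -> bool:
    # Default behavior: every new value overwrites the previous one.
    return True


def first_wins(current: Optional[str], new: str) -> bool:
    # Simplified stand-in (assumption) for "first valid state change wins":
    # accept a value only if the key has no value yet.
    return current is None


# The three competing ownership messages from the example in the thread.
messages = [
    ("bundle-1", "own: broker-b"),
    ("bundle-1", "own: broker-c"),
    ("bundle-1", "own: broker-a"),
]

default_view = build_view(messages, latest_wins)
strategic_view = build_view(messages, first_wins)
print(default_view)
print(strategic_view)
```

With `latest_wins` the view ends up pointing at broker-a, the last writer; with `first_wins` it keeps broker-b. If compaction applies the same rule as the table view, the result is the same whether a reader replays the topic before or after compaction runs.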