On Mon, 16 Jan 2023 at 11:58, r...@apache.org <ranxiaolong...@gmail.com> wrote:
>
> I agree with @Enrico @Bo: if we encounter a stuck subscription, we must
> keep spending time to locate and fix the problem, which is what we have
> been doing.
>
> But let's look at this from another angle. Suppose a user in a production
> environment runs into a stuck consumer. What should we do? For a
> production user, our first reaction is to ask how to recover quickly and
> how to limit the user's losses. At that point we are not yet asking
> whether this is a bug on the broker side, a bug on the SDK side, or misuse
> by the user. For fast recovery, our most common method is to unload the
> affected topic so that the connection between the broker and the client is
> quickly re-established. While doing so, we try to retain as much context
> as possible to help us keep investigating and eventually fix the problem.
>
> So I don't think these two things conflict. The reason we expose an admin
> CLI command to unload a topic is the same reason we want to expose one to
> unload a subscription. From the perspective of a developer, we definitely
> want to completely fix the problem that caused the stuck state. From the
> perspective of the user, when something like a stuck consumer occurs, the
> user does not care about the specific cause of the problem, but expects
> the business to recover in the shortest possible time to avoid further
> loss.
>
> I admit that this is a relatively hacky way, but it can indeed solve the
> problems we are currently encountering, and at the same time it does not
> conflict in any major way with Pulsar's existing logic. So I still agree
> with Yubiao's point of view.

Usually, when a subscription is "stuck", unloading the topic does not help:
the subscription returns to the "stuck" state unless you solve the
underlying problem. This is a very common issue for Pulsar users. I spend
a lot of time helping users troubleshoot production problems, and unloading
the topic is never a solution. It can give you seconds, minutes or hours of
"working state", and then the problem happens again.

You say that it can solve the problems you are encountering. Could you
please give more context? (On Slack, if this is not something that can be
discussed in public.)

I apologise if I seem too much of a skeptic this time. I am sure that you
have a real problem and you want to fix it, but I would like to help you
find the best way.

Pulsar is used by many people and we shouldn't add hacky tools for
temporary workarounds. Once we deliver an API, we have to maintain it
indefinitely. You could patch your system and use the patched version
temporarily until you find the root cause.

Enrico

>
> --
> Thanks
> Xiaolong Ran
>
>
> Yubiao Feng <yubiao.f...@streamnative.io.invalid> wrote on Sun, Jan 15, 2023 at 20:59:
>
> > Hi Qiang
> >
> > > 1. How do you handle the race condition when you are trying to unload
> > > the subscription, and the new consumer wants to subscribe to this
> > > subscription at the same time? I'm unsure if it has the race
> > > condition. I just want to remind you about that. :)
> >
> > The methods `addConsumer` and `removeConsumer` both take synchronized
> > locks. If we also take the synchronized lock when executing
> > `reset subscription`, that solves the problem.
> >
> > > 2. Would you like to add some restful API design to clarify the
> > > implementation?
> >
> > I have already added the REST API design to the proposal:
> > https://github.com/apache/pulsar/issues/19187
> >
> > On Thu, Jan 12, 2023 at 3:22 PM <mattisonc...@gmail.com> wrote:
> >
> > > Hi, Yubiao
> > >
> > > I agree with this idea because some users care about the production rate.
> > > They don't want to unload the whole topic to fix a subscription
> > > problem.
> > >
> > > I've got some questions:
> > >
> > > 1. How do you handle the race condition when you are trying to unload
> > >    the subscription, and the new consumer wants to subscribe to this
> > >    subscription at the same time? I'm unsure if it has the race
> > >    condition. I just want to remind you about that. :)
> > > 2. Would you like to add some RESTful API design to clarify the
> > >    implementation?
> > >    a. Request method
> > >    b. Request path
> > >    c. Response code
> > >    d. etc.
> > >
> > >
> > > Thanks for your work.
> > > Mattison
> > > On Jan 11, 2023, 17:01 +0800, Yubiao Feng <yubiao.f...@streamnative.io.invalid>, wrote:
> > > > Hi community
> > > >
> > > > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
> > > >
> > > > PIP issue: https://github.com/apache/pulsar/issues/19187
> > > >
> > > > ### Motivation
> > > >
> > > > We sometimes unload a topic to resolve consumption-stop issues, but
> > > > unloading the topic also impacts the producer side.
> > > >
> > > > ### Goal
> > > >
> > > > Provide a new API to unload at the subscription level. It triggers
> > > > reconnection of all consumers on that subscription, and the
> > > > reconnection is guaranteed by the client. The API will be used in
> > > > these ways:
> > > > - unload a specific subscription of one topic (or partitioned topic)
> > > > - unload all subscriptions of one topic (or partitioned topic)
> > > > - unload subscriptions of one topic (or partitioned topic) by regular
> > > >   expression
> > > >   - If a reader's subscription name is not set, a random subscription
> > > >     name prefixed with 'multiTopicsReader-' or 'reader-' will be used,
> > > >     and users can unload these subscriptions using regular expressions.
> > > >
> > > > In addition to triggering consumer disconnection, Unloading
> > > > Subscribers will restart the Dispatcher, which resets the redeliver
> > > > message queue and the delayed message queue in the Broker's memory.
> > > > This can help resolve issues caused by an abnormal dispatcher state.
> > > > However, the execution flow of Unloading Subscribers does not include
> > > > a restart of the Managed Cursor related to this dispatcher; if there
> > > > is a problem with the cursor, we can only rely on unloading the topic
> > > > to solve it.
> > > >
> > > > Note: From the client's perspective, the connection may be shared by
> > > > consumers, producers, and transactions, so Unloading Subscribers may
> > > > impact producers and transactions.
> > > >
> > > > #### These scenarios are not supported
> > > > - The functions `message-dedup`, `geo-replication`, and `shadow-topic`
> > > >   also read messages from the topic, but Unloading Subscribers will
> > > >   not support triggering restarts of these three functions (because
> > > >   in these scenarios the cursor is used directly to read the data,
> > > >   not a consumer or reader).
> > > > - The Compaction task (subscription name is `__compaction`) also uses
> > > >   a reader to read data, but Unloading Subscribers does not support it
> > > >   because this task creates a new reader each time it starts.
> > > > - Topics related to Transaction features are not supported:
> > > >   - `__transaction_buffer_snapshot` works with the TB recover task,
> > > >     and this task creates a new reader each time it starts.
> > > >   - `__transaction_pending_ack` works with the Transaction Pending Ack
> > > >     Store replay task, and this task uses the managed cursor directly
> > > >     to read data.
> > > >   - `__transaction_log_xxx` works with the Transaction Log task, which
> > > >     uses the managed cursor directly to read data.
> > > >   - `transaction_coordinator_assign`: no data is written to this
> > > >     topic.
> > > >
> > > > #### Special system topic supports
> > > > The system topic `__change_events` is used to support topic-level
> > > > policies. Message delivery issues may also occur in this scenario, so
> > > > Unloading Subscribers will support this topic.
> > > >
> > > > ### API Changes
> > > >
> > > > #### For persistent topic
> > > > ```
> > > > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > > > ```
> > > >
> > > > #### For non-persistent topic
> > > > ```
> > > > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > > > ```
> > > >
> > > > #### Explain the param `-s`
> > > > - set `-s` to a specific subscription name to unload that
> > > >   subscription
> > > > - set `-s` to `**` to unload all subscriptions under this topic
> > > > - set `-s` to a regular expression to unload a batch of subscriptions
> > > >   under this topic
> > > >
> > > >
> > > > Thanks
> > > > Yubiao Feng
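
To make the three meanings of `-s` in the quoted proposal concrete, here is a minimal sketch of how a selector value might be resolved into subscription names. It is an illustration only and not the PIP-240 implementation: the class `SubscriptionSelector`, the method `select`, and the sample subscription names are all hypothetical. The proposal does not say how a plain name is told apart from a regular expression, so the sketch assumes that an exact name match wins and that anything else is treated as a regular expression that must match the whole name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Pulsar or PIP-240.
final class SubscriptionSelector {

    // Token the proposal uses for "all subscriptions under this topic".
    static final String ALL = "**";

    // Resolves the value given to `-s` into the subscription names to unload.
    // Assumption: an exact name match wins; anything else is treated as a
    // regular expression that must match the whole subscription name.
    static List<String> select(List<String> existing, String selector) {
        if (ALL.equals(selector)) {
            return new ArrayList<>(existing);
        }
        if (existing.contains(selector)) {
            return List.of(selector);
        }
        // Throws PatternSyntaxException if the selector is not a valid regex.
        Pattern pattern = Pattern.compile(selector);
        List<String> matched = new ArrayList<>();
        for (String name : existing) {
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Made-up subscription names, including the two reader prefixes
        // mentioned in the proposal.
        List<String> subs = List.of("orders-sub", "reader-1f3c", "multiTopicsReader-9a2b");
        System.out.println(select(subs, "orders-sub"));
        System.out.println(select(subs, "**"));
        System.out.println(select(subs, "(multiTopicsReader|reader)-.*"));
    }
}
```

Under these assumptions, the last call selects both auto-named reader subscriptions, which is the case the proposal mentions for readers created without an explicit subscription name. Checking `**` before compiling the pattern matters in this sketch because `**` is not itself a valid Java regular expression.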