Yubiao, thanks for sharing your problem and a proposal. This is very helpful for the community to get in touch with the pain of Pulsar users/administrators.
In my experience if a "subscription is stuck", the problems are:
* the client has some problems (bug in the client/misconfiguration somewhere) - 99.9%
* there is a bug in Pulsar - 0.1%

Unloading a topic is an operation that triggers some reset of the state on both the broker and the clients, and this usually TEMPORARILY unblocks the subscription.
I have never seen a problem that is temporarily solved by topic unload/broker restart be permanently solved with that operation.

If there is a problem we should spend time on investigating the problem and not on adding this kind of tool.

I believe that we should not continue to add these kinds of hacks into Pulsar:
- easy reset...
- ignore errors... (catch Throwable...)

The overall result is a system that "seems to work" but actually doesn't work properly.

I would invest more time in:
- monitoring tools (tools to quickly detect stuck consumers)
- circuit breakers (fast fail/shut the door to consumers/producers that don't behave correctly)
- guard rails (limits on clients to prevent them from exhausting the resources on the brokers)

Enrico

On Thu, Jan 12, 2023 at 08:22 <mattisonc...@gmail.com> wrote:
>
> Hi, Yubiao
>
> I agree with this idea because some users care about the production rate.
> They don't want to unload the whole topic to fix the subscription problem.
>
> I've got some questions:
>
> 1. How do you handle the race condition when you are trying to unload the
> subscription, and the new consumer wants to subscribe to this subscription at
> the same time? I'm unsure if it has the race condition. I just want to remind
> you about that. :)
> 2. Would you like to add some RESTful API design to clarify the
> implementation?
> a. Request method
> b. Request path
> c. Response code
> d. etc.
>
>
> Thanks for your work.
> Mattison
> On Jan 11, 2023, 17:01 +0800, Yubiao Feng
> <yubiao.f...@streamnative.io.invalid>, wrote:
> > Hi community
> >
> > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
> >
> > PIP issue: https://github.com/apache/pulsar/issues/19187
> >
> > ### Motivation
> >
> > We sometimes try to unload the topic to resolve some consumption-stop
> > issues. But unloading the topic will also impact the producer side.
> >
> > ### Goal
> >
> > Provide a new API to unload at the subscription level. It triggers
> > reconnection of all consumers on that subscription, and reconnection is
> > guaranteed by the client. The API will be used in these ways:
> > - unload a specific subscription of one topic (or partitioned topic)
> > - unload all subscriptions of one topic (or partitioned topic)
> > - unload subscriptions of one topic (or partitioned topic) by regular
> > expression
> >   - If a reader's subscription name is not set, a random subscription name
> >   prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users can
> >   unload these subscriptions using regular expressions.
> >
> > In addition to triggering consumer disconnection, Unloading Subscribers
> > will restart the Dispatcher, which resets the redeliver message queue and
> > delayed message queue in the Broker's memory, which can help resolve issues
> > caused by an abnormal dispatcher state. However, the execution flow of
> > Unloading Subscribers does not include a restart of the Managed Cursor
> > related to this dispatcher; if there is a problem with the cursor, we can
> > only rely on the unload topic to solve it.
> >
> > Note: From the client's perspective, this connection may be shared by
> > consumers, producers, and transactions, so Unloading Subscribers may
> > impact the producer and transaction.
> >
> > #### These scenarios are not supported
> > - Functions `message-dedup`, `geo-replication`, and `shadow-topic` also
> > read messages from the topic, but Unloading Subscribers will not support
> > triggering restarts of these three functions (because the cursor is used
> > directly to read the data in these scenarios, not the consumer or reader).
> > - The compaction task (subscription name is `__compaction`) also uses a
> > reader to read data, but Unloading Subscribers does not support it because
> > this task creates a new reader each time it starts.
> > - No topics related to Transaction features are supported:
> >   - `__transaction_buffer_snapshot` works with the task TB recover, and
> >   this task will create a new reader each time it starts.
> >   - `__transaction_pending_ack` works with the task Transaction Pending Ack
> >   Store replay, and this task will use managed cursor directly to read data.
> >   - `__transaction_log_xxx` works with the task Transaction Log, which will
> >   use managed cursor directly to read data.
> >   - `transaction_coordinator_assign`: no data will be written on this topic.
> >
> > #### Special system topic supports
> > The system topic `__change_events` is used to support topic-level policies.
> > There may also be some message delivery issues in this scenario, so
> > Unloading Subscribers will support this topic.
> >
> > ### API Changes
> >
> > #### For persistent topic
> > ```
> > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > ```
> >
> > #### For non-persistent topic
> > ```
> > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > ```
> >
> > #### Explanation of the param `-s`
> > - set param `-s` to a specific sub name to unload that subscription
> > - set param `-s` to `**` to unload all subscriptions under this topic
> > - set param `-s` to a `regexp` to unload a batch of subscriptions under this
> > topic
> >
> >
> > Thanks
> > Yubiao Feng
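
For reference, the three `-s` modes proposed above could resolve to subscription names roughly as follows. This is only a sketch to make the semantics concrete and is not Pulsar code: the helper `select_subscriptions`, the sample subscription names, and the precedence (exact name before regular expression) are all assumptions, since the PIP does not spell them out. One detail worth noting is that `**` is not a valid regular expression, so it has to be handled before any regex compilation.

```python
import re

# Wildcard value proposed in the PIP for "all subscriptions".
ALL_SUBSCRIPTIONS = "**"


def select_subscriptions(existing, selector):
    """Return the subscription names a given -s value would select.

    Hypothetical helper for illustration; not part of Pulsar.
    """
    if selector == ALL_SUBSCRIPTIONS:
        # "**" would fail to compile as a regex, so check it first.
        return sorted(existing)
    if selector in existing:
        # Assumption: an exact name match takes precedence over regex matching.
        return [selector]
    pattern = re.compile(selector)
    # Assumption: the regex must match the whole subscription name.
    return sorted(name for name in existing if pattern.fullmatch(name))


# Made-up subscription names, including auto-generated reader names.
subs = {"orders-sub", "billing-sub", "reader-a1b2c3", "multiTopicsReader-9f8e"}

print(select_subscriptions(subs, "**"))
print(select_subscriptions(subs, "orders-sub"))
print(select_subscriptions(subs, r"(multiTopicsReader|reader)-.*"))
```

If the real implementation treats every non-`**` value as a regular expression instead, a subscription name containing regex metacharacters would need escaping, which is one more thing the API design would have to state explicitly.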