I agree with @Enrico and @Bo: if we encounter a stuck subscription, we must keep spending time to locate and fix the root cause, which is what we have been doing.
But let's think about this problem from another angle. Suppose a user in a production environment encounters a stuck consumer right now. What should we do?

For a user in production, our first reaction to a problem is how to recover quickly and how to reduce the user's losses. At that point we are not even thinking about whether this is a bug on the Broker side, a bug on the SDK side, or a misuse on the user's side. For fast recovery, our most common method is to unload the specified topic, which quickly re-establishes the connection between the broker and the client. While doing so, we try to retain as much context as possible to help us locate and fix the problem afterwards. So I don't think these two things conflict.

The reason we expose the admin CLI for unloading a topic is the same reason we expect to expose unloading a subscription. From the perspective of a developer, we definitely want to completely fix the problem that caused the consumer to get stuck. From the perspective of the user, when a stuck consumer occurs, the user does not care about the specific cause of the problem, but expects the business to recover in the shortest possible time to avoid further loss.

I admit that this is a relatively hacky approach, but it does solve the problems we are currently encountering, and it will not cause a major conflict with Pulsar's existing logic.

So I still agree with Yubiao's point of view.

--
Thanks
Xiaolong Ran

On Sun, Jan 15, 2023 at 20:59, Yubiao Feng <yubiao.f...@streamnative.io.invalid> wrote:

> Hi Qiang
>
> > 1. How do you handle the race condition when you are trying to unload the
> > subscription, and the new consumer wants to subscribe to this subscription
> > at the same time? I'm unsure if it has the race condition. I just want to
> > remind you about that. :)
>
> The methods `addConsumer` and `removeConsumer` both hold synchronized locks;
> if we also take a synchronized lock when executing `reset subscription`,
> that solves the problem.
>
> > 2. Would you like to add some restful API design to clarify the
> > implementation?
>
> Already added the rest API design in the proposal
> https://github.com/apache/pulsar/issues/19187
>
> On Thu, Jan 12, 2023 at 3:22 PM <mattisonc...@gmail.com> wrote:
>
> > Hi, Yubiao
> >
> > I agree with this idea because some users care about the production rate.
> > They don't want to unload the whole topic to fix the subscription problem.
> >
> > I've got some questions:
> >
> > 1. How do you handle the race condition when you are trying to unload the
> > subscription, and the new consumer wants to subscribe to this subscription
> > at the same time? I'm unsure if it has the race condition. I just want to
> > remind you about that. :)
> > 2. Would you like to add some restful API design to clarify the
> > implementation?
> > a. Request method
> > b. Request path
> > c. Response code
> > d. etc.
> >
> > Thanks for your work.
> > Mattison
> > On Jan 11, 2023, 17:01 +0800, Yubiao Feng <yubiao.f...@streamnative.io.invalid>, wrote:
> > > Hi community
> > >
> > > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
> > >
> > > PIP issue: https://github.com/apache/pulsar/issues/19187
> > >
> > > ### Motivation
> > >
> > > We sometimes try to unload the topic to resolve some consumption-stop
> > > issues. But unloading the topic also impacts the producer side.
> > >
> > > ### Goal
> > >
> > > Provide a new API to unload at the subscription level. It triggers
> > > reconnection of all consumers on that subscription, and the reconnection
> > > is guaranteed by the client.
> > > The API will be used in these ways:
> > > - unload a specific subscription of one topic (or partitioned topic)
> > > - unload all subscriptions of one topic (or partitioned topic)
> > > - unload subscriptions of one topic (or partitioned topic) by regular
> > > expression
> > > - If a reader's subscription name is not set, a random subscription name
> > > prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users can
> > > unload these subscriptions using regular expressions.
> > >
> > > In addition to triggering consumer disconnection, Unloading Subscribers
> > > will restart the Dispatcher, which resets the redeliver message queue and
> > > delayed message queue in the Broker's memory, which can help resolve issues
> > > caused by an abnormal dispatcher state. However, the execution flow of
> > > Unloading Subscribers does not include a restart of the Managed Cursor
> > > related to this dispatcher; if there is a problem with the cursor, we can
> > > only rely on the unload topic to solve it.
> > >
> > > Note: From the client's perspective, this connection may be shared by
> > > consumers, producers, and transactions, so Unloading Subscribers may
> > > impact producers and transactions.
> > >
> > > #### These scenarios are not supported
> > > - Functions `message-dedup`, `geo-replication`, and `shadow-topic` also
> > > read messages from the topic, but Unloading Subscribers will not support
> > > triggering restarts of these three functions (because the cursor is used
> > > directly to read the data in these scenarios, not the consumer or reader).
> > > - The Compaction task (subscription name is `__compaction`) also uses a
> > > reader to read data, but Unloading Subscribers does not support it because
> > > this task creates a new reader each time it starts.
> > > - Topics related to Transaction features are not supported:
> > >   - `__transaction_buffer_snapshot` works with the task TB recover, and
> > >   this task will create a new reader each time it starts.
> > >   - `__transaction_pending_ack` works with the task Transaction Pending Ack
> > >   Store replay, and this task will use the managed cursor directly to read
> > >   data.
> > >   - `__transaction_log_xxx` works with the task Transaction Log, which will
> > >   use the managed cursor directly to read data.
> > >   - `transaction_coordinator_assign`: no data will be written on this topic.
> > >
> > > #### Special system topic supports
> > > The system topic `__change_events` is used to support topic-level policies.
> > > There may also be some message delivery issues in this scenario, so
> > > Unloading Subscribers will support this topic.
> > >
> > > ### API Changes
> > >
> > > #### For persistent topic
> > > ```
> > > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > > ```
> > >
> > > #### For non-persistent topic
> > > ```
> > > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > > ```
> > >
> > > #### Explain the param `-s`
> > > - set param `-s` to a specific sub name to unload that subscription
> > > - set param `-s` to `**` to unload all subscriptions under this topic
> > > - set param `-s` to a regular expression to unload a batch of
> > > subscriptions under this topic
> > >
> > > Thanks
> > > Yubiao Feng
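
For illustration only, the three meanings of the proposed `-s` parameter (a specific name, `**` for all, or a regular expression) could be resolved roughly as sketched below. This is not code from the proposal or from Pulsar: the class `SubscriptionSelector`, the method `select`, and the decision to try an exact name match before falling back to a full-string regex match are all assumptions made for the example. One detail that would likely matter in any implementation is that `**` is not a valid Java regular expression, so it has to be checked before the value is compiled as a pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pulsar: resolves the value of `-s`
// against the subscription names that currently exist on a topic.
final class SubscriptionSelector {
    // Special value from the proposal meaning "all subscriptions".
    static final String ALL = "**";

    static List<String> select(List<String> existing, String selector) {
        // "**" must be handled first: it is not a valid regex in Java.
        if (ALL.equals(selector)) {
            return new ArrayList<>(existing);
        }
        // Assumption: an exact name match takes priority over regex matching.
        if (existing.contains(selector)) {
            return List.of(selector);
        }
        // Assumption: the regex must match the whole subscription name.
        Pattern pattern = Pattern.compile(selector);
        List<String> matched = new ArrayList<>();
        for (String name : existing) {
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }
}
```

With this sketch, a selector such as `reader-.*` would pick up the randomly named reader subscriptions mentioned in the proposal without touching subscriptions whose names merely contain `reader-`.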