Yubiao,
thanks for sharing your problem and a proposal. This is very helpful
for the community to understand the pain points of Pulsar
users/administrators.

In my experience, if a "subscription is stuck", the cause is one of:
* the client has some problem (a bug in the client or a
misconfiguration somewhere) - 99.9%
* there is a bug in Pulsar - 0.1%

Unloading a topic is an operation that triggers some reset of the
state on both the broker and the clients, and this usually unblocks
the subscription only TEMPORARILY.

I have never seen a problem that is temporarily solved by a topic
unload/broker restart be permanently solved by that operation.
If there is a problem, we should spend time investigating the
problem and not adding this kind of tool.

I believe that we should not continue to add these kinds of hacks into Pulsar:
- easy reset...
- ignore errors... (catch Throwable...)

The overall result is a system that "seems to work" but actually
doesn't work properly.

I would invest more time in:
- monitoring tools (tools to quickly detect stuck consumers)
- circuit breakers (fail fast/shut the door on consumers/producers
that don't behave correctly)
- guard rails (limits on clients to prevent them from exhausting the
resources on the brokers)
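
As an illustration of the monitoring point, here is a minimal sketch
of a check that flags subscriptions that look stuck in a single topic
stats snapshot. The heuristic (connected consumers, a backlog, and a
zero dispatch rate) and the function name are assumptions made for
this example; the field names `subscriptions`, `msgBacklog`,
`msgRateOut` and `consumers` are assumed to match the JSON returned
by the topic stats endpoint and should be checked against your Pulsar
version.

```python
from typing import Any, Dict, List


def find_stuck_subscriptions(topic_stats: Dict[str, Any],
                             min_backlog: int = 1) -> List[str]:
    """Return names of subscriptions that look stuck in one stats snapshot.

    Heuristic (an assumption, not a Pulsar definition): the subscription
    has at least one connected consumer and a backlog, yet nothing is
    currently being dispatched to it.
    """
    stuck = []
    for name, sub in topic_stats.get("subscriptions", {}).items():
        has_consumers = len(sub.get("consumers", [])) > 0
        backlog = sub.get("msgBacklog", 0)
        rate_out = sub.get("msgRateOut", 0.0)
        if has_consumers and backlog >= min_backlog and rate_out == 0:
            stuck.append(name)
    return sorted(stuck)


# Hand-written snapshot for illustration only (not real broker output).
snapshot = {
    "subscriptions": {
        "healthy": {"msgBacklog": 10, "msgRateOut": 25.0, "consumers": [{}]},
        "stuck": {"msgBacklog": 500, "msgRateOut": 0.0, "consumers": [{}]},
        "idle": {"msgBacklog": 0, "msgRateOut": 0.0, "consumers": [{}]},
        "no-consumers": {"msgBacklog": 40, "msgRateOut": 0.0, "consumers": []},
    }
}
print(find_stuck_subscriptions(snapshot))  # ['stuck']
```

A single snapshot can give false positives (for example, consumers
that are deliberately paused), so a real tool would probably require
the condition to hold across several consecutive samples before
alerting.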



Enrico

On Thu, Jan 12, 2023 at 08:22, <mattisonc...@gmail.com> wrote:
>
> Hi, Yubiao
>
> I agree with this idea because some users care about the production rate. 
> They don't want to unload the whole topic to fix the subscription problem.
>
> I've got some questions:
>
> 1. How do you handle the race condition when you are trying to unload the
> subscription and a new consumer wants to subscribe to this subscription at
> the same time? I'm not sure whether there is a race condition here; I just
> want to flag it. :)
> 2. Would you like to add a RESTful API design to clarify the
> implementation?
>     a. Request method
>     b. Request path
>     c. Response code
>     d. etc.
>
>
> Thanks for your work.
> Mattison
> On Jan 11, 2023, 17:01 +0800, Yubiao Feng 
> <yubiao.f...@streamnative.io.invalid>, wrote:
> > Hi community
> >
> > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
> >
> > PIP issue: https://github.com/apache/pulsar/issues/19187
> >
> > ### Motivation
> >
> > We sometimes try to unload the topic to resolve some consumption-stop
> > issues. But unloading the topic will also impact the producer side.
> >
> > ### Goal
> >
> > Provide a new API to unload at the subscription level. It triggers
> > reconnection of all consumers on that subscription, and the reconnection
> > is guaranteed by the client. The API will be used in these ways:
> > - unload a specific subscription of one topic (or partitioned topic)
> > - unload all subscriptions of one topic (or partitioned topic)
> > - unload subscriptions of one topic (or partitioned topic) by regular
> > expression
> > - If a reader's subscription name is not set, a random subscription name
> > prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users can
> > unload these subscriptions using regular expressions.
> >
> > In addition to triggering consumer disconnection, Unloading Subscribers
> > will restart the Dispatcher, which resets the redeliver message queue and
> > delayed message queue in the Broker's memory, which can help resolve issues
> > caused by an abnormal dispatcher state. However, the execution flow of
> > Unloading Subscribers does not include a restart of the Managed Cursor
> > related to this dispatcher; if there is a problem with the cursor, we can
> > only rely on unloading the topic to solve it.
> >
> > Note: From the client's perspective, this connection may be shared by
> > consumers, producers, and transactions, so Unloading Subscribers may
> > impact producers and transactions.
> >
> > #### These scenarios are not supported
> > - Functions `message-dedup`, `geo-replication`, and `shadow-topic` also
> > read messages from the topic, but Unloading Subscribers will not support
> > triggering restarts of these three functions (because the cursor is used
> > directly to read the data in these scenarios, not a consumer or reader).
> > - The Compaction task (subscription name is `__compaction`) also uses a
> > reader to read data, but Unloading Subscribers does not support it because
> > this task creates a new reader each time it starts.
> > - Topics related to Transaction features are not supported:
> > - `__transaction_buffer_snapshot` works with the task TB recover, and
> > this task will create a new reader each time it starts.
> > - `__transaction_pending_ack` works with the task Transaction Pending Ack
> > Store replay, and this task will use managed cursor directly to read data.
> > - `__transaction_log_xxx` works with the task Transaction Log, which will
> > use managed cursor directly to read data.
> > - `transaction_coordinator_assign`: no data is written to this topic.
> >
> > #### Special system topic supports
> > The system topic `__change_events` is used to support topic-level policies.
> > There may also be some message delivery issues in this scenario, so
> > Unloading Subscribers will support this topic.
> >
> > ### API Changes
> >
> > #### For persistent topic
> > ```
> > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > ```
> >
> > #### For non-persistent topic
> > ```
> > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > ```
> >
> > #### Explain the param `-s`
> > - set param `-s` to a specific subscription name to unload that
> > subscription
> > - set param `-s` to `**` to unload all subscriptions under this topic
> > - set param `-s` to a regular expression to unload a batch of
> > subscriptions under this topic
> >
> >
> > Thanks
> > Yubiao Feng
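
For reference, the `-s` matching rules described in the quoted
proposal could be sketched roughly as follows. This is only an
illustration: the function name is hypothetical, and the proposal does
not say how an exact name is told apart from a regular expression, so
the precedence used here (exact match first, then full-string regex
match) is an assumption.

```python
import re
from typing import Iterable, List


def select_subscriptions(param: str, existing: Iterable[str]) -> List[str]:
    """Resolve the proposed `-s` value to the subscriptions to unload.

    Assumed precedence (not specified in the proposal):
      1. `**` selects every subscription on the topic.
      2. A value equal to an existing name selects only that subscription.
      3. Anything else is treated as a regular expression that must match
         the whole subscription name.
    An invalid regular expression raises re.error.
    """
    names = sorted(existing)
    if param == "**":
        return names
    if param in names:
        return [param]
    pattern = re.compile(param)
    return [n for n in names if pattern.fullmatch(n)]


# Made-up subscription names for illustration.
subs = ["orders-sub", "reader-a1b2c3", "multiTopicsReader-9f8e", "audit"]
print(select_subscriptions("**", subs))
print(select_subscriptions("audit", subs))
print(select_subscriptions(r"(multiTopicsReader|reader)-.*", subs))
```

The last call shows the reader case from the proposal: readers without
an explicit subscription name can only be targeted through their
'reader-' or 'multiTopicsReader-' prefix.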
