I'm inclined to agree with Enrico. If it's a hard problem, it will recur, and
this won't help. If it's a race on the client, it will occur randomly and
rarely, and unloading the subscription will get programmed in as a way of
life.

> If you think unloading the subscription can't help anything, unloading
> the topic should be the same. From my experience, most unload-topic
> operations are to mitigate problems related to message consumption.

Comparison with unloading a topic is not the bar here, as that is a
first-class broker utility that is needed for operational reasons beyond
"fixing" consumer-side issues. The side effect of using "unload topic" is
a loss of transient topic state. I fully agree that this side effect
has been pervasively abused for fixing problems (à la Ctrl-Alt-Del), but
that's not the rationale for having an unload-topic utility.

What kind of problems is this trying to fix?
And why can't they be solved by client-side fixes?

With shared-subscription issues, it's hard to pinpoint which consumer the
problem lies with, or where, and to reset just that one at the client. The
totality of state spread across the brokers and all the consumers of the
shared subscription has to be pieced together. Is that why we are doing this?


On Tue, Jan 17, 2023 at 5:30 PM PengHui Li <peng...@apache.org> wrote:

> I agree that if we encounter a stuck consumption issue, we should continue
> to find the root cause of the problem.
>
> Subscription unloading is just an option to mitigate the impact first.
> Sometimes it may only mitigate the issue for an hour, especially with
> key_shared subscriptions. Sometimes it's not a bug in Pulsar, and users
> need time to fix the issue. It doesn't make sense to let the impact
> continue until the fix is applied.
>
> I have also helped many people troubleshoot stuck consumption
> issues related to key_shared subscriptions, transactions, etc.
> In most cases, unloading the topic can mitigate the impact.
> For example, due to an uncaught exception, the dispatch thread
> stopped reading messages from the managed ledger. The exception
> occurs very infrequently. Unloading the topic is the best choice for
> now, right?
>
> If you think unloading the subscription can't help anything, unloading
> the topic should be the same. From my experience, most unload-topic
> operations are to mitigate problems related to message consumption.
>
> Best,
> Penghui
>
> On Tue, Jan 17, 2023 at 11:09 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > On Mon, Jan 16, 2023 at 11:58 r...@apache.org
> > <ranxiaolong...@gmail.com> wrote:
> > >
> > > I agree with @Enrico @Bo: if we encounter a stuck subscription, we
> > > must continue to spend time locating and fixing the problem, which is
> > > what we have been doing.
> > >
> > > But let's look at this problem from another angle. Suppose a user in a
> > > production environment encounters a stuck consumer: what should we do?
> > > For a production user, our first reaction to a problem is how to recover
> > > quickly and how to reduce the user's losses. At that point we don't even
> > > consider whether it is a bug on the broker side, a bug on the SDK side,
> > > or a mistake in the user's own usage. For fast recovery, our most common
> > > method is to unload the specified topic, which quickly re-establishes the
> > > connection between the broker and the client. In the process, we try to
> > > retain as much context as possible to help us locate and fix the problem
> > > afterwards.
> > >
> > > So I don't think these two things conflict. The reason we expose the
> > > unload-topic admin CLI is the same reason we want to expose unload
> > > subscription. From a developer's perspective, we definitely want to
> > > completely fix the problem that caused the consumer to get stuck. From
> > > the user's perspective, when a consumer gets stuck, the user does not
> > > care about the specific cause of the problem, but expects the business
> > > to recover in the shortest possible time to avoid further loss.
> > >
> > > I admit that this is a relatively hacky approach, but it can indeed
> > > solve the problems we are currently encountering, and it will not cause
> > > a major conflict with Pulsar's existing logic. So I still agree with
> > > yubiao's point of view.
> >
> >
> >
> > Usually when a subscription is "stuck", even if you unload the topic
> > it returns to the "stuck" state again if you don't solve the problem.
> >
> > This is a very common issue with Pulsar users. I spend a lot of time
> > helping users troubleshoot their production problems, and unloading the
> > topic is never a solution: it can give you seconds, minutes or hours of
> > "working state", then the problem will happen again.
> >
> > You say that it can solve the problems you are encountering.
> > Could you please give more context? (In Slack, if this is not
> > something that can be discussed in public.)
> > I apologise if I seem too much of a skeptic this time. I am sure that
> > you have a real problem and you want to fix it, but I would like to
> > help you find the best way.
> >
> > Pulsar is used by many people and we shouldn't add hacky tools for
> > temporary workarounds.
> > Once we deliver an API we should maintain it for an unlimited time.
> >
> > You could patch your system and use the patched version temporarily
> > until you find the root cause.
> >
> > Enrico
> >
> > >
> > > --
> > > Thanks
> > > Xiaolong Ran
> > >
> > >
> > > On Sun, Jan 15, 2023 at 20:59, Yubiao Feng
> > > <yubiao.f...@streamnative.io.invalid> wrote:
> > >
> > > > Hi Qiang
> > > >
> > > > > 1. How do you handle the race condition when you are trying to
> > > > > unload the subscription, and the new consumer wants to subscribe to
> > > > > this subscription at the same time? I'm unsure if it has the race
> > > > > condition. I just want to remind you about that. :)
> > > >
> > > > The methods `addConsumer` and `removeConsumer` both hold synchronized
> > > > locks; also adding a synchronized lock when executing
> > > > `reset subscription` solves the problem.
> > > >
> > > > > 2. Would you like to add some RESTful API design to clarify the
> > > > > implementation?
> > > >
> > > > I have already added the REST API design to the proposal:
> > > > https://github.com/apache/pulsar/issues/19187
> > > >
> > > > On Thu, Jan 12, 2023 at 3:22 PM <mattisonc...@gmail.com> wrote:
> > > >
> > > > > Hi, Yubiao
> > > > >
> > > > > I agree with this idea because some users care about the production
> > > > > rate. They don't want to unload the whole topic to fix a
> > > > > subscription problem.
> > > > >
> > > > > I've got some questions:
> > > > >
> > > > > 1. How do you handle the race condition when you are trying to
> > > > > unload the subscription, and the new consumer wants to subscribe to
> > > > > this subscription at the same time? I'm unsure if it has the race
> > > > > condition. I just want to remind you about that. :)
> > > > > 2. Would you like to add some RESTful API design to clarify the
> > > > > implementation?
> > > > >     a. Request method
> > > > >     b. Request path
> > > > >     c. Response code
> > > > >     d. etc.
> > > > >
> > > > >
> > > > > Thanks for your work.
> > > > > Mattison
> > > > > On Jan 11, 2023, 17:01 +0800, Yubiao Feng
> > > > > <yubiao.f...@streamnative.io.invalid> wrote:
> > > > > > Hi community
> > > > > >
> > > > > > I am starting a DISCUSS for PIP-240: A new API to unload
> > > > > > subscriptions.
> > > > > >
> > > > > > PIP issue: https://github.com/apache/pulsar/issues/19187
> > > > > >
> > > > > > ### Motivation
> > > > > >
> > > > > > We sometimes unload the topic to resolve stalled-consumption
> > > > > > issues, but unloading the topic also impacts the producer side.
> > > > > >
> > > > > > ### Goal
> > > > > >
> > > > > > Provide a new API that unloads at the subscription level. It
> > > > > > triggers reconnection of all consumers on that subscription; the
> > > > > > reconnection itself is guaranteed by the client. The API will be
> > > > > > used in these ways:
> > > > > > - unload a specific subscription of one topic (or partitioned topic)
> > > > > > - unload all subscriptions of one topic (or partitioned topic)
> > > > > > - unload subscriptions of one topic (or partitioned topic) by
> > > > > >   regular expression
> > > > > > - If a reader's subscription name is not set, a random subscription
> > > > > >   name prefixed with 'multiTopicsReader-' or 'reader-' will be used,
> > > > > >   and users can unload these subscriptions using regular expressions.
> > > > > >
> > > > > > In addition to triggering consumer disconnection, Unloading
> > > > > > Subscribers will restart the Dispatcher, which resets the
> > > > > > redelivery message queue and delayed message queue in the Broker's
> > > > > > memory. This can help resolve issues caused by an abnormal
> > > > > > dispatcher state. However, the execution flow of Unloading
> > > > > > Subscribers does not include a restart of the Managed Cursor
> > > > > > related to this dispatcher; if there is a problem with the cursor,
> > > > > > we can only rely on unloading the topic to solve it.
> > > > > >
> > > > > > Note: From the client's perspective, this connection may be shared
> > > > > > by consumers, producers, and transactions, so Unloading Subscribers
> > > > > > may impact producers and transactions.
> > > > > >
> > > > > > #### These scenarios are not supported
> > > > > > - The features `message-dedup`, `geo-replication`, and
> > > > > >   `shadow-topic` also read messages from the topic, but Unloading
> > > > > >   Subscribers will not support triggering restarts of these three
> > > > > >   features (because in these scenarios the cursor is used directly
> > > > > >   to read the data, not a consumer or reader).
> > > > > > - The Compaction task (subscription name `__compaction`) also uses
> > > > > >   a reader to read data, but Unloading Subscribers does not support
> > > > > >   it because this task creates a new reader each time it starts.
> > > > > > - Topics related to Transaction features are not supported:
> > > > > >   - `__transaction_buffer_snapshot` works with the TB recover task,
> > > > > >     which creates a new reader each time it starts.
> > > > > >   - `__transaction_pending_ack` works with the Transaction Pending
> > > > > >     Ack Store replay task, which uses the managed cursor directly
> > > > > >     to read data.
> > > > > >   - `__transaction_log_xxx` works with the Transaction Log task,
> > > > > >     which uses the managed cursor directly to read data.
> > > > > >   - `transaction_coordinator_assign`: no data is written to this
> > > > > >     topic.
> > > > > >
> > > > > > #### Special system topic supports
> > > > > > The system topic `__change_events` is used to support topic-level
> > > > > > policies. Message delivery issues can also occur in this scenario,
> > > > > > so Unloading Subscribers will support this topic.
> > > > > >
> > > > > > ### API Changes
> > > > > >
> > > > > > #### For persistent topic
> > > > > > ```
> > > > > > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > > > > > ```
> > > > > >
> > > > > > #### For non-persistent topic
> > > > > > ```
> > > > > > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > > > > > ```
> > > > > >
> > > > > > #### The `-s` parameter
> > > > > > - set `-s` to a specific subscription name to unload that
> > > > > >   subscription
> > > > > > - set `-s` to `**` to unload all subscriptions under this topic
> > > > > > - set `-s` to a regular expression to unload a batch of
> > > > > >   subscriptions under this topic
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > > Yubiao Feng
> > > > >
> > > >
> >
>
