Thanks Devin, very thoughtful observations.

Do you think that it would be possible to create a repro of the case with
some "dummy" code in the functions? Perhaps messages could be produced to
the input topic partitions with pulsar-perf?
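
For example (just a sketch of what I had in mind; topic names and rates are
made up), the input load could be generated with something along the lines of

  bin/pulsar-perf produce persistent://public/default/input-topic \
      --rate 50000 --size 1024 --num-producers 4

and the real functions could be replaced with a trivial filter that drops
part of the messages, e.g. a minimal Java function using the Pulsar Functions
SDK (untested sketch):

  import org.apache.pulsar.functions.api.Context;
  import org.apache.pulsar.functions.api.Function;

  public class DummyFilterFunction implements Function<String, String> {
      @Override
      public String process(String input, Context context) {
          // Drop roughly half of the messages to mimic a filtering function.
          // Returning null means no output message is produced.
          return (input.hashCode() % 2 == 0) ? input : null;
      }
  }

If something like that reproduces the "freeze", it would make it much easier
for others to debug and to turn the case into an automated test.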

There are a lot of interesting and useful details in your observations. I
haven't had the chance to go through all of them yet, but I have a few
comments and questions on some of the topics.

> 3. When a subscription is "frozen," the broker that owns the "frozen"
topic is doing nothing but processing pendingAddEntries in the do loop of
*ManagedLedgerImpl.updateLedgersIdsComplete(..)*, here:
https://github.com/apache/pulsar/blob/30d48cbbec8b522e8ab595f70fb1edf8b31bb51b/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1507

How did you detect that nothing else is executing? Was that based on log output?

> 5. Functions get blocked from producing due to waiting for acks from the
broker on pending messages, which blocks the function semaphore and results
in large backlog accumulation.

For this case, it would be useful to take jstack thread dumps on both the
broker and the client side while the problem is ongoing. Multiple consecutive
thread dumps taken a few seconds apart are useful for detecting
blocked/deadlocked threads. Heap dumps would also be useful, but those
contain confidential details about the messages and environment credentials,
so they shouldn't be shared publicly or over insecure channels.
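
To capture the thread dumps, something simple along these lines on the broker
host should work (just a sketch; <broker-pid> is a placeholder for the
broker's process id):

  for i in 1 2 3 4 5; do
    jstack -l <broker-pid> > /tmp/broker-jstack-$i.txt
    sleep 5
  done

The same approach works for the function worker / client JVM. "jstack -l"
also prints lock and ownable synchronizer details, which helps in spotting
threads that are stuck waiting on the same lock.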

-Lari

On Tue, May 18, 2021 at 4:26 AM Devin Bost <devin.b...@gmail.com> wrote:

> While researching https://github.com/apache/pulsar/issues/6054, I
> discovered some key things that revealed a major performance issue that
> manifests in high-velocity workloads:
>
> 1. The problem can be reproduced when batching is disabled on all Pulsar
> functions
> 2. When a subscription is "frozen," it's actually still processing
> messages but extremely slowly
> 3. When a subscription is "frozen," the broker that owns the "frozen"
> topic is doing nothing but processing pendingAddEntries in the do loop of
> *ManagedLedgerImpl.updateLedgersIdsComplete(..)*, here:
> https://github.com/apache/pulsar/blob/30d48cbbec8b522e8ab595f70fb1edf8b31bb51b/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1507
> 4. Restarting the broker that owns the topic will unblock the "freeze" for
> a non-deterministic amount of time (sometimes minutes, sometimes hours).
> 5. Functions get blocked from producing due to waiting for acks from the
> broker on pending messages, which blocks the function semaphore and results
> in large backlog accumulation.
> 6. If input topics stop being written to, the backlogs will eventually
> clear (after some number of hours), even on topics that are "frozen."
> 7. The issue is easiest to reproduce in high-velocity/high-volume
> workloads, especially if partitioned topics and parallel function instances
> are used.
> 8. Permit behavior is not abnormal.
>
> Here are some thoughts:
> 1. I noticed that *ManagedLedgerImpl.updateLedgersIdsComplete(..)* is
> synchronized. I'm concerned that the mutex may be locking up too much of
> ManagedLedgerImpl, but I suspect this is not the root cause of the issue,
> and I'm not sure if it's possible to add parallelization in this area.
> (Someone with a deeper understanding of ManagedLedger could chime in here.)
> 2. Load isn't sufficiently balanced across the brokers. Checking debug
> logs for executions of *ManagedLedgerImpl.updateLedgersIdsComplete(..)*
> indicated:
>    broker1: 0 executions
>    broker2: 2228 executions
>    broker3: 8221 executions. (This is the broker that owns the "frozen"
> topic.)
> In the flow that reproduced the issue, there were 3 topics separated by
> Pulsar Functions: the first topic (the input topic) has 4 partitions, both
> Pulsar Functions are filtering messages, and the first function (consuming
> from the first topic) has 4 parallel instances. Topics are all persistent
> without retention or tiered storage, and the subscriptions are all shared.
> Here is a graphic that shows the flow:
> [image: image.png]
> (The green numbers are backlog counts.)
> In the image, the topic that is "frozen" is the "siege-filtered" topic on
> broker3.
>
> Looking at the sizes of debug logs that accumulated on the brokers (with
> the most frequently executed debug log line occurring inside the do loop of
> *ManagedLedgerImpl.updateLedgersIdsComplete(..)*), the load imbalance is
> even clearer:
>    broker1: 7.1 MB
>    broker2: 242 MB
>    broker3: 6.6 GB
>
> CPU isn't fully utilized:
>   broker1: 35.30%
>   broker2: 53.24%
>   broker3: 36.49%
>
> The bookies' CPU also isn't fully utilized:
>   bookie1 (on broker1): 14.27%
>   bookie2 (on broker2): 16.72%
>   bookie3 (on broker3): 9.40%
>
> The thing that's not clear to me yet is why unloading the "frozen" topic
> would clear the blockage. I'd think it would just move the performance
> issue to another broker. Any ideas on why unloading the "frozen" topic to
> another broker might improve performance (which is often enough of a boost
> to allow backlogs to catch up) would be helpful.
>
> I'm planning to continue the investigation into the root cause of the
> performance issue, but I wanted to bring this to the attention of the
> community since I know a number of you (such as Rajan, Penghui, Jerry,
> Sijie, Lari Hotari, and others) have assisted with various aspects of the
> investigation at various points in time.
>
> Devin G. Bost
>
