Lari,

Thanks for sharing your thoughts. I'll work on making this easier to reproduce. That's a good suggestion.
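If I end up driving the repro with pulsar-perf as you suggested, the rough shape I have in mind is below. The service URL, topic name, rate, and message size are placeholders I'd tune to match our workload, with batching disabled to match the conditions where we see the problem (assuming the --disable-batching flag is available in the pulsar-perf build in use):

    # Drive the 4-partition input topic with non-batched messages.
    # Service URL, topic, rate, and size below are placeholder values.
    bin/pulsar-perf produce \
      -u pulsar://broker1.example.com:6650 \
      --rate 50000 \
      --size 512 \
      --disable-batching \
      persistent://public/default/siege-in

with "dummy" filtering functions wired between the topics, as you described.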
> How did you detect that nothing else is executed? Based on log output?

I suppose it may be misleading for me to say that "nothing" else is executed, since I'm judging based on debug log output and not all modules are outputting to the log. However, I'm logging the modules mentioned in https://github.com/apache/pulsar/issues/6054#issuecomment-842626569, which are:

  - name: org.apache.pulsar.broker.service
    level: trace
    ref: "${sys:pulsar.log.appender}"
  - name: org.apache.pulsar.common
    level: trace
    ref: "${sys:pulsar.log.appender}"
  - name: org.apache.pulsar.client
    level: trace
    ref: "${sys:pulsar.log.appender}"
  - name: org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl
    level: trace
    ref: "${sys:pulsar.log.appender}"

Despite that, without adding custom log lines throughout the code, the debug lines go quiet on the broker that owns the topic that stops performing, as shown in https://github.com/apache/pulsar/issues/6054#issuecomment-834127635

Also, as a quick update, I did some performance testing in *ManagedLedgerImpl.updateLedgersIdsComplete(..)* and didn't find any obvious issues there, so that observation may have been misleading as well. I'm continuing the performance testing with other methods that are called before the broker sends acks to the client (since that's where the slowdown appears to be).

Devin G. Bost


On Tue, May 18, 2021 at 1:15 PM Lari Hotari <l...@hotari.net> wrote:

> Thanks Devin, very thoughtful observations.
>
> Do you think that it would be possible to create a repro of the case with some "dummy" code in the functions? Perhaps messages could be produced to the input topic partitions with pulsar-perf?
>
> There are a lot of interesting and useful details in your observations. I haven't had the chance to go through all of them, but I have a few comments and questions on some topics.
>
> > 3. When a subscription is "frozen," the broker that owns the "frozen" topic is doing nothing but processing pendingAddEntries in the do loop of *ManagedLedgerImpl.updateLedgersIdsComplete(..)*, here: https://github.com/apache/pulsar/blob/30d48cbbec8b522e8ab595f70fb1edf8b31bb51b/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1507
>
> How did you detect that nothing else is executed? Based on log output?
>
> > 5. Functions get blocked from producing due to waiting for acks from the broker on pending messages, which blocks the function semaphore and results in large backlog accumulation.
>
> For this case, it would be useful to take jstack thread dumps on both the broker and client side while the problem is ongoing. Multiple subsequent thread dumps taken a few seconds apart are useful for detecting blocked or deadlocked threads. Heap dumps would also be useful, but those contain confidential details about the messages and environment credentials, so they shouldn't be shared publicly or over insecure channels.
>
> -Lari
>
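(Side note on the thread dumps: capturing several jstack snapshots a few seconds apart, as suggested above, could look roughly like the following on the broker host; BROKER_PID and the output path are placeholders. The same can be repeated against the function worker / client JVMs.)

    # Take 5 thread dumps of the broker JVM, 5 seconds apart.
    # BROKER_PID and the output directory are placeholders.
    for i in 1 2 3 4 5; do
      jstack -l "$BROKER_PID" > "/tmp/broker-jstack-$(date +%s).txt"
      sleep 5
    done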
> On Tue, May 18, 2021 at 4:26 AM Devin Bost <devin.b...@gmail.com> wrote:
>
> > While researching https://github.com/apache/pulsar/issues/6054, I discovered some key things that revealed a major performance issue that manifests in high-velocity workloads:
> >
> > 1. The problem can be reproduced when batching is disabled on all Pulsar Functions.
> > 2. When a subscription is "frozen," it's actually still processing messages, but extremely slowly.
> > 3. When a subscription is "frozen," the broker that owns the "frozen" topic is doing nothing but processing pendingAddEntries in the do loop of *ManagedLedgerImpl.updateLedgersIdsComplete(..)*, here: https://github.com/apache/pulsar/blob/30d48cbbec8b522e8ab595f70fb1edf8b31bb51b/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1507
> > 4. Restarting the broker that owns the topic will unblock the "freeze" for a non-deterministic amount of time (sometimes minutes, sometimes hours).
> > 5. Functions get blocked from producing while waiting for acks from the broker on pending messages, which blocks the function semaphore and results in large backlog accumulation.
> > 6. If the input topics stop being written to, the backlogs will eventually clear (after some number of hours), even on topics that are "frozen."
> > 7. The issue is easiest to reproduce in high-velocity/high-volume workloads, especially if partitioned topics and parallel function instances are used.
> > 8. Permit behavior is not abnormal.
> >
> > Here are some thoughts:
> >
> > 1. I noticed that *ManagedLedgerImpl.updateLedgersIdsComplete(..)* is synchronized. I'm concerned that the mutex may be locking up too much of ManagedLedgerImpl, but I suspect this is not the root cause of the issue, and I'm not sure if it's possible to add parallelization in this area. (Someone with a deeper understanding of ManagedLedger could chime in here.)
> > 2. Load isn't sufficiently balanced across the brokers. Checking debug logs for executions of *ManagedLedgerImpl.updateLedgersIdsComplete(..)* indicated:
> >    broker1: 0 executions
> >    broker2: 2228 executions
> >    broker3: 8221 executions (this is the broker that owns the "frozen" topic)
> >
> > In the flow that reproduced the issue, there were 3 topics separated by Pulsar Functions: the first topic (the input topic) has 4 partitions, both Pulsar Functions are filtering messages, and the first function (consuming from the first topic) has 4 parallel instances. The topics are all persistent without retention or tiered storage, and the subscriptions are all shared.
> >
> > Here is a graphic that shows the flow:
> > [image: image.png]
> > (The green numbers are backlog counts.)
> > In the image, the topic that is "frozen" is the "siege-filtered" topic on broker3.
> >
> > Looking at the sizes of the debug logs that accumulated on the brokers (with the most frequently executed debug log line occurring inside the do loop of *ManagedLedgerImpl.updateLedgersIdsComplete(..)*), the load imbalance is even clearer:
> >    broker1: 7.1 MB
> >    broker2: 242 MB
> >    broker3: 6.6 GB
> >
> > CPU isn't fully utilized:
> >    broker1: 35.30%
> >    broker2: 53.24%
> >    broker3: 36.49%
> >
> > The bookies' CPU also isn't fully utilized:
> >    bookie1 (on broker1): 14.27%
> >    bookie2 (on broker2): 16.72%
> >    bookie3 (on broker3): 9.40%
> >
> > What's not clear to me yet is why unloading the "frozen" topic would clear the blockage; I'd think it would just move the performance issue to another broker. Any ideas on why unloading the "frozen" topic to another broker might improve performance (often enough of a boost to allow the backlogs to catch up) would be helpful.
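(A couple of pulsar-admin commands that are handy for observing and working around the behavior described above: stats-internal exposes the managed ledger's pendingAddEntriesCount without needing debug logging, and unload releases topic ownership so it can be reassigned, typically to another broker, giving the same relief as the broker restart mentioned above but scoped to one topic. The tenant/namespace below are placeholders; substitute your own.)

    # Inspect the managed-ledger state of the "frozen" topic; watch pendingAddEntriesCount.
    bin/pulsar-admin topics stats-internal persistent://public/default/siege-filtered

    # Unload the topic so its ownership can be reassigned to another broker.
    bin/pulsar-admin topics unload persistent://public/default/siege-filtered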
> >
> > I'm planning to continue the investigation into the root cause of the performance issue, but I wanted to bring this to the attention of the community, since I know a number of you (such as Rajan, Penghui, Jerry, Sijie, Lari Hotari, and others) have assisted with various aspects of the investigation at various points in time.
> >
> > Devin G. Bost
> >
>