Thanks for driving this discussion, Yubiao. I appreciate you bringing
this important issue to the community's attention.

Regarding the solution in PR 24423, I think we need to look further.

Pulsar already has multiple backpressure and rate limiting mechanisms
that toggle Netty's autoread, and these could conflict with the
proposed solution. In addition, since a single connection might be
serving multiple producers and consumers, toggling autoread for the
whole connection would impact all of them. Pulsar already handles the
memory held in buffers for the producer and consumer use cases, so
adding the solution proposed in PR 24423 could have a negative impact
in multiple ways.

I would assume that a better solution lies in specifically targeting
the GET_TOPICS_OF_NAMESPACE request and its
GET_TOPICS_OF_NAMESPACE_RESPONSE response, as well as the
WATCH_TOPIC_LIST request with its
WATCH_TOPIC_LIST_SUCCESS/WATCH_TOPIC_UPDATE responses.

An additional benefit of targeting the topic listing responses
directly is that it becomes possible to add a global limit for the
broker. The solution in PR 24423 cannot ensure that broker memory
usage stays bounded, since it targets individual connections and
there is no global limit for the broker. For this reason, the PR
24423 approach wouldn't solve the problem it intends to solve. The
topic listing limit should be global for the broker, like the
existing maxMessagePublishBufferSizeInMB,
managedLedgerMaxReadsInFlightSizeInMB, and managedLedgerCacheSizeMB
settings.
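
For illustration only, such a global setting could follow the same
pattern as the existing *InMB settings in ServiceConfiguration. The
field name, default value, and description below are hypothetical,
not an existing setting:

import org.apache.pulsar.common.configuration.FieldContext;

public class TopicListLimitSettingSketch {
    // Hypothetical setting, shown only to illustrate the shape of a
    // global, broker-wide limit next to settings like
    // maxMessagePublishBufferSizeInMB.
    @FieldContext(
        category = "Server",
        doc = "Hypothetical: maximum total size in MB of topic listing"
            + " responses kept in broker memory at any time, across all"
            + " connections."
    )
    private int topicListInFlightResponsesMaxSizeInMB = 100;
}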

One possible approach would be to calculate the size of each topic
listing response and limit the total size of in-flight responses,
somewhat similar to how managedLedgerMaxReadsInFlightSizeInMB has
been implemented asynchronously in InflightReadsLimiter. In this
solution, it would be necessary to deduplicate the java.lang.String
and TopicName instances, as demonstrated in PR 24457, so that keeping
the listings in memory doesn't add significant extra memory usage.
Side note: this deduplication is needed in any case when there is a
large number of WATCH_TOPIC_LIST requests, since each watcher keeps a
list of topics in the broker's memory.
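
To make this more concrete, here is a minimal sketch of what such a
broker-wide in-flight limiter could look like, assuming a rough size
estimate is available for each listing. This is not the actual
InflightReadsLimiter API; the class and method names are illustrative
only:

import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Sketch of a broker-wide limiter that bounds the total estimated
 * size of in-flight topic listing responses.
 */
public class TopicListResponseLimiter {
    private record Pending(long bytes, Runnable sendAction) {}

    private final long maxInFlightBytes; // global budget for the broker
    private long inFlightBytes;          // held by responses not yet flushed
    private final Queue<Pending> pending = new ArrayDeque<>();

    public TopicListResponseLimiter(long maxInFlightBytes) {
        this.maxInFlightBytes = maxInFlightBytes;
    }

    /**
     * Runs sendAction immediately if the estimated size fits into the
     * global budget, otherwise postpones it until earlier responses
     * have been flushed and released. A single oversized response is
     * still allowed when nothing else is in flight, to avoid stalling.
     */
    public synchronized void acquire(long estimatedBytes, Runnable sendAction) {
        if (inFlightBytes == 0 || inFlightBytes + estimatedBytes <= maxInFlightBytes) {
            inFlightBytes += estimatedBytes;
            sendAction.run();
        } else {
            pending.add(new Pending(estimatedBytes, sendAction));
        }
    }

    /** Called once the response has been written and flushed to the client. */
    public synchronized void release(long estimatedBytes) {
        inFlightBytes -= estimatedBytes;
        while (!pending.isEmpty()
                && (inFlightBytes == 0
                        || inFlightBytes + pending.peek().bytes() <= maxInFlightBytes)) {
            Pending next = pending.poll();
            inFlightBytes += next.bytes();
            next.sendAction().run();
        }
    }
}

In a real implementation the postponed sendAction would be dispatched
back to the connection's event loop instead of being run while the
limiter's lock is held, but that detail is omitted here.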

There could be some additional challenges in the implementation, but
the idea would be to postpone the specific topic listing responses
instead of relying on toggling autoread for the connection. Toggling
autoread could remain a last resort for corner cases. In the Pulsar
broker, autoread should be handled through
org.apache.pulsar.broker.service.ServerCnxThrottleTracker's
incrementThrottleCount/decrementThrottleCount methods instead of
toggling autoread directly. (The javadoc for ServerCnxThrottleTracker
contains more details.)
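
As a rough illustration of that ordering of mechanisms, the sketch
below (building on the limiter sketch above) defers the topic listing
response and falls back to the throttle tracker only when a
connection accumulates too many postponed responses. Apart from the
incrementThrottleCount/decrementThrottleCount method names, all
names, types, and thresholds here are hypothetical stand-ins, not
actual ServerCnx code:

import java.util.List;
import java.util.concurrent.CompletableFuture;

public class DeferredTopicListResponseSketch {

    /** Hypothetical view of the connection, for illustration only. */
    interface Connection {
        CompletableFuture<Void> sendTopicListResponse(long requestId, List<String> topics);
        int pendingTopicListResponses();
        void incrementThrottleCount(); // mirrors ServerCnxThrottleTracker
        void decrementThrottleCount();
    }

    private static final int MAX_PENDING_PER_CONNECTION = 4; // illustrative threshold

    private final TopicListResponseLimiter limiter; // from the sketch above

    public DeferredTopicListResponseSketch(TopicListResponseLimiter limiter) {
        this.limiter = limiter;
    }

    void handleGetTopicsOfNamespace(Connection cnx, long requestId, List<String> topics) {
        // Rough size estimate of the response payload (topic names dominate).
        long estimatedBytes = topics.stream().mapToLong(String::length).sum();

        // Postpone sending until the global budget allows it; release the
        // budget once the response has been flushed to the client.
        limiter.acquire(estimatedBytes, () ->
                cnx.sendTopicListResponse(requestId, topics)
                   .whenComplete((ignore, error) -> limiter.release(estimatedBytes)));

        // Last resort: if this connection has accumulated too many postponed
        // listings, pause reading further requests via the throttle tracker
        // (paired with a decrementThrottleCount() call when the backlog
        // drains, not shown here).
        if (cnx.pendingTopicListResponses() > MAX_PENDING_PER_CONNECTION) {
            cnx.incrementThrottleCount();
        }
    }
}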

I hope this helps us find a better solution to the problem.
Looking forward to continuing this discussion and working together to
improve Pulsar's stability.

-Lari


On Tue, 8 Jul 2025 at 09:07, Yubiao Feng
<yubiao.f...@streamnative.io.invalid> wrote:
>
> Hi all
>
> I want to start a discussion, which relates to PR #24423: Handling
> Overloaded Netty Channels in Apache Pulsar
>
> Problem Statement
> We've encountered a critical issue in our Apache Pulsar clusters where
> brokers experience Out-Of-Memory (OOM) errors and continuous restarts under
> specific load patterns. This occurs when Netty channel write buffers become
> full, leading to a buildup of unacknowledged responses in the broker's
> memory.
>
> Background
> Our clusters are configured with numerous namespaces, each containing
> approximately 8,000 to 10,000 topics. Our consumer applications are quite
> large, with each consumer using a regular expression (regex) pattern to
> subscribe to all topics within a namespace.
>
> The problem manifests particularly during consumer application restarts.
> When a consumer restarts, it issues a getTopicsOfNamespace request. Due to
> the sheer number of topics, the response size is extremely large. This
> massive response overwhelms the socket output buffer, causing it to fill up
> rapidly. Consequently, the broker's responses get backlogged in memory,
> eventually leading to the broker's OOM and subsequent restart loop.
>
> Why "Returning an Error" Is Not a Solution
> A common approach to handling overload is to simply return an error when
> the broker cannot process a request. However, in this specific scenario,
> this solution is ineffective. If a consumer application fails to start due
> to an error, it triggers a user pod restart, which then leads to the same
> getTopicsOfNamespace request being reissued, resulting in a continuous loop
> of errors and restarts. This creates an unrecoverable state for the
> consumer application and puts immense pressure on the brokers.
>
> Proposed Solution and Justification
> We believe the solution proposed in
> https://github.com/apache/pulsar/pull/24423 is highly suitable for
> addressing this issue. The core mechanism introduced in this PR – pausing
> acceptance of new requests when a channel cannot handle more output – is
> exceptionally reasonable and addresses the root cause of the memory
> pressure.
>
> This approach prevents the broker from accepting new requests when its
> write buffers are full, effectively backpressuring the client and
> preventing the memory buildup that leads to OOMs. Furthermore, we
> anticipate that this mechanism will not significantly increase future
> maintenance costs, as it elegantly handles overload scenarios at a
> fundamental network layer.
>
> I invite the community to discuss this solution and its potential benefits
> for the overall stability and resilience of Apache Pulsar.
>
> Thanks
> Yubiao Feng
>
