> > Adding the proposed solution in PR
> > 24423 could have a negative impact
> > in multiple ways.
>
> Could you explain the negative impacts?

My previous email already covered several of them. I'd also like to
emphasize that a connection-level solution wouldn't limit the overall
broker memory usage, so the solution in the PR wouldn't address the
issue it intends to solve.

Besides the negative impacts mentioned in the previous email, another
potential one relates to performance, since Pulsar shares broker
connections across producers and consumers. Netty's writability changes
constantly when dispatching to consumers, especially when the connection
is shared by many consumers from the same client. Channel writability is
controlled by the
https://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html
settings, and we don't currently expose configuration for them. By
default, the high watermark is 64 kB: once more than 64 kB of output is
buffered, writability flips to false. The reason this could hurt
performance is that there would be less "pipelining" in the Pulsar
broker after this change.
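
As a concrete reference, here is a minimal sketch of what exposing those
watermarks could look like on the Netty side. ChannelOption.WRITE_BUFFER_WATER_MARK
and WriteBufferWaterMark are the actual Netty 4.1 APIs; the WatermarkSketch
class, the applyWatermarks method and its parameters are hypothetical and only
illustrate the wiring, not Pulsar's BrokerService code:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

// Hypothetical sketch: configure per-connection (child channel) write buffer
// watermarks. Channel.isWritable() flips to false once more than highBytes of
// output is buffered, and back to true once it drains below lowBytes.
// Netty's defaults are 32 kB low / 64 kB high.
class WatermarkSketch {
    static ServerBootstrap applyWatermarks(ServerBootstrap bootstrap,
                                           int lowBytes, int highBytes) {
        return bootstrap.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
                new WriteBufferWaterMark(lowBytes, highBytes));
    }
}

With the 32 kB / 64 kB defaults, writability toggles very frequently under
consumer dispatch, which is why the defaults would be far too low for this
purpose.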

Less pipelining means that a single consumer dispatch could toggle
autoread to false, and only after the dispatched records have been
written to the socket successfully would the Netty channel resume
processing new input commands. As a result, far fewer operations could
be in progress simultaneously, which would most likely hurt
performance. Pipelining is important for performance in many
distributed systems.

It could be useful to throttle based on WriteBufferWaterMark settings,
but the settings should be much higher than the defaults. Throttling
should be implemented using ServerCnxThrottleTracker's
incrementThrottleCount/decrementThrottleCount. However, this solution
wouldn't limit the overall broker memory usage and therefore wouldn't
be the primary solution for addressing the presented issue.

By the way, the solution in PR 24423 also backpressures in an incorrect
way, by tracking the outboundBuffer's totalPendingWriteBytes directly.
Instead of adding a new "connectionMaxPendingWriteBytes" setting, the
correct approach would be to expose the WriteBufferWaterMark high and
low settings for the child channel
(ChannelOption.WRITE_BUFFER_WATER_MARK), rely on the
channelWritabilityChanged callback, and call ServerCnxThrottleTracker's
incrementThrottleCount/decrementThrottleCount there.
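
To make that concrete, here is a minimal sketch of the kind of handler I
mean. channelWritabilityChanged is the standard Netty inbound callback, and
incrementThrottleCount/decrementThrottleCount are the existing
ServerCnxThrottleTracker methods; the WritabilityThrottleHandler class and
the throttled guard are hypothetical and only illustrate the wiring, not the
actual ServerCnx code:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
// Existing Pulsar class referenced above.
import org.apache.pulsar.broker.service.ServerCnxThrottleTracker;

// Hypothetical sketch: map channel writability changes onto the existing
// throttle mechanism instead of toggling autoread directly.
class WritabilityThrottleHandler extends ChannelInboundHandlerAdapter {
    private final ServerCnxThrottleTracker throttleTracker;
    private boolean throttled;

    WritabilityThrottleHandler(ServerCnxThrottleTracker throttleTracker) {
        this.throttleTracker = throttleTracker;
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        if (!ctx.channel().isWritable() && !throttled) {
            // Output crossed the high watermark: pause reading new commands.
            throttled = true;
            throttleTracker.incrementThrottleCount();
        } else if (ctx.channel().isWritable() && throttled) {
            // Output drained below the low watermark: resume reading.
            throttled = false;
            throttleTracker.decrementThrottleCount();
        }
        super.channelWritabilityChanged(ctx);
    }
}

The boolean guard keeps the increment/decrement calls balanced across
repeated writability flips, which matters since the tracker is also used by
the broker's other throttling mechanisms.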

I'd be fine with the PR 24423 solution if it were based on the
ChannelOption.WRITE_BUFFER_WATER_MARK settings and handled autoread
toggling with ServerCnxThrottleTracker's
incrementThrottleCount/decrementThrottleCount in the
channelWritabilityChanged callback method. There should be a separate
setting to enable it, since configuring the WriteBufferWaterMark high
and low values is separate from throttling based on channel
writability. Even then, this solution wouldn't be the correct way to
fix the actual problem.

-Lari

On Wed, 9 Jul 2025 at 12:59, Yubiao Feng
<yubiao.f...@streamnative.io.invalid> wrote:
>
> Hi Lari
>
> > Adding the proposed solution in PR
> > 24423 could have a negative impact
> > in multiple ways.
>
> Could you explain the negative impacts?
>
> On Wed, Jul 9, 2025 at 4:47 PM Lari Hotari <lhot...@apache.org> wrote:
>
> > Thanks for driving this discussion, Yubiao. I appreciate you bringing
> > this important issue to the community's attention.
> >
> > Regarding the solution in 24423, I think we need to look further.
> >
> > Pulsar already contains multiple ways for backpressure and rate
> > limiting where Netty's autoread is already being toggled. This could
> > conflict with the solution that is being proposed. Additionally, due
> > to the fact that a single connection might be serving multiple
> > producers and consumers, toggling autoread would also impact producers
> > and consumers. Pulsar already handles the memory held in buffers for
> > producer and consumer use cases. Adding the proposed solution in PR
> > 24423 could have a negative impact in multiple ways.
> >
> > I would assume that a more optimal solution lies in targeting
> > specifically the GET_TOPICS_OF_NAMESPACE request and
> > GET_TOPICS_OF_NAMESPACE_RESPONSE response, as well as the
> > WATCH_TOPIC_LIST request with
> > WATCH_TOPIC_LIST_SUCCESS/WATCH_TOPIC_UPDATE responses.
> >
> > An additional benefit of targeting topic listing responses directly is
> > that it would be possible to add a global limit for the broker. The
> > proposed PR 24423 cannot ensure that the broker memory usage would be
> > limited since it's targeting individual connections and there isn't a
> > global limit for the broker. Due to this reason, the PR 24423 solution
> > wouldn't solve the problem it intends to solve. The topic listing
> > limit should be global for the broker like we already have
> > maxMessagePublishBufferSizeInMB,
> > managedLedgerMaxReadsInFlightSizeInMB, and managedLedgerCacheSizeMB
> > settings.
> >
> > One possible way would be to ensure that the size of responses could
> > be calculated and there could be a solution that limits in-flight
> > responses, somewhat similar to how
> > managedLedgerMaxReadsInFlightSizeInMB has been implemented
> > asynchronously in InflightReadsLimiter. In this solution, it would be
> > necessary to deduplicate the java.lang.String and TopicName instances
> > as demonstrated in PR 24457 so that keeping the listings in memory
> > doesn't add significant extra memory usage. Side note: This
> > deduplication is necessary in any case for large amounts of
> > WATCH_TOPIC_LIST requests since each instance keeps a list of topics
> > in the memory of the broker.
> >
> > There could be some additional challenges that come up in the
> > implementation, but the idea would be to postpone and delay the
> > specific topic listing responses instead of relying on toggling
> > autoread for the connection. Toggling autoread could be the last
> > resort in corner cases. In the Pulsar broker, autoread should be
> > handled with org.apache.pulsar.broker.service.ServerCnxThrottleTracker's
> > incrementThrottleCount/decrementThrottleCount instead of toggling
> > autoread directly. (The javadoc for ServerCnxThrottleTracker contains
> > more details.)
> >
> > I hope this helps us find a more optimal solution to the problem.
> > Looking forward to continuing this discussion and working together to
> > improve Pulsar's stability.
> >
> > -Lari
> >
> >
> > On Tue, 8 Jul 2025 at 09:07, Yubiao Feng
> > <yubiao.f...@streamnative.io.invalid> wrote:
> > >
> > > Hi all
> > >
> > > I want to start a discussion, which relates to PR #24423: Handling
> > > Overloaded Netty Channels in Apache Pulsar
> > >
> > > Problem Statement
> > > We've encountered a critical issue in our Apache Pulsar clusters where
> > > brokers experience Out-Of-Memory (OOM) errors and continuous restarts
> > under
> > > specific load patterns. This occurs when Netty channel write buffers
> > become
> > > full, leading to a buildup of unacknowledged responses in the broker's
> > > memory.
> > >
> > > Background
> > > Our clusters are configured with numerous namespaces, each containing
> > > approximately 8,000 to 10,000 topics. Our consumer applications are quite
> > > large, with each consumer using a regular expression (regex) pattern to
> > > subscribe to all topics within a namespace.
> > >
> > > The problem manifests particularly during consumer application restarts.
> > > When a consumer restarts, it issues a getTopicsOfNamespace request. Due
> > to
> > > the sheer number of topics, the response size is extremely large. This
> > > massive response overwhelms the socket output buffer, causing it to fill
> > up
> > > rapidly. Consequently, the broker's responses get backlogged in memory,
> > > eventually leading to the broker's OOM and subsequent restart loop.
> > >
> > > Why "Returning an Error" Is Not a Solution
> > > A common approach to handling overload is to simply return an error when
> > > the broker cannot process a request. However, in this specific scenario,
> > > this solution is ineffective. If a consumer application fails to start
> > due
> > > to an error, it triggers a user pod restart, which then leads to the same
> > > getTopicsOfNamespace request being reissued, resulting in a continuous
> > loop
> > > of errors and restarts. This creates an unrecoverable state for the
> > > consumer application and puts immense pressure on the brokers.
> > >
> > > Proposed Solution and Justification
> > > We believe the solution proposed in
> > > https://github.com/apache/pulsar/pull/24423 is highly suitable for
> > > addressing this issue. The core mechanism introduced in this PR – pausing
> > > acceptance of new requests when a channel cannot handle more output – is
> > > exceptionally reasonable and addresses the root cause of the memory
> > > pressure.
> > >
> > > This approach prevents the broker from accepting new requests when its
> > > write buffers are full, effectively backpressuring the client and
> > > preventing the memory buildup that leads to OOMs. Furthermore, we
> > > anticipate that this mechanism will not significantly increase future
> > > maintenance costs, as it elegantly handles overload scenarios at a
> > > fundamental network layer.
> > >
> > > I invite the community to discuss this solution and its potential
> > benefits
> > > for the overall stability and resilience of Apache Pulsar.
> > >
> > > Thanks
> > > Yubiao Feng
> > >
> >
>
