> > Adding the proposed solution in PR
> > 24423 could have a negative impact
> > in multiple ways.
>
> Could you explain the negative impacts?
My previous email already mentioned multiple details. I'd also like to
emphasize that a connection-level solution wouldn't limit the overall
broker memory usage, so the solution in the PR wouldn't address the
issue that it intends to solve.

Besides the negative impacts mentioned in the previous email, one
potential negative impact relates to performance, since a single Pulsar
client connection is shared by multiple producers and consumers.
Netty's channel writability changes constantly when dispatching to
consumers, especially when the connection is shared across many
consumers from the same client. The channel writability is controlled
with the
https://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html
settings, which we currently don't expose for configuration. By
default, the high watermark is 64 kB, so as soon as more than 64 kB of
output is buffered, the writability changes to false.

The reason why this could impact performance is that there would be
less "pipelining" in the Pulsar broker after this change is made. A
single consumer dispatch could toggle autoread to false, and only after
the dispatched records have been written to the socket successfully
would the Netty channel resume processing new input commands. There
couldn't be many pending operations in progress simultaneously, which
would most likely impact performance negatively. Pipelining is
important for performance in many distributed systems.

It could be useful to throttle based on the WriteBufferWaterMark
settings, but the values should be much higher than the defaults, and
the throttling should be implemented with ServerCnxThrottleTracker's
incrementThrottleCount/decrementThrottleCount. However, this solution
wouldn't limit the overall broker memory usage and therefore wouldn't
be the primary solution for addressing the presented issue.

By the way, the solution in PR 24423 uses an incorrect way of
backpressuring, based on the outboundBuffer's totalPendingWriteBytes.
Instead of adding a new "connectionMaxPendingWriteBytes" setting, the
correct way would be to expose the WriteBufferWaterMark high and low
settings for the child channel (ChannelOption.WRITE_BUFFER_WATER_MARK),
rely on the channelWritabilityChanged callback, and use
ServerCnxThrottleTracker's
incrementThrottleCount/decrementThrottleCount there. I'd be fine with
the PR 24423 solution if it were based on the
ChannelOption.WRITE_BUFFER_WATER_MARK settings and handled the autoread
toggling with ServerCnxThrottleTracker in the channelWritabilityChanged
callback method. There should be a separate setting to enable it, since
configuring the WriteBufferWaterMark high and low values is separate
from the throttling based on channel writability. It's just that this
solution wouldn't be the correct way to fix the actual problem. (Rough
sketches of both ideas, the writability-based throttling and a
broker-wide limit for topic listing responses, are appended at the end
of this mail, below the quoted messages.)

-Lari

On Wed, 9 Jul 2025 at 12:59, Yubiao Feng
<yubiao.f...@streamnative.io.invalid> wrote:
>
> Hi Lari
>
> > Adding the proposed solution in PR
> > 24423 could have a negative impact
> > in multiple ways.
>
> Could you explain the negative impacts?
>
> On Wed, Jul 9, 2025 at 4:47 PM Lari Hotari <lhot...@apache.org> wrote:
>
> > Thanks for driving this discussion, Yubiao. I appreciate you bringing
> > this important issue to the community's attention.
> >
> > Regarding the solution in 24423, I think we need to look further.
> >
> > Pulsar already contains multiple ways for backpressure and rate
> > limiting where Netty's autoread is already being toggled. This could
> > conflict with the solution that is being proposed. Additionally, due
> > to the fact that a single connection might be serving multiple
> > producers and consumers, toggling autoread would also impact producers
> > and consumers. Pulsar already handles the memory held in buffers for
> > producer and consumer use cases. Adding the proposed solution in PR
> > 24423 could have a negative impact in multiple ways.
> >
> > I would assume that a more optimal solution lies in targeting
> > specifically the GET_TOPICS_OF_NAMESPACE request and
> > GET_TOPICS_OF_NAMESPACE_RESPONSE response, as well as the
> > WATCH_TOPIC_LIST request with
> > WATCH_TOPIC_LIST_SUCCESS/WATCH_TOPIC_UPDATE responses.
> >
> > An additional benefit of targeting topic listing responses directly is
> > that it would be possible to add a global limit for the broker. The
> > proposed PR 24423 cannot ensure that the broker memory usage would be
> > limited since it's targeting individual connections and there isn't a
> > global limit for the broker. Due to this reason, the PR 24423 solution
> > wouldn't solve the problem it intends to solve. The topic listing
> > limit should be global for the broker like we already have
> > maxMessagePublishBufferSizeInMB,
> > managedLedgerMaxReadsInFlightSizeInMB, and managedLedgerCacheSizeMB
> > settings.
> >
> > One possible way would be to ensure that the size of responses could
> > be calculated and there could be a solution that limits in-flight
> > responses, somewhat similar to how
> > managedLedgerMaxReadsInFlightSizeInMB has been implemented
> > asynchronously in InflightReadsLimiter. In this solution, it would be
> > necessary to deduplicate the java.lang.String and TopicName instances
> > as demonstrated in PR 24457 so that keeping the listings in memory
> > doesn't add significant extra memory usage. Side note: this
> > deduplication is necessary in any case for large amounts of
> > WATCH_TOPIC_LIST requests since each instance keeps a list of topics
> > in the memory of the broker.
> >
> > There could be some additional challenges that come up in the
> > implementation, but the idea would be to postpone and delay the
> > specific topic listing responses instead of relying on toggling
> > autoread for the connection. Toggling autoread could be the last
> > resort in corner cases. In the Pulsar broker, autoread should be
> > handled with org.apache.pulsar.broker.service.ServerCnxThrottleTracker's
> > incrementThrottleCount/decrementThrottleCount instead of toggling
> > autoread directly. (The javadoc for ServerCnxThrottleTracker contains
> > more details.)
> >
> > I hope this helps us find a more optimal solution to the problem.
> > Looking forward to continuing this discussion and working together to
> > improve Pulsar's stability.
> >
> > -Lari
> >
> >
> > On Tue, 8 Jul 2025 at 09:07, Yubiao Feng
> > <yubiao.f...@streamnative.io.invalid> wrote:
> > >
> > > Hi all
> > >
> > > I want to start a discussion, which relates to PR #24423: Handling
> > > Overloaded Netty Channels in Apache Pulsar
> > >
> > > Problem Statement
> > > We've encountered a critical issue in our Apache Pulsar clusters where
> > > brokers experience Out-Of-Memory (OOM) errors and continuous restarts
> > > under specific load patterns.
> > > This occurs when Netty channel write buffers become full, leading to
> > > a buildup of unacknowledged responses in the broker's memory.
> > >
> > > Background
> > > Our clusters are configured with numerous namespaces, each containing
> > > approximately 8,000 to 10,000 topics. Our consumer applications are
> > > quite large, with each consumer using a regular expression (regex)
> > > pattern to subscribe to all topics within a namespace.
> > >
> > > The problem manifests particularly during consumer application
> > > restarts. When a consumer restarts, it issues a getTopicsOfNamespace
> > > request. Due to the sheer number of topics, the response size is
> > > extremely large. This massive response overwhelms the socket output
> > > buffer, causing it to fill up rapidly. Consequently, the broker's
> > > responses get backlogged in memory, eventually leading to the broker's
> > > OOM and subsequent restart loop.
> > >
> > > Why "Returning an Error" Is Not a Solution
> > > A common approach to handling overload is to simply return an error
> > > when the broker cannot process a request. However, in this specific
> > > scenario, this solution is ineffective. If a consumer application fails
> > > to start due to an error, it triggers a user pod restart, which then
> > > leads to the same getTopicsOfNamespace request being reissued,
> > > resulting in a continuous loop of errors and restarts. This creates an
> > > unrecoverable state for the consumer application and puts immense
> > > pressure on the brokers.
> > >
> > > Proposed Solution and Justification
> > > We believe the solution proposed in
> > > https://github.com/apache/pulsar/pull/24423 is highly suitable for
> > > addressing this issue. The core mechanism introduced in this PR –
> > > pausing acceptance of new requests when a channel cannot handle more
> > > output – is exceptionally reasonable and addresses the root cause of
> > > the memory pressure.
> > >
> > > This approach prevents the broker from accepting new requests when its
> > > write buffers are full, effectively backpressuring the client and
> > > preventing the memory buildup that leads to OOMs. Furthermore, we
> > > anticipate that this mechanism will not significantly increase future
> > > maintenance costs, as it elegantly handles overload scenarios at a
> > > fundamental network layer.
> > >
> > > I invite the community to discuss this solution and its potential
> > > benefits for the overall stability and resilience of Apache Pulsar.
> > >
> > > Thanks
> > > Yubiao Feng
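
Appendix: rough sketches for the two ideas mentioned above. These are
untested illustrations only; apart from ServerCnxThrottleTracker,
ChannelOption.WRITE_BUFFER_WATER_MARK and WriteBufferWaterMark, the
class names, setting names and method shapes below are made up.

Sketch 1: throttling on channel writability. The watermarks would be
configured on the broker's child channels (with values much higher than
Netty's 32 kB / 64 kB defaults), and channelWritabilityChanged would go
through ServerCnxThrottleTracker instead of toggling autoread directly.
This assumes the tracker is reachable from the handler; a real
implementation would also have to release the throttle if the channel
closes while still unwritable.

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;
import org.apache.pulsar.broker.service.ServerCnxThrottleTracker;

public class WritabilityThrottlingSketch extends ChannelInboundHandlerAdapter {

    // Configure the child channel watermarks from (new, hypothetical) broker.conf
    // settings; the values should be much higher than Netty's 32 kB / 64 kB defaults.
    static void configureWatermarks(ServerBootstrap bootstrap, int lowBytes, int highBytes) {
        bootstrap.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
                new WriteBufferWaterMark(lowBytes, highBytes));
    }

    private final ServerCnxThrottleTracker throttleTracker; // the existing broker class
    private boolean throttledOnWritability;

    public WritabilityThrottlingSketch(ServerCnxThrottleTracker throttleTracker) {
        this.throttleTracker = throttleTracker;
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        boolean writable = ctx.channel().isWritable();
        if (!writable && !throttledOnWritability) {
            // pending output exceeded the high watermark: pause reads through the
            // tracker instead of flipping autoread directly
            throttledOnWritability = true;
            throttleTracker.incrementThrottleCount();
        } else if (writable && throttledOnWritability) {
            // pending output drained below the low watermark: allow reads again
            throttledOnWritability = false;
            throttleTracker.decrementThrottleCount();
        }
        ctx.fireChannelWritabilityChanged();
    }
}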
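
Sketch 2: the broker-wide limit for topic listing responses discussed
in my earlier mail quoted above, in the spirit of how
managedLedgerMaxReadsInFlightSizeInMB is enforced by
InflightReadsLimiter. Again an illustration only: the class is made up;
a real implementation would estimate the serialized response size,
complete postponed responses asynchronously outside the lock, and rely
on the String/TopicName deduplication from PR 24457 so that the
postponed listings stay cheap to hold in memory.

import java.util.ArrayDeque;
import java.util.Queue;

public class TopicListResponseLimiterSketch {

    private static final class Pending {
        final long bytes;
        final Runnable send;
        Pending(long bytes, Runnable send) {
            this.bytes = bytes;
            this.send = send;
        }
    }

    private final long maxInflightBytes; // broker-wide budget, like the existing *InMB settings
    private long inflightBytes;
    private final Queue<Pending> queue = new ArrayDeque<>();

    public TopicListResponseLimiterSketch(long maxInflightBytes) {
        this.maxInflightBytes = maxInflightBytes;
    }

    // Called before writing a GET_TOPICS_OF_NAMESPACE_RESPONSE (or watcher update)
    // with an estimated serialized size. If the broker-wide budget is exhausted, the
    // response is postponed instead of piling up in a connection's outbound buffer.
    public synchronized void acquire(long estimatedBytes, Runnable send) {
        if (queue.isEmpty() && inflightBytes + estimatedBytes <= maxInflightBytes) {
            inflightBytes += estimatedBytes;
            send.run();
        } else {
            queue.add(new Pending(estimatedBytes, send));
        }
    }

    // Called once the response has been written and flushed to the socket;
    // drains postponed responses in FIFO order as long as they fit the budget.
    public synchronized void release(long estimatedBytes) {
        inflightBytes -= estimatedBytes;
        Pending next;
        while ((next = queue.peek()) != null
                && inflightBytes + next.bytes <= maxInflightBytes) {
            queue.poll();
            inflightBytes += next.bytes;
            next.send.run();
        }
    }
}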