Thanks for driving this discussion, Yubiao. I appreciate you bringing this important issue to the community's attention.

Regarding the solution in PR 24423, I think we need to look further. Pulsar already contains multiple backpressure and rate limiting mechanisms where Netty's autoread is toggled, and these could conflict with the proposed solution. Additionally, since a single connection might be serving multiple producers and consumers, toggling autoread would also impact those producers and consumers. Pulsar already handles the memory held in buffers for the producer and consumer use cases, so adding the solution proposed in PR 24423 could have a negative impact in multiple ways.

I would assume that a more optimal solution lies in specifically targeting the GET_TOPICS_OF_NAMESPACE request and GET_TOPICS_OF_NAMESPACE_RESPONSE response, as well as the WATCH_TOPIC_LIST request with its WATCH_TOPIC_LIST_SUCCESS/WATCH_TOPIC_UPDATE responses. An additional benefit of targeting the topic listing responses directly is that it becomes possible to add a global limit for the broker. PR 24423 cannot ensure that broker memory usage stays bounded, since it targets individual connections and there is no global limit for the broker. For this reason, the PR 24423 solution wouldn't solve the problem it intends to solve. The topic listing limit should be global for the broker, like the existing maxMessagePublishBufferSizeInMB, managedLedgerMaxReadsInFlightSizeInMB, and managedLedgerCacheSizeMB settings.

One possible approach would be to make the size of each response calculable and then limit the total size of in-flight topic listing responses, somewhat similar to how managedLedgerMaxReadsInFlightSizeInMB has been implemented asynchronously in InflightReadsLimiter. In this solution, it would be necessary to deduplicate the java.lang.String and TopicName instances as demonstrated in PR 24457, so that keeping the listings in memory doesn't add significant extra memory usage. Side note: this deduplication is needed in any case when there are large numbers of WATCH_TOPIC_LIST requests, since each watcher keeps a list of topics in the broker's memory.

There could be additional challenges that come up during implementation, but the idea would be to postpone and delay the specific topic listing responses instead of relying on toggling autoread for the connection. Toggling autoread could be the last resort for corner cases. In the Pulsar broker, autoread should be handled with org.apache.pulsar.broker.service.ServerCnxThrottleTracker's incrementThrottleCount/decrementThrottleCount methods instead of toggling autoread directly. (The javadoc for ServerCnxThrottleTracker contains more details.)
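To make the idea of a global, asynchronous limit more concrete, here's a rough sketch of what such a limiter could look like. This is only an illustration of the approach, not an existing Pulsar API: the class and method names (TopicListResponseLimiter, acquire, release) and the setting backing maxInflightBytes are hypothetical. The point is that a broker-wide byte budget for topic listing responses can be enforced asynchronously, completing queued responses only after earlier ones have been written and released, similar in spirit to InflightReadsLimiter:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a broker-wide limiter for topic listing responses.
// None of these names exist in Pulsar today.
public class TopicListResponseLimiter {

    private record PendingRequest(long size, CompletableFuture<Void> future) {}

    private final long maxInflightBytes; // would come from a new broker-level setting
    private long inflightBytes;
    private final Queue<PendingRequest> pendingRequests = new ArrayDeque<>();

    public TopicListResponseLimiter(long maxInflightBytes) {
        this.maxInflightBytes = maxInflightBytes;
    }

    // Reserve budget for a response of the given estimated size. Completes
    // immediately when there is room; otherwise the future completes later,
    // in FIFO order, once previously written responses have been released.
    public synchronized CompletableFuture<Void> acquire(long estimatedSize) {
        if (pendingRequests.isEmpty() && inflightBytes + estimatedSize <= maxInflightBytes) {
            inflightBytes += estimatedSize;
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> future = new CompletableFuture<>();
        pendingRequests.add(new PendingRequest(estimatedSize, future));
        return future;
    }

    // Release budget once the response has been written and flushed, and wake
    // up queued responses that now fit within the limit.
    public void release(long size) {
        Queue<PendingRequest> ready = new ArrayDeque<>();
        synchronized (this) {
            inflightBytes -= size;
            while (!pendingRequests.isEmpty()
                    && inflightBytes + pendingRequests.peek().size() <= maxInflightBytes) {
                PendingRequest next = pendingRequests.poll();
                inflightBytes += next.size();
                ready.add(next);
            }
        }
        // Complete futures outside the lock so callbacks don't run while holding it.
        ready.forEach(p -> p.future().complete(null));
    }
}

The GET_TOPICS_OF_NAMESPACE and WATCH_TOPIC_LIST handlers could then call acquire with the estimated serialized size before writing the response, and call release from a write-completion listener. ServerCnxThrottleTracker's incrementThrottleCount/decrementThrottleCount would remain the fallback for corner cases where a single connection still can't keep up.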
I hope this helps us find a more optimal solution to the problem. Looking forward to continuing this discussion and working together to improve Pulsar's stability.

-Lari

On Tue, 8 Jul 2025 at 09:07, Yubiao Feng <yubiao.f...@streamnative.io.invalid> wrote:
>
> Hi all
>
> I want to start a discussion which relates to PR #24423: Handling
> Overloaded Netty Channels in Apache Pulsar
>
> Problem Statement
> We've encountered a critical issue in our Apache Pulsar clusters where
> brokers experience Out-Of-Memory (OOM) errors and continuous restarts
> under specific load patterns. This occurs when Netty channel write
> buffers become full, leading to a buildup of unacknowledged responses
> in the broker's memory.
>
> Background
> Our clusters are configured with numerous namespaces, each containing
> approximately 8,000 to 10,000 topics. Our consumer applications are
> quite large, with each consumer using a regular expression (regex)
> pattern to subscribe to all topics within a namespace.
>
> The problem manifests particularly during consumer application
> restarts. When a consumer restarts, it issues a getTopicsOfNamespace
> request. Due to the sheer number of topics, the response size is
> extremely large. This massive response overwhelms the socket output
> buffer, causing it to fill up rapidly. Consequently, the broker's
> responses get backlogged in memory, eventually leading to the broker's
> OOM and subsequent restart loop.
>
> Why "Returning an Error" Is Not a Solution
> A common approach to handling overload is to simply return an error
> when the broker cannot process a request. However, in this specific
> scenario, this solution is ineffective. If a consumer application
> fails to start due to an error, it triggers a user pod restart, which
> then leads to the same getTopicsOfNamespace request being reissued,
> resulting in a continuous loop of errors and restarts. This creates an
> unrecoverable state for the consumer application and puts immense
> pressure on the brokers.
>
> Proposed Solution and Justification
> We believe the solution proposed in
> https://github.com/apache/pulsar/pull/24423 is highly suitable for
> addressing this issue. The core mechanism introduced in this PR –
> pausing acceptance of new requests when a channel cannot handle more
> output – is exceptionally reasonable and addresses the root cause of
> the memory pressure.
>
> This approach prevents the broker from accepting new requests when its
> write buffers are full, effectively backpressuring the client and
> preventing the memory buildup that leads to OOMs. Furthermore, we
> anticipate that this mechanism will not significantly increase future
> maintenance costs, as it elegantly handles overload scenarios at a
> fundamental network layer.
>
> I invite the community to discuss this solution and its potential
> benefits for the overall stability and resilience of Apache Pulsar.
>
> Thanks
> Yubiao Feng
>
> --
> This email and any attachments are intended solely for the recipient(s)
> named above and may contain confidential, privileged, or proprietary
> information. If you are not the intended recipient, you are hereby
> notified that any disclosure, copying, distribution, or reproduction of
> this information is strictly prohibited. If you have received this email
> in error, please notify the sender immediately by replying to this email
> and delete it from your system.