Hi Lari

> Adding the proposed solution in PR 24423 could have a negative impact in multiple ways.
Could you explain the negative impacts?

On Wed, Jul 9, 2025 at 4:47 PM Lari Hotari <lhot...@apache.org> wrote:
> Thanks for driving this discussion, Yubiao. I appreciate you bringing this important issue to the community's attention.
>
> Regarding the solution in 24423, I think we need to look further.
>
> Pulsar already contains multiple ways for backpressure and rate limiting where Netty's autoread is already being toggled. This could conflict with the solution that is being proposed. Additionally, due to the fact that a single connection might be serving multiple producers and consumers, toggling autoread would also impact producers and consumers. Pulsar already handles the memory held in buffers for producer and consumer use cases. Adding the proposed solution in PR 24423 could have a negative impact in multiple ways.
>
> I would assume that a more optimal solution lies in targeting specifically the GET_TOPICS_OF_NAMESPACE request and GET_TOPICS_OF_NAMESPACE_RESPONSE response, as well as the WATCH_TOPIC_LIST request with WATCH_TOPIC_LIST_SUCCESS/WATCH_TOPIC_UPDATE responses.
>
> An additional benefit of targeting topic listing responses directly is that it would be possible to add a global limit for the broker. The proposed PR 24423 cannot ensure that the broker memory usage would be limited since it's targeting individual connections and there isn't a global limit for the broker. Due to this reason, the PR 24423 solution wouldn't solve the problem it intends to solve. The topic listing limit should be global for the broker like we already have maxMessagePublishBufferSizeInMB, managedLedgerMaxReadsInFlightSizeInMB, and managedLedgerCacheSizeMB settings.
>
> One possible way would be to ensure that the size of responses could be calculated and there could be a solution that limits in-flight responses, somewhat similar to how managedLedgerMaxReadsInFlightSizeInMB has been implemented asynchronously in InflightReadsLimiter. In this solution, it would be necessary to deduplicate the java.lang.String and TopicName instances as demonstrated in PR 24457 so that keeping the listings in memory doesn't add significant extra memory usage. Side note: This deduplication is necessary in any case for large amounts of WATCH_TOPIC_LIST requests since each instance keeps a list of topics in the memory of the broker.
>
> There could be some additional challenges that come up in the implementation, but the idea would be to postpone and delay the specific topic listing responses instead of relying on toggling autoread for the connection. Toggling autoread could be the last resort in corner cases. In the Pulsar broker, autoread should be handled with org.apache.pulsar.broker.service.ServerCnxThrottleTracker's incrementThrottleCount/decrementThrottleCount instead of toggling autoread directly. (The javadoc for ServerCnxThrottleTracker contains more details.)
>
> I hope this helps us find a more optimal solution to the problem. Looking forward to continuing this discussion and working together to improve Pulsar's stability.
>
> -Lari
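For concreteness, here is a rough sketch of the kind of broker-wide, in-flight limiter for topic listing responses that Lari describes above, loosely following the asynchronous permit idea of the existing InflightReadsLimiter and including the String deduplication he mentions. All class and method names in the sketch (TopicListResponseLimiter, submit, release, dedup) are hypothetical and illustrative, not actual Pulsar APIs; a real implementation would also dispatch queued responses asynchronously instead of running them under a lock.

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Illustrative sketch only: a broker-wide limiter for in-flight topic listing
     * responses (GET_TOPICS_OF_NAMESPACE_RESPONSE, WATCH_TOPIC_LIST_SUCCESS, ...),
     * loosely modeled on the asynchronous permit idea of InflightReadsLimiter.
     * All names are hypothetical.
     */
    public class TopicListResponseLimiter {

        private record Pending(long bytes, Runnable sendAction) {}

        // Global cap, in the spirit of managedLedgerMaxReadsInFlightSizeInMB.
        private final long maxInflightBytes;
        private long inflightBytes;
        private final Queue<Pending> pendingResponses = new ArrayDeque<>();

        // Dedup pool so that repeated topic-name strings across many listing
        // responses and WATCH_TOPIC_LIST watchers share a single instance.
        private final ConcurrentHashMap<String, String> topicNamePool = new ConcurrentHashMap<>();

        public TopicListResponseLimiter(long maxInflightBytes) {
            this.maxInflightBytes = maxInflightBytes;
        }

        /** Returns a canonical instance of the given topic name string. */
        public String dedup(String topicName) {
            return topicNamePool.computeIfAbsent(topicName, k -> k);
        }

        /**
         * Sends the response now if the global budget allows it, otherwise queues it.
         * (A real implementation would also handle single responses larger than the
         * whole budget instead of leaving them queued.)
         */
        public synchronized void submit(long estimatedResponseBytes, Runnable sendAction) {
            if (inflightBytes + estimatedResponseBytes <= maxInflightBytes) {
                inflightBytes += estimatedResponseBytes;
                sendAction.run();
            } else {
                pendingResponses.add(new Pending(estimatedResponseBytes, sendAction));
            }
        }

        /** Called from the write-completion callback once a response has been flushed. */
        public synchronized void release(long estimatedResponseBytes) {
            inflightBytes -= estimatedResponseBytes;
            Pending next;
            while ((next = pendingResponses.peek()) != null
                    && inflightBytes + next.bytes() <= maxInflightBytes) {
                pendingResponses.poll();
                inflightBytes += next.bytes();
                next.sendAction().run();
            }
        }
    }

The only point of the sketch is that a single global byte budget, comparable to maxMessagePublishBufferSizeInMB or managedLedgerMaxReadsInFlightSizeInMB, bounds broker memory no matter how many connections request topic lists at once.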
> On Tue, 8 Jul 2025 at 09:07, Yubiao Feng <yubiao.f...@streamnative.io.invalid> wrote:
> >
> > Hi all
> >
> > I want to start a discussion, which relates to PR #24423: Handling Overloaded Netty Channels in Apache Pulsar.
> >
> > Problem Statement
> > We've encountered a critical issue in our Apache Pulsar clusters where brokers experience Out-Of-Memory (OOM) errors and continuous restarts under specific load patterns. This occurs when Netty channel write buffers become full, leading to a buildup of unacknowledged responses in the broker's memory.
> >
> > Background
> > Our clusters are configured with numerous namespaces, each containing approximately 8,000 to 10,000 topics. Our consumer applications are quite large, with each consumer using a regular expression (regex) pattern to subscribe to all topics within a namespace.
> >
> > The problem manifests particularly during consumer application restarts. When a consumer restarts, it issues a getTopicsOfNamespace request. Due to the sheer number of topics, the response size is extremely large. This massive response overwhelms the socket output buffer, causing it to fill up rapidly. Consequently, the broker's responses get backlogged in memory, eventually leading to the broker's OOM and subsequent restart loop.
> >
> > Why "Returning an Error" Is Not a Solution
> > A common approach to handling overload is to simply return an error when the broker cannot process a request. However, in this specific scenario, this solution is ineffective. If a consumer application fails to start due to an error, it triggers a user pod restart, which then leads to the same getTopicsOfNamespace request being reissued, resulting in a continuous loop of errors and restarts. This creates an unrecoverable state for the consumer application and puts immense pressure on the brokers.
> >
> > Proposed Solution and Justification
> > We believe the solution proposed in https://github.com/apache/pulsar/pull/24423 is highly suitable for addressing this issue. The core mechanism introduced in this PR – pausing acceptance of new requests when a channel cannot handle more output – is exceptionally reasonable and addresses the root cause of the memory pressure.
> >
> > This approach prevents the broker from accepting new requests when its write buffers are full, effectively backpressuring the client and preventing the memory buildup that leads to OOMs. Furthermore, we anticipate that this mechanism will not significantly increase future maintenance costs, as it elegantly handles overload scenarios at a fundamental network layer.
> >
> > I invite the community to discuss this solution and its potential benefits for the overall stability and resilience of Apache Pulsar.
> >
> > Thanks
> > Yubiao Feng
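For reference, the general mechanism that PR 24423 builds on can be pictured as a Netty handler reacting to channel writability: stop reading new requests while the outbound buffer is above its high-water mark, and resume reads once it drains. The sketch below is illustrative only (the class name is made up and it is not the actual PR code); as Lari points out, broker code should go through ServerCnxThrottleTracker's incrementThrottleCount/decrementThrottleCount rather than toggling autoread directly.

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    /**
     * Illustrative sketch of writability-based backpressure on a single
     * connection. Not the actual PR 24423 code.
     */
    public class WritabilityBackpressureHandler extends ChannelInboundHandlerAdapter {

        private boolean throttled;

        @Override
        public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
            if (!ctx.channel().isWritable() && !throttled) {
                // Outbound buffer crossed its high-water mark: stop reading new requests.
                throttled = true;
                ctx.channel().config().setAutoRead(false);
            } else if (ctx.channel().isWritable() && throttled) {
                // Buffer drained below the low-water mark: resume reading.
                throttled = false;
                ctx.channel().config().setAutoRead(true);
            }
            ctx.fireChannelWritabilityChanged();
        }
    }

Because such a handler acts per connection, it does not by itself bound the broker's total memory usage, which is the gap the global topic listing limit discussed earlier in this thread is meant to close.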