[ https://issues.apache.org/jira/browse/KAFKA-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888768#comment-16888768 ]
Sönke Liebau commented on KAFKA-1016:
-------------------------------------

Is this still relevant after the Purgatory redesign in KAFKA-1430 and KAFKA-1989? It seems to me that the improvements made there would at least alleviate the issue described here, given the large performance gains, even if no hard limit is introduced (which I'm not sure we want to do, as that would essentially cap the number of consumers we are willing to serve).

> Broker should limit purgatory size
> ----------------------------------
>
>                 Key: KAFKA-1016
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1016
>             Project: Kafka
>          Issue Type: Bug
>          Components: purgatory
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>            Assignee: Joel Koshy
>            Priority: Major
>
> I recently ran into a case where a poorly configured Kafka consumer was able
> to trigger out of memory exceptions in multiple Kafka brokers. The consumer
> was configured to have a fetcher.max.wait of Int.MaxInt.
> For low volume topics, this configuration causes the consumer to block
> frequently, and for long periods of time. [~junrao] informs me that the fetch
> request will time out after the socket timeout is reached. In our case, this
> was set to 30s.
> With several thousand consumer threads, the fetch request purgatory got into
> the 100,000-400,000 range, which we believe triggered the out of memory
> exception. [~nehanarkhede] claims to have seen similar behavior in other high
> volume clusters.
> It kind of seems like a bad thing that a poorly configured consumer can
> trigger out of memory exceptions in the broker. I was thinking maybe it makes
> sense to have the broker try and protect itself from this situation. Here are
> some potential solutions:
> 1. Have a broker-side max wait config for fetch requests.
> 2. Threshold the purgatory size, and either drop the oldest connections in
> purgatory, or reject the newest fetch requests when purgatory is full.


--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
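
For illustration, the misconfiguration described in the report would look roughly like the following. This is a minimal sketch using 0.8-era high-level consumer property names; the report's "fetcher.max.wait" most likely corresponds to "fetch.wait.max.ms", but the exact keys, hosts, and group name here are assumptions, not taken from the ticket.

```java
import java.util.Properties;

public class MisconfiguredFetchWaitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181"); // hypothetical ZooKeeper address
        props.put("group.id", "example-group");     // hypothetical consumer group

        // Problematic setting: on a low-volume topic the broker parks each fetch
        // request in the fetch request purgatory for up to this long while it
        // waits for new data to arrive.
        props.put("fetch.wait.max.ms", String.valueOf(Integer.MAX_VALUE));

        // Client-side socket timeout (30s in the report). The request only fails
        // on the client after this elapses, but by then the broker has already
        // queued the delayed fetch; with several thousand consumer threads the
        // purgatory can grow into the hundreds of thousands of entries.
        props.put("socket.timeout.ms", "30000");

        System.out.println("Consumer properties (illustration only): " + props);
    }
}
```

A safer configuration keeps fetch.wait.max.ms well below socket.timeout.ms, which is essentially the bound that potential solution 1 (a broker-side max wait for fetch requests) would enforce regardless of what the client asks for.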