Chris Riccomini created KAFKA-1016:
--------------------------------------

             Summary: Broker should limit purgatory size
                 Key: KAFKA-1016
                 URL: https://issues.apache.org/jira/browse/KAFKA-1016
             Project: Kafka
          Issue Type: Bug
          Components: purgatory
    Affects Versions: 0.8
            Reporter: Chris Riccomini
            Assignee: Joel Koshy


I recently ran into a case where a poorly configured Kafka consumer was able to 
trigger out-of-memory exceptions in multiple Kafka brokers. The consumer was 
configured with fetcher.max.wait set to Int.MaxValue.

For low-volume topics, this configuration causes the consumer to block 
frequently, and for long periods of time. [~junrao] informs me that the fetch 
request will time out once the socket timeout is reached. In our case, this 
was set to 30s.
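
For reference, the misconfiguration looks roughly like this against the 0.8 
high-level consumer (a sketch; I believe fetch.wait.max.ms and 
socket.timeout.ms are the relevant property names, and the group/ZooKeeper 
values are placeholders):

{code}
import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig}

val props = new Properties()
props.put("group.id", "low-volume-consumer")     // placeholder
props.put("zookeeper.connect", "localhost:2181") // placeholder
// Ask the broker to hold the fetch until data arrives, effectively forever:
props.put("fetch.wait.max.ms", Int.MaxValue.toString)
// The request is then bounded only by the socket timeout (30s for us):
props.put("socket.timeout.ms", "30000")

val connector = Consumer.create(new ConsumerConfig(props))
{code}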

With several thousand consumer threads, the fetch request purgatory got into 
the 100,000-400,000 range, which we believe triggered the out-of-memory 
exception. [~nehanarkhede] claims to have seen similar behavior in other 
high-volume clusters.

It seems like a bad thing that a poorly configured consumer can trigger 
out-of-memory exceptions in the broker. I think it makes sense to have the 
broker try to protect itself from this situation. Here are some potential 
solutions:

1. Have a broker-side max wait config for fetch requests.
2. Threshold the purgatory size, and either drop the oldest connections in 
purgatory, or reject the newest fetch requests when purgatory is full (a rough 
sketch follows below).
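
To make this concrete, here is a rough sketch of what the broker-side 
protection could look like. This is not against the actual purgatory code; the 
class and the config names in the comments are hypothetical. It simply 
combines a wait-time ceiling (option 1) with a counter that rejects the newest 
requests once the cap is hit (option 2):

{code}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical broker settings (not existing configs):
//   fetch.purgatory.max.delayed.requests - cap on outstanding delayed fetches
//   broker-side fetch max wait           - ceiling on the client-requested wait
class FetchRequestGate(maxDelayedRequests: Int, maxWaitCeilingMs: Long) {
  private val delayed = new AtomicInteger(0)

  // Option 1: clamp whatever max wait the client asked for.
  def effectiveMaxWait(requestedMs: Long): Long =
    math.min(requestedMs, maxWaitCeilingMs)

  // Option 2: only park a fetch in purgatory if there is room; if not,
  // the caller answers it immediately (possibly with an empty response).
  def tryAdmit(): Boolean = {
    if (delayed.incrementAndGet() > maxDelayedRequests) {
      delayed.decrementAndGet()
      false
    } else true
  }

  // Call when a delayed fetch is satisfied or expires.
  def release(): Unit = delayed.decrementAndGet()
}
{code}

Rejecting the newest requests only needs a counter; dropping the oldest 
connections would additionally require the purgatory to track an eviction 
order.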

