[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851880#comment-15851880 ]
Tim Carey-Smith commented on KAFKA-4725:
----------------------------------------

Hi there,

Jeff and I have prototyped a fix for this bug. We repeated our stress tests against a new build and have not yet been able to reproduce the leak.

The branch is hosted on GitHub at https://github.com/apache/kafka/compare/0.10.1.1...heroku:fix-throttled-response-leak

Before we open a PR, which base branch should it target?

Thanks,
Tim

> Kafka broker fails due to OOM when producer exceeds throttling quota for
> extended periods of time
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4725
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4725
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer
>    Affects Versions: 0.10.1.1
>         Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
>            Reporter: Jeff Chao
>            Priority: Critical
>              Labels: reliability
>             Fix For: 0.10.3.0, 0.10.2.1
>
>         Attachments: oom-references.png
>
>
> Steps to Reproduce:
> 1. Create a non-compacted topic with 1 partition
> 2. Set a produce quota of 512 KB/s
> 3. Send messages at 20 MB/s
> 4. Observe heap memory growth as time progresses
>
> Investigation:
> While running performance tests with a user configured with a produce quota,
> we found that the lead broker serving the requests would exhaust heap memory
> if the producer sustained an inbound request throughput greater than the
> produce quota.
> Upon further investigation, we took a heap dump from that broker process and
> discovered that the ThrottledResponse object has an indirect reference to the
> byte[] holding the messages associated with the ProduceRequest.
> We're happy to contribute a patch, but in the meantime we wanted to raise
> the issue and get feedback from the community.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
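The reference chain from the "Investigation" section (a throttled response that indirectly keeps the ProduceRequest's message bytes alive) can be illustrated with a small model. This is a minimal sketch in Python, not Kafka's Scala code: it assumes a broker that parks each throttled response in a delay queue together with a callback, and that the leak comes from the callback closing over the whole request. Every name here (ThrottleQueue, leaky_handler, fixed_handler, trace_retention) is hypothetical, and "copy out only what the response needs" is just one possible remedy, not necessarily what the linked branch does.

```python
import heapq
import itertools
import time
import weakref
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ProduceRequest:
    # Hypothetical stand-in for a produce request; `payload` plays the role
    # of the byte[] holding the producer's messages.
    client_id: str
    payload: bytes


@dataclass(order=True)
class ThrottledResponse:
    # Hypothetical model of a response parked until `due` (monotonic seconds).
    due: float
    seq: int  # tie-breaker so entries with equal `due` stay ordered
    callback: Callable[[], None] = field(compare=False)


class ThrottleQueue:
    """Minimal delay queue: each response is held until its throttle time expires."""

    def __init__(self) -> None:
        self._heap: List[ThrottledResponse] = []
        self._seq = itertools.count()

    def park(self, throttle_s: float, callback: Callable[[], None]) -> None:
        due = time.monotonic() + throttle_s
        heapq.heappush(self._heap, ThrottledResponse(due, next(self._seq), callback))

    def drain_expired(self, now: float) -> int:
        """Send every response whose throttle time has passed; return how many were sent."""
        sent = 0
        while self._heap and self._heap[0].due <= now:
            heapq.heappop(self._heap).callback()
            sent += 1
        return sent


def leaky_handler(queue: ThrottleQueue, request: ProduceRequest,
                  throttle_s: float, acks: List[str]) -> None:
    # The closure reads `request` when it finally runs, so the whole request
    # (and its payload) stays reachable for as long as the response is parked.
    def respond() -> None:
        acks.append(f"ack {request.client_id}")
    queue.park(throttle_s, respond)


def fixed_handler(queue: ThrottleQueue, request: ProduceRequest,
                  throttle_s: float, acks: List[str]) -> None:
    # Copy out only what the response needs; the closure never references
    # `request`, so the payload can be freed as soon as this handler returns.
    client_id = request.client_id
    def respond() -> None:
        acks.append(f"ack {client_id}")
    queue.park(throttle_s, respond)


def trace_retention(handler) -> Tuple[bool, bool, List[str]]:
    """Return (payload alive while throttled, payload alive after sending, acks)."""
    queue, acks = ThrottleQueue(), []
    request = ProduceRequest("producer-1", b"x" * 1_000_000)
    probe = weakref.ref(request)
    handler(queue, request, 60.0, acks)
    del request  # handling is done; only a parked response could still reach it
    alive_while_throttled = probe() is not None
    queue.drain_expired(now=float("inf"))  # pretend the throttle delay has elapsed
    return alive_while_throttled, probe() is not None, acks


print(trace_retention(leaky_handler))  # (True, False, ['ack producer-1']) on CPython
print(trace_retention(fixed_handler))  # (False, False, ['ack producer-1']) on CPython
```

The printed results rely on CPython's reference counting, which frees an object as soon as its last strong reference disappears. In the leaky variant, the 1 MB payload survives until the throttle delay expires; under a sustained quota violation, new requests keep arriving while earlier ones are still parked, so retained payloads accumulate, which is consistent with the heap growth described in the steps to reproduce.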