[ 
https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258663#comment-16258663
 ] 

Robin Tweedie commented on KAFKA-6199:
--------------------------------------

I'm going to share as much as I can from Eclipse Memory Analyzer running 
against a heap dump from the broker with its heap close to exhaustion.

From *Leak Suspects* (the second-biggest suspect was LogCleaner, but I think 
it looked similar on the same broker right after a restart):
{noformat}
2,469 instances of "org.apache.kafka.common.network.NetworkSend", loaded by 
"sun.misc.Launcher$AppClassLoader @ 0x90102770" occupy 820.95 MB (70.46%) 
bytes. 

Keywords
sun.misc.Launcher$AppClassLoader @ 0x90102770
org.apache.kafka.common.network.NetworkSend
{noformat}

The {{NetworkSend}}s appear right at the top level of the dominator tree (see 
attached dominator_tree.png). Using the "path to GC Roots" tool on individual 
instances (screenshots also attached) does not seem to lead anywhere 
interesting, just more networking code.

The values inside the byte buffers just look like lists of broker hostnames.

I am not sure where it would be useful to go from here. I looked at another 
heap dump taken right after the broker was restarted, and it has very few 
NetworkSend objects. I think I'd see similar data if I compared with a 
"healthy" broker (haven't done this, but could if it would help).
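
In case it helps with comparing brokers, here is a rough sketch of how the 
"no open connection" warning quoted in the description could be tallied per 
connection id from a broker log. Only the message text comes from this 
ticket; the function name {{count_warnings}}, the sample lines, and the 
assumption that each warning sits on a single log line are illustrative.

```python
import re
from collections import Counter

# Message text taken from the WARN line in the issue description.
# Anything before or after it on the line (timestamp, logger name) is
# ignored, since the exact log layout is an assumption here.
WARN_RE = re.compile(
    r"Attempting to send response via channel for which there is no open "
    r"connection, connection id (\S+)"
)


def count_warnings(lines):
    """Count matching WARN lines, grouped by the connection id they name."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines, only to show the call; real input would be the
# lines of a broker log file.
sample = [
    "WARN Attempting to send response via channel for which there is no "
    "open connection, connection id 13 (kafka.network.Processor)",
    "INFO some unrelated log line",
    "WARN Attempting to send response via channel for which there is no "
    "open connection, connection id 13 (kafka.network.Processor)",
    "WARN Attempting to send response via channel for which there is no "
    "open connection, connection id 7 (kafka.network.Processor)",
]
counts = count_warnings(sample)
print(counts.most_common())
```

Running the same tally against logs from the "bad" broker and a healthy one 
might show whether the warnings cluster on a few connection ids, though I 
have not verified that the ids map cleanly onto clients.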

Any ideas?

> Single broker with fast growing heap usage
> ------------------------------------------
>
>                 Key: KAFKA-6199
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6199
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.2.1
>         Environment: Amazon Linux
>            Reporter: Robin Tweedie
>         Attachments: Screen Shot 2017-11-10 at 1.55.33 PM.png, Screen Shot 
> 2017-11-10 at 11.59.06 AM.png, dominator_tree.png, merge_shortest_paths.png, 
> path2gc.png
>
>
> We have a single broker in our cluster of 25 with fast growing heap usage 
> which necessitates us restarting it every 12 hours. If we don't restart the 
> broker, it becomes very slow from long GC pauses and eventually has 
> {{OutOfMemory}} errors.
> See {{Screen Shot 2017-11-10 at 11.59.06 AM.png}} for a graph of heap usage 
> percentage on the broker. A "normal" broker in the same cluster stays below 
> 50% (averaged) over the same time period.
> We have taken heap dumps when the broker's heap usage is getting dangerously 
> high, and there are a lot of retained {{NetworkSend}} objects referencing 
> byte buffers.
> We also noticed that the single affected broker logs a lot more of this kind 
> of warning than any other broker:
> {noformat}
> WARN Attempting to send response via channel for which there is no open 
> connection, connection id 13 (kafka.network.Processor)
> {noformat}
> See {{Screen Shot 2017-11-10 at 1.55.33 PM.png}} for counts of that WARN log 
> message visualized across all the brokers (to show it happens a bit on other 
> brokers, but not nearly as much as it does on the "bad" broker).
> I can't make the heap dumps public, but would appreciate advice on how to pin 
> down the problem better. We're currently trying to narrow it down to a 
> particular client, but without much success so far.
> Let me know what else I could investigate or share to track down the source 
> of this leak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
