Kaare Nilsen created KAFKA-9690:
-----------------------------------

             Summary: MemoryLeak in JMX Reporter
                 Key: KAFKA-9690
                 URL: https://issues.apache.org/jira/browse/KAFKA-9690
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 2.4.0
            Reporter: Kaare Nilsen
         Attachments: image-2020-03-10-12-37-49-259.png, 
image-2020-03-10-12-44-11-688.png

We use Kafka in a streaming HTTP application, creating a new consumer for each 
incoming request. In version 2.4.0 we experience that memory builds up with 
each new consumer. After a memory dump revealed the problem was in the JMX 
subsystem, we found that one of the JMX beans (kafka.consumer) accumulates 
consumer-metrics MBeans without releasing them when the consumer is closed.
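
For context, the usage pattern looks roughly like this (a minimal sketch; the 
topic name "events", the config handling, and the output writing are 
illustrative placeholders, not our actual code):
{code:java}
import java.io.OutputStream;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StreamingHandler {
    // Called once per incoming HTTP request; the consumer lives only for that request.
    void streamToClient(Properties baseProps, OutputStream out) throws Exception {
        Properties props = new Properties();
        props.putAll(baseProps);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> writeChunk(out, r.value()));
        } // close() runs here, but on 2.4.0 the consumer-metrics MBean is left behind
    }

    private void writeChunk(OutputStream out, String value) {
        // write an HTTP chunk to the client (elided)
    }
}
{code}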

What we found is that the problem lies in the metricRemoval method:
{code:java}
public void metricRemoval(KafkaMetric metric) {
    synchronized (LOCK) {
        MetricName metricName = metric.metricName();
        String mBeanName = getMBeanName(prefix, metricName);
        // removes this metric's attribute from the mbean and returns the mbean
        KafkaMbean mbean = removeAttribute(metric, mBeanName);
        if (mbean != null) {
            // the mbean is only unregistered once its last attribute is gone
            if (mbean.metrics.isEmpty()) {
                unregister(mbean);
                mbeans.remove(mBeanName);
            } else
                reregister(mbean);
        }
    }
}
{code}
The check mbean.metrics.isEmpty() never yields true for this particular metric, 
so the mbean is never removed, and the mbeans HashMap keeps growing.
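
The buildup can be reproduced without a broker by constructing and closing 
consumers in a loop and counting the registered MBeans. A minimal sketch (the 
bootstrap address and group id are placeholders; no connection is needed, since 
the MBeans are registered at construction time):
{code:java}
import java.lang.management.ManagementFactory;
import java.util.Properties;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LeakRepro {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (int i = 0; i < 10; i++) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder address
            props.put("group.id", "leak-test");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            // close() should unregister all of this consumer's MBeans
            new KafkaConsumer<String, String>(props).close();
        }
        // On 2.4.0 this count reportedly grows with each iteration; on 2.3.1 it stays at zero.
        int lingering = server.queryNames(
                new ObjectName("kafka.consumer:type=consumer-metrics,*"), null).size();
        System.out.println("lingering consumer-metrics MBeans: " + lingering);
    }
}
{code}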

The metrics that are not released are:
{code:java}
last-poll-seconds-ago
poll-idle-ratio-avg
time-between-poll-avg
time-between-poll-max
{code}
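These attributes can be confirmed by inspecting a leaked MBean via the platform 
MBeanServer after the consumer is closed. A sketch, assuming the closed 
consumer's client.id was consumer-1 (adjust to match):
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class LeakInspector {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // client-id is an example; use the id of a consumer that has been closed
        ObjectName name = new ObjectName(
                "kafka.consumer:type=consumer-metrics,client-id=consumer-1");
        for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
            System.out.println(attr.getName()); // prints the lingering poll attributes
        }
    }
}
{code}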
I have a workaround in my code now: a modified JMXReporter in my own project 
with the following close method:
{code:java}
public void close() {
    synchronized (LOCK) {
        for (KafkaMbean mbean : this.mbeans.values()) {
            // explicitly drop the four poll attributes that metricRemoval never sees,
            // so that the mbean can actually be unregistered
            mbean.removeAttribute("last-poll-seconds-ago");
            mbean.removeAttribute("poll-idle-ratio-avg");
            mbean.removeAttribute("time-between-poll-avg");
            mbean.removeAttribute("time-between-poll-max");
            unregister(mbean);
        }
    }
}
{code}
This will remove the attributes that are not cleaned up and prevent the memory 
leak, but I have not found the root cause.
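
If shipping a patched JMXReporter is not an option, a similar cleanup should be 
possible from application code by unregistering the leaked MBean directly after 
close(). This is an untested sketch of that alternative, assuming the 
consumer's client.id is known:
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanCleanup {
    // Call after consumer.close(); clientId must match the consumer's client.id.
    static void unregisterLeakedMBean(String clientId) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(
                "kafka.consumer:type=consumer-metrics,client-id=" + clientId);
        if (server.isRegistered(name)) {
            server.unregisterMBean(name);
        }
    }
}
{code}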
Another workaround is to use Kafka client 2.3.1.

This is how it looks in the JMX console after a couple of clients have 
connected and disconnected. You can see that the consumer-metrics MBeans build 
up, and the old ones still hold the four attributes that make the unregister 
fail:

!image-2020-03-10-12-37-49-259.png!

This is how it looks after a while with Kafka client 2.3.1:
!image-2020-03-10-12-44-11-688.png!

As you can see, there is no leakage here.

I suspect the change that introduced the leak is KIP-517 (KAFKA-8874): 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-517%3A+Add+consumer+metrics+to+observe+user+poll+behavior]

https://issues.apache.org/jira/browse/KAFKA-8874


