https://issues.apache.org/jira/browse/KAFKA-6185
This was also reported by another person, Ben, attached to the end of this ticket https://issues.apache.org/jira/browse/KAFKA-6042 which is a broker lockup/FD issue, but a new ticket was requested. Ismael Juma also suggested there that it be discussed on the mailing list, so I'm starting a thread here.

We are testing 1.0.0 in a couple of environments. Both have about 5 brokers, with two 1.0.0 brokers and the rest 0.11.0.1 brokers. One is using on-disk message format 0.9.0.1, the other 0.11.0.1, and we have 0.9, 0.10, and 0.11 clients connecting. The cluster on the 0.9.0.1 format has been running fine for a week, but the cluster on the 0.11.0.1 format is consistently having memory issues. (Prior to this it was running the 0.11.0.1 binary and on-disk message format for many weeks, down-converting for older clients with no issues.)

The first occurrence of the error comes along with this stack trace:

{"timestamp":"2017-11-06 14:22:32,402","level":"ERROR","logger":"kafka.server.KafkaApis","thread":"kafka-request-handler-7","message":"[KafkaApi-1] Error when handling request {replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=maxwell.users,partitions=[{partition=0,fetch_offset=227537,max_bytes=11000000},{partition=4,fetch_offset=354468,max_bytes=11000000},{partition=5,fetch_offset=266524,max_bytes=11000000},{partition=8,fetch_offset=324562,max_bytes=11000000},{partition=10,fetch_offset=292931,max_bytes=11000000},{partition=12,fetch_offset=325718,max_bytes=11000000},{partition=15,fetch_offset=229036,max_bytes=11000000}]}]}"}
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:101)
        at org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:253)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:520)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:518)
        at scala.Option.map(Option.scala:146)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:518)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:508)
        at scala.Option.flatMap(Option.scala:171)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$convertedPartitionData$1(KafkaApis.scala:508)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:556)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:555)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$createResponse$2(KafkaApis.scala:555)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
        at kafka.server.KafkaApis$$anonfun$sendResponseMaybeThrottle$1.apply$mcVI$sp(KafkaApis.scala:2034)
        at kafka.server.ClientRequestQuotaManager.maybeRecordAndThrottle(ClientRequestQuotaManager.scala:52)
        at kafka.server.KafkaApis.sendResponseMaybeThrottle(KafkaApis.scala:2033)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$fetchResponseCallback$1(KafkaApis.scala:569)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$processResponseCallback$1$1.apply$mcVI$sp(KafkaApis.scala:588)
        at kafka.server.ClientQuotaManager.maybeRecordAndThrottle(ClientQuotaManager.scala:175)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$processResponseCallback$1(KafkaApis.scala:587)
        at kafka.server.KafkaApis$$anonfun$handleFetchRequest$3.apply(KafkaApis.scala:604)
        at kafka.server.KafkaApis$$anonfun$handleFetchRequest$3.apply(KafkaApis.scala:604)
        at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:820)
        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:596)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:100)

And then after a few of those it settles into this kind of pattern:

{"timestamp":"2017-11-06 15:06:48,114","level":"ERROR","logger":"kafka.server.KafkaApis","thread":"kafka-request-handler-1","message":"[KafkaApi-1] Error when handling request {replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=maxwell.accounts,partitions=[{partition=4,fetch_offset=560631,max_bytes=11000000},{partition=8,fetch_offset=557589,max_bytes=11000000},{partition=12,fetch_offset=551712,max_bytes=11000000}]}]}"}
java.lang.OutOfMemoryError: Java heap space

{"timestamp":"2017-11-06 15:06:48,811","level":"ERROR","logger":"kafka.server.KafkaApis","thread":"kafka-request-handler-7","message":"[KafkaApi-1] Error when handling request {replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=maxwell.accounts,partitions=[{partition=4,fetch_offset=560631,max_bytes=11000000},{partition=8,fetch_offset=557589,max_bytes=11000000},{partition=12,fetch_offset=551712,max_bytes=11000000}]}]}"}
java.lang.OutOfMemoryError: Java heap space

I've attached the heap use graphs. Heap use steadily increases to the maximum, at which point the errors start appearing.

[image: Inline image 1]

I tripled the heap space for one of the 1.0.0 hosts to see what happens, and it similarly climbs to near 6 GB and then similarly starts throwing java.lang.OutOfMemoryError. I've attached those heap space graphs as well; the line that starts climbing from 2 GB is when that broker was restarted with a 6 GB heap. The out of memory errors started right at the peak of the flatline.

[image: Inline image 2]

Here's a snippet from the broker logs: https://gist.github.com/brettrann/4bb8041e884a299b7b0b12645a04492d

I've redacted some group names because I'd need to check with the teams about making them public. This is a test cluster and the problem appears easy enough to reproduce. Happy to gather as much info as needed, attach a debugger, whatever is needed. Just ask!
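From the stack it looks like the down-conversion path buffers the converted data on-heap (the ByteBuffer.allocate inside AbstractRecords.downConvert), so here is my rough back-of-the-envelope worst case, assuming each partition in a fetch can be materialized up to its max_bytes while every request handler thread is doing the same (numbers taken from the failing request above and from num.io.threads in the config below):

public class DownConvertWorstCase {
    public static void main(String[] args) {
        // Numbers from the first failing fetch request and our broker config.
        long maxBytesPerPartition = 11_000_000L; // max_bytes per partition in the request
        int partitionsPerFetch = 7;              // maxwell.users partitions in that request
        int requestHandlerThreads = 8;           // num.io.threads=8

        // Assumption: each partition's down-converted records are held fully
        // on-heap while the response is built, concurrently on every handler thread.
        long worstCaseBytes = (long) partitionsPerFetch * maxBytesPerPartition * requestHandlerThreads;
        System.out.printf("worst case ~%d MB of temporary buffers%n", worstCaseBytes / 1_000_000);
        // ~616 MB, which on its own wouldn't explain the heap climbing steadily to
        // 6 GB, so it looks like something may also be holding on to those buffers.
    }
}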
Our config is:

broker.id=2
delete.topic.enable=true
auto.create.topics.enable=false
auto.leader.rebalance.enable=true
inter.broker.protocol.version=0.11.0.1
log.message.format.version=0.11.0.1
group.max.session.timeout.ms=300000
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
replica.fetch.max.bytes=10485760
log.dirs=/data/kafka/logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
offsets.retention.minutes=10080
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=true
zookeeper.connect=zoo1:2181,zoo2:2181,zoo3:2181/kafka
zookeeper.connection.timeout.ms=6000

--
Brett Rann
Senior DevOps Engineer
Zendesk International Ltd
395 Collins Street, Melbourne VIC 3000 Australia
Mobile: +61 (0) 418 826 017
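PS: if it helps with gathering info while reproducing, here's a rough sketch of how we could watch the broker's fetch down-conversion rate over JMX. It assumes JMX is enabled on the broker (e.g. JMX_PORT=9999) and that FetchMessageConversionsPerSec under kafka.server:type=BrokerTopicMetrics is the right meter name in 1.0.0 -- happy to be corrected on that.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DownConversionWatcher {
    public static void main(String[] args) throws Exception {
        // Assumed JMX endpoint on one of the 1.0.0 brokers.
        String url = "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi";
        try (JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // Broker-wide fetch down-conversion meter (assumed name for 1.0.0).
            ObjectName meter = new ObjectName(
                "kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec");
            while (true) {
                Object rate = mbsc.getAttribute(meter, "OneMinuteRate");
                System.out.printf("fetch down-conversions/sec (1m rate): %s%n", rate);
                Thread.sleep(10_000);
            }
        }
    }
}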