[ https://issues.apache.org/jira/browse/KAFKA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367625#comment-15367625 ]
Tom Crayford commented on KAFKA-3933:
-------------------------------------

Ismael: I've pushed the PR here: https://github.com/apache/kafka/pull/1598. For now, I've only fixed the precise memory issue that has been causing us notable production problems. I'm happy to pick up the other parts of the codebase where this can happen, but I'd rather get feedback on this first part of the approach for now (this is my first time contributing code to Kafka). I couldn't see a good or easy way to write unit tests for this code right now.

> Kafka OOM During Log Recovery Due to Leaked Native Memory
> ---------------------------------------------------------
>
>                 Key: KAFKA-3933
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3933
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.8.2.2, 0.9.0.1, 0.10.0.0
>         Environment: Linux, latest oracle java-8
>            Reporter: Tom Crayford
>            Assignee: Tom Crayford
>            Priority: Critical
>             Fix For: 0.10.0.1
>
> Hi there. We've been tracking an issue where Kafka hits a java.lang.OutOfMemoryError during log recovery.
>
> After a bunch of tracking work, we've realized we've hit an instance of a long-known issue: http://www.evanjones.ca/java-native-leak-bug.html
>
> TL;DR: Kafka breaks the rule "Always close GZIPInputStream and GZIPOutputStream since they use native memory via zlib" from that article.
>
> As such, during broker startup, when recovering log segments that have been compressed with gzip, Kafka leaks `GZIPInputStream` all over the place. Our crashes during startup have this profile: the JVM heap is nearly empty (a few hundred MB), but the offheap memory is full of allocations caused by `Java_java_util_zip_Deflater_init` and `deflateInit2`.
>
> This leads to broker crashes during startup. The only real mitigation is having *far* more memory than you need to boot (which I'd guess is why folks haven't noticed this much in production yet).
>
> To dig into the code more (this is based on trunk): log recovery on unflushed segments eventually calls `LogSegment.recover`: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L172
>
> On compressed segments, that leads to a call to `deepIterator`: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L189
>
> That leads to a call to `CompressionFactory` (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/message/ByteBufferMessageSet.scala#L95), which creates a `GZIPInputStream`: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/message/CompressionFactory.scala#L46
>
> That `GZIPInputStream` is never closed anywhere, and the low *heap* pressure means the finalizer on `GZIPInputStream` that deallocates the native buffers never runs, because GC is never triggered. Instead, we simply exhaust the offheap memory and Kafka dies with an OutOfMemory error.
>
> Kafka *does* trigger an `inputstream.close()` call, but only when *fully* reading the whole input stream (see https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/message/ByteBufferMessageSet.scala#L156). When it's performing log recovery, in https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L189 it doesn't read to the end of the stream; it reads only the first offset and leaves the rest alone.
>
> This issue likely impacts `lz4` and `snappy` compressed topics in exactly the same way.
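> To make the pattern concrete, here is a minimal standalone sketch of what recovery effectively does to each gzip'd message set (a hypothetical illustration only, not the actual trunk code):
>
> ```scala
> import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
> import java.util.zip.{GZIPInputStream, GZIPOutputStream}
>
> // Simplified illustration of the leak described above. Each iteration opens
> // a GZIPInputStream, reads only the first byte (much as recovery reads only
> // the first offset), and abandons the stream without close(). The native
> // zlib buffers behind each stream are only freed by the finalizer, which
> // rarely runs while the heap is quiet, so off-heap memory keeps growing.
> object GzipLeakSketch {
>   def gzip(bytes: Array[Byte]): Array[Byte] = {
>     val buffer = new ByteArrayOutputStream()
>     val out = new GZIPOutputStream(buffer)
>     out.write(bytes)
>     out.close()
>     buffer.toByteArray
>   }
>
>   def main(args: Array[String]): Unit = {
>     val compressed = gzip(Array.fill[Byte](1024)(42.toByte))
>     var opened = 0L
>     while (true) {
>       val in = new GZIPInputStream(new ByteArrayInputStream(compressed))
>       in.read() // read the first byte only, then drop the stream unclosed
>       opened += 1
>       if (opened % 100000 == 0) println(s"$opened unclosed streams so far")
>     }
>   }
> }
> ```
>
> The point is simply that nothing ever frees the native buffers deterministically; it is all left to finalization.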
> I think (but haven't 100% verified) that it impacts all supported versions of Kafka (0.8 -> 0.10).
>
> Fixing this seems relatively annoying, but only because of some "small matters of coding", nothing hugely problematic.
>
> The main issue is that `deepIterator` only returns an `Iterator`, which doesn't have a `close()` method of any kind. We could create a new `CloseableIterator` trait and have it extend Java's `AutoCloseable` (https://docs.oracle.com/javase/7/docs/api/java/lang/AutoCloseable.html), then explicitly call `close()` everywhere we use a `deepIterator()` and don't always read to the end. Scala unfortunately doesn't have a built-in version of Java's `try-with-resources` statement, but we can call `close()` explicitly everywhere perfectly happily. (A rough sketch of what I mean is at the end of this description.)
>
> Another (but much more hacky) solution would be to always read to the end of the iterator in `LogSegment.recover`, but that seems pretty bad, using far more resources during recovery than are needed.
>
> I can't think of any other reasonable solutions for now, but would love to hear input from the community.
>
> We're happy to do the work of developing a patch, but thought we'd report the issue before starting down that path.
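> For the record, here is the rough sketch of that `CloseableIterator` idea mentioned above (names and signatures are placeholders for illustration, not an actual patch):
>
> ```scala
> // An iterator that also owns the resource backing it (for gzip'd message
> // sets, the GZIPInputStream), so call sites that stop reading early can
> // still release the native buffers deterministically.
> trait CloseableIterator[T] extends Iterator[T] with AutoCloseable
>
> object CloseableIteratorSketch {
>
>   // Pairs an existing iterator with the underlying resource it draws from.
>   def wrapping[T](underlying: Iterator[T], resource: AutoCloseable): CloseableIterator[T] =
>     new CloseableIterator[T] {
>       override def hasNext: Boolean = underlying.hasNext
>       override def next(): T = underlying.next()
>       override def close(): Unit = resource.close()
>     }
>
>   // A call site like LogSegment.recover, which only needs the first entry,
>   // would then close explicitly. Scala has no try-with-resources, but a
>   // plain try/finally does the job.
>   def firstEntry[T](iter: CloseableIterator[T]): Option[T] =
>     try {
>       if (iter.hasNext) Some(iter.next()) else None
>     } finally {
>       iter.close()
>     }
> }
> ```
>
> With something along those lines, `deepIterator` could hand back a `CloseableIterator`, and recovery could close it as soon as it has the offset it needs, whether or not it ever reaches the end of the stream.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)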