How did you arrive at the 10 GB JVM heap value? I'm running Kafka on 16 GB
RAM instances with ~4000 partitions each and assigning only 5 GB to the JVM,
of which Kafka only seems to be using ~2 GB at any given time.
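
For what it's worth, this is roughly how I check that -- the pgrep pattern
is just an assumption about how your broker process is launched, so treat
it as a sketch:

    # find the broker PID -- assumes the main class kafka.Kafka appears on the command line
    KAFKA_PID=$(pgrep -f kafka.Kafka)
    # heap / GC utilisation snapshot; jstat ships with the JDK
    jstat -gc "$KAFKA_PID"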

Also, I've set vm.max_map_count to 262144 -- I didn't use any formula to
estimate that, it must have been some answer I found online, but it's been
doing the trick -- no issues so far.

On Fri, Sep 27, 2019 at 11:29 AM Arpit Gogia <ar...@ixigo.com> wrote:

> Hello Kafka user group
>
>
> I am running a Kafka cluster with 3 brokers and have been experiencing
> frequent OutOfMemory errors, each time with a similar stack trace:
>
>
> java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
>     at kafka.log.AbstractIndex$$anonfun$resize$1.apply$mcZ$sp(AbstractIndex.scala:188)
>     at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:173)
>     at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:173)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>     at kafka.log.AbstractIndex.resize(AbstractIndex.scala:173)
>     at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcZ$sp(AbstractIndex.scala:242)
>     at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:242)
>     at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:242)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>     at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
>     at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501)
>     at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1635)
>     at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1635)
>     at scala.Option.foreach(Option.scala:257)
>     at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1635)
>     at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1599)
>     at kafka.log.Log.maybeHandleIOException(Log.scala:1996)
>     at kafka.log.Log.roll(Log.scala:1599)
>     at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1434)
>     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1429)
>     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1429)
>     at kafka.log.Log.maybeHandleIOException(Log.scala:1996)
>     at kafka.log.Log.deleteSegments(Log.scala:1429)
>     at kafka.log.Log.deleteOldSegments(Log.scala:1424)
>     at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1501)
>     at kafka.log.Log.deleteOldSegments(Log.scala:1492)
>     at kafka.log.LogCleaner$CleanerThread$$anonfun$cleanFilthiestLog$1.apply(LogCleaner.scala:328)
>     at kafka.log.LogCleaner$CleanerThread$$anonfun$cleanFilthiestLog$1.apply(LogCleaner.scala:324)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:324)
>     at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
>     ... 32 more
>
>
> Each broker has 16 GB of memory, of which 10 GB is allotted to the JVM as
> heap. The total partition count on each broker is approximately 2000, with
> an average partition size of 300 MB.
>
>
> After looking around, I found that increasing the OS-level memory-map
> area limit `vm.max_map_count` is a viable solution, since Kafka
> memory-maps segment index files while rolling over and the above stack
> trace indicates a failure in doing exactly that. Since then I have raised
> this limit every time a broker has gone down with this error. Currently I
> am at 250,000 on two brokers and 200,000 on one, which is very high
> considering the estimation formula mentioned at
> https://kafka.apache.org/documentation/#os. Most recently I started
> monitoring the memory-map count of the Kafka process on each broker
> (using /proc/<pid>/maps); below is the graph.
>
>
> [image: Screenshot 2019-09-27 at 12.02.38 PM.png]
>
>
> My concern is that this map count is on an overall increasing trend, with
> an average increase of 27.7K across brokers over the roughly 2 days of
> monitoring.
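>
> In case it is useful, this is roughly how I have been comparing the live
> count against the limit, and how I have been raising the limit (the pgrep
> pattern is an assumption about how the broker process appears on my
> hosts, and 262144 and the sysctl.d file name are just examples):
>
>     # current OS limit
>     sysctl vm.max_map_count
>
>     # live memory-map count of the broker process
>     BROKER_PID=$(pgrep -f kafka.Kafka)
>     wc -l < /proc/$BROKER_PID/maps
>
>     # raise the limit now, and persist it across reboots
>     sudo sysctl -w vm.max_map_count=262144
>     echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-kafka.conf
>
> For comparison, a back-of-the-envelope estimate for these brokers --
> assuming the default 1 GB segment size and two mmapped index files
> (.index and .timeindex) per segment -- comes to roughly 2000 partitions x
> 1 active segment x 2 = ~4000 maps, which is nowhere near the limits I am
> currently running with.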
>
>
> Following are my questions:
>
>    1. Will I have to keep incrementing `vm.max_map_count` till I arrive
>    at a stable value?
>    2. Could this by any chance indicate a memory leak? Maybe in the
>    subroutine that rolls over segment files?
>    3. Could the lack of page cache memory be a cause as well? The volume
>    of cached memory seems to remain consistent over time, so it doesn't
>    appear to be a suspect, but I am not ruling it out for now. As a
>    mitigation I will be decreasing the JVM heap next time so that more
>    memory is available for the page cache (a rough sketch of what I plan
>    to change is below).
>
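> What I have in mind is something like the following, though where to set
> it depends on how the broker is launched (I'm assuming the stock
> kafka-server-start.sh script here, and 6 GB is just an example value):
>
>     # shrink the heap so more RAM is left to the OS page cache
>     export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
>     bin/kafka-server-start.sh -daemon config/server.properties
>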
> --
>
> Arpit Gogia | Data Engineer
>
