Thanks for the input.
I do plan on reducing the heap for the next restart. Initially I'd thought
that too little heap memory could also be the cause, but I've since learned
that it cannot be the issue.
I also found the number 262144 online, but I believe those results were
related to Elasticsearch.
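
For reference, a minimal sketch of the kind of check I mean when I talk about
monitoring the map count below (assumes Linux; BROKER_PID is a placeholder for
the actual Kafka process id):

# Rough sketch: count the broker's mapped regions (one line per region in
# /proc/<pid>/maps) and compare against the vm.max_map_count ceiling.
BROKER_PID = 12345  # placeholder; substitute the real Kafka process id

with open("/proc/sys/vm/max_map_count") as f:
    map_limit = int(f.read().strip())

with open(f"/proc/{BROKER_PID}/maps") as f:
    mapped_regions = sum(1 for _ in f)

print(f"mapped regions: {mapped_regions} / {map_limit} "
      f"({mapped_regions / map_limit:.1%} of the limit)")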


On Fri, Sep 27, 2019 at 2:31 PM Karolis Pocius
<karolis.poc...@sentiance.com.invalid> wrote:

> How did you arrive at the 10 GB JVM heap value? I'm running Kafka on 16 GB
> RAM instances with ~4000 partitions each and only assigning 5 GB to the JVM,
> of which Kafka only seems to be using ~2 GB at any given time.
>
> Also, I've set vm.max_map_count to 262144 -- I didn't use any formula to
> estimate that, it must have been an answer I found online, but it's been
> doing the trick -- no issues so far.
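>
> In case it's useful, a minimal sketch of how that limit can be raised on a
> live host (assumes root on Linux; 262144 is just the value I used, not a
> tuned recommendation, and it has to be persisted separately via sysctl
> configuration to survive a reboot):
>
> NEW_LIMIT = 262144  # the value mentioned above, not a derived recommendation
>
> # Rough sketch: equivalent to `sysctl -w vm.max_map_count=262144`; needs root.
> # Add the setting to /etc/sysctl.conf or a file under /etc/sysctl.d/ as well.
> with open("/proc/sys/vm/max_map_count", "w") as f:
>     f.write(str(NEW_LIMIT))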
>
> On Fri, Sep 27, 2019 at 11:29 AM Arpit Gogia <ar...@ixigo.com> wrote:
>
> > Hello Kafka user group
> >
> >
> > I am running a Kafka cluster with 3 brokers and have been experiencing
> > frequent OutOfMemory errors each time with similar error stack trace
> >
> >
> > java.io.IOException: Map failed
> >     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
> >     at kafka.log.AbstractIndex$$anonfun$resize$1.apply$mcZ$sp(AbstractIndex.scala:188)
> >     at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:173)
> >     at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:173)
> >     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> >     at kafka.log.AbstractIndex.resize(AbstractIndex.scala:173)
> >     at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcZ$sp(AbstractIndex.scala:242)
> >     at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:242)
> >     at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:242)
> >     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> >     at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
> >     at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501)
> >     at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1635)
> >     at kafka.log.Log$$anonfun$roll$2$$anonfun$apply$32.apply(Log.scala:1635)
> >     at scala.Option.foreach(Option.scala:257)
> >     at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1635)
> >     at kafka.log.Log$$anonfun$roll$2.apply(Log.scala:1599)
> >     at kafka.log.Log.maybeHandleIOException(Log.scala:1996)
> >     at kafka.log.Log.roll(Log.scala:1599)
> >     at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1434)
> >     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1429)
> >     at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1429)
> >     at kafka.log.Log.maybeHandleIOException(Log.scala:1996)
> >     at kafka.log.Log.deleteSegments(Log.scala:1429)
> >     at kafka.log.Log.deleteOldSegments(Log.scala:1424)
> >     at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1501)
> >     at kafka.log.Log.deleteOldSegments(Log.scala:1492)
> >     at kafka.log.LogCleaner$CleanerThread$$anonfun$cleanFilthiestLog$1.apply(LogCleaner.scala:328)
> >     at kafka.log.LogCleaner$CleanerThread$$anonfun$cleanFilthiestLog$1.apply(LogCleaner.scala:324)
> >     at scala.collection.immutable.List.foreach(List.scala:392)
> >     at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:324)
> >     at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
> >     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> > Caused by: java.lang.OutOfMemoryError: Map failed
> >     at sun.nio.ch.FileChannelImpl.map0(Native Method)
> >     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
> >     ... 32 more
> >
> >
> > Each broker has 16 GB of memory, of which 10 GB is allotted to the JVM as
> > heap. The total partition count on each broker is approximately 2000, with
> > an average partition size of 300 MB.
> >
> >
> > After looking around, I found that increasing the OS-level memory map
> > area limit `vm.max_map_count` is a viable solution, since Kafka memory-maps
> > segment files while rolling over and the above stack trace indicates a
> > failure in doing exactly that. Since then I have increased this limit every
> > time a broker goes down with this error. Currently I am at 250,000 on two
> > brokers and 200,000 on one, which is very high considering the estimation
> > formula mentioned at https://kafka.apache.org/documentation/#os. Most
> > recently I started monitoring the memory-mapped file count (via
> > /proc/<pid>/maps) of the Kafka process on each broker; the graph is below.
> >
> >
> > [image: Screenshot 2019-09-27 at 12.02.38 PM.png]
> >
> >
> > My concern is that this value is on an overall increasing trend, with an
> > average increase of 27.7K across brokers over the roughly 2 days of
> > monitoring.
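> >
> > If I'm reading the estimation guidance at the documentation link above
> > correctly, it comes down to roughly two map areas per log segment (offset
> > index plus time index), so a back-of-the-envelope sketch with the numbers
> > above and the default 1 GiB segment size puts the expected steady-state
> > count nowhere near the current limits:
> >
> > # Back-of-the-envelope sketch, assuming ~2 map areas per log segment and
> > # the default 1 GiB segment size; partition figures are the ones quoted above.
> > partitions = 2000
> > avg_partition_bytes = 300 * 1024**2  # ~300 MB average partition size
> > segment_bytes = 1024**3              # default log.segment.bytes (1 GiB)
> >
> > segments_per_partition = max(1, -(-avg_partition_bytes // segment_bytes))  # ceiling
> > estimated_map_areas = partitions * segments_per_partition * 2
> > print(f"estimated map areas: ~{estimated_map_areas}")  # ~4,000 vs limits of 200,000-250,000
> >
> > That gap is part of why the upward trend in the graph worries me.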
> >
> >
> > Following are my questions:
> >
> >    1. Will I have to keep incrementing `vm.max_map_count` until I arrive
> >    at a stable value?
> >    2. Could this by any chance indicate a memory leak? Maybe in the
> >    subroutine that rolls over segment files?
> >    3. Could a lack of page cache memory be a cause as well? The volume of
> >    cached memory seems to remain consistent over time, so it doesn't appear
> >    to be a suspect, but I am not ruling it out for now (see the sketch after
> >    this list). As a mitigation I will be decreasing the JVM heap next time
> >    so that more memory is available for the page cache.
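> >
> > Regarding point 3, the sketch below is one simple way to keep an eye on the
> > cached-memory volume, just by reading /proc/meminfo (assumes Linux; values
> > in that file are reported in kB, and this is a host-level view, not per
> > process):
> >
> > # Rough sketch: sample page-cache related figures from /proc/meminfo.
> > meminfo = {}
> > with open("/proc/meminfo") as f:
> >     for line in f:
> >         key, value = line.split(":", 1)
> >         meminfo[key] = int(value.split()[0])  # value is in kB; drop the unit
> >
> > for key in ("MemTotal", "MemAvailable", "Cached", "Buffers"):
> >     print(f"{key}: {meminfo[key] // 1024} MB")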
> >
> > --
> >
> > Arpit Gogia | Data Engineer
> >
>


-- 

Arpit Gogia | Software Engineer

+91 9650232753
