If memory serves, "java.lang.OutOfMemoryError: Map failed" has usually meant "out of address space for mmap".
If you sum the length of all the .index files while the service is running (not after it has stopped), do they sum to something really close to 2GB? If so, it is likely either that the OS/arch is 32-bit (which on Slack you said it wasn't) or possibly that the JVM is in 32-bit mode. If you want to debug, the easiest test would be a simple program that does something like this (note two corrections to my earlier sketch: the int expression 2*1024*1024*1024 overflows in Java, and a single FileChannel.map call is capped at Integer.MAX_VALUE bytes, so we map just under 2GB per file):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapTest {
        public static void main(String[] args) throws Exception {
            // Just under 2GB: FileChannel.map rejects sizes above
            // Integer.MAX_VALUE, and 2*1024*1024*1024 overflows an int.
            long size = Integer.MAX_VALUE;
            RandomAccessFile raf1 = new RandomAccessFile("test-file-1", "rw");
            RandomAccessFile raf2 = new RandomAccessFile("test-file-2", "rw");
            raf1.setLength(size);
            raf2.setLength(size);
            // Two ~2GB mappings: ~4GB of address space in total, which
            // cannot fit in a 32-bit process.
            MappedByteBuffer b1 = raf1.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
            MappedByteBuffer b2 = raf2.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

If you compile this and run it with the same options you're running Kafka with, it should succeed; if it fails with the same error, that is the 32-bit address-space limit kicking in.

-Jay

On Wed, May 13, 2015 at 4:24 PM, Jeff Field <jvfi...@blizzard.com> wrote:

> Hello,
> We are doing a Kafka POC on our CDH cluster. We are running 3 brokers with
> 24TB (48TB raw) of available RAID10 storage (XFS filesystem mounted with
> nobarrier/largeio; HP Smart Array P420i controller, latest firmware) and
> 48GB of RAM. The broker is running with "-Xmx4G -Xms4G -server
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
> -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC". This is on RHEL 6.6
> with the 2.6.32-504.8.1.el6.x86_64 kernel. JDK is jdk1.7.0_67 64-bit. We
> were using the 1.2.0 version of the Cloudera Kafka 0.8.2.0 build. We are
> upgrading to 1.3.0 after the RAID testing, but none of the fixes included
> in 1.3.0 seem to be related to what we're seeing.
>
> We are using a custom producer to push copies of real messages from our
> existing messaging system onto Kafka in order to test ingestion rates and
> compression ratios.
> After a couple of hours (during which we pushed about 4.3 billion
> messages, ~2.2 terabytes before replication), one of our brokers will
> fail with an I/O error (2 slightly different ones so far) followed by a
> memory error. We're currently doing stress testing on the arrays
> (write/verify with IOzone set for 24 threads), but assuming the tests
> don't find anything on IO, what could cause this? Errors are included
> below.
>
> Thanks,
> -Jeff
>
> Occurrence 1:
> 2015-05-12 03:55:08,291 FATAL kafka.server.KafkaApis: [KafkaApi-834]
> Halting due to unrecoverable I/O error while handling produce request:
> kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-1'
>     at kafka.log.Log.append(Log.scala:266)
>     at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
>     at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
>     at kafka.utils.Utils$.inLock(Utils.scala:561)
>     at kafka.utils.Utils$.inReadLock(Utils.scala:567)
>     at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
>     at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
>     at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:282)
>     at kafka.server.KafkaApis.handleProducerOrOffsetCommitRequest(KafkaApis.scala:204)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
>     at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
>     at kafka.log.LogSegment.<init>(LogSegment.scala:57)
>     at kafka.log.Log.roll(Log.scala:565)
>     at kafka.log.Log.maybeRoll(Log.scala:539)
>     at kafka.log.Log.append(Log.scala:306)
>     ... 21 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
>     ... 26 more
>
> Occurrence 2:
> 2015-05-12 20:08:15,052 FATAL kafka.server.KafkaApis: [KafkaApi-835]
> Halting due to unrecoverable I/O error while handling produce request:
> kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-23'
>     at kafka.log.Log.append(Log.scala:266)
>     at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
>     at kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
>     at kafka.utils.Utils$.inLock(Utils.scala:561)
>     at kafka.utils.Utils$.inReadLock(Utils.scala:567)
>     at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
>     at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
>     at kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:282)
>     at kafka.server.KafkaApis.handleProducerOrOffsetCommitRequest(KafkaApis.scala:204)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
>     at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
>     at kafka.log.LogSegment.<init>(LogSegment.scala:57)
>     at kafka.log.Log.roll(Log.scala:565)
>     at kafka.log.Log.maybeRoll(Log.scala:539)
>     at kafka.log.Log.append(Log.scala:306)
>     ... 21 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
>     ... 26 more
>
> Occurrence 3:
> 2015-05-13 01:11:14,626 FATAL kafka.server.ReplicaFetcherThread:
> [ReplicaFetcherThread-0-835], Disk error while replicating data.
> kafka.common.KafkaStorageException: I/O exception in append to log 'TEST_TOPIC-17'
>     at kafka.log.Log.append(Log.scala:266)
>     at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:54)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
>     at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>     at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>     at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
>     at kafka.utils.Utils$.inLock(Utils.scala:561)
>     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
>     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> Caused by: java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888)
>     at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:74)
>     at kafka.log.LogSegment.<init>(LogSegment.scala:57)
>     at kafka.log.Log.roll(Log.scala:565)
>     at kafka.log.Log.maybeRoll(Log.scala:539)
>     at kafka.log.Log.append(Log.scala:306)
>     ... 13 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885)
>     ... 18 more
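[Editor's note: Jay's first check above — summing the lengths of all .index files under the broker's log directories — can be sketched as a small standalone Java program. The class and method names below are illustrative, not anything Kafka ships; point it at whatever path is configured in log.dirs.]

```java
import java.io.File;

// Sketch: recursively sum the sizes of all .index files under a Kafka
// log directory, to compare the total against the ~2GB 32-bit limit.
public class IndexSize {
    static long sumIndexBytes(File dir) {
        long total = 0;
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0; // not a directory, or unreadable
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                total += sumIndexBytes(f); // descend into partition dirs
            } else if (f.getName().endsWith(".index")) {
                total += f.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Pass the log.dirs path, e.g. java IndexSize /var/local/kafka/data
        System.out.println("total .index bytes: " + sumIndexBytes(new File(args[0])));
    }
}
```

Because Kafka maps every open .index file into memory, this total is a rough lower bound on the mmap address space the broker needs; a figure near 2GB while the error occurs would support the 32-bit theory.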