I had only one topic with 45 partitions, replicated across 3 brokers. After several hours of uploading data to Kafka, one broker died with the following exception. I guess I can fix it by raising the limit for open files, but I wonder how it happened under the described circumstances.
[2013-11-02 00:19:14,862] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2013-11-02 00:19:14,706] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2013-11-02 00:19:05,150] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2013-11-02 00:09:08,569] FATAL [ReplicaFetcherThread-0-2], Disk error while replicating data. (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log 'perf1-4'
        at kafka.log.Log.append(Unknown Source)
        at kafka.server.ReplicaFetcherThread.processPartitionData(Unknown Source)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown Source)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(Unknown Source)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(Unknown Source)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown Source)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(Unknown Source)
        at kafka.utils.Utils$.inLock(Unknown Source)
        at kafka.server.AbstractFetcherThread.processFetchRequest(Unknown Source)
        at kafka.server.AbstractFetcherThread.doWork(Unknown Source)
        at kafka.utils.ShutdownableThread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: /disk1/kafka-logs/perf1-4/00000000000000010558.index (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(Unknown Source)
        at kafka.utils.Utils$.inLock(Unknown Source)
        at kafka.log.OffsetIndex.resize(Unknown Source)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(Unknown Source)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown Source)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(Unknown Source)
        at kafka.utils.Utils$.inLock(Unknown Source)
        at kafka.log.OffsetIndex.trimToValidSize(Unknown Source)
        at kafka.log.Log.roll(Unknown Source)
        at kafka.log.Log.maybeRoll(Unknown Source)
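
To sanity-check whether the open-file limit is plausibly the cause, here is a rough Python sketch that counts the segment files (`.log` and `.index`) under the broker's log directory and compares that count with the soft open-file limit. It rests on a few assumptions: as far as I understand, the broker holds a handle open for every segment file it hosts (sockets come on top of that); the default directory `/disk1/kafka-logs` is taken from the trace above; and the limit reported is that of the process running the script, so it only matches the broker's limit if run as the same user under the same settings. The function names are made up for illustration.

```python
import os
import resource
import sys


def count_segment_files(log_dir):
    """Count .log and .index files anywhere under log_dir."""
    count = 0
    for _root, _dirs, files in os.walk(log_dir):
        count += sum(1 for name in files if name.endswith((".log", ".index")))
    return count


def open_file_limit():
    """Return the soft RLIMIT_NOFILE of the *current* process (not the broker)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # Assumed default path, copied from the stack trace; pass another as argv[1].
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/disk1/kafka-logs"
    segments = count_segment_files(log_dir)
    limit = open_file_limit()
    print("segment files under %s: %d" % (log_dir, segments))
    print("soft open-file limit of this process: %d" % limit)
    if limit != resource.RLIM_INFINITY and segments >= limit:
        print("segment files alone meet or exceed the limit")
```

If the segment count is far below the limit, the remaining descriptors are probably sockets or something else, and the segment size and retention settings are less likely to be the explanation.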