Hi Madhukar,

Thanks for your quick response. The path is "/tmp/kafka-logs/". But the servers have not been restarted recently; the uptime for all three servers is almost 67 days.

Regards,
Rahul Misra
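[One possibility consistent with 67 days of uptime: on many Linux distributions, /tmp is pruned periodically even without a reboot (tmpwatch from cron on RHEL/CentOS-style systems, systemd-tmpfiles on newer ones), which can delete idle segment and .index files out from under a running broker. A quick way to check, with illustrative paths that vary by distro:

    # Look for a cron-driven tmpwatch job that prunes /tmp
    cat /etc/cron.daily/tmpwatch

    # On systemd-based hosts, check the /tmp aging rules instead
    grep -r '/tmp' /etc/tmpfiles.d/ /usr/lib/tmpfiles.d/ 2>/dev/null

If either reports an age limit on /tmp, files in /tmp/kafka-logs can disappear between restarts.]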
-----Original Message-----
From: Madhukar Bharti [mailto:bhartimadhu...@gmail.com]
Sent: Wednesday, June 22, 2016 8:37 PM
To: users@kafka.apache.org
Subject: Re: Kafka broker crash

Hi Rahul,

Is the path "/tmp/kafka-logs/" or "/temp/kafka-logs"?

If the path is set under "/tmp/", the files may be deleted when the machine restarts, which is what causes the FileNotFoundException. You can change the log location to some other path and restart all the brokers. This might fix the issue.

Regards,
Madhukar

On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul <rahul.mi...@altisource.com> wrote:
> Hi,
>
> I'm facing a strange issue in my Kafka cluster. Could anybody please help me with it? The issue is as follows:
>
> We have a 3-node Kafka cluster. We installed ZooKeeper separately and pointed the brokers to it. The ZooKeeper ensemble is also 3 nodes, but for our POC setup the ZooKeeper nodes are on the same machines as the Kafka brokers.
>
> While receiving messages from an existing topic using a new groupId, 2 of the brokers crashed with the same FATAL errors:
>
> --------------------------------------------------------
> <<<<<<<<<<<<<---- [server 2 logs] ---->>>>>>>>>>>>>>>
>
> [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment received from leader for group pocTestNew11 for generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
>     at kafka.log.Log.append(Log.scala:318)
>     at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>     at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>     at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>     at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>     at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
>     at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
>     at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
>     at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
>     at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>     at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>     at scala.Option.foreach(Option.scala:257)
>     at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
>     at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
>     at java.io.RandomAccessFile.open0(Native Method)
>     at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>     at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>     at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>     at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>     at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
>     at kafka.log.Log.roll(Log.scala:627)
>     at kafka.log.Log.maybeRoll(Log.scala:602)
>     at kafka.log.Log.append(Log.scala:357)
>
> ----------------------------------------------
> <<<<<<<<<<<<<---- [server 3 logs] ---->>>>>>>>>>>>>>>
>
> [2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data. (kafka.server.ReplicaFetcherThread)
> kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
>     at kafka.log.Log.append(Log.scala:318)
>     at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
>     at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:138)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:122)
>     at scala.Option.foreach(Option.scala:257)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:122)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
>     at java.io.RandomAccessFile.open0(Native Method)
>     at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>     at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>     at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>     at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>     at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>     at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
>     at kafka.log.Log.roll(Log.scala:627)
>     at kafka.log.Log.maybeRoll(Log.scala:602)
>     at kafka.log.Log.append(Log.scala:357)
>     ... 19 more
>
>
> For the "__consumer_offsets" topic, which is used to store committed consumer offsets, the default number of partitions is 50 and the replication factor is 3.
> So ideally all three brokers should have logs for all partitions of "__consumer_offsets".
> I checked the "/temp/kafka-logs" directory on each server, and except for broker 1, the other two brokers (servers 2 and 3) do not contain replicas for all of the "__consumer_offsets" partitions. Log directories are missing for many "__consumer_offsets" partitions on brokers 2 and 3 (including partition 4, which resulted in the crash above).
>
> What could be the cause of this crash? Is there any broker misconfiguration that could cause this?
>
> Regards,
> Rahul Misra
>
> Technical Lead
> Altisource(tm)
> Mobile: 9886141541 | Ext: 298269
> rahul.mi...@altisource.com | www.Altisource.com
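[For anyone hitting the same FileNotFoundException: the change Madhukar suggests above amounts to pointing log.dirs in each broker's server.properties at a persistent location. A minimal sketch; the target directory below is only an example, so pick any path on a persistent volume:

    # server.properties, on each broker
    # Keep Kafka data off /tmp so reboots and tmp cleaners cannot
    # delete segment/index files out from under the broker
    log.dirs=/var/lib/kafka-logs

After changing it, stop each broker, move the existing contents of the old log directory to the new one (or let replication rebuild the missing replicas), and restart.]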
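[To compare the intended replica assignment with what is actually on disk, the stock topic tool can describe __consumer_offsets; this uses the ZooKeeper-based CLI of this Kafka generation, and "localhost:2181" stands in for one of your ZooKeeper nodes:

    # Shows partition count, replication factor, leader, replicas and ISR
    # for every partition of the offsets topic
    bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic __consumer_offsets

Partitions whose ISR list is shorter than their replica list identify the brokers that have lost their local copies.]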