Hi All,

We are running Kafka version 0.9.0.1. At the time the brokers crashed
yesterday we were running a 2-node cluster; this has since been increased
to 3.

We are not specifying a broker id and are relying on Kafka generating one.
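
For reference, the relevant part of our server.properties is roughly as
below (reproduced from memory, so treat the exact lines as an assumption;
the last two values are just the 0.9.x defaults written out explicitly).
With broker.id unset, auto-generation assigns IDs above
reserved.broker.max.id, which is why we see 1001, 1002, 1003, ...

# broker.id intentionally not set; auto-generation is on by default
broker.id.generation.enable=true
# generated IDs start above this value (default 1000)
reserved.broker.max.id=1000
log.dirs=/tmp/kafka-logs
zookeeper.connect=<our zookeeper quorum>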

After the brokers crashed (at exactly the same time) we left Kafka stopped
for a while. When Kafka was started back up, the broker IDs on both
servers had been incremented: they had been 1001/1002 and they flipped to
1003/1004. This seemed to cause some problems, as partitions were still
assigned to broker IDs which the cluster believed had disappeared, and so
they were not recoverable.
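
For what it's worth, this is roughly how we looked at the assignments
(host name is a placeholder, and the exact output is from memory):

bin/kafka-topics.sh --zookeeper <zk-host>:2181 --describe

The Replicas/Isr columns still showed the old 1001/1002 IDs even though the
running brokers had come up as 1003/1004.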

We noticed that the broker IDs are actually stored in:

/tmp/kafka-logs/meta.properties

So we set these back to what they were and restarted. Is there a reason why
these would change?
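
For reference, the file we edited on each broker looks roughly like this
(the version line is from memory, so treat it as an assumption; broker.id
is the value we set back):

version=0
broker.id=1002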

Below are the error logs from each server:

Server 1

[2016-05-25 09:05:52,827] INFO [ReplicaFetcherManager on broker 1002]
Removed fetcher for partitions [Topic1Heartbeat,1]
(kafka.server.ReplicaFetcherManager)
[2016-05-25 09:05:52,831] INFO Completed load of log Topic1Heartbeat-1 with
log end offset 0 (kafka.log.Log)
[2016-05-25 09:05:52,831] INFO Created log for partition
[Topic1Heartbeat,1] in /tmp/kafka-logs with properties {compression.type ->
producer, file.delete.delay.ms -> 60000, max.message.bytes -> 1000012,
min.insync.replicas -> 1, segment.jitter.ms -> 0, preallocate -> false,
min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096,
unclean.leader.election.enable -> true, retention.bytes -> -1,
delete.retention.ms -> 86400000, cleanup.policy -> delete,
flush.ms -> 9223372036854775807, segment.ms -> 604800000,
segment.bytes -> 1073741824, retention.ms -> 604800000,
segment.index.bytes -> 10485760, flush.messages -> 9223372036854775807}.
(kafka.log.LogManager)
[2016-05-25 09:05:52,831] INFO Partition [Topic1Heartbeat,1] on broker
1002: No checkpointed highwatermark is found for partition
[Topic1Heartbeat,1] (kafka.cluster.Partition)
[2016-05-25 09:14:12,189] INFO [GroupCoordinator 1002]: Preparing to
restabilize group Topic1 with old generation 0
(kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,190] INFO [GroupCoordinator 1002]: Stabilized group
Topic1 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,195] INFO [GroupCoordinator 1002]: Assignment received
from leader for group Topic1 for generation 1
(kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,749] FATAL [Replica Manager on Broker 1002]: Halting
due to unrecoverable I/O error while handling produce request:
 (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-0'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at scala.Option.foreach(Option.scala:257)
        at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
        at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-0/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)
        ... 23 more


Server 2


[2016-05-25 09:14:18,968] INFO [Group Metadata Manager on Broker 1001]:
Loading offsets and group metadata from [__consumer_offsets,16]
(kafka.coordinator.GroupMetadataManager)
[2016-05-25 09:14:19,004] INFO New leader is 1001
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-05-25 09:14:19,054] FATAL [Replica Manager on Broker 1001]: Halting
due to unrecoverable I/O error while handling produce request:
 (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-0'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at scala.Option.foreach(Option.scala:257)
        at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
        at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-0/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)
        ... 23 more


This happened on 2 different servers, so I find it hard to believe they
both had I/O problems at exactly the same time. Does anyone have any idea
what might have happened?
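
In case it helps with diagnosis, this is how we have been checking which
broker IDs ZooKeeper currently has registered (interactive session; the
host is a placeholder):

bin/zookeeper-shell.sh <zk-host>:2181
ls /brokers/ids
get /brokers/ids/1001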

Thanks!
