Two of our Kafka brokers (1 and 5) were rebooted when their hypervisor went down, and I think we hit an issue similar to the one discussed in the thread "Problem with node after restart no partitions?". The resulting JIRA <https://issues.apache.org/jira/browse/KAFKA-2108> was closed without conclusions or
recovery steps.

Brokers 1 and 5 were also running ZooKeeper for our cluster (along with broker 2).
We are running Kafka version 0.8.2.1.

After doing controlled restarts of all brokers a few times, our cluster seems OK now.

But there are some topics that have replicas out of sync with their leaders.

Partition 2 below has leader 5, and its replica order should be 5,1. Instead it is listed as 1,5, with only broker 5 in the ISR:


Topic:2015-01-12        PartitionCount:3 ReplicationFactor:2     Configs:
Topic: 2015-01-12 Partition: 0 Leader: 4 Replicas: 4,3 Isr: 3,4
Topic: 2015-01-12 Partition: 1 Leader: 0 Replicas: 0,4 Isr: 0,4
Topic: 2015-01-12 Partition: 2 Leader: 5 Replicas: 1,5 Isr: 5
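
To spot partitions like this across many topics, the describe output can be checked mechanically. The following is only a rough sketch: the function names and the regular expression are my own, and it assumes the output keeps the "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." layout shown above. It flags partitions whose ISR is missing replicas and partitions whose leader is not the first (preferred) replica.

```python
import re

# Partition lines copied from the describe output above.
DESCRIBE_OUTPUT = """\
Topic: 2015-01-12 Partition: 0 Leader: 4 Replicas: 4,3 Isr: 3,4
Topic: 2015-01-12 Partition: 1 Leader: 0 Replicas: 0,4 Isr: 0,4
Topic: 2015-01-12 Partition: 2 Leader: 5 Replicas: 1,5 Isr: 5
"""

# Assumed layout of one partition entry; adjust if your output differs.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(csv):
    """Turn '1,5' into [1, 5]; an empty string becomes an empty list."""
    return [int(x) for x in csv.split(",") if x]


def parse_describe(text):
    """Return one dict per partition entry found in the text."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append({
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "leader": int(m.group("leader")),
            "replicas": _ids(m.group("replicas")),
            "isr": _ids(m.group("isr")),
        })
    return rows


def under_replicated(rows):
    """Partitions where at least one assigned replica is not in the ISR."""
    return [r for r in rows if set(r["isr"]) != set(r["replicas"])]


def non_preferred_leader(rows):
    """Partitions whose leader is not the first replica in the list."""
    return [r for r in rows if r["replicas"] and r["leader"] != r["replicas"][0]]


rows = parse_describe(DESCRIBE_OUTPUT)
for r in under_replicated(rows):
    missing = sorted(set(r["replicas"]) - set(r["isr"]))
    print("under-replicated:", r["topic"], r["partition"], "missing", missing)
for r in non_preferred_leader(rows):
    print("non-preferred leader:", r["topic"], r["partition"], "leader", r["leader"])
```

For the output above, only partition 2 is reported by either check.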


I tried reassigning the replicas of partition 2 to broker 5 (the leader) and broker 0.

The partition reassignment has now been stuck for more than a day.
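
For reference, a reassignment file for this change would have roughly the following shape. This is a sketch that rebuilds the JSON from the description above (topic 2015-01-12, partition 2, replicas 5 and 0), not a copy of the actual file; the helper name build_reassignment is made up for illustration.

```python
import json


def build_reassignment(topic, partition, replicas):
    """Build the JSON structure read by kafka-reassign-partitions.sh."""
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
        ],
    }


# Values taken from the description above; the first replica listed
# is the preferred leader.
plan = build_reassignment("2015-01-12", 2, [5, 0])
print(json.dumps(plan))
```

The printed JSON can be saved to a file and passed via --reassignment-json-file.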


%) /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper kafka-trgt05:2182 --reassignment-json-file 2015-01-12_2.json --verify
Status of partition reassignment:
Reassignment of partition [2015-01-12,2] is still in progress

And in ZooKeeper, /admin/reassign_partitions is empty:

[zk: kafka-trgt05:2182(CONNECTED) 2] ls /admin/reassign_partitions
[]



Any thoughts on how to recover from this scenario?



Cheers,
/Manish








Our server.properties:

broker.id=0
port=9192
num.network.threads=12
num.io.threads=12
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
queued.max.requests=16
auto.leader.rebalance.enable=true
controlled.shutdown.enable=true
controlled.shutdown.retry.backoff.ms=30000
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
controller.socket.timeout.ms=30000
controller.message.queue.size=10000
log.dirs=/opt/kafka/data/logs
num.partitions=5
default.replication.factor=2
delete.topic.enable=true
num.replica.fetchers=8
replica.fetch.max.bytes=1048576
replica.fetch.wait.max.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=1048576
replica.lag.time.max.ms=10000
replica.lag.max.messages=4000
replica.fetch.min.bytes=10240
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=72
log.segment.bytes=536870912
log.retention.check.interval.ms=60000
log.cleaner.enable=true
zookeeper.connect=kafka-trgt05:2182,kafka-trgt01:2182,kafka-trgt02:2182
zookeeper.connection.timeout.ms=1000000
