Hi,

We have a Kafka setup with 6 brokers; our topics have a replication factor 
of 3 (a single partition each).

After an improper shutdown, we ended up with corrupted index files on two of 
our production servers, causing "WARN Found a corrupted index file due to 
requirement failed: Corrupt index found," messages, and Kafka shut down on 
startup with a "FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)" 
message.
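
(As an aside: a commonly suggested remedy for this warning is to stop the 
broker and delete only the index files, letting Kafka rebuild them from the 
.log segments on startup. A minimal sketch, assuming the log dirs live under 
/var/lib/kafka, which is just a placeholder for whatever log.dirs points to:

  # Run only with the broker stopped; the .log segments are left untouched,
  # Kafka rebuilds the .index/.timeindex files from them on the next startup
  find /var/lib/kafka -name '*.index' -delete
  find /var/lib/kafka -name '*.timeindex' -delete

We went a different way instead; see below.)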

All topics are still accessible, but unfortunately the most important one has 
only a single ISR left.
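
For reference, this is how we check the ISR state (the ZooKeeper address is 
a placeholder for ours):

  # Shows Leader, Replicas and Isr per partition for the topic
  kafka-topics.sh --zookeeper zk1:2181 --describe --topic <topic>
  # Lists every partition whose ISR is smaller than its replica list
  kafka-topics.sh --zookeeper zk1:2181 --describe --under-replicated-partitions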

We decided to clear all Kafka data and restart the brokers, believing they 
would fetch everything back from the leader and become in-sync again, but on 
startup we see the following messages in the log, repeating at an alarming rate:
WARN [ReplicaFetcherThread-0-3]: Replica 4 for partition <topic>-0 reset its 
fetch offset from 0 to current leader 3's start offset 0 
(kafka.server.ReplicaFetcherThread)
ERROR [ReplicaFetcherThread-0-3]: Current offset 0 for partition [<topic>,0] 
out of range; reset offset to 0 (kafka.server.ReplicaFetcherThread)

This looks to me like a problem similar to KAFKA-6003: 
https://issues.apache.org/jira/browse/KAFKA-6003

While trying to reassign a topic that had lost one of its ISRs (I kept the 
existing ISRs, but removed the failed broker from the replica list and added 
another existing broker), we got the same messages on that existing broker:

[2017-11-08 16:21:30,893] WARN [ReplicaFetcherThread-0-1]: Replica 5 for 
partition <topic>-0 reset its fetch offset from 0 to current leader 1's start 
offset 0 (kafka.server.ReplicaFetcherThread)
[2017-11-08 16:21:30,893] ERROR [ReplicaFetcherThread-0-1]: Current offset 0 
for partition [<topic>,0] out of range; reset offset to 0 
(kafka.server.ReplicaFetcherThread)
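
For reference, the reassignment itself was done with the stock tool along 
these lines (the ZooKeeper address and broker ids are placeholders; broker 5 
stands in for the one added to replace the failed broker):

  # reassign.json: keep the surviving replicas, swap the failed broker for 5
  # {"version":1,"partitions":[{"topic":"<topic>","partition":0,"replicas":[1,2,5]}]}
  kafka-reassign-partitions.sh --zookeeper zk1:2181 \
    --reassignment-json-file reassign.json --execute
  # Check progress afterwards:
  kafka-reassign-partitions.sh --zookeeper zk1:2181 \
    --reassignment-json-file reassign.json --verify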

This is even more troublesome, since I don't want to shut down that broker as 
well, and it is generating about 800 MB of logs per hour (fortunately only 
about 100 MB compressed).

Does anybody have a clue what's going on and how to fix it?
If the fix in 0.11.0.2 would solve our issue, how soon can we expect that 
release (if at all)?

Thanks,

Luc
