Franco Bonazza created KAFKA-6933:
-------------------------------------

             Summary: Broker reports Corrupted index warnings apparently 
infinitely
                 Key: KAFKA-6933
                 URL: https://issues.apache.org/jira/browse/KAFKA-6933
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 1.0.1
            Reporter: Franco Bonazza


I'm running into a situation where the server logs show continuously the 
following snippet:
{noformat}
[2018-05-23 10:58:56,590] INFO Loading producer state from offset 20601420 for 
partition transaction_r10_updates-6 with message format version 2 
(kafka.log.Log)
[2018-05-23 10:58:56,592] INFO Loading producer state from snapshot file 
'/data/0/kafka-logs/transaction_r10_updates-6/00000000000020601420.snapshot' 
for partition transaction_r10_updates-6 (kafka.log.ProducerStateManager)
[2018-05-23 10:58:56,593] INFO Completed load of log transaction_r10_updates-6 
with 74 log segments, log start offset 0 and log end offset 20601420 in 5823 ms 
(kafka.log.Log)
[2018-05-23 10:58:58,761] WARN Found a corrupted index file due to requirement 
failed: Corrupt index found, index file 
(/data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.index) has 
non-zero size but the last offset is 20544956 which is no larger than the base 
offset 20544956.}. deleting 
/data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.timeindex, 
/data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.index, and 
/data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.txnindex and 
rebuilding index... (kafka.log.Log)
[2018-05-23 10:58:58,763] INFO Loading producer state from snapshot file 
'/data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.snapshot' 
for partition transaction_r10_updates-15 (kafka.log.ProducerStateManager)
[2018-05-23 10:59:02,202] INFO Recovering unflushed segment 20544956 in log 
transaction_r10_updates-15. (kafka.log.Log){noformat}
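For context on the warning (my understanding, not confirmed from the broker source): Kafka's offset index stores fixed 8-byte entries, a 4-byte offset relative to the segment's base offset followed by a 4-byte file position, and the sanity check fires when a non-empty index file's last absolute offset is not larger than the base offset. A rough sketch of that condition, assuming this layout:

```python
import struct

ENTRY_SIZE = 8  # assumed layout: 4-byte relative offset + 4-byte file position


def index_is_corrupt(index_bytes: bytes, base_offset: int) -> bool:
    """Mimic the sanity check: a non-empty index whose last entry's
    absolute offset is not larger than the base offset is flagged corrupt."""
    if len(index_bytes) == 0:
        return False  # an empty index passes the check
    relative_offset, _position = struct.unpack(">ii", index_bytes[-ENTRY_SIZE:])
    last_offset = base_offset + relative_offset
    return last_offset <= base_offset


# A last entry with relative offset 0 reproduces the reported condition:
# "last offset is 20544956 which is no larger than the base offset 20544956".
corrupt_entry = struct.pack(">ii", 0, 4096)
print(index_is_corrupt(corrupt_entry, 20544956))  # True
```

This would explain why deleting the .index files only defers the problem: the index gets rebuilt, and if the rebuild again ends with a zero relative offset, the same check trips on the next load.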
The setup is the following:

Broker is 1.0.1

There are mirrors from another cluster using client 0.10.2.1 

There are Kafka Streams applications and other custom consumers/producers 
using the 1.0.0 client.

 

While it is doing this, the broker JVM is up but unresponsive, so it's 
impossible to produce, consume, or run any commands.

If I delete all the index files, the WARN turns into an ERROR, which takes a 
long time to work through (one day, the last time I tried), but eventually 
the broker reaches a healthy state. At that point I start the producers and 
things stay healthy, but as soon as I start the consumers it quickly falls 
back into the original WARN loop, which appears to be infinite.

 

I couldn't find any references to this problem. At the very least the broker 
seems to be mis-reporting the issue, and perhaps the loop isn't actually 
infinite? I let it cycle over the WARN for more than a day and it never moved 
past it; if something is genuinely wrong with the state, it should be 
reported as such.

The log cleaner log showed a few "too many open files" errors when this 
originally happened, but ulimit has always been set to unlimited, so I'm not 
sure what that error means.
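One thing worth double-checking (an assumption on my part, not something from the logs): a ulimit set in a shell doesn't necessarily apply to the broker process itself, e.g. when it is launched by an init system or service manager. On Linux the limit actually in effect for a running process can be read from /proc/<pid>/limits; a small sketch:

```python
def max_open_files(pid: str = "self") -> str:
    """Return the effective soft 'Max open files' limit for a process,
    read from /proc/<pid>/limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: limit name, soft limit, hard limit, units
                return line.split()[3]
    return "unknown"


# Replace "self" with the broker's PID to inspect the running Kafka process.
print(max_open_files())  # e.g. "1024" or "unlimited"
```

If this prints a small number for the broker PID despite ulimit being "unlimited" in the shell, that would account for the "too many open files" errors, since 74 segments per partition times three index/log files each adds up quickly.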



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
