[ https://issues.apache.org/jira/browse/KAFKA-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514858#comment-16514858 ]

Ismael Juma commented on KAFKA-6933:
------------------------------------

When a broker is shut down cleanly it performs what we call a "controlled 
shutdown": it tells the controller that it is going away and waits until 
leadership has moved to other brokers. In some cases this can take a while, 
and if you have a script that kills the broker after waiting only a short 
period, that could be the reason. 1.1.x also performs controlled shutdowns 
much faster.
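
Two things interact here: the broker-side controlled shutdown settings and 
whatever timeout the init script or service manager allows before sending 
SIGKILL. A minimal sketch of both follows; the property names and defaults are 
the standard Kafka broker configs, while the systemd value is an illustrative 
assumption, not something taken from this report:

{noformat}
# server.properties -- broker-side controlled shutdown (Kafka defaults shown)
controlled.shutdown.enable=true            # ask the controller to migrate leadership before exiting
controlled.shutdown.max.retries=3          # attempts before giving up and shutting down uncleanly
controlled.shutdown.retry.backoff.ms=5000  # pause between attempts

# kafka.service (illustrative) -- give the broker enough time to finish
# TimeoutStopSec=180   # if shorter than the shutdown takes, systemd sends SIGKILL
{noformat}

If the process is killed before leadership has moved, the next startup has to 
recover unflushed segments and rebuild indexes, which is what the log output 
below shows.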

> Broker reports Corrupted index warnings apparently infinitely
> -------------------------------------------------------------
>
>                 Key: KAFKA-6933
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6933
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.0.1
>            Reporter: Franco Bonazza
>            Priority: Major
>
> I'm running into a situation where the server logs continuously show the 
> following snippet:
> {noformat}
> [2018-05-23 10:58:56,590] INFO Loading producer state from offset 20601420 
> for partition transaction_r10_updates-6 with message format version 2 
> (kafka.log.Log)
> [2018-05-23 10:58:56,592] INFO Loading producer state from snapshot file 
> '/data/0/kafka-logs/transaction_r10_updates-6/00000000000020601420.snapshot' 
> for partition transaction_r10_updates-6 (kafka.log.ProducerStateManager)
> [2018-05-23 10:58:56,593] INFO Completed load of log 
> transaction_r10_updates-6 with 74 log segments, log start offset 0 and log 
> end offset 20601420 in 5823 ms (kafka.log.Log)
> [2018-05-23 10:58:58,761] WARN Found a corrupted index file due to 
> requirement failed: Corrupt index found, index file 
> (/data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.index) 
> has non-zero size but the last offset is 20544956 which is no larger than the 
> base offset 20544956.}. deleting 
> /data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.timeindex, 
> /data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.index, and 
> /data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.txnindex 
> and rebuilding index... (kafka.log.Log)
> [2018-05-23 10:58:58,763] INFO Loading producer state from snapshot file 
> '/data/0/kafka-logs/transaction_r10_updates-15/00000000000020544956.snapshot' 
> for partition transaction_r10_updates-15 (kafka.log.ProducerStateManager)
> [2018-05-23 10:59:02,202] INFO Recovering unflushed segment 20544956 in log 
> transaction_r10_updates-15. (kafka.log.Log){noformat}
> The setup is the following:
> Broker is 1.0.1.
> There are mirrors from another cluster using the 0.10.2.1 client.
> There are Kafka Streams applications and other custom consumers/producers 
> using the 1.0.0 client.
>  
> While it is doing this, the broker's JVM is up but unresponsive, so it's 
> impossible to produce, consume, or run any commands.
> If I delete all the index files, the WARN turns into an ERROR, which takes a 
> long time to work through (a day, the last time I tried), but eventually the 
> broker reaches a healthy state. Then I start the producers and things stay 
> healthy, but when I start the consumers it quickly falls back into the 
> original WARN loop, which seems infinite.
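>
> One knob that can shorten the recovery/rebuild phase is the standard 
> num.recovery.threads.per.data.dir broker setting; log recovery defaults to a 
> single thread per data directory, so rebuilding indexes for logs with 74 
> segments (as above) can take a long time. The value shown is only an example:
> {noformat}
> # server.properties -- parallelize log recovery and index rebuilds at startup
> num.recovery.threads.per.data.dir=8   # default is 1
> {noformat}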
>  
> I couldn't find any references to the problem. At the very least the issue 
> seems to be misreported, and perhaps the loop isn't actually infinite? I let 
> it loop over the WARN for over a day and it never moved past it, and if 
> something is really wrong with the state, maybe that should be reported 
> explicitly.
> The log cleaner log showed a few "too many open files" errors when this 
> originally happened, but ulimit has always been set to unlimited, so I'm not 
> sure what that error means.
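>
> Note that the shell's ulimit is not necessarily what the broker process is 
> running with, and on Linux the open-files limit typically cannot be set to 
> unlimited anyway; checking the running process directly is more reliable. 
> The pgrep pattern below assumes the broker was started with the usual 
> kafka.Kafka main class:
> {noformat}
> # effective limits of the running broker process, not of the current shell
> BROKER_PID=$(pgrep -f kafka.Kafka)
> grep 'open files' /proc/$BROKER_PID/limits
>
> # number of file descriptors the broker currently holds
> ls /proc/$BROKER_PID/fd | wc -l
> {noformat}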



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
