[ https://issues.apache.org/jira/browse/KAFKA-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929298#comment-16929298 ]
Sri Vishnu commented on KAFKA-4972:
-----------------------------------

Hi all, we had a similar issue when restarting our brokers. For us, it turned out to be a problem with the {{systemd}} configuration. We have 350 GB of data on each broker across 150 topics, and shutting down the Kafka server takes about 8 minutes. However, {{systemd}} was configured to wait only 90 seconds for the server to shut down, after which it force-kills the server. When the server is restarted, it ends up with a corrupted index file because it did not shut down cleanly.

The fix was to set {{TimeoutStopSec=600}} in the systemd configuration.

We summarised the issue and the fix in a blog post: [https://blog.experteer.engineering/kafka-corrupted-index-file-warnings-after-broker-restart.html]

Hopefully it is helpful for some of you.

> Kafka 0.10.0 Found a corrupted index file during Kafka broker startup
> ----------------------------------------------------------------------
>
> Key: KAFKA-4972
> URL: https://issues.apache.org/jira/browse/KAFKA-4972
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 0.10.0.0
> Environment: JDK: HotSpot x64 1.7.0_80
> Tag: 0.10.0
> Reporter: fangjinuo
> Priority: Critical
> Labels: reliability
> Attachments: Snap3.png
>
>
> After force-shutting down all Kafka brokers one by one and then restarting them
> one by one, one broker failed to start.
> The following WARN level log was found in the log file:
> found a corrupted index file, xxxx.index, delete it ...
> You can view the details in the attachment.
> I looked through some code in the core module and found that the
> non-thread-safe method LogSegment.append(offset, messages) has two callers:
> 1) Log.append(messages) // this one holds a synchronized lock
> 2) LogCleaner.cleanInto(topicAndPartition, source, dest, map, retainDeletes,
> messageFormatVersion) // this one does not
> So I guess this may be the reason for the repeated offsets in the 00000xx.log file
> (the log segment's .log file).
> Although this is just my inference, I hope this problem can be fixed quickly.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)