mayong created KAFKA-6668:
-----------------------------

             Summary: Broker crashes on restart with a CorruptRecordException: Record size is smaller than minimum record overhead (14)
                 Key: KAFKA-6668
                 URL: https://issues.apache.org/jira/browse/KAFKA-6668
             Project: Kafka
          Issue Type: Bug
          Components: log
    Affects Versions: 0.11.0.1
         Environment: Linux version: 3.10.0-514.26.2.el7.x86_64 (mockbuild@cgslv5.buildsys213) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11))
docker version:
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.7.5
 Git commit:   ad3fef854be2172454902950240a9a9778d24345
 Built:        Mon Jan 15 22:01:14 2018
 OS/Arch:      linux/amd64
            Reporter: mayong


There is a Kafka cluster whose brokers run in Docker containers. Because the disk filled up, one broker's container crashed, and it has been stuck in a loop of restarting and crashing ever since.

Log when the disk was full:
{code:java}
[2018-03-14 00:11:40,764] INFO Rolled new log segment for 'oem-debug-log-1' in 1 ms. (kafka.log.Log)
[2018-03-14 00:11:40,765] ERROR Uncaught exception in scheduled task 'flush-log' (kafka.utils.KafkaScheduler)
java.io.IOException: I/O error
        at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:388)
        at org.apache.kafka.common.record.FileRecords.flush(FileRecords.java:162)
        at kafka.log.LogSegment$$anonfun$flush$1.apply$mcV$sp(LogSegment.scala:377)
        at kafka.log.LogSegment$$anonfun$flush$1.apply(LogSegment.scala:376)
        at kafka.log.LogSegment$$anonfun$flush$1.apply(LogSegment.scala:376)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
        at kafka.log.LogSegment.flush(LogSegment.scala:376)
        at kafka.log.Log$$anonfun$flush$2.apply(Log.scala:1312)
        at kafka.log.Log$$anonfun$flush$2.apply(Log.scala:1311)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at kafka.log.Log.flush(Log.scala:1311)
        at kafka.log.Log$$anonfun$roll$1.apply$mcV$sp(Log.scala:1283)
        at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
[2018-03-14 00:11:56,514] ERROR [KafkaApi-7] Error when handling request {replica_id=-1,max_wait_time=100,min_bytes=1,topics=[{topic=oem-debug-log,partitions=[{partition=0,fetch_offset=0,max_bytes=1048576},{partition=1,fetch_offset=131382630,max_bytes=1048576}]}]} (kafka.server.KafkaApis)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14).
{code}
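For reference, the "(14)" in the exception is the size of the smallest possible record in the old (v0) message format: CRC (4 bytes) + magic (1) + attributes (1) + key length (4) + value length (4) = 14 bytes, which the broker appears to use as a lower bound when sanity-checking any size field it reads. A simplified sketch of such a check (the class name is mine and this is not the exact Kafka code path; it only needs kafka-clients on the classpath for the exception type):
{code:java}
import org.apache.kafka.common.errors.CorruptRecordException;

public class RecordOverheadCheck {
    // Smallest possible legacy (v0) record:
    // CRC (4) + magic (1) + attributes (1) + key length (4) + value length (4) = 14 bytes.
    static final int MIN_RECORD_OVERHEAD = 14;

    static void checkRecordSize(int size) {
        // A size below the minimum overhead cannot frame a record at all,
        // so the reader reports corruption instead of parsing garbage.
        if (size < MIN_RECORD_OVERHEAD)
            throw new CorruptRecordException("Record size is smaller than minimum record overhead ("
                    + MIN_RECORD_OVERHEAD + ").");
    }

    public static void main(String[] args) {
        try {
            checkRecordSize(0); // e.g. a zero-filled size field left by a partial write
        } catch (CorruptRecordException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}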
Log after the disk-full issue was resolved and Kafka was restarted:
{code:java}
[2018-03-15 23:00:08,998] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs/__consumer_offsets-19/00000000000003396188.index) has non-zero size but the last offset is 3396188 which is no larger than the base offset 3396188.}. deleting /kafka/kafka-logs/__consumer_offsets-19/00000000000003396188.timeindex, /kafka/kafka-logs/__consumer_offsets-19/00000000000003396188.index, and /kafka/kafka-logs/__consumer_offsets-19/00000000000003396188.txnindex and rebuilding index... (kafka.log.Log)
[2018-03-15 23:00:08,999] INFO Loading producer state from snapshot file '/kafka/kafka-logs/__consumer_offsets-19/00000000000003396188.snapshot' for partition __consumer_offsets-19 (kafka.log.ProducerStateManager)
[2018-03-15 23:00:09,242] INFO Recovering unflushed segment 3396188 in log __consumer_offsets-19. (kafka.log.Log)
[2018-03-15 23:00:09,243] INFO Loading producer state from snapshot file '/kafka/kafka-logs/__consumer_offsets-19/00000000000003396188.snapshot' for partition __consumer_offsets-19 (kafka.log.ProducerStateManager)
[2018-03-15 23:00:09,497] INFO Loading producer state from offset 3576788 for partition __consumer_offsets-19 with message format version 2 (kafka.log.Log)
[2018-03-15 23:00:09,497] INFO Loading producer state from snapshot file '/kafka/kafka-logs/__consumer_offsets-19/00000000000003576788.snapshot' for partition __consumer_offsets-19 (kafka.log.ProducerStateManager)
[2018-03-15 23:00:09,498] INFO Completed load of log __consumer_offsets-19 with 3 log segments, log start offset 0 and log end offset 3576788 in 501 ms (kafka.log.Log)
[2018-03-15 23:00:09,503] ERROR Could not find offset index file corresponding to log file /kafka/kafka-logs/__consumer_offsets-28/00000000000004649658.log, rebuilding index... (kafka.log.Log)
[2018-03-15 23:00:09,503] INFO Loading producer state from snapshot file '/kafka/kafka-logs/__consumer_offsets-28/00000000000004649658.snapshot' for partition __consumer_offsets-28 (kafka.log.ProducerStateManager)
[2018-03-15 23:00:09,505] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/kafka/kafka-logs/__consumer_offsets-38/00000000000001866889.index) has non-zero size but the last offset is 1866889 which is no larger than the base offset 1866889.}. deleting /kafka/kafka-logs/__consumer_offsets-38/00000000000001866889.timeindex, /kafka/kafka-logs/__consumer_offsets-38/00000000000001866889.index, and /kafka/kafka-logs/__consumer_offsets-38/00000000000001866889.txnindex and rebuilding index... (kafka.log.Log)
[2018-03-15 23:00:09,506] INFO Loading producer state from snapshot file '/kafka/kafka-logs/__consumer_offsets-38/00000000000001244589.snapshot' for partition __consumer_offsets-38 (kafka.log.ProducerStateManager)
[2018-03-15 23:00:09,507] ERROR There was an error in one of the threads during logs loading: org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14). (kafka.log.LogManager)
[2018-03-15 23:00:09,509] FATAL [Kafka Server 7], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.errors.CorruptRecordException: Record size is smaller than minimum record overhead (14).
[2018-03-15 23:00:09,514] INFO [Kafka Server 7], shutting down (kafka.server.KafkaServer)
[2018-03-15 23:00:09,519] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2018-03-15 23:00:09,522] INFO Session: 0x162286e30d40711 closed (org.apache.zookeeper.ZooKeeper)
[2018-03-15 23:00:09,525] INFO EventThread shut down for session: 0x162286e30d40711 (org.apache.zookeeper.ClientCnxn)
[2018-03-15 23:00:09,532] INFO [Kafka Server 7], shut down completed (kafka.server.KafkaServer)
[2018-03-15 23:00:09,532] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
[2018-03-15 23:00:09,550] INFO [Kafka Server 7], shutting down (kafka.server.KafkaServer)
{code}
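The FATAL line does not name the file the broker choked on. One way to locate the offending batch is to walk a segment file with the same framing the broker reads during recovery: an 8-byte base offset and a 4-byte length in front of every record batch. The helper below is a hypothetical diagnostic of my own, not a Kafka tool; the 14-byte bound mirrors the check sketched above:
{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;

public class ScanSegmentTail {
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ScanSegmentTail <segment .log file>");
            return;
        }
        try (RandomAccessFile segment = new RandomAccessFile(args[0], "r")) {
            long pos = 0;
            while (pos + 12 <= segment.length()) {
                segment.seek(pos);
                long baseOffset = segment.readLong(); // big-endian, as on disk
                int size = segment.readInt();
                // Below 14 bytes nothing can be framed as a record
                // (e.g. a zero-filled tail left behind by the full disk).
                if (size < 14) {
                    System.out.printf("suspect batch at byte %d: offset=%d size=%d%n",
                            pos, baseOffset, size);
                    return;
                }
                pos += 12 + size; // base offset (8) + length field (4) + payload
            }
            System.out.println("no undersized batch found");
        }
    }
}
{code}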
Also, the .snapshot files are essentially empty:
{code:java}
-rw-r--r-- 1 root root        0 Mar 13 09:50 00000000000000000000.index
-rw-r--r-- 1 root root      653 Mar 13 09:50 00000000000000000000.log
-rw-r--r-- 1 root root       12 Mar 13 09:50 00000000000000000000.timeindex
-rw-r--r-- 1 root root        0 Mar 13 22:08 00000000000020480273.index
-rw-r--r-- 1 root root      821 Mar 13 22:08 00000000000020480273.log
-rw-r--r-- 1 root root       10 Mar 13 09:50 00000000000020480273.snapshot
-rw-r--r-- 1 root root       12 Mar 13 22:08 00000000000020480273.timeindex
-rw-r--r-- 1 root root 10485760 Mar 16 09:28 00000000000021450679.index
-rw-r--r-- 1 root root 86736662 Mar 15 15:23 00000000000021450679.log
-rw-r--r-- 1 root root       10 Mar 13 22:08 00000000000021450679.snapshot
-rw-r--r-- 1 root root 10485756 Mar 16 09:28 00000000000021450679.timeindex
-rw-r--r-- 1 root root       10 Mar 16 09:28 00000000000022253427.snapshot
-rw-r--r-- 1 root root        8 Mar  1 09:17 leader-epoch-checkpoint
{code}
I resolved this by running:
{code:java}
find /paasdata/commsrvkafka/data/broker/kafka-logs -name '*.snapshot' | xargs rm
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)