[ https://issues.apache.org/jira/browse/KAFKA-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
EverZhang resolved KAFKA-7814. ------------------------------ Resolution: Not A Problem > Broker shut down while cleaning up log file > ------------------------------------------- > > Key: KAFKA-7814 > URL: https://issues.apache.org/jira/browse/KAFKA-7814 > Project: Kafka > Issue Type: Bug > Components: log, offset manager > Affects Versions: 1.1.0, 2.1.0 > Environment: os: aliYun, centos7 > docker image:wurstmeister/kafka:2.12-2.1.0 > Reporter: EverZhang > Priority: Critical > Attachments: broker1.log-cleaner.log.2019-01-11-17, > broker1.server.log.2019-01-11-17, broker2.log-cleaner.log.2019-01-11-17, > broker2.server.log.2019-01-11-17, broker3.log-cleaner.log.2019-01-11-17, > broker3.server.log.2019-01-11-17 > > > Kafka cluster with 3 brokers(version:1.1.0) and is well running for over 6 > months. > Then we modified partitions from 3 to 48 for every topic after 2018/12/12, > then the brokers shutdown every 5-10 days. > Then we upgraded the broker from 1.1.0 to 2.1.0, but the brokers still keep > shutting down every 5-10 days. > Each time, one broker shut down after the following error log, then several > minutes later, the other 2 brokers shut down too, with the same error but > other partition log files. > {code:bash} > [2019-01-11 17:16:36,572] INFO [ProducerStateManager > partition=__transaction_state-11] Writing producer snapshot at offset 807760 > (kafka.log.ProducerStateManager) > [2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, > dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. > (kafka.log.Log) > [2019-01-11 17:16:46,150] WARN Resetting first dirty offset of > __transaction_state-35 to log start offset 194404 since the checkpointed > offset 194345 is invalid. (kafka.log.LogCleanerManager$) > [2019-01-11 17:16:46,239] ERROR Failed to clean up log for > __transaction_state-11 in dir /kafka/logs due to IOException > (kafka.server.LogDirFailureChannel) > java.nio.file.NoSuchFileException: > /kafka/logs/__transaction_state-11/00000000000000807727.log > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409) > at > sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) > at java.nio.file.Files.move(Files.java:1395) > at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809) > at > org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222) > at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488) > at kafka.log.Log.asyncDeleteSegment(Log.scala:1838) > at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901) > at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896) > at scala.collection.immutable.List.foreach(List.scala:388) > at kafka.log.Log.replaceSegments(Log.scala:1896) > at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583) > at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515) > at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514) > at scala.collection.immutable.List.foreach(List.scala:388) > at kafka.log.Cleaner.doClean(LogCleaner.scala:514) > at kafka.log.Cleaner.clean(LogCleaner.scala:492) > at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353) > at > kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319) > at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > Suppressed: java.nio.file.NoSuchFileException: > /kafka/logs/__transaction_state-11/00000000000000807727.log -> > /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396) > at > sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) > at java.nio.file.Files.move(Files.java:1395) > at > org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806) > ... 17 more > [2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving > replicas in dir /kafka/logs (kafka.server.ReplicaManager) > [2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs > (kafka.log.LogManager) > [2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in > /kafka/logs have failed (kafka.log.LogManager) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)