[https://issues.apache.org/jira/browse/KAFKA-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740570#comment-16740570]
Dhruvil Shah edited comment on KAFKA-7814 at 1/11/19 8:01 PM:
--------------------------------------------------------------
[~Ever600686], thanks for reporting the issue. It looks like the segment was
deleted while the log cleaner was trying to clean it. Could you please upload
the broker and log cleaner logs from around the time of the incident?
was (Author: dhruvilshah):
[~Ever600686], thanks for reporting the issue. It looks like the segment was
deleted while the log cleaner was trying to clean it. What is the topic's
`cleanup.policy` set to? Could you also upload the broker and log cleaner logs
from around the time of the incident?
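The suspected race (a segment deleted while the log cleaner is still working on it) is consistent with the stack trace in the issue quoted below: `Log.asyncDeleteSegment` renames `00000000000000807727.log` to `00000000000000807727.log.deleted`, and both the atomic move and the non-atomic fallback inside `Utils.atomicMoveWithFallback` fail with `NoSuchFileException` because the file is already gone. The Java sketch below reproduces that failure shape with plain `java.nio.file` calls. Its `atomicMoveWithFallback` is a simplified stand-in modeled on the two `Files.move` frames in the trace, not Kafka's source, and the class `RenameRaceDemo`, the helper `renameDeletedSegment`, and the manual `Files.delete` that simulates the segment disappearing are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameRaceDemo {

    // Simplified stand-in modeled on the two Files.move calls in the stack trace
    // (Utils.atomicMoveWithFallback): try an atomic rename, fall back to a plain move,
    // and attach the first failure as a suppressed exception. Not Kafka's source.
    static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException outer) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException inner) {
                inner.addSuppressed(outer);
                throw inner;
            }
        }
    }

    // Hypothetical helper: creates a segment file, deletes it (simulating whatever removed
    // the real segment first), then attempts the ".log" -> ".log.deleted" rename seen in
    // the trace. Returns the rename failure, or null if the rename unexpectedly succeeded.
    static IOException renameDeletedSegment() {
        Path dir;
        Path segment;
        try {
            dir = Files.createTempDirectory("segment-race");
            segment = Files.createFile(dir.resolve("00000000000000807727.log"));
            Files.delete(segment);
        } catch (IOException setupFailure) {
            throw new UncheckedIOException(setupFailure);
        }
        try {
            atomicMoveWithFallback(segment, dir.resolve("00000000000000807727.log.deleted"));
            return null;
        } catch (IOException renameFailure) {
            return renameFailure;
        } finally {
            dir.toFile().delete(); // best-effort removal of the now-empty temp directory
        }
    }

    public static void main(String[] args) {
        IOException e = renameDeletedSegment();
        System.out.println(e);
        for (Throwable suppressed : e.getSuppressed()) {
            System.out.println("Suppressed: " + suppressed);
        }
    }
}
```

On Linux the returned exception should be a `NoSuchFileException` naming only the `.log` path (from the non-atomic fallback), carrying a suppressed `NoSuchFileException` of the form `<.log path> -> <.log.deleted path>` (from the atomic attempt), which is the same shape as the reported trace.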
> Broker shut down while cleaning up log file
> -------------------------------------------
>
> Key: KAFKA-7814
> URL: https://issues.apache.org/jira/browse/KAFKA-7814
> Project: Kafka
> Issue Type: Bug
> Components: log, offset manager
> Affects Versions: 1.1.0, 2.1.0
> Environment: os: aliYun, centos7
> docker image: wurstmeister/kafka:2.12-2.1.0
> Reporter: EverZhang
> Priority: Critical
>
> Our Kafka cluster of 3 brokers (version 1.1.0) had been running well for over
> 6 months.
> Then, after 2018/12/12, we increased the partition count of every topic from
> 3 to 48, and since then the brokers have shut down every 5-10 days.
> We then upgraded the brokers from 1.1.0 to 2.1.0, but they still keep
> shutting down every 5-10 days.
> Each time, one broker shut down after logging the following error; several
> minutes later, the other 2 brokers shut down too, with the same error but on
> other partitions' log files.
> {code:bash}
> [2019-01-11 17:16:36,572] INFO [ProducerStateManager partition=__transaction_state-11] Writing producer snapshot at offset 807760 (kafka.log.ProducerStateManager)
> [2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. (kafka.log.Log)
> [2019-01-11 17:16:46,150] WARN Resetting first dirty offset of __transaction_state-35 to log start offset 194404 since the checkpointed offset 194345 is invalid. (kafka.log.LogCleanerManager$)
> [2019-01-11 17:16:46,239] ERROR Failed to clean up log for __transaction_state-11 in dir /kafka/logs due to IOException (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log
>     at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>     at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
>     at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>     at java.nio.file.Files.move(Files.java:1395)
>     at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809)
>     at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222)
>     at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
>     at kafka.log.Log.asyncDeleteSegment(Log.scala:1838)
>     at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901)
>     at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896)
>     at scala.collection.immutable.List.foreach(List.scala:388)
>     at kafka.log.Log.replaceSegments(Log.scala:1896)
>     at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583)
>     at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515)
>     at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514)
>     at scala.collection.immutable.List.foreach(List.scala:388)
>     at kafka.log.Cleaner.doClean(LogCleaner.scala:514)
>     at kafka.log.Cleaner.clean(LogCleaner.scala:492)
>     at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353)
>     at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
>     at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
>     Suppressed: java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log -> /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted
>         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
>         at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>         at java.nio.file.Files.move(Files.java:1395)
>         at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
>         ... 17 more
> [2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving replicas in dir /kafka/logs (kafka.server.ReplicaManager)
> [2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs (kafka.log.LogManager)
> [2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in /kafka/logs have failed (kafka.log.LogManager)
> {code}
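The `WARN Resetting first dirty offset` line in the log above states a simple rule: a checkpointed clean offset that lies below the partition's log start offset cannot be used, so cleaning restarts from the log start offset (here 194345 < 194404). This warning concerns `__transaction_state-35`, a different partition from the failed rename on `__transaction_state-11`. The Java sketch below models only the condition stated in that message; it is not `LogCleanerManager`'s actual code, and the class `FirstDirtyOffsetDemo` and method `firstDirtyOffset` are hypothetical names.

```java
public class FirstDirtyOffsetDemo {

    // Models only the condition in the WARN message: a checkpointed offset below the
    // log start offset is treated as invalid and replaced by the log start offset.
    // A missing checkpoint (null) also means "start from the log start offset".
    static long firstDirtyOffset(String partition, Long checkpointedOffset, long logStartOffset) {
        long offset = checkpointedOffset != null ? checkpointedOffset : logStartOffset;
        if (offset < logStartOffset) {
            System.out.printf(
                "WARN Resetting first dirty offset of %s to log start offset %d since the checkpointed offset %d is invalid.%n",
                partition, logStartOffset, offset);
            return logStartOffset;
        }
        return offset;
    }

    public static void main(String[] args) {
        // Values taken from the reported WARN line for __transaction_state-35.
        long dirty = firstDirtyOffset("__transaction_state-35", 194345L, 194404L);
        System.out.println(dirty); // 194404
    }
}
```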
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)