[ https://issues.apache.org/jira/browse/KAFKA-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899237#comment-16899237 ]
GEORGE LI commented on KAFKA-8725:
----------------------------------

We saw similar issues. After the thread dies, the "good" partitions also stop being cleaned and accumulate a backlog.

One more improvement might be to restart the log cleaner thread without bouncing the broker. The current dynamic config log.cleaner.threads seems to be able to start the thread only once.

> Improve LogCleaner error handling when failing to grab the filthiest log
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-8725
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8725
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Stanislav Kozlovski
>            Assignee: Stanislav Kozlovski
>            Priority: Major
>
> https://issues.apache.org/jira/browse/KAFKA-7215 improved error handling in
> the log cleaner with the goal of not having the whole thread die when an
> exception happens, but rather marking the partition that caused it as
> uncleanable and continuing to clean the error-free partitions.
> Unfortunately, the current code can still bubble up an exception and cause
> the thread to die when an error happens before we can grab the filthiest log
> and start cleaning it. At that point, we don't have a clear reference to the
> log that caused the exception and choose to throw an IllegalStateException -
> [https://github.com/apache/kafka/blob/39bcc8447c906506d63b8df156cf90174bbb8b78/core/src/main/scala/kafka/log/LogCleaner.scala#L346]
> (as seen in https://issues.apache.org/jira/browse/KAFKA-8724)
> Essentially, exceptions in `grabFilthiestCompactedLog` still cause the thread
> to die. This can be further improved by trying to catch which log caused the
> exception in the aforementioned function.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
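
To illustrate the improvement proposed in the issue description (catching which log caused the exception inside `grabFilthiestCompactedLog`), here is a minimal, language-neutral sketch in Python. It is not Kafka's code: `grab_filthiest`, `Selection`, `dirty_ratio`, and the partition names are all hypothetical, and the real LogCleaner is written in Scala. The sketch assumes the dirtiness of each log can be evaluated independently, so a failure can be attributed to one partition instead of surfacing as an exception with no log reference.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Selection:
    """Outcome of one selection pass (hypothetical type, not a Kafka class)."""
    filthiest: Optional[str] = None  # partition with the highest dirty ratio, if any
    uncleanable: Dict[str, Exception] = field(default_factory=dict)


def grab_filthiest(partitions: Iterable[str],
                   dirty_ratio: Callable[[str], float]) -> Selection:
    """Pick the dirtiest partition, isolating failures per partition."""
    result = Selection()
    best_ratio = -1.0
    for tp in partitions:
        try:
            ratio = dirty_ratio(tp)
        except Exception as exc:
            # The failing partition is known here, so it can be recorded
            # as uncleanable while the loop continues with the others.
            result.uncleanable[tp] = exc
            continue
        if ratio > best_ratio:
            best_ratio = ratio
            result.filthiest = tp
    return result


def flaky_ratio(tp: str) -> float:
    """Stand-in for the dirtiness computation; fails for one partition."""
    if tp == "topic-b-0":
        raise ValueError("inconsistent state for " + tp)
    return {"topic-a-0": 0.2, "topic-c-0": 0.7}[tp]


selection = grab_filthiest(["topic-a-0", "topic-b-0", "topic-c-0"], flaky_ratio)
print(selection.filthiest)
print(sorted(selection.uncleanable))
```

The point of the design is that the exception is caught at the granularity of a single partition, so the caller gets both a candidate to clean and a list of partitions to mark uncleanable, and the cleaner thread itself does not have to die.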