[ 
https://issues.apache.org/jira/browse/KAFKA-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kowshik Prakasam updated KAFKA-10723:
-------------------------------------
    Summary: LogManager leaks internal thread pool activity during shutdown  
(was: LogManager shutdown error handler doesn't shutdown all internal thread 
pools)

> LogManager leaks internal thread pool activity during shutdown
> --------------------------------------------------------------
>
>                 Key: KAFKA-10723
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10723
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Kowshik Prakasam
>            Priority: Major
>
> *TL;DR:*
> The asynchronous shutdown in {{LogManager}} has the shortcoming that if 
> during shutdown any of the internal futures fail, then we do not always 
> ensure that all futures are completed before {{LogManager.shutdown}} returns. 
> As a result, despite the shut down completed message from KafkaServer is seen 
> in the error logs, some futures continue to run from inside LogManager 
> attempting to close the logs.
> *Description:*
> When LogManager is shutting down, exceptions in log closure are handled 
> [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogManager.scala#L497-L501].
>  However, this 
> [line|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogManager.scala#L502]
>  in the finally clause shutsdown the thread pools *asynchronously*.
> {noformat}
> threadPools.foreach(_.shutdown()){noformat}
> initiates an orderly shutdown (for each thread pool) in which previously 
> submitted tasks are executed, but no new tasks will be accepted (see javadoc 
> link 
> [here|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#shutdown()]).
>  As a result, if there is an exception during log closure, some of the thread 
> pools which are closing logs could be leaked and continue to run in the 
> background, after the control returns to the caller (i.e. {{KafkaServer}}). 
> As a result, even after the _shut down completed_ message is seen in the 
> error logs (originating from {{KafkaServer}} shutdown sequence), log closures 
> continue to happen in the background. 
>   
> *Proposed options for fixes:*
> It seems useful that we maintain the contract with {{KafkaServer}} that after 
> {{LogManager.shutdown}} is called once, all tasks that close the logs are 
> guaranteed to have completed before the call returns. There are probably 
> couple ways to fix this:
>  # A simple fix could be to replace {{_threadPools.foreach(.shutdown())_ with 
> _threadPools.foreach(.awaitTermination())_}}{{.}} This ensures that we wait 
> for all threads to be shutdown before returning the {{_LogManager.shutdown_}} 
> call.
>  # An advanced fix could be to just skip creating of checkpoint and clean 
> shutdown file only for the affected directory if any of its futures throw an 
> error. We continue to wait for all futures to complete for all directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to