[ https://issues.apache.org/jira/browse/KAFKA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307077#comment-15307077 ]
Ewen Cheslack-Postava commented on KAFKA-3335:
----------------------------------------------

[~shikhar] The shutdown hook is added at the beginning to make sure we clean up even if something happens during startup -- any services that did get started up should be properly cleaned up. I think a relevant piece of info that was missing is that it looks like this was against the 0.9 releases (or a version of trunk after 0.9 and before 0.10) and the code has since been cleaned up a bit. The startLatch wasn't previously in a finally block which explains why it was never triggered. Since that's fixed, it won't block the subsequent stop() call. I've validated by manually triggering an exception in both the 0.9.0.1 code and the trunk code and the issue is only reproduced in the old release.

> Kafka Connect hangs in shutdown hook
> ------------------------------------
>
>                 Key: KAFKA-3335
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3335
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Ben Kirwin
>
> The `Connect` class can run into issues during start, such as:
> {noformat}
> Exception in thread "main" org.apache.kafka.connect.errors.ConnectException: Could not look up partition metadata for offset backing store topic in allotted period. This could indicate a connectivity issue, unavailable topic partitions, or if this is your first use of the topic it may have taken too long to create.
>         at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:130)
>         at org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:85)
>         at org.apache.kafka.connect.runtime.Worker.start(Worker.java:108)
>         at org.apache.kafka.connect.runtime.Connect.start(Connect.java:56)
>         at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:62)
> {noformat}
> This exception halts the startup process. It also triggers the shutdown hook... which blocks waiting for the service to start up before calling stop. This causes the process to hang forever.
> There's a few things that could be done here, but it would be nice to bound the amount of time the process spends trying to exit gracefully.
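
For readers following the thread, here is a rough sketch of the latch behaviour described in the comment: the start latch is counted down in a finally block, so a stop() issued from the shutdown hook is not left waiting after a failed startup. It also waits on the latch with a timeout, which is one way to bound the exit time as the reporter suggests. This is not the Kafka Connect source. The class name ConnectSketch, the failOnStart flag, the timeout parameter and the return value of stop() are all assumptions made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a service with start()/stop(); not the real Connect class.
public class ConnectSketch {
    private final CountDownLatch startLatch = new CountDownLatch(1);
    private final boolean failOnStart; // assumption: lets us simulate a startup error
    private volatile boolean stopped = false;

    public ConnectSketch(boolean failOnStart) {
        this.failOnStart = failOnStart;
    }

    public void start() {
        try {
            if (failOnStart) {
                throw new IllegalStateException("simulated startup failure");
            }
            // ... start workers, REST server, etc. ...
        } finally {
            // Released on success AND on failure, so stop() never waits forever.
            startLatch.countDown();
        }
    }

    // Waits at most timeoutMs for start() to finish, then stops regardless.
    // Returns true if start() had finished (successfully or not) within the timeout.
    public boolean stop(long timeoutMs) {
        boolean startFinished = false;
        try {
            startFinished = startLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        // ... stop whatever services did get started ...
        stopped = true;
        return startFinished;
    }

    public boolean isStopped() {
        return stopped;
    }

    public static void main(String[] args) {
        ConnectSketch connect = new ConnectSketch(true);
        // Hook is registered before start(), as described in the comment.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
            System.out.println("start finished before stop: " + connect.stop(1000))));
        try {
            connect.start();
        } catch (IllegalStateException e) {
            System.out.println("start failed: " + e.getMessage());
        }
    }
}
```

If countDown() were moved out of the finally block and placed after the startup work, a failed start() would leave the latch closed. An unbounded await() in stop() would then block indefinitely, which matches the hang reported against 0.9.0.1.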