Geoff Anderson created KAFKA-2770: ------------------------------------- Summary: Race condition causes Mirror Maker to hang during shutdown (new consumer) Key: KAFKA-2770 URL: https://issues.apache.org/jira/browse/KAFKA-2770 Project: Kafka Issue Type: Bug Reporter: Geoff Anderson
I recently added clean bounce with new consumer to the mirror maker tests (https://github.com/apache/kafka/pull/427), and noticed that in this case the mirror maker process (with new consumer) sometimes hangs and fails to stop when stopped with kill -15 {code:title=mirror_maker.log|borderStyle=solid} [2015-11-06 22:06:04,213] INFO Start clean shutdown. (kafka.tools.MirrorMaker$) [2015-11-06 22:06:04,221] INFO Shutting down consumer threads. (kafka.tools.MirrorMaker$) [2015-11-06 22:06:04,239] INFO [mirrormaker-thread-0] mirrormaker-thread-0 shutting down (kafka.tools.MirrorMaker$MirrorMakerThread) [2015-11-06 22:06:04,253] INFO [mirrormaker-thread-0] Flushing producer. (kafka.tools.MirrorMaker$MirrorMakerThread) [2015-11-06 22:06:04,254] INFO [mirrormaker-thread-0] Committing consumer offsets. (kafka.tools.MirrorMaker$MirrorMakerThread) Exception in thread "mirrormaker-thread-0" org.apache.kafka.common.errors.WakeupException at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:304) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:194) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184) at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:154) at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:347) at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:895) at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:869) at kafka.tools.MirrorMaker$MirrorMakerNewConsumer.commit(MirrorMaker.scala:522) at kafka.tools.MirrorMaker$.commitOffsets(MirrorMaker.scala:338) at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:406) [2015-11-06 22:06:29,448] DEBUG Connection with worker4/192.168.50.104 disconnected (org.apache.kafka.common.network.Selector) java.io.EOFException at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83) at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:160) at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:141) at org.apache.kafka.common.network.Selector.poll(Selector.java:288) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:270) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) at java.lang.Thread.run(Thread.java:745) {code} The current working hypothesis is this: a WakeupException is being triggered during the finally block in mirror maker by the call to commitOffsets, and the mirror maker thread dies before the call to shutdownLatch.countDown(). Therefore the shutdownLatch.await() call in awaitShutdown() blocks forever and the process never exits. Why can commitOffsets trigger a wakeup exception? The shutdown hook is triggered in another thread, and does this: shuttingDown = true mirrorMakerConsumer.stop() # Calls consumer.wakeup() If the timing is right (wrong), the wakeup flag is set, but the mirrormaker produce/consume loop exits without triggering the WakeupException, and the WakeupException isn't thrown until commitOffsets() is called in the finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)