Hello, hope you all are doing well. I am trying to gracefully stop a Broker with SIGTERM (-15). After almost 12 hours the process is still alive, and I do not see any data/replication going in or out of this Broker. These are the logs immediately after sending the SIGTERM signal to the process:
[2019-05-15 20:47:29,637] INFO Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler)
[2019-05-15 20:47:29,642] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)
[2019-05-15 20:47:29,646] INFO [KafkaServer id=1] Starting controlled shutdown (kafka.server.KafkaServer)
[2019-05-15 20:47:29,746] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(..., ...
[2019-05-15 20:47:30,134] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions Set(__consumer_offsets-46) (kafka.server.ReplicaAlterLogDirsManager)
[2019-05-15 20:47:30,134] INFO [KafkaServer id=1] Controlled shutdown succeeded (kafka.server.KafkaServer)
[2019-05-15 20:47:30,135] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(...) (kafka.server.ReplicaFetcherManager)
[2019-05-15 20:47:30,135] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions Set(...) (kafka.server.ReplicaAlterLogDirsManager)
[2019-05-15 20:47:30,136] INFO [/config/changes-event-process-thread]: Shutting down (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2019-05-15 20:47:30,136] INFO [/config/changes-event-process-thread]: Stopped (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2019-05-15 20:47:30,136] INFO [/config/changes-event-process-thread]: Shutdown completed (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2019-05-15 20:47:30,137] INFO [SocketServer brokerId=1] Stopping socket server request processors (kafka.network.SocketServer)
[2019-05-15 20:47:30,137] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(...) (kafka.server.ReplicaFetcherManager)
[2019-05-15 20:47:30,137] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions Set(...) (kafka.server.ReplicaAlterLogDirsManager)
...
[2019-05-15 20:47:30,235] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions Set(...) (kafka.server.ReplicaAlterLogDirsManager)
[2019-05-15 20:47:30,773] INFO [SocketServer brokerId=1] Stopped socket server request processors (kafka.network.SocketServer)
[2019-05-15 20:47:30,773] INFO [data-plane Kafka Request Handler on Broker 1], shutting down (kafka.server.KafkaRequestHandlerPool)
[2019-05-15 20:47:30,813] INFO [data-plane Kafka Request Handler on Broker 1], shut down completely (kafka.server.KafkaRequestHandlerPool)
[2019-05-15 20:49:58,327] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
...
[2019-05-15 22:47:03,400] INFO [ProducerStateManager partition=__consumer_offsets-40] Writing producer snapshot at offset 312043064 (kafka.log.ProducerStateManager)
[2019-05-15 22:47:03,404] ERROR [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=7] Error due to (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.KafkaException: Error processing data for partition __consumer_offsets-40 offset 312043064
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:342)
    at scala.Option.foreach(Option.scala:274)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:300)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:299)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5(AbstractFetcherThread.scala:299)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:299)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
    at scala.Option.foreach(Option.scala:274)
    at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Caused by: java.lang.IllegalStateException: Kafka scheduler is not running.
    at kafka.utils.KafkaScheduler.ensureRunning(KafkaScheduler.scala:140)
    at kafka.utils.KafkaScheduler.schedule(KafkaScheduler.scala:110)
    at kafka.log.Log.$anonfun$roll$2(Log.scala:1675)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.roll(Log.scala:1616)
    at kafka.log.Log.maybeRoll(Log.scala:1601)
    at kafka.log.Log.$anonfun$append$2(Log.scala:934)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.append(Log.scala:827)
    at kafka.log.Log.appendAsFollower(Log.scala:807)
    at kafka.cluster.Partition.$anonfun$doAppendRecordsToFollowerOrFutureReplica$1(Partition.scala:708)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
    at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257)
    at kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:699)
    at kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:715)
    at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:157)
    at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:311)
    ... 16 more
[2019-05-15 22:47:03,404] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=7] Stopped (kafka.server.ReplicaFetcherThread)
[2019-05-15 22:49:58,327] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
...

Now, almost 12 hours later and despite the logs indicating that the Broker shut down gracefully, it keeps printing entries like:

[2019-05-16 09:09:58,327] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

But the process is still alive.
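In case it helps to reason about the symptoms, here is a minimal, self-contained plain-Java sketch (not Kafka's actual code; ShutdownSketch and GuardedScheduler are made-up names) of the two behaviours I think the logs show: a scheduler that rejects work once it has been stopped, which looks like the kind of guard behind the "Kafka scheduler is not running" IllegalStateException above, and a periodic task on a non-daemon thread pool that nobody shuts down, which keeps a JVM process alive even after main() returns, much like the recurring GroupMetadataManager lines:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownSketch {

    // (1) Guarded scheduler: once stopped, any further schedule() call fails fast.
    static class GuardedScheduler {
        private final AtomicBoolean running = new AtomicBoolean(true);

        void shutdown() {
            running.set(false);
        }

        void schedule(Runnable task) {
            if (!running.get()) {
                throw new IllegalStateException("scheduler is not running");
            }
            task.run(); // illustration only; a real scheduler would enqueue the task
        }
    }

    public static void main(String[] args) {
        GuardedScheduler scheduler = new GuardedScheduler();
        scheduler.shutdown();
        try {
            scheduler.schedule(() -> System.out.println("roll log segment"));
        } catch (IllegalStateException e) {
            // Mirrors the fetcher thread failing to roll a segment after the
            // broker's scheduler had already been stopped during shutdown.
            System.out.println("schedule rejected: " + e.getMessage());
        }

        // (2) A periodic job on a non-daemon thread pool that is never shut down.
        // main() returns, but the JVM stays alive and keeps printing, similar to
        // the GroupMetadataManager lines repeating every few minutes.
        ScheduledExecutorService leaked = Executors.newSingleThreadScheduledExecutor();
        leaked.scheduleAtFixedRate(
                () -> System.out.println("Removed 0 expired offsets"),
                0, 5, TimeUnit.SECONDS);
        // Without leaked.shutdown(), this process never exits on its own.
    }
}

My suspicion is that something like the second case is happening inside the Broker JVM, i.e. some non-daemon thread is still running after the controlled shutdown, but I am not sure which one or why.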
Given these symptoms, what could/should I check? I hope you can give me a clue.

Cheers!
--
Santilli Jonathan