Uwe Eisele created KAFKA-6714:
---------------------------------
Summary: KafkaController marks all Brokers as "Shutting down",
though only one broker has been shut down
Key: KAFKA-6714
URL: https://issues.apache.org/jira/browse/KAFKA-6714
Project: Kafka
Issue Type: Bug
Components: controller, core
Affects Versions: 0.11.0.2
Environment: Kafka Cluster on Amazon AWS EC2 r4.2xlarge instances with
5 nodes and a Zookeeper Cluster on r4.2xlarge instances with 3 nodes. The
Cluster is distributed across 2 availability zones.
Reporter: Uwe Eisele
In our Kafka cluster we ran into a situation in which the Kafka controller
had all brokers marked as "Shutting down", although only one broker had
actually been shut down.
The last log entry about the broker state before that point reports that no
brokers are shutting down; the next one reports all brokers as shutting down.
As a consequence of this state, the Kafka controller is not able to elect a
leader for any partition.
{code:java}
[2018-03-15 16:28:24,288] INFO [Controller 5]: Shutting down broker 5 (kafka.controller.KafkaController)
[2018-03-15 16:28:24,288] DEBUG [Controller 5]: All shutting down brokers: 5 (kafka.controller.KafkaController)
[2018-03-15 16:28:24,288] DEBUG [Controller 5]: Live brokers: 1,2,3,4 (kafka.controller.KafkaController)
...
[2018-03-15 16:28:36,846] INFO [Controller 3]: Currently active brokers in the cluster: Set(1, 2, 3, 4) (kafka.controller.KafkaController)
[2018-03-15 16:28:36,846] INFO [Controller 3]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
...
[2018-03-19 17:57:22,273] INFO [Controller 3]: Shutting down broker 1 (kafka.controller.KafkaController)
[2018-03-19 17:57:22,273] DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4 (kafka.controller.KafkaController)
[2018-03-19 17:57:22,273] DEBUG [Controller 3]: Live brokers:  (kafka.controller.KafkaController)
...
[2018-03-19 17:57:22,275] ERROR Controller 3 epoch 83 encountered error while electing leader for partition [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] due to: No other replicas in ISR 1,3,5 for [zughaltphase_v3_intern_intern_partitioned_by_evanummer,6] besides shutting down brokers 1,5,2,3,4. (state.change.logger)
{code}
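To illustrate why this state blocks every leader election, here is a minimal Java sketch of the selection rule that the error message implies: a new leader must be in the ISR and must not be in the shutting-down set. This is a simplified model, not the Kafka source. The class ShutdownElectionSketch and the method electLeader are hypothetical, and the assigned replicas are assumed to be the same as the ISR 1,3,5 from the error message.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of leader selection during controlled shutdown.
// Hypothetical names; this is not the Kafka implementation.
class ShutdownElectionSketch {

    // Returns the first assigned replica that is in the ISR and is not
    // marked as shutting down, or empty if no such replica exists.
    static Optional<Integer> electLeader(List<Integer> assignedReplicas,
                                         List<Integer> isr,
                                         Set<Integer> shuttingDownBrokerIds) {
        return assignedReplicas.stream()
                .filter(isr::contains)
                .filter(id -> !shuttingDownBrokerIds.contains(id))
                .findFirst();
    }

    public static void main(String[] args) {
        // Assumption: assigned replicas equal the ISR from the error message.
        List<Integer> isr = List.of(1, 3, 5);

        // Expected state: only broker 5 is shutting down.
        System.out.println(electLeader(isr, isr, Set.of(5)));

        // Reported state: all brokers are marked as shutting down.
        System.out.println(electLeader(isr, isr, Set.of(1, 2, 3, 4, 5)));
    }
}
```

With only broker 5 in the shutting-down set a candidate is found; with all five brokers in the set the result is empty, which corresponds to the "No other replicas in ISR" error.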
The question is: why does the Kafka controller assume that all brokers are
shutting down?
The only place in the Kafka code (0.11.0.2) where we found the shutting-down
broker set being modified is line 1407 of the class
_kafka.controller.KafkaController_, in the method _doControlledShutdown_.
{code:java}
info("Shutting down broker " + id)
if (!controllerContext.liveOrShuttingDownBrokerIds.contains(id))
  throw new BrokerNotAvailableException("Broker id %d does not exist.".format(id))
controllerContext.shuttingDownBrokerIds.add(id)
{code}
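The bookkeeping that this snippet implies can be modelled in a few lines of Java. This is a toy sketch under assumptions, not the controller code: the class BrokerStateSketch and its members are hypothetical, and the rule "live brokers = registered brokers minus shutting-down brokers" is inferred from the "Live brokers" log output above.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of the controller's broker bookkeeping (hypothetical names).
class BrokerStateSketch {
    final Set<Integer> liveOrShuttingDown = new TreeSet<>();
    final Set<Integer> shuttingDown = new TreeSet<>();

    // Mirrors the quoted snippet: reject unknown ids, then mark the broker.
    void controlledShutdown(int id) {
        if (!liveOrShuttingDown.contains(id)) {
            throw new IllegalArgumentException("Broker id " + id + " does not exist.");
        }
        shuttingDown.add(id);
    }

    // Assumption: live brokers are the registered brokers minus the
    // brokers currently marked as shutting down.
    Set<Integer> liveBrokers() {
        Set<Integer> live = new TreeSet<>(liveOrShuttingDown);
        live.removeAll(shuttingDown);
        return live;
    }

    public static void main(String[] args) {
        BrokerStateSketch state = new BrokerStateSketch();
        state.liveOrShuttingDown.addAll(Set.of(1, 2, 3, 4, 5));
        state.controlledShutdown(5);
        System.out.println("Shutting down: " + state.shuttingDown);
        System.out.println("Live: " + state.liveBrokers());
    }
}
```

In this model the shutting-down set grows by exactly one id per call, so reaching a set of all five brokers would take five separate calls.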
However, if that were the cause, we would expect to see the log entry
"Shutting down broker n" for every broker in the log file, and these entries
are not there.
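A check like this can be automated. The following Java sketch scans controller log lines and reports broker ids that appear in an "All shutting down brokers" entry without a preceding "Shutting down broker n" entry. The class ShutdownLogCheck and the method unexplainedIds are hypothetical helpers, and the sketch assumes one log entry per line, with the message formats taken from the excerpt above.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning controller log lines.
class ShutdownLogCheck {
    // INFO entry written when a controlled shutdown is requested.
    private static final Pattern REQUEST =
            Pattern.compile("Shutting down broker (\\d+)");
    // DEBUG entry listing the current shutting-down set.
    private static final Pattern STATE =
            Pattern.compile("All shutting down brokers: (\\d+(?:,\\d+)*)");

    // Returns ids listed as shutting down that have no earlier
    // "Shutting down broker n" entry in the given lines.
    static Set<Integer> unexplainedIds(List<String> lines) {
        Set<Integer> requested = new TreeSet<>();
        Set<Integer> unexplained = new TreeSet<>();
        for (String line : lines) {
            Matcher request = REQUEST.matcher(line);
            if (request.find()) {
                requested.add(Integer.parseInt(request.group(1)));
                continue;
            }
            Matcher state = STATE.matcher(line);
            if (state.find()) {
                for (String id : state.group(1).split(",")) {
                    int broker = Integer.parseInt(id);
                    if (!requested.contains(broker)) {
                        unexplained.add(broker);
                    }
                }
            }
        }
        return unexplained;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "INFO [Controller 5]: Shutting down broker 5",
                "DEBUG [Controller 5]: All shutting down brokers: 5",
                "INFO [Controller 3]: Shutting down broker 1",
                "DEBUG [Controller 3]: All shutting down brokers: 1,5,2,3,4");
        System.out.println(unexplainedIds(lines));
    }
}
```

Run against the four relevant entries from the excerpt, the sketch flags brokers 2, 3 and 4 as marked without a corresponding shutdown request.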