[ 
https://issues.apache.org/jira/browse/KAFKA-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josep Prat updated KAFKA-17752:
-------------------------------
    Component/s: kraft

> Controller crashes when removed if it is an initial controller
> --------------------------------------------------------------
>
>                 Key: KAFKA-17752
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17752
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>    Affects Versions: 3.9.0
>            Reporter: Juha Mynttinen
>            Priority: Major
>
> Hey,
> Tested using 3.9.0 RC0. The issue only affects KRaft.
> Running "kafka-metadata-quorum.sh remove-controller" appears to crash the
> removed controller if it is one of the controllers specified with
> "--initial-controllers".
> Steps to reproduce:
> Clean up and set up the environment:
> rm -rf /tmp/controllers && \
> mkdir -p /tmp/controllers/c1 && \
> mkdir -p /tmp/controllers/c2 && \
> mkdir -p /tmp/controllers/c3
> export KAFKA_HOME=<your_kafka_3_9_home>
> Format the controllers:
> $KAFKA_HOME/bin/kafka-storage.sh format \
>   --cluster-id 00000000-0000-0000-0000-000000000001 \
>   --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
>   --config c1.properties
> $KAFKA_HOME/bin/kafka-storage.sh format \
>   --cluster-id 00000000-0000-0000-0000-000000000001 \
>   --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
>   --config c2.properties
> $KAFKA_HOME/bin/kafka-storage.sh format \
>   --cluster-id 00000000-0000-0000-0000-000000000001 \
>   --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
>   --config c3.properties
> Start the controllers in separate terminals:
> $KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c1.properties
> $KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c2.properties
> $KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c3.properties
> Remove a controller:
> $KAFKA_HOME/bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller localhost:10001,localhost:10002,localhost:10003,localhost:10004 \
>   remove-controller --controller-id 1001 \
>   --controller-directory-id AAAAAAAAAAEAAAAAAAAAAA
> The process crashes with the following error:
> [2024-10-09 15:19:15,574] ERROR Encountered fatal fault: exception while renouncing leadership (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> java.lang.RuntimeException: Unable to reset to last stable offset 55. No in-memory snapshot found for this offset.
>         at org.apache.kafka.controller.OffsetControlManager.deactivate(OffsetControlManager.java:268)
>         at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1281)
>         at org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:552)
>         at org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:180)
>         at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:885)
>         at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:875)
>         at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:153)
>         at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:142)
>         at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:215)
>         at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:186)
>         at java.base/java.lang.Thread.run(Thread.java:840)
> If the process that died is restarted, it joins the cluster and becomes an
> observer, as expected.
> The crash does not happen in a slightly different scenario. I don't have the
> exact steps, but the idea is this:
> 1. Create a 3-controller cluster as above.
> 2. Format and start a 4th controller.
> 3. Add the 4th controller as a voter.
> 4. Remove the 4th controller to make it an observer. It becomes an observer,
> as expected.
> Because this case works, I'm guessing the crash is somehow related to the
> controller being one of the initial controllers.
> I didn't dig deeper into why the crash occurs.
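
The three format commands in the report differ only in the --config file, and the --initial-controllers value follows the pattern id@host:port:directory-id. A minimal sketch of scripting that step is shown here as a dry run that only prints the commands. The helper name initial_controllers and the rule port = id + 9000 are assumptions made for illustration; they are not part of Kafka's tooling. The cluster id, controller ids, and directory id are copied from the report.

```shell
#!/bin/sh
# Dry-run sketch: prints the kafka-storage.sh format commands from the report
# instead of executing them. Nothing here talks to Kafka.

CLUSTER_ID="00000000-0000-0000-0000-000000000001"
DIRECTORY_ID="AAAAAAAAAAEAAAAAAAAAAA"   # same directory id for all three, as in the report

# Hypothetical helper: joins id@localhost:port:directory-id entries with commas.
# Assumption: port = controller id + 9000 (1001 -> 10001), matching the report.
initial_controllers() (
  dir_id="$1"
  shift
  out=""
  for id in "$@"; do
    out="${out:+$out,}${id}@localhost:$((id + 9000)):${dir_id}"
  done
  printf '%s\n' "$out"
)

INITIAL_CONTROLLERS="$(initial_controllers "$DIRECTORY_ID" 1001 1002 1003)"

# Print one format command per controller config file (c1..c3.properties).
for n in 1 2 3; do
  printf '%s\n' "\$KAFKA_HOME/bin/kafka-storage.sh format --cluster-id $CLUSTER_ID --initial-controllers $INITIAL_CONTROLLERS --config c$n.properties"
done
```

To actually format the controllers, the printed lines could be run from a shell where KAFKA_HOME is set and the c1.properties to c3.properties files from the report exist.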



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
