[ https://issues.apache.org/jira/browse/KAFKA-14287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Chen updated KAFKA-14287:
------------------------------
    Description: 
Multiple nodes in KRaft combined mode (i.e.
process.roles='broker,controller') start up successfully. During shutdown in
combined mode, we unfence the broker first. When the number of remaining
controller nodes is less than the quorum size (i.e. N / 2 + 1), the unfence
record cannot be committed to the metadata topic. The broker therefore keeps
waiting for the response that grants the shutdown and eventually fails with a
timeout error:

 
{code:java}
[2022-10-11 18:01:14,341] ERROR [kafka-raft-io-thread]: Graceful shutdown of RaftClient failed (kafka.raft.KafkaRaftManager$RaftIoThread)
java.util.concurrent.TimeoutException: Timeout expired before graceful shutdown completed
    at org.apache.kafka.raft.KafkaRaftClient$GracefulShutdown.failWithTimeout(KafkaRaftClient.java:2408)
    at org.apache.kafka.raft.KafkaRaftClient.maybeCompleteShutdown(KafkaRaftClient.java:2163)
    at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2230)
    at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:52)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
{code}
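
To illustrate the quorum arithmetic in the description (quorum size = N / 2 + 1),
here is a minimal sketch. QuorumMath and its methods are hypothetical names used
only for illustration and are not part of Kafka; the sketch assumes a simple
majority over N controller voters.

```java
// Toy arithmetic only. QuorumMath is a hypothetical name, not a Kafka class.
final class QuorumMath {
    /** Smallest number of voters that forms a majority of the given voter count: n / 2 + 1. */
    static int majority(int voters) {
        if (voters < 1) {
            throw new IllegalArgumentException("voters must be >= 1");
        }
        return voters / 2 + 1; // integer division, as in the description
    }

    /** Number of voters that can be down while a majority is still reachable. */
    static int tolerableFailures(int voters) {
        return voters - majority(voters);
    }

    public static void main(String[] args) {
        // Print the majority size and failure tolerance for small clusters.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("voters=%d majority=%d tolerableFailures=%d%n",
                    n, majority(n), tolerableFailures(n));
        }
    }
}
```

With 2 voters the majority is 2 and no voter can be lost, whereas 3 voters
tolerate the loss of 1.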
 

 

To reproduce:
 # Start 2 KRaft combined-mode nodes, so both nodes are needed to form a quorum.
 # Shut down either node. It shuts down successfully: when its broker requests
shutdown, both controllers are still alive, so the shutdown is granted.
 # Shut down the 2nd node. This time the shutdown stays pending and then times
out.
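
The steps above can be modelled with a small sketch. This is a toy model, not
Kafka code: ShutdownRepro, shutDown and run are hypothetical names, and the
model assumes that a shutdown is granted only if a majority of controllers is
alive when the broker asks, with the node's own controller still counted as
alive at that moment.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the reproduction steps. Not Kafka code; every name here is hypothetical.
final class ShutdownRepro {
    private final int voterCount;
    private final Set<Integer> liveControllers = new LinkedHashSet<>();

    ShutdownRepro(int voterCount) {
        this.voterCount = voterCount;
        for (int id = 1; id <= voterCount; id++) {
            liveControllers.add(id);
        }
    }

    /** Returns "GRANTED" if a majority of controllers is live when the broker asks, else "TIMEOUT". */
    String shutDown(int nodeId) {
        if (!liveControllers.contains(nodeId)) {
            throw new IllegalStateException("node " + nodeId + " is not running");
        }
        // Assumption: the node's own controller still counts as live while its broker waits.
        boolean majorityLive = liveControllers.size() >= voterCount / 2 + 1;
        // The node stops either way, gracefully or after the timeout.
        liveControllers.remove(nodeId);
        return majorityLive ? "GRANTED" : "TIMEOUT";
    }

    /** Shuts the nodes down in the given order and collects the outcome of each step. */
    static List<String> run(int voterCount, int... shutdownOrder) {
        ShutdownRepro cluster = new ShutdownRepro(voterCount);
        List<String> outcomes = new ArrayList<>();
        for (int nodeId : shutdownOrder) {
            outcomes.add(cluster.shutDown(nodeId));
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Two combined nodes, shut down one after the other.
        System.out.println(run(2, 1, 2));
    }
}
```

For two nodes the model prints [GRANTED, TIMEOUT], matching steps 2 and 3.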

  was:
Multiple nodes in KRaft combined mode (i.e.
process.roles='broker,controller') start up successfully. During shutdown in
combined mode, we unfence the broker first. When the number of remaining
controller nodes is less than the quorum size (i.e. N / 2 + 1), the unfence
record cannot be committed to the metadata topic. The broker therefore keeps
waiting for the response that grants the shutdown and eventually fails with a
timeout error:

 
{code:java}
[2022-10-11 18:01:14,341] ERROR [kafka-raft-io-thread]: Graceful shutdown of RaftClient failed (kafka.raft.KafkaRaftManager$RaftIoThread)
java.util.concurrent.TimeoutException: Timeout expired before graceful shutdown completed
    at org.apache.kafka.raft.KafkaRaftClient$GracefulShutdown.failWithTimeout(KafkaRaftClient.java:2408)
    at org.apache.kafka.raft.KafkaRaftClient.maybeCompleteShutdown(KafkaRaftClient.java:2163)
    at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2230)
    at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:52)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
{code}


> Multiple nodes in KRaft combined mode fail to shut down
> -------------------------------------------------------
>
>                 Key: KAFKA-14287
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14287
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>    Affects Versions: 3.3.1
>            Reporter: Luke Chen
>            Assignee: Luke Chen
>            Priority: Major
>
> Multiple nodes in KRaft combined mode (i.e.
> process.roles='broker,controller') start up successfully. During shutdown in
> combined mode, we unfence the broker first. When the number of remaining
> controller nodes is less than the quorum size (i.e. N / 2 + 1), the unfence
> record cannot be committed to the metadata topic. The broker therefore keeps
> waiting for the response that grants the shutdown and eventually fails with
> a timeout error:
>  
> {code:java}
> [2022-10-11 18:01:14,341] ERROR [kafka-raft-io-thread]: Graceful shutdown of RaftClient failed (kafka.raft.KafkaRaftManager$RaftIoThread)
> java.util.concurrent.TimeoutException: Timeout expired before graceful shutdown completed
>     at org.apache.kafka.raft.KafkaRaftClient$GracefulShutdown.failWithTimeout(KafkaRaftClient.java:2408)
>     at org.apache.kafka.raft.KafkaRaftClient.maybeCompleteShutdown(KafkaRaftClient.java:2163)
>     at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2230)
>     at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:52)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
> {code}
>  
>  
> To reproduce:
>  # Start 2 KRaft combined-mode nodes, so both nodes are needed to form a
> quorum.
>  # Shut down either node. It shuts down successfully: when its broker
> requests shutdown, both controllers are still alive, so the shutdown is
> granted.
>  # Shut down the 2nd node. This time the shutdown stays pending and then
> times out.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
