[jira] [Updated] (KAFKA-9359) Controller does not handle requests while broker is being shutdown

Lucas Bradstreet (Jira) Sun, 10 May 2020 09:24:18 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Lucas Bradstreet updated KAFKA-9359:
------------------------------------
    Description: 
When a broker is shut down, it first attempts a controlled shutdown, resigning 
leadership of its partitions. It then stops the socket server from processing 
requests and shuts down the various data plane and control plane handlers and 
processors:

 
{noformat}
if (socketServer != null)
  CoreUtils.swallow(socketServer.stopProcessingRequests(), this)
if (dataPlaneRequestHandlerPool != null)
  CoreUtils.swallow(dataPlaneRequestHandlerPool.shutdown(), this)
if (controlPlaneRequestHandlerPool != null)
  CoreUtils.swallow(controlPlaneRequestHandlerPool.shutdown(), this)
if (kafkaScheduler != null)
  CoreUtils.swallow(kafkaScheduler.shutdown(), this)

if (dataPlaneRequestProcessor != null)
  CoreUtils.swallow(dataPlaneRequestProcessor.close(), this)
if (controlPlaneRequestProcessor != null)
  CoreUtils.swallow(controlPlaneRequestProcessor.close(), this)
{noformat}
The kafkaController component is shut down only much later, after the 
logManager is closed, which may take some time because closing logs requires 
checkpointing state and flushing segments. If the broker being shut down is the 
controller, there is a potentially large window in which no controller is 
processing controller requests. The controller resigns leadership only once the 
controller component is shut down and the zkClient is closed.
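
To make the window concrete, here is a toy model of the ordering described above. It is only a sketch: the step names paraphrase the description, and `ShutdownWindow`, `Step`, and `unservedWindow` are hypothetical names that do not exist in Kafka.

```scala
// Toy model only. The steps paraphrase the issue description; none of these
// types or methods are part of Kafka.
object ShutdownWindow {
  sealed trait Step
  case object StopRequestProcessing extends Step // socket server, handler pools, processors
  case object CloseLogManager extends Step       // checkpointing and flushing; may be slow
  case object ShutdownController extends Step    // kafkaController component
  case object CloseZkClient extends Step         // controller leadership is given up by here

  // Ordering as described in the issue.
  val currentOrder: List[Step] =
    List(StopRequestProcessing, CloseLogManager, ShutdownController, CloseZkClient)

  // Steps that must complete after request handling stops and before the
  // step at which controller leadership is resigned. Returns Nil when the
  // resignation happens before request handling stops.
  def unservedWindow(order: List[Step], resignedAt: Step): List[Step] = {
    val stop = order.indexOf(StopRequestProcessing)
    val resign = order.indexOf(resignedAt)
    if (stop < 0 || resign < 0 || resign <= stop) Nil
    else order.slice(stop + 1, resign)
  }
}
```

With the current ordering, the slow log manager close falls inside the window, which is the problem this issue describes.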

There is a second problem: a broker that does not successfully complete 
controlled shutdown also remains the leader for its partitions until the 
zkClient is shut down, and that window can be large because of the 
aforementioned log manager shutdown.

It would be ideal if:
 # controller leadership were resigned early in the shutdown process, before 
request handling is stopped. Care will have to be taken so that the broker in 
question cannot regain it.
 # we could reduce the window between an uncontrolled shutdown and resigning 
leadership of partitions through the zkClient close failsafe.
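
One possible shape for the first point is sketched below, under stated assumptions. `ToyElection` stands in for the ZooKeeper-based controller election and `ToyBroker` for a broker; both are hypothetical and are not Kafka's actual classes or API. The point is only the ordering: block re-election, resign, then stop request handling.

```scala
// Hypothetical sketch. ToyElection is an in-memory stand-in for the
// ZooKeeper-based controller election, not a Kafka or ZooKeeper API.
final class ToyElection {
  private var controller: Option[Int] = None

  def current: Option[Int] = synchronized(controller)

  // First caller wins while the seat is empty.
  def tryElect(brokerId: Int): Boolean = synchronized {
    if (controller.isEmpty) { controller = Some(brokerId); true } else false
  }

  // Only the current controller can vacate the seat.
  def resign(brokerId: Int): Unit = synchronized {
    if (controller.contains(brokerId)) controller = None
  }
}

final class ToyBroker(val id: Int, election: ToyElection) {
  private var shuttingDown = false
  private var serving = true

  def isServing: Boolean = synchronized(serving)

  // Refuse to stand for election once shutdown has begun, so the broker
  // cannot regain controllership after resigning.
  def maybeElect(): Boolean = synchronized {
    !shuttingDown && election.tryElect(id)
  }

  def shutdown(): Unit = synchronized {
    shuttingDown = true  // 1. block re-election first
    election.resign(id)  // 2. resign while requests are still being handled
    serving = false      // 3. only now stop request handling
    // 4. slow work such as closing the log manager would follow here
  }
}
```

Holding the broker's lock in both `maybeElect` and `shutdown` keeps the "shutting down" check and the election attempt atomic in this toy; a real fix would need an equivalent guard around whatever path re-triggers election.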

See also https://issues.apache.org/jira/browse/KAFKA-9358

  was:
When a broker is shut down it stops accepting requests, as the socket server 
and handler pools are shut down immediately. This happens before the controller 
is shut down and the log manager is closed, which may take some time to 
complete. During this time the broker remains the controller because the 
zkClient has not been closed. We should improve the shutdown process so that a 
broker does not remain the controller while it is unable to accept the requests 
expected of a controller.

See also https://issues.apache.org/jira/browse/KAFKA-9358


> Controller does not handle requests while broker is being shutdown
> ------------------------------------------------------------------
>
>                 Key: KAFKA-9359
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9359
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>            Reporter: Lucas Bradstreet
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
