cmccabe commented on pull request #10991:
URL: https://github.com/apache/kafka/pull/10991#issuecomment-876822385


   > That's a good idea. I think the diff above has a problem in that we would 
emit the message too early for the case where process.roles=broker, but I think 
the idea in general is sound.
   
   How about something like this?
   ```
   diff --git a/core/src/main/scala/kafka/server/KafkaRaftServer.scala 
b/core/src/main/scala/kafka/server/KafkaRaftServer.scala
   index 8e77357383..310aed8253 100644
   --- a/core/src/main/scala/kafka/server/KafkaRaftServer.scala
   +++ b/core/src/main/scala/kafka/server/KafkaRaftServer.scala
   @@ -106,10 +106,15 @@ class KafkaRaftServer(
      override def startup(): Unit = {
        Mx4jLoader.maybeLoad()
        raftManager.startup()
   -    controller.foreach(_.startup())
   -    broker.foreach(_.startup())
   +    controller.foreach { controller =>
   +      controller.startup()
   +      info(KafkaBroker.STARTED_MESSAGE + " controller")
   +    }
   +    broker.foreach { broker =>
   +      broker.startup()
   +      info(KafkaBroker.STARTED_MESSAGE + " broker")
   +    }
        AppInfoParser.registerAppInfo(Server.MetricsPrefix, 
config.brokerId.toString, metrics, time.milliseconds())
   -    info(KafkaBroker.STARTED_MESSAGE)
      }
    
      override def shutdown(): Unit = {
   ```
   
   > Maybe we emit a different message when there is a controller role and that 
starts, and we emit the same message we emit now after we start the broker (if 
any). So it might look like this (Im typing this free-form, so ignore any typos 
-- it's just to get the point across) [...] And then in the system test code, 
where we check for the log message, we check for the CONTROLLER_STARTED_MESSAGE 
when the roles include the controller role -- otherwise we check for the 
standard message.
   
   Hmm, what's the advantage of checking for two different strings?
   
   It seems like if we have the broker emit "Kafka Server started broker" and 
have the controller emit "Kafka Server started controller" we can check for 
different messages if we want later, but still have simple code now.
   
   > Just realized that one potential problem with the log message approach is 
that the last broker in a 3-broker cluster with 3 co-located controllers dos 
not shut down cleanly. The current approach deals with this by sending SIGKILL 
instead of SIGTERM. The log message approach doesn't deal with it, so the 
shutdown in the system test would end up timing out after 60 seconds and then 
sending SIGKILL later as part of the cleanup (the test would still pass). Not 
sure if this adds credence to the current approach?
   
   I think you do need to special case this and send SIGKILL to the last n / 2 
(round down) combined nodes. But at least we can avoid special-casing startup, 
right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to