[ https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057570#comment-14057570 ]
Oleg (ovgolovin) edited comment on KAFKA-1530 at 7/10/14 3:41 PM:
------------------------------------------------------

Hello Guozhang,

We tried using "controlled.shutdown.enable", but we ran into this bug: https://issues.apache.org/jira/browse/KAFKA-1342. As a result, the node did not load data from the leader after the restart.

Even if "controlled.shutdown.enable" worked correctly, we still couldn't use it, because we periodically hit these bugs: https://issues.apache.org/jira/browse/KAFKA-1382 and https://issues.apache.org/jira/browse/KAFKA-1407, where a node goes "crazy" and stops working properly. In that case the node takes forever to shut down with "controlled.shutdown.enable", and we cannot tell whether it is still trying to catch up with the leader (which it does when "controlled.shutdown.enable" is on) or has hit a bug and simply hung. These situations happen quite often in our cluster, and they make "controlled.shutdown.enable" unusable for us.

So for now we just restart the Kafka brokers (sometimes with "kill -9"), at the risk of losing some data.

Another situation, which is not common but has happened, is the whole Kafka cluster going down because of a power outage in the datacenter. In that case some of the brokers may have been behind the leader, so we have to start them in the proper order (leaders first) for the data not to be truncated.

So we are looking for a way to restart/update Kafka without losing data (maybe you have a script to start it in the right order).

Best regards,
Oleg

> howto update continuously
> -------------------------
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
> Issue Type: Wish
> Reporter: Stanislav Gilmulin
> Priority: Minor
> Labels: operating_manual, performance
>
> Hi,
>
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update software, which doesn't require service interruption
> or lead to data losses?
> We can't stop message brokering during the update as we have a strict SLA.
>
> Best regards
> Stanislav Gilmulin

--
This message was sent by Atlassian JIRA (v6.2#6252)