[ 
https://issues.apache.org/jira/browse/KAFKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430981#comment-15430981
 ] 

Gabriel Ibarra commented on KAFKA-4051:
---------------------------------------

Hi, I'm sorry for the delay.
I agree with you: it is not a typical scenario, and I find it hard to think of 
reasons for a system administrator to change the date/time. But it can be 
done, and as Rajini Sivaram said, the impact is quite big if it does happen.

In addition, our system runs in a VM with NTP, and we see some spurious 
changes in the system time (this is how we detected the issue). We are still 
analyzing why the time changes, but we suspect that the system starts with 
the hardware time and that NTP then synchronizes the system clock using the 
time zone configured in the VM.
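
As a rough illustration of why a backward clock step matters for timeouts, 
here is a minimal sketch. It is not Kafka's actual code and does not claim to 
reproduce the coordinator's behavior; the class TimeoutSketch and its methods 
are hypothetical names used only for this example. It assumes a timeout that 
measures elapsed time from an injectable clock, so the same logic can be 
driven by the wall clock (System.currentTimeMillis) or by a monotonic clock 
(System.nanoTime).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical illustration only; this is not Kafka code.
final class TimeoutSketch {
    private final LongSupplier clockMs; // source of "now" in milliseconds
    private final long startMs;         // clock reading when the timeout was armed
    private final long timeoutMs;

    TimeoutSketch(LongSupplier clockMs, long timeoutMs) {
        this.clockMs = clockMs;
        this.startMs = clockMs.getAsLong();
        this.timeoutMs = timeoutMs;
    }

    // Elapsed time as seen by the chosen clock; negative if the clock went back.
    long elapsedMs() {
        return clockMs.getAsLong() - startMs;
    }

    boolean expired() {
        return elapsedMs() >= timeoutMs;
    }

    // Wall clock: follows NTP steps and manual date/time changes.
    static LongSupplier wallClock() {
        return System::currentTimeMillis;
    }

    // Monotonic clock: not affected by changes to the system date/time.
    static LongSupplier monotonicClock() {
        return () -> System.nanoTime() / 1_000_000L;
    }

    public static void main(String[] args) {
        // A fake wall clock lets us simulate the clock being set back.
        AtomicLong fakeWallClock = new AtomicLong(1_000_000L);
        TimeoutSketch timeout = new TimeoutSketch(fakeWallClock::get, 30_000L);

        fakeWallClock.addAndGet(-3_600_000L); // clock is set back one hour
        fakeWallClock.addAndGet(30_000L);     // 30 seconds of real time pass

        // The 30 s timeout has really elapsed, but the wall clock disagrees.
        System.out.println("elapsed=" + timeout.elapsedMs()
                + " expired=" + timeout.expired());
    }
}
```

With the fake wall clock, the timeout does not expire until the clock has 
caught up with the hour it lost. Passing TimeoutSketch.monotonicClock() 
instead would make the elapsed time independent of date/time changes.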

It is good news that it could be fixed with small changes. Great job, Rajini!

> Strange behavior during rebalance when turning the OS clock back
> ----------------------------------------------------------------
>
>                 Key: KAFKA-4051
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4051
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.0.0
>         Environment: OS: Ubuntu 14.04 - 64bits
>            Reporter: Gabriel Ibarra
>            Assignee: Rajini Sivaram
>
> If a rebalance is performed after turning the OS clock back, the Kafka 
> server enters a loop and the rebalance cannot be completed until the 
> system clock reaches the previous date/time again.
> Steps to Reproduce:
> - Start a consumer for TOPIC_NAME with group id GROUP_NAME. It will own 
> all the partitions.
> - Turn the system (OS) clock back, for instance by 1 hour.
> - Start a new consumer for TOPIC_NAME using the same group id. This forces 
> a rebalance.
> After these steps, the Kafka server log continuously shows the messages 
> below, and after a while both consumers stop receiving messages. This 
> condition lasts at least as long as the amount the clock was set back 
> (1 hour in this example), after which Kafka resumes working.
> [2016-08-08 11:30:23,023] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 2 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,025] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,027] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 3 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,029] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 3 is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,032] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,033] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,034] INFO [GroupCoordinator 0]: Group GROUP generation 1 
> is dead and removed (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,043] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 0 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Stabilized group 
> GROUP_NAME generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,044] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group GROUP_NAME with old generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-08-08 11:30:23,045] INFO [GroupCoordinator 0]: Group GROUP_NAME 
> generation 1 is dead and removed (kafka.coordinator.GroupCoordinator)
> Because some systems have NTP enabled, or allow an administrator to change 
> the system clock (date/time), it is important that the clock can be changed 
> safely. Currently the only safe way to do it is the following:
> 1- Shut down the Kafka server.
> 2- Change the date/time.
> 3- Start the Kafka server again.
> However, this approach works only if the change is made by the 
> administrator, not by NTP. Also, in many systems shutting down the Kafka 
> server might cause information to be lost.



