GitHub user resmo commented on the pull request:

https://github.com/apache/cloudstack/pull/829#issuecomment-142882541

@anshul1886 @koushik-das @DaanHoogland and I had a debug session last Friday, and since he is off for the next couple of days I can give you more details about what we analysed.

The powerReportMissing is not the problem, it is only the trigger. The grace period is the problem.

The calculation of this period relies (see https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L114) on the field `update_time` in table `vm_instance`. But when I look at the value, it does not seem to get updated. So the grace period has most likely always passed.

As a workaround, I ran an update SQL statement every 5 seconds which set `update_time` for my router r-342 while I was migrating it around the ESX cluster nodes:

~~~
mysql -e 'update cloud.vm_instance set update_time=NOW() where id=342;'
~~~

And the router didn't get rebooted:

~~~
2015-09-24 11:47:07,685 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-218:ctx-5849bd19) VM state report. host: 25, vm id: 342, power state: PowerOn
2015-09-24 11:47:07,696 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-218:ctx-5849bd19) VM state report is updated. host: 25, vm id: 342, power state: PowerOn
2015-09-24 11:48:06,462 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-55:ctx-84cd4323) VM state report. host: 19, vm id: 342, power state: PowerOn
2015-09-24 11:48:06,471 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-55:ctx-84cd4323) VM state report is updated. host: 19, vm id: 342, power state: PowerOn
2015-09-24 11:48:06,493 WARN [o.a.c.alerts] (DirectAgentCronJob-55:ctx-84cd4323) alertType:: 9 // dataCenterId:: 1 // podId:: 1 // clusterId:: null // message:: Router has been migrated out of band: r-342-VM
2015-09-24 11:49:06,539 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-29:ctx-2a57d676) Detected missing VM. host: 19, vm id: 342, power state: PowerReportMissing, last state update: 1443095344000
2015-09-24 11:49:06,539 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-29:ctx-2a57d676) vm id: 342 - time since last state update(-7197461ms) has not passed graceful period yet
2015-09-24 11:49:07,719 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] (DirectAgentCronJob-444:ctx-fdd4c055) VM state report. host: 20, vm id: 342, power state: PowerOn
~~~

Which means this patch does not fix the root cause. To me the root cause is that `update_time` is not updated, or that the grace period calculation is wrong.

Any thoughts?
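
To make the check under discussion concrete, here is a minimal Java sketch of a grace-period test of this kind. It is not the CloudStack implementation: the class name, method names, and the 120-second grace period are hypothetical and chosen only for illustration. The two timestamps are taken from the log above: the logged "last state update" value and a reference clock that reproduces the logged difference of -7197461 ms. That difference is close to minus two hours, which would be consistent with `update_time` being written by `NOW()` in local time and compared against a GMT clock, but that reading is an assumption and is not confirmed here.

```java
import java.util.Date;

/** Hypothetical sketch of a grace-period check; not CloudStack's actual code. */
final class GracePeriodSketch {

    /** Milliseconds elapsed between the last recorded state update and "now". */
    static long millisSinceLastUpdate(Date now, Date lastUpdate) {
        return now.getTime() - lastUpdate.getTime();
    }

    /** True once the elapsed time exceeds the grace period. */
    static boolean gracePeriodPassed(Date now, Date lastUpdate, long gracePeriodMillis) {
        return millisSinceLastUpdate(now, lastUpdate) > gracePeriodMillis;
    }

    public static void main(String[] args) {
        long gracePeriodMillis = 120_000L; // assumed value, for illustration only

        // "last state update" exactly as printed in the log.
        Date lastUpdate = new Date(1443095344000L);
        // Reference clock that yields the logged difference of -7197461 ms.
        Date now = new Date(1443095344000L - 7197461L);

        // Negative elapsed time: the check can never report "passed".
        System.out.println(millisSinceLastUpdate(now, lastUpdate));
        System.out.println(gracePeriodPassed(now, lastUpdate, gracePeriodMillis));

        // A stale update_time (one hour old here) always reports "passed".
        Date stale = new Date(now.getTime() - 3_600_000L);
        System.out.println(gracePeriodPassed(now, stale, gracePeriodMillis));
    }
}
```

The sketch shows both failure modes: a timestamp that is never refreshed makes the grace period appear to have passed on every missing report, while a timestamp that lies ahead of the reference clock makes it appear never to pass.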