Github user resmo commented on the pull request:

    https://github.com/apache/cloudstack/pull/829#issuecomment-142882541
  
    @anshul1886 @koushik-das 
    @DaanHoogland and I had a debug session last Friday, and since he is off 
for the next couple of days I can give you more details about what we analysed. 
    
    The powerReportMissing is not the problem, it is only the trigger. The 
grace period is the problem. The calculation of this period relies (see 
https://github.com/apache/cloudstack/blob/4.5.2/engine/orchestration/src/com/cloud/vm/VirtualMachinePowerStateSyncImpl.java#L114)
 on the field `update_time` in table `vm_instance`. But when I look at the 
value, it does not seem to get updated, so the grace period has most likely 
always passed already. 
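
    To make the suspicion concrete, here is a minimal sketch of a grace-period 
check of this kind. It is not the CloudStack code at the link above: the 
function name, the millisecond units and the 10-minute grace period are all 
assumptions chosen for illustration. It only shows that if `update_time` is 
never refreshed, the elapsed time keeps growing and the check always reports 
the period as passed.

```python
from datetime import datetime, timedelta, timezone


def grace_period_passed(now, last_update, grace_ms):
    """Return True once more than grace_ms milliseconds separate now and last_update.

    Hypothetical helper for illustration only, not the CloudStack implementation.
    """
    elapsed_ms = (now - last_update) / timedelta(milliseconds=1)
    return elapsed_ms > grace_ms


GRACE_MS = 10 * 60 * 1000  # assumed 10-minute grace period

now = datetime(2015, 9, 24, 11, 49, 6, tzinfo=timezone.utc)

# update_time left untouched for days: the grace period has long passed.
stale = now - timedelta(days=3)
print(grace_period_passed(now, stale, GRACE_MS))  # True

# update_time refreshed 5 seconds ago: still inside the grace period.
fresh = now - timedelta(seconds=5)
print(grace_period_passed(now, fresh, GRACE_MS))  # False
```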
    
    As a workaround, I ran an SQL update every 5 seconds that refreshed 
`update_time` for my router r-342 while I was migrating it around the ESX 
cluster nodes:
    ~~~
     mysql -e 'update cloud.vm_instance set update_time=NOW() where id=342;'
    ~~~
    And the router didn't get rebooted:
    ~~~
    2015-09-24 11:47:07,685 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
(DirectAgentCronJob-218:ctx-5849bd19) VM state report. host: 25, vm id: 342, 
power state: PowerOn
    2015-09-24 11:47:07,696 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
(DirectAgentCronJob-218:ctx-5849bd19) VM state report is updated. host: 25, vm 
id: 342, power state: PowerOn
    2015-09-24 11:48:06,462 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
(DirectAgentCronJob-55:ctx-84cd4323) VM state report. host: 19, vm id: 342, 
power state: PowerOn
    2015-09-24 11:48:06,471 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
(DirectAgentCronJob-55:ctx-84cd4323) VM state report is updated. host: 19, vm 
id: 342, power state: PowerOn
    2015-09-24 11:48:06,493 WARN  [o.a.c.alerts] 
(DirectAgentCronJob-55:ctx-84cd4323)  alertType:: 9 // dataCenterId:: 1 // 
podId:: 1 // clusterId:: null // message:: Router has been migrated out of 
band: r-342-VM
    2015-09-24 11:49:06,539 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
(DirectAgentCronJob-29:ctx-2a57d676) Detected missing VM. host: 19, vm id: 342, 
power state: PowerReportMissing, last state update: 1443095344000
    2015-09-24 11:49:06,539 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
(DirectAgentCronJob-29:ctx-2a57d676) vm id: 342 - time since last state 
update(-7197461ms) has not passed graceful period yet
    2015-09-24 11:49:07,719 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
(DirectAgentCronJob-444:ctx-fdd4c055) VM state report. host: 20, vm id: 342, 
power state: PowerOn
    ~~~
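
    The numbers in the "Detected missing VM" entries above can be checked by 
hand. The sketch below uses only the two values printed in the log 
(`last state update` and the time since the last state update); the variable 
names are mine, and treating the stored value as epoch milliseconds in UTC is 
an assumption.

```python
from datetime import datetime, timedelta, timezone

last_update_ms = 1443095344000  # "last state update" from the log
elapsed_ms = -7197461           # "time since last state update" from the log

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_update = epoch + timedelta(milliseconds=last_update_ms)
# The "now" the check must have used is last_update + elapsed.
now_used = epoch + timedelta(milliseconds=last_update_ms + elapsed_ms)

print(last_update.isoformat())                      # 2015-09-24T11:49:04+00:00
print(now_used.isoformat(timespec="milliseconds"))  # 2015-09-24T09:49:06.539+00:00
```

    So the stored `update_time` decodes to 11:49:04 UTC, while the "now" 
implied by the logged difference is 09:49:06.539 UTC, almost exactly two hours 
earlier. One possible reading, and it is only an assumption, is that `NOW()` 
in my manual update wrote the database server's local time and the check read 
it as UTC. If so, the workaround pushed `update_time` into the future rather 
than merely keeping it fresh.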
    
    This means the patch does not fix the root cause. To me, the root cause is 
that `update_time` is not updated, or that the gracePeriod calculation is wrong.
    
    Any thoughts?


