[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Santhanam updated CLOUDSTACK-3137:
-------------------------------------------

    Description: 
SystemVMs fail to start during HA on latest master 
(4591f94a0bb69875dca769c9738998d3b5b96106)

It looks like the HA of system VMs is failing. On our automated test 
environment we reboot the management server before the systemVMs spin up to 
ensure that the global settings are affected and the configuration for the 
secstorage.internal.sites is rightly set in the secondary storage VM. The 
management server upon reboot discovers that SSVM was put to perform HA in the 
op_ha_work table and starts to re-spin the SSVM. At this time the CPVM is also 
booting up. No further activity is observed after two HA attempts.

SystemVM start before Management server start: 
Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,738 DEBUG 
[cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating entries for 
VM: VM[SecondaryStorageVm|s-1-VM]
Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,771 DEBUG 
[cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating nics for 
VM[SecondaryStorageVm|s-1-VM]
Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,774 DEBUG 
[cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
VM[SecondaryStorageVm|s-1-VM] in network Ntwk[200|Public|1] with requested 
profile NicProfile[0-0-null-null-null
Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,843 DEBUG 
[cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
VM[SecondaryStorageVm|s-1-VM] in network Ntwk[202|Control|3] with requested 
profile null
Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,870 DEBUG 
[cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
VM[SecondaryStorageVm|s-1-VM] in network Ntwk[201|Management|2] with requested 
profile null
Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,916 DEBUG 
[cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
VM[SecondaryStorageVm|s-1-VM] in network Ntwk[203|Storage|4] with requested 
profile null
...
...
Jun 24 22:30:52 cloudstack-centos63 local0: 2013-06-25 05:30:52,552 DEBUG 
[agent.transport.Request] (consoleproxy-1:null) Seq 3-2067660811: Sending  { 
Cmd , MgmtId: 200973787296321, via: 3, Ver: v1, Flags: 100111, 
[{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/1/1/","origUrl":"http://nfs/templates/acton/acton-systemvm-02062012.vhd.bz2","uuid":"1","id":1,"format":"VHD","accountId":1,"checksum":"f613f38c96bf039f2e5cbf92fa8ad4f8","hvm":false,"displayText":"SystemVM
 Template 
(XenServer)","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://nfs.fmt.vmops.com:/expo...

Jun 24 22:31:01 cloudstack-centos63 local0: 2013-06-25 05:31:01,824 INFO  
[context.support.XmlWebApplicationContext] (Thread-26:null) Closing Root 
WebApplicationContext: startup date [Mon Jun 24 23:25:50 UTC 2013]; root of 
context hierarchy
^^^^^^^^^^ Management Server Restarts ^^^^^^^^^^^^^^

After restart SSVM scheduled for HA
Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,394 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  
VM[SecondaryStorageVm|s-1-VM]
Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,455 INFO  
[cloud.vm.VirtualMachineManagerImpl] (Timer-1:null) Handling unfinished work 
item: ItWork[1805a89b-d03a-4f1f-b717-ac9f5da6edea-Starting-2-Prepare]
Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,504 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  
VM[ConsoleProxy|v-2-VM]
...
Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,540 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:null) Starting work
Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,547 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:null) Starting work
...
Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,815 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Processing 
HAWork[1-HA-1-Starting-Investigating]
...
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,289 DEBUG 
[cloud.ha.CheckOnAgentInvestigator] (HA-Worker-0:work-2) Unable to reach the 
agent for VM[ConsoleProxy|v-2-VM]: Resource [Host:3] is unreachable: Host 3: 
Host with specified id is not in the right state: Disconnected
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,290 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) SimpleInvestigator 
found VM[ConsoleProxy|v-2-VM]to be alive? null
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,293 DEBUG 
[cloud.ha.CheckOnAgentInvestigator] (HA-Worker-2:work-1) Unable to reach the 
agent for VM[SecondaryStorageVm|s-1-VM]: Resource [Host:2] is unreachable: Host 
2: Host with specified id is not in the right state: Disconnected
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,298 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) SimpleInvestigator 
found VM[SecondaryStorageVm|s-1-VM]to be alive? null
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,336 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) 
XenServerInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG 
[cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-2) Not a User Vm, unable to 
determine state of VM[ConsoleProxy|v-2-VM] returning null
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) null found 
VM[ConsoleProxy|v-2-VM]to be alive? null
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG 
[cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-2) Testing if 
VM[ConsoleProxy|v-2-VM] is alive
....
Rescheduled HA
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,623 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) Rescheduling 
HAWork[2-HA-2-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 
2013
Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,626 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Rescheduling 
HAWork[1-HA-1-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 
2013
...
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,586 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Processing 
HAWork[1-HA-1-Starting-Investigating]
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,592 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) HA on 
VM[SecondaryStorageVm|s-1-VM]
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) VM 
VM[SecondaryStorageVm|s-1-VM] has been changed.  Current State = Starting 
Previous State = Starting last updated = 6 previous updated = 2
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Completed 
HAWork[1-HA-1-Starting-Investigating]
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,611 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Processing 
HAWork[2-HA-2-Starting-Investigating]
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) HA on 
VM[ConsoleProxy|v-2-VM]
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) VM 
VM[ConsoleProxy|v-2-VM] has been changed.  Current State = Starting Previous 
State = Starting last updated = 6 previous updated = 2
Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
[cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Completed 
HAWork[2-HA-2-Starting-Investigating]


  was:
SystemVMs fail to start on latest master 
(4591f94a0bb69875dca769c9738998d3b5b96106)

Only following exception is seen:
Jun 22 10:13:18 cloudstack-centos63 local0: 2013-06-22 17:13:18,962 INFO  
[storage.volume.VolumeServiceImpl] (secstorage-1:null) Got exception in case of 
multi-thread
Jun 22 10:13:19 cloudstack-centos63 local0: 2013-06-22 17:13:19,112 INFO  
[storage.volume.VolumeServiceImpl] (consoleproxy-1:null) Got exception in case 
of multi-thread

Full logs attached.

    
> systemvms HA fails on latest master (4591f94)
> ---------------------------------------------
>
>                 Key: CLOUDSTACK-3137
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3137
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Prasanna Santhanam
>            Priority: Blocker
>         Attachments: logs.tar.bz2
>
>
> SystemVMs fail to start during HA on latest master 
> (4591f94a0bb69875dca769c9738998d3b5b96106)
> It looks like the HA of system VMs is failing. On our automated test 
> environment we reboot the management server before the systemVMs spin up to 
> ensure that the global settings are affected and the configuration for the 
> secstorage.internal.sites is rightly set in the secondary storage VM. The 
> management server upon reboot discovers that SSVM was put to perform HA in 
> the op_ha_work table and starts to re-spin the SSVM. At this time the CPVM is 
> also booting up. No further activity is observed after two HA attempts.
> SystemVM start before Management server start: 
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,738 DEBUG 
> [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating entries 
> for VM: VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,771 DEBUG 
> [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating nics for 
> VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,774 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[200|Public|1] with requested 
> profile NicProfile[0-0-null-null-null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,843 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[202|Control|3] with requested 
> profile null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,870 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[201|Management|2] with 
> requested profile null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,916 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[203|Storage|4] with requested 
> profile null
> ...
> ...
> Jun 24 22:30:52 cloudstack-centos63 local0: 2013-06-25 05:30:52,552 DEBUG 
> [agent.transport.Request] (consoleproxy-1:null) Seq 3-2067660811: Sending  { 
> Cmd , MgmtId: 200973787296321, via: 3, Ver: v1, Flags: 100111, 
> [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/1/1/","origUrl":"http://nfs/templates/acton/acton-systemvm-02062012.vhd.bz2","uuid":"1","id":1,"format":"VHD","accountId":1,"checksum":"f613f38c96bf039f2e5cbf92fa8ad4f8","hvm":false,"displayText":"SystemVM
>  Template 
> (XenServer)","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://nfs.fmt.vmops.com:/expo...
> Jun 24 22:31:01 cloudstack-centos63 local0: 2013-06-25 05:31:01,824 INFO  
> [context.support.XmlWebApplicationContext] (Thread-26:null) Closing Root 
> WebApplicationContext: startup date [Mon Jun 24 23:25:50 UTC 2013]; root of 
> context hierarchy
> ^^^^^^^^^^ Management Server Restarts ^^^^^^^^^^^^^^
> After restart SSVM scheduled for HA
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,394 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  
> VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,455 INFO  
> [cloud.vm.VirtualMachineManagerImpl] (Timer-1:null) Handling unfinished work 
> item: ItWork[1805a89b-d03a-4f1f-b717-ac9f5da6edea-Starting-2-Prepare]
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,504 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  
> VM[ConsoleProxy|v-2-VM]
> ...
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,540 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:null) Starting work
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,547 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:null) Starting work
> ...
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,815 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Processing 
> HAWork[1-HA-1-Starting-Investigating]
> ...
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,289 DEBUG 
> [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-0:work-2) Unable to reach the 
> agent for VM[ConsoleProxy|v-2-VM]: Resource [Host:3] is unreachable: Host 3: 
> Host with specified id is not in the right state: Disconnected
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,290 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) 
> SimpleInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,293 DEBUG 
> [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-2:work-1) Unable to reach the 
> agent for VM[SecondaryStorageVm|s-1-VM]: Resource [Host:2] is unreachable: 
> Host 2: Host with specified id is not in the right state: Disconnected
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,298 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) 
> SimpleInvestigator found VM[SecondaryStorageVm|s-1-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,336 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) 
> XenServerInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG 
> [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-2) Not a User Vm, unable 
> to determine state of VM[ConsoleProxy|v-2-VM] returning null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) null found 
> VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG 
> [cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-2) Testing if 
> VM[ConsoleProxy|v-2-VM] is alive
> ....
> Rescheduled HA
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,623 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) Rescheduling 
> HAWork[2-HA-2-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 
> 2013
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,626 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Rescheduling 
> HAWork[1-HA-1-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 
> 2013
> ...
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,586 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Processing 
> HAWork[1-HA-1-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,592 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) HA on 
> VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) VM 
> VM[SecondaryStorageVm|s-1-VM] has been changed.  Current State = Starting 
> Previous State = Starting last updated = 6 previous updated = 2
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Completed 
> HAWork[1-HA-1-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,611 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Processing 
> HAWork[2-HA-2-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) HA on 
> VM[ConsoleProxy|v-2-VM]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) VM 
> VM[ConsoleProxy|v-2-VM] has been changed.  Current State = Starting Previous 
> State = Starting last updated = 6 previous updated = 2
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Completed 
> HAWork[2-HA-2-Starting-Investigating]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to