Sateesh Chodapuneedi created CLOUDSTACK-4911:
------------------------------------------------
Summary: [Mixed Hypervisor] VM Status is marked as alive when exit
status of ping command is not available within command timeout
Key: CLOUDSTACK-4911
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4911
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: VMware
Affects Versions: 4.2.0
Environment: Zone with a KVM cluster and VMware cluster
Reporter: Sateesh Chodapuneedi
Assignee: Sateesh Chodapuneedi
Fix For: 4.2.1
Setup:
1) KVM cluster with two hosts: host1, host2
2) VMware cluster with one host: host3
3) In the KVM cluster, create HA-enabled VM1. System VMs, including virtual
router VR1, are running on host1 (Rack2host17)
4) In the VMware cluster, create HA-enabled VM2 on host3. VR2 and one guest VM
are running on host3 (51.4)
5) Deploy HA-enabled VM3 on host2 (Rack2Host18)
Steps:
1) Create a KVM instance that connects to a VMware virtual router
Instance Name: v-cl-test-10658-003-M00000002
Network: PublicFrontSegment-VM
Virtual Router: r-13123-VM
2) Migrate the instance to the host (tckktky4-pbhpv081) that will be shut down
3) Shut down the host (tckktky4-pbhpv081)
17:27 tckktky4-pbhpv081 shutdown
4) Host down detected
2013-05-08 17:32:24,233 WARN [agent.manager.AgentAttache]
(StatsCollector-2:null) Seq 177-582680794: Timed out on null
2013-05-08 17:32:24,233 WARN [agent.manager.AgentManagerImpl]
(StatsCollector-2:null) Operation timed out: Commands 582680794 to Host 177
timed out after 3600
...
2013-05-08 17:32:28,552 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-1:work-633) user vm v-cl-test-10658-003-M00000002 has been
successfully pinged, returning that it is alive
★ Even though ping showed 100% loss, the log reports the instance as alive
...
2013-05-08 17:32:28,552 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-1:work-633) Rescheduling because the host is not up but the vm is
alive
=====
VM HA rescheduling was repeated 8 times: the first 7 attempts to start the VM
failed, and on the 8th attempt the VM was restarted by HA on another KVM host.
Root cause: the exit status of the ping command is not available within the
command timeout of 20 seconds.
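
To illustrate the failure mode, here is a minimal Python sketch of a probe that
keeps "no exit status within the timeout" separate from "ping succeeded". It is
not the CloudStack code or the actual fix. The names PingResult, run_probe and
ping_vm are hypothetical, and the ping flags assume Linux iputils. Only the
20-second timeout comes from this report.

```python
import enum
import subprocess
import sys


class PingResult(enum.Enum):
    ALIVE = "alive"              # exit status 0: at least one reply received
    UNREACHABLE = "unreachable"  # exit status 1: no reply received
    UNKNOWN = "unknown"          # no usable exit status (timeout or error)


def run_probe(argv, timeout_s=20.0):
    """Run a probe command and map its outcome to a PingResult.

    A command that does not finish within timeout_s yields UNKNOWN,
    never ALIVE, so a missing exit status cannot be read as success.
    """
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run; no exit status to trust.
        return PingResult.UNKNOWN
    except OSError:
        # The command could not be started at all.
        return PingResult.UNKNOWN

    if completed.returncode == 0:
        return PingResult.ALIVE
    if completed.returncode == 1:
        return PingResult.UNREACHABLE
    return PingResult.UNKNOWN  # any other status is treated as inconclusive


def ping_vm(ip, timeout_s=20.0):
    # Assumed Linux iputils flags: -c reply count, -W per-reply wait (seconds).
    return run_probe(["ping", "-c", "3", "-W", "2", ip], timeout_s)
```

With a three-way result, a caller can treat UNKNOWN as "keep investigating"
rather than reporting the VM as alive, which is the outcome seen in the log
above.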
--
This message was sent by Atlassian JIRA
(v6.1#6144)