Hi, I've added more hosts and enabled HA on all of them. Then I shut down node cs-hv-06, which was running r-199. Here are the logs I am getting:
--
2018-08-13 12:12:51,402 DEBUG [c.c.h.HighAvailabilityManagerImpl] (pool-5-thread-1:null) (logid:b71a09c7) Notifying HA Mgr of to restart vm 199-r-199-VM
2018-08-13 12:12:51,410 INFO [c.c.h.HighAvailabilityManagerImpl] (pool-5-thread-1:null) (logid:b71a09c7) Schedule vm for HA: VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,418 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Processing work HAWork[48-HA-199-Running-Investigating]
2018-08-13 12:12:51,421 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) HA on VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,424 DEBUG [c.c.h.CheckOnAgentInvestigator] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Unable to reach the agent for VM[DomainRouter|r-199-VM]: Resource [Host:32] is unreachable: Host 32: Host with specified id is not in the right state: Down
2018-08-13 12:12:51,424 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) SimpleInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,424 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) XenServerInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:12:51,426 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) KVMInvestigator found VM[DomainRouter|r-199-VM] to be alive? true
2018-08-13 12:12:51,426 DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) VM r-199-VM is found to be alive by KVMInvestigator
2018-08-13 12:12:51,426 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-bd6bd4a1 work-48) (logid:e99fa8f3) Rescheduling work HAWork[48-HA-199-Running-Investigating] to try again at Mon Aug 13 12:13:52 CEST 2018
2018-08-13 12:14:51,431 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Processing work HAWork[48-HA-199-Running-Investigating]
2018-08-13 12:14:51,433 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) HA on VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,436 DEBUG [c.c.h.CheckOnAgentInvestigator] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Unable to reach the agent for VM[DomainRouter|r-199-VM]: Resource [Host:32] is unreachable: Host 32: Host with specified id is not in the right state: Down
2018-08-13 12:14:51,436 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) SimpleInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,436 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) XenServerInvestigator could not find VM[DomainRouter|r-199-VM]
2018-08-13 12:14:51,438 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) KVMInvestigator found VM[DomainRouter|r-199-VM] to be alive? true
2018-08-13 12:14:51,438 DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) VM r-199-VM is found to be alive by KVMInvestigator
2018-08-13 12:14:51,438 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-631d0fad work-48) (logid:85d7428e) Rescheduling work HAWork[48-HA-199-Running-Investigating] to try again at Mon Aug 13 12:15:52 CEST 2018
--

The router is dead, but KVMInvestigator still reports it as alive, so the HA work is rescheduled every time instead of restarting the VM.
It also does not seem to be the bug described in https://issues.apache.org/jira/browse/CLOUDSTACK-3535.

kind regards,
thomas

On 02.08.2018 16:46, Thomas Heil wrote:
> Hi,
>
> I have a setup with one advanced zone, one cluster and two hosts. The
> hosts are KVM and use a single NFS storage for primary and one for
> secondary storage.
>
> Everything runs smoothly until I remove power from one host.
>
> In my honest opinion CloudStack should then declare the faulty host as
> dead, declare the VMs etc. that were running there before as dead, and
> start them on the remaining KVM host.
>
> But nothing happens. The VMs remain in state 'Running', the system VMs
> change to 'Agent State: Disconnected', and that's all.
>
> The only way I found to fix the issue was to set the state of all
> VMs that were running on the faulty node to 'Stopped'.
>
> Can anybody confirm that this problem is reproducible?
>
> kind regards,
> thomas