Hi! My test today: I stopped another instance, changed it to an HA offering, and started this instance.
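(In cloudmonkey terms, those steps would map roughly to the following; a sketch only, with placeholder IDs, assuming the changeServiceForVirtualMachine API, which requires the VM to be stopped:)

  # placeholders: <vm-uuid> and <ha-offering-uuid> are illustrative, not real IDs
  stop virtualmachine id=<vm-uuid>
  change serviceforvirtualmachine id=<vm-uuid> serviceofferingid=<ha-offering-uuid>
  start virtualmachine id=<vm-uuid>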
Afterwards, I gracefully shut down the KVM host it was running on, and I checked the investigator processes:

[root@1q2 ~]# grep -i Investigator /var/log/cloudstack/management/management-server.log
[root@1q2 ~]# date
Mon Jul 20 14:39:43 UTC 2015
[root@1q2 ~]# ls -ltrh /var/log/cloudstack/management/management-server.log
-rw-rw-r--. 1 cloud cloud 14M Jul 20 14:39 /var/log/cloudstack/management/management-server.log

Nothing. I don't know how these processes work internally, but it seems they are not working well, agree?

option                       value
ha.investigators.exclude     (empty)
ha.investigators.order       SimpleInvestigator,XenServerInvestigator,KVMInvestigator,HypervInvestigator,VMwareInvestigator,PingInvestigator,ManagementIPSysVMInvestigator
investigate.retry.interval   60

Is there a way to check whether these processes are running?

[root@1q2 ~]# ps waux | grep -i java
root 11408 0.0 0.0 103252 880 pts/0 S+ 14:44 0:00 grep -i java
cloud 24225 0.7 1.7 16982036 876412 ? Sl Jul16 43:48 /usr/lib/jvm/jre-1.7.0/bin/java -Djava.awt.headless=true -Dcom.sun.management.jmxremote=false -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/cloudstack/management/ -XX:PermSize=512M -XX:MaxPermSize=800m -Djava.security.properties=/etc/cloudstack/management/java.security.ciphers -classpath :::/etc/cloudstack/management:/usr/share/cloudstack-management/setup:/usr/share/cloudstack-management/bin/bootstrap.jar:/usr/share/cloudstack-management/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/cloudstack-management -Dcatalina.home=/usr/share/cloudstack-management -Djava.endorsed.dirs= -Djava.io.tmpdir=/usr/share/cloudstack-management/temp -Djava.util.logging.config.file=/usr/share/cloudstack-management/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
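Beyond the process list, a possible way to check whether the HA machinery ever scheduled anything (a sketch, assuming the default 'cloud' database name; the HA worker threads log as HA-Worker, as in the entries quoted further down this thread):

  # look for HA worker activity in the management server log
  grep -E 'HighAvailabilityManagerImpl|HA-Worker' /var/log/cloudstack/management/management-server.log | tail -20
  # look for queued HA work items in the database (prompts for the cloud DB password)
  mysql -u cloud -p -e "SELECT id, instance_id, type, state, taken FROM cloud.op_ha_work ORDER BY id DESC LIMIT 10;"

If both come back empty, no HA work items were ever created for the host's VMs.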
Thanks

On Sat, Jul 18, 2015 at 1:53 PM, Milamber <[email protected]> wrote:

>
> On 17/07/2015 22:26, Somesh Naidu wrote:
>
>>> Perhaps the management server doesn't recognize host 3 as totally down
>>> (ping still alive? or some quorum not OK)?
>>> The only way for the mgmt server to fully accept that host 3 had a
>>> real problem was that host 3 was rebooted (around 12:44)?
>>
>> The host disconnect was triggered at 12:19 on host 3. Mgmt server was
>> pretty sure the host was down (it was a graceful shutdown, I believe),
>> which is why it triggered a disconnect and notified other nodes. There was
>> no checkhealth/checkonhost/etc. triggered; just the agent disconnected and
>> all listeners (ping/etc.) notified.
>>
>> At this time the mgmt server should have scheduled HA on all VMs running
>> on that host. The HA investigators would then work their way through
>> identifying whether the VMs are still running, whether they need to be
>> fenced, etc. But this never happened.
>
> AFAIK, stopping the cloudstack-agent service doesn't trigger the HA
> process for the VMs hosted by the node. It seems normal to me that the HA
> process doesn't start at this moment.
> If I want to start the HA process on a node, I go to the Web UI (or
> cloudmonkey) and change the state of the host from Up to Maintenance.
>
> (after that, I can stop the CS-agent service if I need to, for example,
> reboot the node)
>
>> Regards,
>> Somesh
>>
>> -----Original Message-----
>> From: Milamber [mailto:[email protected]]
>> Sent: Friday, July 17, 2015 6:01 PM
>> To: [email protected]
>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>
>> On 17/07/2015 21:23, Somesh Naidu wrote:
>>
>>> Ok, so here are my findings.
>>>
>>> 1. Host ID 3 was shut down around 2015-07-16 12:19:09, at which point
>>> the management server called a disconnect.
>>> 2. Based on the logs, it seems VM IDs 32, 18, 39 and 46 were running on
>>> the host.
>>> 3. There were no HA tasks for any of these VMs at this time.
>>> 4. The management server restarted at around 2015-07-16 12:30:20.
>>> 5. Host ID 3 connected back at around 2015-07-16 12:44:08.
>>> 6. The management server identified the missing VMs and triggered HA on
>>> those.
>>> 7. The VMs were eventually started, all 4 of them.
>>>
>>> I am not 100% sure why HA wasn't triggered until 2015-07-16 12:30 (#3),
>>> but I know that the management server restart caused it not to happen
>>> until the host was reconnected.
>>
>> Perhaps the management server doesn't recognize host 3 as totally down
>> (ping still alive? or some quorum not OK)?
>> The only way for the mgmt server to fully accept that host 3 had a real
>> problem was that host 3 was rebooted (around 12:44)?
>>
>> What is the storage subsystem? CLVMd?
>>
>>> Regards,
>>> Somesh
>>>
>>> -----Original Message-----
>>> From: Luciano Castro [mailto:[email protected]]
>>> Sent: Friday, July 17, 2015 12:13 PM
>>> To: [email protected]
>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>
>>> No problem, Somesh, thanks for your help.
>>>
>>> Link to the log:
>>>
>>> https://dl.dropboxusercontent.com/u/6774061/management-server.log.2015-07-16.gz
>>>
>>> Luciano
>>>
>>> On Fri, Jul 17, 2015 at 12:00 PM, Somesh Naidu <[email protected]> wrote:
>>>
>>>> How large is the management server log dated 2015-07-16? I would like
>>>> to review the logs. All the information I need from that incident should
>>>> be in there, so I don't need any more testing.
>>>>
>>>> Regards,
>>>> Somesh
>>>>
>>>> -----Original Message-----
>>>> From: Luciano Castro [mailto:[email protected]]
>>>> Sent: Friday, July 17, 2015 7:58 AM
>>>> To: [email protected]
>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>
>>>> Hi Somesh!
>>>>
>>>> [root@1q2 ~]# zgrep -i -E 'SimpleInvestigator|KVMInvestigator|PingInvestigator|ManagementIPSysVMInvestigator' /var/log/cloudstack/management/management-server.log.2015-07-16.gz | tail -5000 > /tmp/management.txt
>>>> [root@1q2 ~]# cat /tmp/management.txt
>>>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [KVMInvestigator] in [Ha Investigators Registry]
>>>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.RegistryLifecycle] (main:null) Registered com.cloud.ha.KVMInvestigator@57ceec9a
>>>> 2015-07-16 12:30:45,927 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [PingInvestigator] in [Ha Investigators Registry]
>>>> 2015-07-16 12:30:45,928 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [ManagementIPSysVMInvestigator] in [Ha Investigators Registry]
>>>> 2015-07-16 12:30:53,796 INFO [o.a.c.s.l.r.DumpRegistry] (main:null) Registry [Ha Investigators Registry] contains [SimpleInvestigator, XenServerInvestigator, KVMInv
>>>>
>>>> I had searched this log before, but as I thought, it had nothing special.
>>>>
>>>> If you want to propose another test scenario to me, I can do it.
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Jul 16, 2015 at 7:27 PM, Somesh Naidu <[email protected]> wrote:
>>>>
>>>>> What about the other investigators, specifically "KVMInvestigator,
>>>>> PingInvestigator"? Do they report the VMs as alive=false too?
>>>>>
>>>>> Also, it is recommended that you look at the management-server.log
>>>>> instead of catalina.out (for one, the latter doesn't have timestamps).
>>>>>
>>>>> Regards,
>>>>> Somesh
>>>>>
>>>>> -----Original Message-----
>>>>> From: Luciano Castro [mailto:[email protected]]
>>>>> Sent: Thursday, July 16, 2015 1:14 PM
>>>>> To: [email protected]
>>>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>>>>>
>>>>> Hi Somesh!
>>>>>
>>>>> Thanks for the help. I did it again, and I collected new logs:
>>>>>
>>>>> My vm_instance name is i-2-39-VM. There were some routers on KVM host 'A'
>>>>> (the one that I powered off now):
>>>>>
>>>>> [root@1q2 ~]# grep -i -E 'SimpleInvestigator.*false' /var/log/cloudstack/management/catalina.out
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-e2f91c9c work-3) SimpleInvestigator found VM[DomainRouter|r-4-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-729acf4f work-7) SimpleInvestigator found VM[User|i-23-33-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-a66a4941 work-8) SimpleInvestigator found VM[DomainRouter|r-36-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-5977245e work-10) SimpleInvestigator found VM[User|i-17-26-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-c7f39be0 work-9) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-ad4f5fda work-10) SimpleInvestigator found VM[DomainRouter|r-46-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-0257f5af work-11) SimpleInvestigator found VM[User|i-4-52-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-7ddff382 work-12) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
>>>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-9f79917e work-13) SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false
>>>>>
>>>>> KVM host 'B' agent log (where the machine would be migrated to):
>>>>>
>>>>> 2015-07-16 16:58:56,537 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Live migration of instance i-2-39-VM initiated
>>>>> 2015-07-16 16:58:57,540 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 1000ms
>>>>> 2015-07-16 16:58:58,541 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 2000ms
>>>>> 2015-07-16 16:58:59,542 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 3000ms
>>>>> 2015-07-16 16:59:00,543 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 4000ms
>>>>> 2015-07-16 16:59:01,245 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Migration thread for i-2-39-VM is done
>>>>>
>>>>> It said done for my i-2-39-VM instance, but I can't ping this host.
>>>>>
>>>>> Luciano
>>>>
>>>> --
>>>> Luciano Castro
>>>
>

--
Luciano Castro
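PS: for the migration at the end of this thread, a direct check on KVM host 'B' might help (a sketch, assuming libvirt's default connection; i-2-39-VM is the instance name from the logs above):

  virsh list --all | grep i-2-39-VM    # is the domain present on the destination host?
  virsh domstate i-2-39-VM             # should report 'running' if the live migration completed

If the domain is running on host 'B' but unreachable, the problem is more likely networking than the migration itself.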
