Hi Remi,

Is this patch merged into 4.7.1 or 4.8.0, which was recently released? I am planning to do the upgrade and wanted to double-check.
Thanks

Andrei

----- Original Message -----
> From: "Remi Bergsma" <rberg...@schubergphilis.com>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Tuesday, 5 January, 2016 11:20:31
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
> Hi Andrei,
>
> You indeed need to build CloudStack for this to work.
>
> You can create packages with the ./packaging/package.sh script in the source tree.
> The PR is against 4.7, and when you create RPMs those will be 4.7.1-SNAPSHOT. I
> do run this in production and it resolved the issue. Let me know if it works
> for you too.
>
> Regards,
> Remi
>
> On 05/01/16 10:07, "Andrei Mikhailovsky" <and...@arhont.com> wrote:
>
>> Hi Remi,
>>
>> I've not tried the patch. I missed it. Do I need to rebuild ACS to apply
>> the patch, or would making changes to the two files suffice?
>>
>> Thanks
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Remi Bergsma" <rberg...@schubergphilis.com>
>>> To: "dev" <dev@cloudstack.apache.org>
>>> Sent: Tuesday, 5 January, 2016 05:49:05
>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>
>>> Hi Andrei,
>>>
>>> Did you try it in combination with the patch I created (PR 1291)? You need both
>>> changes.
>>>
>>> Regards, Remi
>>>
>>> Sent from my iPhone
>>>
>>>> On 04 Jan 2016, at 22:17, Andrei Mikhailovsky <and...@arhont.com> wrote:
>>>>
>>>> Hi Remi,
>>>>
>>>> Thanks for your reply. However, your suggestion of increasing
>>>> router.aggregation.command.each.timeout didn't help. I tried setting the
>>>> value to 120, to no avail. It still fails with the same error.
>>>>
>>>> Andrei
>>>>
>>>> ----- Original Message -----
>>>>> From: "Remi Bergsma" <rberg...@schubergphilis.com>
>>>>> To: "dev" <dev@cloudstack.apache.org>
>>>>> Sent: Monday, 4 January, 2016 10:44:43
>>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>>>
>>>>> Hi Andrei,
>>>>>
>>>>> Missed that mail, sorry. I created a PR that allows for longer timeouts [1].
>>>>>
>>>>> Also, you can bump the router.aggregation.command.each.timeout global setting
>>>>> to, say, 15-30, so it will allow the router to boot.
>>>>>
>>>>> Next, we need to find out why it takes so long in the first place. In our
>>>>> environment it at least starts now.
>>>>>
>>>>> Regards,
>>>>> Remi
>>>>>
>>>>> [1] https://github.com/apache/cloudstack/pull/1291
>>>>>
>>>>>> On 04/01/16 11:41, "Andrei Mikhailovsky" <and...@arhont.com> wrote:
>>>>>>
>>>>>> Hello guys,
>>>>>>
>>>>>> I tried the users mailing list without any luck. Perhaps the dev guys know if
>>>>>> this issue is being looked at for the next release?
>>>>>>
>>>>>> I've just upgraded to 4.6.2 and have similar issues with three virtual routers
>>>>>> out of 22 in total. They are all failing in exactly the same way as described
>>>>>> here.
>>>>>>
>>>>>> Has anyone found a permanent workaround for this issue?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Andrei
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Stephan Seitz" <s.se...@secretresearchfacility.com>
>>>>>>> To: "users" <us...@cloudstack.apache.org>
>>>>>>> Sent: Monday, 30 November, 2015 19:53:57
>>>>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>>>>>
>>>>>>> Does anybody else experience problems due to (very) slow deployment of VRs?
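Remi's advice earlier in this thread boils down to two steps: build RPMs from the 4.7 branch so they contain the PR 1291 fix, and raise the router.aggregation.command.each.timeout global setting. A minimal sketch of both follows, assuming a CentOS 7 build host with the RPM toolchain and a configured cloudmonkey; the -d flag, the output path and the cloudmonkey step are from memory of the usual packaging workflow rather than from this thread, so verify them (./packaging/package.sh --help) before relying on them.

# Build 4.7.1-SNAPSHOT packages containing the PR 1291 fix (sketch only)
git clone https://github.com/apache/cloudstack.git
cd cloudstack
git checkout 4.7
./packaging/package.sh -d centos7      # verify the distribution flag for your branch
ls dist/rpmbuild/RPMS/x86_64/          # assumed location of the freshly built RPMs

# Raise the aggregation timeout (also possible via the UI under Global Settings),
# then restart the management server so the new value is picked up
cloudmonkey update configuration name=router.aggregation.command.each.timeout value=30
service cloudstack-management restart

Note that Remi suggests values of 15-30, while Andrei reports that even 120 did not help in his case, so the rebuilt packages are the part that actually matters here.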
>>>>>>>
>>>>>>> On Tuesday, 24.11.2015, 16:31 +0100, Stephan Seitz wrote:
>>>>>>>> Update / FYI:
>>>>>>>> After faking the particular VR in SQL, I tried to restart that network,
>>>>>>>> and it always fails. To me it looks like update_config.py - which takes
>>>>>>>> almost all CPU resources - runs way longer than any watchdog will accept.
>>>>>>>>
>>>>>>>> I'm able to mitigate that with some very nasty workarounds:
>>>>>>>> a) start the router
>>>>>>>> b) wait until it's provisioned
>>>>>>>> c) restart cloudstack-management
>>>>>>>> d) update vm_instance
>>>>>>>>    set state='Running', power_state='PowerOn'
>>>>>>>>    where name = 'r-XXX-VM';
>>>>>>>> e) once: update domain_router
>>>>>>>>    set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015",
>>>>>>>>        scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>>>>>>>    where id=XXX;
>>>>>>>> f) wait until update_config.py finishes (for me that's about 15 minutes)
>>>>>>>>
>>>>>>>> Since I expect the need for VR restarts in the future, this behaviour is
>>>>>>>> somewhat unsatisfying. It needs a lot of error-prone intervention.
>>>>>>>>
>>>>>>>> I'm quite unsure whether it was introduced with the upgrade, or whether this
>>>>>>>> particular VR simply had not been restarted after getting configured with
>>>>>>>> lots of IPs and rules.
>>>>>>>>
>>>>>>>> On Tuesday, 24.11.2015, 12:29 +0100, Stephan Seitz wrote:
>>>>>>>>> Hi List!
>>>>>>>>>
>>>>>>>>> After upgrading from 4.5.2 to 4.6.0 I faced a problem with one virtual
>>>>>>>>> router. This particular VR has about 10 IPs with LB and FW rules
>>>>>>>>> defined. During the upgrade process, after about 4-5 minutes, a
>>>>>>>>> watchdog kicks in and kills the respective VR due to no response.
>>>>>>>>>
>>>>>>>>> So far I haven't found any timeout value in the global settings.
>>>>>>>>> Temporarily setting network.router.EnableServiceMonitoring to false
>>>>>>>>> doesn't change the behaviour.
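For reference, Stephan's steps d) and e) above amount to two SQL statements against the management server's database. A consolidated sketch, assuming the default database name (cloud) and client access from the management server; the r-XXX-VM and id=XXX placeholders are kept from his post and must be replaced with the real router name and id.

# Sketch of the manual DB workaround quoted above. Run only after the router has been
# started and cloudstack-management restarted (steps a-c), and take a DB backup first.
mysql -u cloud -p cloud <<'SQL'
UPDATE vm_instance
   SET state = 'Running', power_state = 'PowerOn'
 WHERE name = 'r-XXX-VM';

-- step e), run once per affected router:
UPDATE domain_router
   SET template_version = 'Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015',
       scripts_version  = '546c9e7ac38e0aa16ecc498899dac8e2'
 WHERE id = XXX;
SQL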
>>>>>>>>>
>>>>>>>>> Any help on how to mitigate that nasty timeout would be really
>>>>>>>>> appreciated :)
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>> From within the VR, the logs show:
>>>>>>>>>
>>>>>>>>> 2015-11-24 11:24:33,807 CsFile.py search:123 Searching for dhcp-range=interface:eth0,set:interface and replacing with dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>>>>>>>> 2015-11-24 11:24:33,808 merge.py load:56 Creating data bag type guestnetwork
>>>>>>>>> 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,15 and replacing with dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>>>>>>>> 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,6 and replacing with dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>>>>>>>> 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,3, and replacing with dhcp-option=tag:interface-eth0,3,10.10.22.1
>>>>>>>>> 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,1, and replacing with dhcp-option=tag:interface-eth0,1,255.255.255.0
>>>>>>>>> 2015-11-24 11:24:33,810 CsHelper.py execute:160 Executing: service dnsmasq restart
>>>>>>>>>
>>>>>>>>> ==> /var/log/messages <==
>>>>>>>>> Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system halt
>>>>>>>>>
>>>>>>>>> Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>>>>>>>
>>>>>>>>> The system is going down for system halt NOW!
>>>>>>>>> Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>>>>>>>
>>>>>>>>> ==> /var/log/cloud.log <==
>>>>>>>>> /opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed    /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>>
>>>>>>>>> ==> /var/log/messages <==
>>>>>>>>> Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>>
>>>>>>>>> ==> /var/log/cloud.log <==
>>>>>>>>> Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>> Connection to 169.254.2.192 closed by remote host.
>>>>>>>>> Connection to 169.254.2.192 closed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The management-server.log shows:
>>>>>>>>>
>>>>>>>>> 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing com.cloud.vm.VmWorkStart for job-5164
>>>>>>>>> 2015-11-24 12:24:43,017 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164 from job monitoring
>>>>>>>>> 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
>>>>>>>>> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>         at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
>>>>>>>>>         at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>>>>>>>>>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
>>>>>>>>>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>>>>>>>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>>>>>>>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>>>>>>>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>>>>>>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>>>>>>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
>>>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>>>>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>         ... 18 more
>>>>>>>>> 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying"}
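To watch what the router is doing while this happens, the VR-side log excerpts quoted above (/var/log/cloud.log) can be followed live. A sketch for KVM, assuming the default system VM SSH key on the host; the link-local address is the one CloudStack assigns to the router (169.254.2.192 in Stephan's logs) and will differ per router.

# From the KVM host the router runs on: reach the VR over its link-local address
ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@169.254.2.192

# Inside the VR: follow the configuration run and check how long update_config.py keeps the CPU busy
tail -f /var/log/cloud.log
ps aux | grep [u]pdate_config.py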