Hi Andrei,

You indeed need to build CloudStack for this to work. You can create packages with the ./packaging/package.sh script in the source tree.
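For completeness, a rough sketch of what that build looks like on a CentOS build host; the branch name and the -d/--distribution value below are only examples, so check the usage text at the top of packaging/package.sh in your checkout for the exact options:

    # Grab the source and switch to the branch the PR was made against
    git clone https://github.com/apache/cloudstack.git
    cd cloudstack
    git checkout 4.7

    # Build the RPMs; adjust -d to your distribution (e.g. centos63 or centos7).
    # The packages typically end up under dist/rpmbuild/RPMS/.
    ./packaging/package.sh -d centos7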
The PR is against 4.7, and when you create RPMs those will be 4.7.1-SNAPSHOT. I do run this in production and it resolved the issue. Let me know if it works for you too.

Regards,
Remi

On 05/01/16 10:07, "Andrei Mikhailovsky" <and...@arhont.com> wrote:

>Hi Remi,
>
>I've not tried the patch. I've missed it. Do I need to rebuild ACS to
>apply the patch, or would making changes to the two files suffice?
>
>Thanks
>
>Andrei
>----- Original Message -----
>> From: "Remi Bergsma" <rberg...@schubergphilis.com>
>> To: "dev" <dev@cloudstack.apache.org>
>> Sent: Tuesday, 5 January, 2016 05:49:05
>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
>> Hi Andrei,
>>
>> Did you try it in combination with the patch I created (PR 1291)? You need
>> both changes.
>>
>> Regards, Remi
>>
>> Sent from my iPhone
>>
>>> On 04 Jan 2016, at 22:17, Andrei Mikhailovsky <and...@arhont.com> wrote:
>>>
>>> Hi Remi,
>>>
>>> Thanks for your reply. However, your suggestion of increasing
>>> router.aggregation.command.each.timeout didn't help. I've tried setting the
>>> value to 120, to no avail. It still fails with the same error.
>>>
>>> Andrei
>>>
>>> ----- Original Message -----
>>>> From: "Remi Bergsma" <rberg...@schubergphilis.com>
>>>> To: "dev" <dev@cloudstack.apache.org>
>>>> Sent: Monday, 4 January, 2016 10:44:43
>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>
>>>> Hi Andrei,
>>>>
>>>> Missed that mail, sorry. I created a PR that allows for longer timeouts [1].
>>>>
>>>> Also, you can bump the router.aggregation.command.each.timeout global
>>>> setting to, say, 15-30 so the router is allowed to boot.
>>>>
>>>> Next, we need to find out why it takes so long in the first place. In our
>>>> environment it at least starts now.
>>>>
>>>> Regards,
>>>> Remi
>>>>
>>>> [1] https://github.com/apache/cloudstack/pull/1291
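(For reference: that global setting can be changed in the UI under Global Settings, or with CloudMonkey roughly as below; the value 30 is simply the upper end of the 15-30 range suggested above, and some global settings only take effect after a management-server restart.)

    # Raise the per-command timeout used while the VR applies its config, then verify it
    cloudmonkey update configuration name=router.aggregation.command.each.timeout value=30
    cloudmonkey list configurations name=router.aggregation.command.each.timeout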
>>>>
>>>>> On 04/01/16 11:41, "Andrei Mikhailovsky" <and...@arhont.com> wrote:
>>>>>
>>>>> Hello guys,
>>>>>
>>>>> Tried the users mailing list without any luck. Perhaps the dev guys know if
>>>>> this issue is being looked at for the next release?
>>>>>
>>>>> I've just upgraded to 4.6.2 and have similar issues with three virtual routers
>>>>> out of 22 in total. They are all failing in exactly the same way as described
>>>>> here.
>>>>>
>>>>> Has anyone found a permanent workaround for this issue?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Andrei
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Stephan Seitz" <s.se...@secretresearchfacility.com>
>>>>>> To: "users" <us...@cloudstack.apache.org>
>>>>>> Sent: Monday, 30 November, 2015 19:53:57
>>>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>>>
>>>>>> Does anybody else experience problems due to (very) slow deployment of VRs?
>>>>>>
>>>>>>
>>>>>> On Tuesday, 24.11.2015, 16:31 +0100, Stephan Seitz wrote:
>>>>>>> Update / FYI:
>>>>>>> After faking the particular VR in SQL, I tried to restart that network,
>>>>>>> and it always fails. To me it looks like update_config.py - which takes
>>>>>>> almost all CPU resources - runs way longer than any watchdog will accept.
>>>>>>>
>>>>>>> I'm able to mitigate that with some very nasty workarounds:
>>>>>>> a) start the router
>>>>>>> b) wait until it's provisioned
>>>>>>> c) restart cloudstack-management
>>>>>>> d) update vm_instance
>>>>>>>    set state='Running',
>>>>>>>    power_state='PowerOn' where name = 'r-XXX-VM';
>>>>>>> e) once: update domain_router
>>>>>>>    set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015",
>>>>>>>    scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>>>>>>    where id=XXX;
>>>>>>> f) wait until update_config.py finishes (for me that's about 15 minutes)
>>>>>>>
>>>>>>> Since I expect the need for VR restarts in the future, this behaviour is
>>>>>>> somewhat unsatisfying. It needs a lot of error-prone intervention.
>>>>>>>
>>>>>>> I'm quite unsure whether this was introduced with the update, or whether
>>>>>>> the particular VR simply had not been restarted after getting configured
>>>>>>> with lots of IPs and rules.
>>>>>>>
>>>>>>>
>>>>>>> On Tuesday, 24.11.2015, 12:29 +0100, Stephan Seitz wrote:
>>>>>>>> Hi List!
>>>>>>>>
>>>>>>>> After upgrading from 4.5.2 to 4.6.0 I faced a problem with one virtual
>>>>>>>> router. This particular VR has about 10 IPs with LB and FW rules
>>>>>>>> defined. During the upgrade process, after about 4-5 minutes, a
>>>>>>>> watchdog kicks in and kills the respective VR due to no response.
>>>>>>>>
>>>>>>>> So far I didn't find any timeout value in the global settings.
>>>>>>>> Temporarily setting network.router.EnableServiceMonitoring to false
>>>>>>>> doesn't change the behaviour.
>>>>>>>>
>>>>>>>> Any help on how to mitigate that nasty timeout would be really
>>>>>>>> appreciated :)
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>>
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>> From within the VR, the logs show:
>>>>>>>>
>>>>>>>> 2015-11-24 11:24:33,807 CsFile.py search:123 Searching for dhcp-range=interface:eth0,set:interface and replacing with dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>>>>>>> 2015-11-24 11:24:33,808 merge.py load:56 Creating data bag type guestnetwork
>>>>>>>> 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,15 and replacing with dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>>>>>>> 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,6 and replacing with dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>>>>>>> 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,3, and replacing with dhcp-option=tag:interface-eth0,3,10.10.22.1
>>>>>>>> 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,1, and replacing with dhcp-option=tag:interface-eth0,1,255.255.255.0
>>>>>>>> 2015-11-24 11:24:33,810 CsHelper.py execute:160 Executing: service dnsmasq restart
>>>>>>>>
>>>>>>>> ==> /var/log/messages <==
>>>>>>>> Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system halt
>>>>>>>>
>>>>>>>> Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>>>>>>
>>>>>>>> The system is going down for system halt NOW!
>>>>>>>> Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>>>>>>
>>>>>>>> ==> /var/log/cloud.log <==
>>>>>>>> /opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed    /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>
>>>>>>>> ==> /var/log/messages <==
>>>>>>>> Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>
>>>>>>>> ==> /var/log/cloud.log <==
>>>>>>>> Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>> Connection to 169.254.2.192 closed by remote host.
>>>>>>>> Connection to 169.254.2.192 closed.
>>>>>>>>
>>>>>>>>
>>>>>>>> The management-server.log shows:
>>>>>>>>
>>>>>>>> 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing com.cloud.vm.VmWorkStart for job-5164
>>>>>>>> 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164 from job monitoring
>>>>>>>> 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
>>>>>>>> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>>>>>>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
>>>>>>>>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>     at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>>>>>>>>     at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
>>>>>>>>     at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>>>>>>>>     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
>>>>>>>>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>>>>>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>>>>>     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
>>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>>>>>>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
>>>>>>>>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>     ... 18 more
>>>>>>>> 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying"}