Hi Remi,

Thanks for your reply. However, your suggestion of increasing router.aggregation.command.each.timeout didn't help. I've tried setting the value to 120, to no avail; it still fails with the same error.
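In case it is useful to anyone else following along, this is how I applied the change; a minimal sketch assuming CloudMonkey is installed and pointed at the management server (the same edit can be made in the UI under Global Settings):

  # updateConfiguration API via CloudMonkey; global settings generally
  # only take effect after a management-server restart
  update configuration name=router.aggregation.command.each.timeout value=120
  # then, on the management server:
  service cloudstack-management restart

For anyone who needs the stopgap, I've also consolidated Stephan's SQL workaround from the quoted thread at the bottom of this mail.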
Andrei

----- Original Message -----
> From: "Remi Bergsma" <rberg...@schubergphilis.com>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Monday, 4 January, 2016 10:44:43
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
> Hi Andrei,
>
> Missed that mail, sorry. I created a PR that allows for longer timeouts [1].
>
> Also, you can bump the router.aggregation.command.each.timeout global
> setting to, say, 15-30 so the router will be allowed to boot.
>
> Next, we need to find out why it takes so long in the first place. In our
> environment it at least starts now.
>
> Regards,
> Remi
>
> [1] https://github.com/apache/cloudstack/pull/1291
>
> On 04/01/16 11:41, "Andrei Mikhailovsky" <and...@arhont.com> wrote:
>
>> Hello guys,
>>
>> Tried the users mailing list without any luck. Perhaps the dev guys know
>> if this issue is being looked at for the next release?
>>
>> I've just upgraded to 4.6.2 and have similar issues with three virtual
>> routers out of 22 in total. They are all failing in exactly the same way
>> as described here.
>>
>> Has anyone found a permanent workaround for this issue?
>>
>> Thanks
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Stephan Seitz" <s.se...@secretresearchfacility.com>
>>> To: "users" <us...@cloudstack.apache.org>
>>> Sent: Monday, 30 November, 2015 19:53:57
>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>
>>> Does anybody else experience problems due to (very) slow deployment of
>>> VRs?
>>>
>>> On Tuesday, 24.11.2015, at 16:31 +0100, Stephan Seitz wrote:
>>>> Update / FYI:
>>>> After faking the particular VR's upgrade in SQL, I tried to restart
>>>> that network, and it always fails. To me it looks like
>>>> update_config.py - which takes almost all CPU resources - runs far
>>>> longer than any watchdog will accept.
>>>>
>>>> I'm able to mitigate that with some very nasty workarounds:
>>>> a) start the router
>>>> b) wait until it is provisioned
>>>> c) restart cloudstack-management
>>>> d) update vm_instance
>>>>    set state='Running',
>>>>    power_state='PowerOn' where name = 'r-XXX-VM';
>>>> e) once: update domain_router
>>>>    set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015",
>>>>    scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>>>    where id=XXX;
>>>> f) wait until update_config.py finishes (for me that's about 15
>>>>    minutes)
>>>>
>>>> Since I expect the need for VR restarts in the future, this behaviour
>>>> is somewhat unsatisfying. It needs a lot of error-prone intervention.
>>>>
>>>> I'm quite unsure whether this was introduced with the update, or
>>>> whether the particular VR had simply never been restarted after
>>>> getting configured with lots of IPs and rules.
>>>>
>>>> On Tuesday, 24.11.2015, at 12:29 +0100, Stephan Seitz wrote:
>>>> > Hi List!
>>>> >
>>>> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>>>> > virtual router. This particular VR has about 10 IPs with LB and FW
>>>> > rules defined. During the upgrade process, after about 4-5 minutes,
>>>> > a watchdog kicks in and kills the respective VR due to no response.
>>>> >
>>>> > So far I didn't find any timeout value in the global settings.
>>>> > Temporarily setting network.router.EnableServiceMonitoring to false
>>>> > doesn't change the behaviour.
>>>> >
>>>> > Any help on how to mitigate that nasty timeout would be really
>>>> > appreciated :)
>>>> >
>>>> > cheers,
>>>> >
>>>> > Stephan
>>>> >
>>>> > From within the VR, the logs show:
>>>> >
>>>> > 2015-11-24 11:24:33,807 CsFile.py search:123 Searching for dhcp-range=interface:eth0,set:interface and replacing with dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>>> > 2015-11-24 11:24:33,808 merge.py load:56 Creating data bag type guestnetwork
>>>> > 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,15 and replacing with dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>>> > 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,6 and replacing with dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>>> > 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,3, and replacing with dhcp-option=tag:interface-eth0,3,10.10.22.1
>>>> > 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,1, and replacing with dhcp-option=tag:interface-eth0,1,255.255.255.0
>>>> > 2015-11-24 11:24:33,810 CsHelper.py execute:160 Executing: service dnsmasq restart
>>>> >
>>>> > ==> /var/log/messages <==
>>>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system halt
>>>> >
>>>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>> >
>>>> > The system is going down for system halt NOW!
>>>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>> >
>>>> > ==> /var/log/cloud.log <==
>>>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed  /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>> >
>>>> > ==> /var/log/messages <==
>>>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>> >
>>>> > ==> /var/log/cloud.log <==
>>>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>> > Connection to 169.254.2.192 closed by remote host.
>>>> > Connection to 169.254.2.192 closed.
>>>> >
>>>> > The management-server.log shows:
>>>> >
>>>> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing com.cloud.vm.VmWorkStart for job-5164
>>>> > 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164 from job monitoring
>>>> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
>>>> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
>>>> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> >         at java.lang.reflect.Method.invoke(Method.java:606)
>>>> >         at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>>>> >         at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
>>>> >         at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>>>> >         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
>>>> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>> >         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
>>>> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
>>>> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> >         ... 18 more
>>>> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying"}
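PS: For reference, here is Stephan's manual workaround (steps d and e in the quoted thread) consolidated as SQL against the cloud database. A sketch only: the XXX placeholders are kept from the original and must be replaced with the actual router name/id, and the version strings must match your 4.6.0 systemvm template.

  -- step (d): mark the router as running again
  UPDATE vm_instance
     SET state = 'Running', power_state = 'PowerOn'
   WHERE name = 'r-XXX-VM';

  -- step (e), once: fake the template/scripts version so CloudStack
  -- considers the VR already upgraded
  UPDATE domain_router
     SET template_version = 'Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015',
         scripts_version = '546c9e7ac38e0aa16ecc498899dac8e2'
   WHERE id = XXX;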