Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Stephan Seitz Mon, 30 Nov 2015 11:55:20 -0800

Does anybody else experience problems due to (very) slow deployment of
VRs?



On Tuesday, 24.11.2015, at 16:31 +0100, Stephan Seitz wrote:
> Update / FYI:
> After faking the particular VRu in sql, I tried to restart that
> network,
> and it always fails. To me it looks like update_config.py, which
> takes almost all CPU resources, runs way longer than any watchdog will
> accept.
> 
> I'm able to mitigate that with very nasty workarounds:
> a) start the router
> b) wait until it's provisioned
> c) restart cloudstack-management
> d) update vm_instance
>    set state='Running',
>    power_state='PowerOn' where name = 'r-XXX-VM';
> e) once: update domain_router
>    set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015",
>    scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>    where id=XXX;
> f) wait until update_config.py finishes (for me that's about 15 minutes)
> 
> Since I expect the need for VR restarts in the future, this behaviour
> is rather unsatisfying. It needs a lot of error-prone manual intervention.
> 
> I'm not sure whether this was introduced with the update or whether
> this particular VR simply had not been restarted since it was
> configured with lots of IPs and rules.
> 
> 
> On Tuesday, 24.11.2015, at 12:29 +0100, Stephan Seitz wrote:
> > Hi List!
> > 
> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
> > virtual router. This particular VR has about 10 IPs with LB and FW
> > rules defined. During the upgrade process, after about 4-5 minutes, a
> > watchdog kicks in and kills the respective VR due to no response.
> > 
> > So far I didn't find any timeout value in the global settings.
> > Temporarily setting network.router.EnableServiceMonitoring to false
> > doesn't change the behaviour.
> > 
> > Any help on how to mitigate that nasty timeout would be really
> > appreciated :)
> > 
> > cheers,
> > 
> > Stephan 
> > 
> > From within the VR, the logs show
> > 
> > 2015-11-24 11:24:33,807  CsFile.py search:123 Searching for dhcp-range=interface:eth0,set:interface and replacing with dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
> > 2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type guestnetwork
> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,15 and replacing with dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
> > 2015-11-24 11:24:33,808  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,6 and replacing with dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,3, and replacing with dhcp-option=tag:interface-eth0,3,10.10.22.1
> > 2015-11-24 11:24:33,809  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,1, and replacing with dhcp-option=tag:interface-eth0,1,255.255.255.0
> > 2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service dnsmasq restart
> > 
> > ==> /var/log/messages <==
> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system halt
> > 
> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
> > 
> > The system is going down for system halt NOW!
> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
> > 
> > ==> /var/log/cloud.log <==
> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed                  /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > 
> > ==> /var/log/messages <==
> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > 
> > ==> /var/log/cloud.log <==
> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
> > Connection to 169.254.2.192 closed by remote host.
> > Connection to 169.254.2.192 closed.
> > 
> > 
> > the management-server.log shows
> > 
> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing com.cloud.vm.VmWorkStart for job-5164
> > 2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164 from job monitoring
> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
> >         at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
> >         at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
> >         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> >         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> >         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> >         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
> >         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         ... 18 more
> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying"}
> > 
> > 
> > 
> > 
> > 
> > 
> 
>
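
For anyone wanting to script the database part of the workaround quoted above (steps d and e), here is a minimal sketch. It is only an illustration: it runs against an in-memory sqlite3 stand-in rather than the real CloudStack MySQL database, the two demo tables contain only the columns named in the quoted statements, and the function names, the router name r-504-VM (taken from the logs) and the id 504 are assumptions made for the demo. Adapt the connection and values to your own installation, and take a database backup first.

```python
import sqlite3

# Version strings copied from the quoted workaround (step e).
TEMPLATE_VERSION = "Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015"
SCRIPTS_VERSION = "546c9e7ac38e0aa16ecc498899dac8e2"


def mark_router_running(conn, router_name):
    """Step d): force the router's state in vm_instance. Returns rows changed."""
    cur = conn.execute(
        "UPDATE vm_instance SET state = 'Running', power_state = 'PowerOn' "
        "WHERE name = ?",
        (router_name,),
    )
    return cur.rowcount


def set_router_versions(conn, router_id):
    """Step e): record the template and scripts version. Returns rows changed."""
    cur = conn.execute(
        "UPDATE domain_router SET template_version = ?, scripts_version = ? "
        "WHERE id = ?",
        (TEMPLATE_VERSION, SCRIPTS_VERSION, router_id),
    )
    return cur.rowcount


def make_demo_db():
    """Hypothetical stand-in schema: only the columns used in the workaround."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vm_instance "
        "(id INTEGER PRIMARY KEY, name TEXT, state TEXT, power_state TEXT)"
    )
    conn.execute(
        "CREATE TABLE domain_router "
        "(id INTEGER PRIMARY KEY, template_version TEXT, scripts_version TEXT)"
    )
    # Demo rows; the id 504 is an assumption, not taken from the thread.
    conn.execute(
        "INSERT INTO vm_instance VALUES (504, 'r-504-VM', 'Stopped', 'PowerOff')"
    )
    conn.execute("INSERT INTO domain_router VALUES (504, NULL, NULL)")
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print("vm_instance rows changed:", mark_router_running(conn, "r-504-VM"))
    print("domain_router rows changed:", set_router_versions(conn, 504))
    conn.commit()
```

Checking the returned row count is a cheap guard against a mistyped router name or id: a result of 0 means nothing was changed.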
