Hi Remi,

Is this patch merged into 4.7.1 or 4.8.0, which was recently released? I am planning to do the upgrade and wanted to double-check.
Thanks

Andrei

----- Original Message -----
> From: "Remi Bergsma" <rberg...@schubergphilis.com>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Tuesday, 5 January, 2016 11:20:31
> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
> Hi Andrei,
>
> You indeed need to build CloudStack for this to work.
>
> You can create packages with the ./packaging/package.sh script in the source tree.
> The PR is against 4.7, and when you create RPMs those will be 4.7.1-SNAPSHOT. I
> do run this in production and it resolved the issue. Let me know if it works
> for you too.
>
> Regards,
> Remi
>
> On 05/01/16 10:07, "Andrei Mikhailovsky" <and...@arhont.com> wrote:
>
>> Hi Remi,
>>
>> I've not tried the patch. I missed it. Do I need to rebuild ACS to apply
>> the patch, or would making changes to the two files suffice?
>>
>> Thanks
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Remi Bergsma" <rberg...@schubergphilis.com>
>>> To: "dev" <dev@cloudstack.apache.org>
>>> Sent: Tuesday, 5 January, 2016 05:49:05
>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>
>>> Hi Andrei,
>>>
>>> Did you try it in combination with the patch I created (PR 1291)? You need both
>>> changes.
>>>
>>> Regards, Remi
>>>
>>> Sent from my iPhone
>>>
>>>> On 04 Jan 2016, at 22:17, Andrei Mikhailovsky <and...@arhont.com> wrote:
>>>>
>>>> Hi Remi,
>>>>
>>>> Thanks for your reply. However, your suggestion of increasing
>>>> router.aggregation.command.each.timeout didn't help. I tried setting the
>>>> value to 120, to no avail. It still fails with the same error.
>>>>
>>>> Andrei
>>>>
>>>> ----- Original Message -----
>>>>> From: "Remi Bergsma" <rberg...@schubergphilis.com>
>>>>> To: "dev" <dev@cloudstack.apache.org>
>>>>> Sent: Monday, 4 January, 2016 10:44:43
>>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>>>
>>>>> Hi Andrei,
>>>>>
>>>>> Missed that mail, sorry. I created a PR that allows for longer timeouts [1].
>>>>>
>>>>> Also, you can bump the router.aggregation.command.each.timeout global setting
>>>>> to, say, 15-30, so it will allow the router to boot.
>>>>>
>>>>> Next, we need to find out why it takes so long in the first place. In our
>>>>> environment it at least starts now.
>>>>>
>>>>> Regards,
>>>>> Remi
>>>>>
>>>>> [1] https://github.com/apache/cloudstack/pull/1291
>>>>>
>>>>>> On 04/01/16 11:41, "Andrei Mikhailovsky" <and...@arhont.com> wrote:
>>>>>>
>>>>>> Hello guys,
>>>>>>
>>>>>> I tried the users mailing list without any luck. Perhaps the dev guys know if
>>>>>> this issue is being looked at for the next release?
>>>>>>
>>>>>> I've just upgraded to 4.6.2 and have similar issues with three virtual routers
>>>>>> out of 22 in total. They are all failing in exactly the same way as described
>>>>>> here.
>>>>>>
>>>>>> Has anyone found a permanent workaround for this issue?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Andrei
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Stephan Seitz" <s.se...@secretresearchfacility.com>
>>>>>>> To: "users" <us...@cloudstack.apache.org>
>>>>>>> Sent: Monday, 30 November, 2015 19:53:57
>>>>>>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>>>>>>>
>>>>>>> Does anybody else experience problems due to (very) slow deployment of VRs?
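Remi's advice earlier in this thread boils down to two steps: build RPMs from the 4.7 branch so they contain the PR 1291 fix, and raise the router.aggregation.command.each.timeout global setting. A minimal sketch of both follows, assuming a CentOS 7 build host with the RPM toolchain and a configured cloudmonkey; the -d flag, the output path and the cloudmonkey step are from memory of the usual packaging workflow rather than from this thread, so verify them (./packaging/package.sh --help) before relying on them.

# Build 4.7.1-SNAPSHOT packages containing the PR 1291 fix (sketch only)
git clone https://github.com/apache/cloudstack.git
cd cloudstack
git checkout 4.7
./packaging/package.sh -d centos7      # verify the distribution flag for your branch
ls dist/rpmbuild/RPMS/x86_64/          # assumed location of the freshly built RPMs

# Raise the aggregation timeout (also possible via the UI under Global Settings),
# then restart the management server so the new value is picked up
cloudmonkey update configuration name=router.aggregation.command.each.timeout value=30
service cloudstack-management restart

Note that Remi suggests values of 15-30, while Andrei reports that even 120 did not help in his case, so the rebuilt packages are the part that actually matters here.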
>>>>>>>
>>>>>>> On Tuesday, 24.11.2015, 16:31 +0100, Stephan Seitz wrote:
>>>>>>>> Update / FYI:
>>>>>>>> After faking the particular VR in SQL, I tried to restart that network,
>>>>>>>> and it always fails. To me it looks like update_config.py - which takes
>>>>>>>> almost all CPU resources - runs way longer than any watchdog will accept.
>>>>>>>>
>>>>>>>> I'm able to mitigate that with some very nasty workarounds:
>>>>>>>> a) start the router
>>>>>>>> b) wait until it's provisioned
>>>>>>>> c) restart cloudstack-management
>>>>>>>> d) update vm_instance
>>>>>>>>    set state='Running', power_state='PowerOn'
>>>>>>>>    where name = 'r-XXX-VM';
>>>>>>>> e) once: update domain_router
>>>>>>>>    set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015",
>>>>>>>>        scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>>>>>>>    where id=XXX;
>>>>>>>> f) wait until update_config.py finishes (for me that's about 15 minutes)
>>>>>>>>
>>>>>>>> Since I expect the need for VR restarts in the future, this behaviour is
>>>>>>>> somewhat unsatisfying. It needs a lot of error-prone intervention.
>>>>>>>>
>>>>>>>> I'm quite unsure whether it was introduced with the upgrade, or whether this
>>>>>>>> particular VR simply had not been restarted after getting configured with
>>>>>>>> lots of IPs and rules.
>>>>>>>>
>>>>>>>> On Tuesday, 24.11.2015, 12:29 +0100, Stephan Seitz wrote:
>>>>>>>>> Hi List!
>>>>>>>>>
>>>>>>>>> After upgrading from 4.5.2 to 4.6.0 I faced a problem with one virtual
>>>>>>>>> router. This particular VR has about 10 IPs with LB and FW rules
>>>>>>>>> defined. During the upgrade process, after about 4-5 minutes, a
>>>>>>>>> watchdog kicks in and kills the respective VR due to no response.
>>>>>>>>>
>>>>>>>>> So far I haven't found any timeout value in the global settings.
>>>>>>>>> Temporarily setting network.router.EnableServiceMonitoring to false
>>>>>>>>> doesn't change the behaviour.
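For reference, Stephan's steps d) and e) above amount to two SQL statements against the management server's database. A consolidated sketch, assuming the default database name (cloud) and client access from the management server; the r-XXX-VM and id=XXX placeholders are kept from his post and must be replaced with the real router name and id.

# Sketch of the manual DB workaround quoted above. Run only after the router has been
# started and cloudstack-management restarted (steps a-c), and take a DB backup first.
mysql -u cloud -p cloud <<'SQL'
UPDATE vm_instance
   SET state = 'Running', power_state = 'PowerOn'
 WHERE name = 'r-XXX-VM';

-- step e), run once per affected router:
UPDATE domain_router
   SET template_version = 'Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015',
       scripts_version  = '546c9e7ac38e0aa16ecc498899dac8e2'
 WHERE id = XXX;
SQL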
>>>>>>>>>
>>>>>>>>> Any help on how to mitigate that nasty timeout would be really
>>>>>>>>> appreciated :)
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>> From within the VR, the logs show:
>>>>>>>>>
>>>>>>>>> 2015-11-24 11:24:33,807 CsFile.py search:123 Searching for dhcp-range=interface:eth0,set:interface and replacing with dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>>>>>>>> 2015-11-24 11:24:33,808 merge.py load:56 Creating data bag type guestnetwork
>>>>>>>>> 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,15 and replacing with dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>>>>>>>> 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,6 and replacing with dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>>>>>>>> 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,3, and replacing with dhcp-option=tag:interface-eth0,3,10.10.22.1
>>>>>>>>> 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,1, and replacing with dhcp-option=tag:interface-eth0,1,255.255.255.0
>>>>>>>>> 2015-11-24 11:24:33,810 CsHelper.py execute:160 Executing: service dnsmasq restart
>>>>>>>>>
>>>>>>>>> ==> /var/log/messages <==
>>>>>>>>> Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system halt
>>>>>>>>>
>>>>>>>>> Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>>>>>>>>
>>>>>>>>> The system is going down for system halt NOW!
>>>>>>>>> Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>>>>>>>>
>>>>>>>>> ==> /var/log/cloud.log <==
>>>>>>>>> /opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed    /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>>
>>>>>>>>> ==> /var/log/messages <==
>>>>>>>>> Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>>
>>>>>>>>> ==> /var/log/cloud.log <==
>>>>>>>>> Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>>>>>>>> Connection to 169.254.2.192 closed by remote host.
>>>>>>>>> Connection to 169.254.2.192 closed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The management-server.log shows:
>>>>>>>>>
>>>>>>>>> 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing com.cloud.vm.VmWorkStart for job-5164
>>>>>>>>> 2015-11-24 12:24:43,017 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164 from job monitoring
>>>>>>>>> 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
>>>>>>>>> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>         at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
>>>>>>>>>         at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>>>>>>>>>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
>>>>>>>>>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>>>>>>>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>>>>>>>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>>>>>>>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>>>>>>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>>>>>>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
>>>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>>>>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
>>>>>>>>>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>         ... 18 more
>>>>>>>>> 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying"}
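To watch what the router is doing while this happens, the VR-side log excerpts quoted above (/var/log/cloud.log) can be followed live. A sketch for KVM, assuming the default system VM SSH key on the host; the link-local address is the one CloudStack assigns to the router (169.254.2.192 in Stephan's logs) and will differ per router.

# From the KVM host the router runs on: reach the VR over its link-local address
ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@169.254.2.192

# Inside the VR: follow the configuration run and check how long update_config.py keeps the CPU busy
tail -f /var/log/cloud.log
ps aux | grep [u]pdate_config.py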