4.7.1 is also affected.
All versions use the 4.6 templates.

P.S.: Would it be useful to move this discussion to the dev list? It looks
like a pretty big issue.

Hi!

After upgrading from 4.5.2 to 4.6.2, I'm also facing this issue. We use both
KVM and XenServer. Is there any stable workaround for this?

Thanks

Martin

-----Original Message-----
From: Andrei Mikhailovsky [mailto:[email protected]]
Sent: Friday, 18 December 2015 02:55
To: users <[email protected]>
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Hello guys,

I've just upgraded to 4.6.2 and am seeing similar issues with three of my 22
virtual routers. They all fail in exactly the way described here.

Has anyone found a permanent workaround for this issue?

Thanks

Andrei

----- Original Message -----
From: "Stephan Seitz" <[email protected]>
To: "users" <[email protected]>
Sent: Monday, 30 November, 2015 19:53:57
Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout

Does anybody else experience problems due to (very) slow deployment of
VRs?

On Tuesday, 24.11.2015 at 16:31 +0100, Stephan Seitz wrote:
Update / FYI:
After faking the particular VR's state in SQL, I tried to restart that
network, and it always fails. To me it looks like update_config.py - which
takes almost all CPU resources - runs far longer than any watchdog will
accept.
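
For anyone who wants to watch this themselves: on KVM the VR can be reached
from the host over its link-local address (169.254.2.192 in the log excerpt
further down). This is only a sketch; port 3922 and the id_rsa.cloud key are
the usual CloudStack system-VM defaults, so adjust them for your setup.

# from the KVM host: SSH into the VR over its link-local address
ssh -i /root/.ssh/id_rsa.cloud -p 3922 [email protected]

# inside the VR: poll until update_config.py is done (it pegs a CPU meanwhile)
while pgrep -f update_config.py >/dev/null; do
    ps aux | grep '[u]pdate_config.py'
    sleep 30
done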

I'm able to mitigate this with some very nasty workarounds (a consolidated
sketch follows right after this list):
a) start the router
b) wait until it's provisioned
c) restart cloudstack-management
d) update vm_instance
   set state='Running', power_state='PowerOn'
   where name = 'r-XXX-VM';
e) once: update domain_router
   set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015",
       scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
   where id=XXX;
f) wait until update_config.py finishes (for me that's about 15 minutes)
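
For completeness, here is roughly what steps (c) through (e) look like when
run from the management server. This is only a sketch of the SQL above: it
assumes the management database is named "cloud", that MySQL credentials are
available (e.g. via ~/.my.cnf), and that the real router name and id are
substituted for the XXX placeholders. Check it against your own schema before
using it.

# step (c): restart the management server
service cloudstack-management restart

# steps (d) and (e): fake the router state and version in the database
ROUTER_NAME='r-XXX-VM'   # affected VR, e.g. r-504-VM
ROUTER_ID='XXX'          # matching domain_router id
mysql cloud <<SQL
UPDATE vm_instance
   SET state = 'Running', power_state = 'PowerOn'
 WHERE name = '${ROUTER_NAME}';

-- only needed once per router
UPDATE domain_router
   SET template_version = 'Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015',
       scripts_version  = '546c9e7ac38e0aa16ecc498899dac8e2'
 WHERE id = ${ROUTER_ID};
SQL

# step (f): wait for update_config.py inside the VR to finish (~15 minutes here)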

Since I expect to need VR restarts in the future, this behaviour is rather
unsatisfying; it requires a lot of error-prone manual intervention.

I'm quite unsure whether this was introduced with the upgrade, or whether
this particular VR simply had not been restarted since being configured with
lots of IPs and rules.

On Tuesday, 24.11.2015 at 12:29 +0100, Stephan Seitz wrote:
Hi List!

After upgrading from 4.5.2 to 4.6.0 I ran into a problem with one virtual
router. This particular VR has about 10 IPs with LB and FW rules defined.
During the upgrade, after about 4-5 minutes, a watchdog kicks in and kills
the VR because it does not respond.

So far I haven't found any matching timeout value in the global settings.
Temporarily setting network.router.EnableServiceMonitoring to false
doesn't change the behaviour.
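
For reference, that setting can be toggled via CloudMonkey roughly as
follows. This is just a sketch: the exact client syntax may differ between
CloudMonkey versions, and many global settings are only picked up after a
management-server restart.

# disable router service monitoring via the updateConfiguration API
cloudmonkey update configuration name=network.router.EnableServiceMonitoring value=false

# restart so the change is picked up (needed for most global settings)
service cloudstack-management restart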

Any help on how to mitigate that nasty timeout would be really
appreciated :)

cheers,

Stephan

From within the VR, the logs show

2015-11-24 11:24:33,807  CsFile.py search:123 Searching for dhcp-range=interface:eth0,set:interface and replacing with dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
2015-11-24 11:24:33,808  merge.py load:56 Creating data bag type guestnetwork
2015-11-24 11:24:33,808  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,15 and replacing with dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
2015-11-24 11:24:33,808  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,6 and replacing with dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
2015-11-24 11:24:33,809  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,3, and replacing with dhcp-option=tag:interface-eth0,3,10.10.22.1
2015-11-24 11:24:33,809  CsFile.py search:123 Searching for dhcp-option=tag:interface-eth0,1, and replacing with dhcp-option=tag:interface-eth0,1,255.255.255.0
2015-11-24 11:24:33,810  CsHelper.py execute:160 Executing: service dnsmasq restart

==> /var/log/messages <==
Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system halt

Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):

The system is going down for system halt NOW!
Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844

==> /var/log/cloud.log <==
/opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed                  /opt/cloud/bin/update_config.py vm_dhcp_entry.json

==> /var/log/messages <==
Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json

==> /var/log/cloud.log <==
Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json
Connection to 169.254.2.192 closed by remote host.
Connection to 169.254.2.192 closed.

the management-server.log shows

2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing com.cloud.vm.VmWorkStart for job-5164
2015-11-24 12:24:43,017 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164 from job monitoring
2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
        at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
        at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        ... 18 more
2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying"}

 Daniel Mezentsev, founder
(+1) 604 313 8592.
Soleks Data Group.
Shaping the clouds.
