I cannot be sure without reading the code, but I suspect there were no proper
checks in place for this within CloudStack until now.
Your instances probably worked somehow with the openvswitch backend, and you
did not notice the problem because you never actually used the security groups.

If that's the case, I'd just downgrade to the last version that worked for you
and give myself some more time to plan this out for the longer term.
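
Either way, it is worth confirming which backend each host is really on before
you decide. Below is a minimal sketch in Python, not anything taken from
CloudStack itself. It assumes that a XenServer 6.x host records its network
backend as a single word ("bridge" or "openvswitch") in
/etc/xensource/network.conf; the helper names parse_backend and
supports_security_groups are made up for illustration.

```python
from pathlib import Path

# Assumption: XenServer 6.x stores the active network backend here.
NETWORK_CONF = Path("/etc/xensource/network.conf")


def parse_backend(conf_text: str) -> str:
    """Return the backend named in the file contents, or "unknown"."""
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that looks like a comment.
        if line and not line.startswith("#"):
            return line.lower()
    return "unknown"


def supports_security_groups(backend: str) -> bool:
    """Per this thread, security groups need the Linux bridge backend."""
    return backend == "bridge"


if __name__ == "__main__":
    if NETWORK_CONF.exists():
        backend = parse_backend(NETWORK_CONF.read_text())
    else:
        backend = "unknown"
    ok = supports_security_groups(backend)
    print(f"backend={backend} security_groups_ok={ok}")
```

Run it on each XenServer host (or just cat the file). If it reports
openvswitch, that would match the "check network mode for bridge" warning in
your management server log.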

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Yiping Zhang" <[email protected]>
> To: [email protected]
> Sent: Friday, 10 July, 2015 01:28:44
> Subject: Re: [Urgent]:  xenserver hosts stuck in alert state after 4.3.2 -> 
> 4.5.1 upgrade

> Hi, Lucian:
> 
> Thanks for the reply. When I said it worked all this time I really meant
> that the CS instance worked as expected for what we were doing with it,
> not to mean that the SecurityGroup feature worked.
> 
> To be honest, we did not really use the security group feature per se, as
> this is a private cloud running on our own networks and we picked advanced
> networking with SG only to avoid assigning a public network IP range for
> guests.  The CS instance went through upgrades from 4.3.0 -> 4.3.1 ->
> 4.3.2 without a hiccup, until we tried to upgrade to 4.5.1.
> 
> Yiping
> 
> 
> On 7/9/15, 12:31 PM, "Nux!" <[email protected]> wrote:
> 
>>Hello,
>>
>>As far as I can tell, XenServer has always required the network to be in
>>bridge mode for security groups to work, at least since 4.1, which I've
>>been playing with. Not sure how exactly it was working in your case ...
>>did you actually test that the firewall rules were doing anything? (Sorry
>>for the dumb question)
>>
>>Lucian
>>
>>
>>----- Original Message -----
>>> From: "Yiping Zhang" <[email protected]>
>>> To: [email protected]
>>> Sent: Thursday, 9 July, 2015 19:22:01
>>> Subject: [Urgent]:  xenserver hosts stuck in alert state after 4.3.2 ->
>>>4.5.1 upgrade
>>
>>> Hi, all:
>>> 
>>> We just did an upgrade from CS 4.3.2 -> 4.5.1.  Our environment is RHEL 6
>>> + Adv. Zone with SecurityGroup + XenServer 6.2.
>>> 
>>> After the upgrade, all XenServer hosts were in alert state and the MS
>>> can't connect to them, with the following errors:
>>> 
>>> 
>>> INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Host 10.0.100.25 OpaqueRef:996c575a-ad04-5f3f-cd6d-56b7daa16844: Host 10.0.100.25 is already setup.
>>> WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Failed to configure brige firewall
>>> WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Check host 10.0.100.25 for CSP is installed or not and check network mode for bridge
>>> WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-13:ctx-b4517020) Unable to setup agent 5 due to Failed to configure brige firewall
>>> INFO  [c.c.u.e.CSExceptionErrorCode] (AgentTaskPool-13:ctx-b4517020) Could not find exception: com.cloud.exception.ConnectionException in error code list for exceptions
>>> WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-13:ctx-b4517020) Monitor XcpServerDiscoverer says there is an error in the connect process for 5 due to Reinitialize agent after setup.
>>> INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-13:ctx-b4517020) Host 5 is disconnecting with event AgentDisconnected
>>> WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-13:ctx-b4517020) Unable to connect due to
>>> com.cloud.exception.ConnectionException: Reinitialize agent after setup.
>>> 
>>>        at com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer.processConnect(XcpServerDiscoverer.java:621)
>>>        at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:539)
>>>        at com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentManagerImpl.java:1447)
>>>        at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1794)
>>>        at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1920)
>>>        at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>>        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>>>        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>>>        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>>>        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>>>        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>>>        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>>>        at com.sun.proxy.$Proxy149.createHostAndAgent(Unknown Source)
>>>        at com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.runInContext(AgentManagerImpl.java:1078)
>>>        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>        at java.lang.Thread.run(Thread.java:745)
>>> 
>>> The error implies that it was expecting Linux bridge on XenServer and
>>> couldn't find it. After checking the XenServer hosts, we found that we
>>> have been using openvswitch instead of Linux bridge as the network
>>> backend all this time.
>>> 
>>> So the question comes down to:
>>> 
>>> 1.  If a CS advanced zone with SecurityGroup is only supported on
>>> XenServer using the Linux bridge backend, why did 4.3.x allow us to work
>>> with an unsupported configuration for so long without any apparent
>>> issues, and does CS 4.5.1 now enforce this requirement?
>>> 2.  If we switch the network backend from openvswitch to Linux bridge,
>>> would this fix the problem?  We are hoping to avoid this step, as it
>>> requires rebooting all XenServer hosts and shuffling around hundreds of
>>> VM instances.
>>> 3.  Are there any other solutions to make the newly upgraded CloudStack
>>> 4.5.1 management server reconnect to our XenServer hosts?
>>> 
>>> Thanks in advance,
>>> 
>>> Yiping
