Thanks for the reply, guys. Just wanted to point out that this is on 4.4 for me (although the issue may also be present on master).
I have a sufficient number of IP addresses for both system and user VMs, so that should be OK (but good thought, Punith).

I plan to continue debugging this later this afternoon, but have been in meetings all morning.

Thanks!

On Mon, Apr 28, 2014 at 10:41 AM, Dave Scott <dave.sc...@citrix.com> wrote:

> Hi,
>
> (sorry to reply to my own email!)
>
> On 28 Apr 2014, at 11:42, Dave Scott <dave.sc...@citrix.com> wrote:
>
> > Hi Mike,
> >
> > On 28 Apr 2014, at 04:44, Mike Tutkowski <mike.tutkow...@solidfire.com> wrote:
> >
> >> Hi,
> >>
> >> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
> >> Xenserver625StorageProcessor would be utilized).
> >>
> >> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
> >> up in the Paused state. I have to force a shutdown of that VM and then
> >> CloudStack restarts it and it works. This consistently happens. The system
> >> VMs are being deployed to the local storage of the one XS host I have in my
> >> one and only cluster.
> >>
> >> Any thoughts on that?
> >
> > I'm seeing the same symptom on my test cloud with 6.2 and XS62ESP1004. I
> > think there's a problem with XenAPI session and task handling in the
> > cloudstack master branch, although I've not tracked it down yet. In my
> > management server log I see:
> >
> > WARN [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference. It may have been invalidated by a server restart, or timed out. You should get a new session handle, using one of the session.login_ calls. This error does not invalidate the current connection. The handle parameter echoes the bad value given.
> > You gave an invalid session reference. It may have been invalidated by a server restart, or timed out. You should get a new session handle, using one of the session.login_ calls. This error does not invalidate the current connection. The handle parameter echoes the bad value given.
> >     at com.xensource.xenapi.Types.checkResponse(Types.java:218)
> >     at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
> >     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
> >     at com.xensource.xenapi.Event.from(Event.java:270)
> >     at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
> >     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)
> >
> > Somehow the XenAPI session being used by the Event.from in
> > XenServerResourceNewBase.waitForTask (used for recent 6.2 XenServers only)
> > is being logged out somewhere. When this happens, the cloudstack cleanup
> > code calls Task.cancel and Task.destroy, and then the XenServer
> > Async.VM.start fails trying to update Task.progress before it internally
> > calls VM.unpause.
> >
> > I made a hack to disable caching of Connection/sessions:
> >
> > https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4
>
> For reference / experimentation, I've made a slightly more plausible patch:
>
> https://github.com/djs55/cloudstack/commit/9d40f56c6384d04a5f0fb22e5b97530c0164e0b2
>
> It catches the SESSION_INVALID in the XenServerConnection and
> transparently logs back in. This would prevent the higher level bits of the
> XenServer plugin from having to deal with sessions being expired beneath
> them.
>
> Cheers,
> Dave
>
> > I suspect this now leaks Connections/sessions, but the symptom goes away.
> >
> > So far my thoughts are:
> >
> > 1. we need to find who's calling session.logout and why -- this will help
> > fix the problem in the short term
> >
> > 2. The XenServer XenAPI bindings are harder to use than they should be
> > (IMHO). In particular I think the bindings should take care of handling
> > SESSION_INVALID exceptions and re-authenticating transparently, to avoid
> > polluting the cloudstack code with rarely-used exception handlers.
> >
> > 3. the semantics of XenAPI task.destroy could be improved: instead of
> > immediately removing the task (which then causes cleanup code to fail
> > randomly it seems), it should be more like Unix waitpid with WNOHANG, i.e.
> > set a bit which says, "I'm done with this. Destroy it when you are finished
> > with it."
> >
> >> Also, if I try to kick off a user VM to local storage, I get the
> >> general-purpose InsufficientCapacityException and the virtual router does
> >> not even start up.
> >
> > No idea about this one :)
> >
> > Cheers,
> > Dave
> >
> >> Can anyone create a similar cloud to what I've described here with XS 6.2,
> >> XS62ESP1, and XS62ESP1004? I re-ran this test using a XS 6.1 host and it
> >> works just fine.
> >>
> >> At the moment, this is blocking a test case I'm trying to execute to verify
> >> code I had to write in Xenserver625StorageProcessor.
> >>
> >> Thanks!
> >>
> >> --
> >> *Mike Tutkowski*
> >> *Senior CloudStack Developer, SolidFire Inc.*
> >> e: mike.tutkow...@solidfire.com
> >> o: 303.746.7302
> >> Advancing the way the world uses the
> >> cloud<http://solidfire.com/solution/overview/?video=play>
> >> *(tm)*

--
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkow...@solidfire.com
o: 303.746.7302
Advancing the way the world uses the cloud<http://solidfire.com/solution/overview/?video=play> *(tm)*
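
P.S. To make Dave's point 2 concrete, here is a minimal sketch of the "catch SESSION_INVALID and log back in" idea. It is plain Python against a toy in-memory server, not the XenAPI Java bindings and not Dave's patch; every name in it (SessionInvalid, FakeXapi, ReloginConnection, dispatch) is made up for illustration.

```python
class SessionInvalid(Exception):
    """Hypothetical stand-in for the SESSION_INVALID failure."""


class FakeXapi:
    """Toy server used only to exercise the wrapper; not the real XenAPI."""

    def __init__(self):
        self.valid_sessions = set()
        self.logins = 0

    def login(self, user, password):
        # Hand out a fresh session handle on every login.
        self.logins += 1
        session = f"session-{self.logins}"
        self.valid_sessions.add(session)
        return session

    def logout(self, session):
        self.valid_sessions.discard(session)

    def call(self, session, method, *args):
        # Reject any call made with a handle that has been logged out.
        if session not in self.valid_sessions:
            raise SessionInvalid(session)
        return f"{method} ok"


class ReloginConnection:
    """Retries a call once after logging back in if the session was invalidated."""

    def __init__(self, server, user, password):
        self.server = server
        self.user = user
        self.password = password
        self.session = server.login(user, password)

    def dispatch(self, method, *args):
        try:
            return self.server.call(self.session, method, *args)
        except SessionInvalid:
            # The session was logged out beneath us: get a new handle and
            # retry exactly once, so a second failure still reaches the caller.
            self.session = self.server.login(self.user, self.password)
            return self.server.call(self.session, method, *args)


server = FakeXapi()
conn = ReloginConnection(server, "root", "secret")
server.logout(conn.session)        # simulate the unexpected logout
print(conn.dispatch("VM.start"))   # the wrapper logs back in and the call succeeds
```

The single retry is deliberate: callers never see the expired session, but a persistent failure is not hidden behind a loop. Whether it is safe to replay a given call after re-login is a separate question this sketch does not address.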