Hi Mike,

On 28 Apr 2014, at 04:44, Mike Tutkowski <mike.tutkow...@solidfire.com> wrote:

> Hi,
> 
> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
> Xenserver625StorageProcessor would be utilized).
> 
> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
> up in the Paused state. I have to force a shutdown of that VM and then
> CloudStack restarts it and it works. This consistently happens. The system
> VMs are being deployed to the local storage of the one XS host I have in my
> one and only cluster.
> 
> Any thoughts on that?

I’m seeing the same symptom on my test cloud with 6.2 and XS62ESP1004. I think 
there’s a problem with XenAPI session and task handling in the cloudstack 
master branch, although I’ve not tracked it down yet. In my management server 
log I see:

WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
You gave an invalid session reference.  It may have been invalidated by a server restart, or timed out.  You should get a new session handle, using one of the session.login_ calls.  This error does not invalidate the current connection.  The handle parameter echoes the bad value given.
        at com.xensource.xenapi.Types.checkResponse(Types.java:218)
        at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
        at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
        at com.xensource.xenapi.Event.from(Event.java:270)
        at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
        at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)

Somehow the XenAPI session used by Event.from in
XenServerResourceNewBase.waitForTask (used only for recent 6.2 XenServers) is
being logged out somewhere. When this happens, the CloudStack cleanup code
calls Task.cancel and Task.destroy, and then the XenServer Async.VM.start fails
while trying to update Task.progress, before it internally calls VM.unpause.
That would explain why the VM is left in the Paused state.

I made a hack to disable caching of Connection/sessions:

https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4

I suspect this now leaks Connections/sessions, but the symptom goes away.

So far my thoughts are:

1. we need to find who’s calling session.logout and why — this will help fix 
the problem in the short term

2. The XenServer XenAPI bindings are harder to use than they should be (IMHO). 
In particular I think the bindings should take care of handling SESSION_INVALID 
exceptions and re-authenticating transparently, to avoid polluting the 
cloudstack code with rarely-used exception handlers.

3. The semantics of XenAPI task.destroy could be improved: instead of
immediately removing the task (which then seems to cause cleanup code to fail
randomly), it should be more like Unix waitpid with WNOHANG, i.e. set a bit
which says, "I'm done with this. Destroy it when you are finished with it."
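
To make point 2 concrete, here is a rough sketch of the kind of transparent
re-authentication I have in mind. It is not the real XenAPI binding code:
SessionInvalidException, SessionCall and ReauthenticatingSession are all
hypothetical names made up for illustration, and the login supplier stands in
for whichever session.login_ call would actually be used.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the binding's "invalid session" error. */
class SessionInvalidException extends Exception {
    SessionInvalidException(String message) {
        super(message);
    }
}

/** Hypothetical API call that needs a session handle to run. */
interface SessionCall<T> {
    T run(String sessionHandle) throws Exception;
}

/**
 * Caches one session handle and, if a call reports that the handle is
 * invalid, logs in again and retries the call exactly once.
 * Not thread-safe; a real version would need to guard the handle.
 */
class ReauthenticatingSession {
    // Assumed to perform a login and return a fresh session handle.
    private final Supplier<String> login;
    private String handle;

    ReauthenticatingSession(Supplier<String> login) {
        this.login = login;
        this.handle = login.get();
    }

    <T> T call(SessionCall<T> call) throws Exception {
        try {
            return call.run(handle);
        } catch (SessionInvalidException e) {
            // The cached handle was logged out or timed out: get a new
            // one and retry. A second failure propagates to the caller.
            handle = login.get();
            return call.run(handle);
        }
    }
}
```

With something like this inside the bindings, callers such as waitForTask
would not need their own handlers for an invalidated session. It would only
hide the symptom though; we would still want to know who is logging the
session out.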


> 
> Also, if I try to kick off a user VM to local storage, I get the
> general-purpose InsufficientCapacityException and the virtual router does
> not even start up.

No idea about this one :)

Cheers,
Dave

> 
> Can anyone create a similar cloud to what I've described here with XS 6.2,
> XS62ESP1, and XS62ESP1004? I re-ran this test using a XS 6.1 host and it
> works just fine.
> 
> At the moment, this is blocking a test case I'm trying to execute to verify
> code I had to write in Xenserver625StorageProcessor.
> 
> Thanks!
> 
> -- 
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkow...@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *(tm)*
