Glad to hear you fixed the issue! :)

> On Jan 31, 2018, at 7:16 AM, David Mabry <dma...@ena.com.INVALID> wrote:
> 
> Mike and Wei,
> 
> Good news!  I was able to manually live migrate these VMs following the steps 
> outlined below:
> 
> 1.) virsh dumpxml 38 --migratable > 38.xml
> 2.) Change the vnc information in 38.xml to match destination host IP and 
> available VNC port
> 3.) virsh migrate --verbose --live 38 --xml 38.xml qemu+tcp://destination.host.net/system
> 
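
Step 2 above could be scripted rather than edited by hand. Here is a minimal sketch in Python, assuming the dumped XML carries the usual libvirt <graphics type='vnc'> element with listen/port attributes and an optional <listen> child. The function name, the sample XML, the IP addresses and the port are made up for illustration, and pinning autoport to "no" is my own assumption, not part of the steps above:

```python
import xml.etree.ElementTree as ET


def retarget_vnc(domain_xml: str, listen_ip: str, port: int) -> str:
    """Return domain XML with every VNC graphics device pointed at listen_ip:port."""
    root = ET.fromstring(domain_xml)
    for graphics in root.iter("graphics"):
        if graphics.get("type") != "vnc":
            continue
        graphics.set("listen", listen_ip)
        graphics.set("port", str(port))
        # Assumption: pin the port instead of letting libvirt auto-allocate one.
        graphics.set("autoport", "no")
        # Keep the nested <listen type='address'> element consistent, if present.
        for listen in graphics.findall("listen"):
            if listen.get("type") == "address":
                listen.set("address", listen_ip)
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for the output of "virsh dumpxml 38 --migratable".
SAMPLE = """<domain type='kvm'>
  <name>example-vm</name>
  <devices>
    <graphics type='vnc' port='5901' autoport='yes' listen='10.0.0.11'>
      <listen type='address' address='10.0.0.11'/>
    </graphics>
  </devices>
</domain>"""

if __name__ == "__main__":
    print(retarget_vnc(SAMPLE, "10.0.0.12", 5905))
```

The result would be written back to 38.xml before running step 3. Check the rewritten file against what the destination host actually expects before migrating anything.
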
> To my surprise, Cloudstack was able to discover and properly handle the fact 
> that this VM was live migrated to a new host without issue.  Very cool.
> 
> Wei, I suspect you are correct when you said this was an issue with the 
> cloudstack agent code.  After digging a little deeper, the agent is never 
> attempting to talk to libvirt at all after prepping the dxml to send to the 
> destination host.  I'm going to attempt to reproduce this in my lab and 
> attach a remote debugger and see if I can get to the bottom of it.
> 
> Thanks again for the help guys!  I really appreciate it.
> 
> Thanks,
> David Mabry
> 
> On 1/30/18, 9:55 AM, "David Mabry" <dma...@ena.com.INVALID> wrote:
> 
>    Ah, understood.  I'll take a closer look at the logs and make sure that I
>    didn't accidentally miss those lines when I pulled together the logs for this
>    email chain.
> 
>    Thanks,
>    David Mabry
>    On 1/30/18, 8:34 AM, "Wei ZHOU" <ustcweiz...@gmail.com> wrote:
> 
>        Hi David,
> 
>        I encountered the UnsupportedAnswer once before, when I made some
>        changes in the kvm plugin.
>
>        Normally there should be some network configurations in the agent.log,
>        but I do not see them.
> 
>        -Wei
> 
> 
>        2018-01-30 15:00 GMT+01:00 David Mabry <dma...@ena.com.invalid>:
> 
>> Hi Wei,
>> 
>> I detached the iso and received the same error.  Just out of curiosity,
>> what leads you to believe it is something in the vxlan code?  I guess at
>> this point, attaching a remote debugger to the agent in question might be
>> the best way to get to the bottom of what is going on.
>> 
>> Thanks in advance for the help.  I really, really appreciate it.
>> 
>> Thanks,
>> David Mabry
>> 
>> On 1/30/18, 3:30 AM, "Wei ZHOU" <ustcweiz...@gmail.com> wrote:
>> 
>>    The answer is probably caused by an exception in the cloudstack agent.
>>    I tried migrating a VM in our test environment, and it works.
>>
>>    There are some differences between our env and yours:
>>    (1) vlan vs. vxlan
>>    (2) no ISO vs. attached ISO
>>    Both of us use ceph and centos7.
>>
>>    I suspect it is caused by the vxlan code.
>>    However, could you detach the ISO and try again?
>> 
>>    -Wei
>> 
>> 
>> 
>>    2018-01-29 19:48 GMT+01:00 David Mabry <dma...@ena.com.invalid>:
>> 
>>> Good day Cloudstack Devs,
>>> 
>>> I've run across a real head scratcher.  I have two VMs (initially 3 VMs,
>>> but more on that later) on a single host that I cannot live migrate to any
>>> other host in the same cluster.  We discovered this after attempting to
>>> roll out patches going from CentOS 7.2 to CentOS 7.4.  Initially, we
>>> thought it had something to do with the new version of libvirtd or qemu-kvm
>>> on the other hosts in the cluster preventing these VMs from migrating, but
>>> we are able to live migrate other VMs to and from this host without issue.
>>> We can even create new VMs on this specific host and live migrate them
>>> after creation with no issue.  We've put the migration source agent,
>>> migration destination agent and the management server in debug and don't
>>> seem to get anything useful other than "Unsupported command".  Luckily, we
>>> did have one VM that was shut down and restarted; this is the 3rd VM
>>> mentioned above.  Since that VM has been restarted, it has no issues live
>>> migrating to any other host in the cluster.
>>> 
>>> I'm at a loss as to what to try next and I'm hoping that someone out there
>>> might have had a similar issue and could shed some light on what to do.
>>> Obviously, I can contact the customer and have them shut down their VMs, but
>>> that will potentially just delay this problem to be solved another day.
>>> Even if shutting down the VMs is ultimately the solution, I'd still like to
>>> understand what happened to cause this issue in the first place, in the
>>> hope of preventing it in the future.
>>> 
>>> Here's some information about my setup:
>>> Cloudstack 4.8 Advanced Networking
>>> CentOS 7.2 and 7.4 Hosts
>>> Ceph RBD Primary Storage
>>> NFS Secondary Storage
>>> Instance in Question for Debug: i-532-1392-NSVLTN
>>> 
>>> I have attached relevant debug logs to this email if anyone wishes to take
>>> a look.  I think the most interesting error message that I have received is
>>> the following:
>>> 
>>> 468390:2018-01-27 08:59:35,172 DEBUG [c.c.a.t.Request] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Seq 22-942378222027276319: Received:  { Ans: , MgmtId: 14038012703634, via: 22(csh02c01z01.nsvltn.ena.net), Ver: v1, Flags: 110, { UnsupportedAnswer } }
>>> 468391:2018-01-27 08:59:35,172 WARN  [c.c.a.m.AgentManagerImpl] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Unsupported Command: Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
>>> 468392:2018-01-27 08:59:35,179 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
>>> 468393:2018-01-27 08:59:35,179 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Rethrow exception com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
>>> 
>>> I've tracked this "Unsupported command" down in the CS 4.8 code to
>>> cloudstack/api/src/com/cloud/agent/api/Answer.java, which is the generic
>>> answer class.  I believe where the error is really being spawned from is
>>> cloudstack/engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java.
>>> Specifically:
>>>        Answer pfma = null;
>>>        try {
>>>            pfma = _agentMgr.send(dstHostId, pfmc);
>>>            if (pfma == null || !pfma.getResult()) {
>>>                final String details = pfma != null ? pfma.getDetails() : "null answer returned";
>>>                final String msg = "Unable to prepare for migration due to " + details;
>>>                pfma = null;
>>>                throw new AgentUnavailableException(msg, dstHostId);
>>>            }
>>> 
>>> The pfma returned must be in error or is never returned and therefore
>>> still null.  It appears that the answer should be coming from the
>>> destination agent, but for the life of me I can't figure out what the root
>>> cause of this error is beyond "Unsupported command issued".  What command
>>> is unsupported?  My guess is that it could be something wrong with the dxml
>>> that is generated and passed to the destination host, but I have as yet
>>> been unable to catch that dxml in debug.
>>> 
>>> Any help or guidance is greatly appreciated.
>>> 
>>> Thanks,
>>> David Mabry
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 
> 