Hi everyone,

a few people have reported issues with live migration lately, and I've been digging in to narrow them down.
The symptom is easy to describe: you run "nova live-migration <guest> <host>", and nothing happens.

A few words of background:

- The system is Ubuntu precise with stock packages and regular updates,
  no external PPAs. nova-compute is at version 2012.1-0ubuntu2.1.
- libvirtd is running with the "-l" option and a working TCP socket, as
  described here:
  http://docs.openstack.org/trunk/openstack-compute/admin/content/configuring-live-migrations.html
- /var/lib/nova/instances is on GlusterFS.

Now, if you set any of the --*vnc* flags in nova.conf, live migration fails even at the libvirt level (a similar issue was reported here recently, see https://lists.launchpad.net/openstack/msg12425.html):

  # virsh migrate --live --p2p --domain instance-0000000a \
      --desturi qemu+tcp://skunk-x/system
  error: Unable to read from monitor: Connection reset by peer

("skunk-x" is a secondary IP address of the host "skunk", living in a dedicated network used for migrations.)

This is in the libvirtd log on the source host:

  2012-06-05 20:39:25.838+0000: 12241: error : virNetClientProgramDispatchError:174 : Unable to read from monitor: Connection reset by peer

At the same time, I see this in the libvirtd log on the target host:

  2012-06-05 20:39:25.394+0000: 6828: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer

Removing all --*vnc* flags from nova.conf resolved that issue for me.

With that fixed, the same command then failed with a connection timeout: even though I set "qemu+tcp://skunk-x/system" as the libvirt destination URI, libvirt opens a separate socket on an ephemeral port on skunk's primary interface, which in this case was blocked by my iptables config:

  # virsh migrate --live --p2p \
      --domain instance-0000000d --desturi qemu+tcp://skunk-x/system
  error: unable to connect to server at 'skunk:49159': Connection timed out

Switching the migration to tunnelled mode solved that issue:

  # virsh domstate instance-0000000d
  running
  # virsh migrate --live --p2p \
      --domain instance-0000000d --desturi qemu+tcp://skunk-x/system \
      --tunnelled
  # virsh --connect qemu+tcp://skunk-x/system domstate instance-0000000d
  running

So these are the flags I'm now using in my nova.conf:

  --live_migration_uri="qemu+tcp://%s-x/system"
  --live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_TUNNELLED"

(Note that "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER" is the default for --live_migration_flag; VIR_MIGRATE_TUNNELLED is my addition. I've also tried migrating over the primary interface, without tunnelling. No change: it works in libvirt, but not with Nova.)
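For anyone trying to reproduce this, the libvirtd TCP setup mentioned in the background notes is just the stock one from the admin guide linked above. On precise that amounts to roughly the following (unauthenticated TCP on a trusted network; treat the exact values as what happens to work for me, not gospel):

In /etc/libvirt/libvirtd.conf:

  listen_tls = 0
  listen_tcp = 1
  auth_tcp = "none"

In /etc/default/libvirt-bin:

  libvirtd_opts="-d -l"

...followed by a "service libvirt-bin restart" on both hosts.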
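As an alternative to tunnelling, it should also be possible to simply open up libvirt's migration port range in iptables. From what I've seen, libvirt picks ephemeral ports starting at 49152 (the 49159 above fits that pattern), so something like this on the target host ought to do it; note that the 49152:49215 range and the source address placeholder are my guesses, adjust to your setup:

  # allow incoming qemu migration data connections from the source host
  iptables -I INPUT -p tcp -s <source-host-ip> --dport 49152:49215 -j ACCEPT

That is essentially how I tested the non-tunnelled variant over the primary interface mentioned above; either way, libvirt is happy and Nova is not.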
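To see how far Nova actually gets, I've been watching the compute logs on both ends. My understanding is that pre_live_migration runs on the target host and live_migration on the source, so I grep both (standard precise log locations assumed):

  # on the target host
  grep pre_live_migration /var/log/nova/nova-compute.log
  # on the source host (-w so pre_live_migration doesn't match)
  grep -w live_migration /var/log/nova/nova-compute.log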
The result: "nova live-migration <guest> <host>" returns an exit code of 0, and the only trace I find of the migration in the logs is this, which is evidently from the pre_live_migration method:

  2012-06-06 11:05:13 DEBUG nova.rpc.amqp [-] received {u'_context_roles': [u'KeystoneServiceAdmin', u'admin', u'KeystoneAdmin'], u'_msg_id': u'069c958b7c03482aa4f0dda00010eb10', u'_context_read_deleted': u'no', u'_context_request_id': u'req-71c4ffea-4d3d-471c-98bc-8a27aaff8f2c', u'args': {u'instance_id': 13, u'block_migration': False, u'disk': None}, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'_context_project_id': u'9c929e61e7624fbe895ae0de38bd1471', u'_context_timestamp': u'2012-06-06T09:05:09.992775', u'_context_user_id': u'1c8c118c7c244d2d94cc516ab6f24c03', u'method': u'pre_live_migration', u'_context_remote_address': u'10.43.0.2'} from (pid=14437) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160

It looks like the process never gets to live_migration. I'd be thankful for any clues as to where to dig further.

Cheers,
Florian

--
Need help with High Availability? http://www.hastexo.com/now