Hi everyone, (running Mitaka)
When doing block live-migration, if the image / backing file is not present at destination host, sometimes pre-live migration fails after 60 seconds as shown below. Retrying the migration to the same destination host succeeds. It seems that an rpc_response_timeout of 60 seconds is not enough for this scenario, in which fetching the image involves 90 seconds. We don't like to increase rpc_response_timeout to say, 120 seconds, only for this reason ('cause in other kind of errors we prefer to fail fast). Given that migrations are usually long, shouldn't this operation be under the scope of a configurable timeout such as live_migration_progress_timeout or live_migration_completion_timeout which overrides the default rpc timeout? Thanks! SOURCE HOST Nov 27 09:22:22 source check data is LibvirtLiveMigrateData(bdms=<?>,block_migration=True,disk_available_mb=400384,disk_over_commit=False,filename='tmpLBvgop',graphics_listen_addr_spice=<?>,graphics_listen_addr_vnc=<?>,image_type='default',instance_relative_path='a3dc8ffe-5fd1-475e-a016-3d14128aa84c',is_shared_block_storage=False,is_shared_instance_path=False,is_volume_backed=False,migration=<?>,serial_listen_addr=<?>,target_connect_addr=<?>) Nov 27 09:23:22 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Pre live migration failed at r65-host18 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Traceback (most recent call last): 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5275, in _do_live_migration 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] block_migration, disk, dest, migrate_data) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/nova/compute/rpcapi.py", line 692, in pre_live_migration 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] disk=disk, migrate_data=migrate_data) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] retry=self.retry) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] timeout=timeout, retry=retry) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] retry=retry) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] result = self._waiter.wait(msg_id, timeout) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] message = self.waiters.get(msg_id, timeout=timeout) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in get 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] 'to message ID %s' % msg_id) 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] MessagingTimeout: Timed out waiting for a reply to message ID 32be13b876094c47b293f89b8d445652 2017-11-27 09:23:22.854 9060 ERROR nova.compute.manager [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] DESTINATION HOST Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] migrate_data in pre_live_migration: LibvirtLiveMigrateData(bdms=<?>,block_migration=True,disk_available_mb=400384,disk_over_commit=False,filename='tmpLBvgop',graphics_listen_... Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Creating instance directory: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Creating disk.info with the contents: {u'/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk': u'qcow2'} Nov 27 09:22:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Checking to make sure images and backing files are present before live migration. Nov 27 09:22:23 Lock "bb045b7affecd2cd3976771e1809996277954a98" acquired by "nova.virt.libvirt.imagebackend.fetch_func_sync" :: waited 0.000s Nov 27 09:23:23 Acquired semaphore "refresh_cache-a3dc8ffe-5fd1-475e-a016-3d14128aa84c" Nov 27 09:23:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] _get_instance_nw_info() Nov 27 09:23:23 REQ: curl -g -i -X GET http://prod-controller-01:9696/v2.0/ports.json?tenant_id=840871e744c8406a908838a6abe42a32&device_id=a3dc8ffe-5fd1-475e-a016-3d14128aa84c -H "User-Agent: python-neutronclient" -H "Accept: application/j... Nov 27 09:23:23 RESP: [200] Date: Mon, 27 Nov 2017 14:23:23 GMT Connection: keep-alive Content-Type: application/json; charset=UTF-8 Content-Length: 878 X-Openstack-Request-Id: req-1b61ebb8-3aab-4f5f-9d9f-da4675dcaa96 #012RESP BODY: {"port... ... ... Nov 27 09:23:23 Releasing semaphore "refresh_cache-a3dc8ffe-5fd1-475e-a016-3d14128aa84c" Nov 27 09:23:23 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] During wait destroy, instance disappeared. ... ... Nov 27 09:23:23 mv /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del Nov 27 09:23:24 CMD "mv /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del" returned: 0 in 0.037s Nov 27 09:23:24 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Deleting instance files /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del Nov 27 09:23:24 [instance: a3dc8ffe-5fd1-475e-a016-3d14128aa84c] Deletion of /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c_del complete Nov 27 09:23:50 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part Nov 27 09:23:50 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part" returned: 0 in 0.048s Nov 27 09:23:50 4531211f-5252-403d-a90f-56ddcaa00928 was qcow2, converting to raw Nov 27 09:23:50 qemu-img convert -O raw /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted -f qcow2 Nov 27 09:23:53 CMD "qemu-img convert -O raw /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.part /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted -f qcow2" returned: 0 in 3.400s Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98.converted" returned: 0 in 0.047s Nov 27 09:23:54 Lock "bb045b7affecd2cd3976771e1809996277954a98" released by "nova.virt.libvirt.imagebackend.fetch_func_sync" :: held 90.662s Nov 27 09:23:54 touch -c /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 Nov 27 09:23:54 touch -c /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98" returned: 0 in 0.113s Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98" returned: 0 in 0.048s Nov 27 09:23:54 Lock "bb045b7affecd2cd3976771e1809996277954a98" acquired by "nova.virt.libvirt.imagebackend.copy_qcow2_image" :: waited 0.000s Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 Nov 27 09:23:54 qemu-img info /var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98" returned: 0 in 0.046s Nov 27 09:23:54 qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk Nov 27 09:23:54 CMD "qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk" returned: 1 in 0.042s Nov 27 09:23:54 u'qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk' failed. Not Retrying. Nov 27 09:23:54 Lock "bb045b7affecd2cd3976771e1809996277954a98" released by "nova.virt.libvirt.imagebackend.copy_qcow2_image" :: held 0.092s Nov 27 09:23:54 Exception during message handling: Unexpected error while running command. Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk Exit code: 1 Stdout: u"Formatting '/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk', fmt=qcow2 size=7516192768 backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16\n" Stderr: u'qemu-img: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk: Could not create file: No such file or directory\n' 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last): 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher incoming.message)) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 110, in wrapped 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher payload) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher self.force_reraise() 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 89, in wrapped 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 409, in decorated_function 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 387, in decorated_function 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info()) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher self.force_reraise() 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 375, in decorated_function 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5221, in pre_live_migration 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher migrate_data) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6695, in pre_live_migration 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher fallback_from_host=instance.host) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6864, in _create_images_and_backing 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher fallback_from_host) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6791, in _try_fetch_image_cache 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher size=size) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 251, in cache 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher *args, **kwargs) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 626, in create_image 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher copy_qcow2_image(base, self.path, size) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher return f(*args, **kwargs) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 584, in copy_qcow2_image 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher libvirt_utils.create_cow_image(base, target) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 107, in create_cow_image 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher execute(*cmd) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/utils.py", line 56, in execute 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher return utils.execute(*args, **kwargs) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 389, in execute 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher return processutils.execute(*cmd, **kwargs) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 389, in execute 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher cmd=sanitized_cmd) 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher ProcessExecutionError: Unexpected error while running command. 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Exit code: 1 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Stdout: u"Formatting '/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk', fmt=qcow2 size=7516192768 backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16\n" 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Stderr: u'qemu-img: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk: Could not create file: No such file or directory\n' 2017-11-27 09:23:54.666 35650 ERROR oslo_messaging.rpc.dispatcher Nov 27 09:23:54 Returning exception Unexpected error while running command. Command: qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk Exit code: 1 Stdout: u"Formatting '/var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk', fmt=qcow2 size=7516192768 backing_file=/var/lib/nova/instances/_base/bb045b7affecd2cd3976771e1809996277954a98 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16\n" Stderr: u'qemu-img: /var/lib/nova/instances/a3dc8ffe-5fd1-475e-a016-3d14128aa84c/disk: Could not create file: No such file or directory\n' to caller
_______________________________________________ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators