Public bug reported:

While monitoring live migration progress, Nova relies on what libvirt reports in the data_remaining property:

https://github.com/openstack/nova/blob/54482fde22742bc852414c58552fe64ea59d61d5/nova/virt/libvirt/driver.py#L6189-L6193

However, data_remaining does not give Nova any information it can use to track live migration progress. It only indicates how much data still has to be transferred in the current iteration before that iteration finishes and it can be checked whether the VM can be switched to the destination, nothing more.

As an example, assume a VM with 4 GB of memory. In the very first iteration libvirt reports that 4 GB of data remain to be transferred. During the first iteration this number drops to 0 bytes (or almost 0), which ends the first iteration. Say that the VM dirtied 3 GB of memory during the first iteration. At the beginning of the next iteration QEMU calculates (number of dirty pages * page size) and libvirt reports 3 GB of data to be transferred in the second iteration. During the second iteration data_remaining again goes down to zero by the end of that iteration.

Given that Nova takes a snapshot of this information once every 0.5 seconds, and that the data_remaining value reported by libvirt reflects only the data remaining in the current iteration, we cannot tell whether the live migration (LM) is progressing or not.

Therefore the live migration progress timeout does not make sense: Nova can take a snapshot in the first iteration that says only 150 MB remain to be transferred to the destination, and very likely no snapshot taken in any subsequent iteration will show a smaller amount, so Nova will conclude that LM is not progressing.

This affects all releases starting from Liberty.
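The effect described above can be reproduced with a toy model. The sketch below is not Nova code: the function names, the linear drain at a fixed bandwidth, and the "lowest value seen so far" watermark are simplifying assumptions chosen only to mirror the 4 GB / 3 GB example and the 0.5 second sampling interval from this report.

```python
def sample_data_remaining(iteration_sizes_mb, bandwidth_mb_s, interval_s=0.5):
    """Return (time_s, remaining_mb) samples taken every interval_s seconds.

    Hypothetical model: each iteration starts with its full size remaining
    and drains linearly to zero at bandwidth_mb_s; then the next one starts.
    """
    # Precompute the (start, end, size) window of every iteration.
    windows = []
    start = 0.0
    for size in iteration_sizes_mb:
        end = start + size / bandwidth_mb_s
        windows.append((start, end, size))
        start = end
    total_time = start

    samples = []
    n = 0
    while n * interval_s < total_time:
        t = n * interval_s
        for w_start, w_end, size in windows:
            if w_start <= t < w_end:
                # Data still to be sent in the *current* iteration only.
                samples.append((t, size - (t - w_start) * bandwidth_mb_s))
                break
        n += 1
    return samples


def watermark_updates(samples):
    """Return the sample times at which the lowest value seen so far improves."""
    watermark = None
    updates = []
    for t, remaining in samples:
        if watermark is None or remaining < watermark:
            watermark = remaining
            updates.append(t)
    return updates


if __name__ == "__main__":
    # 4 GB VM; 3 GB dirtied during every iteration; 1024 MB/s is an assumed rate.
    samples = sample_data_remaining([4096, 3072, 3072, 3072], 1024)
    updates = watermark_updates(samples)
    print("migration sampled until t =", samples[-1][0], "s")
    print("last watermark improvement at t =", updates[-1], "s")
```

With these assumed numbers the watermark last improves at 3.5 s, in the first iteration, although sampling continues until 12.5 s. A progress timeout keyed to that watermark would therefore fire even though data is transferred the whole time.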
** Affects: nova
     Importance: Undecided
         Status: New

** Tags: live-migration

** Description changed:

+ This affects all releases starting from Liberty.

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1644248

Title:
  Nova incorrectly tracks live migration progress

Status in OpenStack Compute (nova):
  New