Public bug reported: The WineTestBot (https://testbot.winehq.org/) uses QEmu live snapshots to ensure the Wine tests are always run in a pristine Windows environment. However the revert times keep increasing linearly with the age of the snapshot, going from tens of seconds to thousands. While the revert takes place the qemu process takes 100% of a core and there is no disk activity. Obviously waiting over 20 minutes before being able to run a 10 second test is not viable.
Only some VMs are impacted. Based on libvirt's XML files the common point appears to be the presence of the following <timer> tags: <clock offset='localtime'> <timer name='rtc' tickpolicy='delay'/> <timer name='pit' tickpolicy='delay'/> <timer name='hpet' present='no'/> </clock> Where the unaffected VMs have the following clock definition instead: <clock offset='localtime'/> Yet shutting down the affected VMs, changing the clock definition, creating a live snapshot and trying to revert to it 6 months later results in slow revert times (>400 seconds). Changing the tickpolicy to catchup for rtc and/or pit has no effect on the revert time (and unsurprisingly causes the clock to run fast in the guest). To reproduce this problem do the following: * Create a Windows VM (either 32 or 64 bits). This is known to happen with at least Windows 2000, XP, 2003, 2008 and 10. * That VM will have the <timer> tags shown above, with the possible addition of an hypervclock timer. * Shut down the VM. * date -s "2014/04/01" * Start the VM. * Take a live snapshot. * Shut down the VM. * date -s "<your current date>" * Revert to the live snapshot. If the revert takes more than 2 minutes then there is a problem. A workaround is to set track='guest' on the rtc timer. This makes the revert fast and may even be the correct solution. But why is it not the default or better documented? * It setting track='wall' or omitting track, then the revert is slow and the clock in the guest is not updated. * It setting track='guest' the revert is fast and the clock in the guest is not updated. I found three past mentions of this issue but as far as I can tell none of them got anywhere: * [Qemu-discuss] massive slowdown for reverts after given amount of time on any newer versions https://lists.gnu.org/archive/html/qemu-discuss/2013-02/msg00000.html * The above post references another one from 2011 wrt qemu 0.14: https://lists.gnu.org/archive/html/qemu-devel/2011-03/msg02645.html * Comment #9 of Launchpad bug 1174654 matches this slow revert issue. However the bug was really about another issue so this was not followed on. https://bugs.launchpad.net/qemu/+bug/1174654/comments/9 I'm currently running into this issue with QEmu 2.1 but it looks like this bug has been there all along. 1:2.1+dfsg-12+deb8u2 qemu-kvm 1:2.1+dfsg-12+deb8u2 qemu-system-common 1:2.1+dfsg-12+deb8u2 qemu-system-x86 ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1505041 Title: Live snapshot revert times increases linearly with snapshot age Status in QEMU: New Bug description: The WineTestBot (https://testbot.winehq.org/) uses QEmu live snapshots to ensure the Wine tests are always run in a pristine Windows environment. However the revert times keep increasing linearly with the age of the snapshot, going from tens of seconds to thousands. While the revert takes place the qemu process takes 100% of a core and there is no disk activity. Obviously waiting over 20 minutes before being able to run a 10 second test is not viable. Only some VMs are impacted. Based on libvirt's XML files the common point appears to be the presence of the following <timer> tags: <clock offset='localtime'> <timer name='rtc' tickpolicy='delay'/> <timer name='pit' tickpolicy='delay'/> <timer name='hpet' present='no'/> </clock> Where the unaffected VMs have the following clock definition instead: <clock offset='localtime'/> Yet shutting down the affected VMs, changing the clock definition, creating a live snapshot and trying to revert to it 6 months later results in slow revert times (>400 seconds). Changing the tickpolicy to catchup for rtc and/or pit has no effect on the revert time (and unsurprisingly causes the clock to run fast in the guest). To reproduce this problem do the following: * Create a Windows VM (either 32 or 64 bits). This is known to happen with at least Windows 2000, XP, 2003, 2008 and 10. * That VM will have the <timer> tags shown above, with the possible addition of an hypervclock timer. * Shut down the VM. * date -s "2014/04/01" * Start the VM. * Take a live snapshot. * Shut down the VM. * date -s "<your current date>" * Revert to the live snapshot. If the revert takes more than 2 minutes then there is a problem. A workaround is to set track='guest' on the rtc timer. This makes the revert fast and may even be the correct solution. But why is it not the default or better documented? * It setting track='wall' or omitting track, then the revert is slow and the clock in the guest is not updated. * It setting track='guest' the revert is fast and the clock in the guest is not updated. I found three past mentions of this issue but as far as I can tell none of them got anywhere: * [Qemu-discuss] massive slowdown for reverts after given amount of time on any newer versions https://lists.gnu.org/archive/html/qemu-discuss/2013-02/msg00000.html * The above post references another one from 2011 wrt qemu 0.14: https://lists.gnu.org/archive/html/qemu-devel/2011-03/msg02645.html * Comment #9 of Launchpad bug 1174654 matches this slow revert issue. However the bug was really about another issue so this was not followed on. https://bugs.launchpad.net/qemu/+bug/1174654/comments/9 I'm currently running into this issue with QEmu 2.1 but it looks like this bug has been there all along. 1:2.1+dfsg-12+deb8u2 qemu-kvm 1:2.1+dfsg-12+deb8u2 qemu-system-common 1:2.1+dfsg-12+deb8u2 qemu-system-x86 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1505041/+subscriptions