This is just a burp of frustration, after years of running qemu+kvm.
I am currently running qemu-system version 2.1.0 (Debian package 2.1+dfsg-2~bpo70+2). After several years of ongoing bug reports (and various patches) from others, snapshots are still painfully slow. Even simple snapshot operations take a long time. Creating and removing snapshots on a running VM makes the VM unresponsive (pings etc. fail), often for 30 minutes to an hour. This is happening on a production system, so experimentation is limited, and unsafe cache settings are simply not an option.

In one qemu bug tracker (I am not sure it is the current one), I found reports and patches for at least 3 internal issues related to this:

1. Unnecessary disk flushes for each update of each entry in some qcow2 tables (I think it was the L1 or L2 tables). Functions intended to perform and flush single allocations seem to be misused by calling them in a loop for the bulk allocations involved in snapshots.
2. Failure to remove a temporary copy of the savevm memory image from the HEAD of the snapshot tree (I am not sure why it was written there rather than into the snapshot itself).
3. Metadata pre-allocation not surviving snapshot creation/removal.

Beyond these bug reports, I have noticed in other documents that snapshot-related features, such as streaming blocks to combine snapshots, are inexplicably designed only for atypical cases in terms of the direction in which blocks are copied. That is not what the snapshot commands exposed in the user interfaces actually need.

There are also no clear rules for how the qcow2 format handles being backed up while "live", later restored, and then reverted to a snapshot made just before the backup (this is the standard scenario for snapshot-based backups and restores).
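A toy simulation may make the live-backup concern concrete. The sketch below uses a made-up five-block layout: a mapping table in block 0, a refcount block in block 1, and data in blocks 2 to 4. This is NOT the real qcow2 on-disk format, and every function name is hypothetical. It only illustrates the general hazard. If a sequential copy runs while a single allocation updates two metadata blocks, the copy can pick up the old mapping table together with the new refcounts.

```python
"""Toy model (hypothetical layout, not real qcow2) of a sequential live copy
racing with one allocation that updates two separate metadata blocks."""

from copy import deepcopy


def make_image():
    """Block 0: mapping table (guest cluster -> host block).
    Block 1: refcounts for host blocks 2..4. Blocks 2..4: data."""
    return [
        {"guest0": 2},        # only host block 2 is mapped
        {2: 1, 3: 0, 4: 0},   # refcounts agree with the mapping
        "data-A",
        None,
        None,
    ]


def allocate(image, guest_cluster, host_block, payload):
    """The running VM writes a new cluster: data, refcount and mapping change."""
    image[host_block] = payload
    image[1] = {**image[1], host_block: 1}
    image[0] = {**image[0], guest_cluster: host_block}


def live_copy(image, write_after_block):
    """Copy blocks in order (like tar reading the file) and let the VM
    perform one allocation right after block `write_after_block` is copied."""
    backup = []
    for index in range(len(image)):
        backup.append(deepcopy(image[index]))
        if index == write_after_block:
            allocate(image, "guest1", 3, "data-B")
    return backup


def check_consistency(image):
    """Return host blocks whose refcount disagrees with the mapping table."""
    referenced = set(image[0].values())
    return sorted(b for b, rc in image[1].items() if (rc > 0) != (b in referenced))


image = make_image()
backup = live_copy(image, write_after_block=0)
print(check_consistency(image))   # → []   (the live image is consistent)
print(check_consistency(backup))  # → [3]  (old table copied, new refcount copied)
```

In this model the torn backup reports host block 3 as in use even though no table entry refers to it. The real qcow2 update order and its consistency checks may behave differently.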
For instance, such a live backup may leave the "reference count" fields in a restored file out of sync with the tables that refer to them, if one or more blocks were written while the backup program was reading the qcow2 file sequentially. It seems strange that such a basic operation, using the native qemu file format, is not treated as a priority for reliability and performance.

P.S. In case you didn't know, the standard way to back up virtual machines (qemu or otherwise) is this sequence:

- Create a snapshot named "Backup #xxxx" using savevm.
- Sequentially copy the disk image file using a tool such as GNU tar. (Any byte ranges that the running VM changes may get backed up with their values from any point during this copy operation.)
- Remove the snapshot named "Backup #xxxx".

After the disaster:

- Restore the disk image file as it was seen by the backup tool.
- Restart the virtual machine from the disk image, memory image etc. represented by the snapshot named "Backup #xxxx" (loadvm).

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  http://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded