I recently ran into an issue (completely my own fault) that others
have also hit, with varying details and varying success in fixing it.
I had a VM stuck in shutdown (Windows was waiting for confirmation to
kill a program) that I thought was already down when I created a
snapshot on the 3 disks attached to the VM. After running the snapshot
command I went back to the machine and, instead of just powering it off
(which would have been better), I let the shutdown complete.

Needless to say, all 3 images were corrupted to varying degrees. The
first disk, the system disk, was the worst. The other 2 hold databases
and were repairable via the "qemu-img check -r all image.img" command
(with a bunch of messages/warnings). I suspect the limited activity
during shutdown helped save them. The system disk could not even be
checked; qemu-img failed with:

qemu-img: Could not open 'image.img': Could not read snapshots: File too large

Searching online for this error turns up different repair methods,
including falling back to an older qemu-img. I did not want to use an
older version as suggested in those posts, so I pulled the latest qemu
source and compiled it to get a newer qemu-img, but I got the same
error when trying to check or convert the image. I dug into the qcow2
code and silenced that particular error, which let the check actually
run. The change was in block/qcow2.c at about line 1136: if the return
result is 27 (EFBIG), ignore it and set res to 0. This is probably a
really bad thing to do; I only did it to get to the checks. The repair
run fixed the image to the point where the checks came back OK.
Unfortunately the image was still broken: trying to list snapshots or
use the image returned the "File too large" error again.
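
Because a build patched like this ignores an error during repair, it
seems prudent to point it only at an image that has been copied first.
A minimal sketch, assuming a POSIX shell; the function name
repair_with_backup and the QEMU_IMG override are made up for
illustration and are not part of qemu:

```shell
# Sketch only: keep an untouched copy before letting qemu-img modify the image.
# QEMU_IMG is a hypothetical override, e.g. the path to a self-built qemu-img.
repair_with_backup() {
    img="$1"
    # Copy first; if the copy fails, leave the original alone.
    cp -- "$img" "$img.bak" || return 1
    # Same repair command that worked on the other two disks.
    ${QEMU_IMG:-qemu-img} check -r all "$img"
}

# Usage: repair_with_backup image.img
```

Setting QEMU_IMG="echo qemu-img" turns the qemu-img call into a dry run
that only prints the command; the copy is still made.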

Ultimately I was able to repair the system disk by converting the
image to raw, as suggested in other posts, now that the check had
repaired it. I was then able to start the machine again right where it
left off (or at least it appears so). Disk checks within the machine
return OK. One thing I am unsure of is how safe the qcow2 images are
with regard to snapshots. I dare not try to do anything with them as
they are, so I will convert all of them to raw and then back into
qcow2 images.
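
The raw round trip described above could look roughly like the
following. This is a sketch under assumptions: the function name
roundtrip_via_raw, the file names and the QEMU_IMG override are
hypothetical, and only the qemu-img convert and check subcommands
themselves are real.

```shell
# Sketch only: rebuild a qcow2 image by converting it to raw and back.
# QEMU_IMG is a hypothetical override (QEMU_IMG="echo qemu-img" gives a dry run).
roundtrip_via_raw() {
    src="$1"   # repaired qcow2 image
    raw="$2"   # intermediate raw file (needs room for the full virtual size)
    dst="$3"   # freshly written qcow2 image
    # Step 1: flatten the current disk state into a raw file.
    ${QEMU_IMG:-qemu-img} convert -f qcow2 -O raw "$src" "$raw" || return 1
    # Step 2: write a new qcow2 image from the raw file.
    ${QEMU_IMG:-qemu-img} convert -f raw -O qcow2 "$raw" "$dst" || return 1
    # Step 3: verify the new image (read-only check, no -r).
    ${QEMU_IMG:-qemu-img} check "$dst"
}

# Usage: roundtrip_via_raw image.img image.raw image-new.qcow2
```

Note that a raw file cannot hold internal snapshots, so only the current
disk state survives the round trip; any snapshots stored in the old
image are not carried over.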

Even though this was entirely due to creating a snapshot while the
disk was in use, some thoughts:

- If a user is trying to run a repair, qemu-img should not error out
on the snapshots; it should proceed with the checks/repairs and allow a
convert if possible.

- If possible, before actually taking a snapshot, check whether the
file is in use, to avoid this situation altogether.
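
Until something like the second point exists in the tool itself, a
wrapper script can approximate it. A minimal sketch, assuming the
fuser utility from psmisc is installed; snapshot_if_idle and QEMU_IMG
are hypothetical names used only for illustration:

```shell
# Sketch only: refuse to snapshot an image that some process still has open.
# fuser -s is silent and exits 0 if at least one process is using the file.
snapshot_if_idle() {
    img="$1"    # image file to snapshot
    name="$2"   # snapshot name
    if fuser -s "$img"; then
        echo "refusing: $img is still open by another process" >&2
        return 1
    fi
    # No user found: create the internal snapshot.
    ${QEMU_IMG:-qemu-img} snapshot -c "$name" "$img"
}

# Usage: snapshot_if_idle image.img before-upgrade
```

This is only a best-effort guard: fuser sees local processes only, may
need root to see processes of other users, and there is still a small
window between the check and the snapshot.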

I would submit a patch, but I do not know enough about the possible
repercussions of ignoring an error and repairing/converting. 

If you have any questions, please reply.

 
