[Qemu-devel] QCow2 compression
Hello, I am hoping someone here can help me. I am implementing QCow2 support for a PC emulator project and have a couple questions regarding compression I haven't been able to figure out on my own. First some background: I am using the information I found at https://people.gnome.org/~markmc/qcow-image-format.html and I have implemented working support for QCow2 images as described there except for snapshots, encryption, and compression. Of these, only compression is of immediate use to me. I have some QCow2 images all using 16-bit clusters created using qemu-img 2.1.2 (the version bundled with Debian stable). According to the documentation I linked, 8 bits of an L2 table entry following the copy flag should say how many 512 byte sectors a compressed cluster takes up and the remaining bits are the actual offset of the cluster within the file. I have for example a compressed cluster with an L2 entry value of 4A C0 00 00 00 3D 97 50. This would lead me to believe the cluster starts at offset 0x3D9750 and has a length of 0x2B 512-byte sectors (or 0x2B times 0x200 = 0x5600). Added to the offset this would give an end for the cluster at offset 0x3DED50. However, it is clear from looking at the image that the compressed cluster extends further, the data ending at 0x3DEDD5 and being followed by some zero padding until 0x3DEDF0 where the file ends. How can I know the data extends beyond the length I calculated? Did I misunderstand the documentation somewhere? Why does the file end here versus a cluster aligned offset? A final question: I noticed that compressed clusters typically have a reference count higher than one, yet there are no snapshots present in the image. I suspect the count is incremented for each compressed cluster that exists even partially within a normal sized cluster region of the file, but I can find no documentation to this effect and am merely speculating. Am I correct? If it is the wrong place to ask these questions, I would appreciate it if anyone could direct me to a more appropriate venue. Thank you for taking the time to read this and tanks in advance for any assistance.
Re: [Qemu-devel] QCow2 compression
> > I have for example a compressed cluster with an L2 entry value of 4A > > C0 00 00 00 3D 97 50. This would lead me to believe the cluster starts > > at offset 0x3D9750 and has a length of 0x2B 512-byte sectors (or 0x2B > > times 0x200 = 0x5600). Added to the offset this would give an end for > > the cluster at offset 0x3DED50. However, it is clear from looking at > > the image that the compressed cluster extends further, the data ending > > at 0x3DEDD5 and being followed by some zero padding until 0x3DEDF0 > > where the file ends. How can I know the data extends beyond the length > > I calculated? Did I misunderstand the documentation somewhere? Why > > does the file end here versus a cluster aligned offset? > > This zero padding happens in the very last cluster in the image in order > to ensure that the image file is aligned to a multiple of the cluster > size (qcow2 images are defined to consist of "units of constant size", > i.e. only full clusters). > > The zeros are not part of the compressed data, though, that's why the > Compressed Cluster Descriptor indicates a shorter size. Had another > compressed cluster been written to the same image, it might have ended > up where you are seeing the zero padding now. (The trick with > compression is that multiple guest clusters can end up in a single host > cluster.) > Thanks, but the given length of 0x5600 is still short by 160(decimal) bytes compared to the non-zero data (which occupies an additional 133 bytes beyond the expected end at 0x3DED50) and zero padding (an additional 27 bytes beyond that). Could there be an off-by-one error somewhere? The file doesn't even end on a sector boundary let alone a cluster boundary. I can replicate this easily and produce files which demonstrate what I am seeing here. I will try to replicate using a newer version of the qemu-img. The version in Debian stable is quite old apparently.
Re: [Qemu-devel] QCow2 compression
Eric Blake wrote: > [any way you could convince your mailer to not break threading?] > > On 03/03/2016 09:24 PM, mgre...@cinci.rr.com wrote: > >> > >> The zeros are not part of the compressed data, though, that's why the > >> Compressed Cluster Descriptor indicates a shorter size. Had another > >> compressed cluster been written to the same image, it might have ended > >> up where you are seeing the zero padding now. (The trick with > >> compression is that multiple guest clusters can end up in a single host > >> cluster.) > >> > > > > Thanks, but the given length of 0x5600 is still short by 160(decimal) bytes > > compared to the > > non-zero data (which occupies an additional 133 bytes beyond the expected > > end at > > 0x3DED50) and zero > > padding (an additional 27 bytes beyond that). Could there be an off-by-one > > error > > somewhere? > > The file doesn't even end on a sector boundary let alone a cluster > > boundary. > > Based on an IRC conversation I had when you first asked the question, I > think the spec is indeed weak, and that we DO have some fishy code. > > Look at what the code does: > > uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs, >uint64_t offset, >int compressed_size) > ... > nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) - > (cluster_offset >> 9); > > cluster_offset |= QCOW_OFLAG_COMPRESSED | > ((uint64_t)nb_csectors << s->csize_shift); > > That sure does sound like an off-by-one. cluster_offset does indeed > look like a byte offset (from qcow2_alloc_bytes); so let's consider what > happens if we've already allocated one compressed cluster in the past, > going from 65536 to 66999. So on this call, cluster_offset would be > 67000, and if compressed size is 1025 (just round numbers to make > discussion easy; no idea if gzip would really do this on any particular > data), we are computing ((67000+1025-1)>>9) - (67000>>9) == 2, but 1025 > bytes occupies 3 sectors, not 2. > > But we offset it by another off-by-one: > > int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset) > { > ... > nb_csectors = ((cluster_offset >> s->csize_shift) & > s->csize_mask) + 1; > > Yuck. That is horribly ugly. > > We need to fix the documentation to make it obvious that the sector > count is a _LOWER BOUND_ on the number of sectors occupied, and that you > need to read at least one more cluster's worth of data before decompressing. > > It would also be nice to fix qcow2 code to not have quite so many > off-by-one computations, but be more precise about what data is going where. > > -- > Eric Blake eblake redhat com+1-919-301-3266 > Libvirt virtualization library http://libvirt.org > Thanks for verifying the behavior for me. This will allow me to add compression support to the code I've already written. I've implemented QCow2 support for a fork of the popular DOSBox emulator. The project (not mine), is called DOSBox-X, and is geared towards emulating Windows9x era hardware with a specific focus on games: http://dosbox-x.com/ If you'd like to give me any feedback on the QCow2 code I've written please feel free to e-mail me directly. The relevant files are qcow2_disk.cpp and qcow_disk.h. If you do, please be kind as my C++ skills are very rusty. Overall I think it is pretty readable. It took me about 2 and a half weeks working in my spare time to go from reading the spec to the final code integrated into DOSBox-X. Thanks, Michael Greger