On 02/20/2018 11:01 AM, Alberto Garcia wrote:
tl;dr: I think we need a v3 with even more clarification.
The documentation claims that the cluster descriptor contains the
number of sectors used to store the compressed data, but what it
actually contains is the number of sectors *minus one*.
That can be easily seen in qcow2_decompress_cluster(), which adds one
to the value stored in that field:
nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
This is misleading. It shows how we interpret what is in the qcow2
file when reading, in order to decompress, but it still doesn't show
how we generated that number. Let's compare it to what we WRITE into
the qcow2 file:
nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
(cluster_offset >> 9);
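To convince myself the two match up, here's a standalone sketch (my
own illustration, not qemu code; the offset and size are made-up
example values):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t cluster_offset = 0x200e0; /* start of compressed data */
    uint64_t compressed_size = 1120;   /* bytes of compressed data */

    /* Write side: sectors spanned beyond the first one. */
    uint64_t stored = ((cluster_offset + compressed_size - 1) >> 9) -
                      (cluster_offset >> 9);           /* == 2 */

    /* Read side: add the first sector back in. */
    uint64_t nb_csectors = stored + 1;                 /* == 3 */

    /* 0x200e0..0x2053f spans sectors 0x100, 0x101 and 0x102. */
    assert(nb_csectors == 3);
    return 0;
}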
I'm also making an additional observation: due to the pigeonhole
principle and the fact that the compression stream adds metadata, we
KNOW that there are some (rare) cases where attempting to compress data
will actually result in an INCREASE in size ('man gzip' backs up this
claim, calling out a worst-case expansion of 0.015%, i.e. roughly 15
bytes added for every 100,000 bytes of incompressible input). So
presumably, we should state that a cluster can only be written in
compressed form IF it occupies less space than the uncompressed cluster
(we could also allow a compressed form that occupies the same length as
the uncompressed cluster, but that's a waste of CPU cycles).
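zlib itself documents the worst case: compressBound() returns the
maximum possible deflate output for a given input length, and it is
always larger than the input. So the rule I'm proposing could be
sketched like this (hypothetical code, not a quote of qemu):

#include <stdio.h>
#include <zlib.h>

int main(void)
{
    uLong cluster_size = 64 * 1024;

    /* compressBound() is always > sourceLen, so an incompressible
     * cluster can grow when "compressed". */
    printf("worst-case deflate output for %lu bytes: %lu bytes\n",
           cluster_size, compressBound(cluster_size));

    /* The proposed rule: only store the compressed form when it
     * actually saves space, otherwise write the cluster as-is:
     *
     *     if (compressed_size < cluster_size)
     *         store compressed;
     *     else
     *         store uncompressed;
     */
    return 0;
}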
Once we have that restriction stated, then it becomes obvious that a
compressed cluster should never REQUIRE using more than one host cluster
(and this is backed up by qcow2_alloc_bytes() asserting that size <=
s->cluster_size). Where things get interesting, though, is whether we
PERMIT a compressed cluster to overlap a host cluster boundary.
Technically, it might be possible, but qemu does NOT do that (again,
looking at qcow2_alloc_bytes() - we loop if free_in_cluster < size) - so
we may want to be explicit about this point to prevent OTHER
implementations from creating a compressed cluster that crosses host
cluster boundaries (right now, I can't see qcow2_decompress_cluster()
validating it, though - YIKES).
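To be explicit about what such read-side validation might look like
(hypothetical code, not a patch against qemu):

#include <stdint.h>

/* Hypothetical check: reject a compressed cluster descriptor whose
 * sector run crosses a host cluster boundary.  coffset is the byte
 * offset of the compressed data, nb_csectors the sector count after
 * the implicit +1. */
static int validate_compressed_extent(uint64_t coffset, int nb_csectors,
                                      int cluster_size)
{
    uint64_t cluster_start = coffset & ~(uint64_t)(cluster_size - 1);
    uint64_t last_byte = (coffset & ~511ULL) +
                         (uint64_t)nb_csectors * 512 - 1;

    /* Failure means the image is corrupt (or was produced by an
     * implementation that allows crossing host cluster boundaries). */
    return last_byte < cluster_start + cluster_size ? 0 : -1;
}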
In addition to that, this patch clarifies where the actual compressed
data is located.
Although the size of the data is specified in sectors, the offset is
not necessarily aligned to a sector boundary, so the actual data goes
from the specified offset until the end of the last sector, leaving
the initial bytes of the first sector (if any) unused.
Signed-off-by: Alberto Garcia <be...@igalia.com>
---
v2: I realized that the documentation is not completely clear about
the exact location and size of the compressed data, so I updated
the patch to clarify this.
---
docs/interop/qcow2.txt | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index d7fdb1fee3..dc2b9cefb2 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -427,9 +427,17 @@ Standard Cluster Descriptor:
Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
I'm looking at how this works for different cluster sizes. If we have
512-byte clusters, x is 61, and we DON'T have the 'number sectors' field
at all! But that still makes sense, provided that all consecutive host
sectors used to hold a compressed guest cluster lie within a single host
cluster. If we ever allowed a compressed cluster to spill across two
host clusters, it would cause mayhem in trying to track refcounts and
other things. So you can have two 512-byte guest clusters that manage
to compress into the same host cluster, but you must never have a single
guest cluster that spills over a host cluster boundary.
For all other cluster sizes, the value of x leaves us exactly
log2(cluster_size / 512) bits for the 'number sectors' field. For
example, with 64k clusters, x is 54, leaving 7 bits (64k/512 == 128, and
2^7 covers the number of sectors in a 64k host cluster). Whether all of
these sectors should lie within the same host cluster should be stated
(as I argued above, this is how qemu does it, so it SHOULD be part of
the spec to prevent refcount and other confusion if some other
implementation created images violating that).
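A quick sanity check of those field widths over qcow2's range of
cluster sizes (my own arithmetic helper; cluster_bits runs from 9 for
512-byte clusters to 21 for 2M clusters):

#include <stdio.h>

int main(void)
{
    /* Per the descriptor layout: x = 62 - (cluster_bits - 8) is the
     * top bit of the offset field, and bits x+1..61 hold the sector
     * count (minus one). */
    for (int cluster_bits = 9; cluster_bits <= 21; cluster_bits++) {
        int x = 62 - (cluster_bits - 8);
        int width = 61 - x;   /* 0 bits when cluster_bits == 9 */
        printf("cluster_bits=%2d  x=%2d  width=%2d bits  "
               "max sectors=%d\n", cluster_bits, x, width, 1 << width);
    }
    return 0;
}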
Bit 0 - x: Host cluster offset. This is usually _not_ aligned to a
- cluster boundary!
+ cluster or sector boundary!
- x+1 - 61: Compressed size of the images in sectors of 512 bytes
+ x+1 - 61: Number of 512-byte sectors used for the compressed data,
+ minus one (that is, a value of n here means n+1 sectors).
+
+ The actual compressed data is located at the end of this
+ region, from the offset indicated in the previous field
+ until the end of the last sector.
+
+ The initial bytes of this region are therefore unused if
+ the offset is not aligned to a sector boundary.
So, to make sure I understand, visually, suppose we have three 64k
clusters that we compress; A which compresses to 224 bytes, B which
compresses to 1120 bytes, and C which also compresses to 224 bytes
(these numbers are reasonable, since 'printf "%*d" $((64*1024)) 1 | gzip
-c | wc' shows that 64k of input compresses down to 99 bytes) (the
observant will notice I
picked multiples of 32 bytes, for display purposes, but that in reality
we have full byte granularity for both starting location and length of a
compressed cluster).
+----------------+----------------+----------------+----------------+
+0x00020000      +0x00020200      +0x00020400      +0x00020600      +
+AAAAAAABBBBBBBBB+BBBBBBBBBBBBBBBB+BBBBBBBBBBCCCCCC+C...............+
We achieve the following layout by compressing cluster A into the start
of the cluster at 0x00020000 until it finishes, then cluster B into the
unused tail of that sector until it finishes (starting at 0x000200e0),
and cluster C into the tail of the sector started by B (starting at
0x00020540, ending at 0x0002061f). Another interesting thing to note is
that our choice of compression algorithm carries enough state that we
can ALWAYS choose to feed too much to the decompression engine; but the
trailing data will be ignored because we stop once we've decompressed
64k of material. So our real question is figuring out the bare minimum
that we can feed to the engine and still decompress everything we need.
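In zlib terms, that pattern looks roughly like this (a sketch of the
general idea, not qemu's actual decompression loop; qcow2 compressed
clusters are raw deflate streams, hence the negative windowBits):

#include <string.h>
#include <zlib.h>

/* Sketch: decompress one guest cluster from a run of host sectors.
 * in/in_len cover whole sectors, i.e. possibly more bytes than the
 * compressed stream needs; out is cluster_size bytes. */
static int decompress_cluster(unsigned char *out, unsigned cluster_size,
                              unsigned char *in, unsigned in_len)
{
    z_stream strm;
    int ret;

    memset(&strm, 0, sizeof(strm));
    if (inflateInit2(&strm, -12) != Z_OK) {  /* raw deflate */
        return -1;
    }
    strm.next_in = in;
    strm.avail_in = in_len;
    strm.next_out = out;
    strm.avail_out = cluster_size;

    /* inflate() stops once the output buffer holds a full cluster;
     * trailing input bytes are simply never consumed. */
    ret = inflate(&strm, Z_FINISH);
    inflateEnd(&strm);
    return (ret == Z_STREAM_END || strm.avail_out == 0) ? 0 : -1;
}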
The value stored in the 'number sectors' field for cluster A is pretty obviously 0 (you do
not have to read any additional sectors to decompress A). So the L2
entry for A would be 0x40000000_00020000. On decompression, we read the
sector starting at that offset, 0 additional sectors, and although we
feed the decompressor 512 bytes, it only needs to read 224 bytes before
we are done reconstituting 64k bytes.
The value written in the number of sectors field for cluster B is 2.
Yes, floor(1120/512) == 2, but more importantly, our decompression HAS
to read two additional sectors beyond the starting offset in order to
feed enough data to the engine. That is, the L2 entry is
0x41000000_000200e0, and we feed the decompressor 1312 bytes even
though it only needs 1120.
But the value of the field for cluster C is 1, not 0, EVEN THOUGH it was
the same compressed size as cluster A! On write, we compute the sector
containing the last byte, which is one greater than the sector
containing the first byte. The L2 entry is 0x40800000_00020540; and we
feed the decompressor 704 bytes even though it only needs 224.
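Putting the three computations side by side, here is a self-contained
sketch that rebuilds the L2 entries from this example (64k clusters,
so per the descriptor layout above the sector count field starts at
bit x+1 = 55):

#include <assert.h>
#include <stdint.h>

/* Rebuild an L2 entry for a compressed cluster as in the example:
 * bit 62 is the "compressed" flag, bits 55-61 the stored sector
 * count (minus one), bits 0-54 the host offset. */
static uint64_t l2_entry(uint64_t offset, uint64_t compressed_size)
{
    uint64_t stored = ((offset + compressed_size - 1) >> 9) -
                      (offset >> 9);
    return (1ULL << 62) | (stored << 55) | offset;
}

int main(void)
{
    assert(l2_entry(0x20000, 224)  == 0x4000000000020000ULL); /* A */
    assert(l2_entry(0x200e0, 1120) == 0x41000000000200e0ULL); /* B */
    assert(l2_entry(0x20540, 224)  == 0x4080000000020540ULL); /* C */
    return 0;
}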
So if I may suggest:
x+1 - 61: Number of additional 512-byte sectors used for the
compressed data, beyond the sector containing the
offset in the previous field. These sectors must fit
within the same host cluster. Note that the compressed
data does not necessarily occupy all of the bytes in
the final sector; rather, decompression stops when it
has produced a cluster of data. Another compressed
cluster may map to the tail of the final sector used
by this compressed cluster.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org