On 02/21/2018 04:04 AM, Alberto Garcia wrote:
On Tue 20 Feb 2018 11:24:59 PM CET, Eric Blake wrote:
I was also preparing a patch to change this, but you arrived first :-)
So, it's time to cut back on the waste. A compressed cluster
will NEVER occupy more than an uncompressed cluster (okay, gzip
DOES document that, because the compression stream adds metadata
and because of the pigeonhole principle, there are worst-case
scenarios where an attempt to compress will actually inflate an
image - but in those cases, we would just write the cluster
uncompressed instead of inflating it). And as that is a smaller
amount of memory, we can get by with the simpler g_malloc.
- if (!s->cluster_cache) {
- s->cluster_cache = g_malloc(s->cluster_size);
+ assert(!s->cluster_cache);
+ s->cluster_data = g_try_malloc(s->cluster_size);
+ s->cluster_cache = g_try_malloc(s->cluster_size);
Shoot - I made edits that I forgot to commit before sending; I meant for
these to be g_malloc() rather than g_try_malloc().
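For clarity, the hunk I intended looks like this (same context as
above, just g_malloc rather than g_try_malloc):
+ assert(!s->cluster_cache);
+ s->cluster_data = g_malloc(s->cluster_size);
+ s->cluster_cache = g_malloc(s->cluster_size);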
There's a few things here:
- QEMU won't write compressed data if the size is >= s->cluster_size
(there's an explicit check for that in qcow2_co_pwritev_compressed())
Correct, we never cause inflation (and even if we wanted to, we can't,
because the qcow2 format doesn't have enough bits for us to record that
many sectors for a compressed stream that occupies more space than the
original cluster).
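Roughly, the write path behaves like this (a paraphrased sketch, not
the exact code; try_deflate_cluster() is a made-up stand-in for the
zlib calls in qcow2_co_pwritev_compressed()):

out_buf = g_malloc(s->cluster_size);
/* Ask for the compressed stream only if it fits in less than a
 * cluster; the hypothetical helper returns a negative value if not. */
out_len = try_deflate_cluster(out_buf, s->cluster_size - 1, buf);
if (out_len < 0) {
    /* Could not shrink the data: write the cluster uncompressed. */
    return qcow2_co_pwritev(bs, offset, bytes, qiov, 0);
}
/* ...otherwise record a compressed cluster of out_len bytes... */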
- The size field of the compressed cluster descriptor *does* allow
larger sizes, so you can't simply read csize bytes into
s->cluster_data because you could cause a buffer overflow.
Let's step through this:
nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
Since s->csize_mask is determined by s->cluster_size / 512, after
this assignment the most nb_csectors can be is exactly
s->cluster_size / 512.
sector_offset = coffset & 511;
csize = nb_csectors * 512 - sector_offset;
And here, csize can only get smaller than the length implied by
nb_csectors. Therefore, csize is GUARANTEED to be <= s->cluster_size.
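To put numbers on it (purely illustrative, using the default 64 KiB
clusters and the reading of the mask above):

s->csize_mask  = s->cluster_size / 512 - 1            = 127
nb_csectors   <= s->csize_mask + 1                    = 128
csize         <= 128 * 512 - sector_offset  <= 65536  = s->cluster_size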
- Solution a: check that csize < s->cluster_size and return an error if
it's not. However! Although QEMU won't produce an image with a
compressed cluster that is larger than the uncompressed one, the qcow2
on-disk format in principle allows for that, so arguably we should
accept it.
No, the qcow2 on-disk format does not have enough bits reserved for
that. It is impossible to store an inflated cluster, because you don't
have enough bits to record it.
That said, we MAY have a bug, more likely to be visible in 512-byte
clusters but possible on any size. While the 'number sectors' field IS
sufficient for any compressed cluster starting at offset 0 in the
cluster, we run into issues if the starting offset is later in the
cluster. That is, even though the compressed length of a 512-byte
cluster is <= 512 (or we wouldn't compress it), if we start at offset
510, we NEED to read the next cluster to get the rest of the compressed
stream - but at 512-byte clusters, there are 0 bits reserved for 'number
sectors'. Per the above calculations with the example offset of 510,
nb_csectors would be 1 (it can't be anything else for a 512-byte cluster
image), and csize would then be 2 bytes, which is insufficient for
reading back enough data to reconstruct the cluster.
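Putting concrete numbers on that example:

sector_offset = 510 & 511      = 510
nb_csectors   = 1              (no bits left to record a larger value)
csize         = 1 * 512 - 510  = 2 bytes

which is nowhere near enough to read back a compressed stream that may
legitimately be up to 512 bytes long.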
We probably need to clarify in the spec that if the starting offset of a
compressed cluster falls mid-sector, then the compressed size has to be
smaller than cluster size - (offset % 512) (this requirement is already
there implicitly due to the field widths, but being explicit can't
hurt). We probably also need to fix our compression code to actually
do the right thing, particularly for 512-byte clusters, where a
compressed size is most likely to overflow the space available for
nb_csectors.
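As a sketch of what such a fix might check (hypothetical only - the
variable names are illustrative, and where the check belongs, probably
wherever the compressed cluster is allocated and its L2 descriptor is
built, is a separate question):

/* host_offset: where the compressed stream will start on disk;
 * out_len: length of the compressed stream */
if (out_len >= s->cluster_size - (host_offset & 511)) {
    /* 'number sectors' cannot describe this stream:
     * write the cluster uncompressed instead */
}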
- Solution b: the width of the 'compressed cluster size' field is
(cluster_bits - 8), that's (cluster_size / 256) sectors.
Not true. It is (cluster_bits - 9), or (cluster_size / 512) sectors.
Remember, x = 62 - (cluster_bits - 8); for a 512-byte cluster, x = 61.
The 'number sectors' field is then bits x+1 through 61 (but you can't
have a bitfield occupying bits 62 up to 61, especially since bit 62 is
the flag for a compressed cluster).
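Spelled out for a 512-byte cluster image (following that reading of
the layout):

x = 62 - (cluster_bits - 8) = 62 - 1 = 61
bits 0 .. x-1  (0 .. 60):   host offset of the compressed data
bits x+1 .. 61 (62 .. 61):  'number sectors' - an empty range, 0 bits
bit  62:                    compressed-cluster flag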
Since the
size of each sector is 512 bytes, the maximum possible size that the
field can store is (cluster_size * 2) bytes. So allocate that amount
of space for s->cluster_data, read the data as it is on disk and let
the decompression code return an error if the data is indeed
corrupted or it doesn't fit in the output buffer.
Again, I argue that the qcow2 spec says that the maximum size for a
compressed cluster is cluster_size, even if it spills over a host
cluster boundary. But if in practice we HAVE allowed a spillover
beyond the 'number sectors' maximum because of a non-zero offset,
where we weren't careful about what got recorded into the qcow2
image, then we could instead state that reading s->cluster_size + 512
bytes is guaranteed to find the end of the compressed stream, even
when 'number sectors' is 0 because of overflow. At that point, sizing
the buffer to be one sector larger than a cluster means that we can
cope even with 512-byte clusters that were compressed incorrectly.
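In code form, that fallback sizing would amount to something like
(sketch only, not a tested patch):

/* One sector of slop beyond a cluster covers a stream whose non-zero
 * starting offset pushed it past what 'number sectors' could record. */
s->cluster_data = g_malloc(s->cluster_size + 512);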
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org