On 09/13/2010 06:48 AM, Kevin Wolf wrote:
On 13.09.2010 13:34, Avi Kivity wrote:
On 09/13/2010 01:28 PM, Kevin Wolf wrote:
Anytime you grow the freelist with qcow2, you have to write a brand new
freelist table and update the metadata synchronously to point to a new
version of it. That means for a 1TB image, you're potentially writing
out 128MB of data just to allocate a new cluster.
No. qcow2 has two-level tables.
File size: 1 TB
Number of clusters: 1 TB / 64 kB = 16 M
Number of refcount blocks: (16 M * 2 B) / 64 kB = 512
Total size of all refcount blocks: 512 * 64 kB = 32 MB
Size of refcount table: 512 * 8 B = 4 kB
When we grow an image file, the refcount blocks can stay where they are,
only the refcount table needs to be rewritten. So we have to copy a
total of 4 kB for growing the image file when it's 1 TB in size (all
assuming 64k clusters).
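The arithmetic above can be reproduced with a short sketch. This is only an illustration of the calculation, not code from qemu; the function name and its parameters are made up for the example, and the defaults (64 kB clusters, 2-byte refcounts, 8-byte refcount table entries) are the figures quoted above.

```python
KiB = 1024
TiB = 1024 ** 4

def refcount_layout(image_size, cluster_size=64 * KiB,
                    refcount_bytes=2, table_entry_bytes=8):
    """Hypothetical helper: sizes of the two refcount levels for an image."""
    clusters = image_size // cluster_size
    # One refcount block is one cluster full of refcount entries.
    entries_per_block = cluster_size // refcount_bytes
    # Round up so a partially used block is still counted.
    blocks = -(-clusters // entries_per_block)
    return {
        "clusters": clusters,
        "refcount_blocks": blocks,
        "refcount_blocks_bytes": blocks * cluster_size,
        # The table holds one entry per refcount block.
        "refcount_table_bytes": blocks * table_entry_bytes,
    }

layout = refcount_layout(1 * TiB)
for name, value in layout.items():
    print(name, value)
```

Only the last figure, refcount_table_bytes, is what has to be copied when the table is rewritten.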
The other result of this calculation is that we need to grow the
refcount table each time we cross a 16 TB boundary. So in addition to
being a small amount of data, the rewrite rarely happens in practice anyway.
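The 16 TB figure follows from the same numbers, under the assumption that the refcount table is allocated in whole clusters, so that it only has to grow once a full cluster of table entries is used up. A rough sketch (the function name is invented for illustration and does not exist in qemu):

```python
KiB = 1024
TiB = 1024 ** 4

def bytes_covered_per_table_cluster(cluster_size=64 * KiB,
                                    refcount_bytes=2, table_entry_bytes=8):
    """Hypothetical helper: image bytes described by one cluster of refcount table."""
    # Entries that fit into one cluster of the refcount table.
    table_entries = cluster_size // table_entry_bytes
    # Clusters whose refcounts fit into one refcount block.
    clusters_per_block = cluster_size // refcount_bytes
    # Image data described by a single refcount block.
    bytes_per_block = clusters_per_block * cluster_size
    return table_entries * bytes_per_block

coverage = bytes_covered_per_table_cluster()
print(coverage // TiB, "TiB per refcount table cluster")
```

With 64 kB clusters, one table cluster holds 8192 entries and each refcount block describes 2 GB of image, which gives the 16 TB boundary.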
Interesting, I misremembered it as 8 bytes per cluster, not 2. So it's
actually fairly dense (though still not as dense as a bitmap).
Yes, refcounts are 16 bit. Just checked it with the code once again to
be 100% sure. But if it was only that, it would be just a small factor.
The important part is that it's a two-level structure, so Anthony's
numbers are completely off.
A two-level structure makes growth more efficient; however, searching
for a free cluster is still an expensive operation on large disk
images. This is an important point because, without snapshots, the
argument for a refcount table is supporting UNMAP, and efficient UNMAP
support in qcow2 looks like it will require an additional structure.
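To make the cost concrete, here is a minimal sketch of what a free-cluster search over refcounts amounts to. It is not how qemu implements it; the function and the in-memory list-of-lists representation of refcount blocks are assumptions made purely for illustration. The point is that, without an extra index, finding a cluster with refcount 0 means walking the refcount entries linearly.

```python
def find_free_cluster(refcount_blocks, start=0):
    """Hypothetical linear scan: first cluster index >= start with refcount 0.

    refcount_blocks is a list of blocks, each a list of integer refcounts.
    Returns None when every cluster at or after start is in use.
    """
    index = 0
    for block in refcount_blocks:
        for refcount in block:
            if index >= start and refcount == 0:
                return index
            index += 1
    return None

# Toy image with 3 refcount blocks of 3 entries each.
blocks = [[1, 1, 2], [1, 0, 1], [0, 1, 1]]
first_free = find_free_cluster(blocks)
next_free = find_free_cluster(blocks, start=first_free + 1)
```

On a 1 TB image with 64 kB clusters, the worst case of such a scan touches all 16 M entries, which is why a separate free-space structure looks necessary for efficient UNMAP.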
One of the troubles with qcow2 as a format is that the metadata on disk
is redundant, yet it's already defined as authoritative. So while in QED,
we can define the L1/L2 tables as the only authoritative source of
information and treat a freelist as an optimization, the refcount table
must remain authoritative in qcow2 in order to remain backwards compatible.
You could rewrite the header to be qcow3 in order to relax this
restriction, but then you lose image mobility to older versions, which
really negates the advantage of not introducing a new format.
Regards,
Anthony Liguori
Kevin