On 5/28/20 10:04 AM, Alberto Garcia wrote:
> On Wed 27 May 2020 07:58:10 PM CEST, Eric Blake wrote:
>>> There is just one thing to take into account for a possible future
>>> improvement: compressed clusters cannot be partially zeroized so
>>> zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
>>> This makes the caller repeat the *complete* request and write actual
>>> zeroes to disk. This is sub-optimal because
>>>
>>>    1) if the head area was compressed we would still be able to use
>>>       the fast path for the body and possibly the tail.
>>>
>>>    2) if the tail area was compressed we are writing zeroes to the
>>>       head and the body areas, which are already zeroized.
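
As an aside for anyone following along: the "repeat the *complete*
request" behavior is the generic block layer's fallback when a driver
returns -ENOTSUP. A standalone toy model of that logic (not QEMU's
actual code; the function and callback names here are invented):

  #include <errno.h>
  #include <stdint.h>
  #include <stdlib.h>

  typedef int (*zero_fn)(int64_t offset, int64_t bytes);
  typedef int (*write_fn)(int64_t offset, const void *buf, int64_t bytes);

  /* Try the driver's fast zeroing hook; on -ENOTSUP, redo the WHOLE
   * range with an explicitly zeroed bounce buffer (the slow path). */
  static int pwrite_zeroes_with_fallback(int64_t offset, int64_t bytes,
                                         zero_fn fast, write_fn slow)
  {
      int ret = fast(offset, bytes);
      if (ret != -ENOTSUP) {
          return ret;                /* success, or a hard error */
      }
      void *buf = calloc(1, bytes);  /* bounce buffer of real zeroes */
      if (!buf) {
          return -ENOMEM;
      }
      ret = slow(offset, buf, bytes);
      free(buf);
      return ret;
  }
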
>> Is this true? The block layer tries hard to break zero requests up so
>> that any non-cluster-aligned requests do not cross cluster boundaries.
>> In practice, that means that when you have an unaligned request, the
>> head and tail cluster will be the same cluster, and there is no body in
>> play, so that returning -ENOTSUP is correct because there really is no
>> other work to do and repeating the entire request (which is less than a
>> cluster in length) is the right approach.
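
To make the splitting concrete, here is a toy model of the carving
(simplified; not the literal block/io.c loop): an unaligned head piece
up to the next alignment boundary, one big aligned middle, then the
unaligned tail, each piece becoming a separate driver call.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  static void split_zero_request(int64_t offset, int64_t bytes,
                                 int64_t align)
  {
      while (bytes > 0) {
          int64_t num = bytes;
          int64_t head = offset % align;
          int64_t tail = (offset + bytes) % align;

          if (head) {
              /* small head piece, up to the first aligned offset */
              num = (align - head < bytes) ? align - head : bytes;
          } else if (tail && num > align) {
              /* shorten so the middle ends on an aligned offset */
              num -= tail;
          }
          printf("offset=%" PRId64 "k size=%" PRId64 "k\n",
                 offset / 1024, num / 1024);
          offset += num;
          bytes -= num;
      }
  }

With align equal to the cluster size, an unaligned piece can never
cross a cluster boundary, which is the guarantee I was relying on
above.
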
> Let's use an example: cluster size is 64KB, subcluster size is 2KB,
> and we get this request:
>
>    write -z 31k 130k
>
> Since pwrite_zeroes_alignment equals the cluster size (64KB), this
> would result in 3 calls to qcow2_co_pwrite_zeroes():
>
>    offset=31k size=33k  [-ENOTSUP, writes actual zeroes]
>    offset=64k size=64k  [zeroized using the relevant metadata bits]
>    offset=128k size=33k [-ENOTSUP, writes actual zeroes]
>
> However this patch changes the alignment:
>
>    bs->bl.pwrite_zeroes_alignment = s->subcluster_size;

Ah, I missed that trick. But it is nice, and indeed...

> so we get these instead:
>
>    offset=31k size=1k   [-ENOTSUP, writes actual zeroes]
>    offset=32k size=128k [zeroized using the relevant metadata bits]
>    offset=160k size=1k  [-ENOTSUP, writes actual zeroes]
>
> So far, so good. Reducing the alignment requirements allows us to
> maximize the number of subclusters to zeroize.

...we can now hit a request that is not cluster-aligned.
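
Feeding Berto's request through the toy splitter sketched earlier in
this mail reproduces both traces:

  int main(void)
  {
      /* write -z 31k 130k, split at cluster granularity... */
      puts("alignment = 64k:");
      split_zero_request(31 * 1024, 130 * 1024, 64 * 1024);

      /* ...and at subcluster granularity, as with this patch */
      puts("alignment = 2k:");
      split_zero_request(31 * 1024, 130 * 1024, 2 * 1024);
      return 0;
  }

which prints exactly the 33k/64k/33k and 1k/128k/1k splits shown
above.
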
> Now let's suppose we have this request:
>
>    write -z 32k 128k
>
> This one is aligned so it goes directly to qcow2_co_pwrite_zeroes().
> However if the third cluster is compressed then the function will
> return -ENOTSUP after having zeroized the first 96KB of the request,
> forcing the caller to repeat it completely using the slow path.
>
> I think the problem also exists in the current code (without my
> patches). If you zeroize 10 clusters and the last one is compressed
> you have to repeat the request after having zeroized 9 clusters.

Hmm. In the pre-patch code, qcow2_co_pwrite_zeroes() calls
qcow2_cluster_zeroize() which can fail with -ENOTSUP up front, but not
after the fact. Once it starts the while loop over clusters, its use of
zero_in_l2_slice() handles compressed clusters just fine; as far as I
can tell, only your new subcluster handling lets it now fail with
-ENOTSUP after earlier clusters have been visited.
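
To spell out what I mean, here is the pre-patch shape as I remember it
(paraphrased, not the literal source; the helper name is invented):

  #include <errno.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Invented stand-in, just to make the control flow visible */
  static bool head_tail_reads_as_zero(int64_t off, int64_t len)
  {
      (void)off; (void)len;
      return true;
  }

  static int cluster_zeroize_sketch(int64_t offset, int64_t bytes,
                                    int64_t cluster_size, int qcow_version)
  {
      /* Both -ENOTSUP exits happen before anything is modified... */
      if ((offset % cluster_size || (offset + bytes) % cluster_size) &&
          !head_tail_reads_as_zero(offset, bytes)) {
          return -ENOTSUP;
      }
      if (qcow_version < 3) {
          return -ENOTSUP;    /* the zero flag needs a v3 image */
      }

      while (bytes > 0) {
          /* ...while zero_in_l2_slice() sets QCOW_OFLAG_ZERO per L2
           * entry and copes with compressed clusters, so no -ENOTSUP
           * can surface once this loop has started */
          int64_t in_cluster = cluster_size - offset % cluster_size;
          int64_t n = bytes < in_cluster ? bytes : in_cluster;
          offset += n;
          bytes -= n;
      }
      return 0;
  }
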
But isn't this something we could solve recursively? Instead of
returning -ENOTSUP, we could have zero_in_l2_slice() call
bdrv_pwrite_zeroes() on the (sub-)clusters associated with a compressed
cluster.
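
That is, something with this shape (all helper names invented for
illustration; this is not qcow2's real API):

  #include <stdbool.h>
  #include <stdint.h>

  /* Invented stubs so the sketch is self-contained; pretend cluster 2
   * is compressed, as in the example above */
  static bool cluster_is_compressed(int64_t c) { return c == 2; }
  static int zero_cluster_metadata(int64_t c) { (void)c; return 0; }
  static int write_explicit_zeroes(int64_t off, int64_t len)
  {
      (void)off; (void)len;
      return 0;
  }

  /* On a compressed cluster, fall back for just that one cluster
   * instead of bubbling -ENOTSUP up and making the caller redo the
   * entire request with the slow path. */
  static int zeroize_range(int64_t offset, int64_t bytes,
                           int64_t cluster_size)
  {
      while (bytes > 0) {
          int64_t in_cluster = cluster_size - offset % cluster_size;
          int64_t n = bytes < in_cluster ? bytes : in_cluster;
          int ret;

          if (cluster_is_compressed(offset / cluster_size)) {
              ret = write_explicit_zeroes(offset, n);  /* local slow path */
          } else {
              ret = zero_cluster_metadata(offset / cluster_size);
          }
          if (ret < 0) {
              return ret;
          }
          offset += n;
          bytes -= n;
      }
      return 0;
  }

That keeps the fast path for every cluster that supports it and pays
the cost of writing real zeroes only where we have to.
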
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org