Am 20.04.2012 15:06, schrieb Stefan Hajnoczi: > On Fri, Apr 20, 2012 at 1:27 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote: >> On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf <kw...@redhat.com> wrote: >>> Am 19.04.2012 23:18, schrieb Marcelo Tosatti: >>>> On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote: >>>>>> There is one intended change in functionality in this patch, which is >>>>>> that it allocates new clusters even when it could satisfy the first part >>>>>> of the request with already allocated clusters. In order to check if >>>>>> there is a problem with this scenario, the following patch should revert >>>>>> to the old behaviour: >>>>>> >>>>>> --- a/block/qcow2-cluster.c >>>>>> +++ b/block/qcow2-cluster.c >>>>>> @@ -847,7 +847,7 @@ again: >>>>>> keep_clusters = count_contiguous_clusters(nb_clusters, >>>>>> s->cluster_size, >>>>>> &l2_table[l2_index], >>>>>> 0, 0); >>>>>> assert(keep_clusters <= nb_clusters); >>>>>> - nb_clusters -= keep_clusters; >>>>>> + nb_clusters = 0; >>>>>> } else { >>>>>> /* For the moment, overwrite compressed clusters one by one */ >>>>>> if (cluster_offset & QCOW_OFLAG_COMPRESSED) { >>>>>> >>>>>> The rest is meant to be a functionally equivalent rewrite of the old >>>>>> code that was required in order to allow this change. >>>>> >>>>> Testing. >>>> >>>> Corruption gone with patch above. >>> >>> Okay, so something must be wrong only with the new code paths, which >>> makes things a bit easier. I'll have another closer look. Stefan, can >>> you re-review 250196f1 as well? >> >> I'm taking a look. > > Just looking at the qemu-img check output that Marcelo posted I'm > interpreting that as OFLAG_COPIED was set on the offset but its > refcount was 2. > > The refcount shouldn't be 2 unless internal snapshots were used. > > So we probably incremented the refcount either too often or for the > wrong block (which is more likely since the guest sees corruption).
I've looked at the same thing now and I think the same cluster was used both for a regular data allocation and for a new refcount block. This may happen because alloc_refcount_block() uses s->free_cluster_index which is updated by qcow2_alloc_cluster_noref(), but not by qcow2_alloc_cluster_at(). Just a theory so far, though, and not yet tried out in practice. If this is it, a patch would look like this (completely untested as well): --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -587,6 +587,7 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset, { BDRVQcowState *s = bs->opaque; uint64_t cluster_index; + uint64_t old_free_cluster_index; int i, refcount, ret; /* Check how many clusters there are free */ @@ -602,11 +603,16 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset, } /* And then allocate them */ + old_free_cluster_index = s->free_cluster_index; + s->free_cluster_index = cluster_index + i; + ret = update_refcount(bs, offset, i << s->cluster_bits, 1); if (ret < 0) { return ret; } + s->free_cluster_index = old_free_cluster_index; + return i; } Kevin