On Fri, Apr 20, 2012 at 1:27 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote: > On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf <kw...@redhat.com> wrote: >> Am 19.04.2012 23:18, schrieb Marcelo Tosatti: >>> On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote: >>>>> There is one intended change in functionality in this patch, which is >>>>> that it allocates new clusters even when it could satisfy the first part >>>>> of the request with already allocated clusters. In order to check if >>>>> there is a problem with this scenario, the following patch should revert >>>>> to the old behaviour: >>>>> >>>>> --- a/block/qcow2-cluster.c >>>>> +++ b/block/qcow2-cluster.c >>>>> @@ -847,7 +847,7 @@ again: >>>>> keep_clusters = count_contiguous_clusters(nb_clusters, >>>>> s->cluster_size, >>>>> &l2_table[l2_index], >>>>> 0, 0); >>>>> assert(keep_clusters <= nb_clusters); >>>>> - nb_clusters -= keep_clusters; >>>>> + nb_clusters = 0; >>>>> } else { >>>>> /* For the moment, overwrite compressed clusters one by one */ >>>>> if (cluster_offset & QCOW_OFLAG_COMPRESSED) { >>>>> >>>>> The rest is meant to be a functionally equivalent rewrite of the old >>>>> code that was required in order to allow this change. >>>> >>>> Testing. >>> >>> Corruption gone with patch above. >> >> Okay, so something must be wrong only with the new code paths, which >> makes things a bit easier. I'll have another closer look. Stefan, can >> you re-review 250196f1 as well? > > I'm taking a look.
Just looking at the qemu-img check output that Marcelo posted I'm interpreting that as OFLAG_COPIED was set on the offset but its refcount was 2. The refcount shouldn't be 2 unless internal snapshots were used. So we probably incremented the refcount either too often or for the wrong block (which is more likely since the guest sees corruption). Stefan