On 2014-10-21 at 09:52, Kevin Wolf wrote:
Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:If a referenced cluster has a refcount of 0, increasing its refcount may result in clusters being allocated for the refcount structures. This may overwrite the referenced cluster, therefore we cannot simply increase the refcount then.In such cases, we can either try to replicate all the refcount operations solely for the check operation, basing the allocations on the in-memory refcount table; or we can simply rebuild the whole refcount structure based on the in-memory refcount table. Since the latter will be much easier, do that. To prepare for this, introduce a "rebuild" boolean which should be set to true whenever a fix is rather dangerous or too complicated using the current refcount structures. Another example for this is refcount blocks being referenced more than once. Signed-off-by: Max Reitz <mre...@redhat.com> Reviewed-by: BenoƮt Canet <benoit.ca...@nodalink.com> @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res, fprintf(stderr, "ERROR refcount block %" PRId64 " is not " "cluster aligned; refcount table entry corrupted\n", i); res->corruptions++; + *rebuild = true; continue; }@@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res, resize_fail:res->corruptions++; + *rebuild = true; fprintf(stderr, "ERROR could not resize image: %s\n", strerror(-ret)); } else {Increasing res->corruptions in this patch looks right because you don't actually rebuild anything yet. However, at the end of the series this code still looks the same - shouldn't we somehow convert those the corruptions caused by wrong refcounts into res->corruptions_fixed? Hm... It seems that you do this by storing intermediate results in qcow2_check_refcounts(), which assumes that compare_refcounts() finds only refcount problems, and no other place can find any. This may be true, but wouldn't it be more elegant to have a separate counter for corruptions that will be fixed with a rebuild? We can use some Qcow2CheckResult instead of BdrvCheckResult internally and add more fields there. Also, I don't think all "ERROR" messages correctly say "Repairing" again at the end of the series.
I'll have a look on this, but this hunk for instance should not say "Repairing". It's an error and it is not directly repaired. I think the output "Rebuilding refcount structure" should be sufficient.
@@ -1624,40 +1511,10 @@ resize_fail: inc_refcounts(bs, res, *refcount_table, *nb_clusters, offset, s->cluster_size); if ((*refcount_table)[cluster] != 1) { - fprintf(stderr, "%s refcount block %" PRId64 - " refcount=%d\n", - fix & BDRV_FIX_ERRORS ? "Repairing" : - "ERROR", - i, (*refcount_table)[cluster]); - - if (fix & BDRV_FIX_ERRORS) { - int64_t new_offset; - - new_offset = realloc_refcount_block(bs, i, offset); - if (new_offset < 0) { - res->corruptions++; - continue; - } - - /* update refcounts */ - if ((new_offset >> s->cluster_bits) >= *nb_clusters) { - /* increase refcount_table size if necessary */ - int old_nb_clusters = *nb_clusters; - *nb_clusters = (new_offset >> s->cluster_bits) + 1; - *refcount_table = g_renew(uint16_t, *refcount_table, - *nb_clusters); - memset(&(*refcount_table)[old_nb_clusters], 0, - (*nb_clusters - old_nb_clusters) * - sizeof(uint16_t)); - } - (*refcount_table)[cluster]--; - inc_refcounts(bs, res, *refcount_table, *nb_clusters, - new_offset, s->cluster_size); - - res->corruptions_fixed++; - } else { - res->corruptions++; - } + fprintf(stderr, "ERROR refcount block %" PRId64 + " refcount=%d\n", i, (*refcount_table)[cluster]); + res->corruptions++; + *rebuild = true; } } } @@ -1669,8 +1526,8 @@ resize_fail: * Calculates an in-memory refcount table. */ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res, - BdrvCheckMode fix, uint16_t **refcount_table, - int64_t *nb_clusters) + BdrvCheckMode fix, bool *rebuild, + uint16_t **refcount_table, int64_t *nb_clusters) { BDRVQcowState *s = bs->opaque; int64_t i; @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res, s->refcount_table_offset, s->refcount_table_size * sizeof(uint64_t));- return check_refblocks(bs, res, fix, refcount_table, nb_clusters);+ return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters); }/*@@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res, * refcount as reported by the refcount structures on-disk. */ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res, - BdrvCheckMode fix, int64_t *highest_cluster, + BdrvCheckMode fix, bool *rebuild, + int64_t *highest_cluster, uint16_t *refcount_table, int64_t nb_clusters) { BDRVQcowState *s = bs->opaque; @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res, int *num_fixed = NULL; if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) { num_fixed = &res->leaks_fixed; - } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) { + } else if (refcount1 > 0 && refcount1 < refcount2 && + (fix & BDRV_FIX_ERRORS)) { num_fixed = &res->corruptions_fixed; }+ if (refcount1 == 0) {+ *rebuild = true; + }Why isn't this just another else if branch? Possibly even as the first or second condition, so that you don't have to add 'refcount1 > 0' for the other branch.
Because it's semantically different. The if-else sets the num_fixed pointer, whereas this conditional block forces a rebuild. I can include it as the second condition, just at the time I liked it more to have it externally.
Now that I'm reading once again over it... Probably using it as the second condition is better. I'll see to it.
+ fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n", num_fixed != NULL ? "Repairing" : refcount1 < refcount2 ? "ERROR" : @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res, BDRVQcowState *s = bs->opaque; int64_t size, highest_cluster, nb_clusters; uint16_t *refcount_table = NULL; + bool rebuild = false; int ret;size = bdrv_getlength(bs->file);@@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res, res->bfi.total_clusters = size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);- ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);+ ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table, + &nb_clusters); if (ret < 0) { goto fail; }- compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,+ compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table, nb_clusters);+ if (rebuild) {+ fprintf(stderr, "ERROR need to rebuild refcount structures\n"); + }Should this increate res->check_errors? (It doesn't survive until the end of the series anyway, but more places seem to be added later that print an error message without increasing the error count.)
Yes, it should. We could even bail out, but it doesn't make much sense here (it makes more sense in patch 8).
Thanks, Max