2011/8/5 Kevin Wolf <kw...@redhat.com>: > Am 05.08.2011 08:35, schrieb Frediano Ziglio: >> 2011/8/4 Kevin Wolf <kw...@redhat.com>: >>> When loading an internal snapshot whose L1 table is smaller than the >>> current L1 >>> table, the size of the current L1 would be shrunk to the snapshot's L1 size >>> in >>> memory, but not on disk. This lead to incorrect refcount updates and >>> eventuelly >>> to image corruption. >>> >>> Instead of writing the new L1 size to disk, this simply retains the bigger >>> L1 >>> size that is currently in use and makes sure that the unused part is zeroed. >>> >>> Signed-off-by: Kevin Wolf <kw...@redhat.com> >>> --- >>> >>> And the moment you send it out, you notice that it's wrong... *sigh* >>> >>> v2: >>> - Check for s->l1_size > sn->l1_size in order to avoid disasters... >>> >>> Philipp, I think this should fix your corruption. Please give it a try. >>> >>> Anthony, this must go into 0.15. Given the short time until -rc2, do you >>> prefer >>> to pick it up directly or should I send a pull request tomorrow? The patch >>> looks obvious, is tested with the given testcase and survives a basic >>> qemu-iotests run (though qemu-iotests doesn't exercise snapshots a lot) >>> >>> Stefan, please review :-) >>> >>> block/qcow2-snapshot.c | 5 ++++- >>> 1 files changed, 4 insertions(+), 1 deletions(-) >>> >>> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c >>> index 74823a5..6972e66 100644 >>> --- a/block/qcow2-snapshot.c >>> +++ b/block/qcow2-snapshot.c >>> @@ -330,8 +330,11 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const >>> char *snapshot_id) >>> if (qcow2_grow_l1_table(bs, sn->l1_size, true) < 0) >>> goto fail; >>> >>> - s->l1_size = sn->l1_size; >>> + if (s->l1_size > sn->l1_size) { >>> + memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size); >>> + } >>> l1_size2 = s->l1_size * sizeof(uint64_t); >>> + >>> /* copy the snapshot l1 table to the current l1 table */ >>> if (bdrv_pread(bs->file, sn->l1_table_offset, >>> s->l1_table, l1_size2) != l1_size2) >>> -- >>> 1.7.6 >>> >> >> This patch looked odd at first sight. First a qcow2_grow_l1_table is >> called to shrink L1 so perhaps should be qcow2_resize_l1_table. > > No, it doesn't shrink the table: > > if (min_size <= s->l1_size) > return 0; >
Yes, but perhaps returning success and not clipping anything is not that correct, perhaps qcow2_snapshot_goto should not call qcow2_grow_l1_table with a shorter value. >> Perhaps also it would be better to clean entries in >> qcow2_grow_l1_table instead of qcow2_snapshot_goto to avoid same >> problem in different calls to qcow2_grow_l1_table. The other oddity >> (still to understand) is: why does some code use l1_table above >> l1_size ?? > > Which code do you mean specifically? > > Kevin > I think this is the issue # 1204 -> 128 cluster per L2 entry -> 128k per L2 entry # 128 cluster per L1 entry -> 16M per L1 entry qemu-img create -f qcow2 /tmp/sn.qcow2 16m -o cluster_size=1024 qemu-img snapshot -c foo /tmp/sn.qcow2 # extend L1 qemu-io -c 'write -b 0 4M' /tmp/sn.qcow2 # shrink qemu-img snapshot -a foo /tmp/sn.qcow2 qemu-img check /tmp/sn.qcow2 Well... I was trying to get a leak and got a core instead. I removed your patch and got leaks. also, should not be memset(s->l1_table + sn->l1_size, 0, (s->l1_size - sn->l1_size) * sizeof(uint64_t)); instead of memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size); Frediano