Hi all! While thinking about how to prevent a host cluster from being dereferenced to zero (and discarded) during a flush of the compressed cache (which I'm working on now), I ran into a question: what prevents this for normal writes?
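Before the reproduction below, here is how I picture the hazard in the abstract. This is a hypothetical toy model in Python (none of these names exist in QEMU; it's a sketch, not real qcow2 code): an in-flight write holds a raw host offset while suspended, a concurrent discard drops the cluster's refcount to zero, and a later allocation hands the same host cluster to an unrelated guest offset.

```python
# Toy model of the race (hypothetical simulation, NOT QEMU code):
# an in-flight write keeps a raw host offset while a concurrent
# discard frees the cluster, so a later allocation reuses it and
# the resumed write corrupts unrelated guest data.

class ToyImage:
    def __init__(self):
        self.refcounts = {}   # host cluster index -> refcount
        self.l2 = {}          # guest cluster index -> host cluster index
        self.data = {}        # host cluster index -> pattern byte

    def alloc(self):
        host = 0
        while self.refcounts.get(host, 0) > 0:  # first free host cluster
            host += 1
        self.refcounts[host] = 1
        return host

    def write(self, guest, pattern):
        host = self.l2.get(guest)
        if host is None:
            host = self.alloc()
            self.l2[guest] = host
        return (host, pattern)  # "suspended" request: raw host offset + data

    def complete(self, req):
        host, pattern = req
        self.data[host] = pattern  # writes via raw host offset, no recheck!

    def discard(self, guest):
        host = self.l2.pop(guest)
        self.refcounts[host] -= 1  # refcount drops to 0: cluster is free

img = ToyImage()
img.complete(img.write(0, 1))  # initial write: guest cluster 0 -> host 0
req_a = img.write(0, 2)        # rewrite, suspended before data lands
img.discard(0)                 # discard frees host cluster 0
img.complete(img.write(2, 3))  # write to guest 128K reuses host cluster 0!
img.complete(req_a)            # resume 'A': overwrites host cluster 0

print(img.data[img.l2[2]])     # prints 2: guest 128K holds the wrong pattern
```

Running this prints 2 rather than 3: the guest cluster written with pattern 3 ends up holding pattern 2, which is exactly the corruption the qemu-io session below demonstrates.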
A simple interactive qemu-io session on the master branch:

[root@kvm build]# ./qemu-img create -f qcow2 x 1M
[root@kvm build]# ./qemu-io blkdebug::x

Do an initial write:

qemu-io> write -P 1 0 64K
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.12 sec (556.453 KiB/sec and 8.6946 ops/sec)

Rewrite, and break before the write (assume a long write by the fs or hardware for some reason):

qemu-io> break write_aio A
qemu-io> aio_write -P 2 0 64K
blkdebug: Suspended request 'A'

OK, we stopped before the data write. Everything was already allocated by the initial write, and the mutex has been released by now.. And suddenly we do a discard:

qemu-io> discard 0 64K
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (146.034 MiB/sec and 2336.5414 ops/sec)

Now, start another write, to another place.. But it will allocate the same host cluster!!!

qemu-io> write -P 3 128K 64K
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.08 sec (787.122 KiB/sec and 12.2988 ops/sec)

Check it:

qemu-io> read -P 3 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (188.238 MiB/sec and 3011.8033 ops/sec)

Resume our old write:

qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:05:07.10 (213.400382 bytes/sec and 0.0033 ops/sec)

Of course it doesn't affect the first cluster, as it is discarded:

qemu-io> read -P 2 0 64K
Pattern verification failed at offset 0, 65536 bytes
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (726.246 MiB/sec and 11619.9352 ops/sec)
qemu-io> read -P 0 0 64K
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (632.348 MiB/sec and 10117.5661 ops/sec)

But the data in the third cluster is now corrupted:

qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (163.922 MiB/sec and 2622.7444 ops/sec)
qemu-io> read -P 2 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (257.058 MiB/sec and 4112.9245 ops/sec)

So, that's a
classical use-after-free... From the user's point of view, a racy write/discard to one cluster may corrupt another cluster... It may be even worse if the use-after-free corrupts metadata.

Note that the initial write is significant: when we newly allocate a cluster, the L2 entry is written after the data write (as I understand it), so the race doesn't happen. Compressed writes, however, allocate everything before the write.. Let's check:

[root@kvm build]# ./qemu-img create -f qcow2 x 1M; ./qemu-io blkdebug::x
Formatting 'x', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16

qemu-io> break write_compressed A
qemu-io> aio_write -c -P 1 0 64K
qemu-io> compressed: 327680 79
blkdebug: Suspended request 'A'
qemu-io> discard 0 64K
discarded: 327680
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.01 sec (7.102 MiB/sec and 113.6297 ops/sec)
qemu-io> write -P 3 128K 64K
normal cluster alloc: 327680
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.06 sec (1.005 MiB/sec and 16.0774 ops/sec)
qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:00:15.90 (4.026 KiB/sec and 0.0629 ops/sec)
qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (237.791 MiB/sec and 3804.6539 ops/sec)

(The "compressed:", "discarded:" and "normal cluster alloc:" lines are local debug output. Strangely, this sometimes didn't fail for me, but now it fails repeatedly... Anyway, it's all not good.)

--
Best regards,
Vladimir