On 04/28/2017 03:48 PM, Max Reitz wrote: > On 27.04.2017 03:46, Eric Blake wrote: >> We've already improved discards to operate efficiently on the tail >> of an unaligned qcow2 image; it's time to make a similar improvement >> to write zeroes. The special case is only valid at the tail >> cluster of a file, where we must recognize that any sectors beyond >> the image end would implicitly read as zero, and therefore should >> not penalize our logic for widening a partial cluster into writing >> the whole cluster as zero. >> >> Update test 154 to cover the new scenarios, using two images of >> intentionally differing length. >> >> While at it, fix the test to gracefully skip when run as >> ./check -qcow2 -o compat=0.10 154 >> since the older format lacks zero clusters already required earlier >> in the test. >> >> Signed-off-by: Eric Blake <ebl...@redhat.com> >>
>> >> +echo >> +echo == unaligned image tail cluster, no allocation needed == >> + >> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) > > Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I > actually think that would be a better test because as it is, the image's > "unaligned" tail is exactly one cluster (so it isn't really unaligned). Uhhh - copy-and-paste? You're right, that 1024 is too small for what I'm actually doing with it. :( > >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# With no backing file, write to all or part of unallocated partial cluster >> + >> +# Backing file: 128m: ... | 00 -- > > Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB, > so this is just a single cluster; which will be converted from > unallocated to a zero cluster because the tail reads as zero. > >> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io I'm doing a write of 512 bytes, so I was trying to show that in the cluster starting at 128m, we have a write request for one sector to be written to 00, while the rest of the sector is left uninitialized. The obvious intent is that we note that the uninitialized portion reads as zero, so we can optimize the entire cluster (even though it is a partial cluster) to be a cluster write-zero instead of a sector-based allocating write. But since you've pointed out that my setup is flawed, I'll have to double-check that I'm actually testing what I thought I was (I know I hit the right breakpoints in gdb, but its the iotest expected output that needs to show the difference between pre- and post-patched qemu with regards to the partial-tail-cluster optimizations). >> + >> +# Backing file: 128m: ... | -- 00 > > Same here, but now it's the head that reads as zero (and it's already a > zero cluster so nothing changes here). > >> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io > > But I suppose you mean $((size + 512))? Umm. Yeah. (Hey - you're not the only one that writes patches late at night). > >> + >> +# Backing file: 128m: ... | 00 00 > > And finally this doesn't change anything either -- but then again this > is a fully aligned request, so I don't quite see the point in testing > this here. At one point during my development, I had a bug where a partial write request at the head of the cluster behaved differently from a partial write request at the end (one allocated while the other did not). Maybe it's best if I do a single write, then inspect the map, then reset the image, rather than doing all three writes in a row and proving that the net result of the three writes had no allocation. In v9, when I was trying to use unallocated clusters as a shortcut on files with no backing image, this actually worked (after all three writes, the cluster would still be unallocated); but since Kevin convinced me that v10 has to set the zero flag on all three writes, I'm back to the point of needing to clear the zero flag between writes. Okay, I'll definitely have to fix it. >> +# With backing file that reads as zero, write to all or part of entire >> cluster >> + >> +# Backing file: 128m: ... | 00 00 >> +# Active layer: 128m: ... | 00 00 00 00 >> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map >> + >> +# Backing file: 128m: ... | 00 00 >> +# Active layer: 128m: ... | 00 00 -- -- > > How so? Shouldn't this just stay a zero cluster because the rest of the > cluster already reads as zero? Same problem as above. In v9, I was trying to exploit code that left a cluster unallocated; but that's not viable, so I'll have to reset the image between steps. > > Also, I'm not quite sure what you are testing by just issuing three > consecutive write requests without checking the result each time. You > assume it to be something which you wrote in the comments above each, > but you can't be sure (yes, if something goes wrong in between, the > following final map should catch it, but it's not for certain). Well, it helped me during development, but you're absolutely right that we can't be certain in the long-run. 3 separate checks, with resets between, is indeed the way to go. >> +echo >> +echo == unaligned image tail cluster, allocation required == >> + >> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) Here, my backing cluster size is intentionally small, so that bdrv_get_block_status() can easily return sane results for the backing file portions, so only the active layer has a partial cluster. But making it obvious in the comments can't hurt, given that my earlier tests were invalidated when my size choice didn't actually match with the operations I was attempting. >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# Write beyond backing file must COW >> +# Backing file: 128m: ... | XX -- > > Here, the "--" is an unallocated cluster... > >> +# Active layer: 128m: ... | -- -- 00 -- > > ...while here it is normally allocated data. Or is "--" supposed to mean > you just don't write to that area...? If we had subclusters, then it would be unallocated data. But we don't. Since I just reset the image, I'm doing a write onto an unallocated cluster in the current layer; the write request is to a subset of the cluster, and the code checking whether the rest of the cluster reads as zeros will see that a COW (and corresponding allocation) is required. > >> + >> +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io >> +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map >> + >> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# Writes at boundaries of (partial) cluster must not lose mid-cluster data >> +# Backing file: 128m: ... | -- XX >> +# Active layer: 128m: ... | 00 -- -- 00 >> + >> +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | >> _filter_qemu_io >> +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_> >> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io > > [1] > >> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map >> + >> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# Partial write must not lose data >> +# Backing file: 128m: ... | -- -- >> +# Active layer: 128m: ... | -- -- XX 00 > > Well, you have basically tested this in [1] already. After the first > write -z to the active layer, the final four sectors are fully allocated > (as they are in a single cluster) and look like this: > > 00 01 00 00 = XX XX XX XX > > (Because the 00s are just written as data.) > > Now the write -z in [1] just overwrites the last of those four sectors > with zero data (albeit it being zero already). > > But if you want to test it again, I'm not stopping you. :-) > Well, you've already proven that I need to revisit this test with a fine-tooth comb, especially if I split the series to do the blkdebug stuff separately from the qcow2 stuff. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature