Am 22.01.2013 14:17, schrieb Laszlo Ersek: > As I understand it, the question is not whether b7ab0fea works or not > (it does), the question is whether *without* b7ab0fea there is a real > problem (user visible bug).
Correct. Or actually not whether, but why. > Commit b7ab0fea decreased "avail_sectors" by > > keep_clusters << (s->cluster_bits - BDRV_SECTOR_BITS) > > which should make a difference for "m->nb_available" in two cases: > > (a) > > avail_sectors_patched < > requested_sectors <= > avail_sectors_unpatched > > (ie. the decrease in "avail_sectors" flipped MIN to prefer it over > requested_sectors, lowering m->nb_available) > > (b) > > avail_sectors_patched < > avail_sectors_unpatched <= > requested_sectors > > (ie. MIN had already preferred "avail_sectors", and now it went lower). > > > It seems that > > 1 << (s->cluster_bits - BDRV_SECTOR_BITS) == s->cluster_sectors [1] > > Therefore both "avail_sectors_patched" and "avail_sectors_unpatched" are > an integral multiple of "s->cluster_sectors". Whenever MIN() selects > either of them, the condition in qcow2_alloc_cluster_link_l2() will > evaluate to false. > > > In case (b) there seems to be no difference between patched & unpatched > in this regard: MIN() always selects an integral multiple, and so > copy_sectors() is never invoked. > > In case (a), the patched version skips copy_sectors(), while the > unpatched version invokes it. > > I'm not sure if case (a) is possible at all -- let's expand it: > > avail_sectors_patched < > requested_sectors <= > avail_sectors_unpatched > > Substitute the assigned values: > > nb_cluster << (s->cluster_bits - BDRV_SECTOR_BITS) < > n_end - keep_clusters * s->cluster_sectors <= > (keep_clusters + nb_clusters) << (s->cluster_bits - BDRV_SECTOR_BITS) > > Using [1], replace the shifts with multiplication: > > nb_cluster * s->cluster_sectors < > n_end - keep_clusters * s->cluster_sectors <= > (keep_clusters + nb_clusters) * s->cluster_sectors > > Distribute the addition: > > nb_cluster * s->cluster_sectors < > n_end - keep_clusters * s->cluster_sectors <= > keep_clusters * s->cluster_sectors + > nb_clusters * s->cluster_sectors > > N := nb_cluster * s->cluster_sectors [bytes] > K := keep_clusters * s->cluster_sectors [bytes] > > N < n_end - K <= K + N > > Add K > > K + N < n_end <= 2*K + N [2] Thank you so much, Laszlo, you are great! I more or less completely failed to see case (a). With your excellent analysis I can indeed trigger a copy on write when it shouldn't happen. The series of requests to get into this state is somewhat tricky as it requires a specific physical ordering of clusters in the image file: Host cluster h: Guest cluster g is mapped to here Host cluster h + 1: Free, can be allocated for not yet mapped g + 1 Host cluster h + 2: Guest cluster g + 2 is mapped to here I can get this with this command on a fresh qcow2 image (64k cluster size): ./qemu-io -c 'write -P 0x66 0 64k' -c 'write 64k 64k' -c 'write -P 0x88 128k 64k' -c 'discard 64k 64k' -c 'write -P 0x77 0 160k' /tmp/test.qcow2 Now I get an additional COW for the area from 96k to 128k. However, this alone doesn't corrupt an image - a copy on write by itself is harmless because it only rewrites the data that is already there. The two things I'm going to check next are: a) What happens with concurrent requests now, are requests for the same area still correctly serialised? b) Can I modify the parameters so that copy_sectors() is working with values it wasn't designed for so that the data is copied to a wrong place or something like that. m->nb_available is used as an argument to it, so it certainly seems realistic. > I have no clue about qcow2, but *each one* of K and N seem to be a > function of *both* n_end and the current "qcow2 disk state" (ie. sectors > being allocated vs. just referenced). IOW, I suspect that case (a) is > possible for some (disk state, n_end) pairs. And case (a) -- if indeed > possible -- seems to contradict the commit message: > > allocation. A COW occurs only if the request wasn't cluster aligned, > ^^^^^^^ ^^^^^^^^^^^^^^^^^^^ > which in turn would imply that requested_sectors was less than > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > avail_sectors (both in the original and in the fixed version). In this > ^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^ > > In case (a) -- expanded as [2] --, "requested_sectors" is greater than > "avail_sectors_patched", and less than or equal to > "avail_sectors_unpatched". Yes, I agree. The commit message is definitely wrong and I understand now why, so that was a great step forward. There's still a missing piece about how it creates a user-visible bug, but hopefully it won't be that hard to find. Kevin