On 01/22/13 11:25, Kevin Wolf wrote:
> On 18.12.2012 10:46, Philipp Hahn wrote:
>> Hello Kevin, hello Michael,
>>
>> On Wednesday 12 December 2012 17:54:58 Kevin Wolf wrote:
>>> On 12.12.2012 15:09, Philipp Hahn wrote:
>>>> On Wednesday, 12 December 2012 14:41:49, Kevin Wolf wrote:
>>>>> As you can see in the commit message of that patch, I was convinced
>>>>> that no bug existed in practice and that it was only dangerous with
>>>>> respect to future changes. Therefore my first question is whether
>>>>> you're using an unmodified upstream qemu or whether some backported
>>>>> patches are applied to it. If it's indeed unmodified, we should
>>>>> probably review the code once again to understand why it makes a
>>>>> difference.
>>>>
>>>> These were all unmodified versions directly from git, between
>>>> "qemu-kvm-1.1.0" and "qemu-kvm-1.2.0":
>>>>
>>>> "git checkout b7ab0fea37c15ca9e249c42c46f5c48fd1a0943c" works,
>>>> "git checkout b7ab0fea37c15ca9e249c42c46f5c48fd1a0943c~1" is broken.
>>>> "git checkout qemu-kvm-1.1.2" is broken,
>>>> "git checkout qemu-kvm-1.1.2 ; git cherry-pick
>>>> b7ab0fea37c15ca9e249c42c46f5c48fd1a0943c" works.
>>>
>>> Ok, thanks for clarifying. Then I must have missed some interesting
>>> case while doing the patch.
>>
>> I think I found your missing link:
>> After filling in "QCowL2Meta *m", that request is queued:
>>   QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);
>> to prevent allocating the same cluster twice for overlapping requests,
>> which is checked in do_alloc_cluster_offset().
>> I guess that since the sector count was wrong, the overlap detection
>> didn't work and the two concurrent write requests to the same cluster
>> overwrote each other.
>
> I'm still not seeing it. The overlap detection code doesn't use
> m->nb_available at all, so why would it make a difference?
> I guess I need some people who can decently review code - Laszlo, Paolo,
> any idea why upstream commit b7ab0fea does change the behaviour, contrary
> to what I claimed in the commit message?
As I understand it, the question is not whether b7ab0fea works or not (it
does); the question is whether *without* b7ab0fea there is a real problem (a
user-visible bug).

Commit b7ab0fea decreased "avail_sectors" by

  keep_clusters << (s->cluster_bits - BDRV_SECTOR_BITS)

which should make a difference for "m->nb_available" in two cases:

(a) avail_sectors_patched < requested_sectors <= avail_sectors_unpatched
    (ie. the decrease in "avail_sectors" flipped MIN() to prefer it over
    "requested_sectors", lowering m->nb_available),

(b) avail_sectors_patched < avail_sectors_unpatched <= requested_sectors
    (ie. MIN() had already preferred "avail_sectors", and now it went
    lower).

It seems that

  1 << (s->cluster_bits - BDRV_SECTOR_BITS) == s->cluster_sectors      [1]

Therefore both "avail_sectors_patched" and "avail_sectors_unpatched" are
integral multiples of "s->cluster_sectors". Whenever MIN() selects either
of them, the COW condition in qcow2_alloc_cluster_link_l2() will evaluate
to false.

In case (b) there is thus no difference between patched & unpatched in
this regard: MIN() always selects an integral multiple, and so
copy_sectors() is never invoked. In case (a), however, the patched version
skips copy_sectors(), while the unpatched version invokes it.
I'm not sure if case (a) is possible at all -- let's expand it:

  avail_sectors_patched < requested_sectors <= avail_sectors_unpatched

Substitute the assigned values:

  nb_clusters << (s->cluster_bits - BDRV_SECTOR_BITS)
      < n_end - keep_clusters * s->cluster_sectors
      <= (keep_clusters + nb_clusters) << (s->cluster_bits - BDRV_SECTOR_BITS)

Using [1], replace the shifts with multiplications:

  nb_clusters * s->cluster_sectors
      < n_end - keep_clusters * s->cluster_sectors
      <= (keep_clusters + nb_clusters) * s->cluster_sectors

Distribute the multiplication over the addition:

  nb_clusters * s->cluster_sectors
      < n_end - keep_clusters * s->cluster_sectors
      <= keep_clusters * s->cluster_sectors + nb_clusters * s->cluster_sectors

Introduce the shorthands

  N := nb_clusters * s->cluster_sectors    [sectors]
  K := keep_clusters * s->cluster_sectors  [sectors]

giving

  N < n_end - K <= K + N

Add K to all three parts:

  K + N < n_end <= 2*K + N                 [2]

I have no clue about qcow2, but *each one* of K and N seems to be a
function of *both* n_end and the current "qcow2 disk state" (ie. sectors
being allocated vs. just referenced). IOW, I suspect that case (a) is
possible for some (disk state, n_end) pairs.

And case (a) -- if indeed possible -- seems to contradict the commit
message:

  allocation. A COW occurs only if the request wasn't cluster aligned,
              ^^^^^^^              ^^^^^^^^^^^^^^^^^^^
  which in turn would imply that requested_sectors was less than
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  avail_sectors (both in the original and in the fixed version). In this
  ^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^

In case (a) -- expanded as [2] --, "requested_sectors" is greater than
"avail_sectors_patched", and less than or equal to
"avail_sectors_unpatched".

Laszlo