Am 24.01.2011 16:26, schrieb Stefan Hajnoczi: > On Mon, Jan 24, 2011 at 2:54 PM, Kevin Wolf <kw...@redhat.com> wrote: >> [ Re-adding qemu-devel to CC ] >> >> Am 24.01.2011 15:34, schrieb Stefan Hajnoczi: >>> On Thu, Jan 20, 2011 at 5:10 PM, Kevin Wolf <kw...@redhat.com> wrote: >>>> @@ -702,17 +622,30 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState >>>> *bs, QCowL2Meta *m) >>>> >>>> if (m->nb_available & (s->cluster_sectors - 1)) { >>>> uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - >>>> 1); >>>> + cow = true; >>>> ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << >>>> 9), >>>> m->nb_available - end, s->cluster_sectors); >>>> if (ret < 0) >>>> goto err; >>>> } >>>> >>>> - /* update L2 table */ >>>> + /* >>>> + * Update L2 table. >>>> + * >>>> + * Before we update the L2 table to actually point to the new >>>> cluster, we >>>> + * need to be sure that the refcounts have been increased and COW was >>>> + * handled. >>>> + */ >>>> + if (cow) { >>>> + bdrv_flush(bs->file); >>> >>> Just bdrv_flush(bs->file) and not a refcounts cache flush? >> >> Refcounts and data need not to be ordered against each other. They only >> must both be on disk when we write the L2 table. > > Have I missed where refcounts actually get flushed from the cache out > to the disk because bdrv_flush(bs->file) only syncs the file but > doesn't write out dirty data held in cache.
The bdrv_flush isn't supposed to flush the refcounts, but the data copied during COW (what you call pre/postfill in QED) The refcounts are handled by the qcow2_cache_set_dependency below, so that before writing the L2 tables we always write the refcounts first. >>>> + } >>>> + >>>> + qcow2_cache_set_dependency(bs, s->l2_table_cache, >>>> s->refcount_block_cache); >>>> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_offset, >>>> &l2_index); >>>> if (ret < 0) { >>>> goto err; >>>> } >>>> + qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table); >>>> >>>> for (i = 0; i < m->nb_clusters; i++) { >>>> /* if two concurrent writes happen to the same unallocated cluster >>>> @@ -728,16 +661,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, >>>> QCowL2Meta *m) >>>> (i << s->cluster_bits)) | QCOW_OFLAG_COPIED); >>>> } >>>> >>>> - /* >>>> - * Before we update the L2 table to actually point to the new >>>> cluster, we >>>> - * need to be sure that the refcounts have been increased and COW was >>>> - * handled. >>>> - */ >>>> - bdrv_flush(bs->file); >>>> >>>> - ret = write_l2_entries(bs, l2_table, l2_offset, l2_index, >>>> m->nb_clusters); >>>> + ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table); >>>> if (ret < 0) { >>>> - qcow2_l2_cache_reset(bs); >>>> goto err; >>>> } >>>> >>> >>> The function continues like this: >>> >>> /* >>> * If this was a COW, we need to decrease the refcount of the old cluster. >>> * Also flush bs->file to get the right order for L2 and refcount update. >>> */ >>> if (j != 0) { >>> bdrv_flush(bs->file); >>> for (i = 0; i < j; i++) { >>> qcow2_free_any_clusters(bs, >>> be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1); >>> } >>> } >>> >>> Does bdrv_flush(bs->file) "get the right order for L2 and refcount >>> update"? We've just put an L2 table, should this be an L2 table >>> flush? >> >> I agree, this looks wrong. What we need is a dependency to ensure that >> first L2 is flushed and then the refcount block. >> qcow2_free_any_clusters() calls update_refcount() indirectly, which >> takes care of setting this dependency. >> >> So in the end I think it's just an unnecessary bdrv_flush. Makes sense? > > I don't understand this fully. I've noticed that the cache isn't the > only mechanism for making changes to tables - there are functions like > write_l2_entries() that directly write out parts of tables without > honoring dependencies or using the dirty bit on the cache entry. I > probably need to look at this more carefully to fully understand > whether or not it is correct. No, the cache should be the only mechanism that is used for accessing L2 tables and refcount blocks. write_l2_entries() is an old function that is removed by the patch. Direct accesses should only be left for L1 tables and refcount tables. Kevin