On Fri, Nov 5, 2010 at 6:38 PM, Kevin Wolf <kw...@redhat.com> wrote:
> Instead of directly executing writes and fsyncs, queue them and execute them
> asynchronously. What makes this interesting is that we can delay syncs and if
> multiple syncs occur, we can merge them into one bdrv_flush.
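If I'm reading the idea right, its simplest form is something like the
toy sketch below (the names are mine, not the ones from your series): a
sync queued while the tail of the queue is already an unissued sync just
collapses into it.  That only captures the trivial merging case, which
is partly what my questions further down are about.

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum toy_type { TOY_WRITE, TOY_SYNC };

struct toy_entry {
    enum toy_type type;
    uint64_t offset;            /* used by TOY_WRITE only */
    size_t len;
    struct toy_entry *next;
};

struct toy_queue {
    struct toy_entry *head;
    struct toy_entry *tail;
};

/*
 * Queue an entry instead of issuing it.  A sync queued while the tail
 * of the queue is already an unissued sync is dropped, so several
 * callers end up sharing a single bdrv_flush when the queue is
 * eventually processed.
 */
void toy_enqueue(struct toy_queue *q, struct toy_entry *e)
{
    if (e->type == TOY_SYNC && q->tail && q->tail->type == TOY_SYNC) {
        free(e);                /* merged into the pending sync */
        return;
    }
    e->next = NULL;
    if (q->tail) {
        q->tail->next = e;
    } else {
        q->head = e;
    }
    q->tail = e;
}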
The block-queue concept adds another layer to the cache= considerations.
We're going to complete write requests that have not yet been issued to
the host.  This won't work for cluster filesystems where multiple VMs
are accessing the block device, although that is a niche case.  I guess
they can assume writes are visible after they receive a completion.

There must be a trade-off between holding back requests until a flush is
forced and issuing requests as they come (what we do today).  At some
point, if you have too much data queued, the flush is going to be painful
and might result in odd timing patterns.  This feels similar to batching
tx packets on the network.

> A typical sequence in qcow2 (simple cluster allocation) looks like this:
>
> 1. Update refcount table
> 2. bdrv_flush
> 3. Update L2 entry
>
> If we delay the operation and get three of these sequences queued before
> actually executing, we end up with the following result, saving two syncs:
>
> 1. Update refcount table (req 1)
> 2. Update refcount table (req 2)
> 3. Update refcount table (req 3)
> 4. bdrv_flush
> 5. Update L2 entry (req 1)
> 6. Update L2 entry (req 2)
> 7. Update L2 entry (req 3)

How does block-queue group writes 1-3 and 5-7 together?  I thought reqs
1-3 would each have their own context, but from a quick look at the code
I don't think that is the case for qcow2 in patch 4.

Another way of asking is: how does block-queue know that it is safe to
put writes 1-3 together before a bdrv_flush?  Why doesn't it also put
writes 5-7 before the flush?

Perhaps your current qcow2 implementation isn't taking advantage of
block-queue bdrv_flush() batching for concurrent write requests?  I'm
missing something ;).  To make the question concrete, I've sketched the
per-request ordering I have in mind below my signature.

Stefan
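(blkqueue_pwrite()/blkqueue_barrier() below are placeholder names I made
up for illustration, not necessarily the API from your patches.)

#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins so the sketch compiles; they only log what gets queued. */
typedef struct BlockQueue BlockQueue;

static void blkqueue_pwrite(BlockQueue *bq, uint64_t offset,
                            const void *buf, size_t len)
{
    (void)bq; (void)buf;
    printf("queued write:   off=0x%" PRIx64 " len=%zu\n", offset, len);
}

static void blkqueue_barrier(BlockQueue *bq)
{
    (void)bq;
    printf("queued barrier\n");
}

/*
 * One allocating write request.  The refcount write (step 1) could be
 * grouped with the refcount writes of other in-flight requests before a
 * single flush, but the L2 write (step 3) must never move in front of
 * this request's own barrier (step 2).
 */
static void alloc_one_cluster(BlockQueue *bq,
                              uint64_t refblock_off, uint16_t refcount,
                              uint64_t l2_off, uint64_t l2_entry)
{
    blkqueue_pwrite(bq, refblock_off, &refcount, sizeof(refcount)); /* 1 */
    blkqueue_barrier(bq);                                           /* 2 */
    blkqueue_pwrite(bq, l2_off, &l2_entry, sizeof(l2_entry));       /* 3 */
}

int main(void)
{
    BlockQueue *bq = NULL;      /* toy: the stubs never dereference it */

    /* Three concurrent allocations, as in the example quoted above. */
    alloc_one_cluster(bq, 0x10000, 1, 0x20000, 0x50000);
    alloc_one_cluster(bq, 0x10002, 1, 0x20008, 0x60000);
    alloc_one_cluster(bq, 0x10004, 1, 0x20010, 0x70000);
    return 0;
}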