On 17.11.2010 13:43, Stefan Hajnoczi wrote:
> On Fri, Nov 5, 2010 at 6:38 PM, Kevin Wolf <kw...@redhat.com> wrote:
>> Instead of directly executing writes and fsyncs, queue them and execute them
>> asynchronously. What makes this interesting is that we can delay syncs and if
>> multiple syncs occur, we can merge them into one bdrv_flush.
>
> The block-queue concept adds another layer to the cache=
> considerations. We're going to complete write requests that have not
> yet been issued to the host.
>
> This won't work for cluster filesystems where multiple VMs are
> accessing the block device, although that is a niche case. I guess
> they can assume writes are visible after they receive a completion.

Right, but that doesn't work for anything but raw anyway. What I intend
block-queue for is just metadata writes, so nothing that raw would use.

> There must be a trade-off between holding back requests until a flush
> is forced and issuing requests as they come (what we do today). At
> some point if you have too much data queued the flush is going to be
> painful and might result in odd timing patterns. This feels similar
> to batching tx packets on the network.

I agree. Though as long as you use it only for metadata, the requests
are relatively small and they overwrite each other, so there's a chance
that you can drop some of them from the queue before submitting them.

What I implemented here is that the queue is flushed if it reaches some
maximum length. So this value is where you'd need to find the right
trade-off.

>> A typical sequence in qcow2 (simple cluster allocation) looks like this:
>>
>> 1. Update refcount table
>> 2. bdrv_flush
>> 3. Update L2 entry
>>
>> If we delay the operation and get three of these sequences queued before
>> actually executing, we end up with the following result, saving two syncs:
>>
>> 1. Update refcount table (req 1)
>> 2. Update refcount table (req 2)
>> 3. Update refcount table (req 3)
>> 4. bdrv_flush
>> 5. Update L2 entry (req 1)
>> 6. Update L2 entry (req 2)
>> 7. Update L2 entry (req 3)
>
> How does block-queue group writes 1-3 and 5-7 together? I thought
> reqs 1-3 will each have their own context but from a quick look at the
> code I don't think that is the case for qcow2 in patch 4.
>
> Another way of asking is, how does block-queue know that it is safe to
> put writes 1-3 together before a bdrv_flush? Why doesn't it also put
> writes 5-7 before the flush?
>
> Perhaps your current qcow2 implementation isn't taking advantage of
> block-queue bdrv_flush() batching for concurrent write requests?
>
> I'm missing something ;).

Yes, you are. ;-)

The contexts are indeed the mechanism that it uses to achieve this.
Have a look at qcow_aio_setup; patch 4/4 adds a blkqueue_init_context()
call there.

Kevin
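
To illustrate how per-request contexts can let several requests share a
single flush, here is a minimal toy model of the ordering discussed above.
It is a sketch only, not the code from the patch series: every name in it
(toy_queue, toy_context, toy_write, toy_barrier, toy_execute, toy_demo) is
hypothetical, and it assumes a context simply counts how many barriers its
request has passed. Writes from different requests that sit at the same
barrier depth land in the same section, and one flush separates two
sections.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_SECTIONS 4
#define TOY_MAX_WRITES   8

/* Hypothetical toy model; none of these names come from the patch series. */
struct toy_queue {
    /* writes[s] lists the writes queued in section s. One flush is
     * implied between section s and section s + 1. */
    const char *writes[TOY_MAX_SECTIONS][TOY_MAX_WRITES];
    int num_writes[TOY_MAX_SECTIONS];
    int num_sections;
};

/* One context per request: remembers how many barriers this request has
 * already passed, i.e. which section its next write joins. */
struct toy_context {
    int section;
};

static void toy_queue_init(struct toy_queue *q)
{
    memset(q, 0, sizeof(*q));
}

static void toy_context_init(struct toy_context *ctx)
{
    ctx->section = 0;
}

/* Queue a write in the section the context currently points at. */
static void toy_write(struct toy_queue *q, struct toy_context *ctx,
                      const char *label)
{
    int s = ctx->section;

    assert(s < TOY_MAX_SECTIONS);
    assert(q->num_writes[s] < TOY_MAX_WRITES);
    q->writes[s][q->num_writes[s]++] = label;
    if (s >= q->num_sections) {
        q->num_sections = s + 1;
    }
}

/* A barrier only moves this context to the next section. Contexts that
 * issue their barrier at the same depth therefore share one flush. */
static void toy_barrier(struct toy_context *ctx)
{
    ctx->section++;
}

/* Append one line to the log and return the new length. */
static size_t toy_log(char *log, size_t size, size_t used, const char *line)
{
    int n = snprintf(log + used, size - used, "%s\n", line);

    assert(n >= 0 && (size_t)n < size - used);
    return used + (size_t)n;
}

/* "Execute" the queue: log each operation in submission order and return
 * the number of flushes that were needed. */
static int toy_execute(const struct toy_queue *q, char *log, size_t size)
{
    int flushes = 0;
    size_t used = 0;

    assert(size > 0);
    log[0] = '\0';
    for (int s = 0; s < q->num_sections; s++) {
        if (s > 0) {
            used = toy_log(log, size, used, "flush");
            flushes++;
        }
        for (int i = 0; i < q->num_writes[s]; i++) {
            used = toy_log(log, size, used, q->writes[s][i]);
        }
    }
    return flushes;
}

/* Three cluster allocations, each: refcount update, barrier, L2 update. */
int toy_demo(char *log, size_t size)
{
    static const char *refcount[] = {
        "refcount (req 1)", "refcount (req 2)", "refcount (req 3)"
    };
    static const char *l2[] = { "L2 (req 1)", "L2 (req 2)", "L2 (req 3)" };
    struct toy_queue q;
    struct toy_context ctx;

    toy_queue_init(&q);
    for (int req = 0; req < 3; req++) {
        toy_context_init(&ctx);   /* fresh context for every request */
        toy_write(&q, &ctx, refcount[req]);
        toy_barrier(&ctx);
        toy_write(&q, &ctx, l2[req]);
    }
    return toy_execute(&q, log, size);
}
```

Calling toy_demo() with a buffer of a few hundred bytes queues the three
requests and executes them with a single flush between the refcount
updates and the L2 updates. In this model, a write that shared one context
across all requests would instead end up behind every earlier barrier,
which is why each request needs its own context.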