On Fri, Sep 23, 2011 at 11:57 PM, Stefan Hajnoczi <stefa...@linux.vnet.ibm.com> wrote:
> Here is my generic image streaming branch, which aims to provide a way
> to copy the contents of a backing file into an image file of a running
> guest without requiring specific support in the various block drivers
> (e.g. qcow2, qed, vmdk):
>
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
>
> The tree does not provide full image streaming yet but I'd like to
> discuss the approach taken in the code. Here are the main points:
>
> The image streaming API is available through HMP and QMP commands. When
> streaming is started on a block device a coroutine is created to do the
> background I/O work. The coroutine can be cancelled.
>
> While the coroutine copies data from the backing file into the image
> file, the guest may be performing I/O to the image file. Guest reads do
> not conflict with streaming but guest writes require special handling.
> If the guest writes to a region of the image file that we are currently
> copying, then there is the potential to clobber the guest write with old
> data from the backing file.
>
> Previously I solved this in a QED-specific way by taking advantage of
> the serialization of allocating write requests. In order to do this
> generically we need to track in-flight requests and have the ability to
> queue I/O. Guest writes that affect an in-flight streaming copy
> operation must wait for that operation to complete before being issued.
> Streaming copy operations must skip overlapping regions of guest writes.
>
> One big difference to the QED image streaming implementation is that
> this generic implementation is not based on copy-on-read operations.
> Instead we do a sequence of bdrv_is_allocated() to find regions for
> streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
> populate the image file.
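The two rules quoted above (guest writes wait for an overlapping in-flight copy, and streaming skips regions under guest write) can be pictured with a small toy model. This is only a sketch in Python, not QEMU code: `Region`, `Tracker` and `stream` are hypothetical names, the image and backing file are plain dicts keyed by sector, and the dict lookup merely stands in for bdrv_is_allocated().

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Half-open sector range [start, end)."""
    start: int
    end: int

    def overlaps(self, other: "Region") -> bool:
        return self.start < other.end and other.start < self.end


@dataclass
class Tracker:
    """Hypothetical bookkeeping of in-flight requests for one device."""
    streaming: list = field(default_factory=list)     # copies in progress
    guest_writes: list = field(default_factory=list)  # guest writes in progress

    def guest_write_must_wait(self, req: Region) -> bool:
        # A guest write touching a region that is being copied is queued
        # until the copy completes, so stale backing data cannot clobber it.
        return any(req.overlaps(s) for s in self.streaming)

    def streaming_must_skip(self, req: Region) -> bool:
        # A streaming copy skips regions that a guest write is updating.
        return any(req.overlaps(w) for w in self.guest_writes)


def stream(image: dict, backing: dict, tracker: Tracker, nsectors: int) -> list:
    """Copy unallocated sectors from backing into image, one at a time."""
    copied = []
    for sector in range(nsectors):
        if sector in image:                 # stands in for bdrv_is_allocated()
            continue
        region = Region(sector, sector + 1)
        if tracker.streaming_must_skip(region):
            continue                        # a guest write is in flight here
        tracker.streaming.append(region)    # mark the copy as in flight
        image[sector] = backing[sector]     # stands in for read + write
        tracker.streaming.remove(region)
        copied.append(sector)
    return copied


# Overlap checks against one in-flight copy of sectors 0..127.
checks = Tracker(streaming=[Region(0, 128)])
wait_inside = checks.guest_write_must_wait(Region(64, 72))      # True
wait_adjacent = checks.guest_write_must_wait(Region(128, 136))  # False

# Streaming pass: sector 1 is already allocated, sector 2 is under guest write.
backing = {0: b"a", 1: b"b", 2: b"c", 3: b"d"}
image = {1: b"B"}
copied = stream(image, backing, Tracker(guest_writes=[Region(2, 3)]), 4)
print(copied, image)
```

In this model the skipped sector simply stays unallocated; in the real block layer the guest write itself would populate it. Queuing of the waiting guest write is left out, since that is the part the block queue would have to provide.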
>
> It turns out that generic copy-on-read is not an attractive operation
> because it requires using bounce buffers for every request. Kevin

bounce buffers == buffer ring?

> pointed out the case where a guest performs a read and pokes the data
> buffer before the read completes, copy-on-read would write out the
> modified memory into the image file unless we use a bounce buffer.

Can you elaborate on this?
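One way to picture the case attributed to Kevin above is a deterministic toy model. It is a sketch only: `copy_on_read`, `poke` and the 8-byte `BACKING` cluster are hypothetical, and "bounce buffer" is taken here to mean a private temporary copy of the data. The real race depends on asynchronous I/O timing, which the model replaces with a fixed ordering: the data arrives, the guest pokes its buffer, then the copy-on-read write goes out.

```python
BACKING = b"old data"  # hypothetical contents of one backing-file cluster


def copy_on_read(guest_buf: bytearray, use_bounce: bool, guest_poke) -> bytes:
    """Toy copy-on-read of one cluster; returns what lands in the image file."""
    image = bytearray(len(BACKING))
    if use_bounce:
        bounce = bytearray(BACKING)   # read lands in a private buffer
        guest_buf[:] = bounce         # the guest receives a copy
        source = bounce               # the image is written from the private copy
    else:
        guest_buf[:] = BACKING        # read lands directly in guest memory
        source = guest_buf            # the image is written from guest memory
    guest_poke(guest_buf)             # guest modifies its buffer early
    image[:] = source                 # copy-on-read write into the image file
    return bytes(image)


def poke(buf: bytearray) -> None:
    """Guest overwrites the first three bytes of its own buffer."""
    buf[0:3] = b"NEW"


guest_a = bytearray(len(BACKING))
without_bounce = copy_on_read(guest_a, False, poke)

guest_b = bytearray(len(BACKING))
with_bounce = copy_on_read(guest_b, True, poke)

print(without_bounce)  # b'NEW data': the guest's modification leaked into the image
print(with_bounce)     # b'old data': the image holds the backing-file contents
```

In both runs the guest sees its own modification in its buffer. Only the bounce-buffer variant keeps the image file identical to the backing file, at the cost of an extra copy per request.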
>
> There are a few pieces missing in my tree, which have mostly been solved
> in other places and just need to be reused:
> 1. Arbitration between guest and streaming requests (this is the only
>    real new thing).
> 2. Efficient zero handling (skip writing those regions or mark them as
>    zero clusters).
> 3. Queuing/dependencies when arbitration decides a request must wait.
>    I'm taking a look at reusing Zhi Yong's block queue.
> 4. Rate-limiting to ensure streaming I/O does not impact the guest.
>    Already exists in the QED-specific patches, it may make sense to
>    extract common code that both migration and the block layer can use.
>
> Ideas or questions?
>
> Stefan
>

-- 
Regards,

Zhi Yong Wu