On Wed, Jun 8, 2011 at 4:10 PM, Jagane Sundar <jag...@sundar.org> wrote:
> On 6/7/2011 5:15 AM, Stefan Hajnoczi wrote:
>>
>> On Mon, Jun 6, 2011 at 5:55 PM, Marcelo Tosatti <mtosa...@redhat.com> wrote:
>>
>> I haven't reviewed this whole patch yet, but comments below.
>>
>> This patch, like image streaming, may hit deadlocks due to synchronous
>> I/O emulation. I discovered this problem when working on image
>> streaming, and it should be solved by getting rid of the asynchronous
>> context concept. The problem is that async I/O emulation will push a
>> new context, preventing existing requests from completing until the
>> current context is popped again. If the image format has dependencies
>> between requests (e.g. QED allocating writes are serialized), then
>> this leads to deadlock because the new request cannot complete until
>> the old one does, but the old one needs to wait for the context to be
>> popped. I think you are not affected by the QED allocating write case,
>> since the source image is only read, not written, by live block copy.
>> But you might encounter this problem in other places.
>>
> Hello Stefan,
>
> Can you expand on this some more? I have similar concerns for Livebackup.
>
> At the beginning of your paragraph, did you mean 'asynchronous I/O
> emulation' instead of 'synchronous I/O emulation'?
>
> Also, I don't understand the 'stack' construct that you refer to. When you
> say 'push a new context', are you talking about what happens when a new
> thread picks up a new async I/O request from the VM and then proceeds to
> execute it? What is this stack that you refer to?
>
> Any design documents, code snippets that I could look at, or other pointers
> are welcome.
See async.c.

There is synchronous I/O emulation in block.c for BlockDrivers that
don't support .bdrv_read()/.bdrv_write() but only
.bdrv_aio_readv()/.bdrv_aio_writev(). The way it works is that it
pushes a new I/O context and then issues the async I/O. Then it runs
a special event loop, waiting for that I/O to complete. After the I/O
completes it pops the context again.

The point of the context is that completions only get called for the
current context. Therefore callers of the synchronous I/O functions
don't need to worry that the state of the world might change during
their "synchronous" operation - only their own I/O operation can
complete. If a pending async I/O completes during synchronous I/O
emulation, its callback is not invoked until a bottom half (BH) runs
after the async context has been popped. This guarantees that the
synchronous operation and its caller have completed before completion
callbacks are invoked for other pending async I/O.

The problem is that QED allocating writes are serialized and cannot
make progress while a pending request is unable to complete. Holding
up the pending request like this deadlocks QEMU. The same thing could
happen in other places; I'm bringing this up in case anyone hits such
a deadlock.

I think we can solve this in the future by eliminating the async
context concept. Kevin and I have discussed how that can be done, but
it's not possible to do yet, and it won't be trivial.
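For reference, the read emulation in block.c looks roughly like this.
It's simplified and written from memory, so treat it as a sketch of the
pattern rather than the exact code (names like NOT_DONE and
bdrv_rw_em_cb may not match the tree exactly):

#define NOT_DONE 0x7fffffff

static void bdrv_rw_em_cb(void *opaque, int ret)
{
    /* Completion callback: record the result so the waiter below sees it */
    *(int *)opaque = ret;
}

static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
                        uint8_t *buf, int nb_sectors)
{
    int async_ret = NOT_DONE;
    BlockDriverAIOCB *acb;
    struct iovec iov;
    QEMUIOVector qiov;

    /* Push a new async context.  Completions for requests issued in
     * older contexts are deferred (via a BH) until this context is
     * popped again. */
    async_context_push();

    iov.iov_base = buf;
    iov.iov_len  = nb_sectors * 512;
    qemu_iovec_init_external(&qiov, &iov, 1);

    /* Issue the async request inside the new context */
    acb = bdrv_aio_readv(bs, sector_num, &qiov, nb_sectors,
                         bdrv_rw_em_cb, &async_ret);
    if (acb == NULL) {
        async_ret = -1;
        goto out;
    }

    /* The "special event loop": only this request's completion can run
     * here.  If this request depends on an older request whose
     * completion is deferred (e.g. a serialized QED allocating write),
     * the loop never terminates and QEMU deadlocks. */
    while (async_ret == NOT_DONE) {
        qemu_aio_wait();
    }

out:
    /* Pop the context; deferred completions for older requests can now
     * be delivered. */
    async_context_pop();
    return async_ret;
}

The deadlock comes from that while loop: nothing issued in an older
context can complete while we're spinning in it.

Stefan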