Re: Live migration broken when under heavy IO

2009-06-16 Thread Avi Kivity
On 06/16/2009 03:57 PM, Anthony Liguori wrote: The tricky bit is that this has to happen at the device layer because the opaques cannot be saved in a meaningful way. Do you mean the device has to record all cancelled requests and replay them? I think we can do it at the block layer (thoug

Re: Live migration broken when under heavy IO

2009-06-16 Thread Anthony Liguori
Avi Kivity wrote: Yes, that's even better (though without linux-aio, it's equivalent). Not absolutely equivalent. There may be queued requests that haven't yet been dispatched to the thread pool, but yeah, I understand what you mean. The tricky bit is that this has to happen at the devi

Re: Live migration broken when under heavy IO

2009-06-16 Thread Avi Kivity
On 06/16/2009 03:50 PM, Anthony Liguori wrote: Avi Kivity wrote: Does anyone have a clever idea how to fix this without just waiting for all IO requests to complete? What's wrong with waiting for requests to complete? It should take a few tens of milliseconds. An alternative would be to at

Re: Live migration broken when under heavy IO

2009-06-16 Thread Anthony Liguori
Avi Kivity wrote: Does anyone have a clever idea how to fix this without just waiting for all IO requests to complete? What's wrong with waiting for requests to complete? It should take a few tens of milliseconds. An alternative would be to attempt to cancel the requests. This incurs no n

Re: Live migration broken when under heavy IO

2009-06-16 Thread Avi Kivity
On 06/16/2009 12:10 PM, Avi Kivity wrote: Does anyone have a clever idea how to fix this without just waiting for all IO requests to complete? What's wrong with waiting for requests to complete? It should take a few tens of milliseconds. We could start throttling requests late in the live s

Re: Live migration broken when under heavy IO

2009-06-16 Thread Avi Kivity
On 06/15/2009 11:33 PM, Anthony Liguori wrote: The basic issue is that:
  migrate_fd_put_ready():
      bdrv_flush_all();
Does:
  block.c:
      foreach block driver:
          drv->flush(bs);
Which in the case of raw, is just fsync(s->fd). Any submitted request is not queued or flushed which will lead to the
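
The problem raised in the first message of the thread, a flush that only calls fsync() and so does not account for requests still in flight, can be illustrated with a small toy model. This is a sketch only, not QEMU code: struct toy_disk and every toy_* function are hypothetical names invented for this example, and the "flush" here merely counts calls as a stand-in for fsync(). It contrasts flushing alone with the "wait for all requests to complete, then flush" approach discussed in the replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not QEMU code: a tiny "disk" plus a queue of
 * asynchronous writes that were submitted but not yet applied. */
#define DISK_SIZE   8
#define MAX_PENDING 4

struct pending_write {
    size_t offset;
    char value;
};

struct toy_disk {
    char data[DISK_SIZE];                    /* what the destination would see */
    struct pending_write queue[MAX_PENDING]; /* in-flight requests */
    size_t npending;
    unsigned flushes;                        /* stand-in for fsync() calls */
};

/* Submit an asynchronous write. It is only queued, not applied. */
static bool toy_submit_write(struct toy_disk *d, size_t offset, char value)
{
    if (d->npending == MAX_PENDING || offset >= DISK_SIZE)
        return false;
    d->queue[d->npending].offset = offset;
    d->queue[d->npending].value = value;
    d->npending++;
    return true;
}

/* Flush in the style described in the thread: it only syncs data that
 * has already been written and never looks at the pending queue. */
static void toy_flush(struct toy_disk *d)
{
    d->flushes++;
}

/* Drain: complete every queued request. Returns how many completed. */
static size_t toy_drain(struct toy_disk *d)
{
    size_t done = 0;
    for (size_t i = 0; i < d->npending; i++) {
        d->data[d->queue[i].offset] = d->queue[i].value;
        done++;
    }
    d->npending = 0;
    return done;
}

/* Wait for all requests to complete, then flush. */
static void toy_drain_and_flush(struct toy_disk *d)
{
    toy_drain(d);
    toy_flush(d);
    assert(d->npending == 0); /* nothing may remain in flight */
}
```

In this model, calling toy_flush() after toy_submit_write() leaves the write sitting in the queue, so the data never reaches the disk image; toy_drain_and_flush() applies it first. Cancelling requests and replaying them on the destination, the alternative mentioned in the replies, is not modelled here.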