We run a cloud hosting provider using qemu-kvm 1.0, and are keen to find a contractor to track down and fix some issues with blockio-throttled IDE devices in current qemu HEAD.
The impact of a heavy user of disk IO on other virtual machines' disk performance is a real and serious problem for us, and Zhi Yong Wu's new blockio limits feature is very useful in combating this. Consequently, we'd like to be able to deploy it as soon as possible. The patch set has since been merged into qemu mainline via Kevin Wolf's block tree:

  http://lists.nongnu.org/archive/html/qemu-devel/2011-11/msg00947.html
  http://lists.nongnu.org/archive/html/qemu-devel/2011-12/msg00463.html

and I have back-ported it to qemu 1.0. However, there are (known) run-time assertion failures with throttled IDE devices[1], which show up in qemu-kvm 1.0 and apparently also in qemu HEAD. We have also sometimes seen throttled VMs spinning unresponsively with 100% CPU on start-up, which may be related.

If anyone knowledgeable in the area would be interested in being paid to work on this, or if you know someone who might be, I would be delighted to hear from you.

Cheers,

Chris.

[1] I can't immediately find the original reports in the archives, but I discussed this privately with Zhi Yong Wu and he had already had reports of the same issue.

As a quick example, I can trigger an assertion failure in the IDE driver by turning on limits on a running guest doing heavy IO. I configure a guest with an IDE drive ide.0.0 and then, in the monitor, do

  block_set_io_throttle ide.0.0 100000000 0 0 1000 0 0

(the arguments are device, bps, bps_rd, bps_wr, iops, iops_rd and iops_wr, so this caps the drive at 100MB/s of total bandwidth and 1000 total IOPS). Shortly afterwards, the qemu-kvm process exits with an assert():

  qemu-kvm: /home/root/packages/qemu-kvm-1.0/src-76ig7q/hw/ide/pci.c:313: bmdma_cmd_writeb: Assertion `bm->bus->dma->aiocb == ((void *)0)' failed.

i.e. bm->bus->dma->aiocb is not NULL after qemu_aio_flush() in bmdma_cmd_writeb in the IDE driver:

void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
{
#ifdef DEBUG_IDE
    printf("%s: 0x%08x\n", __func__, val);
#endif
    /* Ignore writes to SSBM if it keeps the old value */
    if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
        if (!(val & BM_CMD_START)) {
            /*
             * We can't cancel Scatter Gather DMA in the middle of the
             * operation or a partial (not full) DMA transfer would reach
             * the storage so we wait for completion instead (we behave
             * as if the DMA was completed by the time the guest tried
             * to cancel it via bmdma_cmd_writeb with BM_CMD_START not
             * set).
             *
             * In the future we'll be able to safely cancel the I/O if
             * the whole DMA operation is submitted to disk with a
             * single aio operation with preadv/pwritev.
             */
            if (bm->bus->dma->aiocb) {
                qemu_aio_flush();
                assert(bm->bus->dma->aiocb == NULL); /* fires under throttling */
                assert((bm->status & BM_STATUS_DMAING) == 0);
            }
        } else {
            bm->cur_addr = bm->addr;
            if (!(bm->status & BM_STATUS_DMAING)) {
                bm->status |= BM_STATUS_DMAING;
                /* start dma transfer if possible */
                if (bm->dma_cb)
                    bm->dma_cb(bmdma_active_if(bm), 0);
            }
        }
    }
    bm->cmd = val & 0x09;
}

(My uninformed guess is that this might be something to do with qemu_aio_flush() not being able to write out all the data because of the IO throttling: requests parked by the throttling code have never been submitted to the aio layer, so waiting on aio completions alone would never finish them.)
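If that guess is anywhere near right, my equally uninformed sketch of a possible direction would be for the drain path to keep kicking the throttling queue so that parked requests are actually resubmitted and can complete. This is a sketch only, not a tested patch: it assumes the io_limits_enabled and throttled_reqs fields that the throttling series adds to BlockDriverState, and that the CoQueue helpers qemu_co_queue_empty() and qemu_co_queue_restart_all() are available in the tree being patched.

/* Sketch: drain a throttled drive, alternating between resubmitting
 * requests held back by the I/O limits code and waiting for in-flight
 * aio to complete, until nothing is left queued. */
static void bdrv_drain_throttled(BlockDriverState *bs)
{
    for (;;) {
        /* Parked requests have never reached the aio layer, so
         * qemu_aio_flush() on its own can never complete them;
         * resubmit them first. */
        if (bs->io_limits_enabled &&
            !qemu_co_queue_empty(&bs->throttled_reqs)) {
            qemu_co_queue_restart_all(&bs->throttled_reqs);
        }

        /* Wait for everything currently in flight to finish. */
        qemu_aio_flush();

        /* Throttling may have re-queued requests while we waited. */
        if (!bs->io_limits_enabled ||
            qemu_co_queue_empty(&bs->throttled_reqs)) {
            break;
        }
    }
}

Calling something like this on the drive backing the bus, in place of the bare qemu_aio_flush() in bmdma_cmd_writeb, is the sort of change I'd naively expect, but I'd want someone who actually knows the block layer to confirm or correct this.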