On Tue, Feb 14, 2012 at 2:30 PM, Chris Webb <ch...@arachsys.com> wrote:
> However, there are (known) run-time
> assertion failures with throttled IDE devices[1], which show up in qemu-kvm
> 1.0 and apparently also in qemu HEAD. We have also sometimes seen throttled
> VMs spinning unresponsively with 100% CPU on start-up, which may be related.
...
> [1] I can't immediately find the original reports in the archives, but I
> discussed this privately with Zhi Yong Wu and he had already had reports of
> the same issue. As a quick example, I can trigger an assertion failure in
> the IDE driver by turning on limits on a running guest doing heavy IO. I
> configure a guest with an IDE drive ide.0.0 and then do
>
>   block_set_io_throttle ide.0.0 100000000 0 0 1000 0 0
>
> Shortly afterwards, the qemu-kvm process exits with an assert():
>
>   qemu-kvm: /home/root/packages/qemu-kvm-1.0/src-76ig7q/hw/ide/pci.c:313:
>   bmdma_cmd_writeb: Assertion `bm->bus->dma->aiocb == ((void *)0)' failed.
>
> i.e. bm->bus->dma->aiocb is not NULL after qemu_aio_flush() in
> bmdma_cmd_writeb in the IDE driver:
>
>   void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>   {
>   #ifdef DEBUG_IDE
>       printf("%s: 0x%08x\n", __func__, val);
>   #endif
>
>       /* Ignore writes to SSBM if it keeps the old value */
>       if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>           if (!(val & BM_CMD_START)) {
>               /*
>                * We can't cancel Scatter Gather DMA in the middle of the
>                * operation or a partial (not full) DMA transfer would reach
>                * the storage so we wait for completion instead (we behave
>                * like if the DMA was completed by the time the guest trying
>                * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>                * set).
>                *
>                * In the future we'll be able to safely cancel the I/O if the
>                * whole DMA operation will be submitted to disk with a single
>                * aio operation with preadv/pwritev.
>                */
>               if (bm->bus->dma->aiocb) {
>                   qemu_aio_flush();
>                   assert(bm->bus->dma->aiocb == NULL);
>                   assert((bm->status & BM_STATUS_DMAING) == 0);
>               }
>           } else {
>               bm->cur_addr = bm->addr;
>               if (!(bm->status & BM_STATUS_DMAING)) {
>                   bm->status |= BM_STATUS_DMAING;
>                   /* start dma transfer if possible */
>                   if (bm->dma_cb)
>                       bm->dma_cb(bmdma_active_if(bm), 0);
>               }
>           }
>       }
>
>       bm->cmd = val & 0x09;
>   }
>
> (My uninformed guess is that this might be something to do with
> qemu_aio_flush() not being able to write out all the data because of the IO
> throttling?)
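
To make the guess quoted above concrete, here is a minimal toy sketch in C. It is not QEMU code: ToyRequest, ToyDevice, toy_submit, toy_flush and toy_stop_dma_ok are hypothetical names invented for illustration, and whether qemu_aio_flush() really leaves throttled requests behind is an assumption, not something confirmed in this thread. The sketch only shows how a flush that completes already-issued requests, but not ones parked by a throttle, would leave the aiocb pointer set, which is the condition the failing assert() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are QEMU APIs. */
typedef struct ToyRequest {
    bool throttled;  /* parked by the throttle, not yet issued (assumption) */
    bool done;       /* set once the request has completed */
} ToyRequest;

typedef struct ToyDevice {
    ToyRequest *aiocb;  /* stands in for bm->bus->dma->aiocb */
    ToyRequest req;     /* storage for the single outstanding request */
} ToyDevice;

/* Submit one request; with limits active it is parked instead of issued. */
void toy_submit(ToyDevice *dev, bool limits_active)
{
    assert(dev->aiocb == NULL);  /* the toy allows one request at a time */
    dev->req.throttled = limits_active;
    dev->req.done = false;
    dev->aiocb = &dev->req;
}

/* Hypothetical flush: completes issued requests, ignores parked ones. */
void toy_flush(ToyDevice *dev)
{
    if (dev->aiocb != NULL && !dev->aiocb->throttled) {
        dev->aiocb->done = true;
        dev->aiocb = NULL;
    }
}

/*
 * Mirrors the shape of the "stop DMA" branch: flush, then report whether
 * the invariant "aiocb == NULL" holds. Returns false where the real code
 * would hit its assert().
 */
bool toy_stop_dma_ok(ToyDevice *dev)
{
    if (dev->aiocb != NULL) {
        toy_flush(dev);
        return dev->aiocb == NULL;
    }
    return true;
}
```

Calling toy_submit(&dev, false) followed by toy_stop_dma_ok(&dev) returns true in this model, while the same sequence with limits_active set to true returns false, because the parked request is never completed by the flush.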

Thanks for the bug report. This is an actively maintained part of the
codebase, so chances are good this can be fixed in a reasonable time by the
community.

Just wanted to share my thoughts in case no one else replies - Zhi Yong and
I are aware of this bug and there should be time to look into it.

Stefan