We run a cloud hosting provider using qemu-kvm 1.0, and are keen to find a contractor to track down and fix some issues with blockio-throttled IDE devices in current qemu HEAD.
The impact of a heavy user of disk IO on other virtual machines' disk performance is a real and serious problem for us, and Zhi Yong Wu's new blockio limits feature is very useful in combating this. Consequently, we'd like to be able to deploy it as soon as possible. The patch set has since been merged into qemu mainline via Kevin Wolf's block tree:

  http://lists.nongnu.org/archive/html/qemu-devel/2011-11/msg00947.html
  http://lists.nongnu.org/archive/html/qemu-devel/2011-12/msg00463.html

and I have back-ported it to qemu 1.0. However, there are (known) run-time assertion failures with throttled IDE devices[1], which show up in qemu-kvm 1.0 and apparently also in qemu HEAD. We have also sometimes seen throttled VMs spinning unresponsively with 100% CPU on start-up, which may be related.

If anyone knowledgeable in the area would be interested in being paid to work on this, or if you know someone who might be, I would be delighted to hear from you.

Cheers,

Chris.

[1] I can't immediately find the original reports in the archives, but I discussed this privately with Zhi Yong Wu and he had already had reports of the same issue.

As a quick example, I can trigger an assertion failure in the IDE driver by turning on limits on a running guest doing heavy IO. I configure a guest with an IDE drive ide.0.0 and then, in the monitor, do

  block_set_io_throttle ide.0.0 100000000 0 0 1000 0 0

(the arguments are device, bps, bps_rd, bps_wr, iops, iops_rd and iops_wr, so this caps the drive at 100MB/s of total bandwidth and 1000 total IOPS). Shortly afterwards, the qemu-kvm process exits with an assert():

  qemu-kvm: /home/root/packages/qemu-kvm-1.0/src-76ig7q/hw/ide/pci.c:313: bmdma_cmd_writeb: Assertion `bm->bus->dma->aiocb == ((void *)0)' failed.

i.e. bm->bus->dma->aiocb is not NULL after qemu_aio_flush() in bmdma_cmd_writeb in the IDE driver:

void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
{
#ifdef DEBUG_IDE
    printf("%s: 0x%08x\n", __func__, val);
#endif
    /* Ignore writes to SSBM if it keeps the old value */
    if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
        if (!(val & BM_CMD_START)) {
            /*
             * We can't cancel Scatter Gather DMA in the middle of the
             * operation or a partial (not full) DMA transfer would reach
             * the storage so we wait for completion instead (we behave
             * as if the DMA was completed by the time the guest tried
             * to cancel it via bmdma_cmd_writeb with BM_CMD_START not
             * set).
             *
             * In the future we'll be able to safely cancel the I/O if
             * the whole DMA operation is submitted to disk with a
             * single aio operation with preadv/pwritev.
             */
            if (bm->bus->dma->aiocb) {
                qemu_aio_flush();
                assert(bm->bus->dma->aiocb == NULL); /* fires under throttling */
                assert((bm->status & BM_STATUS_DMAING) == 0);
            }
        } else {
            bm->cur_addr = bm->addr;
            if (!(bm->status & BM_STATUS_DMAING)) {
                bm->status |= BM_STATUS_DMAING;
                /* start dma transfer if possible */
                if (bm->dma_cb)
                    bm->dma_cb(bmdma_active_if(bm), 0);
            }
        }
    }
    bm->cmd = val & 0x09;
}

(My uninformed guess is that this might be something to do with qemu_aio_flush() not being able to write out all the data because of the IO throttling: requests parked by the throttling code have never been submitted to the aio layer, so waiting on aio completions alone would never finish them.)
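If that guess is anywhere near right, my equally uninformed sketch of a possible direction would be for the drain path to keep kicking the throttling queue so that parked requests are actually resubmitted and can complete. This is a sketch only, not a tested patch: it assumes the io_limits_enabled and throttled_reqs fields that the throttling series adds to BlockDriverState, and that the CoQueue helpers qemu_co_queue_empty() and qemu_co_queue_restart_all() are available in the tree being patched.

/* Sketch: drain a throttled drive, alternating between resubmitting
 * requests held back by the I/O limits code and waiting for in-flight
 * aio to complete, until nothing is left queued. */
static void bdrv_drain_throttled(BlockDriverState *bs)
{
    for (;;) {
        /* Parked requests have never reached the aio layer, so
         * qemu_aio_flush() on its own can never complete them;
         * resubmit them first. */
        if (bs->io_limits_enabled &&
            !qemu_co_queue_empty(&bs->throttled_reqs)) {
            qemu_co_queue_restart_all(&bs->throttled_reqs);
        }

        /* Wait for everything currently in flight to finish. */
        qemu_aio_flush();

        /* Throttling may have re-queued requests while we waited. */
        if (!bs->io_limits_enabled ||
            qemu_co_queue_empty(&bs->throttled_reqs)) {
            break;
        }
    }
}

Calling something like this on the drive backing the bus, in place of the bare qemu_aio_flush() in bmdma_cmd_writeb, is the sort of change I'd naively expect, but I'd want someone who actually knows the block layer to confirm or correct this.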