On Tue, Jan 25, 2011 at 09:49:04AM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> > On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kw...@redhat.com> wrote:
> >> On 24.01.2011 20:47, Michael S. Tsirkin wrote:
> >>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
> >>>> On 24.01.2011 20:36, Michael S. Tsirkin wrote:
> >>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
> >>>>>> On 12.12.2010 16:02, Stefan Hajnoczi wrote:
> >>>>>>> Virtqueue notify is currently handled synchronously in userspace
> >>>>>>> virtio.  This prevents the vcpu from executing guest code while
> >>>>>>> hardware emulation code handles the notify.
> >>>>>>>
> >>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to
> >>>>>>> make virtqueue notify a lightweight exit by deferring hardware
> >>>>>>> emulation to the iothread and allowing the VM to continue execution.
> >>>>>>> This model is similar to how vhost receives virtqueue notifies.
> >>>>>>>
> >>>>>>> The result of this change is improved performance for userspace
> >>>>>>> virtio devices.  Virtio-blk throughput increases especially for
> >>>>>>> multithreaded scenarios and virtio-net transmit throughput increases
> >>>>>>> substantially.
> >>>>>>>
> >>>>>>> Some virtio devices are known to have guest drivers which expect a
> >>>>>>> notify to be processed synchronously and spin waiting for completion.
> >>>>>>> Only enable ioeventfd for virtio-blk and virtio-net for now.
> >>>>>>>
> >>>>>>> Care must be taken not to interfere with vhost-net, which uses host
> >>>>>>> notifiers.  If the set_host_notifier() API is used by a device,
> >>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
> >>>>>>> host notifiers as it wishes.
> >>>>>>>
> >>>>>>> After migration and on VM change state (running/paused),
> >>>>>>> virtio-ioeventfd will enable/disable itself.
> >>>>>>>
> >>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
> >>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
> >>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
> >>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
> >>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
> >>>>>>>
> >>>>>>> Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
> >>>>>>
> >>>>>> On current git master I'm getting hangs when running iozone on a
> >>>>>> virtio-blk disk.  "Hang" means that it's not responsive any more and
> >>>>>> has 100% CPU consumption.
> >>>>>>
> >>>>>> I bisected the problem to this patch.  Any ideas?
> >>>>>>
> >>>>>> Kevin
> >>>>>
> >>>>> Does it help if you set ioeventfd=off on the command line?
> >>>>
> >>>> Yes, with ioeventfd=off it seems to work fine.
> >>>>
> >>>> Kevin
> >>>
> >>> Then it's the ioeventfd that is to blame.
> >>> Is it the io thread that consumes 100% CPU?
> >>> Or the vcpu thread?
> >>
> >> I was building with the default options, i.e. there is no IO thread.
> >>
> >> Now I'm just running the test with IO threads enabled, and so far
> >> everything looks good.  So I can only reproduce the problem with IO
> >> threads disabled.
> >
> > Hrm... aio uses SIGUSR2 to force the vcpu to process aio completions
> > (relevant when --enable-io-thread is not used).  I will take a look at
> > that again and see why we're spinning without checking for ioeventfd
> > completion.

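As background, the mechanism the commit message above describes is thin: an
eventfd is bound to the virtio notify register with the KVM_IOEVENTFD ioctl,
so the guest's notify write is completed inside the kernel and surfaces as a
readable file descriptor in the iothread instead of a heavyweight exit to the
vcpu thread.  A rough, hypothetical sketch (not the actual virtio-pci code;
register size and error handling simplified):

#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Bind an eventfd to the virtio-pci queue notify register so that a guest
 * PIO write of `queue_index` to that address is handled entirely in the
 * kernel: KVM signals the eventfd and resumes the guest immediately. */
static int assign_notify_ioeventfd(int vm_fd, uint64_t notify_pio_addr,
                                   uint16_t queue_index)
{
    int efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    struct kvm_ioeventfd args = {
        .datamatch = queue_index,          /* only this queue's kick */
        .addr      = notify_pio_addr,      /* VIRTIO_PCI_QUEUE_NOTIFY */
        .len       = 2,                    /* 16-bit register write */
        .fd        = efd,
        .flags     = KVM_IOEVENTFD_FLAG_PIO |
                     KVM_IOEVENTFD_FLAG_DATAMATCH,
    };

    if (ioctl(vm_fd, KVM_IOEVENTFD, &args) < 0) {
        close(efd);
        return -1;
    }
    return efd;   /* the iothread select()s/read()s this fd */
}

The vcpu never comes back to userspace for the kick; the iothread's select()
loop sees the eventfd become readable and processes the virtqueue from there.
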
> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
> please correct me.
>
> When I/O thread is disabled our only thread runs guest code until an
> exit request is made.  There are synchronous exit cases like a halt
> instruction or single step.  There are also asynchronous exit cases
> when signal handlers use qemu_notify_event(), which does cpu_exit(),
> to set env->exit_request = 1 and unlink the current tb.
>
> With this structure in mind, anything which needs to interrupt the
> vcpu in order to process events must use signals and
> qemu_notify_event().  Otherwise that event source may be starved and
> never processed.
>
> virtio-ioeventfd currently does not use signals and will therefore
> never interrupt the vcpu.
>
> However, you normally don't notice the missing signal handler because
> some other event interrupts the vcpu and we enter select(2) to process
> all pending handlers.  So virtio-ioeventfd mostly gets a free ride on
> top of timer events.  This is suboptimal because it adds latency to
> virtqueue kick - we're waiting for another event to interrupt the vcpu
> before we can process the virtqueue kick.
>
> If any other vcpu interruption makes virtio-ioeventfd chug along, then
> why are you seeing 100% CPU livelock?  My theory is that dynticks has
> a race condition which causes timers to stop working in QEMU.  Here is
> an strace of QEMU --disable-io-thread entering livelock.  I can
> trigger this by starting a VM and running "while true; do true; done"
> at the shell.  Then strace the QEMU process:
>
> 08:04:34.985177 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
> 08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 273000}}, NULL) = 0
> 08:04:34.985646 ioctl(11, KVM_RUN, 0) = -1 EINTR (Interrupted system call)
> 08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
> 08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
> 08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
> 08:04:34.986406 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.986465 ioctl(11, KVM_RUN, 0) = 0      <--- guest finishes execution
>
>                 v--- dynticks_rearm_timer() returns early because the timer is already scheduled
> 08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
> 08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) ---      <--- timer expires
> 08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986710 rt_sigreturn(0x2758ad0) = 0
>
>                 v--- we re-enter the guest without rearming the timer!
> 08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
> [QEMU hang, 100% CPU]

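To spell out what that annotation refers to: the rearm path bails out when
the POSIX timer still has time left on it.  A simplified sketch of that check,
paraphrased from memory rather than copied from QEMU:

#include <stdint.h>
#include <string.h>
#include <time.h>

/* Simplified rendition of the dynticks rearm check.  If the host timer
 * still has time left we skip rearming it.  The race: the timer can fire
 * between this check and the ioctl(KVM_RUN) that follows, and nothing
 * rearms it afterwards, so no further SIGALRM ever arrives. */
static void dynticks_rearm_timer_sketch(timer_t host_timer,
                                        int64_t delta_ns /* < 1s assumed */)
{
    struct itimerspec cur, next;

    if (timer_gettime(host_timer, &cur) == 0 &&
        (cur.it_value.tv_sec || cur.it_value.tv_nsec)) {
        /* "Already armed" -- but it may expire before we reach KVM_RUN. */
        return;
    }

    memset(&next, 0, sizeof(next));
    next.it_value.tv_nsec = delta_ns;
    timer_settime(host_timer, 0, &next, NULL);
}

Once that early return is taken and the pending SIGALRM is consumed before
guest entry, the guest runs with no timer armed at all, which matches the
strace above.
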
> So dynticks fails to rearm the timer before we enter the guest.  This
> is a race condition: we see that a timer is already scheduled and head
> on towards re-entering the guest, the timer expires before we actually
> enter the guest, and we then re-enter the guest without realizing the
> timer has expired.  Now we're inside the guest with no hope of a timer
> expiring - and the guest is running a CPU-bound workload that doesn't
> need to perform I/O.
>
> The result is a hung QEMU (the screen does not update) and a softlockup
> inside the guest once we do kick it to life again (by detaching strace).
>
> I think the only way to avoid this race condition in dynticks is to
> mask SIGALRM, then check whether the timer has expired, and then do
> ioctl(KVM_RUN) with an atomic signal mask change back to SIGALRM
> enabled.  Thoughts?

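For what it's worth, KVM already provides the hook for that pselect-style
dance: KVM_SET_SIGNAL_MASK applies a given signal mask only for the duration
of KVM_RUN.  A rough, untested sketch of the idea (function name and error
handling are illustrative, not a proposed patch):

#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Keep SIGALRM blocked while we test/rearm the dyntick timer, then let
 * KVM unblock it only for the duration of KVM_RUN (the same trick
 * pselect(2) uses).  A SIGALRM that fires once we are in the guest then
 * forces KVM_RUN to return with EINTR instead of being lost. */
static int run_vcpu_alarm_safe(int vcpu_fd)
{
    sigset_t block_alarm, run_mask;
    struct kvm_signal_mask *kmask;
    char buf[sizeof(*kmask) + sizeof(sigset_t)];

    /* 1. Block SIGALRM so it cannot sneak in between the timer check
     *    and guest entry. */
    sigemptyset(&block_alarm);
    sigaddset(&block_alarm, SIGALRM);
    sigprocmask(SIG_BLOCK, &block_alarm, &run_mask);

    /* 2. Check/rearm the dyntick timer here -- a pending SIGALRM stays
     *    queued rather than slipping through unnoticed. */

    /* 3. Hand KVM a mask with SIGALRM unblocked; it is swapped in and
     *    out atomically around guest execution.  (In practice this
     *    could be set up once rather than on every entry.) */
    sigdelset(&run_mask, SIGALRM);
    kmask = (struct kvm_signal_mask *)buf;
    memset(buf, 0, sizeof(buf));
    kmask->len = 8;                 /* kernel-side sigset size on x86-64 */
    memcpy(kmask->sigset, &run_mask, sizeof(run_mask));
    if (ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask) < 0)
        return -1;

    return ioctl(vcpu_fd, KVM_RUN, 0);
}

With SIGALRM blocked everywhere except inside KVM_RUN, the window goes away:
either the signal is already pending at entry, in which case KVM_RUN returns
immediately, or it arrives inside the guest and kicks us back out.
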
> Back to virtio-ioeventfd, we really shouldn't use virtio-ioeventfd
> when there is no I/O thread.

Can we make it work with SIGIO?

> It doesn't make sense because there's no opportunity to process the
> virtqueue while the guest code is executing in parallel like there is
> with the I/O thread.  It will just degrade performance when QEMU only
> has one thread.

Probably.  But it's really better to check this than to theorise about it.

> I'll send a patch to disable it when we build without the I/O thread.
>
> Stefan

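To make the SIGIO question concrete: the usual way to get a signal out of a
file descriptor is signal-driven I/O, roughly as below.  Whether eventfd
actually supports O_ASYNC is exactly the kind of thing that needs checking
rather than theorising; the handler name is made up, and qemu_notify_event()
is QEMU's existing kick function mentioned above.

#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

void qemu_notify_event(void);   /* QEMU's existing "interrupt the vcpu" hook */

/* Hypothetical handler: kick the vcpu so the main loop runs and the
 * ioeventfd gets serviced without waiting for the next timer tick. */
static void ioeventfd_sigio_handler(int signo)
{
    qemu_notify_event();
}

/* Put an fd into signal-driven I/O mode so readability raises SIGIO. */
static int enable_sigio(int fd)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = ioeventfd_sigio_handler;
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    return fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
}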