On Tue, Jan 25, 2011 at 09:49:04AM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> > On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kw...@redhat.com> wrote:
> >> On 24.01.2011 20:47, Michael S. Tsirkin wrote:
> >>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
> >>>> On 24.01.2011 20:36, Michael S. Tsirkin wrote:
> >>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
> >>>>>> On 12.12.2010 16:02, Stefan Hajnoczi wrote:
> >>>>>>> Virtqueue notify is currently handled synchronously in userspace
> >>>>>>> virtio.  This prevents the vcpu from executing guest code while
> >>>>>>> hardware emulation code handles the notify.
> >>>>>>>
> >>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to
> >>>>>>> make virtqueue notify a lightweight exit by deferring hardware
> >>>>>>> emulation to the iothread and allowing the VM to continue execution.
> >>>>>>> This model is similar to how vhost receives virtqueue notifies.
> >>>>>>>
> >>>>>>> The result of this change is improved performance for userspace
> >>>>>>> virtio devices.  Virtio-blk throughput increases especially for
> >>>>>>> multithreaded scenarios and virtio-net transmit throughput increases
> >>>>>>> substantially.
> >>>>>>>
> >>>>>>> Some virtio devices are known to have guest drivers which expect a
> >>>>>>> notify to be processed synchronously and spin waiting for completion.
> >>>>>>> Only enable ioeventfd for virtio-blk and virtio-net for now.
> >>>>>>>
> >>>>>>> Care must be taken not to interfere with vhost-net, which uses host
> >>>>>>> notifiers.  If the set_host_notifier() API is used by a device,
> >>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
> >>>>>>> host notifiers as it wishes.
> >>>>>>>
> >>>>>>> After migration and on VM change state (running/paused),
> >>>>>>> virtio-ioeventfd will enable/disable itself.
> >>>>>>>
> >>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
> >>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
> >>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
> >>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
> >>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
> >>>>>>>
> >>>>>>> Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
> >>>>>>
> >>>>>> On current git master I'm getting hangs when running iozone on a
> >>>>>> virtio-blk disk.  "Hang" means that it's not responsive any more and
> >>>>>> has 100% CPU consumption.
> >>>>>>
> >>>>>> I bisected the problem to this patch.  Any ideas?
> >>>>>>
> >>>>>> Kevin
> >>>>>
> >>>>> Does it help if you set ioeventfd=off on the command line?
> >>>>
> >>>> Yes, with ioeventfd=off it seems to work fine.
> >>>>
> >>>> Kevin
> >>>
> >>> Then it's the ioeventfd that is to blame.
> >>> Is it the io thread that consumes 100% CPU?
> >>> Or the vcpu thread?
> >>
> >> I was building with the default options, i.e. there is no IO thread.
> >>
> >> Now I'm just running the test with IO threads enabled, and so far
> >> everything looks good.  So I can only reproduce the problem with IO
> >> threads disabled.
> >
> > Hrm... aio uses SIGUSR2 to force the vcpu to process aio completions
> > (relevant when --enable-io-thread is not used).  I will take a look at
> > that again and see why we're spinning without checking for ioeventfd
> > completion.

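As background, the mechanism the commit message above describes is thin: an
eventfd is bound to the virtio notify register with the KVM_IOEVENTFD ioctl,
so the guest's notify write is completed inside the kernel and surfaces as a
readable file descriptor in the iothread instead of a heavyweight exit to the
vcpu thread.  A rough, hypothetical sketch (not the actual virtio-pci code;
register size and error handling simplified):

#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Bind an eventfd to the virtio-pci queue notify register so that a guest
 * PIO write of `queue_index` to that address is handled entirely in the
 * kernel: KVM signals the eventfd and resumes the guest immediately. */
static int assign_notify_ioeventfd(int vm_fd, uint64_t notify_pio_addr,
                                   uint16_t queue_index)
{
    int efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    struct kvm_ioeventfd args = {
        .datamatch = queue_index,          /* only this queue's kick */
        .addr      = notify_pio_addr,      /* VIRTIO_PCI_QUEUE_NOTIFY */
        .len       = 2,                    /* 16-bit register write */
        .fd        = efd,
        .flags     = KVM_IOEVENTFD_FLAG_PIO |
                     KVM_IOEVENTFD_FLAG_DATAMATCH,
    };

    if (ioctl(vm_fd, KVM_IOEVENTFD, &args) < 0) {
        close(efd);
        return -1;
    }
    return efd;   /* the iothread select()s/read()s this fd */
}

The vcpu never comes back to userspace for the kick; the iothread's select()
loop sees the eventfd become readable and processes the virtqueue from there.
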
> Here's my understanding of --disable-io-thread.  Added Anthony on CC,
> please correct me.
>
> When I/O thread is disabled our only thread runs guest code until an
> exit request is made.  There are synchronous exit cases like a halt
> instruction or single step.  There are also asynchronous exit cases
> when signal handlers use qemu_notify_event(), which does cpu_exit(),
> to set env->exit_request = 1 and unlink the current tb.
>
> With this structure in mind, anything which needs to interrupt the
> vcpu in order to process events must use signals and
> qemu_notify_event().  Otherwise that event source may be starved and
> never processed.
>
> virtio-ioeventfd currently does not use signals and will therefore
> never interrupt the vcpu.
>
> However, you normally don't notice the missing signal handler because
> some other event interrupts the vcpu and we enter select(2) to process
> all pending handlers.  So virtio-ioeventfd mostly gets a free ride on
> top of timer events.  This is suboptimal because it adds latency to
> virtqueue kick - we're waiting for another event to interrupt the vcpu
> before we can process the virtqueue kick.
>
> If any other vcpu interruption makes virtio-ioeventfd chug along, then
> why are you seeing 100% CPU livelock?  My theory is that dynticks has
> a race condition which causes timers to stop working in QEMU.  Here is
> an strace of QEMU --disable-io-thread entering livelock.  I can
> trigger this by starting a VM and running "while true; do true; done"
> at the shell.  Then strace the QEMU process:
>
> 08:04:34.985177 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
> 08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 273000}}, NULL) = 0
> 08:04:34.985646 ioctl(11, KVM_RUN, 0) = -1 EINTR (Interrupted system call)
> 08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
> 08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
> 08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
> 08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
> 08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
> 08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
> 08:04:34.986406 ioctl(11, KVM_RUN, 0) = 0
> 08:04:34.986465 ioctl(11, KVM_RUN, 0) = 0      <--- guest finishes execution
>
>                 v--- dynticks_rearm_timer() returns early because the timer is already scheduled
> 08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
> 08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) ---      <--- timer expires
> 08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> 08:04:34.986710 rt_sigreturn(0x2758ad0) = 0
>
>                 v--- we re-enter the guest without rearming the timer!
> 08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
> [QEMU hang, 100% CPU]

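To spell out what that annotation refers to: the rearm path bails out when
the POSIX timer still has time left on it.  A simplified sketch of that check,
paraphrased from memory rather than copied from QEMU:

#include <stdint.h>
#include <string.h>
#include <time.h>

/* Simplified rendition of the dynticks rearm check.  If the host timer
 * still has time left we skip rearming it.  The race: the timer can fire
 * between this check and the ioctl(KVM_RUN) that follows, and nothing
 * rearms it afterwards, so no further SIGALRM ever arrives. */
static void dynticks_rearm_timer_sketch(timer_t host_timer,
                                        int64_t delta_ns /* < 1s assumed */)
{
    struct itimerspec cur, next;

    if (timer_gettime(host_timer, &cur) == 0 &&
        (cur.it_value.tv_sec || cur.it_value.tv_nsec)) {
        /* "Already armed" -- but it may expire before we reach KVM_RUN. */
        return;
    }

    memset(&next, 0, sizeof(next));
    next.it_value.tv_nsec = delta_ns;
    timer_settime(host_timer, 0, &next, NULL);
}

Once that early return is taken and the pending SIGALRM is consumed before
guest entry, the guest runs with no timer armed at all, which matches the
strace above.
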
> So dynticks fails to rearm the timer before we enter the guest.  This
> is a race condition: we see that a timer is already scheduled and head
> on towards re-entering the guest, the timer expires before we actually
> enter the guest, and we then re-enter the guest without realizing the
> timer has expired.  Now we're inside the guest with no hope of a timer
> expiring - and the guest is running a CPU-bound workload that doesn't
> need to perform I/O.
>
> The result is a hung QEMU (the screen does not update) and a softlockup
> inside the guest once we do kick it to life again (by detaching strace).
>
> I think the only way to avoid this race condition in dynticks is to
> mask SIGALRM, then check whether the timer has expired, and then do
> ioctl(KVM_RUN) with an atomic signal mask change back to SIGALRM
> enabled.  Thoughts?

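For what it's worth, KVM already provides the hook for that pselect-style
dance: KVM_SET_SIGNAL_MASK applies a given signal mask only for the duration
of KVM_RUN.  A rough, untested sketch of the idea (function name and error
handling are illustrative, not a proposed patch):

#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Keep SIGALRM blocked while we test/rearm the dyntick timer, then let
 * KVM unblock it only for the duration of KVM_RUN (the same trick
 * pselect(2) uses).  A SIGALRM that fires once we are in the guest then
 * forces KVM_RUN to return with EINTR instead of being lost. */
static int run_vcpu_alarm_safe(int vcpu_fd)
{
    sigset_t block_alarm, run_mask;
    struct kvm_signal_mask *kmask;
    char buf[sizeof(*kmask) + sizeof(sigset_t)];

    /* 1. Block SIGALRM so it cannot sneak in between the timer check
     *    and guest entry. */
    sigemptyset(&block_alarm);
    sigaddset(&block_alarm, SIGALRM);
    sigprocmask(SIG_BLOCK, &block_alarm, &run_mask);

    /* 2. Check/rearm the dyntick timer here -- a pending SIGALRM stays
     *    queued rather than slipping through unnoticed. */

    /* 3. Hand KVM a mask with SIGALRM unblocked; it is swapped in and
     *    out atomically around guest execution.  (In practice this
     *    could be set up once rather than on every entry.) */
    sigdelset(&run_mask, SIGALRM);
    kmask = (struct kvm_signal_mask *)buf;
    memset(buf, 0, sizeof(buf));
    kmask->len = 8;                 /* kernel-side sigset size on x86-64 */
    memcpy(kmask->sigset, &run_mask, sizeof(run_mask));
    if (ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask) < 0)
        return -1;

    return ioctl(vcpu_fd, KVM_RUN, 0);
}

With SIGALRM blocked everywhere except inside KVM_RUN, the window goes away:
either the signal is already pending at entry, in which case KVM_RUN returns
immediately, or it arrives inside the guest and kicks us back out.
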
> Back to virtio-ioeventfd, we really shouldn't use virtio-ioeventfd
> when there is no I/O thread.

Can we make it work with SIGIO?

> It doesn't make sense because there's no opportunity to process the
> virtqueue while the guest code is executing in parallel like there is
> with the I/O thread.  It will just degrade performance when QEMU only
> has one thread.

Probably.  But it's really better to check this than to theorise about it.

> I'll send a patch to disable it when we build without the I/O thread.
>
> Stefan

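To make the SIGIO question concrete: the usual way to get a signal out of a
file descriptor is signal-driven I/O, roughly as below.  Whether eventfd
actually supports O_ASYNC is exactly the kind of thing that needs checking
rather than theorising; the handler name is made up, and qemu_notify_event()
is QEMU's existing kick function mentioned above.

#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

void qemu_notify_event(void);   /* QEMU's existing "interrupt the vcpu" hook */

/* Hypothetical handler: kick the vcpu so the main loop runs and the
 * ioeventfd gets serviced without waiting for the next timer tick. */
static void ioeventfd_sigio_handler(int signo)
{
    qemu_notify_event();
}

/* Put an fd into signal-driven I/O mode so readability raises SIGIO. */
static int enable_sigio(int fd)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = ioeventfd_sigio_handler;
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    return fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
}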