On Tue, Jan 25, 2011 at 11:27 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Tue, Jan 25, 2011 at 09:49:04AM +0000, Stefan Hajnoczi wrote:
>> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> > On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kw...@redhat.com> wrote:
>> >> Am 24.01.2011 20:47, schrieb Michael S. Tsirkin:
>> >>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>> >>>> Am 24.01.2011 20:36, schrieb Michael S. Tsirkin:
>> >>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>> >>>>>> Am 12.12.2010 16:02, schrieb Stefan Hajnoczi:
>> >>>>>>> Virtqueue notify is currently handled synchronously in userspace virtio.
>> >>>>>>> This prevents the vcpu from executing guest code while hardware
>> >>>>>>> emulation code handles the notify.
>> >>>>>>>
>> >>>>>>> On systems that support KVM, the ioeventfd mechanism can be used to
>> >>>>>>> make virtqueue notify a lightweight exit by deferring hardware
>> >>>>>>> emulation to the iothread and allowing the VM to continue execution.
>> >>>>>>> This model is similar to how vhost receives virtqueue notifies.
>> >>>>>>>
>> >>>>>>> The result of this change is improved performance for userspace
>> >>>>>>> virtio devices. Virtio-blk throughput increases especially for
>> >>>>>>> multithreaded scenarios and virtio-net transmit throughput increases
>> >>>>>>> substantially.
>> >>>>>>>
>> >>>>>>> Some virtio devices are known to have guest drivers which expect a
>> >>>>>>> notify to be processed synchronously and spin waiting for completion.
>> >>>>>>> Only enable ioeventfd for virtio-blk and virtio-net for now.
>> >>>>>>>
>> >>>>>>> Care must be taken not to interfere with vhost-net, which uses host
>> >>>>>>> notifiers. If the set_host_notifier() API is used by a device
>> >>>>>>> virtio-pci will disable virtio-ioeventfd and let the device deal with
>> >>>>>>> host notifiers as it wishes.
>> >>>>>>>
>> >>>>>>> After migration and on VM change state (running/paused)
>> >>>>>>> virtio-ioeventfd will enable/disable itself.
>> >>>>>>>
>> >>>>>>>  * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>> >>>>>>>  * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>> >>>>>>>  * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>> >>>>>>>  * vm_change_state(running=0) -> disable virtio-ioeventfd
>> >>>>>>>  * vm_change_state(running=1) -> enable virtio-ioeventfd
>> >>>>>>>
>> >>>>>>> Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
>> >>>>>>
>> >>>>>> On current git master I'm getting hangs when running iozone on a
>> >>>>>> virtio-blk disk. "Hang" means that it's not responsive any more and
>> >>>>>> has 100% CPU consumption.
>> >>>>>>
>> >>>>>> I bisected the problem to this patch. Any ideas?
>> >>>>>>
>> >>>>>> Kevin
>> >>>>>
>> >>>>> Does it help if you set ioeventfd=off on command line?
>> >>>>
>> >>>> Yes, with ioeventfd=off it seems to work fine.
>> >>>>
>> >>>> Kevin
>> >>>
>> >>> Then it's the ioeventfd that is to blame.
>> >>> Is it the io thread that consumes 100% CPU?
>> >>> Or the vcpu thread?
>> >>
>> >> I was building with the default options, i.e. there is no IO thread.
>> >>
>> >> Now I'm just running the test with IO threads enabled, and so far
>> >> everything looks good. So I can only reproduce the problem with IO
>> >> threads disabled.
>> >
>> > Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
>> > (relevant when --enable-io-thread is not used). I will take a look at
>> > that again and see why we're spinning without checking for ioeventfd
>> > completion.
>>
>> Here's my understanding of --disable-io-thread. Added Anthony on CC,
>> please correct me.
>>
>> When I/O thread is disabled our only thread runs guest code until an
>> exit request is made. There are synchronous exit cases like a halt
>> instruction or single step. There are also asynchronous exit cases
>> when signal handlers use qemu_notify_event(), which does cpu_exit(),
>> to set env->exit_request = 1 and unlink the current tb.
>>
>> With this structure in mind, anything which needs to interrupt the
>> vcpu in order to process events must use signals and
>> qemu_notify_event(). Otherwise that event source may be starved and
>> never processed.
>>
>> virtio-ioeventfd currently does not use signals and will therefore
>> never interrupt the vcpu.
>>
>> However, you normally don't notice the missing signal handler because
>> some other event interrupts the vcpu and we enter select(2) to process
>> all pending handlers. So virtio-ioeventfd mostly gets a free ride on
>> top of timer events. This is suboptimal because it adds latency to
>> virtqueue kick - we're waiting for another event to interrupt the vcpu
>> before we can process the virtqueue kick.
>>
>> If any other vcpu interruption makes virtio-ioeventfd chug along, then
>> why are you seeing 100% CPU livelock? My theory is that dynticks has
>> a race condition which causes timers to stop working in QEMU. Here is
>> an strace of QEMU --disable-io-thread entering livelock. I can
>> trigger this by starting a VM and running "while true; do true; done"
>> at the shell. Then strace the QEMU process:
>>
>> 08:04:34.985177 ioctl(11, KVM_RUN, 0)   = 0
>> 08:04:34.985242 --- SIGALRM (Alarm clock) @ 0 (0) ---
>> 08:04:34.985319 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
>> 08:04:34.985368 rt_sigreturn(0x2758ad0) = 0
>> 08:04:34.985423 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
>> 08:04:34.985484 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
>> 08:04:34.985538 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
>> 08:04:34.985588 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 273000}}, NULL) = 0
>> 08:04:34.985646 ioctl(11, KVM_RUN, 0)   = -1 EINTR (Interrupted system call)
>> 08:04:34.985928 --- SIGALRM (Alarm clock) @ 0 (0) ---
>> 08:04:34.986007 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
>> 08:04:34.986063 rt_sigreturn(0x2758ad0) = -1 EINTR (Interrupted system call)
>> 08:04:34.986124 select(15, [5 8 14], [], [], {0, 0}) = 1 (in [5], left {0, 0})
>> 08:04:34.986188 read(5, "\1\0\0\0\0\0\0\0", 512) = 8
>> 08:04:34.986246 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
>> 08:04:34.986299 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
>> 08:04:34.986359 ioctl(11, KVM_INTERRUPT, 0x7fff90404ef0) = 0
>> 08:04:34.986406 ioctl(11, KVM_RUN, 0)   = 0
>> 08:04:34.986465 ioctl(11, KVM_RUN, 0)   = 0    <--- guest finishes execution
>>
>>                 v--- dynticks_rearm_timer() returns early because
>>                      timer is already scheduled
>> 08:04:34.986533 timer_gettime(0, {it_interval={0, 0}, it_value={0, 24203}}) = 0
>> 08:04:34.986585 --- SIGALRM (Alarm clock) @ 0 (0) ---    <--- timer expires
>> 08:04:34.986661 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
>> 08:04:34.986710 rt_sigreturn(0x2758ad0) = 0
>>
>>                 v--- we re-enter the guest without rearming the timer!
>> 08:04:34.986754 ioctl(11, KVM_RUN^C <unfinished ...>
>> [QEMU hang, 100% CPU]
>>
>> So dynticks fails to rearm the timer before we enter the guest. This
>> is a race condition: we check that there is already a timer scheduled
>> and head on towards re-entering the guest; the timer expires before we
>> enter the guest; we re-enter the guest without realizing the timer has
>> expired. Now we're inside the guest with no hope of a timer
>> expiring - and the guest is running a CPU-bound workload that doesn't
>> need to perform I/O.
>>
>> The result is a hung QEMU (screen does not update) and a softlockup
>> inside the guest once we do kick it to life again (by detaching
>> strace).
>>
>> I think the only way to avoid this race condition in dynticks is to
>> mask SIGALRM, then check if the timer expired, and then ioctl(KVM_RUN)
>> with an atomic signal mask change back to SIGALRM enabled. Thoughts?
>>
>> Back to virtio-ioeventfd, we really shouldn't use virtio-ioeventfd
>> when there is no I/O thread.
>
> Can we make it work with SIGIO?
>
>> It doesn't make sense because there's no
>> opportunity to process the virtqueue while the guest code is executing
>> in parallel like there is with I/O thread. It will just degrade
>> performance when QEMU only has one thread.
>
> Probably. But it's really better to check this than theorise about
> it.
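On the dynticks race, here is a minimal sketch of the mask/check/atomic-unmask
idea proposed above, using the KVM_SET_SIGNAL_MASK ioctl. This is only a
sketch, not the actual QEMU code; process_pending_timers() is a hypothetical
stand-in for running expired QEMU timers and rearming the dynticks timer, and
it assumes a SIGALRM handler is already installed:

/*
 * Sketch only: keep SIGALRM blocked in the vcpu thread and let KVM unblock
 * it atomically for the duration of KVM_RUN via KVM_SET_SIGNAL_MASK.
 */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

extern void process_pending_timers(void);   /* hypothetical helper */

static void setup_run_sigmask(int vcpu_fd)
{
    sigset_t blocked, run_mask;
    struct kvm_signal_mask *kmask;

    /* Block SIGALRM for all userspace code in this thread... */
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGALRM);
    sigprocmask(SIG_BLOCK, &blocked, NULL);

    /* ...but have KVM unblock it while the guest runs, so an expiring
     * timer always interrupts KVM_RUN instead of firing in the window
     * between our expiry check and guest entry. */
    sigprocmask(SIG_BLOCK, NULL, &run_mask);
    sigdelset(&run_mask, SIGALRM);

    kmask = malloc(sizeof(*kmask) + sizeof(run_mask));
    kmask->len = 8;   /* size of the kernel's sigset_t on Linux x86-64;
                         the kernel only reads the first len bytes */
    memcpy(kmask->sigset, &run_mask, sizeof(run_mask));
    ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, kmask);
    free(kmask);
}

static void vcpu_loop(int vcpu_fd)
{
    sigset_t pending, alrm;
    struct timespec zero = { 0, 0 };

    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);

    for (;;) {
        /* SIGALRM cannot be delivered here, so this check cannot race
         * with the timer expiring just before guest entry. */
        sigpending(&pending);
        if (sigismember(&pending, SIGALRM)) {
            sigtimedwait(&alrm, NULL, &zero);   /* consume the pending signal */
            process_pending_timers();
        }

        /* SIGALRM is unblocked only inside this ioctl; if it fires, the
         * ioctl returns -1/EINTR and we go around the loop again. */
        ioctl(vcpu_fd, KVM_RUN, 0);
    }
}

Because SIGALRM is blocked everywhere except inside KVM_RUN, an expiry can no
longer slip in between the check and guest entry; at worst it is delivered
right after entering KVM_RUN, which then returns with EINTR.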
As for making it work with SIGIO: eventfd does not seem to support O_ASYNC.
After adding the necessary code into QEMU no signals were firing, so I wrote
a test:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>       /* getpid(), sleep() */
#include <sys/wait.h>     /* wait() */
#include <sys/eventfd.h>

int main(int argc, char **argv)
{
    int fd = eventfd(0, 0);
    if (fd < 0) {
        perror("eventfd");
        exit(1);
    }

    /* Ask for SIGTERM (instead of the default SIGIO) to be sent to this
     * process when the eventfd becomes readable. */
    if (fcntl(fd, F_SETSIG, SIGTERM) < 0) {
        perror("fcntl(F_SETSIG)");
        exit(1);
    }
    if (fcntl(fd, F_SETOWN, getpid()) < 0) {
        perror("fcntl(F_SETOWN)");
        exit(1);
    }
    if (fcntl(fd, F_SETFL, O_NONBLOCK | O_ASYNC) < 0) {
        perror("fcntl(F_SETFL)");
        exit(1);
    }

    switch (fork()) {
    case -1:
        perror("fork");
        exit(1);
    case 0: /* child: make the eventfd readable */
        eventfd_write(fd, 1);
        exit(0);
    default: /* parent */
        break;
    }

    sleep(5);
    wait(NULL);
    close(fd);
    return 0;
}

I'd expect the parent to get a SIGTERM, but the process just sleeps and then
exits. When replacing the eventfd with a pipe in this program the parent does
receive the SIGTERM (a sketch of that pipe variant follows below).

Stefan
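For completeness, a sketch of that pipe variant: the same
F_SETSIG/F_SETOWN/O_ASYNC setup, but on the read end of a pipe instead of an
eventfd. It is not the exact program I ran; the SIGTERM handler is only there
so the delivery shows up as a message rather than terminating the parent:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

static void on_sigterm(int signum)
{
    /* write(2) is async-signal-safe, printf(3) is not */
    const char msg[] = "SIGTERM delivered\n";
    (void)signum;
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}

int main(int argc, char **argv)
{
    int fds[2];

    if (pipe(fds) < 0) {
        perror("pipe");
        exit(1);
    }
    signal(SIGTERM, on_sigterm);

    if (fcntl(fds[0], F_SETSIG, SIGTERM) < 0 ||
        fcntl(fds[0], F_SETOWN, getpid()) < 0 ||
        fcntl(fds[0], F_SETFL, O_NONBLOCK | O_ASYNC) < 0) {
        perror("fcntl");
        exit(1);
    }

    switch (fork()) {
    case -1:
        perror("fork");
        exit(1);
    case 0: /* child: make the read end readable */
        write(fds[1], "x", 1);
        exit(0);
    default: /* parent */
        break;
    }

    sleep(5);   /* interrupted early by SIGTERM once the child writes */
    wait(NULL);
    close(fds[0]);
    close(fds[1]);
    return 0;
}

Swapping the pipe back for an eventfd in this program reproduces the silence
described above.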