On Tue, Jan 25, 2011 at 7:51 PM, Anthony Liguori <aligu...@linux.vnet.ibm.com> wrote:
> On 01/25/2011 01:45 PM, Stefan Hajnoczi wrote:
>> On Tue, Jan 25, 2011 at 7:18 PM, Anthony Liguori
>> <aligu...@linux.vnet.ibm.com> wrote:
>>> On 01/25/2011 03:49 AM, Stefan Hajnoczi wrote:
>>>> On Tue, Jan 25, 2011 at 7:12 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>>> On Mon, Jan 24, 2011 at 8:05 PM, Kevin Wolf <kw...@redhat.com> wrote:
>>>>>> On 24.01.2011 20:47, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Jan 24, 2011 at 08:48:05PM +0100, Kevin Wolf wrote:
>>>>>>>> On 24.01.2011 20:36, Michael S. Tsirkin wrote:
>>>>>>>>> On Mon, Jan 24, 2011 at 07:54:20PM +0100, Kevin Wolf wrote:
>>>>>>>>>> On 12.12.2010 16:02, Stefan Hajnoczi wrote:
>>>>>>>>>>> Virtqueue notify is currently handled synchronously in
>>>>>>>>>>> userspace virtio. This prevents the vcpu from executing guest
>>>>>>>>>>> code while hardware emulation code handles the notify.
>>>>>>>>>>>
>>>>>>>>>>> On systems that support KVM, the ioeventfd mechanism can be
>>>>>>>>>>> used to make virtqueue notify a lightweight exit by deferring
>>>>>>>>>>> hardware emulation to the iothread and allowing the VM to
>>>>>>>>>>> continue execution. This model is similar to how vhost
>>>>>>>>>>> receives virtqueue notifies.
>>>>>>>>>>>
>>>>>>>>>>> The result of this change is improved performance for
>>>>>>>>>>> userspace virtio devices. Virtio-blk throughput increases
>>>>>>>>>>> especially for multithreaded scenarios and virtio-net
>>>>>>>>>>> transmit throughput increases substantially.
>>>>>>>>>>>
>>>>>>>>>>> Some virtio devices are known to have guest drivers which
>>>>>>>>>>> expect a notify to be processed synchronously and spin
>>>>>>>>>>> waiting for completion. Only enable ioeventfd for virtio-blk
>>>>>>>>>>> and virtio-net for now.
>>>>>>>>>>>
>>>>>>>>>>> Care must be taken not to interfere with vhost-net, which
>>>>>>>>>>> uses host notifiers. If the set_host_notifier() API is used
>>>>>>>>>>> by a device, virtio-pci will disable virtio-ioeventfd and let
>>>>>>>>>>> the device deal with host notifiers as it wishes.
>>>>>>>>>>>
>>>>>>>>>>> After migration and on VM change state (running/paused)
>>>>>>>>>>> virtio-ioeventfd will enable/disable itself.
>>>>>>>>>>>
>>>>>>>>>>> * VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
>>>>>>>>>>> * !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
>>>>>>>>>>> * virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
>>>>>>>>>>> * vm_change_state(running=0) -> disable virtio-ioeventfd
>>>>>>>>>>> * vm_change_state(running=1) -> enable virtio-ioeventfd
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
>>>>>>>>>>
>>>>>>>>>> On current git master I'm getting hangs when running iozone on
>>>>>>>>>> a virtio-blk disk. "Hang" means that it's not responsive any
>>>>>>>>>> more and has 100% CPU consumption.
>>>>>>>>>>
>>>>>>>>>> I bisected the problem to this patch. Any ideas?
>>>>>>>>>>
>>>>>>>>>> Kevin
>>>>>>>>>
>>>>>>>>> Does it help if you set ioeventfd=off on command line?
>>>>>>>>
>>>>>>>> Yes, with ioeventfd=off it seems to work fine.
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>
>>>>>>> Then it's the ioeventfd that is to blame.
>>>>>>> Is it the io thread that consumes 100% CPU?
>>>>>>> Or the vcpu thread?
>>>>>>
>>>>>> I was building with the default options, i.e. there is no IO thread.
>>>>>>
>>>>>> Now I'm just running the test with IO threads enabled, and so far
>>>>>> everything looks good. So I can only reproduce the problem with IO
>>>>>> threads disabled.
>>>>>
>>>>> Hrm...aio uses SIGUSR2 to force the vcpu to process aio completions
>>>>> (relevant when --enable-io-thread is not used). I will take a look
>>>>> at that again and see why we're spinning without checking for
>>>>> ioeventfd completion.
>>>>
>>>> Here's my understanding of --disable-io-thread. Added Anthony on CC,
>>>> please correct me.
>>>>
>>>> When I/O thread is disabled our only thread runs guest code until an
>>>> exit request is made. There are synchronous exit cases like a halt
>>>> instruction or single step. There are also asynchronous exit cases
>>>> when signal handlers use qemu_notify_event(), which does cpu_exit(),
>>>> to set env->exit_request = 1 and unlink the current tb.
>>>
>>> Correct.
>>>
>>> Note that this is a problem today. If you have a tight loop in TCG
>>> and you have nothing that would generate a signal (no pending disk
>>> I/O and no periodic timer) then the main loop is starved.
>>
>> Even with KVM we can spin inside the guest and get a softlockup due to
>> the dynticks race condition shown above. In a CPU bound guest that's
>> doing no I/O it's possible to go AWOL for extended periods of time.
>
> This is a different race. I need to look more deeply into the code.

int kvm_cpu_exec(CPUState *env)
{
    struct kvm_run *run = env->kvm_run;
    int ret;

    DPRINTF("kvm_cpu_exec()\n");

    do {

This is broken because a signal handler could change env->exit_request
after this check:

#ifndef CONFIG_IOTHREAD
        if (env->exit_request) {
            DPRINTF("interrupt exit requested\n");
            ret = 0;
            break;
        }
#endif

        if (kvm_arch_process_irqchip_events(env)) {
            ret = 0;
            break;
        }

        if (env->kvm_vcpu_dirty) {
            kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
            env->kvm_vcpu_dirty = 0;
        }

        kvm_arch_pre_run(env, run);
        cpu_single_env = NULL;
        qemu_mutex_unlock_iothread();

env->exit_request might be set but we still reenter, possibly without
rearming the timer:

        ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);

>> I can think of two solutions:
>> 1. Block SIGALRM during critical regions, not sure if the necessary
>> atomic signal mask capabilities are there in KVM. Haven't looked at
>> TCG yet either.
>> 2. Make a portion of the timer code signal-safe and rearm the timer
>> from within the SIGALRM handler.
>
> Or, switch to timerfd and stop using a signal based alarm timer.

Doesn't work for !CONFIG_IOTHREAD.

Stefan
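For illustration only, a rough and untested sketch of what option 1 above
could look like for the KVM case, if I'm reading the KVM API right: keep
SIGALRM blocked while checking env->exit_request, and have the kernel
unblock it atomically only for the duration of KVM_RUN via the
KVM_SET_SIGNAL_MASK vcpu ioctl (analogous to how pselect(2) handles its
sigmask). The vcpu_fd parameter, the helper names, and the len = 8 detail
are assumptions made up for this sketch, not code from the patch series:

/* Sketch only (untested); block_sigalrm(), set_kvm_run_sigmask() and
 * vcpu_fd are hypothetical names used for illustration. */
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Block SIGALRM in the (single) vcpu thread so it cannot fire between
 * the env->exit_request check and the KVM_RUN ioctl. */
static void block_sigalrm(sigset_t *oldset)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_BLOCK, &set, oldset);
}

/* Ask KVM to apply a mask with SIGALRM unblocked, atomically, while the
 * vcpu sits inside KVM_RUN.  A SIGALRM that arrives after the
 * exit_request check is then delivered inside KVM_RUN and makes it
 * return -EINTR instead of being lost, so the outer loop gets a chance
 * to rearm the timer and honour exit_request. */
static int set_kvm_run_sigmask(int vcpu_fd, const sigset_t *blocked)
{
    struct kvm_signal_mask *mask;
    sigset_t run_set = *blocked;
    int ret;

    sigdelset(&run_set, SIGALRM);

    mask = malloc(sizeof(*mask) + sizeof(run_set));
    if (!mask) {
        return -1;
    }
    mask->len = 8;  /* assumption: kernel-side sigset size on x86-64 */
    memcpy(mask->sigset, &run_set, sizeof(run_set));
    ret = ioctl(vcpu_fd, KVM_SET_SIGNAL_MASK, mask);
    free(mask);
    return ret;
}

The vcpu loop would call block_sigalrm() and set_kvm_run_sigmask() once
at setup, and afterwards only test env->exit_request while SIGALRM is
blocked before each KVM_RUN. Whether TCG has an equivalent atomic
unblock point is the open question from item 1 above.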