On 06.02.15 03:54, David Gibson wrote:
> On Thu, Feb 05, 2015 at 12:55:45PM +0100, Alexander Graf wrote:
>>
>>
>> On 05.02.15 12:30, David Gibson wrote:
>>> On Thu, Feb 05, 2015 at 11:22:13AM +0100, Alexander Graf wrote:
> [snip]
>>>>>>>>>> [snip]
>>>>>>>>>>
>>>>>>>>>>> +    ret1 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_LOAD);
>>>>>>>>>>> +    if (ret1 != 0) {
>>>>>>>>>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_LOAD in KVM:"
>>>>>>>>>>> +                " %s\n", strerror(errno));
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    ret2 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_STORE);
>>>>>>>>>>> +    if (ret2 != 0) {
>>>>>>>>>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_STORE in KVM:"
>>>>>>>>>>> +                " %s\n", strerror(errno));
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    if ((ret1 != 0) || (ret2 != 0)) {
>>>>>>>>>>> +        fprintf(stderr, "Warning: Couldn't enable H_LOGICAL_CI_* in KVM, SLOF"
>>>>>>>>>>> +                " may be unable to operate devices with in-kernel emulation\n");
>>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>> You'll always get these warnings if you're running on an old (meaning
>>>>>>>>>> current upstream) kernel, which could be annoying.
>>>>>>>>>
>>>>>>>>> True.
>>>>>>>>>
>>>>>>>>>> Is there any way to tell whether you have configured any devices
>>>>>>>>>> which need the in-kernel MMIO emulation and only warn if you have?
>>>>>>>>>
>>>>>>>>> In theory, I guess so.  In practice I can't see how you'd enumerate
>>>>>>>>> all devices that might require kernel intervention without something
>>>>>>>>> horribly invasive.
>>>>>>>>
>>>>>>>> We could WARN_ONCE in QEMU if we emulate such a hypercall, but its
>>>>>>>> handler is io_mem_unassigned (or we add another minimum priority huge
>>>>>>>> memory region on all 64 bits of address space that reports the
>>>>>>>> breakage).
>>>>>>>
>>>>>>> Would that work for the virtio+iothread case?  I had the impression
>>>>>>> the kernel-handled notification region was layered over the qemu
>>>>>>> emulated region in that case.
>>>>>>
>>>>>> IIRC we don't have a way to call back into kvm saying "please write to
>>>>>> this in-kernel device".  But we could at least defer the warning to a
>>>>>> point where we know that we actually hit it.
>>>>>
>>>>> Right, but I'm saying we might miss the warning in cases where we want
>>>>> it, because the KVM device is shadowed by a qemu device, so qemu won't
>>>>> see the IO as unassigned or unhandled.
>>>>>
>>>>> In particular, I think that will happen in the case of virtio-blk with
>>>>> iothread, which is the simplest case in which to observe the problem.
>>>>> The virtio-blk device exists in qemu and is functional, but we rely on
>>>>> KVM catching the queue notification MMIO before it reaches the qemu
>>>>> implementation of the rest of the device's IO space.
>>>>
>>>> But in that case the VM stays functional and will merely see a
>>>> performance hit when using virtio in SLOF, no?  I don't think that's
>>>> a problem worth worrying users about.
>>>
>>> Alas, no.  The iothread stuff *relies* on the in-kernel notification,
>>> so it will not work if the IO gets punted to qemu.  This is the whole
>>> reason for the in-kernel hcall implementation.
>>
>> So at least with vhost-net the in-kernel trapping is optional.  If we
>> happen to get MMIO into QEMU, we'll just handle it there.
>>
>> Enlighten me why the iothread stuff can't handle it that way too.
>
> So, as I understand it, it could, but it doesn't.
> Working out how to fix it properly requires better understanding of the
> dataplane code than I currently possess.
>
> So, using virtio-blk as the example case.  Normally the queue notify
> mmio will get routed by the general virtio code to
> virtio_blk_handle_output().
>
> In the case of dataplane, that just calls
> virtio_blk_data_plane_start().  So the first time we get a vq notify,
> the dataplane is started.  That sets up the host notifier
> (VirtioBusClass::set_host_notifier -> virtio_pci_set_host_notifier ->
> virtio_pci_set_host_notifier_internal -> memory_region_add_eventfd() ->
> memory_region_transaction_commit() -> address_space_update_ioeventfds ->
> address_space_add_del_ioeventfds -> kvm_mem_ioeventfd_add ->
> kvm_set_ioeventfd_mmio -> KVM_IOEVENTFD ioctl).
>
> From this point on, further calls to virtio_blk_handle_output() are
> IIUC a "can't happen", because vq notifies should go to the eventfd
> instead, where they will kick the iothread.
>
> So, with SLOF, the first request is ok - it hits
> virtio_blk_handle_output(), which starts the iothread, which goes on
> to process the request.
>
> On the second request, however, we get back into
> virtio_blk_data_plane_start(), which sees the iothread is already
> running and aborts.  I think it is assuming that this must be the
> result of a race with another vcpu starting the dataplane, and so
> assumes the racing thread will have woken the dataplane, which will
> then handle this vcpu's request as well.
>
> In our case, however, the IO hcalls go through to
> virtio_blk_handle_output() when the dataplane is already going, and
> become no-ops without waking it up again to handle the new request.
>
> Enlightened enough yet?
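If I read the dataplane code right, the behaviour you describe boils down
to something like the following.  This is only a simplified sketch from
memory, not the actual code; in particular the "started" flag and the
exact call site are assumptions on my part:

/* virtio_blk_handle_output() routes a queue notify that reached QEMU here: */
void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
{
    if (s->started) {
        /*
         * Assumed to be a benign race with another vcpu, which is expected
         * to have kicked the iothread already.  With SLOF's hcall-mediated
         * IO nobody has, so the notify that brought us here is dropped.
         */
        return;
    }

    /* ... register the host notifier (ioeventfd) and start the iothread ... */
    s->started = true;
}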
So reading this, it sounds like we could just add logic in the virtio
dataplane code that allows for a graceful fallback to QEMU-based MMIO by
triggering the eventfd itself in the MMIO handler.  When going via this
slow path, we should of course emit a warning (once) to the user ;).

Stefan, what do you think?


Alex
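P.S.: In code, roughly what I have in mind is the following.  Again, only
a sketch: apart from event_notifier_set() and
virtio_queue_get_host_notifier(), the helper names are made up for
illustration and not taken from the actual tree.

static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
{
    VirtIOBlock *s = VIRTIO_BLK(vdev);
    static bool warned;

    if (s->dataplane) {
        if (!virtio_blk_data_plane_started(s->dataplane)) { /* assumed helper */
            virtio_blk_data_plane_start(s->dataplane);
            return;
        }
        /*
         * Graceful fallback: the notify reached QEMU even though the
         * dataplane is already running, so kick its ioeventfd by hand and
         * tell the user (once) that we're taking the slow path.
         */
        if (!warned) {
            fprintf(stderr, "virtio-blk: queue notify handled in userspace, "
                    "falling back to slow path\n");
            warned = true;
        }
        event_notifier_set(virtio_queue_get_host_notifier(vq));
        return;
    }

    /* non-dataplane case: process the virtqueue in QEMU as before */
}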