On 09.02.15 01:37, David Gibson wrote:
> On Fri, 06 Feb 2015 08:56:32 +0100
> Alexander Graf <ag...@suse.de> wrote:
>
>>
>>
>> On 06.02.15 03:54, David Gibson wrote:
>>> On Thu, Feb 05, 2015 at 12:55:45PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 05.02.15 12:30, David Gibson wrote:
>>>>> On Thu, Feb 05, 2015 at 11:22:13AM +0100, Alexander Graf wrote:
>>> [snip]
>>>>>>>>>>>> [snip]
>>>>>>>>>>>>
>>>>>>>>>>>>> +        ret1 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_LOAD);
>>>>>>>>>>>>> +        if (ret1 != 0) {
>>>>>>>>>>>>> +            fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_LOAD in KVM:"
>>>>>>>>>>>>> +                    " %s\n", strerror(errno));
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        ret2 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_STORE);
>>>>>>>>>>>>> +        if (ret2 != 0) {
>>>>>>>>>>>>> +            fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_STORE in KVM:"
>>>>>>>>>>>>> +                    " %s\n", strerror(errno));
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        if ((ret1 != 0) || (ret2 != 0)) {
>>>>>>>>>>>>> +            fprintf(stderr, "Warning: Couldn't enable H_LOGICAL_CI_* in KVM, SLOF"
>>>>>>>>>>>>> +                    " may be unable to operate devices with in-kernel emulation\n");
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>
>>>>>>>>>>>> You'll always get these warnings if you're running on an old
>>>>>>>>>>>> (meaning current upstream) kernel, which could be annoying.
>>>>>>>>>>>
>>>>>>>>>>> True.
>>>>>>>>>>>
>>>>>>>>>>>> Is there any way to tell whether you have configured any devices
>>>>>>>>>>>> which need the in-kernel MMIO emulation and only warn if you have?
>>>>>>>>>>>
>>>>>>>>>>> In theory, I guess so. In practice I can't see how you'd enumerate
>>>>>>>>>>> all devices that might require kernel intervention without something
>>>>>>>>>>> horribly invasive.
>>>>>>>>>>
>>>>>>>>>> We could WARN_ONCE in QEMU if we emulate such a hypercall, but its
>>>>>>>>>> handler is io_mem_unassigned (or we add another minimum priority huge
>>>>>>>>>> memory region on all 64bits of address space that reports the
>>>>>>>>>> breakage).
>>>>>>>>>
>>>>>>>>> Would that work for the virtio+iothread case? I had the impression
>>>>>>>>> the kernel handled notification region was layered over the qemu
>>>>>>>>> emulated region in that case.
>>>>>>>>
>>>>>>>> IIRC we don't have a way to call back into kvm saying "please write to
>>>>>>>> this in-kernel device". But we could at least defer the warning to a
>>>>>>>> point where we know that we actually hit it.
>>>>>>>
>>>>>>> Right, but I'm saying we might miss the warning in cases where we want
>>>>>>> it, because the KVM device is shadowed by a qemu device, so qemu won't
>>>>>>> see the IO as unassigned or unhandled.
>>>>>>>
>>>>>>> In particular, I think that will happen in the case of virtio-blk with
>>>>>>> iothread, which is the simplest case in which to observe the problem.
>>>>>>> The virtio-blk device exists in qemu and is functional, but we rely on
>>>>>>> KVM catching the queue notification MMIO before it reaches the qemu
>>>>>>> implementation of the rest of the device's IO space.
>>>>>>
>>>>>> But in that case the VM stays functional and will merely see a
>>>>>> performance hit when using virtio in SLOF, no? I don't think that's
>>>>>> a problem worth worrying users about.
>>>>>
>>>>> Alas, no. The iothread stuff *relies* on the in-kernel notification,
>>>>> so it will not work if the IO gets punted to qemu. This is the whole
>>>>> reason for the in-kernel hcall implementation.
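
As a side note, the "minimum priority huge memory region" idea quoted above could look
roughly like the sketch below: a background region spanning the whole address space,
registered at a very low priority so that every real device region wins, whose accessors
report the breakage once. This is only an illustration against QEMU's memory API; the
names (breakage_ops, spapr_add_breakage_catchall) and the priority value are made up,
not actual code.

#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "exec/memory.h"

static uint64_t breakage_read(void *opaque, hwaddr addr, unsigned size)
{
    static bool warned;

    if (!warned) {
        error_report("access at 0x%" HWADDR_PRIx " reached the catch-all "
                     "region; if this came from QEMU's H_LOGICAL_CI_* "
                     "emulation, in-kernel device emulation may not work",
                     addr);
        warned = true;
    }
    return 0;
}

static void breakage_write(void *opaque, hwaddr addr, uint64_t val,
                           unsigned size)
{
    breakage_read(opaque, addr, size);
}

static const MemoryRegionOps breakage_ops = {
    .read = breakage_read,
    .write = breakage_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};

static MemoryRegion breakage_mr;

/* Cover the whole 64-bit address space at a very low priority so that
 * every real device region takes precedence; only otherwise-unhandled
 * accesses land here. */
static void spapr_add_breakage_catchall(MemoryRegion *system_memory)
{
    memory_region_init_io(&breakage_mr, NULL, &breakage_ops, NULL,
                          "hcall-ci-breakage", UINT64_MAX);
    memory_region_add_subregion_overlap(system_memory, 0, &breakage_mr, -1000);
}

Whether such a catch-all would fire usefully for the virtio+iothread case is exactly the
problem David raises below, since there the MMIO is claimed by the QEMU device model
rather than left unassigned.
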
>>>>
>>>> So at least with vhost-net the in-kernel trapping is optional. If we
>>>> happen to get MMIO into QEMU, we'll just handle it there.
>>>>
>>>> Enlighten me why the iothread stuff can't handle it that way too.
>>>
>>> So, as I understand it, it could, but it doesn't. Working out how to
>>> fix it properly requires a better understanding of the dataplane code
>>> than I currently possess.
>>>
>>> So, using virtio-blk as the example case. Normally the queue notify
>>> mmio will get routed by the general virtio code to
>>> virtio_blk_handle_output().
>>>
>>> In the case of dataplane, that just calls
>>> virtio_blk_data_plane_start(). So the first time we get a vq notify,
>>> the dataplane is started. That sets up the host notifier
>>> (VirtioBusClass::set_host_notifier -> virtio_pci_set_host_notifier ->
>>> virtio_pci_set_host_notifier_internal -> memory_region_add_eventfd()
>>> -> memory_region_transaction_commit() ->
>>> address_space_update_ioeventfds -> address_space_add_del_ioeventfds ->
>>> kvm_mem_ioeventfd_add -> kvm_set_ioeventfd_mmio -> KVM_IOEVENTFD
>>> ioctl)
>>>
>>> From this point on further calls to virtio_blk_handle_output() are
>>> IIUC a "can't happen", because vq notifies should go to the eventfd
>>> instead, where they will kick the iothread.
>>>
>>> So, with SLOF, the first request is ok - it hits
>>> virtio_blk_handle_output() which starts the iothread which goes on to
>>> process the request.
>>>
>>> On the second request, however, we get back into
>>> virtio_blk_data_plane_start() which sees the iothread is already
>>> running and aborts. I think it is assuming that this must be the
>>> result of a race with another vcpu starting the dataplane, and so
>>> assumes the racing thread will have woken the dataplane which will
>>> then handle this vcpu's request as well.
>>>
>>> In our case, however, the IO hcalls go through to
>>> virtio_blk_handle_output() when the dataplane is already going, and
>>> become no-ops without waking it up again to handle the new request.
>>>
>>> Enlightened enough yet?
>>
>> So reading this, it sounds like we could just add logic in the virtio
>> dataplane code that allows for a graceful fallback to QEMU-based MMIO by
>> triggering the eventfd itself in the MMIO handler. When going via this
>> slow path, we should of course emit a warning (once) to the user ;).
>>
>> Stefan, what do you think?
>
> So, as I understand it this should be possible. I did even have a
> draft which did this. However, I don't know the dataplane well enough
> to know what gotchas there might be in terms of races, and therefore
> how to do this quite right.
I'm sure Stefan knows :).

> Note that this doesn't remove the need for the in-kernel H_LOGICAL_CI_*
> hcalls, because those will still be necessary if we get real in-kernel
> emulated devices in future.

Oh, I definitely agree on that part. I just want to make sure we
gracefully run on older host kernels with newer QEMU versions.


Alex
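
For reference, the fallback discussed above might be sketched roughly as below: if a
queue notify reaches QEMU through the MMIO/hcall path even though the dataplane has
already set up its host notifier, kick the eventfd instead of letting the notify become
a no-op, and warn once. event_notifier_set() and virtio_queue_get_host_notifier() are
the existing QEMU APIs; the function name and the dataplane_started flag are
placeholders, not the real dataplane code.

#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "qemu/event_notifier.h"
#include "hw/virtio/virtio.h"

/* Hypothetical hook on the QEMU-side notify path (e.g. reached via
 * H_LOGICAL_CI_STORE from SLOF) for a queue whose dataplane is running. */
static void virtio_blk_notify_fallback(VirtQueue *vq, bool dataplane_started)
{
    static bool warned;

    if (!dataplane_started) {
        return; /* first notify: let the dataplane start as it does today */
    }

    if (!warned) {
        error_report("queue notify took the slow QEMU MMIO path; "
                     "relaying it to the dataplane via its eventfd");
        warned = true;
    }

    /* Wake the iothread exactly as a kernel-handled notify would,
     * instead of silently dropping the notification. */
    event_notifier_set(virtio_queue_get_host_notifier(vq));
}

Relaying through the eventfd keeps all ring processing in the iothread rather than the
vcpu thread, though, as David notes, whether this is free of start/stop races would need
a closer look at the dataplane code.
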