Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/14 00:32, Mario Smarduch wrote: > Hi Laszlo, > > couple observations. > > I'm wondering if access from qemu and guest won't > result in mixed memory attributes and if that's acceptable > to the CPU. Normally this would be a problem I think (Jon raised the topic of live migration). However, for flash programming specifically, I think the guest's access pattern ensures that we'll see things OK. When the guest issues the first write access, the memslot is deleted, and everything is forwarded to qemu, both reads and writes. In response qemu modifies the array that *otherwise* backs the flash. These modifications by qemu end up in the dcache mostly. When the guest is done "programming", it writes a special command (read array mode) at which point the memslot is recreated (as read-only) and flushed / set up for flushing during demand paging. So from the emulated flash POV, the memslot either doesn't exist at all (and then qemu serves all accesses just fine), or it exists r/o, at which point qemu (host userspace) will have stopped writing to it, and will have set it up for flushing before and during guest read accesses. > Also is if you update memory from qemu you may break > dirty page logging/migration. Very probably. Jon said the same thing. > Unless there is some other way > you keep track. Of course it may not be applicable in your > case (i.e. flash unused after boot). The flash *is* used after boot, because the UEFI runtime variable services *are* exercised by the guest kernel. However those use the same access pattern (it's the same set of UEFI services just called by a different "client"). *Uncoordinated* access from guest and host in parallel will be a big problem; but we're not that far yet, and we need to get the flash problem sorted, so that we can at least boot and work on the basic stuff. The flash programming dance happens to provide coordination; the flash mode changes (which are equivalent to the teardown and the recreation of the memslot) can be considered barriers. I hope this is acceptable for the time being... Thanks Laszlo > > - Mario > > On 11/17/2014 07:49 AM, Laszlo Ersek wrote: >> On 11/17/14 16:29, Paolo Bonzini wrote: >>> >>> >>> On 17/11/2014 15:58, Ard Biesheuvel wrote: Readonly memslots are often used to implement emulation of ROMs and NOR flashes, in which case the guest may legally map these regions as uncached. To deal with the incoherency associated with uncached guest mappings, treat all readonly memslots as incoherent, and ensure that pages that belong to regions tagged as such are flushed to DRAM before being passed to the guest. >>> >>> On x86, the processor combines the cacheability values from the two >>> levels of page tables. Is there no way to do the same on ARM? >> >> Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict >> (Device non-Gathering, non-Reordering, no Early Write Acknowledgement -- >> for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax >> host) memory attributes. >> >> When qemu writes, as part of emulating the flash programming commands, >> to the RAMBlock that *otherwise* backs the flash range (as a r/o >> memslot), those writes (from host userspace) tend to end up in dcache. >> >> But, when the guest flips back the flash to romd mode, and tries to read >> back the values from the flash as plain ROM, the dcache is completely >> bypassed due to the strict stage1 mapping, and the guest goes directly >> to DRAM. >> >> Where qemu's earlier writes are not yet / necessarily visible. >> >> Please see my original patch (which was incomplete) in the attachment, >> it has a very verbose commit message. >> >> Anyway, I'll let others explain; they can word it better than I can :) >> >> FWIW, >> >> Series >> Reviewed-by: Laszlo Ersek >> >> I ported this series to a 3.17.0+ based kernel, and tested it. It works >> fine. The ROM-like view of the NOR flash now reflects the previously >> programmed contents. >> >> Series >> Tested-by: Laszlo Ersek >> >> Thanks! >> Laszlo >> >> >> >> ___ >> kvmarm mailing list >> kvm...@lists.cs.columbia.edu >> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm >> > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v1 0/2] Define some VFIO interfaces for VT-d Posted-Interrupts
VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. You can find the VT-d Posted-Interrtups Spec. in the following URL: http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html This patch set does the following things: - Define a new VFIO group KVM_DEV_VFIO_INTERRUPT and it's attributes KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ. Qemu can use this interface to configure VT-d PI when guest updates the interrupt configuration (MSI/MSI-X configuration). - Define a new VFIO API: vfio_msi_get_irq(), which can be used by KVM to get the host irq of the assigned devices. Then KVM can update the associated IRTE for VT-d PI. Feng Wu (2): vfio: Add new interrupt group for VFIO vfio: Add VFIO API vfio_msi_get_irq Documentation/virtual/kvm/devices/vfio.txt |8 drivers/vfio/pci/vfio_pci.c| 10 ++ include/linux/vfio.h |2 ++ include/uapi/linux/kvm.h | 14 ++ 4 files changed, 34 insertions(+), 0 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq
This API returns the host irq for the MSI/MSI-X interrrupts. Signed-off-by: Feng Wu --- drivers/vfio/pci/vfio_pci.c | 10 ++ include/linux/vfio.h|2 ++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 9558da3..4fb9828 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1009,6 +1009,16 @@ put_devs: kfree(devs.devices); } +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool msix) +{ + struct vfio_pci_device *vdev = + (struct vfio_pci_device *)vfio_device_data(device); + struct pci_dev *pdev = vdev->pdev; + + return msix ? vdev->msix[vector].vector : pdev->irq + vector; +} +EXPORT_SYMBOL_GPL(vfio_msi_get_irq); + static void __exit vfio_pci_cleanup(void) { pci_unregister_driver(&vfio_pci_driver); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index d320411..007ca55 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver( /* * External user API */ +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, + bool msix); extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern int vfio_external_user_iommu_id(struct vfio_group *group); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO
Add new group KVM_DEV_VFIO_INTERRUPT and command KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it. This is used for VT-d Posted-Interrupts setup. Signed-off-by: Feng Wu --- Documentation/virtual/kvm/devices/vfio.txt |8 include/uapi/linux/kvm.h | 14 ++ 2 files changed, 22 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt index ef51740..bd99176 100644 --- a/Documentation/virtual/kvm/devices/vfio.txt +++ b/Documentation/virtual/kvm/devices/vfio.txt @@ -13,6 +13,7 @@ VFIO-group is held by KVM. Groups: KVM_DEV_VFIO_GROUP + KVM_DEV_VFIO_INTERRUPT KVM_DEV_VFIO_GROUP attributes: KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes: For each, kvm_device_attr.addr points to an int32_t file descriptor for the VFIO group. + +KVM_DEV_VFIO_INTERRUPT attributes: + KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for +VT-d Posted-Interrrupts + +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which +include the needed information for VT-d Posted-Interrupts setup. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6076882..5544fcc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -943,9 +943,23 @@ struct kvm_device_attr { __u64 addr; /* userspace address of attr data */ }; +struct virq_info { + __u32 index; /* index of the msi/msix entry */ + int virq; /* virq of the interrupt */ +}; + +struct kvm_posted_intr { + __u32 fd; /* file descriptor of the VFIO device */ + __u32 count; + boolmsix; + struct virq_info virq_info[0]; +}; + #define KVM_DEV_VFIO_GROUP1 #define KVM_DEV_VFIO_GROUP_ADD 1 #define KVM_DEV_VFIO_GROUP_DEL 2 +#define KVM_DEV_VFIO_INTERRUPT2 +#define KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ 1 enum kvm_device_type { KVM_DEV_TYPE_FSL_MPIC_20= 1, -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
"Suresh E. Warrier" writes: > This patch adds trace points in the guest entry and exit code and also > for exceptions handled by the host in kernel mode - hypercalls and page > faults. The new events are added to /sys/kernel/debug/tracing/events > under a new subsystem called kvm_hv. > /* Set this explicitly in case thread 0 doesn't have a vcpu */ > @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) > > vc->vcore_state = VCORE_RUNNING; > preempt_disable(); > + > + trace_kvmppc_run_core(vc, 0); > + > spin_unlock(&vc->lock); Do we really want to call tracepoint with spin lock held ? Is that a good thing to do ?. -aneesh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] vfio: put off the allocation of "minor" in vfio_create_group
The next code fragment "list_for_each_entry" is not depend on "minor". With this patch, the free of "minor" in "list_for_each_entry" can be reduced, and there is no functional change. Signed-off-by: Zhen Lei --- drivers/vfio/vfio.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index f018d8d..737eb468 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group) mutex_lock(&vfio.group_lock); - minor = vfio_alloc_group_minor(group); - if (minor < 0) { - vfio_group_unlock_and_free(group); - return ERR_PTR(minor); - } - /* Did we race creating this group? */ list_for_each_entry(tmp, &vfio.group_list, vfio_next) { if (tmp->iommu_group == iommu_group) { vfio_group_get(tmp); - vfio_free_group_minor(minor); vfio_group_unlock_and_free(group); return tmp; } } + minor = vfio_alloc_group_minor(group); + if (minor < 0) { + vfio_group_unlock_and_free(group); + return ERR_PTR(minor); + } + dev = device_create(vfio.class, NULL, MKDEV(MAJOR(vfio.group_devt), minor), group, "%d", iommu_group_id(iommu_group)); -- 1.8.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
Am 10.11.2014 um 22:07 schrieb Linus Torvalds: [...] > So before blacklisting any compilers, let's first see if > > (a) we can actually make it a real rule that we only use ACCESS_ONCE on > scalars > (b) we can somehow enforce this with a compiler warning/error for mis-uses > > For example, the attached patch works for some cases, but shows how we > use ACCESS_ONCE() on pointers to pte_t's etc, so it doesn't come even > close to compiling the whole kernel. But I wonder how painful that > would be to change.. The places where it complains are actually > somewhat debatable to begin with, like: > > - handle_pte_fault(.. pte_t *pte ..): > > entry = ACCESS_ONCE(*pte); > > and the thing is, "pte" is actually possibly an 8-byte entity on > x86-32, and that ACCESS_ONCE() fundamentally will be two 32-byte > reads. > > So there is a very valid argument for saying "well, you shouldn't do > that, then", and that we might be better off cleaning up our > ACCESS_ONCE() uses, than to just blindly blacklist compilers. > > NOTE! I'm not at all advocating the attached patch. I'm sending it out > white-space damaged on purpose, it's more of a "hey, something like > this might be the direction we want to go in", with the spinlock.h > part of the patch also acting as an example of the kind of changes the > "ACCESS_ONCE() only works on scalars" rule would require. So I tried to see if I can come up with some results on how often this problem happens... [...] > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index d5ad7b1118fc..63e82f1dfc1a 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -378,7 +378,11 @@ void ftrace_likely_update(struct > ftrace_branch_data *f, int val, int expect); > * use is to mediate communication between process-level code and irq/NMI > * handlers, all running on the same CPU. > */ > -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) > +#define get_scalar_volatile_pointer(x) ({ \ > + typeof(x) *__p = &(x); \ > + volatile typeof(x) *__vp = __p; \ > + (void)(long)*__p; __vp; }) > +#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x)) ..and just took this patch. On s390 is pretty much clean with allyesconfig In fact with the siif lock changed only the pte/pmd cases you mentioned trigger a compile error: mm/memory.c: In function 'handle_pte_fault': mm/memory.c:3203:2: error: aggregate value used where an integer was expected entry = ACCESS_ONCE(*pte); mm/rmap.c: In function 'mm_find_pmd': mm/rmap.c:584:2: error: aggregate value used where an integer was expected pmde = ACCESS_ONCE(*pmd); Here a barrier() might be a good solution as well, I guess. On x86 allyesconfig its almost the same. - we need your spinlock changes (well, something different to make it compile) - we need to fix pmd and pte - we have gup_get_pte in arch/x86/mm/gup.c getting a ptep So It looks like we could make a change to ACCESS_ONCE. Would something like CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start? This would boil down to Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set + docu update + comments Patch2: Change mm/* to barriers Patch3: Change x86 locks Patch4: Change x86 gup Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86 Makes sense? Christian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On 19.11.14 22:54, Suresh E. Warrier wrote: > > > On 11/14/2014 04:56 AM, Alexander Graf wrote: >> >> >> >>> Am 14.11.2014 um 00:29 schrieb Suresh E. Warrier >>> : >>> >>> This patch adds trace points in the guest entry and exit code and also >>> for exceptions handled by the host in kernel mode - hypercalls and page >>> faults. The new events are added to /sys/kernel/debug/tracing/events >>> under a new subsystem called kvm_hv. >>> >>> Acked-by: Paul Mackerras >>> Signed-off-by: Suresh Warrier >>> --- >>> arch/powerpc/kvm/book3s_64_mmu_hv.c | 12 +- >>> arch/powerpc/kvm/book3s_hv.c| 19 ++ >>> arch/powerpc/kvm/trace_hv.h | 497 >>> >>> 3 files changed, 525 insertions(+), 3 deletions(-) >>> create mode 100644 arch/powerpc/kvm/trace_hv.h >>> >>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c >>> b/arch/powerpc/kvm/book3s_64_mmu_hv.c >>> index 70feb7b..20cbad1 100644 >>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c >>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c >>> @@ -38,6 +38,7 @@ >>> #include >>> >>> #include "book3s_hv_cma.h" >>> +#include "trace_hv.h" >>> >>> /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */ >>> #define MAX_LPID_97063 >>> @@ -627,6 +628,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, >>> struct kvm_vcpu *vcpu, >>>gfn = gpa >> PAGE_SHIFT; >>>memslot = gfn_to_memslot(kvm, gfn); >>> >>> +trace_kvm_page_fault_enter(vcpu, hpte, memslot, ea, dsisr); >>> + >>>/* No memslot means it's an emulated MMIO region */ >>>if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) >>>return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, >>> @@ -639,6 +642,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, >>> struct kvm_vcpu *vcpu, >>>mmu_seq = kvm->mmu_notifier_seq; >>>smp_rmb(); >>> >>> +ret = -EFAULT; >>>is_io = 0; >>>pfn = 0; >>>page = NULL; >>> @@ -662,7 +666,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, >>> struct kvm_vcpu *vcpu, >>>} >>>up_read(¤t->mm->mmap_sem); >>>if (!pfn) >>> -return -EFAULT; >>> +goto out_put; >>>} else { >>>page = pages[0]; >>>if (PageHuge(page)) { >>> @@ -690,14 +694,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, >>> struct kvm_vcpu *vcpu, >>>pfn = page_to_pfn(page); >>>} >>> >>> -ret = -EFAULT; >>>if (psize > pte_size) >>>goto out_put; >>> >>>/* Check WIMG vs. the actual page we're accessing */ >>>if (!hpte_cache_flags_ok(r, is_io)) { >>>if (is_io) >>> -return -EFAULT; >>> +goto out_put; >>> + >>>/* >>> * Allow guest to map emulated device memory as >>> * uncacheable, but actually make it cacheable. >>> @@ -753,6 +757,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, >>> struct kvm_vcpu *vcpu, >>>SetPageDirty(page); >>> >>> out_put: >>> +trace_kvm_page_fault_exit(vcpu, hpte, ret); >>> + >>>if (page) { >>>/* >>> * We drop pages[0] here, not page because page might >>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >>> index 69d4085..5143d17 100644 >>> --- a/arch/powerpc/kvm/book3s_hv.c >>> +++ b/arch/powerpc/kvm/book3s_hv.c >>> @@ -57,6 +57,9 @@ >>> >>> #include "book3s.h" >>> >>> +#define CREATE_TRACE_POINTS >>> +#include "trace_hv.h" >>> + >>> /* #define EXIT_DEBUG */ >>> /* #define EXIT_DEBUG_SIMPLE */ >>> /* #define EXIT_DEBUG_INT */ >>> @@ -1679,6 +1682,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) >>>list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { >>>kvmppc_start_thread(vcpu); >>>kvmppc_create_dtl_entry(vcpu, vc); >>> +trace_kvm_guest_enter(vcpu); >>>} >>> >>>/* Set this explicitly in case thread 0 doesn't have a vcpu */ >>> @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) >>> >>>vc->vcore_state = VCORE_RUNNING; >>>preempt_disable(); >>> + >>> +trace_kvmppc_run_core(vc, 0); >>> + >>>spin_unlock(&vc->lock); >>> >>>kvm_guest_enter(); >>> @@ -1732,6 +1739,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) >>>kvmppc_core_pending_dec(vcpu)) >>>kvmppc_core_dequeue_dec(vcpu); >>> >>> +trace_kvm_guest_exit(vcpu); >>> + >>>ret = RESUME_GUEST; >>>if (vcpu->arch.trap) >>>ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu, >>> @@ -1757,6 +1766,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) >>>wake_up(&vcpu->arch.cpu_run); >>>} >>>} >>> + >>> +trace_kvmppc_run_core(vc, 1); >>> } >>> >>> /* >>> @@ -1783,11 +1794,13 @@ static void kvmppc_vcore_blocked(struct >>> kvmppc_vcore *vc) >>> >>>prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); >>>vc->vcore_state = VCORE_SLEEPING; >>> +trace_kvmppc_vcore_blocked(vc, 0); >>>spin_unlock(&vc->lock); >>>schedule(); >>>fini
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On 20.11.14 11:40, Aneesh Kumar K.V wrote: > "Suresh E. Warrier" writes: > >> This patch adds trace points in the guest entry and exit code and also >> for exceptions handled by the host in kernel mode - hypercalls and page >> faults. The new events are added to /sys/kernel/debug/tracing/events >> under a new subsystem called kvm_hv. > > > >> /* Set this explicitly in case thread 0 doesn't have a vcpu */ >> @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) >> >> vc->vcore_state = VCORE_RUNNING; >> preempt_disable(); >> + >> +trace_kvmppc_run_core(vc, 0); >> + >> spin_unlock(&vc->lock); > > Do we really want to call tracepoint with spin lock held ? Is that a good > thing to do ?. I thought it was safe to call tracepoints inside of spin lock regions? Steve? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: ia64: remove
2014-11-19 22:05+0100, Paolo Bonzini: > KVM for ia64 has been marked as broken not just once, but twice even, > and the last patch from the maintainer is now roughly 5 years old. > Time for it to rest in piece. > > Signed-off-by: Paolo Bonzini > --- Nice, if only every diffstat was like that! I propose another removal. (The reasoning below wasn't confirmed with ia64 compiler. I'd remove the ioctls even if they worked.) ---8<--- KVM: remove buggy ia64 specific ioctls IA64 is no longer present so new applications shouldn't use them. The main problem is that they most likely didn't work even before, because we have misused ioctl #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) as struct kvm_guest_debug { __u32 control; __u32 pad; struct kvm_guest_debug_arch arch; }; and struct kvm_guest_debug_arch { }; mean that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 thus KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK. Signed-off-by: Radim Krčmář --- include/uapi/linux/kvm.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6d59e5b..a37fd12 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping { #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64) #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64) #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce) -/* IA64 stack access */ -#define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) -#define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) /* Available with KVM_CAP_VCPU_EVENTS */ #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events) #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
ia64 does not need them anymore. Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 16 arch/x86/kvm/Makefile | 5 ++--- {virt => arch/x86}/kvm/ioapic.c | 0 {virt => arch/x86}/kvm/ioapic.h | 1 - {virt => arch/x86}/kvm/irq_comm.c | 4 ++-- arch/x86/kvm/x86.c| 1 + include/linux/kvm_host.h | 22 -- virt/kvm/eventfd.c| 7 --- virt/kvm/kvm_main.c | 3 --- 9 files changed, 29 insertions(+), 30 deletions(-) rename {virt => arch/x86}/kvm/ioapic.c (100%) rename {virt => arch/x86}/kvm/ioapic.h (98%) rename {virt => arch/x86}/kvm/irq_comm.c (98%) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 769db36a3001..76ff3e2d8fd2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -603,6 +603,9 @@ struct kvm_arch { struct kvm_xen_hvm_config xen_hvm_config; + /* reads protected by irq_srcu, writes by irq_lock */ + struct hlist_head mask_notifier_list; + /* fields used by HYPER-V emulation */ u64 hv_guest_os_id; u64 hv_hypercall; @@ -819,6 +822,19 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, const void *val, int bytes); u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); +struct kvm_irq_mask_notifier { + void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked); + int irq; + struct hlist_node link; +}; + +void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, + struct kvm_irq_mask_notifier *kimn); +void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, + struct kvm_irq_mask_notifier *kimn); +void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin, +bool mask); + extern bool tdp_enabled; u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 25d22b2d6509..ee1cd92b03be 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -7,14 +7,13 @@ CFLAGS_vmx.o := -I. KVM := ../../../virt/kvm -kvm-y += $(KVM)/kvm_main.o $(KVM)/ioapic.o \ - $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \ +kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= $(KVM)/assigned-dev.o $(KVM)/iommu.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ - i8254.o cpuid.o pmu.o + i8254.o ioapic.o irq_comm.o cpuid.o pmu.o kvm-intel-y+= vmx.o kvm-amd-y += svm.o diff --git a/virt/kvm/ioapic.c b/arch/x86/kvm/ioapic.c similarity index 100% rename from virt/kvm/ioapic.c rename to arch/x86/kvm/ioapic.c diff --git a/virt/kvm/ioapic.h b/arch/x86/kvm/ioapic.h similarity index 98% rename from virt/kvm/ioapic.h rename to arch/x86/kvm/ioapic.h index dc3baa3a538f..deac8d509f2a 100644 --- a/virt/kvm/ioapic.h +++ b/arch/x86/kvm/ioapic.h @@ -96,7 +96,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, unsigned long *dest_map); int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); -void kvm_vcpu_request_scan_ioapic(struct kvm *kvm); void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, u32 *tmr); diff --git a/virt/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c similarity index 98% rename from virt/kvm/irq_comm.c rename to arch/x86/kvm/irq_comm.c index 1345bde064f5..e9c135b639aa 100644 --- a/virt/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -234,7 +234,7 @@ void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, { mutex_lock(&kvm->irq_lock); kimn->irq = irq; - hlist_add_head_rcu(&kimn->link, &kvm->mask_notifier_list); + hlist_add_head_rcu(&kimn->link, &kvm->arch.mask_notifier_list); mutex_unlock(&kvm->irq_lock); } @@ -256,7 +256,7 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin, idx = srcu_read_lock(&kvm->irq_srcu); gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin); if (gsi != -1) - hlist_for_each_entry_rcu(kimn, &kvm->mask_notifier_list, link) + hlist_for_each_entry_rcu(kimn, &kvm->arch.mask_notifier_list, link) if (kimn->irq == gsi) kimn->func(kimn, mask); srcu_read_unlock(&kvm->irq_srcu, idx); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a8f53a6960fd..5337039427c8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c
Re: [PATCH] KVM: ia64: remove
On 20/11/2014 14:38, Radim Krčmář wrote: > I propose another removal. > (The reasoning below wasn't confirmed with ia64 compiler. > I'd remove the ioctls even if they worked.) Checked and applied, thanks. Paolo > ---8<--- > KVM: remove buggy ia64 specific ioctls > > IA64 is no longer present so new applications shouldn't use them. > > The main problem is that they most likely didn't work even before, > because we have misused ioctl > > #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) > #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) > > as > > struct kvm_guest_debug { > __u32 control; > __u32 pad; > struct kvm_guest_debug_arch arch; > }; > > and > > struct kvm_guest_debug_arch { > }; > > mean that > > sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 > > thus > > KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK > > and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK. > > Signed-off-by: Radim Krčmář > --- > include/uapi/linux/kvm.h | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 6d59e5b..a37fd12 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping { > #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64) > #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64) > #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce) > -/* IA64 stack access */ > -#define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) > -#define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) > /* Available with KVM_CAP_VCPU_EVENTS */ > #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events) > #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events) > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On Thu, 20 Nov 2014 13:10:12 +0100 Alexander Graf wrote: > > > On 20.11.14 11:40, Aneesh Kumar K.V wrote: > > "Suresh E. Warrier" writes: > > > >> This patch adds trace points in the guest entry and exit code and also > >> for exceptions handled by the host in kernel mode - hypercalls and page > >> faults. The new events are added to /sys/kernel/debug/tracing/events > >> under a new subsystem called kvm_hv. > > > > > > > >>/* Set this explicitly in case thread 0 doesn't have a vcpu */ > >> @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) > >> > >>vc->vcore_state = VCORE_RUNNING; > >>preempt_disable(); > >> + > >> + trace_kvmppc_run_core(vc, 0); > >> + > >>spin_unlock(&vc->lock); > > > > Do we really want to call tracepoint with spin lock held ? Is that a good > > thing to do ?. > > I thought it was safe to call tracepoints inside of spin lock regions? > Steve? > There's tracepoints in the guts of the scheduler where rq lock is held. Don't worry about it. The tracing system is lockless. -- Steve -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 01/19] vfio/platform: initial skeleton of VFIO support for platform devices
On Wed, Nov 12, 2014 at 5:49 PM, Alex Williamson wrote: > > On Wed, 2014-11-12 at 11:05 +0100, Eric Auger wrote: > > Hi Antonios, > > > > On 10/27/2014 07:07 PM, Antonios Motakis wrote: > > > This patch forms the common skeleton code for platform devices support > > > with VFIO. This will include the core functionality of VFIO_PLATFORM, > > > however binding to the device and discovering the device resources will > > > be done with the help of a separate file where any Linux platform bus > > > specific code will reside. > > > > > > This will allow us to implement support for also discovering AMBA devices > > > and their resources, but still reuse a large part of the VFIO_PLATFORM > > > implementation. > > > > > > Signed-off-by: Antonios Motakis > > > --- > > > drivers/vfio/platform/vfio_platform_common.c | 126 > > > ++ > > > drivers/vfio/platform/vfio_platform_private.h | 36 > > > 2 files changed, 162 insertions(+) > > > create mode 100644 drivers/vfio/platform/vfio_platform_common.c > > > create mode 100644 drivers/vfio/platform/vfio_platform_private.h > > > > > > diff --git a/drivers/vfio/platform/vfio_platform_common.c > > > b/drivers/vfio/platform/vfio_platform_common.c > > > new file mode 100644 > > > index 000..e0fdbc8 > > > --- /dev/null > > > +++ b/drivers/vfio/platform/vfio_platform_common.c > > > @@ -0,0 +1,126 @@ > > > +/* > > > + * Copyright (C) 2013 - Virtual Open Systems > > > + * Author: Antonios Motakis > > > + * > > > + * This program is free software; you can redistribute it and/or modify > > > + * it under the terms of the GNU General Public License, version 2, as > > > + * published by the Free Software Foundation. > > > + * > > > + * This program is distributed in the hope that it will be useful, > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > > + * GNU General Public License for more details. > > > + */ > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > not sure at that state all the above includes are needed. > > > + > > > +#include "vfio_platform_private.h" > > > + > > > +static void vfio_platform_release(void *device_data) > > > +{ > > > + module_put(THIS_MODULE); > > > +} > > > + > > > +static int vfio_platform_open(void *device_data) > > > +{ > > > + if (!try_module_get(THIS_MODULE)) > > > + return -ENODEV; > > > + > > > + return 0; > > > +} > > > + > > > +static long vfio_platform_ioctl(void *device_data, > > > + unsigned int cmd, unsigned long arg) > > a minor style comment/question that applies on all the series. Shouldn't > > subsequent argument lines rather be aligned with the first char after > > "(", as done in PCI code? > > It's also my preferred style to indent to just after the open paren on > wrapped lines where possible, but I don't think there are hard rules in > CodingStyle or checkpatch that enforce this, so I often let it slide. > Thanks, > You're right that there are no hard coding style rules for this, but I also like this style so I'll apply it more consistently. > Alex > > > > +{ > > > + if (cmd == VFIO_DEVICE_GET_INFO) > > > + return -EINVAL; > > > + > > > + else if (cmd == VFIO_DEVICE_GET_REGION_INFO) > > > + return -EINVAL; > > > + > > > + else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) > > > + return -EINVAL; > > > + > > > + else if (cmd == VFIO_DEVICE_SET_IRQS) > > > + return -EINVAL; > > > + > > > + else if (cmd == VFIO_DEVICE_RESET) > > > + return -EINVAL; > > > + > > > + return -ENOTTY; > > > +} > > > + > > > +static ssize_t vfio_platform_read(void *device_data, char __user *buf, > > > + size_t count, loff_t *ppos) > > > +{ > > > + return -EINVAL; > > > +} > > > + > > > +static ssize_t vfio_platform_write(void *device_data, const char __user > > > *buf, > > > + size_t count, loff_t *ppos) > > > +{ > > > + return -EINVAL; > > > +} > > > + > > > +static int vfio_platform_mmap(void *device_data, struct vm_area_struct > > > *vma) > > > +{ > > > + return -EINVAL; > > > +} > > > + > > > +static const struct vfio_device_ops vfio_platform_ops = { > > > + .name = "vfio-platform", > > > + .open = vfio_platform_open, > > > + .release= vfio_platform_release, > > > + .ioctl = vfio_platform_ioctl, > > > + .read = vfio_platform_read, > > > + .write = vfio_platform_write, > > > + .mmap = vfio_platform_mmap, > > > +}; > > > + > > > +int vfio_platform_probe_common(struct vfio_platform_device *vdev, > > > + struct device *dev) > > > +{ > > > + struct iommu_group *group; > > > + int ret; > > > + > >
Re: [PATCH v9 06/19] vfio/platform: return info for bound device
On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson wrote: > On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote: >> On 10/27/2014 07:07 PM, Antonios Motakis wrote: >> > A VFIO userspace driver will start by opening the VFIO device >> > that corresponds to an IOMMU group, and will use the ioctl interface >> > to get the basic device info, such as number of memory regions and >> > interrupts, and their properties. This patch enables the >> > VFIO_DEVICE_GET_INFO ioctl call. >> > >> > Signed-off-by: Antonios Motakis >> > --- >> > drivers/vfio/platform/vfio_platform_common.c | 23 --- >> > 1 file changed, 20 insertions(+), 3 deletions(-) >> > >> > diff --git a/drivers/vfio/platform/vfio_platform_common.c >> > b/drivers/vfio/platform/vfio_platform_common.c >> > index e0fdbc8..cb20526 100644 >> > --- a/drivers/vfio/platform/vfio_platform_common.c >> > +++ b/drivers/vfio/platform/vfio_platform_common.c >> > @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data) >> > static long vfio_platform_ioctl(void *device_data, >> >unsigned int cmd, unsigned long arg) >> > { >> > - if (cmd == VFIO_DEVICE_GET_INFO) >> > - return -EINVAL; >> > + struct vfio_platform_device *vdev = device_data; >> > + unsigned long minsz; >> > + >> > + if (cmd == VFIO_DEVICE_GET_INFO) { >> > + struct vfio_device_info info; >> > + >> > + minsz = offsetofend(struct vfio_device_info, num_irqs); >> > + >> > + if (copy_from_user(&info, (void __user *)arg, minsz)) >> > + return -EFAULT; >> > + >> > + if (info.argsz < minsz) >> > + return -EINVAL; >> > + >> > + info.flags = vdev->flags; >> > + info.num_regions = 0; >> > + info.num_irqs = 0; >> Seems a bit weird to me to enable the modality but returning zeroed >> values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO >> and VFIO_DEVICE_GET_IRQ_INFO ones? > > I actually like how Antonios has started from a base framework, exposing > a device but none of the resources and then incrementally adds each > component. It's also a good showcase of the VFIO ABI that we can do > things like this. Thanks, I also agree with Alex with this. But of course I'm not married with any particular splitting style, in case we decide to change this. > > Alex > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig
On Thu, Nov 13, 2014 at 9:05 AM, Hongbo Zhang wrote: > On 12 November 2014 17:57, Antonios Motakis > wrote: >> Hello Hongbo, >> >> On Wed, Nov 12, 2014 at 10:52 AM, Hongbo Zhang >> wrote: >>> On 28 October 2014 02:07, Antonios Motakis >>> wrote: Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile | 1 + drivers/vfio/platform/Kconfig | 9 + drivers/vfio/platform/Makefile | 4 4 files changed, 15 insertions(+) create mode 100644 drivers/vfio/platform/Kconfig create mode 100644 drivers/vfio/platform/Makefile diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index a0abe04..962fb80 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -27,3 +27,4 @@ menuconfig VFIO If you don't know what to do here, say N. source "drivers/vfio/pci/Kconfig" +source "drivers/vfio/platform/Kconfig" diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 0b035b1..dadf0ca 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o obj-$(CONFIG_VFIO_PCI) += pci/ +obj-$(CONFIG_VFIO_PLATFORM) += platform/ diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig new file mode 100644 index 000..c51af17 --- /dev/null +++ b/drivers/vfio/platform/Kconfig @@ -0,0 +1,9 @@ +config VFIO_PLATFORM + tristate "VFIO support for platform devices" + depends on VFIO && EVENTFD && ARM >>> >>> Hi Antonios, >>> Is this only for ARM? how about X86 and PowerPC? >>> On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral >>> Access Management Unit), and I am trying to use this VFIO framework on >>> it. >>> >> >> In principle it should be working on any platform with such devices; >> as long as you have a VFIO IOMMU driver for the PAMU (on ARM we use >> VFIO PLATFORM for the device, with VFIO IOMMU TYPE1 for the IOMMU). >> > > Antonios, > As far as you know, on which ARM platform can I apply your patches directly? > My purpose is to apply you patches[1], and then implement a user space > driver to evaluate the performance. > In principle, if your target has a working IOMMU in front of a platform or AMBA device, then you should be able to use this. In practice, I have tested this on various fast models, and less extensively on Arndale in the past. Linaro as far as I know has tested this series on Calxeda devices with an xgmac NIC. > [1] It is better without manually merging conflicts/dependencies etc, > I am vfio-platform user, not a iommu expert. > >> So if you have a suitable IOMMU driver for your target, feel free to >> test it, and let us know of the results. >> + help + Support for platform devices with VFIO. This is required to make + use of platform devices present on the system using the VFIO + framework. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile new file mode 100644 index 000..279862b --- /dev/null +++ b/drivers/vfio/platform/Makefile @@ -0,0 +1,4 @@ + +vfio-platform-y := vfio_platform.o vfio_platform_common.o + +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o -- 2.1.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 06/19] vfio/platform: return info for bound device
On 11/20/2014 03:10 PM, Antonios Motakis wrote: > On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson > wrote: >> On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote: >>> On 10/27/2014 07:07 PM, Antonios Motakis wrote: A VFIO userspace driver will start by opening the VFIO device that corresponds to an IOMMU group, and will use the ioctl interface to get the basic device info, such as number of memory regions and interrupts, and their properties. This patch enables the VFIO_DEVICE_GET_INFO ioctl call. Signed-off-by: Antonios Motakis --- drivers/vfio/platform/vfio_platform_common.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index e0fdbc8..cb20526 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data) static long vfio_platform_ioctl(void *device_data, unsigned int cmd, unsigned long arg) { - if (cmd == VFIO_DEVICE_GET_INFO) - return -EINVAL; + struct vfio_platform_device *vdev = device_data; + unsigned long minsz; + + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + info.flags = vdev->flags; + info.num_regions = 0; + info.num_irqs = 0; >>> Seems a bit weird to me to enable the modality but returning zeroed >>> values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO >>> and VFIO_DEVICE_GET_IRQ_INFO ones? >> >> I actually like how Antonios has started from a base framework, exposing >> a device but none of the resources and then incrementally adds each >> component. It's also a good showcase of the VFIO ABI that we can do >> things like this. Thanks, > > I also agree with Alex with this. But of course I'm not married with > any particular splitting style, in case we decide to change this. Hi Antonios, please keep as is. I also learn each day about splitting style ;-) Best Regards Eric > >> >> Alex >> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall wrote: > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote: >> Hi All, >> >> I have second thoughts about rebasing KVM PMU patches >> to Marc's irq-forwarding patches. >> >> The PMU IRQs (when virtualized by KVM) are not exactly >> forwarded IRQs because they are shared between Host >> and Guest. >> >> Scenario1 >> - >> >> We might have perf running on Host and no KVM guest >> running. In this scenario, we wont get interrupts on Host >> because the kvm_pmu_hyp_init() (similar to the function >> kvm_timer_hyp_init() of Marc's IRQ-forwarding >> implementation) has put all host PMU IRQs in forwarding >> mode. >> >> The only way solve this problem is to not set forwarding >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead >> have special routines to turn on and turn off the forwarding >> mode of PMU IRQs. These routines will be called from >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ >> forwarding state. >> >> Scenario2 >> - >> >> We might have perf running on Host and Guest simultaneously >> which means it is quite likely that PMU HW trigger IRQ meant >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine >> of Marc's patchset which is called before local_irq_enable()). >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu) >> will accidentally forward IRQ meant for Host to Guest unless >> we put additional checks to inspect VCPU PMU state. >> >> Am I missing any detail about IRQ forwarding for above >> scenarios? >> > Hi Anup, Hi Christoffer, > > I briefly discussed this with Marc. What I don't understand is how it > would be possible to get an interrupt for the host while running the > guest? > > The rationale behind my question is that whenever you're running the > guest, the PMU should be programmed exclusively with guest state, and > since the PMU is per core, any interrupts should be for the guest, where > it would always be pending. Yes, thats right PMU is programmed exclusively for guest when guest is running and for host when host is running. Let us assume a situation (Scenario2 mentioned previously) where both host and guest are using PMU. When the guest is running we come back to host mode due to variety of reasons (stage2 fault, guest IO, regular host interrupt, host interrupt meant for guest, ) which means we will return from the "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the kvm_arch_vcpu_ioctl_run() function with local IRQs disabled. At this point we would have restored back host PMU context and any PMU counter used by host can trigger PMU overflow interrup for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);" in the kvm_arch_vcpu_ioctl_run() function (similar to the kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset) which will try to detect PMU irq forwarding state in GIC hence it can accidentally discover PMU irq pending for guest while this PMU irq is actually meant for host. This above mentioned situation does not happen for timer because virtual timer interrupts are exclusively used for guest. The exclusive use of virtual timer interrupt for guest ensures that the function kvm_timer_sync_hwstate() will always see correct state of virtual timer IRQ from GIC. > > When migrating a VM with a pending PMU interrupt away for a CPU core, we > also capture the active state (the forwarding patches already handle > this), and obviously the PMU state along with it. Yes, the migration of PMU state and PMU interrupt state is quite clear. > > Does this address your concern? I hope above description give you idea about the concern raised by me. > > -Christoffer Regards, Anup -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] vfio: put off the allocation of "minor" in vfio_create_group
On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote: > The next code fragment "list_for_each_entry" is not depend on "minor". With > this > patch, the free of "minor" in "list_for_each_entry" can be reduced, and there > is > no functional change. A reasonable micro-optimization, but I'm curious if you're actually seeing some measurable overhead from this. It seems like we'd need to have multiple devices, all within the same IOMMU group, all probed by vfio-pci at the same time to exercise the race condition. Thanks, Alex > Signed-off-by: Zhen Lei > --- > drivers/vfio/vfio.c | 13 ++--- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index f018d8d..737eb468 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct > iommu_group *iommu_group) > > mutex_lock(&vfio.group_lock); > > - minor = vfio_alloc_group_minor(group); > - if (minor < 0) { > - vfio_group_unlock_and_free(group); > - return ERR_PTR(minor); > - } > - > /* Did we race creating this group? */ > list_for_each_entry(tmp, &vfio.group_list, vfio_next) { > if (tmp->iommu_group == iommu_group) { > vfio_group_get(tmp); > - vfio_free_group_minor(minor); > vfio_group_unlock_and_free(group); > return tmp; > } > } > > + minor = vfio_alloc_group_minor(group); > + if (minor < 0) { > + vfio_group_unlock_and_free(group); > + return ERR_PTR(minor); > + } > + > dev = device_create(vfio.class, NULL, > MKDEV(MAJOR(vfio.group_devt), minor), > group, "%d", iommu_group_id(iommu_group)); > -- > 1.8.0 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO
On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: > Add new group KVM_DEV_VFIO_INTERRUPT and command > KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it. > > This is used for VT-d Posted-Interrupts setup. Eric proposed an interface for ARM forwarded interrupts[1] using group KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ. Why are we proposing yet another group and attributes here? Why can't we re-use the ones Eric proposes? [1] https://lkml.org/lkml/2014/8/25/258 > Signed-off-by: Feng Wu > --- > Documentation/virtual/kvm/devices/vfio.txt |8 > include/uapi/linux/kvm.h | 14 ++ > 2 files changed, 22 insertions(+), 0 deletions(-) > > diff --git a/Documentation/virtual/kvm/devices/vfio.txt > b/Documentation/virtual/kvm/devices/vfio.txt > index ef51740..bd99176 100644 > --- a/Documentation/virtual/kvm/devices/vfio.txt > +++ b/Documentation/virtual/kvm/devices/vfio.txt > @@ -13,6 +13,7 @@ VFIO-group is held by KVM. > > Groups: >KVM_DEV_VFIO_GROUP > + KVM_DEV_VFIO_INTERRUPT > > KVM_DEV_VFIO_GROUP attributes: >KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking > @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes: > > For each, kvm_device_attr.addr points to an int32_t file descriptor > for the VFIO group. > + > +KVM_DEV_VFIO_INTERRUPT attributes: > + KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for > +VT-d Posted-Interrrupts > + > +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which > +include the needed information for VT-d Posted-Interrupts setup. > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 6076882..5544fcc 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -943,9 +943,23 @@ struct kvm_device_attr { > __u64 addr; /* userspace address of attr data */ > }; > > +struct virq_info { > + __u32 index; /* index of the msi/msix entry */ > + int virq; /* virq of the interrupt */ > +}; > + > +struct kvm_posted_intr { > + __u32 fd; /* file descriptor of the VFIO device */ > + __u32 count; > + boolmsix; Note that MSI-X (as opposed to MSI) is a PCI concept. Being a VFIO interface this should operate at VFIO IRQ index and sub-index. > + struct virq_info virq_info[0]; > +}; > + > #define KVM_DEV_VFIO_GROUP 1 > #define KVM_DEV_VFIO_GROUP_ADD 1 > #define KVM_DEV_VFIO_GROUP_DEL 2 > +#define KVM_DEV_VFIO_INTERRUPT 2 > +#define KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ 1 > > enum kvm_device_type { > KVM_DEV_TYPE_FSL_MPIC_20= 1, -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO
On 11/20/2014 04:53 PM, Alex Williamson wrote: > On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: >> Add new group KVM_DEV_VFIO_INTERRUPT and command >> KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it. >> >> This is used for VT-d Posted-Interrupts setup. > > Eric proposed an interface for ARM forwarded interrupts[1] using group > KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and > KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ. Why are we proposing yet another > group and attributes here? Why can't we re-use the ones Eric proposes? Hi Alex, Feng I share your point of view about the KVM_DEV_VFIO_DEVICE group. For the attribute (renamed KVM_DEV_VFIO_DEVICE_FORWARD_IRQ in RFC v2) the issue is I specify the kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct. Feng needs another struct - kvm_posted_intr -. An alternative is to merge both structs if it makes sense. Best Regards Eric > > [1] https://lkml.org/lkml/2014/8/25/258 > >> Signed-off-by: Feng Wu >> --- >> Documentation/virtual/kvm/devices/vfio.txt |8 >> include/uapi/linux/kvm.h | 14 ++ >> 2 files changed, 22 insertions(+), 0 deletions(-) >> >> diff --git a/Documentation/virtual/kvm/devices/vfio.txt >> b/Documentation/virtual/kvm/devices/vfio.txt >> index ef51740..bd99176 100644 >> --- a/Documentation/virtual/kvm/devices/vfio.txt >> +++ b/Documentation/virtual/kvm/devices/vfio.txt >> @@ -13,6 +13,7 @@ VFIO-group is held by KVM. >> >> Groups: >>KVM_DEV_VFIO_GROUP >> + KVM_DEV_VFIO_INTERRUPT >> >> KVM_DEV_VFIO_GROUP attributes: >>KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking >> @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes: >> >> For each, kvm_device_attr.addr points to an int32_t file descriptor >> for the VFIO group. >> + >> +KVM_DEV_VFIO_INTERRUPT attributes: >> + KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for >> +VT-d Posted-Interrrupts >> + >> +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which >> +include the needed information for VT-d Posted-Interrupts setup. >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >> index 6076882..5544fcc 100644 >> --- a/include/uapi/linux/kvm.h >> +++ b/include/uapi/linux/kvm.h >> @@ -943,9 +943,23 @@ struct kvm_device_attr { >> __u64 addr; /* userspace address of attr data */ >> }; >> >> +struct virq_info { >> +__u32 index; /* index of the msi/msix entry */ >> +int virq; /* virq of the interrupt */ >> +}; >> + >> +struct kvm_posted_intr { >> +__u32 fd; /* file descriptor of the VFIO device */ >> +__u32 count; >> +boolmsix; > > Note that MSI-X (as opposed to MSI) is a PCI concept. Being a VFIO > interface this should operate at VFIO IRQ index and sub-index. > >> +struct virq_info virq_info[0]; >> +}; >> + >> #define KVM_DEV_VFIO_GROUP 1 >> #define KVM_DEV_VFIO_GROUP_ADD1 >> #define KVM_DEV_VFIO_GROUP_DEL2 >> +#define KVM_DEV_VFIO_INTERRUPT 2 >> +#define KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ1 >> >> enum kvm_device_type { >> KVM_DEV_TYPE_FSL_MPIC_20= 1, > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq
On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: > This API returns the host irq for the MSI/MSI-X interrrupts. > > Signed-off-by: Feng Wu > --- > drivers/vfio/pci/vfio_pci.c | 10 ++ > include/linux/vfio.h|2 ++ > 2 files changed, 12 insertions(+), 0 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index 9558da3..4fb9828 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -1009,6 +1009,16 @@ put_devs: > kfree(devs.devices); > } > > +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool > msix) > +{ > + struct vfio_pci_device *vdev = > + (struct vfio_pci_device *)vfio_device_data(device); > + struct pci_dev *pdev = vdev->pdev; > + > + return msix ? vdev->msix[vector].vector : pdev->irq + vector; > +} > +EXPORT_SYMBOL_GPL(vfio_msi_get_irq); > + It's difficult to see what you're going to do with this, but it's making the assumption that the vfio_device is a vfio_pci_device, which is the wrong starting point. Eric proposed[1] some VFIO external user interface extensions, that have since been refined in discussion that allows the external user (KVM in this case) access to the struct device where it can get the IRQ vectors itself. We either need to follow that same path for posted interrupts, teaching KVM about struct pci_device in order to extract the IRQ info, or we need to create another vfio external user interface extension that can return IRQ info regardless of the device type. Thanks, Alex [1] https://lkml.org/lkml/2014/8/25/257 > static void __exit vfio_pci_cleanup(void) > { > pci_unregister_driver(&vfio_pci_driver); > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index d320411..007ca55 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver( > /* > * External user API > */ > +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, > + bool msix); > extern struct vfio_group *vfio_group_get_external_user(struct file *filep); > extern void vfio_group_put_external_user(struct vfio_group *group); > extern int vfio_external_user_iommu_id(struct vfio_group *group); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq
On 11/20/2014 05:05 PM, Alex Williamson wrote: > On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: >> This API returns the host irq for the MSI/MSI-X interrrupts. >> >> Signed-off-by: Feng Wu >> --- >> drivers/vfio/pci/vfio_pci.c | 10 ++ >> include/linux/vfio.h|2 ++ >> 2 files changed, 12 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c >> index 9558da3..4fb9828 100644 >> --- a/drivers/vfio/pci/vfio_pci.c >> +++ b/drivers/vfio/pci/vfio_pci.c >> @@ -1009,6 +1009,16 @@ put_devs: >> kfree(devs.devices); >> } >> >> +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool >> msix) >> +{ >> +struct vfio_pci_device *vdev = >> +(struct vfio_pci_device *)vfio_device_data(device); >> +struct pci_dev *pdev = vdev->pdev; >> + >> +return msix ? vdev->msix[vector].vector : pdev->irq + vector; >> +} >> +EXPORT_SYMBOL_GPL(vfio_msi_get_irq); >> + > > It's difficult to see what you're going to do with this, but it's making > the assumption that the vfio_device is a vfio_pci_device, which is the > wrong starting point. Eric proposed[1] some VFIO external user > interface extensions, that have since been refined in discussion that > allows the external user (KVM in this case) access to the struct device > where it can get the IRQ vectors itself. We either need to follow that > same path for posted interrupts, teaching KVM about struct pci_device in > order to extract the IRQ info, or we need to create another vfio > external user interface extension that can return IRQ info regardless of > the device type. Thanks, > > Alex > > > [1] https://lkml.org/lkml/2014/8/25/257 Here is the v2: https://lkml.org/lkml/2014/9/1/347 Best Regards Eric > http://lwn.net/Articles/610087/ >> static void __exit vfio_pci_cleanup(void) >> { >> pci_unregister_driver(&vfio_pci_driver); >> diff --git a/include/linux/vfio.h b/include/linux/vfio.h >> index d320411..007ca55 100644 >> --- a/include/linux/vfio.h >> +++ b/include/linux/vfio.h >> @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver( >> /* >> * External user API >> */ >> +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, >> + bool msix); >> extern struct vfio_group *vfio_group_get_external_user(struct file *filep); >> extern void vfio_group_put_external_user(struct vfio_group *group); >> extern int vfio_external_user_iommu_id(struct vfio_group *group); > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM causes #GP on XRSTORS
Fenghua, I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to get a #GP when it is trying to load the guest FPU. One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I am not sure it is the only problem. Was KVM ever tested with XRSTORS? Thanks, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 11/18/2014 9:57 AM, Eric S. Johansson wrote: That's great to know. I will spin up a version of Windows 7 and give it a try given that I'm not looking at it, I can strip it down to the barest user interface elements and improve performance significantly. I tried it and it took me approximately 10 to 12 hours to install Windows 7 twice and I didn't even finish installing the last time. Here's what happened. The first time I installed it, it was a naïve install. Took all the defaults just set up the ISO and let the install run. Then I installed all the updates. Hours went by and it kind of came up and ran but then I tried to install the virt I/O drivers and the Windows installation lost its mind. Did some reading on how to make performance better and on using the virtio drivers in windows. So I start of the second install, same size disk 25 GB, same amount of RAM, 1 GB and installed the ethernet, disk and balloon drivers at the right time. I also changed the cache to none, I/O something to native and I think that's about it. Anyway, that was not really any improvement. It's still was incredibly slow and this time it was complaining about running out of memory and packages install never finished. Just kept going and going going. iptraf reported network io ranging from 3kbit to 100kbit range when the updates were running. I'm accustomed to lesser performance on virtual machines. That's the hazard of a running on old and slow laptop (dell e6400 (2.2ghz core duo, 8gb ram)[1]) but even virtual box is not this slow. So what am I doing wrong? It would be nice to use a slow machine like this as many handcrips don't have a whole lot of resources for buying newer/faster machines. On the other hand, many of them use desktops and work from one place whereas someone like me is all over the map (quite literally). --- eric [1] Part of the reason I don't bother upgrading machines all that often is because it no matter how fast the CPU runs or how much memory I have, Windows always runs about the same speed. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock
On 05.11.14 02:21, Paul Mackerras wrote: > From: "Aneesh Kumar K.V" > > In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the > next HPTE without unlocking the invalid one. In fact we should never > find an invalid and unlocked HPTE in the rmap chain, but for robustness > we should unlock it. This adds the missing unlock. > > Reported-by: Benjamin Herrenschmidt > Signed-off-by: Aneesh Kumar K.V > Signed-off-by: Paul Mackerras Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: ptes are big endian
On 03.11.14 16:35, Cédric Le Goater wrote: > When being restored from qemu, the kvm_get_htab_header are in native > endian, but the ptes are big endian. > > This patch fixes restore on a KVM LE host. Qemu also needs a fix for > this : > > http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html > > Signed-off-by: Cédric Le Goater > Cc: Paul Mackerras > Cc: Alexey Kardashevskiy > Cc: Gregory Kurz > > --- > > Tested on 3.17-rc7 with LE and BE host. > > > > arch/powerpc/kvm/book3s_64_mmu_hv.c |2 ++ > 1 file changed, 2 insertions(+) > > Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c > === > --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -1542,6 +1542,8 @@ static ssize_t kvm_htab_write(struct fil > err = -EFAULT; > if (__get_user(v, lbuf) || __get_user(r, lbuf + 1)) > goto out; > + v = be64_to_cpu(v); > + r = be64_to_cpu(r); This will trigger warnings with sparse. Please introduce new be64 variables that you do get_user on and that you then use as source for v and r. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/17] RFC: userfault v2
Hi, On Fri, Oct 31, 2014 at 12:39:32PM -0700, Peter Feiner wrote: > On Fri, Oct 31, 2014 at 11:29:49AM +0800, zhanghailiang wrote: > > Agreed, but for doing live memory snapshot (VM is running when do > > snapsphot), > > we have to do this (block the write action), because we have to save the > > page before it > > is dirtied by writing action. This is the difference, compared to pre-copy > > migration. > > Ah ha, I understand the difference now. I suppose that you have considered > doing a traditional pre-copy migration (that is, passes over memory saving > dirty pages, followed by a pause and a final dump of remaining dirty pages) to > a file. Your approach has the advantage of having the VM pause time bounded by > the time it takes to handle the userfault and do the write, as opposed to > pre-copy migration which has a pause time bounded by the time it takes to do > the final dump of dirty pages, which, in the worst case, is the time it takes > to dump all of the guest memory! It sounds really similar issue as live migration, one can implement a precopy live snapshot, or a precopy+postcopy live snapshot or a pure postcopy live snapshot. The decision on the amount of precopy done before engaging postcopy (zero passes, 1 pass, or more passes) would have similar tradeoffs too, except instead of having to re-transmit the re-dirtied pages over the wire, it would need to overwrite them to disk. The more precopy passes, the longer it takes for the live snapshotting process to finish and the more I/O there will be (for live migration it'd be network bandwidth usage instead of amount of I/O), but the shortest the postcopy runtime will be (and the shorter postcopy runtime is, the fewer userfaults will end up triggering on writes, in turn reducing the slowdown and the artificial fault latency introduced to the guest runtime). But the more precopy passes the more overwriting will happen during the "longer" precopy stage and the more overall load there will be for the host (the otherwise idle part of the host). For the postcopy live snapshot the wrprotect faults are quite equivalent to the not-present faults of postcopy live migration logic. > You could use the old fork & dump trick. Given that the guest's memory is > backed by private VMA (as of a year ago when I last looked, is always the case > for QEMU), you can have the kernel do the write protection for you. > Essentially, you fork Qemu and, in the child process, dump the guest memory > then exit. If the parent (including the guest) writes to guest memory, then it > will fault and the kernel will copy the page. > > The fork & dump approach will give you the best performance w.r.t. guest pause > times (i.e., just pausing for the COW fault handler), but it does have the > distinct disadvantage of potentially using 2x the guest memory (i.e., if the > parent process races ahead and writes to all of the pages before you finish > the > dump). To mitigate memory copying, you could madvise MADV_DONTNEED the child > memory as you copy it. This is a very good point. fork must be evaluated first because it literally already provides you a readonly memory snapshot of the guest memory. The memory cons mentioned above could lead to both -ENOMEM of too many guests runs live snapshots at the same time in the same host, unless overcommit_memory is set to 1 (0 by default). Even then if too many live snapshots are running in parallel you could hit the OOM killer if there are just a bit too many faults at the same time, or you could hit heavy swapping which isn't ideal either. In fact the -ENOMEM avoidance (with qemu failing) is one of the two critical reasons why qemu always set the guest memory as MADV_DONTFORK. But that's not the only reason. To use the fork() trick you'd need to undo the MADV_DONTFORK first but that would open another problem: there's a race condition between fork() O_DIRECT and <4k hardblocksize of virtio-blk. If there's any read() syscall with O_DIRECT with len=512 while fork() is running (think if the aio runs in parallel with the live snapshot thread that forks the child to dump the snapshot) and if the guest writes with the CPU to any 512 fragment of the same page that is the destination buffer of the write(len=512) (on two different 512bytes area of the same guest page) the O_DIRECT write will get lost. So to use fork we'd need to fix this longstanding race (I tried but in the end we declared it an userland issue because it's not exploitable to bypass permissions or corrupt kernel or unrelated memory). Or you'd need to add locking between the dataplane/aio threads and the live snapshot thread to ensure no direct-io I/O is ever in-flight while fork runs. The O_DIRECT however would only help if it's qemu TCG, if it's KVM it's not even enough to stop O_DIRECT reads. KVM would use gup(write=1) from the async-pf all the time... and then the shadow pagetables would go out of sync (it won't destabilize the host of course, but the guest memor
Re: [PATCH 2/5] KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
On 03.11.14 05:51, Paul Mackerras wrote: > From: Mahesh Salgaonkar > > When we get an HMI (hypervisor maintenance interrupt) while in a > guest, we see that guest enters into paused state. The reason is, in > kvmppc_handle_exit_hv it falls through default path and returns to > host instead of resuming guest. This causes guest to enter into > paused state. HMI is a hypervisor only interrupt and it is safe to > resume the guest since the host has handled it already. This patch > adds a switch case to resume the guest. > > Without this patch we see guest entering into paused state with following > console messages: > > [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered] > [ 3003.329356] Error detail: Timer facility experienced an error > [ 3003.329359]HMER: 0840 > [ 3003.329360]TFMR: 4a12000980a84000 > [ 3003.329366] vcpu c007c35094c0 (40): > [ 3003.329368] pc = c00c2ba0 msr = 80009032 trap = e60 > [ 3003.329370] r 0 = c021ddc0 r16 = 0046 > [ 3003.329372] r 1 = c0007a02bbd0 r17 = 327d5d98 > [ 3003.329375] r 2 = c10980b8 r18 = 1fc9a0b0 > [ 3003.329377] r 3 = c142d6b8 r19 = c142d6b8 > [ 3003.329379] r 4 = 0002 r20 = > [ 3003.329381] r 5 = c524a110 r21 = > [ 3003.329383] r 6 = 0001 r22 = > [ 3003.329386] r 7 = r23 = c524a110 > [ 3003.329388] r 8 = r24 = 0001 > [ 3003.329391] r 9 = 0001 r25 = c0007c31da38 > [ 3003.329393] r10 = c14280b8 r26 = 0002 > [ 3003.329395] r11 = 746f6f6c2f68656c r27 = c524a110 > [ 3003.329397] r12 = 28004484 r28 = c0007c31da38 > [ 3003.329399] r13 = cfe01400 r29 = 0002 > [ 3003.329401] r14 = 0046 r30 = c3011e00 > [ 3003.329403] r15 = ffba r31 = 0002 > [ 3003.329404] ctr = c041a670 lr = c0272520 > [ 3003.329405] srr0 = c007e8d8 srr1 = 90001002 > [ 3003.329406] sprg0 = sprg1 = cfe01400 > [ 3003.329407] sprg2 = cfe01400 sprg3 = 0005 > [ 3003.329408] cr = 48004482 xer = 2000 dsisr = 4200 > [ 3003.329409] dar = 010015020048 > [ 3003.329410] fault dar = 010015020048 dsisr = 4200 > [ 3003.329411] SLB (8 entries): > [ 3003.329412] ESID = c800 VSID = 40016e7779000510 > [ 3003.329413] ESID = d801 VSID = 400142add1000510 > [ 3003.329414] ESID = f804 VSID = 4000eb1a81000510 > [ 3003.329415] ESID = 1f00080b VSID = 40004fda0a000d90 > [ 3003.329416] ESID = 3f00080c VSID = 400039f536000d90 > [ 3003.329417] ESID = 180d VSID = 0001251b35150d90 > [ 3003.329417] ESID = 0100080e VSID = 4001e4609d90 > [ 3003.329418] ESID = d8000819 VSID = 40013d349c000400 > [ 3003.329419] lpcr = c04881847001 sdr1 = 001b1906 last_inst = > > [ 3003.329421] trap=0xe60 | pc=0xc00c2ba0 | msr=0x80009032 > [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered] > [ 3003.329526] Error detail: Timer facility experienced an error > [ 3003.329527]HMER: 0840 > [ 3003.329527]TFMR: 4a12000980a94000 > [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered] > [ 3006.359792] Error detail: Timer facility experienced an error > [ 3006.359795]HMER: 0840 > [ 3006.359797]TFMR: 4a12000980a84000 > > IdName State > > 2 guest2 running > 3 guest3 paused > 4 guest4 running > > Signed-off-by: Mahesh Salgaonkar > Signed-off-by: Paul Mackerras Do we need this for PR running on bare metal as well? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Exposing host debug capabilities to userspace
Hi, I've almost finished the ARMv8 guest debug support but I have one problem left to solve. userspace needs to know how many hardware debug registers are available for GDB to use. This information is available from the ID_AA64DFR0_EL1 register. Currently I abuse GET_ONE_REG to fetch it's value however semantically this is poor as it's API is for getting guest state not host state and they could theoretically have different values. So far the options I've examined are: * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1) As explained above, abusing a guest state API for host configuration. * ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH) This is used by GDB to access the host details in debug-monitors. However the ptrace API really wants you to attach to a process before calling PTRACE_GETREGSET. Currently I've tried attaching to the thread_id of the vCPU but this fails with EPERM, I suspect because attaching to your own threads likely upsets the kernel. * KVM ioctl KVM_GET_DEBUGREGS This is currently x86 only and looks like it's more aimed at debug registers than capability stuff. Also I'm not sure what the state of this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or is one an older deprecated x86 only API? * Export the information via sysfs I suppose the correct canonical non-subsystem specific way to make this information available it to expose the data in some sort of sysfs node? However I don't see any existing sysfs structure for the CPU. * Expand /proc/cpuinfo I suspect adding extra text to be badly parsed by userspace is just horrid and unacceptable behaviour ;-) * Add another KVM ioctl? This would have the downside of being specific to KVM and of course proliferating the API space again. I'm open to any suggestions and look forward to your valued feedback ;-) -- Alex Bennée -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
On 03.11.14 05:52, Paul Mackerras wrote: > From: "Suresh E. Warrier" > > The kvmppc_vcore_blocked() code does not check for the wait condition > after putting the process on the wait queue. This means that it is > possible for an external interrupt to become pending, but the vcpu to > remain asleep until the next decrementer interrupt. The fix is to > make one last check for pending exceptions and ceded state before > calling schedule(). > > Signed-off-by: Suresh Warrier > Signed-off-by: Paul Mackerras I don't understand the race you're fixing here. Can you please explain it? Alex > --- > arch/powerpc/kvm/book3s_hv.c | 20 > 1 file changed, 20 insertions(+) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index cd7e030..1a7a281 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu > *vcpu, int wait_state) > */ > static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) > { > + struct kvm_vcpu *vcpu; > + int do_sleep = 1; > + > DEFINE_WAIT(wait); > > prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); > + > + /* > + * Check one last time for pending exceptions and ceded state after > + * we put ourselves on the wait queue > + */ > + list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { > + if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) { > + do_sleep = 0; > + break; > + } > + } > + > + if (!do_sleep) { > + finish_wait(&vc->wq, &wait); > + return; > + } > + > vc->vcore_state = VCORE_SLEEPING; > spin_unlock(&vc->lock); > schedule(); > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Some fixes for HV KVM on PPC
On 03.11.14 05:51, Paul Mackerras wrote: > Here are fixes for five bugs which were found in the testing of our > PowerKVM product. The bugs range from guest performance issues to > guest crashes and memory corruption. Please apply. Thanks, applied patches 1-4 to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/17] RFC: userfault v2
Hi, On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote: > Yes, you are right. This is what i really want, bypass all non-present faults > and only track strict wrprotect faults. ;) > > So, do you plan to support that in the userfault API? Yes I think it's good idea to support wrprotect/COW faults too. I just wanted to understand if there was any other reason why you needed only wrprotect faults, because the non-present faults didn't look like a big performance concern if they triggered in addition to wrprotect faults, but it's certainly ok to optimize them away so it's fully optimal. All it takes to differentiate the behavior should be one more bit during registration so you can select non-present, wrprotect faults or both. postcopy live migration would select only non-present faults, postcopy live snapshot would select only wrprotect faults, anything like distributed shared memory supporting shared readonly access and exclusive write access, would select both flags. I just sent an (unfortunately) longish but way more detailed email about live snapshotting with userfaultfd but I just wanted to give a shorter answer here too :). Thanks, Andrea -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/2014 12:08 AM, Laszlo Ersek wrote: > On 11/20/14 00:32, Mario Smarduch wrote: >> Hi Laszlo, >> >> couple observations. >> >> I'm wondering if access from qemu and guest won't >> result in mixed memory attributes and if that's acceptable >> to the CPU. > > Normally this would be a problem I think (Jon raised the topic of live > migration). However, for flash programming specifically, I think the > guest's access pattern ensures that we'll see things OK. > > When the guest issues the first write access, the memslot is deleted, > and everything is forwarded to qemu, both reads and writes. In response > qemu modifies the array that *otherwise* backs the flash. These > modifications by qemu end up in the dcache mostly. When the guest is > done "programming", it writes a special command (read array mode) at > which point the memslot is recreated (as read-only) and flushed / set up > for flushing during demand paging. > > So from the emulated flash POV, the memslot either doesn't exist at all > (and then qemu serves all accesses just fine), or it exists r/o, at > which point qemu (host userspace) will have stopped writing to it, and > will have set it up for flushing before and during guest read accesses. I think beyond consistency, there should be no double mappings with conflicting attributes at any time or CPU state is undefined. At least that's what I recall for cases where identity mapping was cacheble and user mmapp'ed regions uncacheable. Side effects like CPU hardstop or victim invalidate of dirty cache line. With virtualization extensions maybe behavior is different. I guess if you're not seeing lock ups or crashes then it appears to work :) Probably more senior folks in ARM community are in better position to address this, but I thought I raise a flag. > >> Also is if you update memory from qemu you may break >> dirty page logging/migration. > > Very probably. Jon said the same thing. > >> Unless there is some other way >> you keep track. Of course it may not be applicable in your >> case (i.e. flash unused after boot). > > The flash *is* used after boot, because the UEFI runtime variable > services *are* exercised by the guest kernel. However those use the same > access pattern (it's the same set of UEFI services just called by a > different "client"). > > *Uncoordinated* access from guest and host in parallel will be a big > problem; but we're not that far yet, and we need to get the flash > problem sorted, so that we can at least boot and work on the basic > stuff. The flash programming dance happens to provide coordination; the > flash mode changes (which are equivalent to the teardown and the > recreation of the memslot) can be considered barriers. > > I hope this is acceptable for the time being... Yeah I understand you have a more imediatte requirement to support, migration isssue is more fyi. Thanks for the details helps to understand the context. - Mario > > Thanks > Laszlo > >> >> - Mario >> >> On 11/17/2014 07:49 AM, Laszlo Ersek wrote: >>> On 11/17/14 16:29, Paolo Bonzini wrote: On 17/11/2014 15:58, Ard Biesheuvel wrote: > Readonly memslots are often used to implement emulation of ROMs and > NOR flashes, in which case the guest may legally map these regions as > uncached. > To deal with the incoherency associated with uncached guest mappings, > treat all readonly memslots as incoherent, and ensure that pages that > belong to regions tagged as such are flushed to DRAM before being passed > to the guest. On x86, the processor combines the cacheability values from the two levels of page tables. Is there no way to do the same on ARM? >>> >>> Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict >>> (Device non-Gathering, non-Reordering, no Early Write Acknowledgement -- >>> for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax >>> host) memory attributes. >>> >>> When qemu writes, as part of emulating the flash programming commands, >>> to the RAMBlock that *otherwise* backs the flash range (as a r/o >>> memslot), those writes (from host userspace) tend to end up in dcache. >>> >>> But, when the guest flips back the flash to romd mode, and tries to read >>> back the values from the flash as plain ROM, the dcache is completely >>> bypassed due to the strict stage1 mapping, and the guest goes directly >>> to DRAM. >>> >>> Where qemu's earlier writes are not yet / necessarily visible. >>> >>> Please see my original patch (which was incomplete) in the attachment, >>> it has a very verbose commit message. >>> >>> Anyway, I'll let others explain; they can word it better than I can :) >>> >>> FWIW, >>> >>> Series >>> Reviewed-by: Laszlo Ersek >>> >>> I ported this series to a 3.17.0+ based kernel, and tested it. It works >>> fine. The ROM-like view of the NOR flash now reflects the previously >>> programmed contents. >>> >>> Series >>> Tested-by: Laszlo Ersek >>> >>> Thanks! >>> L
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 20 November 2014 18:35, Mario Smarduch wrote: > I think beyond consistency, there should be no double mappings with > conflicting attributes at any time or CPU state is undefined. The situation is not so bleak as this. See section B2.9 "Mismatched memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays out in some detail what the results of mismatched attributes are (generally, you lose ordering or coherency guarantees you might have hoped to have). They're not pretty, but it's not as bad as completely UNPREDICTABLE behaviour. thanks -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/2014 10:40 AM, Peter Maydell wrote: > On 20 November 2014 18:35, Mario Smarduch wrote: >> I think beyond consistency, there should be no double mappings with >> conflicting attributes at any time or CPU state is undefined. > > The situation is not so bleak as this. See section B2.9 "Mismatched > memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays > out in some detail what the results of mismatched attributes are > (generally, you lose ordering or coherency guarantees you might > have hoped to have). They're not pretty, but it's not as bad > as completely UNPREDICTABLE behaviour. > > thanks > -- PMM > Hi Peter, thanks for digging that up, quite a list to navigate but it does provide detailed guidance. - Mario -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
On 11/20/2014 11:36 AM, Alexander Graf wrote: > > > On 03.11.14 05:52, Paul Mackerras wrote: >> From: "Suresh E. Warrier" >> >> The kvmppc_vcore_blocked() code does not check for the wait condition >> after putting the process on the wait queue. This means that it is >> possible for an external interrupt to become pending, but the vcpu to >> remain asleep until the next decrementer interrupt. The fix is to >> make one last check for pending exceptions and ceded state before >> calling schedule(). >> >> Signed-off-by: Suresh Warrier >> Signed-off-by: Paul Mackerras > > I don't understand the race you're fixing here. Can you please explain it? > When a virtual interrupt needs to be delivered to the guest, and the virtual ICS state for the interrupt and virtual ICP state for the VCPU allow for the VCPU to be immediately interrupted, we 1. Set the BOOK3S_INTERRUPT_EXTERNAL_LEVEL bit in pending_exceptions. 2. Call kvmppc_fast_vcpu_kick_hv(), which checks the wait queue at vcpu->wq to wake the VCPU up. The caller of kvmppc_vcore_blocked() does the check for pending exceptions, but there is a race condition here and we do need to check again after the VCPU is put on the wait queue. -suresh > > Alex > >> --- >> arch/powerpc/kvm/book3s_hv.c | 20 >> 1 file changed, 20 insertions(+) >> >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index cd7e030..1a7a281 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu >> *vcpu, int wait_state) >> */ >> static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) >> { >> +struct kvm_vcpu *vcpu; >> +int do_sleep = 1; >> + >> DEFINE_WAIT(wait); >> >> prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE); >> + >> +/* >> + * Check one last time for pending exceptions and ceded state after >> + * we put ourselves on the wait queue >> + */ >> +list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { >> +if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) { >> +do_sleep = 0; >> +break; >> +} >> +} >> + >> +if (!do_sleep) { >> +finish_wait(&vc->wq, &wait); >> +return; >> +} >> + >> vc->vcore_state = VCORE_SLEEPING; >> spin_unlock(&vc->lock); >> schedule(); >> > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/2014 01:40 PM, Peter Maydell wrote: > On 20 November 2014 18:35, Mario Smarduch wrote: >> I think beyond consistency, there should be no double mappings with >> conflicting attributes at any time or CPU state is undefined. > > The situation is not so bleak as this. See section B2.9 "Mismatched > memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays > out in some detail what the results of mismatched attributes are > (generally, you lose ordering or coherency guarantees you might > have hoped to have). They're not pretty, but it's not as bad > as completely UNPREDICTABLE behaviour. Quick side note that I did raise exactly this issue with the ARM Architecture team several years ago (that of missmatched memory attributes between a guest and hypervisor) and it is a known concern. I'm personally concerned about a couple of things that I won't go into here but will followup on what the longer term plan might be. Jon. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 20 November 2014 19:49, Jon Masters wrote: > On 11/20/2014 01:40 PM, Peter Maydell wrote: >> The situation is not so bleak as this. See section B2.9 "Mismatched >> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays >> out in some detail what the results of mismatched attributes are >> (generally, you lose ordering or coherency guarantees you might >> have hoped to have). They're not pretty, but it's not as bad >> as completely UNPREDICTABLE behaviour. > > Quick side note that I did raise exactly this issue with the ARM > Architecture team several years ago (that of missmatched memory > attributes between a guest and hypervisor) and it is a known concern. I think in practice for a well-behaved guest we can arrange that everything is fine (roughly, the guest has to treat DMA-capable devices as doing coherent-dma, which we can tell them to do via DT bindings or ACPI[*], plus the special case handling we already have for bootup), and naughty guests will only confuse themselves. But I need to think a bit more about it (and we should probably write down how it works somewhere :-)). [*] We should be able to emulate non-coherent-DMA devices but would need an extra API from KVM so userspace can do "clean dcache to point of coherency". And in practice I'm not sure we need to emulate those devices... -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
On Thu, Nov 20, 2014 at 3:39 AM, Christian Borntraeger wrote: > > So It looks like we could make a change to ACCESS_ONCE. Would something like > > CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start? No, if it's just a handful of places to be fixed, let's not add config options for broken cases. > This would boil down to > Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set > + docu update + comments > Patch2: Change mm/* to barriers > Patch3: Change x86 locks > Patch4: Change x86 gup > Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86 Just do patches 2-4 first, and then patch 1 unconditionally. Obviously you'd need to spread the word on linux-arch to see how bad it is for other cases, but if other architectures are at all like x86 and s390, and just require a few trivial patches, let's not make this some config option. Linus -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/14 21:10, Peter Maydell wrote: > On 20 November 2014 19:49, Jon Masters wrote: >> On 11/20/2014 01:40 PM, Peter Maydell wrote: >>> The situation is not so bleak as this. See section B2.9 "Mismatched >>> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays >>> out in some detail what the results of mismatched attributes are >>> (generally, you lose ordering or coherency guarantees you might >>> have hoped to have). They're not pretty, but it's not as bad >>> as completely UNPREDICTABLE behaviour. >> >> Quick side note that I did raise exactly this issue with the ARM >> Architecture team several years ago (that of missmatched memory >> attributes between a guest and hypervisor) and it is a known concern. > > I think in practice for a well-behaved guest we can arrange > that everything is fine (roughly, the guest has to treat > DMA-capable devices as doing coherent-dma, which we can tell > them to do via DT bindings or ACPI[*], plus the special > case handling we already have for bootup), and naughty guests > will only confuse themselves. But I need to think a bit more > about it (and we should probably write down how it works > somewhere :-)). > > [*] We should be able to emulate non-coherent-DMA devices but > would need an extra API from KVM so userspace can do "clean > dcache to point of coherency". And in practice I'm not sure > we need to emulate those devices... This basically means that virtio transfers should just use normal memory in the guest (treating virtio transfers as "coherent DMA"), right? We're actually doing that in the ArmVirtualizationQemu.dsc build of edk2, and Things Work Great (TM) in guests on the Mustang. Thanks Laszlo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 20/11/2014 17:28, Eric S. Johansson wrote: > I'm accustomed to lesser performance on virtual machines. That's the > hazard of a running on old and slow laptop (dell e6400 (2.2ghz core > duo, 8gb ram)[1]) but even virtual box is not this slow. So what am I > doing wrong? It would be nice to use a slow machine like this as many > handcrips don't have a whole lot of resources for buying newer/faster > machines. On the other hand, many of them use desktops and work from one > place whereas someone like me is all over the map (quite literally). How did you start the virtual machine? Perhaps you're not using KVM but emulation? I have a fast machine but slow disk (a NAS on 100 MBit ethernet) and I can do about 15 automated installations in less than 6 hours. Are you using libvirt or directly invoking QEMU? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 20 November 2014 21:13, Laszlo Ersek wrote: > On 11/20/14 21:10, Peter Maydell wrote: >> I think in practice for a well-behaved guest we can arrange >> that everything is fine (roughly, the guest has to treat >> DMA-capable devices as doing coherent-dma, which we can tell >> them to do via DT bindings or ACPI[*], plus the special >> case handling we already have for bootup), and naughty guests >> will only confuse themselves. But I need to think a bit more >> about it (and we should probably write down how it works >> somewhere :-)). > This basically means that virtio transfers should just use normal memory > in the guest (treating virtio transfers as "coherent DMA"), right? Normal *cacheable*, but yes. -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: KVM causes #GP on XRSTORS
> From: Nadav Amit [mailto:nadav.a...@gmail.com] > Sent: Thursday, November 20, 2014 8:34 AM > To: Paolo Bonzini; Yu, Fenghua > Cc: kvm list > Subject: KVM causes #GP on XRSTORS > > Fenghua, > > I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears > to get a #GP when it is trying to load the guest FPU. > One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, > but I am not sure it is the only problem. > Was KVM ever tested with XRSTORS? In dmesg, do you see "xsave: . using compacted form"? If you use kernel "noxsaves", does KVM work fine? Thanks. -Fenghua -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 11/20/2014 4:48 PM, Paolo Bonzini wrote: On 20/11/2014 17:28, Eric S. Johansson wrote: I'm accustomed to lesser performance on virtual machines. That's the hazard of a running on old and slow laptop (dell e6400 (2.2ghz core duo, 8gb ram)[1]) but even virtual box is not this slow. So what am I doing wrong? It would be nice to use a slow machine like this as many handcrips don't have a whole lot of resources for buying newer/faster machines. On the other hand, many of them use desktops and work from one place whereas someone like me is all over the map (quite literally). How did you start the virtual machine? Perhaps you're not using KVM but emulation? I have a fast machine but slow disk (a NAS on 100 MBit ethernet) and I can do about 15 automated installations in less than 6 hours. Are you using libvirt or directly invoking QEMU? I was using one of the GUIs ( less hand stress than trying to assemble a commandline). Unfortunately I'm in Windows 8 right now because I'm writing. I'm very sure the GUI was http://virt-manager.org/ I tried a different one but it kept telling me I only had QEMU I thought "silly program, that can't be right". Someday I will not argue with software or small electronic boxes. They don't care who wins and they are much more stubborn than I am. I'll be able to run some tests in about 2 to 3 hours after I finish this document. Let me know what I should look at? on a side note, a pointer to an automated install process would be wonderful. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: PPC: Book3S HV: ptes are big endian
When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html Signed-off-by: Cédric Le Goater Cc: Paul Mackerras Cc: Alexey Kardashevskiy Cc: Gregory Kurz --- Tested on 3.18-rc5 with LE and BE host. v2: add be64 local variables to be friendly with sparse arch/powerpc/kvm/book3s_64_mmu_hv.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c === --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1539,9 +1539,15 @@ static ssize_t kvm_htab_write(struct fil hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE)); lbuf = (unsigned long __user *)buf; for (j = 0; j < hdr.n_valid; ++j) { + __be64 hpte_v; + __be64 hpte_r; + err = -EFAULT; - if (__get_user(v, lbuf) || __get_user(r, lbuf + 1)) + if (__get_user(hpte_v, lbuf) || + __get_user(hpte_r, lbuf + 1)) goto out; + v = be64_to_cpu(hpte_v); + r = be64_to_cpu(hpte_r); err = -EINVAL; if (!(v & HPTE_V_VALID)) goto out; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM causes #GP on XRSTORS
Hi Nadav, On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote: >Fenghua, > >I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to >get a #GP when it is trying to load the guest FPU. >One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I >am not sure it is the only problem. >Was KVM ever tested with XRSTORS? Current kvm and qemu use standard format and xsaves/xrstors should use compact format, in addition, vmx is still not enabled for xsaves/xrstors in kvm. Regards, Wanpeng Li > >Thanks, >Nadav > >-- >To unsubscribe from this list: send the line "unsubscribe kvm" in >the body of a message to majord...@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM causes #GP on XRSTORS
> On Nov 21, 2014, at 02:10, Wanpeng Li wrote: > > Hi Nadav, > On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote: >> Fenghua, >> >> I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears >> to get a #GP when it is trying to load the guest FPU. >> One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but >> I am not sure it is the only problem. >> Was KVM ever tested with XRSTORS? > > Current kvm and qemu use standard format and xsaves/xrstors should use > compact format, in addition, vmx is still not enabled for xsaves/xrstors > in kvm. Sorry, but I don’t quite understand. Should KVM work on a machine that supports xsaves/xrstors? I am not referring to whether KVM exposes the capability to the guest, but to whether KVM should work at all. Thanks, Nadav -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] vfio: put off the allocation of "minor" in vfio_create_group
On 2014/11/20 23:37, Alex Williamson wrote: > On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote: >> The next code fragment "list_for_each_entry" is not depend on "minor". With >> this >> patch, the free of "minor" in "list_for_each_entry" can be reduced, and >> there is >> no functional change. > > A reasonable micro-optimization, but I'm curious if you're actually > seeing some measurable overhead from this. It seems like we'd need to Oh, I just found this by code review. It's good to put pure check first and resource allocation later, if they are order independent. Isn't it? I think it will not improve performance but only save a little code space and looks feel better. > have multiple devices, all within the same IOMMU group, all probed by > vfio-pci at the same time to exercise the race condition. Thanks, I think vfio_create_group can not be invoked frequently. So, the performance of this function is not a major consideration, we don't care slightly performance change. > > Alex > >> Signed-off-by: Zhen Lei >> --- >> drivers/vfio/vfio.c | 13 ++--- >> 1 file changed, 6 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c >> index f018d8d..737eb468 100644 >> --- a/drivers/vfio/vfio.c >> +++ b/drivers/vfio/vfio.c >> @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct >> iommu_group *iommu_group) >> >> mutex_lock(&vfio.group_lock); >> >> -minor = vfio_alloc_group_minor(group); >> -if (minor < 0) { >> -vfio_group_unlock_and_free(group); >> -return ERR_PTR(minor); >> -} >> - >> /* Did we race creating this group? */ >> list_for_each_entry(tmp, &vfio.group_list, vfio_next) { >> if (tmp->iommu_group == iommu_group) { >> vfio_group_get(tmp); >> -vfio_free_group_minor(minor); >> vfio_group_unlock_and_free(group); >> return tmp; >> } >> } >> >> +minor = vfio_alloc_group_minor(group); >> +if (minor < 0) { >> +vfio_group_unlock_and_free(group); >> +return ERR_PTR(minor); >> +} >> + >> dev = device_create(vfio.class, NULL, >> MKDEV(MAJOR(vfio.group_devt), minor), >> group, "%d", iommu_group_id(iommu_group)); >> -- >> 1.8.0 >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > > > > > . > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] KVM: x86: fix access memslots w/o hold srcu read lock
Hi Tang, On Fri, Nov 21, 2014 at 02:41:57PM +0800, Tang Chen wrote: >Hi Wanpeng, > >Sorry, it is about this problem again. > >I booted 3.18.0-rc2, without Paolo's patch. lockdep and RCU debug >were all opened. > >Then I started a qemu vm with the following options: > >/usr/libexec/qemu-kvm -hda rhel7.0ga-x64.qcow2 -m 512M -cpu >host,-x2apic -serial stdio > >I added printk() in kvm_vcpu_reload_apic_access_page(), and it was >printed out. >So I think I can confirm that I have run into >kvm_vcpu_reload_apic_access_page(). > >But I still didn't see any warning. > >Is there anything else I should do ? >Would you please share your qemu command with me ? I test it on the other guy's Ivytown and take advantage of the qemu command line which he used, so I forget the accurate command line which used that day. Paolo also reproduce the bug, Paolo, ping. Regards, Wanpeng Li > >Thanks. :) > >On 11/14/2014 07:39 AM, Wanpeng Li wrote: >>Hi Tang, >>On Tue, Nov 11, 2014 at 01:35:29PM +0800, Tang Chen wrote: >>>Hi Wanpeng, >>> >>Sorry for the late. >> >>>I think I have totally missed this thread. >>>I opened lockdep and RCU debug, and tried on 3.18-rc1. But I didn't >>>get the warning. >>I also opened lockdep and RCU debug, and tried 3.18.0-rc2 on a Ivy >>bridge, the warning will be triggered after run qemu immediately. There >>is no need to try any hotplug related stuff. >> >>In addition, Paolo's patch is merged upstream to fix this. >> >>commit a73896cb5bbdce672945745db8224352a689f580 >>Author: Paolo Bonzini >>Date: Sun Nov 2 07:54:30 2014 +0100 >> >>KVM: vmx: defer load of APIC access page address during reset >> >>Regards, >>Wanpeng Li >> >>>My steps are: >>> >>>1. Use numactl to bind a qemu process to node1. >>>2. Offline all node1 memory. And the qemu process is still running. >>> >>>Would you please tell me how did you reproduce it ? >>> >>>Thanks. >>> >>>On 11/02/2014 03:07 PM, Wanpeng Li wrote: The srcu read lock must be held while accessing memslots (e.g. when using gfn_to_* functions), however, commit c24ae0dcd3e8 ("kvm: x86: Unpin and remove kvm_arch->apic_access_page") call gfn_to_page() in kvm_vcpu_reload_apic_access_page() w/o hold it in vmx_vcpu_reset() path which leads to suspicious rcu_dereference_check() usage warning. This patch fix it by holding srcu read lock in all kvm_vcpu_reset() call path. [ INFO: suspicious RCU usage. ] 3.18.0-rc2-test2+ #70 Not tainted --- include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/2371: #0: (&vcpu->mutex){+.+...}, at: [] vcpu_load+0x20/0xd0 [kvm] stack backtrace: CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 0001 880209983ca8 816f514f 8802099b8990 880209983cd8 810bd687 000fee00 880208a2c000 880208a1 88020ef50040 880209983d08 Call Trace: [] dump_stack+0x4e/0x71 [] lockdep_rcu_suspicious+0xe7/0x120 [] gfn_to_memslot+0xd5/0xe0 [kvm] [] __gfn_to_pfn+0x33/0x60 [kvm] [] gfn_to_page+0x25/0x90 [kvm] [] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm] [] vmx_vcpu_reset+0x20c/0x460 [kvm_intel] [] kvm_vcpu_reset+0x15e/0x1b0 [kvm] [] kvm_arch_vcpu_setup+0x2c/0x50 [kvm] [] kvm_vm_ioctl+0x1d0/0x780 [kvm] [] ? __lock_is_held+0x54/0x80 [] do_vfs_ioctl+0x300/0x520 [] ? __fget+0x5/0x250 [] ? __fget_light+0x2a/0xe0 [] SyS_ioctl+0x81/0xa0 [] system_call_fastpath+0x16/0x1b Reported-by: Takashi Iwai Reported-by: Alexei Starovoitov Suggested-by: Paolo Bonzini Signed-off-by: Wanpeng Li --- v3 -> v4: * bypass the problem altoghter by kvm_make_request v2 -> v3: * take care all vmx_vcpu_reset call path v1 -> v2: * just fix hold the srcu read lock in vmx_vcpu_reset path arch/x86/kvm/vmx.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a0f78db..3e556c6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4579,7 +4579,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmcs_write32(TPR_THRESHOLD, 0); } - kvm_vcpu_reload_apic_access_page(vcpu); + kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); if (vmx_vm_has_apicv(vcpu->kvm)) memset(&vmx->pi_desc, 0, sizeof(struct pi_desc)); >>. >> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/17] RFC: userfault v2
On 2014/11/21 1:38, Andrea Arcangeli wrote: Hi, On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote: Yes, you are right. This is what i really want, bypass all non-present faults and only track strict wrprotect faults. ;) So, do you plan to support that in the userfault API? Yes I think it's good idea to support wrprotect/COW faults too. Great! Then i can expect your patches. ;) I just wanted to understand if there was any other reason why you needed only wrprotect faults, because the non-present faults didn't look like a big performance concern if they triggered in addition to wrprotect faults, but it's certainly ok to optimize them away so it's fully optimal. Er, you have got the answer, no special, it's only for optimality. All it takes to differentiate the behavior should be one more bit during registration so you can select non-present, wrprotect faults or both. postcopy live migration would select only non-present faults, postcopy live snapshot would select only wrprotect faults, anything like distributed shared memory supporting shared readonly access and exclusive write access, would select both flags. It is really flexible in this way. I just sent an (unfortunately) longish but way more detailed email about live snapshotting with userfaultfd but I just wanted to give a shorter answer here too :). Thanks for your explanation, and your patience. It is really useful, now i know more details about why 'fork & dump live snapshot' scenario is not acceptable. Thanks :) Thanks, Andrea . -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html