Re: Nested paging in nested SVM setup
On 01/09/2014 21:21, Valentine Sinitsyn wrote:
>> Can you retry running the tests with the latest kvm-unit-tests (branch
>> "master"), gather a trace of kvm and kvmmmu events, and send the
>> compressed trace.dat my way?
> You mean the trace when the problem reveals itself (not from running
> tests), I assume? It's around 2G uncompressed (probably I'm enabling
> tracing too early or doing something else wrong). Will look into it
> tomorrow; hopefully I can reduce the size (e.g. by switching to
> uniprocessor mode). Below is a trace snippet similar to the one I've
> sent earlier.

I actually meant kvm-unit-tests, in order to understand the npt_rsvd
failure. (I had sent a separate message for Jailhouse.)

For kvm-unit-tests, you can comment out tests that do not fail to reduce
the trace size.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 0/2] KVM: s390/mm: Two fixes for master (3.17)
Paolo,

this request is against kvm/master and contains two fixes for guests
that use storage keys.

The following changes since commit ab3f285f227fec62868037e9b1b1fd18294a83b8:

  KVM: s390/mm: try a cow on read only pages for key ops (2014-08-25 14:35:28 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-master-20140902

for you to fetch changes up to 1951497d90d6754201af3e65241a06f9ef6755cd:

  KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags (2014-09-02 10:30:43 +0200)

KVM: s390/mm: Fix two guest storage key corruptions on paging

Here are two patches that fix issues that were introduced with commit
0944fe3f4a32 ("s390/mm: implement software referenced bits"). This
commit introduced additional invalid<->valid transitions that we need
to handle to transfer the storage key from/to the pgste.

Christian Borntraeger (2):
      KVM: s390/mm: Fix storage key corruption during swapping
      KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags

 arch/s390/include/asm/pgtable.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
[GIT PULL 2/2] KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
commit 0944fe3f4a32 ("s390/mm: implement software referenced bits")
triggered another paging/storage key corruption. There is an unhandled
invalid->valid pte change where we have to set the real storage key
from the pgste. During paging, a guest page might be in swapcache or
swap, and when faulted in it might be read-only and, due to a parallel
scan, old. A do_wp_page will then make it writable and young. Due to
software reference tracking, this page was invalid and now becomes
valid.

Signed-off-by: Christian Borntraeger
Acked-by: Martin Schwidefsky
Cc: sta...@vger.kernel.org # v3.12+
---
 arch/s390/include/asm/pgtable.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 32686e8..5efb2fe 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1331,6 +1331,7 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
 	ptep_flush_direct(vma->vm_mm, address, ptep);
 	if (mm_has_pgste(vma->vm_mm)) {
+		pgste_set_key(ptep, pgste, entry, vma->vm_mm);
 		pgste = pgste_set_pte(ptep, pgste, entry);
 		pgste_set_unlock(ptep, pgste);
 	} else
-- 
1.8.4.2
[GIT PULL 1/2] KVM: s390/mm: Fix storage key corruption during swapping
Since 3.12, or more precisely commit 0944fe3f4a32 ("s390/mm: implement
software referenced bits"), guest storage keys get corrupted during
paging. This commit added another valid->invalid transition for page
tables - namely ptep_test_and_clear_young. We have to transfer the
storage key into the pgste in that case.

Signed-off-by: Christian Borntraeger
Acked-by: Martin Schwidefsky
Cc: sta...@vger.kernel.org # v3.12+
---
 arch/s390/include/asm/pgtable.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index b76317c..32686e8 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1127,7 +1127,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long addr, pte_t *ptep)
 {
 	pgste_t pgste;
-	pte_t pte;
+	pte_t pte, oldpte;
 	int young;
 
 	if (mm_has_pgste(vma->vm_mm)) {
@@ -1135,12 +1135,13 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 		pgste = pgste_ipte_notify(vma->vm_mm, ptep, pgste);
 	}
 
-	pte = *ptep;
+	oldpte = pte = *ptep;
 	ptep_flush_direct(vma->vm_mm, addr, ptep);
 	young = pte_young(pte);
 	pte = pte_mkold(pte);
 
 	if (mm_has_pgste(vma->vm_mm)) {
+		pgste = pgste_update_all(&oldpte, pgste, vma->vm_mm);
 		pgste = pgste_set_pte(ptep, pgste, pte);
 		pgste_set_unlock(ptep, pgste);
 	} else
-- 
1.8.4.2
Re: Nested paging in nested SVM setup
On 02.09.2014 14:25, Paolo Bonzini wrote:
> I actually meant kvm-unit-tests in order to understand the npt_rsvd
> failure. (I had sent a separate message for Jailhouse).

Oops, sorry for the misunderstanding. Uploaded it here:
https://www.dropbox.com/s/jp6ohb0ul3d6v4u/npt_rsvd.txt.bz2?dl=0

The environment is QEMU 2.1.0 + Linux 3.16.1 with the paging_tmpl.h
patch, and the only test enabled was npt_rsvd (the others do pass now).

> For kvm-unit-tests, you can comment out tests that do not fail to
> reduce the trace size.

Yes, I've sent that trace earlier today.

Valentine
[PATCH v3 0/4] Make kvm_device_ops registration dynamic
Hi all,

This is version 3 of the patches originally posted here:

  v1: http://www.spinics.net/lists/kvm-arm/msg10219.html
  v2: http://www.spinics.net/lists/kvm/msg105197.html

Changes since v2 include:

  - Rebased onto 3.17-rc* (the vgic code changed a lot!)
  - Added relevant acks

The mpic and xics are still not ported over, as I don't want to risk
breaking those devices (it's not clear at which point they need to be
registered).

Thanks,

Will

--->8

Cornelia Huck (1):
  KVM: s390: register flic ops dynamically

Will Deacon (3):
  KVM: device: add simple registration mechanism for kvm_device_ops
  KVM: ARM: vgic: register kvm_device_ops dynamically
  KVM: VFIO: register kvm_device_ops dynamically

 arch/s390/kvm/kvm-s390.c |   3 +-
 arch/s390/kvm/kvm-s390.h |   1 +
 include/linux/kvm_host.h |   4 +-
 include/uapi/linux/kvm.h |  22 +--
 virt/kvm/arm/vgic.c      | 157 ---
 virt/kvm/kvm_main.c      |  57 +
 virt/kvm/vfio.c          |  22 ---
 7 files changed, 142 insertions(+), 124 deletions(-)

-- 
2.1.0
[PATCH v3 3/4] KVM: s390: register flic ops dynamically
From: Cornelia Huck

Using the new kvm_register_device_ops() interface lets us get rid of
an #ifdef in common code.

Cc: Gleb Natapov
Cc: Paolo Bonzini
Signed-off-by: Cornelia Huck
Signed-off-by: Will Deacon
---
 arch/s390/kvm/kvm-s390.c | 3 ++-
 arch/s390/kvm/kvm-s390.h | 1 +
 include/linux/kvm_host.h | 1 -
 virt/kvm/kvm_main.c      | 4 ----
 4 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ce81eb2ab76a..63793641393c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -130,7 +130,8 @@ void kvm_arch_check_processor_compat(void *rtn)
 
 int kvm_arch_init(void *opaque)
 {
-	return 0;
+	/* Register floating interrupt controller interface. */
+	return kvm_register_device_ops(&kvm_flic_ops, KVM_DEV_TYPE_FLIC);
 }
 
 void kvm_arch_exit(void)
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 3862fa2cefe0..04e90538c37e 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -228,6 +228,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int psw_extint_disabled(struct kvm_vcpu *vcpu);
 void kvm_s390_destroy_adapters(struct kvm *kvm);
 int kvm_s390_si_ext_call_pending(struct kvm_vcpu *vcpu);
+extern struct kvm_device_ops kvm_flic_ops;
 
 /* implemented in guestdbg.c */
 void kvm_s390_backup_guest_per_regs(struct kvm_vcpu *vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e9a069ddcb2b..73f796ad0d0b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1073,7 +1073,6 @@ int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type);
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
 extern struct kvm_device_ops kvm_vfio_ops;
-extern struct kvm_device_ops kvm_flic_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6aaa44c1506e..78b25a44db53 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2272,10 +2272,6 @@ static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
 #ifdef CONFIG_KVM_VFIO
 	[KVM_DEV_TYPE_VFIO] = &kvm_vfio_ops,
 #endif
-
-#ifdef CONFIG_S390
-	[KVM_DEV_TYPE_FLIC] = &kvm_flic_ops,
-#endif
 };
 
 int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type)
-- 
2.1.0
[PATCH v3 1/4] KVM: device: add simple registration mechanism for kvm_device_ops
kvm_ioctl_create_device currently has knowledge of all the device types and their associated ops. This is fairly inflexible when adding support for new in-kernel device emulations, so move what we currently have out into a table, which can support dynamic registration of ops by new drivers for virtual hardware. Cc: Alex Williamson Cc: Alex Graf Cc: Gleb Natapov Cc: Paolo Bonzini Cc: Marc Zyngier Acked-by: Cornelia Huck Reviewed-by: Christoffer Dall Signed-off-by: Will Deacon --- include/linux/kvm_host.h | 1 + include/uapi/linux/kvm.h | 22 +++- virt/kvm/kvm_main.c | 65 3 files changed, 55 insertions(+), 33 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a4c33b34fe3f..ca105f919f0b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1068,6 +1068,7 @@ struct kvm_device_ops { void kvm_device_get(struct kvm_device *dev); void kvm_device_put(struct kvm_device *dev); struct kvm_device *kvm_device_from_filp(struct file *filp); +int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type); extern struct kvm_device_ops kvm_mpic_ops; extern struct kvm_device_ops kvm_xics_ops; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index cf3a2ff440e4..5541024dd5d0 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -947,15 +947,25 @@ struct kvm_device_attr { __u64 addr; /* userspace address of attr data */ }; -#define KVM_DEV_TYPE_FSL_MPIC_20 1 -#define KVM_DEV_TYPE_FSL_MPIC_42 2 -#define KVM_DEV_TYPE_XICS 3 -#define KVM_DEV_TYPE_VFIO 4 #define KVM_DEV_VFIO_GROUP1 #define KVM_DEV_VFIO_GROUP_ADD 1 #define KVM_DEV_VFIO_GROUP_DEL 2 -#define KVM_DEV_TYPE_ARM_VGIC_V2 5 -#define KVM_DEV_TYPE_FLIC 6 + +enum kvm_device_type { + KVM_DEV_TYPE_FSL_MPIC_20= 1, +#define KVM_DEV_TYPE_FSL_MPIC_20 KVM_DEV_TYPE_FSL_MPIC_20 + KVM_DEV_TYPE_FSL_MPIC_42, +#define KVM_DEV_TYPE_FSL_MPIC_42 KVM_DEV_TYPE_FSL_MPIC_42 + KVM_DEV_TYPE_XICS, +#define KVM_DEV_TYPE_XICS KVM_DEV_TYPE_XICS + KVM_DEV_TYPE_VFIO, +#define 
KVM_DEV_TYPE_VFIO KVM_DEV_TYPE_VFIO + KVM_DEV_TYPE_ARM_VGIC_V2, +#define KVM_DEV_TYPE_ARM_VGIC_V2 KVM_DEV_TYPE_ARM_VGIC_V2 + KVM_DEV_TYPE_FLIC, +#define KVM_DEV_TYPE_FLIC KVM_DEV_TYPE_FLIC + KVM_DEV_TYPE_MAX, +}; /* * ioctls for VM fds diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb26eb1..81de6768d2c2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2259,44 +2259,55 @@ struct kvm_device *kvm_device_from_filp(struct file *filp) return filp->private_data; } -static int kvm_ioctl_create_device(struct kvm *kvm, - struct kvm_create_device *cd) -{ - struct kvm_device_ops *ops = NULL; - struct kvm_device *dev; - bool test = cd->flags & KVM_CREATE_DEVICE_TEST; - int ret; - - switch (cd->type) { +static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = { #ifdef CONFIG_KVM_MPIC - case KVM_DEV_TYPE_FSL_MPIC_20: - case KVM_DEV_TYPE_FSL_MPIC_42: - ops = &kvm_mpic_ops; - break; + [KVM_DEV_TYPE_FSL_MPIC_20] = &kvm_mpic_ops, + [KVM_DEV_TYPE_FSL_MPIC_42] = &kvm_mpic_ops, #endif + #ifdef CONFIG_KVM_XICS - case KVM_DEV_TYPE_XICS: - ops = &kvm_xics_ops; - break; + [KVM_DEV_TYPE_XICS] = &kvm_xics_ops, #endif + #ifdef CONFIG_KVM_VFIO - case KVM_DEV_TYPE_VFIO: - ops = &kvm_vfio_ops; - break; + [KVM_DEV_TYPE_VFIO] = &kvm_vfio_ops, #endif + #ifdef CONFIG_KVM_ARM_VGIC - case KVM_DEV_TYPE_ARM_VGIC_V2: - ops = &kvm_arm_vgic_v2_ops; - break; + [KVM_DEV_TYPE_ARM_VGIC_V2] = &kvm_arm_vgic_v2_ops, #endif + #ifdef CONFIG_S390 - case KVM_DEV_TYPE_FLIC: - ops = &kvm_flic_ops; - break; + [KVM_DEV_TYPE_FLIC] = &kvm_flic_ops, #endif - default: +}; + +int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type) +{ + if (type >= ARRAY_SIZE(kvm_device_ops_table)) + return -ENOSPC; + + if (kvm_device_ops_table[type] != NULL) + return -EEXIST; + + kvm_device_ops_table[type] = ops; + return 0; +} + +static int kvm_ioctl_create_device(struct kvm *kvm, + struct kvm_create_device *cd) +{ + struct kvm_device_ops *ops = NULL; + struct kvm_device *dev; + bool 
test = cd->flags & KVM_CREATE_DEVICE_TEST; + int ret; + + if (cd->type >= ARRAY_SIZE(kvm_device_ops_table)) + return -ENODEV; + + ops = kvm_device_ops_table[cd->typ
[PATCH v3 2/4] KVM: ARM: vgic: register kvm_device_ops dynamically
Now that we have a dynamic means to register kvm_device_ops, use that for the ARM VGIC, instead of relying on the static table. Cc: Gleb Natapov Cc: Paolo Bonzini Acked-by: Marc Zyngier Reviewed-by: Christoffer Dall Signed-off-by: Will Deacon --- include/linux/kvm_host.h | 1 - virt/kvm/arm/vgic.c | 157 --- virt/kvm/kvm_main.c | 4 -- 3 files changed, 79 insertions(+), 83 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ca105f919f0b..e9a069ddcb2b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1073,7 +1073,6 @@ int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type); extern struct kvm_device_ops kvm_mpic_ops; extern struct kvm_device_ops kvm_xics_ops; extern struct kvm_device_ops kvm_vfio_ops; -extern struct kvm_device_ops kvm_arm_vgic_v2_ops; extern struct kvm_device_ops kvm_flic_ops; #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 73eba793b17f..3ee3ce06bbec 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1522,83 +1522,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) return 0; } -static void vgic_init_maintenance_interrupt(void *info) -{ - enable_percpu_irq(vgic->maint_irq, 0); -} - -static int vgic_cpu_notify(struct notifier_block *self, - unsigned long action, void *cpu) -{ - switch (action) { - case CPU_STARTING: - case CPU_STARTING_FROZEN: - vgic_init_maintenance_interrupt(NULL); - break; - case CPU_DYING: - case CPU_DYING_FROZEN: - disable_percpu_irq(vgic->maint_irq); - break; - } - - return NOTIFY_OK; -} - -static struct notifier_block vgic_cpu_nb = { - .notifier_call = vgic_cpu_notify, -}; - -static const struct of_device_id vgic_ids[] = { - { .compatible = "arm,cortex-a15-gic", .data = vgic_v2_probe, }, - { .compatible = "arm,gic-v3", .data = vgic_v3_probe, }, - {}, -}; - -int kvm_vgic_hyp_init(void) -{ - const struct of_device_id *matched_id; - int (*vgic_probe)(struct device_node *,const struct vgic_ops **, - const 
struct vgic_params **); - struct device_node *vgic_node; - int ret; - - vgic_node = of_find_matching_node_and_match(NULL, - vgic_ids, &matched_id); - if (!vgic_node) { - kvm_err("error: no compatible GIC node found\n"); - return -ENODEV; - } - - vgic_probe = matched_id->data; - ret = vgic_probe(vgic_node, &vgic_ops, &vgic); - if (ret) - return ret; - - ret = request_percpu_irq(vgic->maint_irq, vgic_maintenance_handler, -"vgic", kvm_get_running_vcpus()); - if (ret) { - kvm_err("Cannot register interrupt %d\n", vgic->maint_irq); - return ret; - } - - ret = __register_cpu_notifier(&vgic_cpu_nb); - if (ret) { - kvm_err("Cannot register vgic CPU notifier\n"); - goto out_free_irq; - } - - /* Callback into for arch code for setup */ - vgic_arch_setup(vgic); - - on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1); - - return 0; - -out_free_irq: - free_percpu_irq(vgic->maint_irq, kvm_get_running_vcpus()); - return ret; -} - /** * kvm_vgic_init - Initialize global VGIC state before running any VCPUs * @kvm: pointer to the kvm struct @@ -2062,7 +1985,7 @@ static int vgic_create(struct kvm_device *dev, u32 type) return kvm_vgic_create(dev->kvm); } -struct kvm_device_ops kvm_arm_vgic_v2_ops = { +static struct kvm_device_ops kvm_arm_vgic_v2_ops = { .name = "kvm-arm-vgic", .create = vgic_create, .destroy = vgic_destroy, @@ -2070,3 +1993,81 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = { .get_attr = vgic_get_attr, .has_attr = vgic_has_attr, }; + +static void vgic_init_maintenance_interrupt(void *info) +{ + enable_percpu_irq(vgic->maint_irq, 0); +} + +static int vgic_cpu_notify(struct notifier_block *self, + unsigned long action, void *cpu) +{ + switch (action) { + case CPU_STARTING: + case CPU_STARTING_FROZEN: + vgic_init_maintenance_interrupt(NULL); + break; + case CPU_DYING: + case CPU_DYING_FROZEN: + disable_percpu_irq(vgic->maint_irq); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block vgic_cpu_nb = { + .notifier_call = vgic_cpu_notify, +}; + +static 
const struct of_device_id vgic_ids[] = { + { .compatible = "arm,cortex-a15-gic", .data = vgic_v2_probe, }, + { .compatible = "arm,gic-v3", .data = vgic_v3_probe, }, + {}, +}; + +int kvm_vgic_hyp_init(void) +{ + c
[PATCH v3 4/4] KVM: VFIO: register kvm_device_ops dynamically
Now that we have a dynamic means to register kvm_device_ops, use that
for the VFIO kvm device, instead of relying on the static table. This
is achieved by a module_init call to register the ops with KVM.

Cc: Gleb Natapov
Cc: Paolo Bonzini
Acked-by: Alex Williamson
Signed-off-by: Will Deacon
---
 include/linux/kvm_host.h |  1 -
 virt/kvm/kvm_main.c      |  4 ----
 virt/kvm/vfio.c          | 22 +++---
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 73f796ad0d0b..55a1618a19dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1072,7 +1072,6 @@ int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type);
 
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
-extern struct kvm_device_ops kvm_vfio_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 78b25a44db53..e9a703eca8c1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2268,10 +2268,6 @@ static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
 #ifdef CONFIG_KVM_XICS
 	[KVM_DEV_TYPE_XICS] = &kvm_xics_ops,
 #endif
-
-#ifdef CONFIG_KVM_VFIO
-	[KVM_DEV_TYPE_VFIO] = &kvm_vfio_ops,
-#endif
 };
 
 int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index ba1a93f935c7..bb11b36ee8a2 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -246,6 +246,16 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
 	kfree(dev);	/* alloc by kvm_ioctl_create_device, free by .destroy */
 }
 
+static int kvm_vfio_create(struct kvm_device *dev, u32 type);
+
+static struct kvm_device_ops kvm_vfio_ops = {
+	.name = "kvm-vfio",
+	.create = kvm_vfio_create,
+	.destroy = kvm_vfio_destroy,
+	.set_attr = kvm_vfio_set_attr,
+	.has_attr = kvm_vfio_has_attr,
+};
+
 static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 {
 	struct kvm_device *tmp;
@@ -268,10 +278,8 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	return 0;
 }
 
-struct kvm_device_ops kvm_vfio_ops = {
-	.name = "kvm-vfio",
-	.create = kvm_vfio_create,
-	.destroy = kvm_vfio_destroy,
-	.set_attr = kvm_vfio_set_attr,
-	.has_attr = kvm_vfio_has_attr,
-};
+static int __init kvm_vfio_ops_init(void)
+{
+	return kvm_register_device_ops(&kvm_vfio_ops, KVM_DEV_TYPE_VFIO);
+}
+module_init(kvm_vfio_ops_init);
-- 
2.1.0
Re: Nested paging in nested SVM setup
On 02/09/2014 08:09, Valentine Sinitsyn wrote:
> https://www.dropbox.com/s/slbxmxyg74wh9hv/l1mmio-cpu0.txt.gz?dl=0
>
> It was taken with kernel 3.16.1 modules with your paging-tmpl.h patch
> applied.
>
> This time, the trace looks somewhat different, however my code still
> hangs in nested KVM (and doesn't on real HW).

This *is* different, though. I don't see any kvm_inj_exception at all
(with my patch it should be for vector 0xfe).

In any case, the problem seems specific to _writes_ to the APIC page.
I'm going to write a testcase for that and see if I can reproduce it
now.

Paolo
Re: Nested paging in nested SVM setup
On 02.09.2014 15:45, Paolo Bonzini wrote:
> This *is* different though. I don't see any kvm_inj_exception at all
> (with my patch it should be for vector 0xfe).

I've applied the part of your patch that fixes the uninitialized
exception vector problem, otherwise the lockup would trigger before my
code had a chance to hang on the APIC. Namely, I did the following
change:

--- a/arch/x86/kvm/paging_tmpl.h	2014-09-02 21:53:26.035112557 +0600
+++ b/arch/x86/kvm/paging_tmpl.h	2014-09-02 21:53:46.145110721 +0600
@@ -366,7 +366,7 @@
 	real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access);
 	if (real_gpa == UNMAPPED_GVA)
-		return 0;
+		goto error;
 
 	walker->gfn = real_gpa >> PAGE_SHIFT;

So they should look like regular page faults (as they ought to, I
guess) now.

Thanks,
Valentine
Re: Nested paging in nested SVM setup
On 02.09.2014 15:45, Paolo Bonzini wrote:
> In any case, the problem seems specific to _writes_ to the APIC page.
> I'm going to write a testcase for that and see if I can reproduce it
> now.

If you need a complete trace, not only CPU 0, please let me know - I'll
upload it as well. It's about 17M compressed.

Valentine
Re: Nested paging in nested SVM setup
On 02/09/2014 11:16, Valentine Sinitsyn wrote:
> On 02.09.2014 14:25, Paolo Bonzini wrote:
>> I actually meant kvm-unit-tests in order to understand the npt_rsvd
>> failure. (I had sent a separate message for Jailhouse).
> Oops, sorry for misunderstanding. Uploaded it here:
> https://www.dropbox.com/s/jp6ohb0ul3d6v4u/npt_rsvd.txt.bz2?dl=0

Ugh, there are many bugs, and the test is even wrong because the actual
error code should be 0x20006 (error while visiting the page tables).

Paolo

> The environment is QEMU 2.1.0 + Linux 3.16.1 with paging_tmpl.h patch,
> and the only test enabled was npt_rsvd (others do pass now).
>
>> For kvm-unit-tests, you can comment out tests that do not fail to
>> reduce the trace size.
> Yes, I've sent that trace earlier today.
>
> Valentine
Re: Nested paging in nested SVM setup
On 02.09.2014 17:21, Paolo Bonzini wrote:
> Ugh, there are many bugs and the test is even wrong because the actual
> error code should be 0x20006 (error while visiting page tables).

Well, it's good they were spotted. :-) I haven't actually looked at the
test code, I just saw that it fails for some reason.

Valentine
Re: Nested paging in nested SVM setup
On 02/09/2014 11:53, Valentine Sinitsyn wrote:
> 	real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access);
> 	if (real_gpa == UNMAPPED_GVA)
> -		return 0;
> +		goto error;
>
> 	walker->gfn = real_gpa >> PAGE_SHIFT;
>
> So they should look like regular page faults (as they ought to, I
> guess) now.

Yes, they do look like a regular page fault with this patch. However,
they actually should look like nested page faults... I'm starting to
clean up this stuff.

Paolo
Re: KVM call for agenda for 2014-09-02
Juan Quintela wrote:
> Hi
>
> Please, send any topic that you are interested in covering.

As there are no topics, the call is cancelled. Have a nice day,

Juan.

> Thanks, Juan.
>
> Call details:
>
> 15:00 CEST
> 13:00 UTC
> 09:00 EDT
>
> Every two weeks
>
> By popular demand, a google calendar public entry with it:
>
> https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>
> (Let me know if you have any problems with the calendar entry)
>
> If you need phone number details, contact me privately
>
> Thanks, Juan.
[PATCH kvm-unit-tests 0/3] svm: fix expected values and add extra NPT tests
The NPT implementation of SVM does not set bits 32 and 33 of EXITINFO1.
We want to fix that, so check those fields, and also add two extra
tests:

- reserved bits during page access (the existing test is for reserved
  bits during page table walks)
- a test for writes from L1 to a read-only page mapped to an MMIO
  device

Paolo Bonzini (3):
  x86: svm: fix exitinfo values for NPT tests
  x86: svm: add page access reserved bit test
  x86: svm: add L1 MMIO read-only permission test

 x86/svm.c | 88 +++
 1 file changed, 77 insertions(+), 11 deletions(-)

-- 
1.8.3.1
[PATCH kvm-unit-tests 1/3] x86: svm: fix exitinfo values for NPT tests
The exitinfo values were plain wrong for the page-walk tests (including
npt_rsvd), or else they were missing bits 32:33. Expect the right
values.

Signed-off-by: Paolo Bonzini
---
 x86/svm.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/x86/svm.c b/x86/svm.c
index 00b3191..c4d6d94 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -703,7 +703,7 @@ static bool npt_nx_check(struct test *test)
     test->vmcb->save.efer |= (1 << 11);
 
     return (test->vmcb->control.exit_code == SVM_EXIT_NPF)
-	   && (test->vmcb->control.exit_info_1 == 0x15);
+	   && (test->vmcb->control.exit_info_1 == 0x10015ULL);
 }
 
 static void npt_us_prepare(struct test *test)
@@ -728,7 +728,7 @@ static bool npt_us_check(struct test *test)
     *pte |= (1ULL << 2);
 
     return (test->vmcb->control.exit_code == SVM_EXIT_NPF)
-	   && (test->vmcb->control.exit_info_1 == 0x05);
+	   && (test->vmcb->control.exit_info_1 == 0x10005ULL);
 }
 
 static void npt_rsvd_prepare(struct test *test)
@@ -744,7 +744,7 @@ static bool npt_rsvd_check(struct test *test)
     pdpe[0] &= ~(1ULL << 8);
 
     return (test->vmcb->control.exit_code == SVM_EXIT_NPF)
-	   && (test->vmcb->control.exit_info_1 == 0x0f);
+	   && (test->vmcb->control.exit_info_1 == 0x20006ULL);
 }
 
 static void npt_rw_prepare(struct test *test)
@@ -772,7 +772,7 @@ static bool npt_rw_check(struct test *test)
     *pte |= (1ULL << 1);
 
     return (test->vmcb->control.exit_code == SVM_EXIT_NPF)
-	   && (test->vmcb->control.exit_info_1 == 0x07);
+	   && (test->vmcb->control.exit_info_1 == 0x10007ULL);
 }
 
 static void npt_pfwalk_prepare(struct test *test)
@@ -793,7 +793,7 @@ static bool npt_pfwalk_check(struct test *test)
     *pte |= (1ULL << 1);
 
     return (test->vmcb->control.exit_code == SVM_EXIT_NPF)
-	   && (test->vmcb->control.exit_info_1 == 0x7)
+	   && (test->vmcb->control.exit_info_1 == 0x20006ULL)
 	   && (test->vmcb->control.exit_info_2 == read_cr3());
 }
-- 
1.8.3.1
[PATCH kvm-unit-tests 2/3] x86: svm: add page access reserved bit test
The reserved bit test was testing faults during page walk, rather than
during page access. Add another test that uses large pages to test
reserved bits during page access, and rename the old test to indicate
what it really covers.

Signed-off-by: Paolo Bonzini
---
 x86/svm.c | 50 +++---
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/x86/svm.c b/x86/svm.c
index c4d6d94..df316b5 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -102,6 +102,17 @@ static void setup_svm(void)
     pml4e[0] = ((u64)pdpe) | 0x27;
 }
 
+static u64 *npt_get_pde(u64 address)
+{
+    int i1, i2;
+
+    address >>= 21;
+    i1 = (address >> 9) & 0x3;
+    i2 = address & 0x1ff;
+
+    return &pde[i1][i2];
+}
+
 static u64 *npt_get_pte(u64 address)
 {
     int i1, i2;
@@ -731,20 +742,27 @@ static bool npt_us_check(struct test *test)
 	   && (test->vmcb->control.exit_info_1 == 0x10005ULL);
 }
 
+u64 save_pde;
+
 static void npt_rsvd_prepare(struct test *test)
 {
+    u64 *pde;
 
     vmcb_ident(test->vmcb);
+    pde = npt_get_pde((u64) null_test);
 
-    pdpe[0] |= (1ULL << 8);
+    save_pde = *pde;
+    *pde = (1ULL << 19) | (1ULL << 7) | 0x27;
 }
 
 static bool npt_rsvd_check(struct test *test)
 {
-    pdpe[0] &= ~(1ULL << 8);
+    u64 *pde = npt_get_pde((u64) null_test);
+
+    *pde = save_pde;
 
     return (test->vmcb->control.exit_code == SVM_EXIT_NPF)
-	   && (test->vmcb->control.exit_info_1 == 0x20006ULL);
+	   && (test->vmcb->control.exit_info_1 == 0x1001dULL);
 }
 
 static void npt_rw_prepare(struct test *test)
@@ -775,7 +793,7 @@ static bool npt_rw_check(struct test *test)
 	   && (test->vmcb->control.exit_info_1 == 0x10007ULL);
 }
 
-static void npt_pfwalk_prepare(struct test *test)
+static void npt_rw_pfwalk_prepare(struct test *test)
 {
     u64 *pte;
 
@@ -786,7 +804,7 @@ static void npt_pfwalk_prepare(struct test *test)
     *pte &= ~(1ULL << 1);
 }
 
-static bool npt_pfwalk_check(struct test *test)
+static bool npt_rw_pfwalk_check(struct test *test)
 {
     u64 *pte = npt_get_pte(read_cr3());
 
@@ -797,6 +815,22 @@ static bool npt_pfwalk_check(struct test *test)
 	   && (test->vmcb->control.exit_info_2 == read_cr3());
 }
 
+static void npt_rsvd_pfwalk_prepare(struct test *test)
+{
+
+    vmcb_ident(test->vmcb);
+
+    pdpe[0] |= (1ULL << 8);
+}
+
+static bool npt_rsvd_pfwalk_check(struct test *test)
+{
+    pdpe[0] &= ~(1ULL << 8);
+
+    return (test->vmcb->control.exit_code == SVM_EXIT_NPF)
+	   && (test->vmcb->control.exit_info_1 == 0x20006ULL);
+}
+
 static void npt_l1mmio_prepare(struct test *test)
 {
     vmcb_ident(test->vmcb);
@@ -984,8 +1018,10 @@ static struct test tests[] = {
       default_finished, npt_rsvd_check },
     { "npt_rw", npt_supported, npt_rw_prepare, npt_rw_test,
       default_finished, npt_rw_check },
-    { "npt_pfwalk", npt_supported, npt_pfwalk_prepare, null_test,
-      default_finished, npt_pfwalk_check },
+    { "npt_rsvd_pfwalk", npt_supported, npt_rsvd_pfwalk_prepare, null_test,
+      default_finished, npt_rsvd_pfwalk_check },
+    { "npt_rw_pfwalk", npt_supported, npt_rw_pfwalk_prepare, null_test,
+      default_finished, npt_rw_pfwalk_check },
     { "npt_l1mmio", npt_supported, npt_l1mmio_prepare, npt_l1mmio_test,
       default_finished, npt_l1mmio_check },
     { "latency_run_exit", default_supported, latency_prepare, latency_test,
-- 
1.8.3.1
[PATCH kvm-unit-tests 3/3] x86: svm: add L1 MMIO read-only permission test
Test that the emulator correctly injects a nested page fault VMEXIT. Reported-by: Valentine Sinitsyn Signed-off-by: Paolo Bonzini --- x86/svm.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/x86/svm.c b/x86/svm.c index df316b5..85bb1fa 100644 --- a/x86/svm.c +++ b/x86/svm.c @@ -855,6 +855,34 @@ static bool npt_l1mmio_check(struct test *test) return nested_apic_version1 == lvr && nested_apic_version2 == lvr; } +static void npt_rw_l1mmio_prepare(struct test *test) +{ + +u64 *pte; + +vmcb_ident(test->vmcb); +pte = npt_get_pte(0xfee00080); + +*pte &= ~(1ULL << 1); +} + +static void npt_rw_l1mmio_test(struct test *test) +{ +volatile u32 *data = (volatile void*)(0xfee00080); + +*data = *data; +} + +static bool npt_rw_l1mmio_check(struct test *test) +{ +u64 *pte = npt_get_pte(0xfee00080); + +*pte |= (1ULL << 1); + +return (test->vmcb->control.exit_code == SVM_EXIT_NPF) + && (test->vmcb->control.exit_info_1 == 0x10007ULL); +} + static void latency_prepare(struct test *test) { default_prepare(test); @@ -1024,6 +1052,8 @@ static struct test tests[] = { default_finished, npt_rw_pfwalk_check }, { "npt_l1mmio", npt_supported, npt_l1mmio_prepare, npt_l1mmio_test, default_finished, npt_l1mmio_check }, +{ "npt_rw_l1mmio", npt_supported, npt_rw_l1mmio_prepare, npt_rw_l1mmio_test, + default_finished, npt_rw_l1mmio_check }, { "latency_run_exit", default_supported, latency_prepare, latency_test, latency_finished, latency_check }, { "latency_svm_insn", default_supported, lat_svm_insn_prepare, null_test, -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] KVM: x86: propagate exception from permission checks on the nested page fault
Currently, if a permission error happens during the translation of the final GPA to HPA, walk_addr_generic returns 0 but does not fill in walker->fault. To avoid this, add an x86_exception* argument to the translate_gpa function, and let it fill in walker->fault. The nested_page_fault field will be true, since the walk_mmu is the nested_mmu and translate_gpa instead operates on the "outer" (NPT) instance. Reported-by: Valentine Sinitsyn Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 9 ++--- arch/x86/kvm/mmu.c | 2 +- arch/x86/kvm/paging_tmpl.h | 7 --- arch/x86/kvm/x86.c | 9 + 4 files changed, 16 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 08cc299ec6f4..6bf2f3c7e180 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -262,7 +262,8 @@ struct kvm_mmu { struct x86_exception *fault); gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access, struct x86_exception *exception); - gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access); + gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access, + struct x86_exception *exception); int (*sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp); void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva); @@ -924,7 +925,8 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu); int kvm_mmu_load(struct kvm_vcpu *vcpu); void kvm_mmu_unload(struct kvm_vcpu *vcpu); void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu); -gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access); +gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access, + struct x86_exception *exception); gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, struct x86_exception *exception); gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, @@ -944,7 +946,8 @@ void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu); void kvm_enable_tdp(void); void kvm_disable_tdp(void); -static
inline gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access) +static inline gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access, + struct x86_exception *exception) { return gpa; } diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 5b93a597e0c8..76398fe15df2 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3200,7 +3200,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr, { if (exception) exception->error_code = 0; - return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access); + return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access, exception); } static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 99d4c4e836a0..a97e9214ebce 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -321,14 +321,15 @@ retry_walk: walker->pte_gpa[walker->level - 1] = pte_gpa; real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn), - PFERR_USER_MASK|PFERR_WRITE_MASK); + PFERR_USER_MASK|PFERR_WRITE_MASK, + &walker->fault); /* * Can this happen (except if the guest is playing TOCTTOU games)? * We should have gotten a nested page fault on table_gfn instead. 
*/ if (unlikely(real_gfn == UNMAPPED_GVA)) - goto error; + return 0; real_gfn = gpa_to_gfn(real_gfn); @@ -370,7 +371,7 @@ retry_walk: if (PTTYPE == 32 && walker->level == PT_DIRECTORY_LEVEL && is_cpuid_PSE36()) gfn += pse36_gfn_delta(pte); - real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access); + real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access, &walker->fault); if (real_gpa == UNMAPPED_GVA) return 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9e3b74c044ed..022513bf92f1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -467,11 +467,12 @@ int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gfn_t ngfn, void *data, int offset, int len, u32 access) { + struct x86_exception exception; gfn_t real_gfn; gpa_t ngpa; ngpa = gfn_to_gpa(ngfn); - real_gfn = mmu->translate_gpa(vcpu, ngpa, access); + real_gfn = mmu->translate_gpa(vcpu, n
[PATCH 3/4] KVM: x86: inject nested page faults on emulated instructions
This is required for the following patch to work correctly. If a nested page fault happens during emulation, we must inject a vmexit, not a page fault. Luckily we already have the required machinery: it is enough to return X86EMUL_INTERCEPTED instead of X86EMUL_PROPAGATE_FAULT. Reported-by: Valentine Sinitsyn Signed-off-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e4ed85e07a01..9e3b74c044ed 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -416,6 +416,16 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault) vcpu->arch.mmu.inject_page_fault(vcpu, fault); } +static inline int kvm_propagate_or_intercept(struct kvm_vcpu *vcpu, +struct x86_exception *exception) +{ + if (likely(!exception->nested_page_fault)) + return X86EMUL_PROPAGATE_FAULT; + + vcpu->arch.mmu.inject_page_fault(vcpu, exception); + return X86EMUL_INTERCEPTED; +} + void kvm_inject_nmi(struct kvm_vcpu *vcpu) { atomic_inc(&vcpu->arch.nmi_queued); @@ -4122,7 +4132,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, int ret; if (gpa == UNMAPPED_GVA) - return X86EMUL_PROPAGATE_FAULT; + return kvm_propagate_or_intercept(vcpu, exception); ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, data, offset, toread); if (ret < 0) { @@ -4152,7 +4162,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt, gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK, exception); if (unlikely(gpa == UNMAPPED_GVA)) - return X86EMUL_PROPAGATE_FAULT; + return kvm_propagate_or_intercept(vcpu, exception); offset = addr & (PAGE_SIZE-1); if (WARN_ON(offset + bytes > PAGE_SIZE)) @@ -4203,7 +4213,7 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt, int ret; if (gpa == UNMAPPED_GVA) - return X86EMUL_PROPAGATE_FAULT; + return kvm_propagate_or_intercept(vcpu, exception); ret = kvm_write_guest(vcpu->kvm, 
gpa, data, towrite); if (ret < 0) { r = X86EMUL_IO_NEEDED; @@ -4350,7 +4360,7 @@ static int emulator_read_write_onepage(unsigned long addr, void *val, ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, write); if (ret < 0) - return X86EMUL_PROPAGATE_FAULT; + return kvm_propagate_or_intercept(vcpu, exception); /* For APIC access vmexit */ if (ret) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
This is similar to what the EPT code does with the exit qualification. This allows the guest to see a valid value for bits 33:32. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/paging_tmpl.h | 6 ++ arch/x86/kvm/svm.c | 26 ++ 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 410776528265..99d4c4e836a0 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -322,8 +322,14 @@ retry_walk: real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn), PFERR_USER_MASK|PFERR_WRITE_MASK); + + /* +* Can this happen (except if the guest is playing TOCTTOU games)? +* We should have gotten a nested page fault on table_gfn instead. +*/ if (unlikely(real_gfn == UNMAPPED_GVA)) goto error; + real_gfn = gpa_to_gfn(real_gfn); host_addr = gfn_to_hva_prot(vcpu->kvm, real_gfn, diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 8ef704037370..bd78a3baca31 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu, { struct vcpu_svm *svm = to_svm(vcpu); - svm->vmcb->control.exit_code = SVM_EXIT_NPF; - svm->vmcb->control.exit_code_hi = 0; - svm->vmcb->control.exit_info_1 = fault->error_code; - svm->vmcb->control.exit_info_2 = fault->address; + /* +* We can keep the value that the processor stored in the VMCB, +* but make up something sensible if we hit the WARN. +*/ + if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) { + svm->vmcb->control.exit_code = SVM_EXIT_NPF; + svm->vmcb->control.exit_code_hi = 0; + svm->vmcb->control.exit_info_1 = (1ULL << 32); + svm->vmcb->control.exit_info_2 = fault->address; + } + + /* +* The present bit is always zero for page structure faults on real +* hardware, but we compute it as 1 in fault->error_code. 
We can +* just keep the original exit info for page structure faults---but +* not for other faults, because the processor's error code might +* be wrong if we have just created a shadow PTE. +*/ + if (svm->vmcb->control.exit_info_1 & (1ULL << 32)) { + svm->vmcb->control.exit_info_1 &= ~0xULL; + svm->vmcb->control.exit_info_1 |= fault->error_code; + } nested_svm_vmexit(svm); } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD
Bit 8 would be the "global" bit, which does not quite make sense for non-leaf page table entries. Intel ignores it; AMD ignores it in PDEs, but reserves it in PDPEs and PML4Es. The SVM test is relying on this behavior, so enforce it. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/cpuid.h | 8 arch/x86/kvm/mmu.c | 13 +++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index a5380590ab0e..43b33e301e68 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -88,6 +88,14 @@ static inline bool guest_cpuid_has_x2apic(struct kvm_vcpu *vcpu) return best && (best->ecx & bit(X86_FEATURE_X2APIC)); } +static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 0, 0); + return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx; +} + static inline bool guest_cpuid_has_gbpages(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 6b6df0c5be3d..5b93a597e0c8 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3512,6 +3512,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int maxphyaddr = cpuid_maxphyaddr(vcpu); u64 exb_bit_rsvd = 0; u64 gbpages_bit_rsvd = 0; + u64 nonleaf_bit8_rsvd = 0; context->bad_mt_xwr = 0; @@ -3519,6 +3520,14 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, exb_bit_rsvd = rsvd_bits(63, 63); if (!guest_cpuid_has_gbpages(vcpu)) gbpages_bit_rsvd = rsvd_bits(7, 7); + + /* +* Non-leaf PML4Es and PDPEs reserve bit 8 (which would be the G bit for +* leaf entries) on AMD CPUs only. 
+*/ + if (guest_cpuid_is_amd(vcpu)) + nonleaf_bit8_rsvd = rsvd_bits(8, 8); + switch (context->root_level) { case PT32_ROOT_LEVEL: /* no rsvd bits for 2 level 4K page table entries */ @@ -3553,9 +3562,9 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, break; case PT64_ROOT_LEVEL: context->rsvd_bits_mask[0][3] = exb_bit_rsvd | - rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 7); + nonleaf_bit8_rsvd | rsvd_bits(7, 7) | rsvd_bits(maxphyaddr, 51); context->rsvd_bits_mask[0][2] = exb_bit_rsvd | - gbpages_bit_rsvd | rsvd_bits(maxphyaddr, 51); + nonleaf_bit8_rsvd | gbpages_bit_rsvd | rsvd_bits(maxphyaddr, 51); context->rsvd_bits_mask[0][1] = exb_bit_rsvd | rsvd_bits(maxphyaddr, 51); context->rsvd_bits_mask[0][0] = exb_bit_rsvd | -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] KVM: nested x86: nested page faults fixes
Patch 1 implements AMD semantics for non-leaf PDPEs and PML4Es, which are a bit different from Intel. The SVM test relies on this, so fix it. Patch 2 lets nested SVM implement nested page fault correctly. We were not setting bits 32/33. Patches 3 and 4 fix the interaction between emulator and nested EPT/NPT, which was reported by Valentine. Reviews are very welcome, I'm walking on thin ice here... Paolo Paolo Bonzini (4): KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD KVM: nSVM: propagate the NPF EXITINFO to the guest KVM: x86: inject nested page faults on emulated instructions KVM: x86: propagate exception from permission checks on the nested page fault arch/x86/include/asm/kvm_host.h | 9 ++--- arch/x86/kvm/cpuid.h| 8 arch/x86/kvm/mmu.c | 15 --- arch/x86/kvm/paging_tmpl.h | 13 ++--- arch/x86/kvm/svm.c | 26 ++ arch/x86/kvm/x86.c | 27 +++ 6 files changed, 77 insertions(+), 21 deletions(-) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 00/14] ivshmem: update documentation, add client/server tools
Here is a patchset containing an update on the ivshmem specs documentation and importing the ivshmem server and client tools. These tools have been written from scratch and are not related to what is available in the nahanni repository. I put them in the contrib/ directory as qemu-doc.texi already stated the server was supposed to be there.

Changes since v3:
- first patch is untouched
- just restored Claudio's Reviewed-by on the second patch
- patches 3-8 take into account Stefan's comments
- patches 9-12 take into account Gonglei's comments
- patch 13 adjusts ivshmem-server default values
- the last patch introduces a change in the ivshmem client-server protocol to check a protocol version at connect time

Changes since v2:
- fixed license issues in the ivshmem client/server (I took the hw/virtio/virtio-rng.c file as a reference).

Changes since v1:
- moved the client/server import patch before the doc update,
- tried to re-organise the ivshmem_device_spec.txt file based on Claudio's comments (still not sure if the result is that great, comments welcome),
- incorporated comments from Claudio, Eric and Cam,
- added more details on the server <-> client message exchange (but sorry, no ASCII art here).

By the way, there are still some functionalities that need description (use of ioeventfd, the lack of irqfd support) and some parts of the ivshmem code clearly need cleanup. I will try to address this in future patches once these first patches are ok.
-- David Marchand David Marchand (14): contrib: add ivshmem client and server docs: update ivshmem device spec contrib/ivshmem-*: comply with QEMU coding style contrib/ivshmem-*: reuse qemu/queue.h contrib/ivshmem-*: switch to QEMU headers contrib/ivshmem-server: set client sockets as non blocking contrib/ivshmem-*: add missing const and static attrs contrib/ivshmem-*: plug client and server in QEMU top Makefile contrib/ivshmem-*: switch to g_malloc0/g_free contrib/ivshmem-server: fix mem leak on error contrib/ivshmem-*: rework error handling contrib/ivshmem-*: various fixes contrib/ivshmem-server: align server default parameter values ivshmem: add check on protocol version in QEMU Makefile|8 + configure |3 + contrib/ivshmem-client/ivshmem-client.c | 413 +++ contrib/ivshmem-client/ivshmem-client.h | 240 ++ contrib/ivshmem-client/main.c | 237 ++ contrib/ivshmem-server/ivshmem-server.c | 402 ++ contrib/ivshmem-server/ivshmem-server.h | 187 ++ contrib/ivshmem-server/main.c | 243 ++ docs/specs/ivshmem_device_spec.txt | 127 +++--- hw/misc/ivshmem.c | 43 +++- include/hw/misc/ivshmem.h | 17 ++ qemu-doc.texi | 10 +- 12 files changed, 1887 insertions(+), 43 deletions(-) create mode 100644 contrib/ivshmem-client/ivshmem-client.c create mode 100644 contrib/ivshmem-client/ivshmem-client.h create mode 100644 contrib/ivshmem-client/main.c create mode 100644 contrib/ivshmem-server/ivshmem-server.c create mode 100644 contrib/ivshmem-server/ivshmem-server.h create mode 100644 contrib/ivshmem-server/main.c create mode 100644 include/hw/misc/ivshmem.h -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 06/14] contrib/ivshmem-server: set client sockets as non blocking
Signed-off-by: David Marchand --- contrib/ivshmem-server/ivshmem-server.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c index 0afa6e8..e0d4d1d 100644 --- a/contrib/ivshmem-server/ivshmem-server.c +++ b/contrib/ivshmem-server/ivshmem-server.c @@ -128,7 +128,8 @@ handle_new_conn(IvshmemServer *server) /* accept the incoming connection */ unaddr_len = sizeof(unaddr); -newfd = accept(server->sock_fd, (struct sockaddr *)&unaddr, &unaddr_len); +newfd = accept4(server->sock_fd, (struct sockaddr *)&unaddr, &unaddr_len, +SOCK_NONBLOCK); if (newfd < 0) { debug_log(server, "cannot accept() %s\n", strerror(errno)); return -1; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 04/14] contrib/ivshmem-*: reuse qemu/queue.h
Switch to qemu/queue.h structures. Signed-off-by: David Marchand --- contrib/ivshmem-client/ivshmem-client.c | 17 contrib/ivshmem-client/ivshmem-client.h |7 --- contrib/ivshmem-server/ivshmem-server.c | 33 --- contrib/ivshmem-server/ivshmem-server.h |7 --- 4 files changed, 30 insertions(+), 34 deletions(-) diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c index 3f6ca98..ce3a5d2 100644 --- a/contrib/ivshmem-client/ivshmem-client.c +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -14,12 +14,13 @@ #include #include #include -#include #include #include #include +#include "qemu/queue.h" + #include "ivshmem-client.h" /* log a message on stdout if verbose=1 */ @@ -84,7 +85,7 @@ free_peer(IvshmemClient *client, IvshmemClientPeer *peer) { unsigned vector; -TAILQ_REMOVE(&client->peer_list, peer, next); +QTAILQ_REMOVE(&client->peer_list, peer, next); for (vector = 0; vector < peer->vectors_count; vector++) { close(peer->vectors[vector]); } @@ -131,7 +132,7 @@ handle_server_msg(IvshmemClient *client) memset(peer, 0, sizeof(*peer)); peer->id = peer_id; peer->vectors_count = 0; -TAILQ_INSERT_TAIL(&client->peer_list, peer, next); +QTAILQ_INSERT_TAIL(&client->peer_list, peer, next); debug_log(client, "new peer id = %ld\n", peer_id); } @@ -161,7 +162,7 @@ ivshmem_client_init(IvshmemClient *client, const char *unix_sock_path, client->local.vectors[i] = -1; } -TAILQ_INIT(&client->peer_list); +QTAILQ_INIT(&client->peer_list); client->local.id = -1; client->notif_cb = notif_cb; @@ -230,7 +231,7 @@ ivshmem_client_close(IvshmemClient *client) debug_log(client, "close client\n"); -while ((peer = TAILQ_FIRST(&client->peer_list)) != NULL) { +while ((peer = QTAILQ_FIRST(&client->peer_list)) != NULL) { free_peer(client, peer); } @@ -363,7 +364,7 @@ ivshmem_client_notify_broadcast(const IvshmemClient *client) IvshmemClientPeer *peer; int ret = 0; -TAILQ_FOREACH(peer, &client->peer_list, next) { +QTAILQ_FOREACH(peer, &client->peer_list, next) { if
(ivshmem_client_notify_all_vects(client, peer) < 0) { ret = -1; } @@ -382,7 +383,7 @@ ivshmem_client_search_peer(IvshmemClient *client, long peer_id) return &client->local; } -TAILQ_FOREACH(peer, &client->peer_list, next) { +QTAILQ_FOREACH(peer, &client->peer_list, next) { if (peer->id == peer_id) { return peer; } @@ -406,7 +407,7 @@ ivshmem_client_dump(const IvshmemClient *client) } /* dump peers */ -TAILQ_FOREACH(peer, &client->peer_list, next) { +QTAILQ_FOREACH(peer, &client->peer_list, next) { printf("peer_id = %ld\n", peer->id); for (vector = 0; vector < peer->vectors_count; vector++) { diff --git a/contrib/ivshmem-client/ivshmem-client.h b/contrib/ivshmem-client/ivshmem-client.h index 0fe0c94..e3b284d 100644 --- a/contrib/ivshmem-client/ivshmem-client.h +++ b/contrib/ivshmem-client/ivshmem-client.h @@ -21,7 +21,8 @@ #include #include -#include + +#include "qemu/queue.h" /** * Maximum number of notification vectors supported by the client @@ -40,12 +41,12 @@ * client in (IvshmemClient)->local. 
*/ typedef struct IvshmemClientPeer { -TAILQ_ENTRY(IvshmemClientPeer) next; /**< next in list*/ +QTAILQ_ENTRY(IvshmemClientPeer) next;/**< next in list*/ long id; /**< the id of the peer */ int vectors[IVSHMEM_CLIENT_MAX_VECTORS]; /**< one fd per vector */ unsigned vectors_count; /**< number of vectors */ } IvshmemClientPeer; -TAILQ_HEAD(IvshmemClientPeerList, IvshmemClientPeer); +QTAILQ_HEAD(IvshmemClientPeerList, IvshmemClientPeer); typedef struct IvshmemClientPeerList IvshmemClientPeerList; typedef struct IvshmemClient IvshmemClient; diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c index 20fbac0..e58864d 100644 --- a/contrib/ivshmem-server/ivshmem-server.c +++ b/contrib/ivshmem-server/ivshmem-server.c @@ -16,7 +16,6 @@ #include #include -#include #include #include #include @@ -24,6 +23,8 @@ #include #include +#include "qemu/queue.h" + #include "ivshmem-server.h" /* log a message on stdout if verbose=1 */ @@ -33,14 +34,6 @@ }\ } while (0) -/* browse the queue, allowing to remove/free the current element */ -#defineTAILQ_FOREACH_SAFE(var, var2, head, field)\ -for ((var) = TAILQ_FIRST((head)),\ - (var2) = ((var) ? TAILQ_NEXT((var), field) : NULL); \ - (var); \ - (var) = (var2),
[PATCH v4 10/14] contrib/ivshmem-server: fix mem leak on error
Signed-off-by: David Marchand --- contrib/ivshmem-server/ivshmem-server.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c index 15d468c..4732dab 100644 --- a/contrib/ivshmem-server/ivshmem-server.c +++ b/contrib/ivshmem-server/ivshmem-server.c @@ -190,8 +190,8 @@ fail: while (i--) { close(peer->vectors[i]); } -peer->sock_fd = -1; close(newfd); +g_free(peer); return -1; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 09/14] contrib/ivshmem-*: switch to g_malloc0/g_free
Signed-off-by: David Marchand --- contrib/ivshmem-client/ivshmem-client.c |9 ++--- contrib/ivshmem-server/ivshmem-server.c | 12 ++-- 2 files changed, 4 insertions(+), 17 deletions(-) diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c index 2ba40a7..a08f4d9 100644 --- a/contrib/ivshmem-client/ivshmem-client.c +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -82,7 +82,7 @@ free_peer(IvshmemClient *client, IvshmemClientPeer *peer) close(peer->vectors[vector]); } -free(peer); +g_free(peer); } /* handle message coming from server (new peer, new vectors) */ @@ -116,12 +116,7 @@ handle_server_msg(IvshmemClient *client) /* new peer */ if (peer == NULL) { -peer = malloc(sizeof(*peer)); -if (peer == NULL) { -debug_log(client, "cannot allocate new peer\n"); -return -1; -} -memset(peer, 0, sizeof(*peer)); +peer = g_malloc0(sizeof(*peer)); peer->id = peer_id; peer->vectors_count = 0; QTAILQ_INSERT_TAIL(&client->peer_list, peer, next); diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c index e0d4d1d..15d468c 100644 --- a/contrib/ivshmem-server/ivshmem-server.c +++ b/contrib/ivshmem-server/ivshmem-server.c @@ -90,7 +90,7 @@ free_peer(IvshmemServer *server, IvshmemServerPeer *peer) close(peer->vectors[vector]); } -free(peer); +g_free(peer); } /* send the peer id and the shm_fd just after a new client connection */ @@ -138,15 +138,7 @@ handle_new_conn(IvshmemServer *server) debug_log(server, "accept()=%d\n", newfd); /* allocate new structure for this peer */ -peer = malloc(sizeof(*peer)); -if (peer == NULL) { -debug_log(server, "cannot allocate new peer\n"); -close(newfd); -return -1; -} - -/* initialize the peer struct, one eventfd per vector */ -memset(peer, 0, sizeof(*peer)); +peer = g_malloc0(sizeof(*peer)); peer->sock_fd = newfd; /* get an unused peer id */ -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org 
[PATCH v4 02/14] docs: update ivshmem device spec
Add some notes on the parts needed to use ivshmem devices: more specifically, explain the purpose of an ivshmem server and the basic concept for using ivshmem devices in guests. Move some parts of the documentation and re-organise it. Signed-off-by: David Marchand Reviewed-by: Claudio Fontana --- docs/specs/ivshmem_device_spec.txt | 124 +++- 1 file changed, 93 insertions(+), 31 deletions(-) diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt index 667a862..f5f2b95 100644 --- a/docs/specs/ivshmem_device_spec.txt +++ b/docs/specs/ivshmem_device_spec.txt @@ -2,30 +2,103 @@ Device Specification for Inter-VM shared memory device -- -The Inter-VM shared memory device is designed to share a region of memory to -userspace in multiple virtual guests. The memory region does not belong to any -guest, but is a POSIX memory object on the host. Optionally, the device may -support sending interrupts to other guests sharing the same memory region. +The Inter-VM shared memory device is designed to share a memory region (created +on the host via the POSIX shared memory API) between multiple QEMU processes +running different guests. In order for all guests to be able to pick up the +shared memory area, it is modeled by QEMU as a PCI device exposing said memory +to the guest as a PCI BAR. +The memory region does not belong to any guest, but is a POSIX memory object on +the host. The host can access this shared memory if needed. + +The device also provides an optional communication mechanism between guests +sharing the same memory object. More details about that in the 'Guest to +guest communication' section. The Inter-VM PCI device --- -*BARs* +From the VM point of view, the ivshmem PCI device supports three BARs. + +- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is + not used. +- BAR1 is used for MSI-X when it is enabled in the device. +- BAR2 is used to access the shared memory object.
+ +It is your choice how to use the device but you must choose between two +behaviors: + +- basically, if you only need the shared memory part, you will map BAR2. + This way, you have access to the shared memory in guest and can use it as you + see fit (memnic, for example, uses it in userland + http://dpdk.org/browse/memnic). + +- BAR0 and BAR1 are used to implement an optional communication mechanism + through interrupts in the guests. If you need an event mechanism between the + guests accessing the shared memory, you will most likely want to write a + kernel driver that will handle interrupts. See details in the 'Guest + to guest communication' section. + +The behavior is chosen when starting your QEMU processes: +- no communication mechanism needed, the first QEMU to start creates the shared + memory on the host, subsequent QEMU processes will use it. + +- communication mechanism needed, an ivshmem server must be started before any + QEMU processes, then each QEMU process connects to the server unix socket. + +For more details on the QEMU ivshmem parameters, see qemu-doc documentation. + + +Guest to guest communication + + +This section details the communication mechanism between the guests accessing +the ivshmem shared memory. -The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support -registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is -used to map the shared memory object from the host. The size of BAR2 is -specified when the guest is started and must be a power of 2 in size. +*ivshmem server* -*Registers* +This server code is available in qemu.git/contrib/ivshmem-server. -The device currently supports 4 registers of 32-bits each. Registers -are used for synchronization between guests sharing the same memory object when -interrupts are supported (this requires using the shared memory server). +The server must be started on the host before any guest.
+It creates a shared memory object and then waits for clients to connect on a unix +socket. -The server assigns each VM an ID number and sends this ID number to the QEMU -process when the guest starts. +For each client (QEMU process) that connects to the server: +- the server assigns an ID for this client and sends this ID to it as the first + message, +- the server sends an fd to the shared memory object to this client, +- the server creates a new set of host eventfds associated with the new client and + sends this set to all already connected clients, +- finally, the server sends all the eventfd sets for all clients to the new + client. + +The server signals all clients when one of them disconnects. + +The client IDs are limited to 16 bits because of the current implementation (see +Doorbell register in 'PCI device registers' subsection). Hence only 65536 clients +are supported. + +All the fil
[PATCH v4 11/14] contrib/ivshmem-*: rework error handling
Following Gonglei comments, rework error handling using goto. Signed-off-by: David Marchand --- contrib/ivshmem-client/ivshmem-client.c | 17 - contrib/ivshmem-server/ivshmem-server.c | 19 ++- 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c index a08f4d9..e9a19ff 100644 --- a/contrib/ivshmem-client/ivshmem-client.c +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -180,18 +180,14 @@ ivshmem_client_connect(IvshmemClient *client) if (connect(client->sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) { debug_log(client, "cannot connect to %s: %s\n", sun.sun_path, strerror(errno)); -close(client->sock_fd); -client->sock_fd = -1; -return -1; +goto err_close; } /* first, we expect our index + a fd == -1 */ if (read_one_msg(client, &client->local.id, &fd) < 0 || client->local.id < 0 || fd != -1) { debug_log(client, "cannot read from server\n"); -close(client->sock_fd); -client->sock_fd = -1; -return -1; +goto err_close; } debug_log(client, "our_id=%ld\n", client->local.id); @@ -200,13 +196,16 @@ ivshmem_client_connect(IvshmemClient *client) if (read_one_msg(client, &tmp, &fd) < 0 || tmp != -1 || fd < 0) { debug_log(client, "cannot read from server (2)\n"); -close(client->sock_fd); -client->sock_fd = -1; -return -1; +goto err_close; } debug_log(client, "shm_fd=%d\n", fd); return 0; + +err_close: +close(client->sock_fd); +client->sock_fd = -1; +return -1; } /* close connection to the server, and free all peer structures */ diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c index 4732dab..f441da7 100644 --- a/contrib/ivshmem-server/ivshmem-server.c +++ b/contrib/ivshmem-server/ivshmem-server.c @@ -264,7 +264,7 @@ ivshmem_server_start(IvshmemServer *server) if (ivshmem_ftruncate(shm_fd, server->shm_size) < 0) { fprintf(stderr, "ftruncate(%s) failed: %s\n", server->shm_path, strerror(errno)); -return -1; +goto err_close_shm; } 
debug_log(server, "create & bind socket %s\n", server->unix_sock_path); @@ -273,8 +273,7 @@ ivshmem_server_start(IvshmemServer *server) sock_fd = socket(AF_UNIX, SOCK_STREAM, 0); if (sock_fd < 0) { debug_log(server, "cannot create socket: %s\n", strerror(errno)); -close(shm_fd); -return -1; +goto err_close_shm; } sun.sun_family = AF_UNIX; @@ -283,22 +282,24 @@ ivshmem_server_start(IvshmemServer *server) if (bind(sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) { debug_log(server, "cannot connect to %s: %s\n", sun.sun_path, strerror(errno)); -close(sock_fd); -close(shm_fd); -return -1; +goto err_close_sock; } if (listen(sock_fd, IVSHMEM_SERVER_LISTEN_BACKLOG) < 0) { debug_log(server, "listen() failed: %s\n", strerror(errno)); -close(sock_fd); -close(shm_fd); -return -1; +goto err_close_sock; } server->sock_fd = sock_fd; server->shm_fd = shm_fd; return 0; + +err_close_sock: +close(sock_fd); +err_close_shm: +close(shm_fd); +return -1; } /* close connections to clients, the unix socket and the shm fd */ -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
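The cleanup idiom this patch converts to — error labels that fall through from the most recently acquired resource back to the first, so each failure site names exactly one label — can be sketched with hypothetical resources (malloc() standing in for shm_open()/socket(); this is an illustration, not the patch's code):

```c
#include <stdlib.h>

/* Sketch of the goto-based cleanup idiom the patch introduces.
 * malloc() stands in for shm_open()/socket(); labels fall through from
 * the most recently acquired resource to the first, so the release
 * code exists in exactly one place. */
static int server_start_sketch(int fail_step)
{
    char *shm = NULL, *sock = NULL;

    shm = malloc(64);                 /* acquire resource 1 */
    if (shm == NULL) {
        return -1;                    /* nothing to clean up yet */
    }
    if (fail_step == 1) {             /* e.g. ftruncate() failed */
        goto err_free_shm;
    }

    sock = malloc(64);                /* acquire resource 2 */
    if (sock == NULL) {
        goto err_free_shm;
    }
    if (fail_step == 2) {             /* e.g. bind()/listen() failed */
        goto err_free_sock;
    }

    free(sock);                       /* success path */
    free(shm);
    return 0;

err_free_sock:
    free(sock);
err_free_shm:
    free(shm);
    return -1;
}
```

Each new failure mode only has to pick the right label, instead of repeating the full close() sequence as the old code did.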
[PATCH v4 12/14] contrib/ivshmem-*: various fixes
More fixes following Gonglei's comments:
- add a missing \n in a debug message.
- add an explicit initialisation of sock_fd.
- fix a check on vector index.

Signed-off-by: David Marchand
---
 contrib/ivshmem-client/ivshmem-client.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c
index e9a19ff..ad210c8 100644
--- a/contrib/ivshmem-client/ivshmem-client.c
+++ b/contrib/ivshmem-client/ivshmem-client.c
@@ -105,7 +105,7 @@ handle_server_msg(IvshmemClient *client)
     if (fd == -1) {
         if (peer == NULL || peer == &client->local) {
-            debug_log(client, "receive delete for invalid peer %ld", peer_id);
+            debug_log(client, "receive delete for invalid peer %ld\n", peer_id);
             return -1;
         }
@@ -155,6 +155,7 @@ ivshmem_client_init(IvshmemClient *client, const char *unix_sock_path,
     client->notif_cb = notif_cb;
     client->notif_arg = notif_arg;
     client->verbose = verbose;
+    client->sock_fd = -1;
     return 0;
 }
@@ -309,7 +310,7 @@ ivshmem_client_notify(const IvshmemClient *client,
     uint64_t kick;
     int fd;
-    if (vector > peer->vectors_count) {
+    if (vector >= peer->vectors_count) {
         debug_log(client, "invalid vector %u on peer %ld\n", vector, peer->id);
         return -1;
     }
-- 
1.7.10.4
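The last hunk is an off-by-one fix: a peer with vectors_count vectors has valid indices 0..vectors_count-1, so index == vectors_count must be rejected too. A minimal restatement of the corrected check:

```c
#include <stdbool.h>

/* Corrected bounds check from the last hunk: with N vectors the valid
 * indices are 0..N-1, so N itself is out of range. */
static bool vector_in_range(unsigned vector, unsigned vectors_count)
{
    return vector < vectors_count;   /* i.e. reject vector >= count */
}
```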
[PATCH v4 14/14] ivshmem: add check on protocol version in QEMU
Send a protocol version as the first message from server, clients must close communication if they don't support this protocol version. Older QEMUs should be fine with this change in the protocol since they overrides their own vm_id on reception of an id associated to no eventfd. Signed-off-by: David Marchand --- contrib/ivshmem-client/ivshmem-client.c | 14 +++--- contrib/ivshmem-client/ivshmem-client.h |1 + contrib/ivshmem-server/ivshmem-server.c |7 + contrib/ivshmem-server/ivshmem-server.h |1 + docs/specs/ivshmem_device_spec.txt |9 --- hw/misc/ivshmem.c | 43 --- include/hw/misc/ivshmem.h | 17 7 files changed, 77 insertions(+), 15 deletions(-) create mode 100644 include/hw/misc/ivshmem.h diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c index ad210c8..0c4e016 100644 --- a/contrib/ivshmem-client/ivshmem-client.c +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -184,10 +184,18 @@ ivshmem_client_connect(IvshmemClient *client) goto err_close; } -/* first, we expect our index + a fd == -1 */ +/* first, we expect a protocol version */ +if (read_one_msg(client, &tmp, &fd) < 0 || +(tmp != IVSHMEM_PROTOCOL_VERSION) || fd != -1) { +debug_log(client, "cannot read from server\n"); +goto err_close; +} +debug_log(client, "our_id=%ld\n", client->local.id); + +/* then, we expect our index + a fd == -1 */ if (read_one_msg(client, &client->local.id, &fd) < 0 || client->local.id < 0 || fd != -1) { -debug_log(client, "cannot read from server\n"); +debug_log(client, "cannot read from server (2)\n"); goto err_close; } debug_log(client, "our_id=%ld\n", client->local.id); @@ -196,7 +204,7 @@ ivshmem_client_connect(IvshmemClient *client) * is not used */ if (read_one_msg(client, &tmp, &fd) < 0 || tmp != -1 || fd < 0) { -debug_log(client, "cannot read from server (2)\n"); +debug_log(client, "cannot read from server (3)\n"); goto err_close; } debug_log(client, "shm_fd=%d\n", fd); diff --git a/contrib/ivshmem-client/ivshmem-client.h 
b/contrib/ivshmem-client/ivshmem-client.h index 45f2b64..8d6ab35 100644 --- a/contrib/ivshmem-client/ivshmem-client.h +++ b/contrib/ivshmem-client/ivshmem-client.h @@ -23,6 +23,7 @@ #include #include "qemu/queue.h" +#include "hw/misc/ivshmem.h" /** * Maximum number of notification vectors supported by the client diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c index f441da7..670c58c 100644 --- a/contrib/ivshmem-server/ivshmem-server.c +++ b/contrib/ivshmem-server/ivshmem-server.c @@ -99,6 +99,13 @@ send_initial_info(IvshmemServer *server, IvshmemServerPeer *peer) { int ret; +/* send our protool version first */ +ret = send_one_msg(peer->sock_fd, IVSHMEM_PROTOCOL_VERSION, -1); +if (ret < 0) { +debug_log(server, "cannot send version: %s\n", strerror(errno)); +return -1; +} + /* send the peer id to the client */ ret = send_one_msg(peer->sock_fd, peer->id, -1); if (ret < 0) { diff --git a/contrib/ivshmem-server/ivshmem-server.h b/contrib/ivshmem-server/ivshmem-server.h index 5ccc7af..e76e4fe 100644 --- a/contrib/ivshmem-server/ivshmem-server.h +++ b/contrib/ivshmem-server/ivshmem-server.h @@ -30,6 +30,7 @@ #include #include "qemu/queue.h" +#include "hw/misc/ivshmem.h" /** * Maximum number of notification vectors supported by the server diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt index f5f2b95..bae87de 100644 --- a/docs/specs/ivshmem_device_spec.txt +++ b/docs/specs/ivshmem_device_spec.txt @@ -64,6 +64,8 @@ It creates a shared memory object then waits for clients to connect on an unix socket. 
For each client (QEMU processes) that connects to the server: +- the server sends a protocol version, if client does not support it, the client + closes the communication, - the server assigns an ID for this client and sends this ID to him as the first message, - the server sends a fd to the shared memory object to this client, @@ -86,9 +88,10 @@ been provided in qemu.git/contrib/ivshmem-client for debug. *QEMU as an ivshmem client* -At initialisation, when creating the ivshmem device, QEMU gets its ID from the -server then make it available through BAR0 IVPosition register for the VM to use -(see 'PCI device registers' subsection). +At initialisation, when creating the ivshmem device, QEMU first receives a +protocol version and closes communication with server if it does not match. +Then, QEMU gets its ID from the server then make it available through BAR0 +IVPosition register for the VM to use (see 'PCI device registers' subsection). QEMU then uses the fd to the shared memory to map it to BAR2. even
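On the client side, the handshake described above amounts to validating the very first message before trusting anything else on the socket. A sketch of that check, assuming the payload and any attached fd have already been read (the version value 0 here is illustrative, not necessarily the real IVSHMEM_PROTOCOL_VERSION):

```c
/* Sketch of the client-side version check: the first server message
 * must carry the protocol version and no file descriptor; anything
 * else means an unsupported protocol and the client must disconnect.
 * The version value is illustrative. */
#define IVSHMEM_PROTOCOL_VERSION 0

static int check_first_msg(long payload, int fd)
{
    if (payload != IVSHMEM_PROTOCOL_VERSION || fd != -1) {
        return -1;   /* caller closes the socket */
    }
    return 0;
}
```

Because older QEMUs treat an id with no eventfd as their own vm_id, sending the version as the first such message stays backward compatible, as the commit message notes.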
[PATCH v4 03/14] contrib/ivshmem-*: comply with QEMU coding style
Fix coding style for structures. Signed-off-by: David Marchand --- contrib/ivshmem-client/ivshmem-client.c | 47 ++- contrib/ivshmem-client/ivshmem-client.h | 76 +++ contrib/ivshmem-client/main.c | 21 - contrib/ivshmem-server/ivshmem-server.c | 38 contrib/ivshmem-server/ivshmem-server.h | 68 +-- contrib/ivshmem-server/main.c | 12 ++--- 6 files changed, 129 insertions(+), 133 deletions(-) diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c index 2166b64..3f6ca98 100644 --- a/contrib/ivshmem-client/ivshmem-client.c +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -31,7 +31,7 @@ /* read message from the unix socket */ static int -read_one_msg(struct ivshmem_client *client, long *index, int *fd) +read_one_msg(IvshmemClient *client, long *index, int *fd) { int ret; struct msghdr msg; @@ -80,7 +80,7 @@ read_one_msg(struct ivshmem_client *client, long *index, int *fd) /* free a peer when the server advertise a disconnection or when the * client is freed */ static void -free_peer(struct ivshmem_client *client, struct ivshmem_client_peer *peer) +free_peer(IvshmemClient *client, IvshmemClientPeer *peer) { unsigned vector; @@ -94,9 +94,9 @@ free_peer(struct ivshmem_client *client, struct ivshmem_client_peer *peer) /* handle message coming from server (new peer, new vectors) */ static int -handle_server_msg(struct ivshmem_client *client) +handle_server_msg(IvshmemClient *client) { -struct ivshmem_client_peer *peer; +IvshmemClientPeer *peer; long peer_id; int ret, fd; @@ -146,7 +146,7 @@ handle_server_msg(struct ivshmem_client *client) /* init a new ivshmem client */ int -ivshmem_client_init(struct ivshmem_client *client, const char *unix_sock_path, +ivshmem_client_init(IvshmemClient *client, const char *unix_sock_path, ivshmem_client_notif_cb_t notif_cb, void *notif_arg, int verbose) { @@ -173,7 +173,7 @@ ivshmem_client_init(struct ivshmem_client *client, const char *unix_sock_path, /* create and connect to the unix socket */ int 
-ivshmem_client_connect(struct ivshmem_client *client) +ivshmem_client_connect(IvshmemClient *client) { struct sockaddr_un sun; int fd; @@ -223,9 +223,9 @@ ivshmem_client_connect(struct ivshmem_client *client) /* close connection to the server, and free all peer structures */ void -ivshmem_client_close(struct ivshmem_client *client) +ivshmem_client_close(IvshmemClient *client) { -struct ivshmem_client_peer *peer; +IvshmemClientPeer *peer; unsigned i; debug_log(client, "close client\n"); @@ -244,8 +244,7 @@ ivshmem_client_close(struct ivshmem_client *client) /* get the fd_set according to the unix socket and peer list */ void -ivshmem_client_get_fds(const struct ivshmem_client *client, fd_set *fds, - int *maxfd) +ivshmem_client_get_fds(const IvshmemClient *client, fd_set *fds, int *maxfd) { int fd; unsigned vector; @@ -266,9 +265,9 @@ ivshmem_client_get_fds(const struct ivshmem_client *client, fd_set *fds, /* handle events from eventfd: just print a message on notification */ static int -handle_event(struct ivshmem_client *client, const fd_set *cur, int maxfd) +handle_event(IvshmemClient *client, const fd_set *cur, int maxfd) { -struct ivshmem_client_peer *peer; +IvshmemClientPeer *peer; uint64_t kick; unsigned i; int ret; @@ -301,7 +300,7 @@ handle_event(struct ivshmem_client *client, const fd_set *cur, int maxfd) /* read and handle new messages on the given fd_set */ int -ivshmem_client_handle_fds(struct ivshmem_client *client, fd_set *fds, int maxfd) +ivshmem_client_handle_fds(IvshmemClient *client, fd_set *fds, int maxfd) { if (client->sock_fd < maxfd && FD_ISSET(client->sock_fd, fds) && handle_server_msg(client) < 0 && errno != EINTR) { @@ -317,8 +316,8 @@ ivshmem_client_handle_fds(struct ivshmem_client *client, fd_set *fds, int maxfd) /* send a notification on a vector of a peer */ int -ivshmem_client_notify(const struct ivshmem_client *client, - const struct ivshmem_client_peer *peer, unsigned vector) +ivshmem_client_notify(const IvshmemClient *client, + 
const IvshmemClientPeer *peer, unsigned vector) { uint64_t kick; int fd; @@ -342,8 +341,8 @@ ivshmem_client_notify(const struct ivshmem_client *client, /* send a notification to all vectors of a peer */ int -ivshmem_client_notify_all_vects(const struct ivshmem_client *client, -const struct ivshmem_client_peer *peer) +ivshmem_client_notify_all_vects(const IvshmemClient *client, +const IvshmemClientPeer *peer) { unsigned vector; int ret = 0; @@ -359,9 +358,9 @@ ivshmem_client_notify_all_vects(const struct
[PATCH v4 01/14] contrib: add ivshmem client and server
When using ivshmem devices, notifications between guests can be sent as interrupts using a ivshmem-server (typical use described in documentation). The client is provided as a debug tool. Signed-off-by: Olivier Matz Signed-off-by: David Marchand --- contrib/ivshmem-client/Makefile | 29 +++ contrib/ivshmem-client/ivshmem-client.c | 418 ++ contrib/ivshmem-client/ivshmem-client.h | 238 ++ contrib/ivshmem-client/main.c | 246 ++ contrib/ivshmem-server/Makefile | 29 +++ contrib/ivshmem-server/ivshmem-server.c | 420 +++ contrib/ivshmem-server/ivshmem-server.h | 185 ++ contrib/ivshmem-server/main.c | 296 ++ qemu-doc.texi | 10 +- 9 files changed, 1868 insertions(+), 3 deletions(-) create mode 100644 contrib/ivshmem-client/Makefile create mode 100644 contrib/ivshmem-client/ivshmem-client.c create mode 100644 contrib/ivshmem-client/ivshmem-client.h create mode 100644 contrib/ivshmem-client/main.c create mode 100644 contrib/ivshmem-server/Makefile create mode 100644 contrib/ivshmem-server/ivshmem-server.c create mode 100644 contrib/ivshmem-server/ivshmem-server.h create mode 100644 contrib/ivshmem-server/main.c diff --git a/contrib/ivshmem-client/Makefile b/contrib/ivshmem-client/Makefile new file mode 100644 index 000..eee97c6 --- /dev/null +++ b/contrib/ivshmem-client/Makefile @@ -0,0 +1,29 @@ +# Copyright 6WIND S.A., 2014 +# +# This work is licensed under the terms of the GNU GPL, version 2 or +# (at your option) any later version. See the COPYING file in the +# top-level directory. 
+ +S ?= $(CURDIR) +O ?= $(CURDIR) + +CFLAGS += -Wall -Wextra -Werror -g +LDFLAGS += +LDLIBS += -lrt + +VPATH = $(S) +PROG = ivshmem-client +OBJS := $(O)/ivshmem-client.o +OBJS += $(O)/main.o + +$(O)/%.o: %.c + $(CC) $(CFLAGS) -o $@ -c $< + +$(O)/$(PROG): $(OBJS) + $(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS) + +.PHONY: all +all: $(O)/$(PROG) + +clean: + rm -f $(OBJS) $(O)/$(PROG) diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c new file mode 100644 index 000..2166b64 --- /dev/null +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -0,0 +1,418 @@ +/* + * Copyright 6WIND S.A., 2014 + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * (at your option) any later version. See the COPYING file in the + * top-level directory. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "ivshmem-client.h" + +/* log a message on stdout if verbose=1 */ +#define debug_log(client, fmt, ...) 
do { \ +if ((client)->verbose) { \ +printf(fmt, ## __VA_ARGS__); \ +}\ +} while (0) + +/* read message from the unix socket */ +static int +read_one_msg(struct ivshmem_client *client, long *index, int *fd) +{ +int ret; +struct msghdr msg; +struct iovec iov[1]; +union { +struct cmsghdr cmsg; +char control[CMSG_SPACE(sizeof(int))]; +} msg_control; +struct cmsghdr *cmsg; + +iov[0].iov_base = index; +iov[0].iov_len = sizeof(*index); + +memset(&msg, 0, sizeof(msg)); +msg.msg_iov = iov; +msg.msg_iovlen = 1; +msg.msg_control = &msg_control; +msg.msg_controllen = sizeof(msg_control); + +ret = recvmsg(client->sock_fd, &msg, 0); +if (ret < 0) { +debug_log(client, "cannot read message: %s\n", strerror(errno)); +return -1; +} +if (ret == 0) { +debug_log(client, "lost connection to server\n"); +return -1; +} + +*fd = -1; + +for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) { + +if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) || +cmsg->cmsg_level != SOL_SOCKET || +cmsg->cmsg_type != SCM_RIGHTS) { +continue; +} + +memcpy(fd, CMSG_DATA(cmsg), sizeof(*fd)); +} + +return 0; +} + +/* free a peer when the server advertise a disconnection or when the + * client is freed */ +static void +free_peer(struct ivshmem_client *client, struct ivshmem_client_peer *peer) +{ +unsigned vector; + +TAILQ_REMOVE(&client->peer_list, peer, next); +for (vector = 0; vector < peer->vectors_count; vector++) { +close(peer->vectors[vector]); +} + +free(peer); +} + +/* handle message coming from server (new peer, new vectors) */ +static int +handle_server_msg(struct ivshmem_client *client) +{ +struct ivshmem_client_peer *peer; +long peer_id; +int ret, fd; + +ret = read_one_msg(client, &peer_id, &fd); +if (ret < 0) { +return -1; +} + +/* can return a peer or the local client */ +peer = ivshmem_client_search_peer(client, peer_id); + +/* delete peer */ +if (fd == -1
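read_one_msg() above relies on SCM_RIGHTS ancillary data to receive a file descriptor alongside the index. That mechanism can be exercised standalone over a socketpair; the helper names below are hypothetical, and this is a sketch of the mechanism rather than the patch's API:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one long plus one descriptor over an AF_UNIX socket. */
static int send_fd(int sock, long index, int fd)
{
    struct iovec iov = { .iov_base = &index, .iov_len = sizeof(index) };
    union { struct cmsghdr c; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf,
                          .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);

    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive the long into *index; return the passed fd, or -1. */
static int recv_fd(int sock, long *index)
{
    int fd = -1;
    struct iovec iov = { .iov_base = index, .iov_len = sizeof(*index) };
    union { struct cmsghdr c; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf,
                          .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *c;

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }
    c = CMSG_FIRSTHDR(&msg);
    if (c && c->cmsg_len == CMSG_LEN(sizeof(int)) &&
        c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
        memcpy(&fd, CMSG_DATA(c), sizeof(int));
    }
    return fd;
}

/* Round-trip a pipe read end through a socketpair; 0 on success. */
static int demo(void)
{
    int sv[2], pipefd[2], fd, ok;
    long idx = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pipefd) < 0) {
        return -1;
    }
    if (send_fd(sv[0], 42, pipefd[0]) < 0) {
        return -1;
    }
    fd = recv_fd(sv[1], &idx);
    ok = (fd >= 0 && idx == 42);      /* kernel dup'd the descriptor */
    if (fd >= 0) {
        close(fd);
    }
    close(sv[0]); close(sv[1]); close(pipefd[0]); close(pipefd[1]);
    return ok ? 0 : -1;
}
```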
[PATCH v4 08/14] contrib/ivshmem-*: plug client and server in QEMU top Makefile
Signed-off-by: David Marchand --- Makefile|8 configure |3 +++ contrib/ivshmem-client/Makefile | 29 - contrib/ivshmem-server/Makefile | 29 - 4 files changed, 11 insertions(+), 58 deletions(-) delete mode 100644 contrib/ivshmem-client/Makefile delete mode 100644 contrib/ivshmem-server/Makefile diff --git a/Makefile b/Makefile index b33aaac..0575898 100644 --- a/Makefile +++ b/Makefile @@ -283,6 +283,14 @@ $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN) qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a $(call LINK, $^) +IVSHMEM_CLIENT_OBJS=$(addprefix $(SRC_PATH)/contrib/ivshmem-client/, ivshmem-client.o main.o) +ivshmem-client$(EXESUF): $(IVSHMEM_CLIENT_OBJS) + $(call LINK, $^) + +IVSHMEM_SERVER_OBJS=$(addprefix $(SRC_PATH)/contrib/ivshmem-server/, ivshmem-server.o main.o) +ivshmem-server$(EXESUF): $(IVSHMEM_SERVER_OBJS) libqemuutil.a libqemustub.a + $(call LINK, $^) + clean: # avoid old build problems by removing potentially incorrect old files rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h diff --git a/configure b/configure index 961bf6f..a41a16c 100755 --- a/configure +++ b/configure @@ -4125,6 +4125,9 @@ if test "$want_tools" = "yes" ; then if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" ] ; then tools="qemu-nbd\$(EXESUF) $tools" fi + if [ "$kvm" = "yes" ] ; then +tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools" + fi fi if test "$softmmu" = yes ; then if test "$virtfs" != no ; then diff --git a/contrib/ivshmem-client/Makefile b/contrib/ivshmem-client/Makefile deleted file mode 100644 index eee97c6..000 --- a/contrib/ivshmem-client/Makefile +++ /dev/null @@ -1,29 +0,0 @@ -# Copyright 6WIND S.A., 2014 -# -# This work is licensed under the terms of the GNU GPL, version 2 or -# (at your option) any later version. See the COPYING file in the -# top-level directory. 
- -S ?= $(CURDIR) -O ?= $(CURDIR) - -CFLAGS += -Wall -Wextra -Werror -g -LDFLAGS += -LDLIBS += -lrt - -VPATH = $(S) -PROG = ivshmem-client -OBJS := $(O)/ivshmem-client.o -OBJS += $(O)/main.o - -$(O)/%.o: %.c - $(CC) $(CFLAGS) -o $@ -c $< - -$(O)/$(PROG): $(OBJS) - $(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS) - -.PHONY: all -all: $(O)/$(PROG) - -clean: - rm -f $(OBJS) $(O)/$(PROG) diff --git a/contrib/ivshmem-server/Makefile b/contrib/ivshmem-server/Makefile deleted file mode 100644 index 26b4a72..000 --- a/contrib/ivshmem-server/Makefile +++ /dev/null @@ -1,29 +0,0 @@ -# Copyright 6WIND S.A., 2014 -# -# This work is licensed under the terms of the GNU GPL, version 2 or -# (at your option) any later version. See the COPYING file in the -# top-level directory. - -S ?= $(CURDIR) -O ?= $(CURDIR) - -CFLAGS += -Wall -Wextra -Werror -g -LDFLAGS += -LDLIBS += -lrt - -VPATH = $(S) -PROG = ivshmem-server -OBJS := $(O)/ivshmem-server.o -OBJS += $(O)/main.o - -$(O)/%.o: %.c - $(CC) $(CFLAGS) -o $@ -c $< - -$(O)/$(PROG): $(OBJS) - $(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS) - -.PHONY: all -all: $(O)/$(PROG) - -clean: - rm -f $(OBJS) $(O)/$(PROG) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 07/14] contrib/ivshmem-*: add missing const and static attrs
Signed-off-by: David Marchand --- contrib/ivshmem-client/main.c |6 +++--- contrib/ivshmem-server/main.c |8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/contrib/ivshmem-client/main.c b/contrib/ivshmem-client/main.c index f8a7b66..a8e1586 100644 --- a/contrib/ivshmem-client/main.c +++ b/contrib/ivshmem-client/main.c @@ -15,7 +15,7 @@ typedef struct IvshmemClientArgs { bool verbose; -char *unix_sock_path; +const char *unix_sock_path; } IvshmemClientArgs; /* show usage and exit with given error code */ @@ -129,7 +129,7 @@ handle_stdin_command(IvshmemClient *client) /* listen on stdin (command line), on unix socket (notifications of new * and dead peers), and on eventfd (IRQ request) */ -int +static int poll_events(IvshmemClient *client) { fd_set fds; @@ -172,7 +172,7 @@ poll_events(IvshmemClient *client) } /* callback when we receive a notification (just display it) */ -void +static void notification_cb(const IvshmemClient *client, const IvshmemClientPeer *peer, unsigned vect, void *arg) { diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c index f00e6f9..1966e71 100644 --- a/contrib/ivshmem-server/main.c +++ b/contrib/ivshmem-server/main.c @@ -22,9 +22,9 @@ typedef struct IvshmemServerArgs { bool verbose; bool foreground; -char *pid_file; -char *unix_socket_path; -char *shm_path; +const char *pid_file; +const char *unix_socket_path; +const char *shm_path; size_t shm_size; unsigned n_vectors; } IvshmemServerArgs; @@ -138,7 +138,7 @@ parse_args(IvshmemServerArgs *args, int argc, char *argv[]) /* wait for events on listening server unix socket and connected client * sockets */ -int +static int poll_events(IvshmemServer *server) { fd_set fds; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 13/14] contrib/ivshmem-server: align server default parameter values
The ivshmem server should use the same default values as hw/misc/ivshmem. Update accordingly.

Signed-off-by: David Marchand
---
 contrib/ivshmem-server/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
index 1966e71..8792fc8 100644
--- a/contrib/ivshmem-server/main.c
+++ b/contrib/ivshmem-server/main.c
@@ -15,8 +15,8 @@
 #define DEFAULT_PID_FILE "/var/run/ivshmem-server.pid"
 #define DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
 #define DEFAULT_SHM_PATH "ivshmem"
-#define DEFAULT_SHM_SIZE (1024*1024)
-#define DEFAULT_N_VECTORS 16
+#define DEFAULT_SHM_SIZE (4*1024*1024)
+#define DEFAULT_N_VECTORS 1
-- 
1.7.10.4
[PATCH v4 05/14] contrib/ivshmem-*: switch to QEMU headers
Reuse parsers from QEMU, C99 boolean. Signed-off-by: David Marchand --- contrib/ivshmem-client/ivshmem-client.c | 12 + contrib/ivshmem-client/ivshmem-client.h |4 +- contrib/ivshmem-client/main.c | 12 + contrib/ivshmem-server/ivshmem-server.c | 14 +- contrib/ivshmem-server/ivshmem-server.h |4 +- contrib/ivshmem-server/main.c | 73 +-- 6 files changed, 20 insertions(+), 99 deletions(-) diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c index ce3a5d2..2ba40a7 100644 --- a/contrib/ivshmem-client/ivshmem-client.c +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -6,19 +6,11 @@ * top-level directory. */ -#include -#include -#include -#include -#include -#include -#include -#include - #include #include #include +#include "qemu-common.h" #include "qemu/queue.h" #include "ivshmem-client.h" @@ -149,7 +141,7 @@ handle_server_msg(IvshmemClient *client) int ivshmem_client_init(IvshmemClient *client, const char *unix_sock_path, ivshmem_client_notif_cb_t notif_cb, void *notif_arg, -int verbose) +bool verbose) { unsigned i; diff --git a/contrib/ivshmem-client/ivshmem-client.h b/contrib/ivshmem-client/ivshmem-client.h index e3b284d..45f2b64 100644 --- a/contrib/ivshmem-client/ivshmem-client.h +++ b/contrib/ivshmem-client/ivshmem-client.h @@ -78,7 +78,7 @@ struct IvshmemClient { ivshmem_client_notif_cb_t notif_cb; /**< notification callback */ void *notif_arg;/**< notification argument */ -int verbose;/**< true to enable debug */ +bool verbose; /**< true to enable debug */ }; /** @@ -101,7 +101,7 @@ struct IvshmemClient { */ int ivshmem_client_init(IvshmemClient *client, const char *unix_sock_path, ivshmem_client_notif_cb_t notif_cb, void *notif_arg, -int verbose); +bool verbose); /** * Connect to the server diff --git a/contrib/ivshmem-client/main.c b/contrib/ivshmem-client/main.c index 778d0f2..f8a7b66 100644 --- a/contrib/ivshmem-client/main.c +++ b/contrib/ivshmem-client/main.c @@ -6,15 +6,7 @@ * top-level directory. 
*/ -#include -#include -#include -#include -#include -#include -#include -#include -#include +#include "qemu-common.h" #include "ivshmem-client.h" @@ -22,7 +14,7 @@ #define DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket" typedef struct IvshmemClientArgs { -int verbose; +bool verbose; char *unix_sock_path; } IvshmemClientArgs; diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c index e58864d..0afa6e8 100644 --- a/contrib/ivshmem-server/ivshmem-server.c +++ b/contrib/ivshmem-server/ivshmem-server.c @@ -6,23 +6,13 @@ * top-level directory. */ -#include -#include -#include -#include -#include -#include -#include -#include -#include - #include -#include #include #include #include #include +#include "qemu-common.h" #include "qemu/queue.h" #include "ivshmem-server.h" @@ -246,7 +236,7 @@ ivshmem_ftruncate(int fd, unsigned shmsize) int ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path, const char *shm_path, size_t shm_size, unsigned n_vectors, -int verbose) +bool verbose) { memset(server, 0, sizeof(*server)); diff --git a/contrib/ivshmem-server/ivshmem-server.h b/contrib/ivshmem-server/ivshmem-server.h index acd17a8..5ccc7af 100644 --- a/contrib/ivshmem-server/ivshmem-server.h +++ b/contrib/ivshmem-server/ivshmem-server.h @@ -69,7 +69,7 @@ typedef struct IvshmemServer { int shm_fd; /**< shm file descriptor */ unsigned n_vectors; /**< number of vectors */ long cur_id; /**< id to be given to next client */ -int verbose; /**< true in verbose mode */ +bool verbose;/**< true in verbose mode */ IvshmemServerPeerList peer_list; /**< list of peers */ } IvshmemServer; @@ -97,7 +97,7 @@ typedef struct IvshmemServer { int ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path, const char *shm_path, size_t shm_size, unsigned n_vectors, -int verbose); +bool verbose); /** * Open the shm, then create and bind to the unix socket diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c index 
a4504c3..f00e6f9 100644 --- a/contrib/ivshmem-server/main.c +++ b/contrib/ivshmem-server/main.c @@ -6,17 +6,7 @@ * top-level directory. */ -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include +#include "qemu-common.h" #include "ivshmem-server.h" @@ -30,8
Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
On 29/08/2014 12:31, Paolo Bonzini wrote:
> David and Xiao, here's my take on the MMIO generation patches. Now
> with documentation, too. :) Please review!
>
> David Matlack (2):
>   kvm: fix potentially corrupt mmio cache
>   kvm: x86: fix stale mmio cache bug
>
> Paolo Bonzini (1):
>   KVM: do not bias the generation number in kvm_current_mmio_generation
>
>  Documentation/virtual/kvm/mmu.txt | 14 ++
>  arch/x86/include/asm/kvm_host.h   |  1 +
>  arch/x86/kvm/mmu.c                | 29 ++---
>  arch/x86/kvm/x86.h                | 20 +++-
>  virt/kvm/kvm_main.c               | 30 +++---
>  5 files changed, 67 insertions(+), 27 deletions(-)

Ping?

Paolo
Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr bit always set
On Thu, Aug 28, 2014 at 08:55:18PM +0800, Zhang Haoyu wrote: > Hi Jason, > I tested below patch, it's okay, the e1000 interrupt storm disappeared. > But I am going to make a bit change on it, could you help review it? > > >Currently, we call ioapic_service() immediately when we find the irq is still > >active during eoi broadcast. But for real hardware, there's some dealy > >between > >the EOI writing and irq delivery (system bus latency?). So we need to emulate > >this behavior. Otherwise, for a guest who haven't register a proper irq > >handler > >, it would stay in the interrupt routine as this irq would be re-injected > >immediately after guest enables interrupt. This would lead guest can't move > >forward and may miss the possibility to get proper irq handler registered > >(one > >example is windows guest resuming from hibernation). > > > >As there's no way to differ the unhandled irq from new raised ones, this > >patch > >solve this problems by scheduling a delayed work when the count of irq > >injected > >during eoi broadcast exceeds a threshold value. After this patch, the guest > >can > >move a little forward when there's no suitable irq handler in case it may > >register one very soon and for guest who has a bad irq detection routine ( > >such > >as note_interrupt() in linux ), this bad irq would be recognized soon as in > >the > >past. 
> > > >Signed-off-by: Jason Wang redhat.com> > >--- > > virt/kvm/ioapic.c | 47 +-- > > virt/kvm/ioapic.h |2 ++ > > 2 files changed, 47 insertions(+), 2 deletions(-) > > > >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > >index dcaf272..892253e 100644 > >--- a/virt/kvm/ioapic.c > >+++ b/virt/kvm/ioapic.c > > -221,6 +221,24 int kvm_ioapic_set_irq(struct > > kvm_ioapic *ioapic, int irq, int level) > > return ret; > > } > > > >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work) > >+{ > >+int i, ret; > >+struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, > >+ eoi_inject.work); > >+spin_lock(&ioapic->lock); > >+for (i = 0; i < IOAPIC_NUM_PINS; i++) { > >+union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; > >+ > >+if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) > >+continue; > >+ > >+if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) > >+ret = ioapic_service(ioapic, i); > >+} > >+spin_unlock(&ioapic->lock); > >+} > >+ > > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, > > int trigger_mode) > > { > > -249,8 +267,29 static void > > __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, > > > > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); > > ent->fields.remote_irr = 0; > >-if (!ent->fields.mask && (ioapic->irr & (1 << i))) > >-ioapic_service(ioapic, i); > >+if (!ent->fields.mask && (ioapic->irr & (1 << i))) { > >+++ioapic->irq_eoi; > -+++ioapic->irq_eoi; > ++++ioapic->irq_eoi[i]; > >+if (ioapic->irq_eoi == 100) { > -+if (ioapic->irq_eoi == 100) { > ++if (ioapic->irq_eoi[i] == 100) { > >+/* > >+ * Real hardware does not deliver the irq so > >+ * immediately during eoi broadcast, so we need > >+ * to emulate this behavior. Otherwise, for > >+ * guests who has not registered handler of a > >+ * level irq, this irq would be injected > >+ * immediately after guest enables interrupt > >+ * (which happens usually at the end of the > >+ * common interrupt routine). 
This would lead > >+ * guest can't move forward and may miss the > >+ * possibility to get proper irq handler > >+ * registered. So we need to give some breath to > >+ * guest. TODO: 1 is too long? > >+ */ > >+schedule_delayed_work(&ioapic->eoi_inject, 1); > >+ioapic->irq_eoi = 0; > -+ioapic->irq_eoi = 0; > ++ioapic->irq_eoi[i] = 0; > >+} else { > >+ioapic_service(ioapic, i); > >+} > >+} > ++else { > ++ioapic->irq_eoi[i] = 0; > ++} > > } > > } > I think ioapic->irq_eoi is prone to reach to 100, because during a eoi > broadcast, > it's possible that another interrupt's (not
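The threshold logic being debated above can be sketched in userspace (this is an illustration of the idea, not the kernel patch): when the EOI handler finds the same level-triggered irq still pending, deliver it inline up to a threshold, then defer one delivery so the guest gets a chance to run and register a proper handler.

```c
/* Userspace sketch of the throttling idea: deliver a still-pending
 * level irq inline until a threshold of consecutive re-injections,
 * then defer once (schedule_delayed_work() in the real patch) and
 * reset the counter, giving the guest some breathing room. */
#define EOI_REINJECT_THRESHOLD 100

struct pin_state {
    int eoi_count;   /* consecutive inline re-injections on EOI */
    int serviced;    /* deliveries done inline (ioapic_service()) */
    int deferred;    /* deliveries postponed to delayed work */
};

static void eoi_still_pending(struct pin_state *p)
{
    if (++p->eoi_count == EOI_REINJECT_THRESHOLD) {
        p->deferred++;       /* stands in for schedule_delayed_work() */
        p->eoi_count = 0;    /* reset, as the patch does */
    } else {
        p->serviced++;       /* stands in for ioapic_service() */
    }
}

/* Drive n consecutive "irq still pending at EOI" events. */
static struct pin_state run_eois(int n)
{
    struct pin_state p = {0, 0, 0};
    int i;

    for (i = 0; i < n; i++) {
        eoi_still_pending(&p);
    }
    return p;
}
```

Zhang's suggested change of the scalar counter to a per-pin array corresponds to keeping one such struct per IOAPIC pin, so one noisy irq line cannot trip the threshold for the others.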
Re: [PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().
On Wed, Aug 27, 2014 at 06:17:39PM +0800, Tang Chen wrote: > apic access page is pinned in memory. As a result, it cannot be > migrated/hot-removed. > Actually, it is not necessary to be pinned. > > The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When > the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the > corresponding ept entry. This patch introduces a new vcpu request named > KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this > time, > and force all the vcpus exit guest, and re-enter guest till they updates the > VMCS > APIC_ACCESS_ADDR pointer to the new apic access page address, and updates > kvm->arch.apic_access_page to the new page. > > Signed-off-by: Tang Chen > --- > arch/x86/include/asm/kvm_host.h | 1 + > arch/x86/kvm/svm.c | 6 ++ > arch/x86/kvm/vmx.c | 6 ++ > arch/x86/kvm/x86.c | 15 +++ > include/linux/kvm_host.h| 2 ++ > virt/kvm/kvm_main.c | 12 > 6 files changed, 42 insertions(+) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 35171c7..514183e 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -739,6 +739,7 @@ struct kvm_x86_ops { > void (*hwapic_isr_update)(struct kvm *kvm, int isr); > void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); > void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set); > + void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa); > void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector); > void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu); > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index 1d941ad..f2eacc4 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct > kvm_vcpu *vcpu, bool set) > return; > } > > +static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa) > +{ > + 
return; > +} > + > static int svm_vm_has_apicv(struct kvm *kvm) > { > return 0; > @@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = { > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > .set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode, > + .set_apic_access_page_addr = svm_set_apic_access_page_addr, > .vm_has_apicv = svm_vm_has_apicv, > .load_eoi_exitmap = svm_load_eoi_exitmap, > .hwapic_isr_update = svm_hwapic_isr_update, > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 63c4c3e..da6d55d 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct > kvm_vcpu *vcpu, bool set) > vmx_set_msr_bitmap(vcpu); > } > > +static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa) > +{ > + vmcs_write64(APIC_ACCESS_ADDR, hpa); > +} > + > static void vmx_hwapic_isr_update(struct kvm *kvm, int isr) > { > u16 status; > @@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = { > .enable_irq_window = enable_irq_window, > .update_cr8_intercept = update_cr8_intercept, > .set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode, > + .set_apic_access_page_addr = vmx_set_apic_access_page_addr, > .vm_has_apicv = vmx_vm_has_apicv, > .load_eoi_exitmap = vmx_load_eoi_exitmap, > .hwapic_irr_update = vmx_hwapic_irr_update, > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index e05bd58..96f4188 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) > kvm_apic_update_tmr(vcpu, tmr); > } > > +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) > +{ > + /* > + * apic access page could be migrated. When the page is being migrated, > + * GUP will wait till the migrate entry is replaced with the new pte > + * entry pointing to the new page. 
> + */ > + vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm, > + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); > + kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, > + page_to_phys(vcpu->kvm->arch.apic_access_page)); I am a little bit worried that here all vcpus write to vcpu->kvm->arch.apic_access_page without any locking. It is probably benign since pointer write is atomic on x86. Paolo? Do we even need apic_access_page? Why not call gfn_to_page(APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT) put_page() on rare occasions we need to know its address? > +} > + > /* > * Returns 1 to let __vcpu_run() continue the guest execution loop without > * exiting to the userspace. Otherwise, the value will be returned to the > @@ -6049
Re: [PATCH 0/4] KVM: nested x86: nested page faults fixes
Hi Paolo, On 02.09.2014 21:13, Paolo Bonzini wrote: Patches 3 and 4 fix the interaction between emulator and nested EPT/NPT, which was reported by Valentine. I can confirm the initial bug I observed is fixed with these patches (applied to 3.16.1). All tests in kvm-unit-test's master also pass, except for ioio which is (probably) affected by another (unrelated) bug fixed by Jan back in June but not in 3.16 [1,2]. 1. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124071/ 2. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124067/ Thanks, Valentine
Re: [RFC PATCH v6 15/20] vfio/platform: support for maskable and automasked interrupts
On Sun, Jun 8, 2014 at 12:17 PM, Christoffer Dall wrote: > On Thu, Jun 05, 2014 at 07:03:23PM +0200, Antonios Motakis wrote: >> Adds support to mask interrupts, and also for automasked interrupts. >> Level sensitive interrupts are exposed as automasked interrupts and >> are masked and disabled automatically when they fire. >> >> Signed-off-by: Antonios Motakis >> --- >> drivers/vfio/platform/vfio_platform_irq.c | 112 >> -- >> drivers/vfio/platform/vfio_platform_private.h | 2 + >> 2 files changed, 109 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/vfio/platform/vfio_platform_irq.c >> b/drivers/vfio/platform/vfio_platform_irq.c >> index d79f5af..10dfbf0 100644 >> --- a/drivers/vfio/platform/vfio_platform_irq.c >> +++ b/drivers/vfio/platform/vfio_platform_irq.c >> @@ -51,9 +51,17 @@ int vfio_platform_irq_init(struct vfio_platform_device >> *vdev) >> if (hwirq < 0) >> goto err; >> >> - vdev->irq[i].flags = VFIO_IRQ_INFO_EVENTFD; >> + spin_lock_init(&vdev->irq[i].lock); >> + >> + vdev->irq[i].flags = VFIO_IRQ_INFO_EVENTFD >> + | VFIO_IRQ_INFO_MASKABLE; >> + >> + if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK) >> + vdev->irq[i].flags |= VFIO_IRQ_INFO_AUTOMASKED; > > This seems to rely on the fact that you had actually loaded a driver for > your device to set the right type. Is this assumption always correct? > > It seems to me that this configuration bit should now be up to your user > space drive who is the best candidate to know details about your device > at this point? > Hm, I see this type being set usually either in a device tree source, or in the support code for a specific platform. Are there any situations where this is actually set by the driver? If I understand right this is not the case for the PL330, but if it is possible that it is the case for another device then I need to rethink this. Though as far as I understand this should not be the case. 
>> + >> vdev->irq[i].count = 1; >> vdev->irq[i].hwirq = hwirq; >> + vdev->irq[i].masked = false; >> } >> >> vdev->num_irqs = cnt; >> @@ -77,11 +85,27 @@ void vfio_platform_irq_cleanup(struct >> vfio_platform_device *vdev) >> >> static irqreturn_t vfio_irq_handler(int irq, void *dev_id) >> { >> - struct eventfd_ctx *trigger = dev_id; >> + struct vfio_platform_irq *irq_ctx = dev_id; >> + unsigned long flags; >> + int ret = IRQ_NONE; >> + >> + spin_lock_irqsave(&irq_ctx->lock, flags); >> + >> + if (!irq_ctx->masked) { >> + ret = IRQ_HANDLED; >> + >> + if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) { >> + disable_irq_nosync(irq_ctx->hwirq); >> + irq_ctx->masked = true; >> + } >> + } >> >> - eventfd_signal(trigger, 1); >> + spin_unlock_irqrestore(&irq_ctx->lock, flags); >> >> - return IRQ_HANDLED; >> + if (ret == IRQ_HANDLED) >> + eventfd_signal(irq_ctx->trigger, 1); >> + >> + return ret; >> } >> >> static int vfio_set_trigger(struct vfio_platform_device *vdev, >> @@ -162,6 +186,82 @@ static int vfio_platform_set_irq_trigger(struct >> vfio_platform_device *vdev, >> return -EFAULT; >> } >> >> +static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev, >> + unsigned index, unsigned start, >> + unsigned count, uint32_t flags, void *data) >> +{ >> + uint8_t arr; > > > arr? arr for array! As in, the VFIO API allows an array of IRQs. However for platform devices we don't use this, each IRQ is exposed independently as an array of 1 IRQ. > >> + >> + if (start != 0 || count != 1) >> + return -EINVAL; >> + >> + switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) { >> + case VFIO_IRQ_SET_DATA_BOOL: >> + if (copy_from_user(&arr, data, sizeof(uint8_t))) >> + return -EFAULT; >> + >> + if (arr != 0x1) >> + return -EINVAL; > > why the fallthrough, what's this about? The VFIO API allows to unmask/mask an array of IRQs, however with platform devices we only have arrays of 1 IRQ (so not really arrays). So if the user uses VFIO_IRQ_SET_DATA_BOOL, we need to check that arr == 0x1. 
When that is the case, a fallthrough to the same code for VFIO_IRQ_SET_DATA_NONE is safe. If that is not readable enough, then I can add a comment or duplicate the code that does the unmasking. I realize that if you don't know the VFIO API well, then this can look confusing. > >> + >> + case VFIO_IRQ_SET_DATA_NONE: >> + >> + spin_lock_irq(&vdev->irq[index].lock); >> + >> + if (vdev->irq[index].masked) { >> + enable_irq(vdev->irq[index].hwirq); >> + vdev->irq[index].masked = false;
Re: [RFC PATCH v6 14/20] vfio/platform: initial interrupts support
On Sun, Jun 8, 2014 at 12:09 PM, Christoffer Dall wrote: > > On Thu, Jun 05, 2014 at 07:03:22PM +0200, Antonios Motakis wrote: > > This patch allows to set an eventfd for a patform device's interrupt, > > and also to trigger the interrupt eventfd from userspace for testing. > > > > Signed-off-by: Antonios Motakis > > --- > > drivers/vfio/platform/vfio_platform.c | 36 ++- > > drivers/vfio/platform/vfio_platform_irq.c | 130 > > +- > > drivers/vfio/platform/vfio_platform_private.h | 7 ++ > > 3 files changed, 169 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/vfio/platform/vfio_platform.c > > b/drivers/vfio/platform/vfio_platform.c > > index 192291c..f4c06c6 100644 > > --- a/drivers/vfio/platform/vfio_platform.c > > +++ b/drivers/vfio/platform/vfio_platform.c > > @@ -177,10 +177,40 @@ static long vfio_platform_ioctl(void *device_data, > > > > return copy_to_user((void __user *)arg, &info, minsz); > > > > - } else if (cmd == VFIO_DEVICE_SET_IRQS) > > - return -EINVAL; > > + } else if (cmd == VFIO_DEVICE_SET_IRQS) { > > + struct vfio_irq_set hdr; > > + int ret = 0; > > + > > + minsz = offsetofend(struct vfio_irq_set, count); > > + > > + if (copy_from_user(&hdr, (void __user *)arg, minsz)) > > + return -EFAULT; > > + > > + if (hdr.argsz < minsz) > > + return -EINVAL; > > + > > + if (hdr.index >= vdev->num_irqs) > > + return -EINVAL; > > + > > + if (hdr.start != 0 || hdr.count > 1) > > + return -EINVAL; > > + > > + if (hdr.count == 0 && > > + (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE) || > > + !(hdr.flags & VFIO_IRQ_SET_ACTION_TRIGGER))) > > + return -EINVAL; > > + > > + if (hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK | > > + VFIO_IRQ_SET_ACTION_TYPE_MASK)) > > + return -EINVAL; > > + > > + ret = vfio_platform_set_irqs_ioctl(vdev, hdr.flags, hdr.index, > > +hdr.start, hdr.count, > > +(void *)arg+minsz); > > + > > + return ret; > > > > - else if (cmd == VFIO_DEVICE_RESET) > > + } else if (cmd == VFIO_DEVICE_RESET) > > return -EINVAL; > > > > return -ENOTTY; > > diff 
--git a/drivers/vfio/platform/vfio_platform_irq.c > > b/drivers/vfio/platform/vfio_platform_irq.c > > index 22c214f..d79f5af 100644 > > --- a/drivers/vfio/platform/vfio_platform_irq.c > > +++ b/drivers/vfio/platform/vfio_platform_irq.c > > @@ -31,6 +31,9 @@ > > > > #include "vfio_platform_private.h" > > > > +static int vfio_set_trigger(struct vfio_platform_device *vdev, > > + int index, int fd); > > + > > int vfio_platform_irq_init(struct vfio_platform_device *vdev) > > { > > int cnt = 0, i; > > @@ -43,17 +46,142 @@ int vfio_platform_irq_init(struct vfio_platform_device > > *vdev) > > return -ENOMEM; > > > > for (i = 0; i < cnt; i++) { > > - vdev->irq[i].flags = 0; > > + int hwirq = platform_get_irq(vdev->pdev, i); > > + > > + if (hwirq < 0) > > + goto err; > > + > > + vdev->irq[i].flags = VFIO_IRQ_INFO_EVENTFD; > > vdev->irq[i].count = 1; > > + vdev->irq[i].hwirq = hwirq; > > } > > > > vdev->num_irqs = cnt; > > > > return 0; > > +err: > > + kfree(vdev->irq); > > + return -EINVAL; > > } > > > > void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev) > > { > > + int i; > > + > > + for (i = 0; i < vdev->num_irqs; i++) > > + vfio_set_trigger(vdev, i, -1); > > + > > vdev->num_irqs = 0; > > kfree(vdev->irq); > > } > > + > > +static irqreturn_t vfio_irq_handler(int irq, void *dev_id) > > +{ > > + struct eventfd_ctx *trigger = dev_id; > > + > > + eventfd_signal(trigger, 1); > > + > > + return IRQ_HANDLED; > > +} > > + > > +static int vfio_set_trigger(struct vfio_platform_device *vdev, > > + int index, int fd) > > +{ > > + struct vfio_platform_irq *irq = &vdev->irq[index]; > > + struct eventfd_ctx *trigger; > > + int ret; > > + > > + if (irq->trigger) { > > + free_irq(irq->hwirq, irq); > > + kfree(irq->name); > > + eventfd_ctx_put(irq->trigger); > > + irq->trigger = NULL; > > + } > > this feels incredibly racy, is some lock protecting this access? > Good catch; there should have been a mutex earlier protecting it, but it is missing. Thanks. 
> -Christoffer -- Antonios Motakis Virtual Open Systems
[PATCH] powerpc/kvm/cma: Fix panic introduced by signed shift operation
Commit fc95ca7284bc54953165cba76c3228bd2cdb9591 introduced a memset in kvmppc_alloc_hpt since the general CMA doesn't clear the memory it allocates. However, the size argument passed to memset is computed from a signed value and its sign bit is extended by the cast the compiler performs. This leads to an extremely large size value when dealing with order values >= 31, and almost all the memory following the allocated space is cleared. As a consequence, the system panics and may even fail to spawn the kdump kernel. This fix makes use of an unsigned value for the memset's size argument to avoid sign extension. Along with this fix, another shift operation which may also produce a sign-extended value is fixed. Cc: Alexey Kardashevskiy Cc: Paul Mackerras Cc: Alexander Graf Cc: Aneesh Kumar K.V Cc: Joonsoo Kim Cc: Benjamin Herrenschmidt Signed-off-by: Laurent Dufour --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 72c20bb16d26..79294c4c5015 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -62,10 +62,10 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) } kvm->arch.hpt_cma_alloc = 0; - page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT)); + page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT)); if (page) { hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); - memset((void *)hpt, 0, (1 << order)); + memset((void *)hpt, 0, (1ul << order)); kvm->arch.hpt_cma_alloc = 1; } -- 1.9.1
Re: [PATCH kvm-unit-tests 1/3] x86: svm: fix exitinfo values for NPT tests
On Tue, Sep 02, 2014 at 05:05:26PM +0200, Paolo Bonzini wrote: > The exitinfo values were plain wrong for the page-walk tests > (including npt_rsvd), or else they were missing bits 32:33. > Expect the right values. Are bits 32:33 really emulated? IIRC they were not emulated in the initial implementation, and they are missing in some hardware NPT implementations as well. Joerg
Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
Ah, here you add emulation of these bits. On Tue, Sep 02, 2014 at 05:13:48PM +0200, Paolo Bonzini wrote: > This is similar to what the EPT code does with the exit qualification. > This allows the guest to see a valid value for bits 33:32. > > Signed-off-by: Paolo Bonzini > --- > arch/x86/kvm/paging_tmpl.h | 6 ++ > arch/x86/kvm/svm.c | 26 ++ > 2 files changed, 28 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > index 410776528265..99d4c4e836a0 100644 > --- a/arch/x86/kvm/paging_tmpl.h > +++ b/arch/x86/kvm/paging_tmpl.h > @@ -322,8 +322,14 @@ retry_walk: > > real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn), > PFERR_USER_MASK|PFERR_WRITE_MASK); > + > + /* > + * Can this happen (except if the guest is playing TOCTTOU > games)? > + * We should have gotten a nested page fault on table_gfn > instead. > + */ Comment is true, but doesn't make the check below obsolete, no? > if (unlikely(real_gfn == UNMAPPED_GVA)) > goto error; > @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct > kvm_vcpu *vcpu, > { > struct vcpu_svm *svm = to_svm(vcpu); > > - svm->vmcb->control.exit_code = SVM_EXIT_NPF; > - svm->vmcb->control.exit_code_hi = 0; > - svm->vmcb->control.exit_info_1 = fault->error_code; > - svm->vmcb->control.exit_info_2 = fault->address; > + /* > + * We can keep the value that the processor stored in the VMCB, > + * but make up something sensible if we hit the WARN. > + */ > + if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) { > + svm->vmcb->control.exit_code = SVM_EXIT_NPF; > + svm->vmcb->control.exit_code_hi = 0; > + svm->vmcb->control.exit_info_1 = (1ULL << 32); > + svm->vmcb->control.exit_info_2 = fault->address; > + } Its been a while since I looked into this, but is an injected NPF exit always the result of a real NPF exit? How about an io-port emulated on L1 but passed through to L2 by the nested hypervisor. 
On emulation of INS or OUTS, KVM would need to read/write to an L2 address space, maybe causing NPF faults to be injected. In this case an IOIO exit would cause an injected NPF exit for L1. Joerg
Re: [PATCH 2/3] kvm: fix potentially corrupt mmio cache
On Fri, Aug 29, 2014 at 3:31 AM, Paolo Bonzini wrote: > From: David Matlack > > vcpu exits and memslot mutations can run concurrently as long as the > vcpu does not acquire the slots mutex. Thus it is theoretically possible > for memslots to change underneath a vcpu that is handling an exit. > > If we increment the memslot generation number again after > synchronize_srcu_expedited(), vcpus can safely cache memslot generation > without maintaining a single rcu_dereference through an entire vm exit. > And much of the x86/kvm code does not maintain a single rcu_dereference > of the current memslots during each exit. > > We can prevent the following case: >
>    vcpu (CPU 0)                             | thread (CPU 1)
> ---------------------------------------------+--------------------------
> 1  vm exit                                   |
> 2  srcu_read_unlock(&kvm->srcu)              |
> 3  decide to cache something based on        |
>    old memslots                              |
> 4                                            | change memslots
>                                              | (increments generation)
> 5                                            | synchronize_srcu(&kvm->srcu);
> 6  retrieve generation # from new memslots   |
> 7  tag cache with new memslot generation     |
> 8  srcu_read_unlock(&kvm->srcu)              |
> ...                                          |
> <action based on cache occurs even           |
>  though the caching decision was based       |
>  on the old memslots>                        |
> ...                                          |
> <action *continues* to occur until next      |
>  memslot generation change, which may        |
>  be never>                                   |
>                                              |
> > By incrementing the generation after synchronizing with kvm->srcu readers, > we ensure that the generation retrieved in (6) will become invalid soon > after (8). > > Keeping the existing increment is not strictly necessary, but we > do keep it and just move it for consistency from update_memslots to > install_new_memslots. It invalidates old cached MMIOs immediately, > instead of having to wait for the end of synchronize_srcu_expedited, > which makes the code more clearly correct in case CPU 1 is preempted > right after synchronize_srcu() returns. > > To avoid halving the generation space in SPTEs, always presume that the > low bit of the generation is zero when reconstructing a generation number > out of an SPTE. This effectively disables MMIO caching in SPTEs during > the call to synchronize_srcu_expedited.
Using the low bit this way is > somewhat like a seqcount---where the protected thing is a cache, and > instead of retrying we can simply punt if we observe the low bit to be 1. > > Cc: sta...@vger.kernel.org > Cc: Xiao Guangrong > Signed-off-by: David Matlack > Signed-off-by: Paolo Bonzini > --- > Documentation/virtual/kvm/mmu.txt | 14 ++ > arch/x86/kvm/mmu.c| 20 > virt/kvm/kvm_main.c | 23 --- > 3 files changed, 42 insertions(+), 15 deletions(-) > > diff --git a/Documentation/virtual/kvm/mmu.txt > b/Documentation/virtual/kvm/mmu.txt > index 290894176142..53838d9c6295 100644 > --- a/Documentation/virtual/kvm/mmu.txt > +++ b/Documentation/virtual/kvm/mmu.txt > @@ -425,6 +425,20 @@ fault through the slow path. > Since only 19 bits are used to store generation-number on mmio spte, all > pages are zapped when there is an overflow. > > +Unfortunately, a single memory access might access kvm_memslots(kvm) multiple > +times, the last one happening when the generation number is retrieved and > +stored into the MMIO spte. Thus, the MMIO spte might be created based on > +out-of-date information, but with an up-to-date generation number. > + > +To avoid this, the generation number is incremented again after > synchronize_srcu > +returns; thus, the low bit of kvm_memslots(kvm)->generation is only 1 during > a > +memslot update, while some SRCU readers might be using the old copy. We do > not > +want to use an MMIO sptes created with an odd generation number, and we can > do > +this without losing a bit in the MMIO spte. The low bit of the generation > +is not stored in MMIO spte, and presumed zero when it is extracted out of the > +spte. If KVM is unlucky and creates an MMIO spte while the low bit is 1, > +the next access to the spte will always be a cache miss. 
> + > > Further reading > === > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 323c3f5f5c84..96515957ba82 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -199,16 +199,20 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask) > EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); > > /* > - * spte bits of bit 3 ~ bit 11 are used as low 9 bits of generation number, > - * the bits of bits 52 ~ bit 61 are used as high 10 bits of generation > - * number. > + * the low bit of the generation number is always presumed to be zero. > + * This disables mmio cachin
Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
Il 02/09/2014 18:33, Joerg Roedel ha scritto: > Ah, here you add emulation of these bits. > > On Tue, Sep 02, 2014 at 05:13:48PM +0200, Paolo Bonzini wrote: >> This is similar to what the EPT code does with the exit qualification. >> This allows the guest to see a valid value for bits 33:32. >> >> Signed-off-by: Paolo Bonzini >> --- >> arch/x86/kvm/paging_tmpl.h | 6 ++ >> arch/x86/kvm/svm.c | 26 ++ >> 2 files changed, 28 insertions(+), 4 deletions(-) >> >> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h >> index 410776528265..99d4c4e836a0 100644 >> --- a/arch/x86/kvm/paging_tmpl.h >> +++ b/arch/x86/kvm/paging_tmpl.h >> @@ -322,8 +322,14 @@ retry_walk: >> >> real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn), >>PFERR_USER_MASK|PFERR_WRITE_MASK); >> + >> +/* >> + * Can this happen (except if the guest is playing TOCTTOU >> games)? >> + * We should have gotten a nested page fault on table_gfn >> instead. >> + */ > > Comment is true, but doesn't make the check below obsolete, no? No, it doesn't. I'll rewrite it as /* * This cannot happen unless the guest is playing TOCTTOU games, * because we would have gotten a nested page fault on table_gfn * instead. If this happens, the exit qualification / exit info * field will incorrectly have "guest page access" as the * nested page fault's cause, instead of "guest page structure * access". */ >> if (unlikely(real_gfn == UNMAPPED_GVA)) >> goto error; >> @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct >> kvm_vcpu *vcpu, >> { >> struct vcpu_svm *svm = to_svm(vcpu); >> >> -svm->vmcb->control.exit_code = SVM_EXIT_NPF; >> -svm->vmcb->control.exit_code_hi = 0; >> -svm->vmcb->control.exit_info_1 = fault->error_code; >> -svm->vmcb->control.exit_info_2 = fault->address; >> +/* >> + * We can keep the value that the processor stored in the VMCB, >> + * but make up something sensible if we hit the WARN. 
>> + */ >> +if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) { >> +svm->vmcb->control.exit_code = SVM_EXIT_NPF; >> +svm->vmcb->control.exit_code_hi = 0; >> +svm->vmcb->control.exit_info_1 = (1ULL << 32); >> +svm->vmcb->control.exit_info_2 = fault->address; >> +} > > It's been a while since I looked into this, but is an injected NPF exit > always the result of a real NPF exit? I think so, but that's why I CCed you. :) > How about an io-port emulated on > L1 but passed through to L2 by the nested hypervisor. On emulation of > INS or OUTS, KVM would need to read/write to an L2 address space, It would need to read/write to *L1* (that's where the VMCB's IOIO map lies), which could result in a regular page fault injected into L1. Paolo > maybe > causing NPF faults to be injected. In this case an IOIO exit would cause > an injected NPF exit for L1.
Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
On Tue, Sep 2, 2014 at 8:42 AM, Paolo Bonzini wrote: > Il 29/08/2014 12:31, Paolo Bonzini ha scritto: >> David and Xiao, here's my take on the MMIO generation patches. Now >> with documentation, too. :) Please review! >> >> David Matlack (2): >> kvm: fix potentially corrupt mmio cache >> kvm: x86: fix stale mmio cache bug >> >> Paolo Bonzini (1): >> KVM: do not bias the generation number in kvm_current_mmio_generation >> >> Documentation/virtual/kvm/mmu.txt | 14 ++ >> arch/x86/include/asm/kvm_host.h | 1 + >> arch/x86/kvm/mmu.c| 29 ++--- >> arch/x86/kvm/x86.h| 20 +++- >> virt/kvm/kvm_main.c | 30 +++--- >> 5 files changed, 67 insertions(+), 27 deletions(-) >> > > Ping? Sorry for the delay. I think the patches look good. And patch 3/3 still fixes the bug I was originally seeing, so I'm happy :). I just had one small comment (see my reply to patch 2/3). > > Paolo
Re: [PATCH 2/3] kvm: fix potentially corrupt mmio cache
Il 02/09/2014 18:44, David Matlack ha scritto: > > > > -#define MMIO_GEN_SHIFT 19 > > -#define MMIO_GEN_LOW_SHIFT 9 > > -#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 1) > > +#define MMIO_GEN_SHIFT 20 > > +#define MMIO_GEN_LOW_SHIFT 10 > > +#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 2) > > #define MMIO_GEN_MASK ((1 << MMIO_GEN_SHIFT) - 1) > > #define MMIO_MAX_GEN ((1 << MMIO_GEN_SHIFT) - 1) > > > > @@ -4428,7 +4432,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm) > > * The very rare case: if the generation-number is round, > > * zap all shadow pages. > > */ > > - if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) { > > + if (unlikely(kvm_current_mmio_generation(kvm) == 0)) { > > This should be in patch 1/3. I don't think so. This change is not due to the removal of biasing in x86.c, but rather due to removing bit 0 from MMIO_GEN_LOW_MASK. I placed it here, because the previous test works just fine until bit 0 is removed from MMIO_GEN_LOW_MASK. Paolo
Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
Il 02/09/2014 18:47, David Matlack ha scritto: >> > Ping? > Sorry for the delay. I think the patches look good. And patch 3/3 still > fixes the bug I was originally seeing, so I'm happy :). I just had one > small comment (see my reply to patch 2/3). > I answered that question now. Can I add your Reviewed-by and, for patch 3, Tested-by? Paolo
Re: [PATCH 0/4] KVM: nested x86: nested page faults fixes
Il 02/09/2014 18:02, Valentine Sinitsyn ha scritto: >> > I can confirm the initial bug I observed is fixed with these patches > (applied to 3.16.1). > > All tests in kvm-unit-test's master also pass, except for ioio which is > (probably) affected by another (unrelated) bug fixed by Jan back in June > but not in 3.16 [1,2]. > > 1. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124071/ > 2. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/124067/ Indeed, ioio also hangs for me in 3.16 (not in 3.17). Thanks, I'll add your Tested-by. Paolo
Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
Il 02/09/2014 18:46, Paolo Bonzini ha scritto: >> > How about an io-port emulated on >> > L1 but passed through to L2 by the nested hypervisor. On emulation of >> > INS or OUTS, KVM would need to read/write to an L2 address space, > It would need to read/write to *L1* (that's where the VMCB's IOIO map > lies), which could result in a regular page fault injected into L1. Never mind, the VMCB's IO bitmap is a guest physical address. I see what you mean now. nested_svm_intercept_ioio would return NESTED_EXIT_HOST and proceed to emulate the read/write. This could indeed cause an NPF. For now I'll remove the WARN_ON. Paolo
Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
On Tue, Sep 02, 2014 at 06:46:06PM +0200, Paolo Bonzini wrote: > Il 02/09/2014 18:33, Joerg Roedel ha scritto: > > Comment is true, but doesn't make the check below obsolete, no? > > No, it doesn't. I'll rewrite it as > > /* >* This cannot happen unless the guest is playing TOCTTOU games, >* because we would have gotten a nested page fault on table_gfn >* instead. If this happens, the exit qualification / exit info >* field will incorrectly have "guest page access" as the >* nested page fault's cause, instead of "guest page structure >* access". >*/ Sounds good. > > It's been a while since I looked into this, but is an injected NPF exit > > always the result of a real NPF exit? > > I think so, but that's why I CCed you. :) > > > How about an io-port emulated on > > L1 but passed through to L2 by the nested hypervisor. On emulation of > > INS or OUTS, KVM would need to read/write to an L2 address space, > > It would need to read/write to *L1* (that's where the VMCB's IOIO map > lies), which could result in a regular page fault injected into L1. Hmm, INS and OUTS read/write to a virtual memory location, right? So if we have an io-port intercepted by bare-metal KVM but not by the L1 hypervisor, and the L2 guest accesses this io-port, bare-metal KVM would get an IOIO exit it has to emulate itself (it can't inject an IOIO exit to L1 anyway, since L1 does not intercept the io-port). During emulation the bare-metal KVM would read/write at an L2 virtual address, because that is the context the instruction is executed in. And while resolving this L2 virtual address, an NPF fault could be detected that needs to be injected into the L1 hypervisor, right? Joerg
Re: [PATCH 2/3] kvm: fix potentially corrupt mmio cache
On Tue, Sep 2, 2014 at 9:49 AM, Paolo Bonzini wrote: > Il 02/09/2014 18:44, David Matlack ha scritto: >> > >> > -#define MMIO_GEN_SHIFT 19 >> > -#define MMIO_GEN_LOW_SHIFT 9 >> > -#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 1) >> > +#define MMIO_GEN_SHIFT 20 >> > +#define MMIO_GEN_LOW_SHIFT 10 >> > +#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 2) >> > #define MMIO_GEN_MASK ((1 << MMIO_GEN_SHIFT) - 1) >> > #define MMIO_MAX_GEN ((1 << MMIO_GEN_SHIFT) - 1) >> > >> > @@ -4428,7 +4432,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm) >> > * The very rare case: if the generation-number is round, >> > * zap all shadow pages. >> > */ >> > - if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) { >> > + if (unlikely(kvm_current_mmio_generation(kvm) == 0)) { >> >> This should be in patch 1/3. > > I don't think so. This change is not due to the removal of biasing in > x86.c, but rather due to removing bit 0 from MMIO_GEN_LOW_MASK. > > I placed it here, because the previous test works just fine until bit 0 > is removed from MMIO_GEN_LOW_MASK. Ah ok, you're right. > > Paolo
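As an aside for readers following the mask arithmetic: the effect of the "- 2" in MMIO_GEN_LOW_MASK and of the "== 0" wrap test can be sketched in a few lines of standalone C (a simplified model of the patched constants, not the kernel code itself; current_mmio_generation() is a made-up stand-in for kvm_current_mmio_generation()):

```c
#include <assert.h>

/* Constants as they look after the patch under discussion. */
#define MMIO_GEN_SHIFT     20
#define MMIO_GEN_LOW_SHIFT 10
/* "- 2" instead of "- 1": bit 0 is excluded from the low mask, freeing
 * it to act as an "update in progress" flag in the memslot generation. */
#define MMIO_GEN_LOW_MASK  ((1u << MMIO_GEN_LOW_SHIFT) - 2)
#define MMIO_GEN_MASK      ((1u << MMIO_GEN_SHIFT) - 1)
#define MMIO_MAX_GEN       ((1u << MMIO_GEN_SHIFT) - 1)

/* Stand-in for kvm_current_mmio_generation(): truncate the memslot
 * generation to the bits that fit in an MMIO spte. */
static unsigned int current_mmio_generation(unsigned int slots_generation)
{
    return slots_generation & MMIO_GEN_MASK;
}
```

Because the truncated generation wraps from MMIO_MAX_GEN back to 0, testing "== 0" after an increment catches the same rare rollover that the old ">= MMIO_MAX_GEN" test did before the biasing was removed.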
Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
On Tue, Sep 2, 2014 at 9:50 AM, Paolo Bonzini wrote: > Il 02/09/2014 18:47, David Matlack ha scritto: >>> > Ping? >> Sorry for the delay. I think the patches look good. And patch 3/3 still >> fixes the bug I was originally seeing, so I'm happy :). I just had one >> small comment (see my reply to patch 2/3). >> > > I answered that question now. Can I add your Reviewed-by and, for patch > 3, Tested-by? Please do. And thanks again for working on these! > > Paolo
Re: [PATCH 2/4] KVM: nSVM: propagate the NPF EXITINFO to the guest
On 09/02/2014 07:46 PM, Paolo Bonzini wrote: */ if (unlikely(real_gfn == UNMAPPED_GVA)) goto error; @@ -1974,10 +1974,28 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu, { struct vcpu_svm *svm = to_svm(vcpu); - svm->vmcb->control.exit_code = SVM_EXIT_NPF; - svm->vmcb->control.exit_code_hi = 0; - svm->vmcb->control.exit_info_1 = fault->error_code; - svm->vmcb->control.exit_info_2 = fault->address; + /* +* We can keep the value that the processor stored in the VMCB, +* but make up something sensible if we hit the WARN. +*/ + if (WARN_ON(svm->vmcb->control.exit_code != SVM_EXIT_NPF)) { + svm->vmcb->control.exit_code = SVM_EXIT_NPF; + svm->vmcb->control.exit_code_hi = 0; + svm->vmcb->control.exit_info_1 = (1ULL << 32); + svm->vmcb->control.exit_info_2 = fault->address; + } Its been a while since I looked into this, but is an injected NPF exit always the result of a real NPF exit? I think so, but that's why I CCed you. :) It could always be the result of emulation into which L0 was tricked. I don't think it's a safe assumption. How about an io-port emulated on L1 but passed through to L2 by the nested hypervisor. On emulation of INS or OUTS, KVM would need to read/write to an L2 address space, It would need to read/write to *L1* (that's where the VMCB's IOIO map lies), which could result into a regular page fault injected into L1. Paolo maybe causing NPF faults to be injected. In this case an IOIO exit would cause an injected NPF exit for L1.
Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
On Tue, Sep 2, 2014 at 10:36 AM, Amit Shah wrote: > On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote: >> >>> Hi, all >> >>> >> >>> I start a VM with virtio-serial (default ports number: 31), and found >> >>> that virtio-blk performance degradation happened, about 25%, this >> >>> problem can be reproduced 100%. >> >>> without virtio-serial: >> >>> 4k-read-random 1186 IOPS >> >>> with virtio-serial: >> >>> 4k-read-random 871 IOPS >> >>> >> >>> but if use max_ports=2 option to limit the max number of virio-serial >> >>> ports, then the IO performance degradation is not so serious, about 5%. >> >>> >> >>> And, ide performance degradation does not happen with virtio-serial. >> >> >> >>Pretty sure it's related to MSI vectors in use. It's possible that >> >>the virtio-serial device takes up all the avl vectors in the guests, >> >>leaving old-style irqs for the virtio-blk device. >> >> >> >I don't think so, >> >I use iometer to test 64k-read(or write)-sequence case, if I disable the >> >virtio-serial dynamically via device manager->virtio-serial => disable, >> >then the performance get promotion about 25% immediately, then I re-enable >> >the virtio-serial via device manager->virtio-serial => enable, >> >the performance got back again, very obvious. >> add comments: >> Although the virtio-serial is enabled, I don't use it at all, the >> degradation still happened. > > Using the vectors= option as mentioned below, you can restrict the > number of MSI vectors the virtio-serial device gets. You can then > confirm whether it's MSI that's related to these issues. > >> >So, I think it has no business with legacy interrupt mode, right? >> > >> >I am going to observe the difference of perf top data on qemu and perf kvm >> >stat data when disable/enable virtio-serial in guest, >> >and the difference of perf top data on guest when disable/enable >> >virtio-serial in guest, >> >any ideas? 
>> > >> >Thanks, >> >Zhang Haoyu >> >>If you restrict the number of vectors the virtio-serial device gets >> >>(using the -device virtio-serial-pci,vectors= param), does that make >> >>things better for you? > > > > Amit Can confirm serious degradation compared to the 1.1 with regular serial output - I am able to hang the VM forever after some tens of seconds of continuously printing dmesg to the ttyS0. The VM just ate all available CPU quota during the test and hung after some tens of seconds, not even responding to regular pings and progressively raising CPU consumption up to the limit.
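For anyone wanting to try the experiment Amit describes, both properties mentioned in this thread (vectors= and max_ports=) are set on the device at the command line; a sketch, where the memory size and port/vector counts are arbitrary placeholders:

```
qemu-system-x86_64 -enable-kvm -m 2048 \
    -device virtio-serial-pci,vectors=4,max_ports=2
```

If the IOPS regression shrinks as the vector count is reduced, that points at MSI vector exhaustion pushing other virtio devices back to legacy interrupts.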
Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
On (Tue) 02 Sep 2014 [22:05:45], Andrey Korolyov wrote: > Can confirm serious degradation compared to the 1.1 with regular > serial output - I am able to hang the VM forever after some tens of > seconds of continuously printing dmesg to the ttyS0. The VM just ate > all available CPU quota during the test and hung after some tens of > seconds, not even responding to regular pings and progressively > raising CPU consumption up to the limit. Entirely different to what's being discussed here. You're observing slowdown with ttyS0 in the guest -- the isa-serial device. This thread is discussing virtio-blk and virtio-serial. Amit
Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virtio-serial
On Tue, Sep 2, 2014 at 10:11 PM, Amit Shah wrote: > On (Tue) 02 Sep 2014 [22:05:45], Andrey Korolyov wrote: > >> Can confirm serious degradation compared to the 1.1 with regular >> serial output - I am able to hang the VM forever after some tens of >> seconds of continuously printing dmesg to the ttyS0. The VM just ate >> all available CPU quota during the test and hung after some tens of >> seconds, not even responding to regular pings and progressively >> raising CPU consumption up to the limit. > > Entirely different to what's being discussed here. You're observing > slowdown with ttyS0 in the guest -- the isa-serial device. This > thread is discussing virtio-blk and virtio-serial. > > Amit Sorry for the thread hijacking; the problem is definitely not related to the interrupt rework, will start a new thread.
Re: kvm-unit-test failures
On 08/31/2014 11:05 AM, Paolo Bonzini wrote: > Il 29/08/2014 23:05, Chris J Arges ha scritto: >> And indeed there is a condition where matched && already_matched are >> both true. In this case we don't zero or increment nr_vcpus_matched_tsc. >> Incrementing nr_vcpus_matched_tsc in that last else clause allows the >> test to pass; however this is identical to the logic before the patch. > > Can you please trace the test using trace-cmd > (http://www.linux-kvm.org/page/Tracing) and send the output? > > Paolo > Paolo, I have posted the trace data here: http://people.canonical.com/~arges/kvm/trace.dat.xz Here is the output from the actual test case: enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409687073 Seconds get from kvmclock: 1409333034 Offset:-354039 offset too large! Check the stability of raw cycle ... Worst warp -354462672821748 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -354462672821748 Raw cycle is not stable Monotonic cycle test: Worst warp -354455286691490 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -354455286691490 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1234719818 Measure the performance of adjusted cycle ... 
Total vcpus: 2 Test loops: 1000 TSC cycles: 1234750103 I also enabled dynamic debugging on arch/x86/kvm/x86.c (on a separate run): [354558.460867] kvm_get_time_scale: base_khz 100 => 2593993, shift 2, mul 2785278775 [354558.461034] kvm: new tsc generation 1, clock 0 [354558.461812] kvm_get_time_scale: base_khz 100 => 2593993, shift 2, mul 2785278775 [354558.461846] kvm: matched tsc offset for 0 [354558.477542] kvm: matched tsc offset for 0 [354558.477614] kvm: matched tsc offset for 0 [354558.513729] kvm: matched tsc offset for 0 [354558.541724] kvm: matched tsc offset for 0 [354558.741532] kvm_get_time_scale: base_khz 2593993 => 100, shift -1, mul 3311471770 [354558.742551] kvm_get_time_scale: base_khz 2593993 => 100, shift -1, mul 3311471770 Also I ensured that this was reproducible with the latest qemu, and these results were gathered with the latest version as of today. Any other places to look into? --chris j arges
Re: [PATCH v4 01/14] contrib: add ivshmem client and server
On 09/02/2014 09:25 AM, David Marchand wrote: > When using ivshmem devices, notifications between guests can be sent as > interrupts using a ivshmem-server (typical use described in documentation). > The client is provided as a debug tool. > > Signed-off-by: Olivier Matz > Signed-off-by: David Marchand > --- > +/* parse the size of shm */ > +static int > +parse_size(const char *val_str, size_t *val) > +{ > +char *endptr; > +unsigned long long tmp; > + > +errno = 0; > +tmp = strtoull(val_str, &endptr, 0); > +if ((errno == ERANGE && tmp == ULLONG_MAX) || (errno != 0 && tmp == 0)) { > +return -1; > +} This is overly complex; usually, this is just done as: if (errno) return -1; > +/* parse an unsigned int */ > +static int > +parse_uint(const char *val_str, unsigned *val) > +{ > +char *endptr; > +unsigned long tmp; > + > +errno = 0; > +tmp = strtoul(val_str, &endptr, 0); > +if ((errno == ERANGE && tmp == ULONG_MAX) || (errno != 0 && tmp == 0)) { > +return -1; > +} and again > +if (endptr == val_str || endptr[0] != '\0') { > +return -1; > +} > +*val = tmp; > +return 0; > +} -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
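As a side note, the shorter errno check Eric suggests still relies on the endptr validation from the original patch; a minimal standalone sketch of the combined approach (parse_uint_simple is a made-up name for illustration, not the QEMU contrib code):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Simplified unsigned-int parser: clear errno first, then a bare
 * "if (errno)" catches strtoul() failures such as ERANGE on overflow. */
static int parse_uint_simple(const char *str, unsigned int *val)
{
    char *endptr;
    unsigned long tmp;

    errno = 0;
    tmp = strtoul(str, &endptr, 0);
    if (errno) {
        return -1;              /* overflow or other conversion failure */
    }
    if (endptr == str || *endptr != '\0') {
        return -1;              /* no digits consumed, or trailing junk */
    }
    if (tmp > UINT_MAX) {
        return -1;              /* fits in unsigned long but not unsigned */
    }
    *val = (unsigned int)tmp;
    return 0;
}
```

Note that strtoul() is not required to set errno when the string contains no digits at all, so inputs like "abc" are rejected by the endptr == str test rather than by errno.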
Re: [PATCH v4 02/14] docs: update ivshmem device spec
On 09/02/2014 09:25 AM, David Marchand wrote: > Add some notes on the parts needed to use ivshmem devices: more specifically, > explain the purpose of an ivshmem server and the basic concept to use the > ivshmem devices in guests. > Move some parts of the documentation and re-organise it. > > Signed-off-by: David Marchand > Reviewed-by: Claudio Fontana > --- > docs/specs/ivshmem_device_spec.txt | 124 > +++- > 1 file changed, 93 insertions(+), 31 deletions(-) > > > -The device currently supports 4 registers of 32-bits each. Registers > -are used for synchronization between guests sharing the same memory object > when > -interrupts are supported (this requires using the shared memory server). > +The server must be started on the host before any guest. > +It creates a shared memory object then waits for clients to connect on an > unix > +socket. s/an unix/a unix/ > > -The server assigns each VM an ID number and sends this ID number to the QEMU > -process when the guest starts. > +For each client (QEMU processes) that connects to the server: s/processes/process/ > +The client IDs are limited to 16 bits because of the current implementation > (see > +Doorbell register in 'PCI device registers' subsection). Hence on 65536 > clients s/on/only/ > +At initialisation, when creating the ivshmem device, QEMU gets its ID from > the > +server then make it available through BAR0 IVPosition register for the VM to > use s/make/makes/
Re: [PATCH v4 05/14] contrib/ivshmem-*: switch to QEMU headers
On 09/02/2014 09:25 AM, David Marchand wrote: > Reuse parsers from QEMU, C99 boolean. > > Signed-off-by: David Marchand > --- > contrib/ivshmem-client/ivshmem-client.c | 12 + > contrib/ivshmem-client/ivshmem-client.h |4 +- > contrib/ivshmem-client/main.c | 12 + > contrib/ivshmem-server/ivshmem-server.c | 14 +- > contrib/ivshmem-server/ivshmem-server.h |4 +- > contrib/ivshmem-server/main.c | 73 > +-- > 6 files changed, 20 insertions(+), 99 deletions(-) Why introduce code in patch 1 only to rip it back out in patch 5? Why not just squash these together and introduce the contrib file correctly on its first commit? > > case 'l': /* shm_size */ > -if (parse_size(optarg, &args->shm_size) < 0) { > +parse_option_size("shm_size", optarg, &args->shm_size, &errp); > +if (errp) { > +error_free(errp); > fprintf(stderr, "cannot parse shm size\n"); It would be nicer to print the contents of errp, instead of discarding what is likely to be a more specific error message.
Re: [PATCH v4 00/14] ivshmem: update documentation, add client/server tools
On 09/02/2014 09:25 AM, David Marchand wrote: > Here is a patchset containing an update on ivshmem specs documentation and > importing ivshmem server and client tools. > These tools have been written from scratch and are not related to what is > available in nahanni repository. > I put them in contrib/ directory as the qemu-doc.texi was already telling the > server was supposed to be there. > > Changes since v3: > - first patch is untouched > - just restored the Reviewed-By Claudio in second patch > - following patches 3-8 take into account Stefan's comments > - patches 9-12 take into account Gonglei's comments > - patch 13 adjusts ivshmem-server default values > - last patch introduces a change in the ivshmem client-server protocol to > check a protocol version at connect time Rather than introducing new files with bugs, followed by patches to clean them up, why not just introduce the new files correctly in the first place? I think you are better off squashing a lot of the cleanup patches into patch 1.
Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote: > This RFC proposes an integration of "ARM: Forwarding physical > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in > KVM. > > It makes it possible to transform a VFIO platform driver IRQ into a forwarded > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM > switch can be avoided on guest virtual IRQ completion. Before this > patch, a maintenance IRQ was triggered on the virtual IRQ completion. > > When the IRQ is forwarded, the VFIO platform driver does not need to > disable the IRQ anymore. Indeed when returning from the IRQ handler > the IRQ is not deactivated. Only its priority is lowered. This means > the same IRQ cannot hit before the guest completes the virtual IRQ > and the GIC automatically deactivates the corresponding physical IRQ. > > Besides, the injection still is based on irqfd triggering. The only > impact on the irqfd process is that resamplefd is not called anymore on > virtual IRQ completion since the latter becomes "transparent". > > The current integration is based on an extension of the KVM-VFIO > device, previously used by KVM to interact with VFIO groups. The > patch series now enables KVM to directly interact with a VFIO > platform device. The VFIO external API was extended for that purpose. > > The KVM-VFIO device can get/put the vfio platform device, check its > integrity and type, get the IRQ number associated to an IRQ index. > > The IRQ forward programming is architecture specific (virtual interrupt > controller programming basically). However the whole infrastructure is > kept generic. > > From a user point of view, the functionality is provided through new > KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ > and the capability can be checked with KVM_HAS_DEVICE_ATTR. > Assignment can only be changed when the physical IRQ is not active. > It is the responsibility of the user to do this check. 
> > This patch series has the following dependencies: > - "ARM: Forwarding physical interrupts to a guest VM" > (http://lwn.net/Articles/603514/) in > - [PATCH v3] irqfd for ARM > - and obviously the VFIO platform driver series: > [RFC PATCH v6 00/20] VFIO support for platform devices on ARM > https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html > > Integrated pieces can be found at > ssh://git.linaro.org/people/eric.auger/linux.git > on branch 3.17rc3_irqfd_forward_integ_v2 > > This was tested on Calxeda Midway, assigning the xgmac main IRQ. > > v1 -> v2: > - forward control is moved from architecture specific file into generic > vfio.c module. > only kvm_arch_set_fwd_state remains architecture specific > - integrate Kim's patch which enables KVM-VFIO for ARM > - fix vgic state bypass in vgic_queue_hwirq > - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h > to include/uapi/linux/kvm.h > also irq_index renamed into index and guest_irq renamed into gsi > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD > - vfio_external_get_base_device renamed into vfio_external_base_device > - vfio_external_get_type removed > - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > > Eric Auger (8): > KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded > IRQ > KVM: ARM: VGIC: add forwarded irq rbtree lock > VFIO: platform: handler tests whether the IRQ is forwarded > KVM: KVM-VFIO: update user API to program forwarded IRQ > VFIO: Extend external user API > KVM: KVM-VFIO: add new VFIO external API hooks > KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding > control > KVM: KVM-VFIO: ARM forwarding control > > Kim Phillips (1): > ARM: KVM: Enable the KVM-VFIO device > > Documentation/virtual/kvm/devices/vfio.txt | 26 ++ > arch/arm/include/asm/kvm_host.h| 7 + > arch/arm/kvm/Kconfig | 1 + > arch/arm/kvm/Makefile | 4 
+- > arch/arm/kvm/kvm_vfio_arm.c| 85 + > drivers/vfio/platform/vfio_platform_irq.c | 7 +- > drivers/vfio/vfio.c| 24 ++ > include/kvm/arm_vgic.h | 1 + > include/linux/kvm_host.h | 27 ++ > include/linux/vfio.h | 3 + > include/uapi/linux/kvm.h | 9 + > virt/kvm/arm/vgic.c| 59 +++- > virt/kvm/vfio.c| 497 > - > 13 files changed, 733 insertions(+), 17 deletions(-) > create mode 100644 arch/arm/kvm/kvm_vfio_arm.c > Have we ventured too far in the other direction? I suppose what I was hoping to see was something more like: case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{ /* get vfio_device */ /* get mutex */ /* verify device+irq isn't already forwarded */
issue with network on debian jessie with kvm/qemu
Hey all, From all the manuals, faqs, howtos, etc., it looks like I have everything configured correctly, and everything but the networking is working. Can anyone suggest what I might have wrong? On the network, the host and anything on the actual network can ping the guest VM. But the guest VM can't ping the host or anything on the network, nor can it connect to anything on the network. There is no firewall running on the host or the guest VM. From all the reading, I have tried setting the below. net.bridge.bridge-nf-call-ip6tables = 0 net.bridge.bridge-nf-call-iptables = 0 net.bridge.bridge-nf-call-arptables = 0 /proc/sys/net/ipv4/ip_forward=1 /proc/sys/net/ipv4/conf/all/send_redirects=0 Below are all the outputs that I could think of getting. The host is 64-bit and the guest is a Windows 7 64-bit. I don't have the ipconfig from Windows, but it is set to 172.16.0.205/24; local connection 1 shows the same MAC address as vnet0. ps aux: libvirt+ 3188 45.1 3.1 13490892 8224256 ? Rl 12:17 1:31 qemu-system-x86_64 -enable-kvm -name TestCenter01 -S -machine pc-i440fx-2.1,accel=kvm -m 8192 -smp 2,sockets=1,cores=2,threads=1 -uuid bc91b12b-f737-2945-9821-04c518086be8 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/TestCenter01.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot c -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -usb -drive file=/var/lib/libvirt/images/testcenter01.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/root/win-guest-drivers.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/root/testcenter-4.43.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:79:12:3d,bus=pci.0,addr=0x3 -netdev tap,fd=26,id=hostnet1,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:2c:5e:d5,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -vga qxl -global qxl.ram_size=67108864 -global qxl.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 qemu xml config file: ifconfig: br0 Link encap:Ethernet HWaddr ec:f4:bb:c4:66:e0 inet addr:172.16.0.220 Bcast:172.16.0.255 Mask:255.255.255.0 inet6 addr: fe80::eef4:bbff:fec4:66e0/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1865 errors:0 dropped:84 overruns:0 frame:0 TX packets:142 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:161772 (157.9 KiB) TX bytes:13480 (13.1 KiB) br1 Link encap:Ethernet HWaddr ec:f4:bb:c4:66:e2 inet addr:192.168.0.220 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: fe80::eef4:bbff:fec4:66e2/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1843 errors:0 dropped:84 overruns:0 frame:0 TX packets:68 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:159632 (155.8 KiB) TX bytes:9042 (8.8 KiB) eth0 Link encap:Ethernet HWaddr ec:f4:bb:c4:66:e0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1991 errors:0 dropped:0 overruns:0 frame:0 TX packets:186 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:219674 (214.5 KiB) TX bytes:18508 (18.0 KiB) eth1 Link encap:Ethernet HWaddr ec:f4:bb:c4:66:e2 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1951 errors:0 dropped:0 overruns:0 
frame:0 TX packets:101 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:211856 (206.8 KiB) TX bytes:12776 (12.4 KiB) loLink encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:102 errors:0 dropped:0 overruns:0 frame:0 TX packets:102 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:7450 (7.2 KiB) TX bytes:7450 (7.2 KiB) vnet0 Link encap:Ethernet
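For reference, the bridge-netfilter settings listed in the message above only survive a reboot if written to a sysctl configuration file; a sketch (the file name is just a convention, and the net.bridge.* keys are only present once the bridge/br_netfilter module is loaded):

```
# /etc/sysctl.d/90-kvm-bridge.conf (hypothetical file name)
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
```

After editing, apply the file with `sysctl -p <file>` and confirm the active values with `sysctl net.bridge.bridge-nf-call-iptables`; if the bridge-nf keys are 1, bridged guest traffic is being passed through iptables, which is a common cause of one-way connectivity like that described here.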
Re: [PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().
Hi Gleb, On 09/03/2014 12:00 AM, Gleb Natapov wrote: .. +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) +{ + /* +* apic access page could be migrated. When the page is being migrated, +* GUP will wait till the migrate entry is replaced with the new pte +* entry pointing to the new page. +*/ + vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm, + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); + kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, + page_to_phys(vcpu->kvm->arch.apic_access_page)); I am a little bit worried that here all vcpus write to vcpu->kvm->arch.apic_access_page without any locking. It is probably benign since pointer write is atomic on x86. Paolo? Do we even need apic_access_page? Why not call gfn_to_page(APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT) put_page() on rare occasions we need to know its address? Isn't it a necessary item defined in hardware spec ? I didn't read intel spec deeply, but according to the code, the page's address is written into vmcs. And it made me think that we cannnot remove it. Thanks. +} + /* * Returns 1 to let __vcpu_run() continue the guest execution loop without * exiting to the userspace. 
Otherwise, the value will be returned to the @@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_deliver_pmi(vcpu); if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu)) vcpu_scan_ioapic(vcpu); + if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu)) + vcpu_reload_apic_access_page(vcpu); } if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a4c33b3..8be076a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -136,6 +136,7 @@ static inline bool is_error_page(struct page *page) #define KVM_REQ_GLOBAL_CLOCK_UPDATE 22 #define KVM_REQ_ENABLE_IBS23 #define KVM_REQ_DISABLE_IBS 24 +#define KVM_REQ_APIC_PAGE_RELOAD 25 #define KVM_USERSPACE_IRQ_SOURCE_ID 0 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1 @@ -579,6 +580,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm); void kvm_reload_remote_mmus(struct kvm *kvm); void kvm_make_mclock_inprogress_request(struct kvm *kvm); void kvm_make_scan_ioapic_request(struct kvm *kvm); +void kvm_reload_apic_access_page(struct kvm *kvm); long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb..d8280de 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm) make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC); } +void kvm_reload_apic_access_page(struct kvm *kvm) +{ + make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); +} + int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) { struct page *page; @@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, if (need_tlb_flush) kvm_flush_remote_tlbs(kvm); + /* +* The physical address of apic access page is stroed in VMCS. +* So need to update it when it becomes invalid. 
+*/ + if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT)) + kvm_reload_apic_access_page(kvm); + spin_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, idx); } -- 1.8.3.1 -- Gleb. .
Re: [PATCH v4 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.
Hi Gleb, By the way, when testing the nested vm, I started the L1 and L2 vm with -cpu XXX, -x2apic But with or without this patch 5/6, when migrating the apic access page, the nested vm was not corrupted. We cannot migrate the L2 vm because it pinned some other pages in memory. Without this patch, if we migrate the apic access page, I expected the L2 vm to be corrupted, but it wasn't. Did I make any obvious mistake that you can spot? Thanks. On 08/27/2014 06:17 PM, Tang Chen wrote: This patch only handles the "L1 and L2 vm share one apic access page" situation. When the L1 vm is running, if the shared apic access page is migrated, mmu_notifier will request all vcpus to exit to L0, and reload the apic access page physical address for all the vcpus' vmcs (which is done by patch 5/6). And when it enters the L2 vm, L2's vmcs will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do nothing. When the L2 vm is running, if the shared apic access page is migrated, mmu_notifier will request all vcpus to exit to L0, and reload the apic access page physical address for all L2 vmcs. And this patch requests an apic access page reload on the L2->L1 vmexit. 
Signed-off-by: Tang Chen
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              |  6 ++
 arch/x86/kvm/vmx.c              | 32
 arch/x86/kvm/x86.c              |  3 +++
 virt/kvm/kvm_main.c             |  1 +
 5 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..13fbb62 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
+	void (*set_nested_apic_page_migrated)(struct kvm_vcpu *vcpu, bool set);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f2eacc4..da88646 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
 	return;
 }
 
+static void svm_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+	return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
 	return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
 	.set_apic_access_page_addr = svm_set_apic_access_page_addr,
+	.set_nested_apic_page_migrated = svm_set_nested_apic_page_migrated,
 	.vm_has_apicv = svm_vm_has_apicv,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..9035fd1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,16 @@ struct nested_vmx {
 	 * we must keep them pinned while L2 runs.
 	 */
 	struct page *apic_access_page;
+	/*
+	 * L1's apic access page can be migrated. When L1 and L2 are sharing
+	 * the apic access page, after the page is migrated when L2 is running,
+	 * we have to reload it to L1 vmcs before we enter L1.
+	 *
+	 * When the shared apic access page is migrated in L1 mode, we don't
+	 * need to do anything else because we reload apic access page each
+	 * time when entering L2 in prepare_vmcs02().
+	 */
+	bool apic_access_page_migrated;
 	u64 msr_ia32_feature_control;
 
 	struct hrtimer preemption_timer;
@@ -7098,6 +7108,12 @@ static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
 	vmcs_write64(APIC_ACCESS_ADDR, hpa);
 }
 
+static void vmx_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	vmx->nested.apic_access_page_migrated = set;
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
 	u16 status;
@@ -8796,6 +8812,21 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	}
 
+	/*
+	 * When shared (L1 & L2) apic access page is migrated during L2 is
+	 * running, mmu_notifier will force to reload the page's hpa for L2
+	 * vmcs. Need to reload it for L1 before entering L1.
+	 */
+	if (vmx->nested.apic_access_page_migrated) {
+		/*
+		 * Do not call kvm_reload_apic_access_page() because we are now
+		 * in L2. We should not call make_all_cpus_request() to exit to
+		 * L0, otherwise we will reload for L2 vmcs again.
+
[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled
https://bugzilla.kernel.org/show_bug.cgi?id=82211

--- Comment #7 from Zhou, Chao ---

Xen version: 4.4-unstable. The xen.gz file is attached.

--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Test report, kernel fd275235... qemu 8b303011...
Hi All,

This is the KVM upstream test result against the kvm.git next branch and the qemu.git master branch.

kvm.git next branch: fd2752352bbc98850d83b5448a288d8991590317 based on kernel 3.17.0-rc1
qemu.git master branch: 8b3030114a449e66c68450acaac4b66f26d91416

We found two new bugs and two bugs were fixed in the past three weeks. We also closed one bug as invalid.

New issues (2):
1. Cannot boot Xen under KVM with X2APIC enabled
   https://bugzilla.kernel.org/show_bug.cgi?id=82211
2. Guest can't get an IP when created with a bridge using the virtio driver.
   https://bugs.launchpad.net/qemu/+bug/1364249

Fixed issues (2):
1. The guest will be destroyed after hot-removing the VF from the guest.
   https://bugs.launchpad.net/qemu/+bug/1354279
2. [Nested] L2 guest failed to start in VMware on KVM
   https://bugzilla.kernel.org/show_bug.cgi?id=61411

Closed as invalid (1):
1. With 'monitor pty', it needs to flush the pts device after sending a command to it.
   https://bugs.launchpad.net/qemu/+bug/1185228

Old issues (2):
1. Guest hangs when doing a kernel build and writing data in the guest.
   https://bugs.launchpad.net/qemu/+bug/1096814
2. [Nested] Windows XP Mode does not work
   https://bugzilla.kernel.org/show_bug.cgi?id=60782

Test environment:
=================
Platform       IvyBridge-EP    Haswell-EP
CPU Cores      32              56
Memory size    128GB           160GB

Regards,
Robert Ho
[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled
https://bugzilla.kernel.org/show_bug.cgi?id=82211

--- Comment #8 from Zhou, Chao ---

Created attachment 149111
  --> https://bugzilla.kernel.org/attachment.cgi?id=149111&action=edit
xen.gz file

--
You are receiving this mail because:
You are watching the assignee of the bug.
Re: [PATCH 0/3] fix bugs with stale or corrupt MMIO caches
On 09/02/2014 11:42 PM, Paolo Bonzini wrote:
> Il 29/08/2014 12:31, Paolo Bonzini ha scritto:
>> David and Xiao, here's my take on the MMIO generation patches. Now
>> with documentation, too. :) Please review!
>>
>> David Matlack (2):
>>   kvm: fix potentially corrupt mmio cache
>>   kvm: x86: fix stale mmio cache bug
>>
>> Paolo Bonzini (1):
>>   KVM: do not bias the generation number in kvm_current_mmio_generation
>>
>>  Documentation/virtual/kvm/mmu.txt | 14 ++
>>  arch/x86/include/asm/kvm_host.h   |  1 +
>>  arch/x86/kvm/mmu.c                | 29 ++---
>>  arch/x86/kvm/x86.h                | 20 +++-
>>  virt/kvm/kvm_main.c               | 30 +++---
>>  5 files changed, 67 insertions(+), 27 deletions(-)
>>
> Ping?

Looks good to me.

Reviewed-by: Xiao Guangrong