Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)
On 27/08/2014 01:58, Andy Lutomirski wrote:
> hpa pointed out that the ABI that I chose (an MSR from the KVM range
> and a KVM cpuid bit) is unnecessarily KVM-specific. It would be nice
> to allocate an MSR that everyone involved can agree on and, rather
> than relying on a cpuid bit, just have the guest probe for the MSR.
>
> This leads to a few questions:
>
> 1. How do we allocate an MSR? (For background, this would be an MSR
> that either returns 64 bits of best-effort cryptographically secure
> random data or fails with #GP.)

Ask Intel? :)

> 2. For KVM, what's the right way to allow QEMU to turn the feature on
> and off? Is this even necessary? KVM currently doesn't seem to allow
> QEMU to turn any of its MSRs off; it just allows QEMU to ask it to
> stop advertising support.

Note that QEMU is not involved in the implementation of your feature,
only in advertising it. You could look at CPUID at runtime, but that
would mean teaching KVM to look for the KVMKVMKVM\0\0\0 signature in
the CPUID data supplied by userspace. I don't think this is
particularly useful.

> 3. QEMU people, can you please fix your RDMSR emulation to send #GP on
> failure? I can work around it for this MSR in the Linux code, but for
> Pete's sake... :(

Sure, I will look at it. Though I expect it was done because of MSRs
that are missing in QEMU (similar to how Linux doesn't #GP if compiled
with pvops).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GET_RNG_SEED hypercall ABI? (Re: [PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm)
On 08/27/2014 12:00 AM, Paolo Bonzini wrote:
> On 27/08/2014 01:58, Andy Lutomirski wrote:
>> hpa pointed out that the ABI that I chose (an MSR from the KVM range
>> and a KVM cpuid bit) is unnecessarily KVM-specific. It would be nice
>> to allocate an MSR that everyone involved can agree on and, rather
>> than relying on a cpuid bit, just have the guest probe for the MSR.
>>
>> This leads to a few questions:
>>
>> 1. How do we allocate an MSR? (For background, this would be an MSR
>> that either returns 64 bits of best-effort cryptographically secure
>> random data or fails with #GP.)
>
> Ask Intel? :)

I'm going to poke around internally. Intel might as a matter of policy
be reluctant to assign an MSR index specifically for software use, but
I'll try to find out.

-hpa
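For readers following the "probe for the MSR" idea, here is a minimal userspace sketch of the guest-side logic, written under loud assumptions: the MSR index, the toy_hv structure and toy_rdmsr_safe() are all hypothetical stand-ins invented for this illustration, not anything proposed in the thread. The only behaviour taken from the discussion is that a read either yields 64 bits or fails where hardware would raise #GP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSR index, picked only for this sketch. */
#define HYPOTHETICAL_MSR_RNG_SEED 0x4b564dffu

/* Toy "hypervisor": whether the MSR exists, plus a counter that
 * stands in for a real entropy source. */
struct toy_hv {
	bool has_rng_msr;
	uint64_t next_value;
};

/*
 * Stand-in for a fault-tolerant RDMSR: returns 0 and stores the value
 * on success, or -1 where a real CPU would raise #GP.
 */
static int toy_rdmsr_safe(struct toy_hv *hv, uint32_t msr, uint64_t *val)
{
	if (msr != HYPOTHETICAL_MSR_RNG_SEED || !hv->has_rng_msr)
		return -1;
	*val = hv->next_value++;
	return 0;
}

/*
 * Probe-by-reading: no CPUID bit is consulted. Returns how many 64-bit
 * words were obtained; 0 means the MSR is absent (or failed at once).
 */
static size_t get_rng_seed(struct toy_hv *hv, uint64_t *buf, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_rdmsr_safe(hv, HYPOTHETICAL_MSR_RNG_SEED, &buf[i]))
			break;
	return i;
}
```

The point of the design is that the failure path doubles as feature detection, which is why RDMSR emulation that silently returns data instead of faulting breaks it.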
Re: [GIT PULL 2/2] KVM: s390/mm: try a cow on read only pages for key ops
On 27/08/14 05:06, Ben Hutchings wrote:
> On Mon, 2014-08-25 at 15:10 +0200, Christian Borntraeger wrote:
>> The PFMF instruction handler blindly wrote the storage key even if
>> the page was mapped R/O in the host. Let's try a COW before continuing
>> and bail out in case of errors.
>>
>> Signed-off-by: Christian Borntraeger
>> Reviewed-by: Dominik Dingel
>> Cc: sta...@vger.kernel.org
>> ---
>>  arch/s390/mm/pgtable.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
>> index 19daa53..5404a62 100644
>> --- a/arch/s390/mm/pgtable.c
>> +++ b/arch/s390/mm/pgtable.c
>> @@ -986,11 +986,21 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>>  	pte_t *ptep;
>>
>>  	down_read(&mm->mmap_sem);
>> +retry:
>>  	ptep = get_locked_pte(current->mm, addr, &ptl);
>>  	if (unlikely(!ptep)) {
>>  		up_read(&mm->mmap_sem);
>>  		return -EFAULT;
>>  	}
>> +	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
>> +	     (pte_val(*ptep) & _PAGE_PROTECT)) {
>> +			pte_unmap_unlock(*ptep, ptl);
>> +			if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
>> +				up_read(&mm->mmap_sem);
>> +				return -EFAULT;
>> +			}
>> +			goto retry;
>> +		}
>
> Every line below the first 'if' is indented one tab stop too far.
>
> Ben.
>
>>  	new = old = pgste_get_lock(ptep);
>>  	pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
>

Hmm, indeed. Drat.
Paolo, do you want a revert, resend?

Christian
Re: [Qemu-devel] [question] e1000 interrupt storm happened becauseofits correspondingioapic->irr bit always set
>> Hi, all
>>
>> I use a qemu-1.4.1/qemu-2.0.0 to run win7 guest, and encounter e1000 NIC
>> interrupt storm,
>> because "if (!ent->fields.mask && (ioapic->irr & (1 << i)))" is always
>> true in __kvm_ioapic_update_eoi().
>>
>> Any ideas?
> We meet this several times: search the autoneg patches for an example of
> workaround for this in qemu, and patch kvm: ioapic: conditionally delay
> irq delivery during eoi broadcast for an workaround in kvm (rejected).
>
Thanks, Jason,
I searched "e1000 autoneg" in gmane.comp.emulators.qemu, and found below patches,
http://thread.gmane.org/gmane.comp.emulators.qemu/143001/focus=143007
>>> This series is the first try to fix the guest hang during guest
>>> hibernation or driver enable/disable.
http://thread.gmane.org/gmane.comp.emulators.qemu/284105/focus=284765
http://thread.gmane.org/gmane.comp.emulators.qemu/186159/focus=187351
>>> Those are follow-up that tries to fix the bugs introduced by the autoneg
>>> hack.
which one tries to fix this problem, or all of them?
>>> As you can see, those kinds of hacking may not as good as we expect
>>> since we don't know exactly how e1000 works. Only the register function
>>> description from Intel's manual may not be sufficient. And you can
>>> search e1000 in the archives and you can find some behaviour of e1000
>>> registers were not fictionalized like what spec said. It was really
>>> suggested to use virtio-net instead of e1000 in guest.
>> Will the "[PATCH] kvm: ioapic: conditionally delay irq delivery during eoi
>> broadcast" add delay to virtual interrupt injection sometimes,
>> then some time delay sensitive applications will be impacted?
>
>I don't test it too much but it only give a minor delay of 1% irq in the
>hope of guest irq handler will be registered shortly. But I suspect it's
>the bug of e1000 who inject the irq in the wrong time. Under what cases
>did you meet this issue?

It shows up in several scenarios, though not consistently and not 100%
reproducibly, e.g. rebooting the VM, running ifdown on the e1000 NIC, or
installing Kaspersky (network configuration is performed during the
installation stage), etc.

Thanks,
Zhang Haoyu
>>
>> Thanks,
>> Zhang Haoyu
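To make the quoted condition concrete, here is a toy model (an assumption-laden sketch, not the KVM code): struct toy_ioapic and toy_eoi_reinject_count() are hypothetical names, and the real __kvm_ioapic_update_eoi() also matches on the vector and trigger mode, which is left out here. It only shows why an unmasked pin whose irr bit never clears gets re-delivered on every EOI.

```c
#include <stdint.h>

/* Toy I/O APIC state: one bit per pin. */
struct toy_ioapic {
	uint32_t irr;   /* level-triggered lines still asserted */
	uint32_t mask;  /* pins masked in the redirection table */
};

/*
 * Counts the pins that would be re-injected after an EOI, using the
 * same shape of test as the condition quoted in the thread:
 * the pin is not masked and its irr bit is still set.
 */
static int toy_eoi_reinject_count(const struct toy_ioapic *io)
{
	int i, n = 0;

	for (i = 0; i < 24; i++) {
		int masked = (io->mask >> i) & 1;

		if (!masked && (io->irr & (1u << i)))
			n++;
	}
	return n;
}
```

If the device model never deasserts the line (so irr stays set) and the guest has no handler that quiesces the device, each EOI triggers another injection, which is the storm described above.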
Re: [PATCH v2] KVM: x86: sync old tmr with ioapic to update
On 2014/8/27 22:05, Wei Wang wrote:
kvm_ioapic_scan_entry() needs to update tmr. The previous lapic tmr value
(old_tmr) needs to sync with ioapic to get an accurate updated tmr
value before the updating work.

Tested-by: Rongrong Liu
Signed-off-by: Yang Zhang
Signed-off-by: Wei Wang
---
 arch/x86/kvm/lapic.c | 19 +--
 arch/x86/kvm/x86.c |2 +-
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..8c1162d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -518,10 +518,25 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	u32 irr;
+	u32 isr;
+	u32 old_tmr, new_tmr;
 	int i;

-	for (i = 0; i < 8; i++)
-		apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
+	/*
+	 * The updated tmr value comes from level-triggerd interrupts that

s/level-triggerd/level-triggered

+	 * have already been delieverd to lapic and new coming ones which

s/delieverd/delivered

+	 * are pending in ioapic. According to the x86 spec, tmr is valid
+	 * when irr or isr is set.
+	 */
+	for (i = 0; i < 8; i++) {
+		irr = kvm_apic_get_reg(apic, APIC_IRR + 0x10 * i);
+		isr = kvm_apic_get_reg(apic, APIC_ISR + 0x10 * i);
+		old_tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i);
+		new_tmr = (~(irr | isr) & tmr[i])
+			| ((irr | isr) & old_tmr);
+		apic_set_reg(apic, APIC_TMR + 0x10 * i, new_tmr);
+	}
 }

 static void apic_update_ppr(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f5edb6..d401684 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,8 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	memset(tmr, 0, 32);

 	kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
-	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 	kvm_apic_update_tmr(vcpu, tmr);
+	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 }

 /*
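The bit merge in the patch can be checked in isolation. The helper below is a standalone sketch of the same expression for a single 32-bit register; the name merge_tmr() and its parameter names are made up for this illustration. For vectors that are in flight (set in irr or isr) the old TMR bit is kept; for all other vectors the freshly scanned value wins.

```c
#include <stdint.h>

/*
 * Same expression as in the patch, for one 32-bit APIC register:
 * in-flight vectors keep their old TMR bit, the rest take the value
 * computed by the ioapic scan.
 */
static uint32_t merge_tmr(uint32_t irr, uint32_t isr,
			  uint32_t old_tmr, uint32_t scanned_tmr)
{
	uint32_t in_flight = irr | isr;

	return (~in_flight & scanned_tmr) | (in_flight & old_tmr);
}
```

For example, with irr = 0x1, isr = 0x2, old_tmr = 0x1 and scanned_tmr = 0xC, bits 0 and 1 come from old_tmr and bits 2 and 3 from the scan, giving 0xD.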
Re: [GIT PULL 2/2] KVM: s390/mm: try a cow on read only pages for key ops
On 27/08/2014 09:13, Christian Borntraeger wrote:
> On 27/08/14 05:06, Ben Hutchings wrote:
>> On Mon, 2014-08-25 at 15:10 +0200, Christian Borntraeger wrote:
>>> The PFMF instruction handler blindly wrote the storage key even if
>>> the page was mapped R/O in the host. Let's try a COW before continuing
>>> and bail out in case of errors.
>>>
>>> Signed-off-by: Christian Borntraeger
>>> Reviewed-by: Dominik Dingel
>>> Cc: sta...@vger.kernel.org
>>> ---
>>>  arch/s390/mm/pgtable.c | 10 ++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
>>> index 19daa53..5404a62 100644
>>> --- a/arch/s390/mm/pgtable.c
>>> +++ b/arch/s390/mm/pgtable.c
>>> @@ -986,11 +986,21 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>>>  	pte_t *ptep;
>>>
>>>  	down_read(&mm->mmap_sem);
>>> +retry:
>>>  	ptep = get_locked_pte(current->mm, addr, &ptl);
>>>  	if (unlikely(!ptep)) {
>>>  		up_read(&mm->mmap_sem);
>>>  		return -EFAULT;
>>>  	}
>>> +	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
>>> +	     (pte_val(*ptep) & _PAGE_PROTECT)) {
>>> +			pte_unmap_unlock(*ptep, ptl);
>>> +			if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
>>> +				up_read(&mm->mmap_sem);
>>> +				return -EFAULT;
>>> +			}
>>> +			goto retry;
>>> +		}
>>
>> Every line below the first 'if' is indented one tab stop too far.
>>
>> Ben.
>>
>>>  	new = old = pgste_get_lock(ptep);
>>>  	pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
>>
>
> Hmm, indeed. Drat.
> Paolo, do you want a revert, resend?

Just send a trivial patch to fix up the formatting.

Paolo
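The control flow of the patch under discussion (check the mapping, break COW if it is read-only, then retry the lookup) can be sketched in userspace as follows. Everything here is a hypothetical stand-in: struct toy_pte, toy_fixup_write_fault() and toy_set_storage_key() only mimic the shape of the kernel code, and the page-table lock and mmap_sem handling are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy page-table entry: just the two flags the patch tests. */
struct toy_pte {
	bool invalid;
	bool write_protected;
	unsigned char key;
};

/* Stand-in for fixup_user_fault(): breaks COW unless told to fail. */
static int toy_fixup_write_fault(struct toy_pte *pte, bool can_fix)
{
	if (!can_fix)
		return -EFAULT;
	pte->write_protected = false;
	return 0;
}

/*
 * Mirrors the retry structure: a valid but read-only mapping is fixed
 * up first, then the entry is looked at again before the key is set.
 */
static int toy_set_storage_key(struct toy_pte *pte, unsigned char key,
			       bool can_fix)
{
retry:
	if (!pte->invalid && pte->write_protected) {
		if (toy_fixup_write_fault(pte, can_fix))
			return -EFAULT;
		goto retry;
	}
	pte->key = key;
	return 0;
}
```

The retry matters in the real code because the entry may change while the lock is dropped for the fault fixup, so the state has to be re-read rather than assumed.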
[PATCH v4 1/6] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of apic access page. So use this macro.

Signed-off-by: Tang Chen
---
 arch/x86/kvm/svm.c | 3 ++-
 arch/x86/kvm/vmx.c | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..1d941ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	svm->asid_generation = 0;
 	init_vmcb(svm);

-	svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+	svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+				   MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&svm->vcpu))
 		svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..4b80ead 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3999,13 +3999,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
 		goto out;
 	kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
 	kvm_userspace_mem.flags = 0;
-	kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+	kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
 	kvm_userspace_mem.memory_size = PAGE_SIZE;
 	r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
 	if (r)
 		goto out;

-	page = gfn_to_page(kvm, 0xfee00);
+	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
 	if (is_error_page(page)) {
 		r = -EFAULT;
 		goto out;
@@ -4477,7 +4477,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)

 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	kvm_set_cr8(&vmx->vcpu, 0);
-	apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+	apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&vmx->vcpu))
 		apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
 	apic_base_msr.host_initiated = true;
-- 
1.8.3.1
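As a quick sanity check on the substitution, the sketch below redoes the arithmetic in userspace. The macro values are restated locally (APIC_DEFAULT_PHYS_BASE is 0xfee00000 in the kernel headers, and 4 KiB pages are assumed for PAGE_SHIFT); the two helper functions are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Values restated locally for the sketch. */
#define APIC_DEFAULT_PHYS_BASE   0xfee00000ULL
#define PAGE_SHIFT               12              /* assumes 4 KiB pages */
#define MSR_IA32_APICBASE_BSP    (1ULL << 8)
#define MSR_IA32_APICBASE_ENABLE (1ULL << 11)

/* Guest frame number of the apic access page. */
static uint64_t apic_access_gfn(void)
{
	return APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT;
}

/* Initial APIC base MSR value, as built in the reset paths above. */
static uint64_t initial_apic_base(int is_bsp)
{
	uint64_t base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;

	if (is_bsp)
		base |= MSR_IA32_APICBASE_BSP;
	return base;
}
```

The shifted value is 0xfee00, which is exactly the literal the patch removes from gfn_to_page(), so the macro form is a pure refactoring.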
[PATCH] KVM: s390/mm: fix up indentation of set_guest_storage_key
commit ab3f285f227f ("KVM: s390/mm: try a cow on read only pages for
key ops") misaligned a code block. Let's fix up the indentation.

Reported-by: Ben Hutchings
Signed-off-by: Christian Borntraeger
---
 arch/s390/mm/pgtable.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5404a62..1570dbd 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -994,13 +994,13 @@ retry:
 	}
 	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
 	     (pte_val(*ptep) & _PAGE_PROTECT)) {
-			pte_unmap_unlock(*ptep, ptl);
-			if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
-				up_read(&mm->mmap_sem);
-				return -EFAULT;
-			}
-			goto retry;
+		pte_unmap_unlock(*ptep, ptl);
+		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
+			up_read(&mm->mmap_sem);
+			return -EFAULT;
 		}
+		goto retry;
+	}
 	new = old = pgste_get_lock(ptep);
 	pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
-- 
1.8.4.2
[PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().
The apic access page is pinned in memory. As a result, it cannot be
migrated/hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page()
will invalidate the corresponding ept entry. This patch introduces a new
vcpu request named KVM_REQ_APIC_PAGE_RELOAD, makes this request to all
the vcpus at this time, and forces all the vcpus to exit the guest and
not re-enter it until they have updated the VMCS APIC_ACCESS_ADDR
pointer to the new apic access page address and updated
kvm->arch.apic_access_page to the new page.

Signed-off-by: Tang Chen
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c | 6 ++
 arch/x86/kvm/vmx.c | 6 ++
 arch/x86/kvm/x86.c | 15 +++
 include/linux/kvm_host.h| 2 ++
 virt/kvm/kvm_main.c | 12
 6 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35171c7..514183e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -739,6 +739,7 @@ struct kvm_x86_ops {
 	void (*hwapic_isr_update)(struct kvm *kvm, int isr);
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+	void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d941ad..f2eacc4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3619,6 +3619,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 	return;
 }

+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+	return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
 	return 0;
@@ -4373,6 +4378,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+	.set_apic_access_page_addr = svm_set_apic_access_page_addr,
 	.vm_has_apicv = svm_vm_has_apicv,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 63c4c3e..da6d55d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7093,6 +7093,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 	vmx_set_msr_bitmap(vcpu);
 }

+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+	vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
 	u16 status;
@@ -8910,6 +8915,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_irq_window = enable_irq_window,
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
 	.vm_has_apicv = vmx_vm_has_apicv,
 	.load_eoi_exitmap = vmx_load_eoi_exitmap,
 	.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e05bd58..96f4188 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5989,6 +5989,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	kvm_apic_update_tmr(vcpu, tmr);
 }

+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * apic access page could be migrated. When the page is being migrated,
+	 * GUP will wait till the migrate entry is replaced with the new pte
+	 * entry pointing to the new page.
+	 */
+	vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+				APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+				page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace. Otherwise, the value will be returned to the
@@ -6049,6 +6062,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_deliver_pmi(vcpu);
 		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
 			vcpu_scan_ioapic(vcpu);
+		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+			vcpu_reload_apic_access_page(vcpu);
 	}

 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..8be076a 100644
--- a/includ
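The request mechanism the patch relies on can be pictured with a small userspace model. This is a sketch under assumptions: the toy_ names, the bit number and the two-field vcpu struct are invented here, and plain non-atomic bit operations are used where the kernel uses atomic ones. It shows the pattern only: a notifier sets a per-vcpu request bit, and each vcpu test-and-clears it on its way back into the guest and reloads the address.

```c
#include <stdbool.h>

/* Bit number chosen arbitrarily for this sketch. */
#define TOY_REQ_APIC_PAGE_RELOAD 3

struct toy_vcpu {
	unsigned long requests;          /* pending request bits */
	unsigned long apic_access_addr;  /* stands in for the VMCS field */
};

/* Raise a request on every vcpu (the real code also kicks them out). */
static void toy_make_all_request(struct toy_vcpu *vcpus, int n, int req)
{
	int i;

	for (i = 0; i < n; i++)
		vcpus[i].requests |= 1UL << req;
}

/* Test-and-clear, like kvm_check_request() but without atomics. */
static bool toy_check_request(struct toy_vcpu *v, int req)
{
	if (!(v->requests & (1UL << req)))
		return false;
	v->requests &= ~(1UL << req);
	return true;
}

/* Guest entry path: reload the address only if it was requested. */
static void toy_enter_guest(struct toy_vcpu *v, unsigned long current_hpa)
{
	if (toy_check_request(v, TOY_REQ_APIC_PAGE_RELOAD))
		v->apic_access_addr = current_hpa;
}
```

Because the bit is cleared when it is consumed, a vcpu that enters the guest again without a new request keeps the address it already has.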
[PATCH v4 6/6] kvm, mem-hotplug: Do not pin apic access page in memory.
gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin the
page in memory by calling GUP functions. This patch adds
gfn_to_page_no_pin(), which unpins the page again.

After this patch, the apic access page is able to be migrated.

Signed-off-by: Tang Chen
---
 arch/x86/kvm/vmx.c | 2 +-
 arch/x86/kvm/x86.c | 4 +---
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c | 17 -
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9035fd1..e0043a5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4022,7 +4022,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
 	if (r)
 		goto out;

-	page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+	page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
 	if (is_error_page(page)) {
 		r = -EFAULT;
 		goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 131b6e8..2edbeb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5996,7 +5996,7 @@ static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 	 * GUP will wait till the migrate entry is replaced with the new pte
 	 * entry pointing to the new page.
 	 */
-	vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+	vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
 				APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
 	kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
 				page_to_phys(vcpu->kvm->arch.apic_access_page));
@@ -7255,8 +7255,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
 	kvm_free_vcpus(kvm);
-	if (kvm->arch.apic_access_page)
-		put_page(kvm->arch.apic_access_page);
 	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 }

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8be076a..02cbcb1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -526,6 +526,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
 			    int nr_pages);

 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 784127e..19d90d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1386,9 +1386,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)

 	return kvm_pfn_to_page(pfn);
 }
-
 EXPORT_SYMBOL_GPL(gfn_to_page);

+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+	struct page *page = gfn_to_page(kvm, gfn);
+
+	/*
+	 * gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+	 * the page in memory by calling GUP functions. This function unpins
+	 * the page.
+	 */
+	if (!is_error_page(page))
+		put_page(page);
+
+	return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
 void kvm_release_page_clean(struct page *page)
 {
 	WARN_ON(is_error_page(page));
-- 
1.8.3.1
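The get-then-put pattern of gfn_to_page_no_pin() is easy to see with a toy reference count. The sketch below is an illustration under assumptions: struct toy_page and the toy_ helpers are hypothetical, a NULL pointer stands in for an error page, and nothing here models migration itself.

```c
#include <stddef.h>

/* Toy page with just a reference count. */
struct toy_page {
	int refcount;
};

/* Stand-in for gfn_to_page(): takes a reference ("pins") on success. */
static struct toy_page *toy_get_page(struct toy_page *p)
{
	if (!p)
		return NULL;	/* plays the role of an error page */
	p->refcount++;
	return p;
}

static void toy_put_page(struct toy_page *p)
{
	p->refcount--;
}

/*
 * Same shape as gfn_to_page_no_pin(): look the page up, then drop the
 * reference the lookup took, so the caller holds only a bare pointer.
 */
static struct toy_page *toy_get_page_no_pin(struct toy_page *p)
{
	struct toy_page *page = toy_get_page(p);

	if (page)
		toy_put_page(page);
	return page;
}
```

The trade-off is that the returned pointer is no longer protected by a reference, so something else has to tell the user when the page moves.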
[PATCH v4 0/6] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.
ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

[For ept identity page]
Just do not pin it. When it is migrated, guest will be able to find the
new page in the next ept violation.

[For apic access page]
The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
for each vcpu in addition.

NOTE: Tested with -cpu xxx,-x2apic option.
      But since nested vm pins some other pages in memory, if user uses
      nested vm, memory hot-remove will not work.

Change log v3 -> v4:
1. The original patch 6 is now patch 5. ( by Jan Kiszka )
2. The original patch 1 is now patch 6 since we should unpin apic access
   page at the very last moment.

Tang Chen (6):
  kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
  kvm: Remove ept_identity_pagetable from struct kvm_arch.
  kvm: Make init_rmode_identity_map() return 0 on success.
  kvm, mem-hotplug: Reload L1' apic access page on migration in
    vcpu_enter_guest().
  kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is
    running.
  kvm, mem-hotplug: Do not pin apic access page in memory.

 arch/x86/include/asm/kvm_host.h | 3 +-
 arch/x86/kvm/svm.c | 15 +-
 arch/x86/kvm/vmx.c | 103 +++-
 arch/x86/kvm/x86.c | 22 +++--
 include/linux/kvm_host.h| 3 ++
 virt/kvm/kvm_main.c | 30 +++-
 6 files changed, 135 insertions(+), 41 deletions(-)

-- 
1.8.3.1
[PATCH v4 2/6] kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.

In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.

NOTE: In the original code, ept identity pagetable page is pinned in memory.
      As a result, it cannot be migrated/hot-removed. After this patch, since
      kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
      is no longer pinned in memory. And it can be migrated/hot-removed.

Signed-off-by: Tang Chen
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/vmx.c | 50 -
 arch/x86/kvm/x86.c | 2 --
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..35171c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -580,7 +580,6 @@ struct kvm_arch {

 	gpa_t wall_clock;

-	struct page *ept_identity_pagetable;
 	bool ept_identity_pagetable_done;
 	gpa_t ept_identity_map_addr;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b80ead..953d529 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -743,6 +743,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
+static int alloc_identity_pagetable(struct kvm *kvm);

 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3938,21 +3939,27 @@ out:

 static int init_rmode_identity_map(struct kvm *kvm)
 {
-	int i, idx, r, ret;
+	int i, idx, r, ret = 0;
 	pfn_t identity_map_pfn;
 	u32 tmp;

 	if (!enable_ept)
 		return 1;
-	if (unlikely(!kvm->arch.ept_identity_pagetable)) {
-		printk(KERN_ERR "EPT: identity-mapping pagetable "
-			"haven't been allocated!\n");
-		return 0;
+
+	/* Protect kvm->arch.ept_identity_pagetable_done. */
+	mutex_lock(&kvm->slots_lock);
+
+	if (likely(kvm->arch.ept_identity_pagetable_done)) {
+		ret = 1;
+		goto out2;
 	}
-	if (likely(kvm->arch.ept_identity_pagetable_done))
-		return 1;
-	ret = 0;
+
 	identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+	r = alloc_identity_pagetable(kvm);
+	if (r)
+		goto out2;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
 	if (r < 0)
@@ -3970,6 +3977,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
 	ret = 1;
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+	mutex_unlock(&kvm->slots_lock);
 	return ret;
 }

@@ -4019,31 +4029,23 @@ out:

 static int alloc_identity_pagetable(struct kvm *kvm)
 {
-	struct page *page;
+	/*
+	 * In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done
+	 * is checked before calling this function and set to true after the
+	 * calling. The access to kvm->arch.ept_identity_pagetable_done should
+	 * be protected by kvm->slots_lock.
+	 */
+
 	struct kvm_userspace_memory_region kvm_userspace_mem;
 	int r = 0;

-	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.ept_identity_pagetable)
-		goto out;
 	kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
 	kvm_userspace_mem.flags = 0;
 	kvm_userspace_mem.guest_phys_addr =
 		kvm->arch.ept_identity_map_addr;
 	kvm_userspace_mem.memory_size = PAGE_SIZE;
 	r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
-	if (r)
-		goto out;

-	page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
-	if (is_error_page(page)) {
-		r = -EFAULT;
-		goto out;
-	}
-
-	kvm->arch.ept_identity_pagetable = page;
-out:
-	mutex_unlock(&kvm->slots_lock);
 	return r;
 }

@@ -7643,8 +7645,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 			kvm->arch.ept_identity_map_addr =
 				VMX_EPT_IDENTITY_PAGETABLE_ADDR;
 		err = -ENOMEM;
-		if (alloc_identity_pagetable(kvm) != 0)
-			goto free_vmcs;
 		if (!init_rmode_identity_map(kvm))
 			goto free_vmcs;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..e05bd58 1
[PATCH v4 5/6] kvm, mem-hotplug: Reload L1's apic access page on migration when L2 is running.
This patch only handles the "L1 and L2 vm share one apic access page"
situation.

When L1 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload apic access
page physical address for all the vcpus' vmcs (which is done by patch 5/6).
And when it enters L2 vm, L2's vmcs will be updated in prepare_vmcs02()
called by nested_vm_run(). So we need to do nothing.

When L2 vm is running, if the shared apic access page is migrated,
mmu_notifier will request all vcpus to exit to L0, and reload apic access
page physical address for all L2 vmcs. And this patch requests apic access
page reload in L2->L1 vmexit.

Signed-off-by: Tang Chen
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c | 6 ++
 arch/x86/kvm/vmx.c | 32
 arch/x86/kvm/x86.c | 3 +++
 virt/kvm/kvm_main.c | 1 +
 5 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 514183e..13fbb62 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -740,6 +740,7 @@ struct kvm_x86_ops {
 	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
+	void (*set_nested_apic_page_migrated)(struct kvm_vcpu *vcpu, bool set);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f2eacc4..da88646 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3624,6 +3624,11 @@ static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
 	return;
 }

+static void svm_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+	return;
+}
+
 static int svm_vm_has_apicv(struct kvm *kvm)
 {
 	return 0;
@@ -4379,6 +4384,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.update_cr8_intercept = update_cr8_intercept,
 	.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
 	.set_apic_access_page_addr = svm_set_apic_access_page_addr,
+	.set_nested_apic_page_migrated = svm_set_nested_apic_page_migrated,
 	.vm_has_apicv = svm_vm_has_apicv,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index da6d55d..9035fd1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,16 @@ struct nested_vmx {
 	 * we must keep them pinned while L2 runs.
 	 */
 	struct page *apic_access_page;
+	/*
+	 * L1's apic access page can be migrated. When L1 and L2 are sharing
+	 * the apic access page, after the page is migrated when L2 is running,
+	 * we have to reload it to L1 vmcs before we enter L1.
+	 *
+	 * When the shared apic access page is migrated in L1 mode, we don't
+	 * need to do anything else because we reload apic access page each
+	 * time when entering L2 in prepare_vmcs02().
+	 */
+	bool apic_access_page_migrated;
 	u64 msr_ia32_feature_control;

 	struct hrtimer preemption_timer;
@@ -7098,6 +7108,12 @@ static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
 	vmcs_write64(APIC_ACCESS_ADDR, hpa);
 }

+static void vmx_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	vmx->nested.apic_access_page_migrated = set;
+}
+
 static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
 {
 	u16 status;
@@ -8796,6 +8812,21 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	}

 	/*
+	 * When shared (L1 & L2) apic access page is migrated during L2 is
+	 * running, mmu_notifier will force to reload the page's hpa for L2
+	 * vmcs. Need to reload it for L1 before entering L1.
+	 */
+	if (vmx->nested.apic_access_page_migrated) {
+		/*
+		 * Do not call kvm_reload_apic_access_page() because we are now
+		 * in L2. We should not call make_all_cpus_request() to exit to
+		 * L0, otherwise we will reload for L2 vmcs again.
+		 */
+		kvm_reload_apic_access_page(vcpu->kvm);
+		vmx->nested.apic_access_page_migrated = false;
+	}
+
+	/*
 	 * Exiting from L2 to L1, we're now back to L1 which thinks it just
 	 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
 	 * success or failure flag accordingly.
@@ -8916,6 +8947,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.update_cr8_intercept = update_cr8_intercept,
 	.se
[PATCH v4 3/6] kvm: Make init_rmode_identity_map() return 0 on success.
In init_rmode_identity_map(), there are two variables indicating the return
value, r and ret, and it returns 0 on error, 1 on success. The function
is only called by vmx_create_vcpu(), and r is redundant.

This patch removes the redundant variable r, and makes
init_rmode_identity_map() return 0 on success, -errno on failure.

Signed-off-by: Tang Chen
---
 arch/x86/kvm/vmx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 953d529..63c4c3e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3939,45 +3939,42 @@ out:
 static int init_rmode_identity_map(struct kvm *kvm)
 {
-	int i, idx, r, ret = 0;
+	int i, idx, ret = 0;
 	pfn_t identity_map_pfn;
 	u32 tmp;

 	if (!enable_ept)
-		return 1;
+		return 0;

 	/* Protect kvm->arch.ept_identity_pagetable_done. */
 	mutex_lock(&kvm->slots_lock);

-	if (likely(kvm->arch.ept_identity_pagetable_done)) {
-		ret = 1;
+	if (likely(kvm->arch.ept_identity_pagetable_done))
 		goto out2;
-	}

 	identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;

-	r = alloc_identity_pagetable(kvm);
-	if (r)
+	ret = alloc_identity_pagetable(kvm);
+	if (ret)
 		goto out2;

 	idx = srcu_read_lock(&kvm->srcu);
-	r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
-	if (r < 0)
+	ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+	if (ret)
 		goto out;
 	/* Set up identity-mapping pagetable for EPT in real mode */
 	for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
 		tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
 			_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-		r = kvm_write_guest_page(kvm, identity_map_pfn,
+		ret = kvm_write_guest_page(kvm, identity_map_pfn,
 				&tmp, i * sizeof(tmp), sizeof(tmp));
-		if (r < 0)
+		if (ret)
 			goto out;
 	}
 	kvm->arch.ept_identity_pagetable_done = true;
-	ret = 1;
+
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
-
 out2:
 	mutex_unlock(&kvm->slots_lock);
 	return ret;
@@ -7645,7 +7642,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 			kvm->arch.ept_identity_map_addr =
 				VMX_EPT_IDENTITY_PAGETABLE_ADDR;
 		err = -ENOMEM;
-		if (!init_rmode_identity_map(kvm))
+		if (init_rmode_identity_map(kvm))
 			goto free_vmcs;
 	}

-- 
1.8.3.1
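For readers who want the return convention from this patch in isolation, here is a minimal userspace sketch. It is not kernel code: `struct demo_vm`, `demo_alloc_pagetable()`, `demo_init_identity_map()` and `demo_create_vcpu()` are hypothetical stand-ins invented for illustration. It only shows the shape the patch moves to: a single `ret` variable, 0 on every success path (including the early exits), a negative errno otherwise, and a caller that treats any non-zero value as failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-VM state; not the kernel's struct kvm. */
struct demo_vm {
	bool ept_enabled;
	bool identity_map_done;
	bool fail_alloc;	/* knob to make the allocation step fail */
};

/* Hypothetical allocation step: 0 on success, -ENOMEM on failure. */
static int demo_alloc_pagetable(struct demo_vm *vm)
{
	return vm->fail_alloc ? -ENOMEM : 0;
}

/* One 'ret' variable: 0 on success, negative errno on failure. */
static int demo_init_identity_map(struct demo_vm *vm)
{
	int ret = 0;

	if (!vm->ept_enabled)
		return 0;	/* nothing to do counts as success */

	if (vm->identity_map_done)
		goto out;	/* already set up, ret is still 0 */

	ret = demo_alloc_pagetable(vm);
	if (ret)
		goto out;	/* pass the error code through unchanged */

	vm->identity_map_done = true;
out:
	return ret;
}

/* Caller side: any non-zero return means failure. */
static int demo_create_vcpu(struct demo_vm *vm)
{
	if (demo_init_identity_map(vm))
		return -ENOMEM;
	return 0;
}
```

With this convention the caller no longer needs to remember that "1 means OK", which is the inversion the `vmx_create_vcpu()` hunk above removes.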
Re: [PATCH 2/2] KVM: PPC: Book3S HV: Add register name when loading toc
On 19.08.14 06:59, Michael Neuling wrote:
> Add 'r' to register name r2 in kvmppc_hv_enter.
>
> Also update comment at the top of kvmppc_hv_enter to indicate that R2/TOC is
> non-volatile.
>
> Signed-off-by: Michael Neuling
> Signed-off-by: Paul Mackerras

Thanks, applied to kvm-ppc-queue.

Alex
Re: [PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception
On 13.08.14 11:09, Bharat Bhushan wrote:
> This patch emulates debug registers and debug exception
> to support guest using debug resource. This enables running
> gdb/kgdb etc in guest.
>
> On BOOKE architecture we cannot share debug resources between QEMU and
> guest because:
>     When QEMU is using debug resources then debug exception must
>     be always enabled. To achieve this we set MSR_DE and also set
>     MSRP_DEP so guest cannot change MSR_DE.
>
>     When emulating debug resource for guest we want guest
>     to control MSR_DE (enable/disable debug interrupt on need).
>
>     So above mentioned two configuration cannot be supported
>     at the same time. So the result is that we cannot share
>     debug resources between QEMU and Guest on BOOKE architecture.
>
> In the current design QEMU gets priority over guest, this means that if
> QEMU is using debug resources then guest cannot use them and if guest is
> using debug resource then QEMU can overwrite them.
>
> Signed-off-by: Bharat Bhushan

Scott, could you please recheck whether you're ok with it now? :)

Thanks,

Alex
Re: [PATCH v4 0/6] KVM: PPC: Book3e: AltiVec support
On 20.08.14 15:36, Mihai Caraman wrote:
> Add KVM Book3e AltiVec support.
>
> Changes:
>
> v4:
>  - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC
>  - remove SPE handlers from bookehv
>  - split ONE_REG powerpc generic and ONE_REG AltiVec
>  - add setters for IVPR, IVOR2 and IVOR8
>  - add api documentation for ONE_REG IVPR and IVORs
>  - don't enable e6500 core since hardware threads are not yet supported
>
> v3:
>  - use distinct SPE/AltiVec exception handlers
>  - make ONE_REG AltiVec support powerpc generic
>  - add ONE_REG IVORs support
>
> v2:
>  - integrate Paul's FP/VMX/VSX changes that landed in kvm-ppc-queue
>    in January and take into account feedback
>
> Mihai Caraman (6):
>   KVM: PPC: Book3E: Increase FPU laziness
>   KVM: PPC: Book3e: Add AltiVec support
>   KVM: PPC: Make ONE_REG powerpc generic
>   KVM: PPC: Move ONE_REG AltiVec support to powerpc
>   KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8
>     emulation
>   KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs

Thanks, applied 1-4 to kvm-ppc-queue.

Alex
Re: [PATCH v4 5/6] KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8 emulation
On 20.08.14 15:36, Mihai Caraman wrote:
> Add setter functions for IVPR, IVOR2 and IVOR8 emulation in preparation
> for ONE_REG support.
>
> Signed-off-by: Mihai Caraman

What about the other GIVORs? Also, I would prefer to have a common
helper for IVOR setting that simply covers SPRN_GIVOR setting along the
way. Something like

void kvmppc_set_ivor(struct kvm_vcpu *vcpu, int irqprio_ivor, u16 new_ivor)
{
	vcpu->arch.ivor[irqprio_ivor] = new_ivor;
	switch (irqprio_ivor) {
	case BOOKE_IRQPRIO_DATA_STORAGE:
		mtspr(SPRN_GIVOR2, new_ivor);
		break;
	...
	}
}

which you can just call from all the IVOR setters. In fact, you can
probably combine all of the ONE_REG handlers into a single handler that
just does a quick table lookup for its irqprio.

Alex
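The "single handler with a table lookup" idea from the reply above could look roughly like the following userspace sketch. Everything prefixed `demo_`/`DEMO_` is hypothetical and made up for illustration: the register ids, the priority enum, and the `givor2`/`givor8` fields, which merely stand in for the `mtspr(SPRN_GIVORn, ...)` writes real code would perform. Pairing IVOR8 with the system call priority is likewise an assumption of this sketch, not something stated in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical priority indices; the real BOOKE_IRQPRIO_* live in the kernel. */
enum demo_irqprio {
	DEMO_IRQPRIO_DATA_STORAGE,
	DEMO_IRQPRIO_SYSCALL,
	DEMO_IRQPRIO_MAX
};

/* Hypothetical ONE_REG identifiers, for illustration only. */
enum demo_reg_id {
	DEMO_REG_IVOR2 = 100,
	DEMO_REG_IVOR8 = 101
};

struct demo_vcpu {
	uint16_t ivor[DEMO_IRQPRIO_MAX];
	uint16_t givor2;	/* stands in for the SPRN_GIVOR2 register */
	uint16_t givor8;	/* stands in for the SPRN_GIVOR8 register */
};

/* Common setter: update the shadow value and mirror it where needed. */
static void demo_set_ivor(struct demo_vcpu *vcpu, int irqprio, uint16_t new_ivor)
{
	vcpu->ivor[irqprio] = new_ivor;
	switch (irqprio) {
	case DEMO_IRQPRIO_DATA_STORAGE:
		vcpu->givor2 = new_ivor;	/* real code: mtspr(SPRN_GIVOR2, ...) */
		break;
	case DEMO_IRQPRIO_SYSCALL:
		vcpu->givor8 = new_ivor;	/* real code: mtspr(SPRN_GIVOR8, ...) */
		break;
	default:
		break;
	}
}

/* Register id -> priority index, so one handler covers every IVOR. */
static const struct {
	int reg_id;
	int irqprio;
} demo_ivor_table[] = {
	{ DEMO_REG_IVOR2, DEMO_IRQPRIO_DATA_STORAGE },
	{ DEMO_REG_IVOR8, DEMO_IRQPRIO_SYSCALL },
};

/* Returns 0 on success, -1 if reg_id is not a known IVOR register. */
static int demo_set_one_reg(struct demo_vcpu *vcpu, int reg_id, uint16_t val)
{
	size_t i;

	for (i = 0; i < sizeof(demo_ivor_table) / sizeof(demo_ivor_table[0]); i++) {
		if (demo_ivor_table[i].reg_id == reg_id) {
			demo_set_ivor(vcpu, demo_ivor_table[i].irqprio, val);
			return 0;
		}
	}
	return -1;
}
```

Adding another IVOR then means one more table row (plus a `case` if it has a guest copy), rather than another hand-written setter.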
RE: perf stat in KVM guest can not get LLC-loads hardware cache event
>
> Dear KVM developers:
> I am trying to use perf stat inside a VM to obtain some hardware cache
> performance counter values.
> The perf stat can report some numbers for L1 and TLB related counters. But
> for the LLC-loads and LLC-load-misses, the numbers are always 0. It seems
> that these offcore events are not exposed to the guest.
>
> Is this a bug in Qemu or KVM?
>

There is no offcore virtualization support in KVM yet.
For your case, I guess you are using paravirt for the guest, so it should be 0.
Otherwise, you should get #GP in the guest.

> My testbed is
>
> Host kernel: 3.12.26
> Qemu: 2.1.0
> CPU: Intel Ivy bridge 2620
> VM booted by qemu, with -cpu host.
>
> Thanks.
>
> - Hui Kang
Re: [PATCH 1/2] add check parameter to run_tests configuration
On 26/08/2014 20:29, Chris J Arges wrote:
> +path=${check_param%%=*}
> +value=${check_param#*=}
> +if [ $path ] && [[ $(cat $path) != $value ]]; then

[[ ]] is a bashism, please use [ ]. Also, please include all operands
of [ ] within double quotes.

Paolo

> +echo "skip $1 ($path not equal to $value)"
> +return
> +fi
Re: [PATCH 2/2] x86/unittests.cfg: the pmu testcase requires that nmi_watchdog is disabled
On 26/08/2014 20:29, Chris J Arges wrote:
> If nmi_watchdog is enabled, it will take up a PMU counter causing the
> all_counters testcase to fail. This additional check will error out if
> nmi_watchdog is enabled and provide feedback for the user to configure the
> host machine correctly.
>
> Signed-off-by: Chris J Arges
> ---
>  x86/unittests.cfg | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index 0123944..badb08a 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -92,6 +92,7 @@ file = msr.flat
>  [pmu]
>  file = pmu.flat
>  extra_params = -cpu host
> +check = /proc/sys/kernel/nmi_watchdog=0
>
>  [port80]
>  file = port80.flat
>

Reviewed-by: Paolo Bonzini
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
Il 26/08/2014 20:01, Eduardo Habkost ha scritto: > On Tue, Aug 26, 2014 at 02:56:21PM +0200, Paolo Bonzini wrote: >> Il 25/08/2014 22:45, Eduardo Habkost ha scritto: >>> >>> TCG users expect the default CPU model to contain most TCG-supported >>> features >>> (and it makes sense). See, for example, commit >>> f1e00a9cf326acc1f2386a72525af8859852e1df. >> >> It doesn't though (SMAP is the most egregious omission, and probably the >> main reason why people use QEMU TCG these days), and it raises the >> question of backwards-compatibility of qemu64---should we disable TCG >> features in old machine types? Probably yes, but we've never done that. > > Had we changed qemu64, any changes to the feature set of qemu64 would > probably require compatibility code on old machine-types for KVM, > anyway. But the last time qemu64 was changed was in 2009 (commit > f1e00a9cf326acc1f2386a72525af8859852e1df), it looks like everybody was > afraid of touching "qemu64" because its purpose was not very clear. > > So maybe that's good news, as things can be simpler if we make both TCG > and KVM have similar behavior: > > * qemu64: a conservative default that should work out of the box on > most systems, for both TCG and KVM. That's already the current status, > we just need to document it. > > * -cpu host: for people who want every possible feature to be enabled > (but without cross-version live-migration support). We can easily add > support for "-cpu host" to TCG, too. This means that "-cpu host" has different meanings in KVM and TCG. Is that an advantage or a disadvantage? If I have to choose blindly, I'd rather give different (but sane) meanings to "-cpu qemu64" and the same meanings to "-cpu host"... Basically "-cpu qemu32/64" on KVM would be changed automatically to kvm32/64. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
On Wed, Aug 27, 2014 at 03:36:51PM +0200, Paolo Bonzini wrote: > Il 26/08/2014 20:01, Eduardo Habkost ha scritto: > > On Tue, Aug 26, 2014 at 02:56:21PM +0200, Paolo Bonzini wrote: > >> Il 25/08/2014 22:45, Eduardo Habkost ha scritto: > >>> > >>> TCG users expect the default CPU model to contain most TCG-supported > >>> features > >>> (and it makes sense). See, for example, commit > >>> f1e00a9cf326acc1f2386a72525af8859852e1df. > >> > >> It doesn't though (SMAP is the most egregious omission, and probably the > >> main reason why people use QEMU TCG these days), and it raises the > >> question of backwards-compatibility of qemu64---should we disable TCG > >> features in old machine types? Probably yes, but we've never done that. > > > > Had we changed qemu64, any changes to the feature set of qemu64 would > > probably require compatibility code on old machine-types for KVM, > > anyway. But the last time qemu64 was changed was in 2009 (commit > > f1e00a9cf326acc1f2386a72525af8859852e1df), it looks like everybody was > > afraid of touching "qemu64" because its purpose was not very clear. > > > > So maybe that's good news, as things can be simpler if we make both TCG > > and KVM have similar behavior: > > > > * qemu64: a conservative default that should work out of the box on > > most systems, for both TCG and KVM. That's already the current status, > > we just need to document it. > > > > * -cpu host: for people who want every possible feature to be enabled > > (but without cross-version live-migration support). We can easily add > > support for "-cpu host" to TCG, too. > > This means that "-cpu host" has different meanings in KVM and TCG. Is > that an advantage or a disadvantage? It is the same meaning to me: "enable everything that's possible, considering what's provided by the underlying accelerator". The "host" name is misleading, though, because on KVM it is close to the host CPU, but on TCG it depends solely on TCG's capabilities. 
>
> If I have to choose blindly, I'd rather give different (but sane)
> meanings to "-cpu qemu64" and the same meanings to "-cpu host"...
> Basically "-cpu qemu32/64" on KVM would be changed automatically to
> kvm32/64.

This (different meanings to qemu64) is what I was proposing first,
except for the "same meaning to -cpu host" part. What exactly would you
expect "-cpu host" to mean on TCG?

-- 
Eduardo
Re: [PATCH v2] KVM: x86: sync old tmr with ioapic to update
On 27/08/2014 16:05, Wei Wang wrote:
> kvm_ioapic_scan_entry() needs to update tmr. The previous lapic tmr value
> (old_tmr) needs to sync with ioapic to get an accurate updated tmr
> value before the updating work.
>
> Tested-by: Rongrong Liu
> Signed-off-by: Yang Zhang
> Signed-off-by: Wei Wang

This is also a very terse commit message. As mentioned in the review of
the other patch, I'm not sure this change is correct, but in any case
here is what a better commit message could have looked like:

    According to the Intel manuals, TMR is only modified upon "acceptance
    of an interrupt into the IRR". Currently, this is not what KVM does;
    any IOAPIC scan will modify the TMR.

    The TMR is used to track whether an EOI message needs to be sent to
    the IOAPIC. In KVM, this means that we need to add the vector to the
    EOI exit bitmap, and in fact the next patch will use the TMR exactly
    for this purpose. However, if we change the TMR value for an active
    interrupt we risk missing an EOI, similar to the scenario fixed by
    commit 0f6c0a740b7d (KVM: x86: always exit on EOIs for interrupts
    listed in the IOAPIC redir table, 2014-07-30).

    This patch ensures that the TMR does not change between acceptance of
    an interrupt into the IRR and the corresponding EOI cycle; to do this,
    we mix values from the old TMR (where ISR|IRR=1) and from the IOAPIC's
    redirection table (where ISR|IRR=0 in the LAPIC). We still deviate
    from the spec by setting a value for the TMR even when the
    corresponding bit in IRR|ISR is 0, but that's mostly invisible to
    guests.

Paolo

> ---
>  arch/x86/kvm/lapic.c | 19 +--
>  arch/x86/kvm/x86.c   |  2 +-
>  2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 08e8a89..8c1162d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -518,10 +518,25 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
>  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> +	u32 irr;
> +	u32 isr;
> +	u32 old_tmr, new_tmr;
>  	int i;
>
> -	for (i = 0; i < 8; i++)
> -		apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
> +	/*
> +	 * The updated tmr value comes from level-triggered interrupts that
> +	 * have already been delivered to lapic and new coming ones which
> +	 * are pending in ioapic. According to the x86 spec, tmr is valid
> +	 * when irr or isr is set.
> +	 */
> +	for (i = 0; i < 8; i++) {
> +		irr = kvm_apic_get_reg(apic, APIC_IRR + 0x10 * i);
> +		isr = kvm_apic_get_reg(apic, APIC_ISR + 0x10 * i);
> +		old_tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i);
> +		new_tmr = (~(irr | isr) & tmr[i])
> +			| ((irr | isr) & old_tmr);
> +		apic_set_reg(apic, APIC_TMR + 0x10 * i, new_tmr);
> +	}
>  }
>
>  static void apic_update_ppr(struct kvm_lapic *apic)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5f5edb6..d401684 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5991,8 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>  	memset(tmr, 0, 32);
>
>  	kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
> -	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
>  	kvm_apic_update_tmr(vcpu, tmr);
> +	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
>  }
>
>  /*
>
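The bit merge performed by the quoted kvm_apic_update_tmr() hunk can be restated as a small standalone sketch. This is plain userspace C over arrays rather than APIC registers; `demo_merge_tmr()` and `DEMO_APIC_WORDS` are hypothetical names chosen here, and the only thing taken from the patch is the expression itself: for vectors that are pending or in service (IRR|ISR set) keep the old TMR bit, for all other vectors take the bit computed by the IOAPIC scan.

```c
#include <stdint.h>

#define DEMO_APIC_WORDS 8	/* 8 x 32 bits = 256 interrupt vectors */

/*
 * Merge the freshly scanned TMR with the previous one:
 *  - active vectors (IRR or ISR set) keep their old TMR bit,
 *  - inactive vectors take the value from the IOAPIC scan.
 */
static void demo_merge_tmr(const uint32_t *irr, const uint32_t *isr,
			   const uint32_t *old_tmr, const uint32_t *scanned_tmr,
			   uint32_t *new_tmr)
{
	int i;

	for (i = 0; i < DEMO_APIC_WORDS; i++) {
		uint32_t active = irr[i] | isr[i];

		new_tmr[i] = (~active & scanned_tmr[i]) | (active & old_tmr[i]);
	}
}
```

For instance, if vector 0 is in service and was level-triggered when accepted, while the scan now reports only vector 1 as level-triggered, the merged first word has both bits set: vector 0 is preserved from the old TMR and vector 1 comes from the scan.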
Re: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it.
On 27/08/2014 16:05, Wei Wang wrote:
> Guest may mask the IOAPIC entry before issue EOI. In such case,
> EOI will not be intercepted by the hypervisor, since the corresponding
> bit in eoi_exit_bitmap is not set after the masking of IOAPIC entry.
>
> The solution here is to OR eoi_exit_bitmap with tmr to make sure that
> all level-triggered interrupts have their bits in eoi_exit_bitmap set.

This commit message does not explain why this change is necessary, and
the relationship between this patch and the previous one. For example:

--
Commit 0f6c0a740b7d (KVM: x86: always exit on EOIs for interrupts listed
in the IOAPIC redir table, 2014-07-30) fixed an APICv bug where an
incorrect EOI exit bitmap triggered an interrupt storm inside the guest.

There is a corner case for which that patch would have disabled
accelerated EOI unnecessarily. Suppose you have:

- a device that was the sole user of an INTx interrupt and is
  hot-unplugged

- an OS that masks the INTx interrupt entry in the IOAPIC after the
  unplug

- another device that uses MSI and is subsequently hot-plugged

If the OS chooses to reuse the same LAPIC interrupt vector for the two
interrupts, the patch would have left the vector in the EOI exit bitmap,
because KVM takes into account the stale entry in the IOAPIC redirection
table.

We do know exactly which masked interrupts are still in-service and thus
require broadcasting an EOI to the IOAPIC: this information is in the
TMR. So, this patch ORs the EOI exit bitmap provided by the ioapic with
the TMR register. Thanks to the previous patch, an active
level-triggered interrupt will always be included in the EOI exit
bitmap.
--

However, see below.

> Tested-by: Rongrong Liu > Signed-off-by: Yang Zhang > Signed-off-by: Wei Wang > --- > arch/x86/kvm/lapic.c | 12 > arch/x86/kvm/lapic.h |1 + > arch/x86/kvm/x86.c |1 + > virt/kvm/ioapic.c|7 --- > 4 files changed, 18 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 8c1162d..0fcac3c 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -539,6 +539,18 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) > } > } > > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + u32 i; > + u32 tmr; > + > + for (i = 0; i < 8; i++) { > + tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i); > + *((u32 *)eoi_exit_bitmap + i) |= tmr; > + } > +} > + > static void apic_update_ppr(struct kvm_lapic *apic) > { > u32 tpr, isrv, ppr, old_ppr; > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index 6a11845..d2b96f2 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -55,6 +55,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); > > void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); > void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 > *eoi_exit_bitmap); > int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); > int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index d401684..d23b558 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) > > kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); > kvm_apic_update_tmr(vcpu, tmr); > + kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap); > kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); > } > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > 
index e8ce34c..ed15936 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 > *eoi_exit_bitmap, > spin_lock(&ioapic->lock); > for (index = 0; index < IOAPIC_NUM_PINS; index++) { > e = &ioapic->redirtbl[index]; > - if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > - kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > index) || > - index == RTC_GSI) { > + if ((!e->fields.mask > + && e->fields.trig_mode == IOAPIC_LEVEL_TRIG) > + || kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > + index) || index == RTC_GSI) { > if (kvm_apic_match_dest(vcpu, NULL, 0, > e->fields.dest_id, e->fields.dest_mode)) { > __set_bit(e->fields.vector, > There's still something missing here. Suppose you have the following: Program edge-triggered MSI for vector 123 Interrupt comes in, ISR[123]=1 Mask MSI Program level-triggered IOAPI
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
Il 27/08/2014 16:05, Eduardo Habkost ha scritto: > On Wed, Aug 27, 2014 at 03:36:51PM +0200, Paolo Bonzini wrote: >> Il 26/08/2014 20:01, Eduardo Habkost ha scritto: >>> So maybe that's good news, as things can be simpler if we make both TCG >>> and KVM have similar behavior: >>> >>> * qemu64: a conservative default that should work out of the box on >>> most systems, for both TCG and KVM. That's already the current status, >>> we just need to document it. >>> >>> * -cpu host: for people who want every possible feature to be enabled >>> (but without cross-version live-migration support). We can easily add >>> support for "-cpu host" to TCG, too. >> >> This means that "-cpu host" has different meanings in KVM and TCG. Is >> that an advantage or a disadvantage? > > It is the same meaning to me: "enable everything that's possible, > considering what's provided by the underlying accelerator". The "host" > name is misleading, though, because on KVM it is close to the host CPU, > but on TCG it depends solely on TCG's capabilities. True. It's not very intuitive, but it is the same concept for processor capabilities. Though for some leaves that do not correspond to processor capabilities, "-cpu host" does set them to the host values. This is not just the cache model, but also the family/model/stepping/vendor. For the TCG case, when running on a Nehalem it would be weird to see a Nehalem guest with SMAP or ADOX support... I'm not sure it would even work to have SVM with an Intel vendor. :) >> If I have to choose blindly, I'd rather give different (but sane) >> meanings to "-cpu qemu64" and the same meanings to "-cpu host"... >> Basically "-cpu qemu32/64" on KVM would be changed automatically to >> kvm32/64. > > This (different meanings to qemu64) is what I was proposing first, Good. > except for the "same meaning to -cpu host" part. What exactly would you > expect "-cpu host" to mean on TCG? 

Emulate (as much as possible of) a SandyBridge if I'm running on a
SandyBridge, etc.

"-cpu qemu64" would be the best CPU that TCG can do, with a standard
family/model/stepping/vendor slapped on top.

Paolo
Re: [PATCH] KVM: s390/mm: fix up indentation of set_guest_storage_key
On 27/08/2014 12:20, Christian Borntraeger wrote:
> commit ab3f285f227f ("KVM: s390/mm: try a cow on read only pages for
> key ops") misaligned a code block. Let's fixup the indentation.
>
> Reported-by: Ben Hutchings
> Signed-off-by: Christian Borntraeger
> ---
>  arch/s390/mm/pgtable.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 5404a62..1570dbd 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -994,13 +994,13 @@ retry:
>  	}
>  	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
>  	     (pte_val(*ptep) & _PAGE_PROTECT)) {
> -			pte_unmap_unlock(*ptep, ptl);
> -			if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
> -				up_read(&mm->mmap_sem);
> -				return -EFAULT;
> -			}
> -			goto retry;
> +		pte_unmap_unlock(*ptep, ptl);
> +		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
> +			up_read(&mm->mmap_sem);
> +			return -EFAULT;
>  		}
> +		goto retry;
> +	}
>
>  	new = old = pgste_get_lock(ptep);
>  	pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
>

Applying this patch, thanks.

Paolo
[PATCH 1/2 v2] add check parameter to run_tests configuration
In unittests.cfg one can add a line like the following: check = /proc/sys/kernel/nmi_watchdog=0 /proc/sys/kernel/ostype=Linux run_tests.sh will now check for those values (if defined) and only run the test if all conditions are true. Signed-off-by: Chris J Arges --- run_tests.sh | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/run_tests.sh b/run_tests.sh index 4758573..d37e0ec 100755 --- a/run_tests.sh +++ b/run_tests.sh @@ -18,6 +18,7 @@ function run() local kernel="$4" local opts="$5" local arch="$6" +local check="$7" if [ -z "$testname" ]; then return @@ -32,6 +33,18 @@ function run() return fi +# check a file for a particular value before running a test +# the check line can contain multiple files to check separated by a space +# but each check parameter needs to be of the form = +for check_param in ${check[@]}; do +path=${check_param%%=*} +value=${check_param#*=} +if [ $path ] && [[ $(cat $path) != $value ]]; then +echo "skip $1 ($path not equal to $value)" +return +fi +done + cmdline="./$TEST_DIR-run $kernel -smp $smp $opts" if [ $verbose != 0 ]; then echo $cmdline @@ -57,18 +70,20 @@ function run_all() local opts local groups local arch +local check exec {config_fd}<$config while read -u $config_fd line; do if [[ "$line" =~ ^\[(.*)\]$ ]]; then -run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" +run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" testname=${BASH_REMATCH[1]} smp=1 kernel="" opts="" groups="" arch="" +check="" elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then kernel=$TEST_DIR/${BASH_REMATCH[1]} elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then @@ -79,10 +94,12 @@ function run_all() groups=${BASH_REMATCH[1]} elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then arch=${BASH_REMATCH[1]} +elif [[ $line =~ ^check\ *=\ *(.*)$ ]]; then +check=${BASH_REMATCH[1]} fi done -run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" +run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" exec {config_fd}<&- } -- 1.9.1 
Re: perf stat in KVM guest can not get LLC-loads hardware cache event
Hi, Kan,

Thanks for your reply.

1. If the guest is non-paravirt, can I get the LLC-loads number?
2. Do you know any method that can capture the LLC-loads for the guest?

Thanks.

- Hui

On Wed, Aug 27, 2014 at 9:05 AM, Liang, Kan wrote:
>
> >
> > Dear KVM developers:
> > I am trying to use perf stat inside a VM to obtain some hardware cache
> > performance counter values.
> > The perf stat can report some numbers for L1 and TLB related counters. But
> > for the LLC-loads and LLC-load-misses, the numbers are always 0. It seems
> > that these offcore events are not exposed to the guest.
> >
> > Is this a bug in Qemu or KVM?
> >
>
> There is no offcore virtualization support in KVM yet.
> For your case, I guess you are using paravirt for the guest, so it should be 0.
> Otherwise, you should get #GP in the guest.
>
> > My testbed is
> >
> > Host kernel: 3.12.26
> > Qemu: 2.1.0
> > CPU: Intel Ivy bridge 2620
> > VM booted by qemu, with -cpu host.
> >
> > Thanks.
> >
> > - Hui Kang
[PATCH 1/2 v3] add check parameter to run_tests configuration
In unittests.cfg one can add a line like the following: check = /proc/sys/kernel/nmi_watchdog=0 /proc/sys/kernel/ostype=Linux run_tests.sh will now check for those values (if defined) and only run the test if all conditions are true. Signed-off-by: Chris J Arges --- run_tests.sh | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/run_tests.sh b/run_tests.sh index 4758573..e48f1db 100755 --- a/run_tests.sh +++ b/run_tests.sh @@ -18,6 +18,7 @@ function run() local kernel="$4" local opts="$5" local arch="$6" +local check="$7" if [ -z "$testname" ]; then return @@ -32,6 +33,18 @@ function run() return fi +# check a file for a particular value before running a test +# the check line can contain multiple files to check separated by a space +# but each check parameter needs to be of the form = +for check_param in ${check[@]}; do +path=${check_param%%=*} +value=${check_param#*=} +if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then +echo "skip $1 ($path not equal to $value)" +return +fi +done + cmdline="./$TEST_DIR-run $kernel -smp $smp $opts" if [ $verbose != 0 ]; then echo $cmdline @@ -57,18 +70,20 @@ function run_all() local opts local groups local arch +local check exec {config_fd}<$config while read -u $config_fd line; do if [[ "$line" =~ ^\[(.*)\]$ ]]; then -run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" +run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" testname=${BASH_REMATCH[1]} smp=1 kernel="" opts="" groups="" arch="" +check="" elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then kernel=$TEST_DIR/${BASH_REMATCH[1]} elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then @@ -79,10 +94,12 @@ function run_all() groups=${BASH_REMATCH[1]} elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then arch=${BASH_REMATCH[1]} +elif [[ $line =~ ^check\ *=\ *(.*)$ ]]; then +check=${BASH_REMATCH[1]} fi done -run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" +run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" exec {config_fd}<&- } -- 
1.9.1
Re: [PATCH 1/2 v3] add check parameter to run_tests configuration
Il 27/08/2014 16:45, Chris J Arges ha scritto: > In unittests.cfg one can add a line like the following: > check = /proc/sys/kernel/nmi_watchdog=0 /proc/sys/kernel/ostype=Linux > > run_tests.sh will now check for those values (if defined) and only run > the test if all conditions are true. > > Signed-off-by: Chris J Arges > --- > run_tests.sh | 21 +++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/run_tests.sh b/run_tests.sh > index 4758573..e48f1db 100755 > --- a/run_tests.sh > +++ b/run_tests.sh > @@ -18,6 +18,7 @@ function run() > local kernel="$4" > local opts="$5" > local arch="$6" > +local check="$7" > > if [ -z "$testname" ]; then > return > @@ -32,6 +33,18 @@ function run() > return > fi > > +# check a file for a particular value before running a test > +# the check line can contain multiple files to check separated by a space > +# but each check parameter needs to be of the form = > +for check_param in ${check[@]}; do > +path=${check_param%%=*} > +value=${check_param#*=} > +if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then > +echo "skip $1 ($path not equal to $value)" > +return > +fi > +done > + > cmdline="./$TEST_DIR-run $kernel -smp $smp $opts" > if [ $verbose != 0 ]; then > echo $cmdline > @@ -57,18 +70,20 @@ function run_all() > local opts > local groups > local arch > +local check > > exec {config_fd}<$config > > while read -u $config_fd line; do > if [[ "$line" =~ ^\[(.*)\]$ ]]; then > -run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" > +run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" > "$check" > testname=${BASH_REMATCH[1]} > smp=1 > kernel="" > opts="" > groups="" > arch="" > +check="" > elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then > kernel=$TEST_DIR/${BASH_REMATCH[1]} > elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then > @@ -79,10 +94,12 @@ function run_all() > groups=${BASH_REMATCH[1]} > elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then > arch=${BASH_REMATCH[1]} > +elif [[ $line =~ ^check\ *=\ *(.*)$ ]]; then > 
+check=${BASH_REMATCH[1]} > fi > done > > -run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" > +run "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" > > exec {config_fd}<&- > } > Thanks, looks good. Are there more failures? Paolo
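The check logic from the patch above can be tried outside run_tests.sh. What follows is only a minimal sketch: the helper name check_conditions is hypothetical (it is not part of the patch), temporary files stand in for the /proc entries, and the loop iterates over a plain string instead of ${check[@]}. It uses the same ${param%%=*} and ${param#*=} expansions to split each path=value pair.

```shell
# Sketch only: check_conditions is a hypothetical helper, not part of run_tests.sh.
# It returns 0 when every "path=value" pair in $1 matches the file's contents,
# and returns 1 (after printing the reason) on the first mismatch.
check_conditions() {
    local check_param path value
    for check_param in $1; do              # pairs are separated by spaces
        path=${check_param%%=*}            # text before the first '='
        value=${check_param#*=}            # text after the first '='
        if [ -n "$path" ] && [ "$(cat "$path" 2>/dev/null)" != "$value" ]; then
            echo "skip ($path not equal to $value)"
            return 1
        fi
    done
    return 0
}

# Stand-ins for /proc/sys/kernel/nmi_watchdog and /proc/sys/kernel/ostype.
tmpdir=$(mktemp -d)
echo 0 > "$tmpdir/nmi_watchdog"
echo Linux > "$tmpdir/ostype"

if check_conditions "$tmpdir/nmi_watchdog=0 $tmpdir/ostype=Linux"; then
    echo "all conditions met, test would run"
fi
check_conditions "$tmpdir/nmi_watchdog=1" || echo "test would be skipped"
```

Because the value is everything after the first '=', a value that itself contains '=' still compares correctly; a value containing spaces would not, since the pairs are split on whitespace.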
Re: perf stat in KVM guest can not get LLC-loads hardware cache event
Hi, Kan,

The dTLB-load-misses count is 0, but it shows 80.00% hit. Does that mean the TLB-load misses are 0.8 * (dTLB-loads)?

Thanks.

Performance counter stats for './parsecmgmt -a run -i native -c gcc-hooks -n 1 -p freqmine':

                 0 dTLB-load-misses           #  0.00% of all dTLB cache hits  [80.00%]
   782,565,273,315 dTLB-loads                                                  [80.00%]
   782,552,911,616 L1-dcache-loads                                             [80.00%]
     5,810,697,456 L1-dcache-load-misses      #  0.74% of all L1-dcache hits   [80.00%]
     2,145,907,209 L1-dcache-prefetch-misses                                   [80.00%]

- Hui

On Wed, Aug 27, 2014 at 9:05 AM, Liang, Kan wrote:
>> Dear KVM developers:
>> I am trying to use perf stat inside a VM to obtain some hardware cache
>> performance counter values.
>> perf stat can report some numbers for the L1 and TLB related counters. But
>> for LLC-loads and LLC-load-misses, the numbers are always 0. It seems
>> that these offcore events are not exposed to the guest.
>>
>> Is this a bug in Qemu or KVM?
>
> There is no offcore virtualization support in KVM yet.
> For your case, I guess you are using paravirt for the guest, so it should be 0.
> Otherwise, you should get #GP in the guest.
>
>> My testbed is
>>
>> Host kernel: 3.12.26
>> Qemu: 2.1.0
>> CPU: Intel Ivy Bridge 2620
>> VM booted by qemu, with -cpu host.
>>
>> Thanks.
>>
>> - Hui Kang
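For reference, the ratio column in the output above can be reproduced from the raw counts: the "# 0.74%" annotation is the miss count divided by the load count. The bracketed figure is a separate quantity. As far as I know (this is an assumption about general perf behaviour, not something confirmed in this thread), it reports the share of the run during which the event was actually scheduled on a hardware counter, so it is not a hit rate. The helper name ratio_pct below is made up for illustration.

```shell
# ratio_pct MISSES LOADS: print MISSES/LOADS as a percentage with two decimals,
# using integer arithmetic only (the result is truncated, not rounded).
ratio_pct() {
    misses=$1
    loads=$2
    hundredths=$(( misses * 10000 / loads ))   # percentage in units of 1/100 %
    printf '%d.%02d%%\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

ratio_pct 5810697456 782552911616   # L1-dcache-load-misses / L1-dcache-loads
ratio_pct 0 782565273315            # dTLB-load-misses / dTLB-loads
```

With the counts from the message this gives 0.74% and 0.00%, matching the annotations perf printed.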
Re: [RFC 0/9] KVM-VFIO IRQ forward control
On 08/26/2014 07:49 PM, Alex Williamson wrote:
> On Mon, 2014-08-25 at 15:27 +0200, Eric Auger wrote:
>> This RFC proposes an integration of "ARM: Forwarding physical
>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
>> KVM.
>>
>> It enables a VFIO platform driver IRQ to be transformed into a forwarded
>> IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
>> switch can be avoided on guest virtual IRQ completion. Before this
>> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
>>
>> When the IRQ is forwarded, the VFIO platform driver does not need to
>> disable the IRQ anymore. Indeed when returning from the IRQ handler
>> the IRQ is not deactivated. Only its priority is lowered. This means
>> the same IRQ cannot hit before the guest completes the virtual IRQ
>> and the GIC automatically deactivates the corresponding physical IRQ.
>>
>> Besides, the injection is still based on irqfd triggering. The only
>> impact on the irqfd process is that resamplefd is not called anymore on
>> virtual IRQ completion since the latter becomes "transparent".
>>
>> The current integration is based on an extension of the KVM-VFIO
>> device, previously used by KVM to interact with VFIO groups. The
>> patch series now enables KVM to directly interact with a VFIO
>> platform device. The VFIO external API was extended for that purpose.
>>
>> The KVM-VFIO device can get/put the vfio platform device, check its
>> integrity and type, get the IRQ number associated to an IRQ index.
>>
>> The KVM-VFIO is extended with an architecture specific implementation.
>> IRQ forward control is implemented in the ARM specific part.
>>
>> From a user point of view, the functionality is provided through new
>> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(DE)ASSIGN_IRQ
>> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
>> Assignment can only be changed when the physical IRQ is not active.
>> It is the responsibility of the user to do this check.
>>
>> This patch series has the following dependencies:
>> - "ARM: Forwarding physical interrupts to a guest VM"
>> (http://lwn.net/Articles/603514/) in
>> - [PATCH v2] irqfd for ARM
>> which itself depends on
>> - arm/arm64: KVM: Various VGIC cleanups and improvements
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/263685.html
>> - and obviously the VFIO platform driver series:
>> [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
>> https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>
>> Integrated pieces can be found at
>> git://git.linaro.org/people/eric.auger/linux.git
>> on branch 3.17rc1_forward_integ_v0
>>
>> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
>
> Presumably this optimization should provide lower interrupt exit latency
> and lower CPU overhead since we avoid the entire EOI path of the
> resampler. Does it? It seems like there should be a measurable
> improvement with something like netperf TCP_RR with this series.
> Thanks,

Hi Alex,

I will publish some performance figures soon. I am currently missing a second node to run netserver.

My preliminary understanding is that the perf improvement will come from
1) reduction of EOI latency
2) potential saving of VM switches,
with 2) depending on the physical IRQ rate.

VERY HIGH RATE:

Without the patch (traditional irqfd on ARM with maintenance IRQ):
guest completes the vIRQ
-> maintenance IRQ handler
-> guest->host VM switch
-> resamplefd trigger (virqfd)
-> enable physical IRQ
-> new physical IRQ hits
-> VFIO handler
-> fd trigger
-> injection in guest
-> host->guest VM switch

With the patch:
guest completes the vIRQ
-> GIC completes the physical IRQ
-> new physical IRQ hits
-> guest->host VM switch
-> VFIO handler
-> fd trigger
-> injection in guest
-> host->guest VM switch

=> Same number of VM switches

SLOWER RATE:

Without the patch:
guest completes the vIRQ
-> maintenance IRQ handler
-> guest->host VM switch
-> resamplefd trigger (virqfd)
-> enable physical IRQ
[host ..]
host->guest VM switch
[guest ..]
physical IRQ hits
-> guest->host VM switch
-> VFIO handler
-> fd trigger
-> injection in guest
-> host->guest VM switch

With the patch:
guest completes the vIRQ
-> GIC completes the physical IRQ
[guest ..]
physical IRQ hits
-> guest->host VM switch
-> VFIO handler
-> fd trigger
-> injection in guest
-> host->guest VM switch

Hence fewer VM switches with the patch. But it is also related to scheduling, relative load of host/guest...

Any comment welcome!

Best Regards

Eric

>
> Alex
>
>> Eric Auger (9):
>> KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
>> IRQ
>> KVM: ARM: VGIC: add forwarded irq rbtree lock
>> VFIO: platform: handler tests whether the IRQ is forwarded
>> KVM: KVM-VFIO: update user API to program forwarded IRQ
>> VFIO: Extend external user API
>> KVM: KVM-VFIO: allow arch specific implementation
>> KVM: KVM-VFIO: add new VFIO external API hooks
>> KVM: KVM-
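The slower-rate comparison in the message above can be checked mechanically by counting the "VM switch" steps in each flow. This is just an illustrative sketch: the helper name count_switches and the '|'-separated encoding of the steps are my own assumptions; the steps themselves are copied from the message.

```shell
# count_switches FLOW: count the "VM switch" steps in a '|'-separated flow.
count_switches() {
    printf '%s\n' "$1" | tr '|' '\n' | grep -c 'VM switch'
}

# Slower-rate flows, step by step as listed in the message.
without_patch='guest completes the vIRQ|maintenance IRQ handler|guest->host VM switch|resamplefd trigger (virqfd)|enable physical IRQ|host->guest VM switch|physical IRQ hits|guest->host VM switch|VFIO handler|fd trigger|injection in guest|host->guest VM switch'
with_patch='guest completes the vIRQ|GIC completes the physical IRQ|physical IRQ hits|guest->host VM switch|VFIO handler|fd trigger|injection in guest|host->guest VM switch'

echo "without patch: $(count_switches "$without_patch") VM switches"
echo "with patch:    $(count_switches "$with_patch") VM switches"
```

For the slower-rate case this counts 4 switches without the patch and 2 with it, which is the saving described. Encoding the very-high-rate flows the same way gives 2 in both cases.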
Re: [RFC 9/9] KVM: KVM_VFIO: ARM: implement irq forwarding control
On 08/26/2014 09:02 PM, Alex Williamson wrote: > On Mon, 2014-08-25 at 15:27 +0200, Eric Auger wrote: >> Implements ARM specific KVM-VFIO device group commands: >> - KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ >> - KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ >> capability can be queried using KVM_HAS_DEVICE_ATTR. >> >> The new commands enable to set IRQ forwarding on/off for a given >> IRQ index of a VFIO platform device. >> >> as soon as a forwarded irq is set, a reference to the VFIO device >> is taken by the kvm-vfio device. >> >> The kvm-vfio device stores in the kvm_vfio_arch_data the list >> of "assigned" devices (kvm_vfio_device). Each kvm_vfio_device >> stores the list of assigned IRQs (potentially allowed a subset of >> IRQ to be forwarded) >> >> The kvm-vfio device programs both the GIC and vGIC. Also it >> clears the active bit on destruction, in case the guest did not >> do it itself. >> >> Changing the forwarded state is not allowed in the critical >> section starting from VFIO IRQ handler to LR programming. It is >> up to the client to take care of this. >> >> Signed-off-by: Eric Auger >> --- >> arch/arm/include/asm/kvm_host.h | 2 + >> arch/arm/kvm/Makefile | 2 +- >> arch/arm/kvm/kvm_vfio_arm.c | 599 >> >> 3 files changed, 602 insertions(+), 1 deletion(-) >> create mode 100644 arch/arm/kvm/kvm_vfio_arm.c > > I'm really happy that it seems like the kvm-vfio device is going to work > for you, but I think too much stuff is being pushed out to arch code > here. Exporting the interfaces in patches 7 & 8 are setting the stage > for duplicate code for anyone wanting to implement device attributes. > Instead, I think the core code should support the list of > kvm_vfio_devices with proper cleanup, and we should attempt to access > the kvm_vfio_ callbacks as little as possible from arch code. Thanks, OK. my next iteration will feature much more generic code. 
Thanks for the review Best Regards Eric > > Alex >
Re: [RFC 4/9] KVM: KVM-VFIO: update user API to program forwarded IRQ
On 08/26/2014 09:01 PM, Alex Williamson wrote:
> On Mon, 2014-08-25 at 15:27 +0200, Eric Auger wrote:
>> add new device group commands:
>> - KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and
>> KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ
>>
>> which enable to turn forwarded IRQ mode on/off.
>>
>> Signed-off-by: Eric Auger
>> ---
>> Documentation/virtual/kvm/devices/vfio.txt | 25 +
>> arch/arm/include/uapi/asm/kvm.h | 6 ++
>> include/uapi/linux/kvm.h | 3 +++
>> 3 files changed, 34 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/devices/vfio.txt
>> b/Documentation/virtual/kvm/devices/vfio.txt
>> index ef51740..c8b3fa1 100644
>> --- a/Documentation/virtual/kvm/devices/vfio.txt
>> +++ b/Documentation/virtual/kvm/devices/vfio.txt
>> @@ -13,6 +13,7 @@ VFIO-group is held by KVM.
>>
>>  Groups:
>>    KVM_DEV_VFIO_GROUP
>> +  KVM_DEV_VFIO_DEVICE
>>
>>  KVM_DEV_VFIO_GROUP attributes:
>>    KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
>> @@ -20,3 +21,27 @@ KVM_DEV_VFIO_GROUP attributes:
>>
>>  For each, kvm_device_attr.addr points to an int32_t file descriptor
>>  for the VFIO group.
>> +
>> +KVM_DEV_VFIO_DEVICE attributes:
>> +  KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ
>> +  KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ
>> +
>> +For each, kvm_device_attr.addr points to an kvm_arch_forwarded_irq.
>> +This user API makes possible to create a special IRQ handling mode,
>> +currently supported only on ARM, where KVM and a VFIO platform driver
>> +collaborate to improve IRQ handling performance.
>> +fd represents the file descriptor of a valid VFIO device whose physical
>> +IRQ, referenced by its irq_index is injected to the VM guest_irq.
>> +
>> +On ASSIGN_IRQ, KVM-VFIO device programs:
>> +- the host, to not complete the physical IRQ itself.
>> +- the GIC, to automatically complete the physical IRQ when the guest
>> +  completes the virtual IRQ
>> +This avoid trapping the end-of-interrupt for level sensitive IRQ.
>> +
>> +On DEASSIGN_IRQ, one come back to the mode where the host completes the
>> +physical IRQ and the guest only completes the virtual IRQ.
>> +
>> +It is up to the caller of this API to get the assurance the IRQ is not
>> +outstanding when the ASSIGN/DEASSIGN is called. This could lead to some
>> +inconsistency on who is going to complete the IRQ.
>
> Why not call these FORWARD/UNFORWARD or something since the operation
> isn't really doing anything with assignment of the IRQ. The IRQ is
> already "assigned", we're modifying the behavior.

Sure, I will change the name.

>
>> diff --git a/arch/arm/include/uapi/asm/kvm.h
>> b/arch/arm/include/uapi/asm/kvm.h
>> index 3034c66..1920b33 100644
>> --- a/arch/arm/include/uapi/asm/kvm.h
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -109,6 +109,12 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>
>> +struct kvm_arch_forwarded_irq {
>> +	__u32 fd; /* file desciptor of the VFIO device */
>> +	__u32 irq_index; /* platform device index of the IRQ */
>
> The vfio-platform device IRQ index? ARM is the only implementation we
> have of this, but to keep it generic maybe the comment should read "vfio
> device IRQ index".

I will replace it with "vfio device IRQ index".

>
>> +	__u32 guest_irq; /* virtual IRQ number */
>
> This would be a GSI or similar concept if we were on x86.

OK for GSI; this naming now seems better understood by the ARM community too ;-)

> I'm a little
> confused that we're using an arch structure here rather than trying to
> keep the kvm-vfio device interface neutral.

I will then move many more things to the generic side, assuming that someone may use such functionality in the future. Actually the only ARM specific implementation is the GIC interrupt controller programming. As far as I can see, the rest is generic in terms of API.

Thanks

Eric

> Maybe it makes sense, I'm
> not sure. Thanks,
>
> Alex
>
>> +};
>> +
>>  /* If you need to interpret the index values, here is the key: */
>>  #define KVM_REG_ARM_COPROC_MASK 0x0FFF
>>  #define KVM_REG_ARM_COPROC_SHIFT 16
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index cf3a2ff..b149ba8 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -954,6 +954,9 @@ struct kvm_device_attr {
>>  #define KVM_DEV_VFIO_GROUP 1
>>  #define KVM_DEV_VFIO_GROUP_ADD 1
>>  #define KVM_DEV_VFIO_GROUP_DEL 2
>> +#define KVM_DEV_VFIO_DEVICE 2
>> +#define KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ 1
>> +#define KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ 2
>>  #define KVM_DEV_TYPE_ARM_VGIC_V2 5
>>  #define KVM_DEV_TYPE_FLIC 6
>>
>
Re: [RFC 8/9] KVM: KVM-VFIO: add kvm_vfio_arch_data and accessors
On 08/26/2014 09:02 PM, Alex Williamson wrote: > On Mon, 2014-08-25 at 15:27 +0200, Eric Auger wrote: >> add a pointer to architecture specific data in kvm_vfio struct >> add accessors to keep kvm_vfio private >> >> Signed-off-by: Eric Auger >> --- >> arch/arm/include/asm/kvm_host.h | 8 >> virt/kvm/vfio.c | 21 + >> 2 files changed, 29 insertions(+) >> >> diff --git a/arch/arm/include/asm/kvm_host.h >> b/arch/arm/include/asm/kvm_host.h >> index 62cbf5b..4f1edbf 100644 >> --- a/arch/arm/include/asm/kvm_host.h >> +++ b/arch/arm/include/asm/kvm_host.h >> @@ -177,6 +177,14 @@ void kvm_vfio_device_put_external_user(struct >> vfio_device *vdev); >> int kvm_vfio_external_get_type(struct vfio_device *vdev); >> struct device *kvm_vfio_external_get_base_device(struct vfio_device *vdev); >> >> +struct kvm_vfio; >> +struct kvm_vfio_arch_data; >> +void kvm_vfio_device_set_arch_data(struct kvm_vfio *kv, >> + struct kvm_vfio_arch_data *ptr); >> +struct kvm_vfio_arch_data *kvm_vfio_device_get_arch_data(struct kvm_vfio >> *kv); >> +void kvm_vfio_lock(struct kvm_vfio *kv); >> +void kvm_vfio_unlock(struct kvm_vfio *kv); >> + >> /* We do not have shadow page tables, hence the empty hooks */ >> static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva) >> { >> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c >> index f1c4e35..177b71e 100644 >> --- a/virt/kvm/vfio.c >> +++ b/virt/kvm/vfio.c >> @@ -28,6 +28,7 @@ struct kvm_vfio { >> struct list_head group_list; >> struct mutex lock; >> bool noncoherent; >> +struct kvm_vfio_arch_data *arch_data; >> }; >> >> static struct vfio_group *kvm_vfio_group_get_external_user(struct file >> *filep) >> @@ -338,6 +339,26 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 >> type) >> return 0; >> } >> >> +void kvm_vfio_device_set_arch_data(struct kvm_vfio *kv, >> + struct kvm_vfio_arch_data *ptr) >> +{ >> +kv->arch_data = ptr; >> +} >> + >> +struct kvm_vfio_arch_data *kvm_vfio_device_get_arch_data(struct kvm_vfio >> *kv) >> +{ > > My 
preference would be s/get_// ok > >> +return kv->arch_data; >> +} >> + >> +void kvm_vfio_lock(struct kvm_vfio *kv) >> +{ >> +mutex_lock(&kv->lock); >> +} >> + >> +void kvm_vfio_unlock(struct kvm_vfio *kv) >> +{ >> +mutex_unlock(&kv->lock); >> +} > > Gosh, what could go wrong... Hmm, sorry, I did not understand what you meant here. Thanks, Eric > >> >> struct kvm_device_ops kvm_vfio_ops = { >> .name = "kvm-vfio", > > >
Re: [RFC 8/9] KVM: KVM-VFIO: add kvm_vfio_arch_data and accessors
On Wed, 2014-08-27 at 17:22 +0200, Eric Auger wrote: > On 08/26/2014 09:02 PM, Alex Williamson wrote: > > On Mon, 2014-08-25 at 15:27 +0200, Eric Auger wrote: > >> add a pointer to architecture specific data in kvm_vfio struct > >> add accessors to keep kvm_vfio private > >> > >> Signed-off-by: Eric Auger > >> --- > >> arch/arm/include/asm/kvm_host.h | 8 > >> virt/kvm/vfio.c | 21 + > >> 2 files changed, 29 insertions(+) > >> > >> diff --git a/arch/arm/include/asm/kvm_host.h > >> b/arch/arm/include/asm/kvm_host.h > >> index 62cbf5b..4f1edbf 100644 > >> --- a/arch/arm/include/asm/kvm_host.h > >> +++ b/arch/arm/include/asm/kvm_host.h > >> @@ -177,6 +177,14 @@ void kvm_vfio_device_put_external_user(struct > >> vfio_device *vdev); > >> int kvm_vfio_external_get_type(struct vfio_device *vdev); > >> struct device *kvm_vfio_external_get_base_device(struct vfio_device > >> *vdev); > >> > >> +struct kvm_vfio; > >> +struct kvm_vfio_arch_data; > >> +void kvm_vfio_device_set_arch_data(struct kvm_vfio *kv, > >> + struct kvm_vfio_arch_data *ptr); > >> +struct kvm_vfio_arch_data *kvm_vfio_device_get_arch_data(struct kvm_vfio > >> *kv); > >> +void kvm_vfio_lock(struct kvm_vfio *kv); > >> +void kvm_vfio_unlock(struct kvm_vfio *kv); > >> + > >> /* We do not have shadow page tables, hence the empty hooks */ > >> static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva) > >> { > >> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > >> index f1c4e35..177b71e 100644 > >> --- a/virt/kvm/vfio.c > >> +++ b/virt/kvm/vfio.c > >> @@ -28,6 +28,7 @@ struct kvm_vfio { > >>struct list_head group_list; > >>struct mutex lock; > >>bool noncoherent; > >> + struct kvm_vfio_arch_data *arch_data; > >> }; > >> > >> static struct vfio_group *kvm_vfio_group_get_external_user(struct file > >> *filep) > >> @@ -338,6 +339,26 @@ static int kvm_vfio_create(struct kvm_device *dev, > >> u32 type) > >>return 0; > >> } > >> > >> +void kvm_vfio_device_set_arch_data(struct kvm_vfio *kv, > >> + struct 
kvm_vfio_arch_data *ptr) > >> +{ > >> + kv->arch_data = ptr; > >> +} > >> + > >> +struct kvm_vfio_arch_data *kvm_vfio_device_get_arch_data(struct kvm_vfio > >> *kv) > >> +{ > > > > My preference would be s/get_// > ok > > > >> + return kv->arch_data; > >> +} > >> + > >> +void kvm_vfio_lock(struct kvm_vfio *kv) > >> +{ > >> + mutex_lock(&kv->lock); > >> +} > >> + > >> +void kvm_vfio_unlock(struct kvm_vfio *kv) > >> +{ > >> + mutex_unlock(&kv->lock); > >> +} > > > > Gosh, what could go wrong... > Hum sorry I did not understand what you meant here Sorry, I was just sarcastically noting that exposing an internal lock like this seems to be asking for trouble. As you rework it to pull more into the common code and generalize the architecture callouts, I hope we can avoid exporting these locks. Thanks, Alex
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
On Wed, Aug 27, 2014 at 04:33:54PM +0200, Paolo Bonzini wrote: > Il 27/08/2014 16:05, Eduardo Habkost ha scritto: > > On Wed, Aug 27, 2014 at 03:36:51PM +0200, Paolo Bonzini wrote: > >> Il 26/08/2014 20:01, Eduardo Habkost ha scritto: > >>> So maybe that's good news, as things can be simpler if we make both TCG > >>> and KVM have similar behavior: > >>> > >>> * qemu64: a conservative default that should work out of the box on > >>> most systems, for both TCG and KVM. That's already the current status, > >>> we just need to document it. > >>> > >>> * -cpu host: for people who want every possible feature to be enabled > >>> (but without cross-version live-migration support). We can easily add > >>> support for "-cpu host" to TCG, too. > >> > >> This means that "-cpu host" has different meanings in KVM and TCG. Is > >> that an advantage or a disadvantage? > > > > It is the same meaning to me: "enable everything that's possible, > > considering what's provided by the underlying accelerator". The "host" > > name is misleading, though, because on KVM it is close to the host CPU, > > but on TCG it depends solely on TCG's capabilities. > > True. It's not very intuitive, but it is the same concept for processor > capabilities. > > Though for some leaves that do not correspond to processor capabilities, > "-cpu host" does set them to the host values. This is not just the > cache model, but also the family/model/stepping/vendor. > > For the TCG case, when running on a Nehalem it would be weird to see a > Nehalem guest with SMAP or ADOX support... I'm not sure it would even > work to have SVM with an Intel vendor. :) In that case, the best family/model/stepping/vendor choice depends on TCG capabilities (defined at compile time), not on the host CPU. ...and that proves your point: if we aren't even using the host CPU family/model/stepping, calling it "-cpu host" doesn't make much sense. 
If it is so different from the host model, we can call it "qemu64" (and do as you suggest below). > > >> If I have to choose blindly, I'd rather give different (but sane) > >> meanings to "-cpu qemu64" and the same meanings to "-cpu host"... > >> Basically "-cpu qemu32/64" on KVM would be changed automatically to > >> kvm32/64. > > > > This (different meanings to qemu64) is what I was proposing first, > > Good. > > > except for the "same meaning to -cpu host" part. What exactly would you > > expect "-cpu host" to mean on TCG? > > Emulate (as much as possible of) a SandyBridge if I'm running on a > SandyBridge, etc. > > "-cpu qemu64" would be the best CPU that TCG can do, with a standard > family/model/stepping/vendor slapped on top. Makes sense to me. -- Eduardo
Re: [RFC 5/9] VFIO: Extend external user API
On 08/26/2014 09:02 PM, Alex Williamson wrote: > On Mon, 2014-08-25 at 15:27 +0200, Eric Auger wrote: >> New functions are added to be called from ARM KVM-VFIO device. >> >> - vfio_device_get_external_user enables to get a vfio device from >> its fd >> - vfio_device_put_external_user puts the vfio device >> - vfio_external_get_type enables to retrieve the type of the device >> (PCI or platform) >> - vfio_external_get_base_device enables to get the >> struct device*, useful to access the platform_device >> >> Signed-off-by: Eric Auger >> --- >> drivers/vfio/vfio.c | 35 +++ >> include/linux/vfio.h | 4 >> 2 files changed, 39 insertions(+) >> >> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c >> index 8e84471..c93b9e4 100644 >> --- a/drivers/vfio/vfio.c >> +++ b/drivers/vfio/vfio.c >> @@ -1401,6 +1401,41 @@ void vfio_group_put_external_user(struct vfio_group >> *group) >> } >> EXPORT_SYMBOL_GPL(vfio_group_put_external_user); >> >> +struct vfio_device *vfio_device_get_external_user(struct file *filep) >> +{ >> +struct vfio_device *vdev = filep->private_data; >> + >> +if (filep->f_op != &vfio_device_fops) >> +return ERR_PTR(-EINVAL); >> + >> +vfio_device_get(vdev); >> +return vdev; >> +} >> +EXPORT_SYMBOL_GPL(vfio_device_get_external_user); >> + >> +void vfio_device_put_external_user(struct vfio_device *vdev) >> +{ >> +vfio_device_put(vdev); >> +} >> +EXPORT_SYMBOL_GPL(vfio_device_put_external_user); >> + >> +int vfio_external_get_type(struct vfio_device *vdev) >> +{ >> +if (!strcmp(vdev->ops->name, "vfio-platform")) >> +return VFIO_DEVICE_FLAGS_PLATFORM; >> +else if (!strcmp(vdev->ops->name, "vfio-pci")) >> +return VFIO_DEVICE_FLAGS_PCI; >> +else >> +return -EINVAL; >> +} >> +EXPORT_SYMBOL_GPL(vfio_external_get_type); > > Returning the bit of the flag we use in get_device_info looks rather > sloppy here. Should we define a new enum for use with this? Actually, > is this interface even necessary? 
If we can get the struct device then > we can get the bus_type and keep vfio out of this. Thanks for the nit. I will try to get rid of it that way. > > For both of these last two, I like to use the convention that where > there is a "get" there is a matching "put". These aren't reference > counting anything, so let's not use get in the name. I will rename. Thanks, Eric > >> + >> +struct device *vfio_external_get_base_device(struct vfio_device *vdev) >> +{ >> +return vdev->dev; >> +} >> +EXPORT_SYMBOL_GPL(vfio_external_get_base_device); >> + > > Looks almost too simple, but reviewing the object lifecycles, this all > looks safe. Thanks, > > Alex > >> int vfio_external_user_iommu_id(struct vfio_group *group) >> { >> return iommu_group_id(group->iommu_group); >> diff --git a/include/linux/vfio.h b/include/linux/vfio.h >> index ffe04ed..19e98eb 100644 >> --- a/include/linux/vfio.h >> +++ b/include/linux/vfio.h >> @@ -99,6 +99,10 @@ extern void vfio_group_put_external_user(struct >> vfio_group *group); >> extern int vfio_external_user_iommu_id(struct vfio_group *group); >> extern long vfio_external_check_extension(struct vfio_group *group, >>unsigned long arg); >> +extern struct vfio_device *vfio_device_get_external_user(struct file >> *filep); >> +extern void vfio_device_put_external_user(struct vfio_device *vdev); >> +extern int vfio_external_get_type(struct vfio_device *vdev); >> +extern struct device *vfio_external_get_base_device(struct vfio_device >> *vdev); >> >> struct pci_dev; >> #ifdef CONFIG_EEH > > >
Re: [RFC 8/9] KVM: KVM-VFIO: add kvm_vfio_arch_data and accessors
On 08/27/2014 05:37 PM, Alex Williamson wrote: > On Wed, 2014-08-27 at 17:22 +0200, Eric Auger wrote: >> On 08/26/2014 09:02 PM, Alex Williamson wrote: >>> On Mon, 2014-08-25 at 15:27 +0200, Eric Auger wrote: add a pointer to architecture specific data in kvm_vfio struct add accessors to keep kvm_vfio private Signed-off-by: Eric Auger --- arch/arm/include/asm/kvm_host.h | 8 virt/kvm/vfio.c | 21 + 2 files changed, 29 insertions(+) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 62cbf5b..4f1edbf 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -177,6 +177,14 @@ void kvm_vfio_device_put_external_user(struct vfio_device *vdev); int kvm_vfio_external_get_type(struct vfio_device *vdev); struct device *kvm_vfio_external_get_base_device(struct vfio_device *vdev); +struct kvm_vfio; +struct kvm_vfio_arch_data; +void kvm_vfio_device_set_arch_data(struct kvm_vfio *kv, + struct kvm_vfio_arch_data *ptr); +struct kvm_vfio_arch_data *kvm_vfio_device_get_arch_data(struct kvm_vfio *kv); +void kvm_vfio_lock(struct kvm_vfio *kv); +void kvm_vfio_unlock(struct kvm_vfio *kv); + /* We do not have shadow page tables, hence the empty hooks */ static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva) { diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index f1c4e35..177b71e 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -28,6 +28,7 @@ struct kvm_vfio { struct list_head group_list; struct mutex lock; bool noncoherent; + struct kvm_vfio_arch_data *arch_data; }; static struct vfio_group *kvm_vfio_group_get_external_user(struct file *filep) @@ -338,6 +339,26 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type) return 0; } +void kvm_vfio_device_set_arch_data(struct kvm_vfio *kv, + struct kvm_vfio_arch_data *ptr) +{ + kv->arch_data = ptr; +} + +struct kvm_vfio_arch_data *kvm_vfio_device_get_arch_data(struct kvm_vfio *kv) +{ >>> >>> My preference would be s/get_// >> ok >>> + return kv->arch_data; +} + 
+void kvm_vfio_lock(struct kvm_vfio *kv) +{ + mutex_lock(&kv->lock); +} + +void kvm_vfio_unlock(struct kvm_vfio *kv) +{ + mutex_unlock(&kv->lock); +} >>> >>> Gosh, what could go wrong... >> Hum sorry I did not understand what you meant here > > Sorry, I was just sarcastically noting that exposing an internal lock > like this seems to be asking for trouble. As you rework it to pull more > into the common code and generalize the architecture callouts, I hope we > can avoid exporting these locks. Thanks, OK, thanks. No problem, I learnt a new word ;-) > > Alex >
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
Am 27.08.2014 17:42, schrieb Eduardo Habkost: > On Wed, Aug 27, 2014 at 04:33:54PM +0200, Paolo Bonzini wrote: >> Il 27/08/2014 16:05, Eduardo Habkost ha scritto: >>> On Wed, Aug 27, 2014 at 03:36:51PM +0200, Paolo Bonzini wrote: Il 26/08/2014 20:01, Eduardo Habkost ha scritto: > So maybe that's good news, as things can be simpler if we make both TCG > and KVM have similar behavior: > > * qemu64: a conservative default that should work out of the box on > most systems, for both TCG and KVM. That's already the current status, > we just need to document it. > > * -cpu host: for people who want every possible feature to be enabled > (but without cross-version live-migration support). We can easily add > support for "-cpu host" to TCG, too. This means that "-cpu host" has different meanings in KVM and TCG. Is that an advantage or a disadvantage? >>> >>> It is the same meaning to me: "enable everything that's possible, >>> considering what's provided by the underlying accelerator". The "host" >>> name is misleading, though, because on KVM it is close to the host CPU, >>> but on TCG it depends solely on TCG's capabilities. >> >> True. It's not very intuitive, but it is the same concept for processor >> capabilities. >> >> Though for some leaves that do not correspond to processor capabilities, >> "-cpu host" does set them to the host values. This is not just the >> cache model, but also the family/model/stepping/vendor. >> >> For the TCG case, when running on a Nehalem it would be weird to see a >> Nehalem guest with SMAP or ADOX support... I'm not sure it would even >> work to have SVM with an Intel vendor. :) > > In that case, the best family/model/stepping/vendor choice depends on > TCG capabilities (defined at compile time), not on the host CPU. > > ...and that proves your point: if we aren't even using the host CPU > family/model/stepping, calling it "-cpu host" doesn't make much sense. 
> If it is so different from the host model, we can call it "qemu64" (and > do as you suggest below). Might that be an opportunity to reconsider a -cpu best or so, independent of its implementation, to avoid "host"? Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
Am 27.08.2014 18:08, schrieb Eduardo Habkost: > On Wed, Aug 27, 2014 at 05:58:49PM +0200, Andreas Färber wrote: >> Am 27.08.2014 17:42, schrieb Eduardo Habkost: >>> On Wed, Aug 27, 2014 at 04:33:54PM +0200, Paolo Bonzini wrote: Il 27/08/2014 16:05, Eduardo Habkost ha scritto: > On Wed, Aug 27, 2014 at 03:36:51PM +0200, Paolo Bonzini wrote: >> Il 26/08/2014 20:01, Eduardo Habkost ha scritto: >>> So maybe that's good news, as things can be simpler if we make both TCG >>> and KVM have similar behavior: >>> >>> * qemu64: a conservative default that should work out of the box on >>> most systems, for both TCG and KVM. That's already the current status, >>> we just need to document it. >>> >>> * -cpu host: for people who want every possible feature to be enabled >>> (but without cross-version live-migration support). We can easily add >>> support for "-cpu host" to TCG, too. >> >> This means that "-cpu host" has different meanings in KVM and TCG. Is >> that an advantage or a disadvantage? > > It is the same meaning to me: "enable everything that's possible, > considering what's provided by the underlying accelerator". The "host" > name is misleading, though, because on KVM it is close to the host CPU, > but on TCG it depends solely on TCG's capabilities. True. It's not very intuitive, but it is the same concept for processor capabilities. Though for some leaves that do not correspond to processor capabilities, "-cpu host" does set them to the host values. This is not just the cache model, but also the family/model/stepping/vendor. For the TCG case, when running on a Nehalem it would be weird to see a Nehalem guest with SMAP or ADOX support... I'm not sure it would even work to have SVM with an Intel vendor. :) >>> >>> In that case, the best family/model/stepping/vendor choice depends on >>> TCG capabilities (defined at compile time), not on the host CPU. 
>>> >>> ...and that proves your point: if we aren't even using the host CPU >>> family/model/stepping, calling it "-cpu host" doesn't make much sense. >>> If it is so different from the host model, we can call it "qemu64" (and >>> do as you suggests below). >> >> Might that be an opportunity to reconsider a -cpu best or so, >> independent of its implementation, to avoid "host"? > > It depends on what you expect "-cpu best" to mean. I have seen different > meanings being proposed for it. > > IIRC, "best" was proposed to mean "choose the best one from the existing > (predefined) CPU models", not "enable everything possible, not even > looking at the CPU model table". > > Anyway, it makes sense to have a name for the "enable everything" mode > (whatever it is), and simply make "qemu64" an alias to it when in TCG > mode. > > (If we didn't have existing libvirt code assuming "qemu64" is always the > default in QEMU, we could simply get rid of "qemu64" and use better > names. We may get rid of "qemu64" later, but we need to provide a way > for libvirt to stop using it, first.) My "or so" was referring to, e.g., -cpu optimum or -cpu maximum, or whatever name we come up with that is a little more telling than "qemu64" or "host". Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
On Wed, Aug 27, 2014 at 05:58:49PM +0200, Andreas Färber wrote: > Am 27.08.2014 17:42, schrieb Eduardo Habkost: > > On Wed, Aug 27, 2014 at 04:33:54PM +0200, Paolo Bonzini wrote: > >> Il 27/08/2014 16:05, Eduardo Habkost ha scritto: > >>> On Wed, Aug 27, 2014 at 03:36:51PM +0200, Paolo Bonzini wrote: > Il 26/08/2014 20:01, Eduardo Habkost ha scritto: > > So maybe that's good news, as things can be simpler if we make both TCG > > and KVM have similar behavior: > > > > * qemu64: a conservative default that should work out of the box on > > most systems, for both TCG and KVM. That's already the current status, > > we just need to document it. > > > > * -cpu host: for people who want every possible feature to be enabled > > (but without cross-version live-migration support). We can easily add > > support for "-cpu host" to TCG, too. > > This means that "-cpu host" has different meanings in KVM and TCG. Is > that an advantage or a disadvantage? > >>> > >>> It is the same meaning to me: "enable everything that's possible, > >>> considering what's provided by the underlying accelerator". The "host" > >>> name is misleading, though, because on KVM it is close to the host CPU, > >>> but on TCG it depends solely on TCG's capabilities. > >> > >> True. It's not very intuitive, but it is the same concept for processor > >> capabilities. > >> > >> Though for some leaves that do not correspond to processor capabilities, > >> "-cpu host" does set them to the host values. This is not just the > >> cache model, but also the family/model/stepping/vendor. > >> > >> For the TCG case, when running on a Nehalem it would be weird to see a > >> Nehalem guest with SMAP or ADOX support... I'm not sure it would even > >> work to have SVM with an Intel vendor. :) > > > > In that case, the best family/model/stepping/vendor choice depends on > > TCG capabilities (defined at compile time), not on the host CPU. 
> > > > ...and that proves your point: if we aren't even using the host CPU > > family/model/stepping, calling it "-cpu host" doesn't make much sense. > > If it is so different from the host model, we can call it "qemu64" (and > > do as you suggests below). > > Might that be an opportunity to reconsider a -cpu best or so, > independent of its implementation, to avoid "host"? It depends on what you expect "-cpu best" to mean. I have seen different meanings being proposed for it. IIRC, "best" was proposed to mean "choose the best one from the existing (predefined) CPU models", not "enable everything possible, not even looking at the CPU model table". Anyway, it makes sense to have a name for the "enable everything" mode (whatever it is), and simply make "qemu64" an alias to it when in TCG mode. (If we didn't have existing libvirt code assuming "qemu64" is always the default in QEMU, we could simply get rid of "qemu64" and use better names. We may get rid of "qemu64" later, but we need to provide a way for libvirt to stop using it, first.) -- Eduardo
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
On 27/08/2014 18:08, Eduardo Habkost wrote: > > Might that be an opportunity to reconsider a -cpu best or so, > > independent of its implementation, to avoid "host"? Nowadays we have CPU models added way before silicon is available, and "-cpu host" in practice should be migratable (the big exception being nested VMX and, when running on KVM, nested SVM). What would "-cpu best" be useful for? > It depends on what you expect "-cpu best" to mean. I have seen different > meanings being proposed for it. > > IIRC, "best" was proposed to mean "choose the best one from the existing > (predefined) CPU models", not "enable everything possible, not even > looking at the CPU model table". How do you define "best"? You could have a model that lacks feature F1 and a model that lacks feature F2. Adding features on top of an existing model is what libvirt's <cpu mode='host-model'/> element does, and it's broken. It's broken because some features do not work unless you also bump the level (for example xsave, my favorite example for CPUID bugs, requires leaf 0xD to be present). > Anyway, it makes sense to have a name for the "enable everything" mode > (whatever it is), and simply make "qemu64" an alias to it when in TCG > mode. Or conversely, say "qemu64" is { baseline for KVM, enable-everything for TCG }. Then "-cpu best" and "-cpu qemu64" would effectively be synonyms on TCG. Paolo
RE: perf stat in KVM guest cannot get LLC-loads hardware cache event
> > 1. If the guest is non-paravirt, can I get the LLC-loads number? No, the guest will crash. > 2. Do you know any method that can capture the LLC-loads for the guest? I don't know. > Thanks. >
RE: perf stat in KVM guest cannot get LLC-loads hardware cache event
> > Hi, Kan, > > > > The dTLB-load-misses is 0, but it shows 80.00%hit, does that mean the > > TLB- load miss is 0.8 * (dTLB-loads). Thanks. > > > > Performance counter stats for './parsecmgmt -a run -i native -c > > gcc-hooks -n > > 1 -p freqmine': I'm not familiar with parsecmgmt. Which perf commands does it call? > > > > > > 0 dTLB-load-misses #0.00% of all dTLB > > cache hits [80.00%] > > > >782,565,273,315 dTLB-loads > > [80.00%] > > > >782,552,911,616 L1-dcache-loads > > [80.00%] > > > > 5,810,697,456 L1-dcache-load-misses #0.74% of all > > L1-dcache hits [80.00%] > > > > 2,145,907,209 L1-dcache-prefetch-misses > > [80.00%] > > > > - Hui > > > > On Wed, Aug 27, 2014 at 9:05 AM, Liang, Kan wrote: > > > > > > > > >> > > >> Dear KVM developers: > > >> I am trying use perf stat inside a VM to obtain some hardware cache > > >> performance counter values. > > >> The perf stat can report some numbers for L1 and TLB related > > >> counters. But for the LLC-loads and LLC-load-misses, the numbers > > >> are always 0. It seems that the these offcore events are not > > >> exposed to the > > guest. > > >> > > >> Is this a bug in Qemu or KVM? > > >> > > > > > > There is no offcore virtualization support in KVM yet. > > > For you case, I guess you are using paravirt for guest, so it should be 0. > > > Otherwise, you should get #GP in guest. > > > > > > > > >> My testbed is > > >> > > >> Host kernel: 3.12.26 > > >> Qemu: 2.1.0 > > >> CPU: Intel Ivy bridge 2620 > > >> VM boosted by qemu, with -cpu host. > > >> > > >> Thanks. > > >> > > >> - Hui Kang
Re: [PATCH v2 0/6] target-i386: Make most CPU models work with "enforce" out of the box
On Wed, Aug 27, 2014 at 06:18:13PM +0200, Paolo Bonzini wrote: > On 27/08/2014 18:08, Eduardo Habkost wrote: > > > Might that be an opportunity to reconsider a -cpu best or so, > > > independent of its implementation, to avoid "host"? > > Nowadays we have CPU models added way before silicon is available, and > "-cpu host" in practice should be migratable (the big exception being > nested VMX and, when running on KVM, nested SVM). What would "-cpu > best" be useful for? > > > It depends on what you expect "-cpu best" to mean. I have seen different > > meanings being proposed for it. > > > > IIRC, "best" was proposed to mean "choose the best one from the existing > > (predefined) CPU models", not "enable everything possible, not even > > looking at the CPU model table". > > How do you define "best"? You could have a model that lacks feature F1 > and a model that lacks feature F2. That's one reason we never implemented it. :) In other words: what's really "best" is not something QEMU can decide alone. > > Adding features on top of an existing model is what libvirt's <cpu mode='host-model'/> element does, and it's broken. It's broken because > some features do not work unless you also bump the level (for example > xsave, my favorite example for CPUID bugs, requires leaf 0xD to be present). I want to fix that. We have existing code to bump level when features on leaf 0x7 are present, and there's no reason we can't generalize that to all features that need a specific leaf to be present. We just need to be careful to keep backwards compatibility. > > > Anyway, it makes sense to have a name for the "enable everything" mode > > (whatever it is), and simply make "qemu64" an alias to it when in TCG > > mode. > > Or conversely, say "qemu64" is { baseline for KVM, enable-everything for > TCG }. Then "-cpu best" and "-cpu qemu64" would effectively be synonyms > on TCG. This is another way to see it. But I prefer to treat it as just a (temporary?) alias to a meaningful model name, because the only reason we are keeping the "qemu64" name (instead of simply making "-cpu best/maximum/whatever" the default on TCG and "-cpu kvm64" the default on KVM) is for compatibility with existing management code. -- Eduardo
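The generalization Eduardo describes (raise the CPUID level whenever an enabled feature needs a leaf above it) can be sketched roughly as follows. This is a minimal standalone illustration, not QEMU code: `struct feature_leaf_req` and `min_cpuid_level()` are hypothetical names, and the only leaf numbers assumed are the two mentioned in the thread (0x7 and 0xD).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a feature and the lowest basic CPUID leaf it needs. */
struct feature_leaf_req {
    bool     enabled;        /* is the feature turned on for this guest? */
    uint32_t required_leaf;  /* e.g. 0x7 for leaf-7 features, 0xD for xsave */
};

/*
 * Return the CPUID level (highest basic leaf) needed so that every
 * enabled feature has its leaf present. The configured level is only
 * ever raised, never lowered.
 */
static uint32_t min_cpuid_level(uint32_t configured_level,
                                const struct feature_leaf_req *reqs,
                                size_t n)
{
    uint32_t level = configured_level;

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].enabled && reqs[i].required_leaf > level) {
            level = reqs[i].required_leaf;
        }
    }
    return level;
}
```

With a starting level of 4 and xsave enabled, the helper returns 0xD; a level that is already high enough is left alone. The backwards-compatibility concern raised above (existing configurations must keep the level they had) is not handled by this sketch.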
Re: perf stat in KVM guest cannot get LLC-loads hardware cache event
Inside the guest, I am using "perf stat -e dTLB-load-misses -e dTLB-loads -e L1-dcache-loads -e L1-dcache-load-misses -e L1-dcache-prefetch-misses ", followed by the parsec command. Thanks. - Hui On Wed, Aug 27, 2014 at 12:28 PM, Liang, Kan wrote: > >> > Hi, Kan, >> > >> > The dTLB-load-misses is 0, but it shows 80.00%hit, does that mean the >> > TLB- load miss is 0.8 * (dTLB-loads). Thanks. >> > >> > Performance counter stats for './parsecmgmt -a run -i native -c >> > gcc-hooks -n >> > 1 -p freqmine': > > I'm not familiar with parsecmgmt. What’s the perf commands it calls? > >> > >> > >> > 0 dTLB-load-misses #0.00% of all dTLB >> > cache hits [80.00%] >> > >> >782,565,273,315 dTLB-loads >> > [80.00%] >> > >> >782,552,911,616 L1-dcache-loads >> > [80.00%] >> > >> > 5,810,697,456 L1-dcache-load-misses #0.74% of all >> > L1-dcache hits [80.00%] >> > >> > 2,145,907,209 L1-dcache-prefetch-misses >> > [80.00%] >> > >> > - Hui >> > >> > On Wed, Aug 27, 2014 at 9:05 AM, Liang, Kan wrote: >> > > >> > > >> > >> >> > >> Dear KVM developers: >> > >> I am trying use perf stat inside a VM to obtain some hardware cache >> > >> performance counter values. >> > >> The perf stat can report some numbers for L1 and TLB related >> > >> counters. But for the LLC-loads and LLC-load-misses, the numbers >> > >> are always 0. It seems that the these offcore events are not >> > >> exposed to the >> > guest. >> > >> >> > >> Is this a bug in Qemu or KVM? >> > >> >> > > >> > > There is no offcore virtualization support in KVM yet. >> > > For you case, I guess you are using paravirt for guest, so it should be >> > > 0. >> > > Otherwise, you should get #GP in guest. >> > > >> > > >> > >> My testbed is >> > >> >> > >> Host kernel: 3.12.26 >> > >> Qemu: 2.1.0 >> > >> CPU: Intel Ivy bridge 2620 >> > >> VM boosted by qemu, with -cpu host. >> > >> >> > >> Thanks. 
>> > >> >> > >> - Hui Kang
RE: perf stat in KVM guest cannot get LLC-loads hardware cache event
> > Inside the guest, I am using "perf stat -e dTLB-load-misses -e > > dTLB-loads -e L1-dcache-loads -e L1-dcache-load-misses -e > > L1-dcache-prefetch-misses ", followed by the parsec command. > > The miss/hit ratio is the first number after "#". In your case, 0.00% is the miss/hit ratio for the dTLB cache and 0.74% is the miss/hit ratio for the L1 dcache. I have no idea what 80.00% means. > > > > On Wed, Aug 27, 2014 at 12:28 PM, Liang, Kan wrote: > > > > > >> > Hi, Kan, > > >> > > > >> > The dTLB-load-misses is 0, but it shows 80.00%hit, does that mean > > >> > the > > >> > TLB- load miss is 0.8 * (dTLB-loads). Thanks. > > >> > > > >> > > > >> > 0 dTLB-load-misses #0.00% of all dTLB > > >> > cache hits [80.00%] > > >> > > > >> >782,565,273,315 dTLB-loads > > >> > [80.00%] > > >> > > > >> >782,552,911,616 L1-dcache-loads > > >> > [80.00%] > > >> > > > >> > 5,810,697,456 L1-dcache-load-misses #0.74% of all > > >> > L1-dcache hits [80.00%] > > >> > > > >> > 2,145,907,209 L1-dcache-prefetch-misses > > >> > [80.00%] > > >> > > > >> > - Hui
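The percentages after "#" can be checked by hand from the raw counts in the quoted output. The sketch below assumes the figure is simply misses divided by loads, expressed in percent; `miss_ratio_percent()` is a hypothetical helper, not part of perf.

```c
#include <assert.h>
#include <stdint.h>

/* Miss ratio in percent: misses / accesses * 100. */
static double miss_ratio_percent(uint64_t misses, uint64_t accesses)
{
    assert(accesses != 0);  /* a ratio is undefined without any accesses */
    return 100.0 * (double)misses / (double)accesses;
}
```

Feeding it the L1 numbers from the output (5,810,697,456 misses over 782,552,911,616 loads) gives roughly 0.74, and zero dTLB misses give 0.00, matching what perf printed. The bracketed [80.00%] is a separate figure: in perf stat output it normally indicates the share of the run during which that event was actually being counted, which happens when more events are requested than there are hardware counters. It is not a hit rate, so the TLB miss count is not 0.8 * dTLB-loads.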
kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to run_tests configuration)
> Thanks, looks good. Are there more failures? > > Paolo > Paolo, Thanks for applying those patches! I now only see the two failures on my machine: model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz I'm running with the tip of kvm master: 0ac625df43ce9d085d4ff54c1f739611f4308b13 (Merge tag 'kvm-s390-20140825') sudo ./x86-run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline | grep -v PASS qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline enabling apic enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 apic version: 1050014 x2apic enabled FAIL: tsc deadline timer clearing tsc deadline timer enabled SUMMARY: 16 tests, 1 unexpected failures Return value from qemu: 3 sudo ./x86-run x86/kvmclock_test.flat -smp 2 --append "1000 `date +%s`" qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append 1000 1409174399 enabling apic enabling apic kvm-clock: cpu 0, msr 0x:44d4c0 kvm-clock: cpu 0, msr 0x:44d4c0 Wallclock test, threshold 5 Seconds get from host: 1409174399 Seconds get from kvmclock: 1409173176 Offset:-1223 offset too large! Check the stability of raw cycle ... Worst warp -1222831419357 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -1222831419357 Raw cycle is not stable Monotonic cycle test: Worst warp -1219118621614 Total vcpus: 2 Test loops: 1000 Total warps: 1 Total stalls: 0 Worst warp: -1219118621614 Measure the performance of raw cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1065145046 Measure the performance of adjusted cycle ... Total vcpus: 2 Test loops: 1000 TSC cycles: 1126981511 Return value from qemu: 3 Let me know if anything comes to mind. I can also look more deeply into these failures. 
Thanks, --chris j arges
Re: kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to run_tests configuration)
- Original Message - > From: "Chris J Arges" > To: "Paolo Bonzini" , kvm@vger.kernel.org > Sent: Wednesday, 27 August 2014 23:24:14 > Subject: kvm-unit-test failures (was: [PATCH 1/2 v3] add check parameter to > run_tests configuration) > > > > Thanks, looks good. Are there more failures? > > > > Paolo > > > > Paolo, > Thanks for applying those patches! > > I now only see the two failures on my machine: > model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz > > I'm running with the tip of kvm master: > 0ac625df43ce9d085d4ff54c1f739611f4308b13 (Merge tag 'kvm-s390-20140825') > > sudo ./x86-run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline | > grep -v PASS > qemu-system-x86_64 -enable-kvm -device pc-testdev -device > isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio > -device pci-testdev -kernel x86/apic.flat -smp 2 -cpu > qemu64,+x2apic,+tsc-deadline > enabling apic > enabling apic > paging enabled > cr0 = 80010011 > cr3 = 7fff000 > cr4 = 20 > apic version: 1050014 > x2apic enabled > FAIL: tsc deadline timer clearing > tsc deadline timer enabled This is fixed in kvm/next (3.18). > SUMMARY: 16 tests, 1 unexpected failures > Return value from qemu: 3 > > sudo ./x86-run x86/kvmclock_test.flat -smp 2 --append "1000 `date +%s`" > qemu-system-x86_64 -enable-kvm -device pc-testdev -device > isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio > -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append > 1000 1409174399 > enabling apic > enabling apic > kvm-clock: cpu 0, msr 0x:44d4c0 > kvm-clock: cpu 0, msr 0x:44d4c0 > Wallclock test, threshold 5 > Seconds get from host: 1409174399 > Seconds get from kvmclock: 1409173176 > Offset:-1223 Weird, the clock in your VM is 20 minutes behind the host's. Is the offset always around -1200? What happens if you reboot? (I get 0, 1 or sometimes 2.)
Paolo
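The "20 minutes" figure follows directly from the two timestamps in the log. A small sketch of the arithmetic, assuming the test reports guest seconds minus host seconds and flags anything whose magnitude exceeds the printed threshold; both function names are hypothetical and this is not the kvmclock_test source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Offset of the guest wallclock relative to the host, in seconds. */
static int64_t wallclock_offset(int64_t host_sec, int64_t guest_sec)
{
    return guest_sec - host_sec;
}

/* Nonzero when the offset magnitude exceeds the allowed threshold. */
static int offset_too_large(int64_t offset, int64_t threshold)
{
    assert(threshold >= 0);
    return llabs((long long)offset) > (long long)threshold;
}
```

With the values from the log, 1409173176 - 1409174399 = -1223 seconds, i.e. about 20 minutes and 23 seconds behind the host, far beyond the threshold of 5. The offsets of 0, 1 or 2 that Paolo sees would pass.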
[Bug 83381] New: 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
https://bugzilla.kernel.org/show_bug.cgi?id=83381

Bug ID: 83381
Summary: 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
Product: Virtualization
Version: unspecified
Kernel Version: 3.17.0-rc1
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: kvm
Assignee: virtualization_...@kernel-bugs.osdl.org
Reporter: chao.z...@intel.com
Regression: No

Environment:
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): linux
kvm.git Commit: 54ad89b05ec49b90790de814647b244d3d2cc5ca
qemu.git Commit: c47c61be8dcd91689c8fc6db924d684c3b39
Host Kernel Version: 3.17.0-rc1
Hardware: HSW-Desktop

Bug detailed description:
The 4-port 82576 detects only 2 ports when "intel_iommu=on pci=assign-busses" is added to the kernel line in grub.
Without "intel_iommu=on pci=assign-busses", the 4-port 82576 detects 4 ports.
With "intel_iommu=on" added, the 4-port 82576 detects 4 ports.
With "pci=assign-busses" added, the 4-port 82576 detects 2 ports.

Note: with kernel version 3.0.0+ (kvm+qemu-kvm: e72ef590_fda19064):
1. The 4-port 82576 detects 4 ports when "intel_iommu=on pci=assign-busses" is added.
2. When "intel_iommu=on" or "pci=assign-busses" is added to boot the system, the 4-port 82576 detects 4 ports.

Reproduce steps:
1. Add "intel_iommu=on pci=assign-busses" to the kernel line in grub.
2. Boot up the system.
3. lspci | grep Eth

Current result: the 4-port 82576 detects 2 ports.
Expected result: the 4-port 82576 detects 4 ports.

Basic root-causing log:

-- You are receiving this mail because: You are watching the assignee of the bug.
[Bug 83381] 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
https://bugzilla.kernel.org/show_bug.cgi?id=83381 --- Comment #2 from Zhou, Chao --- Created attachment 148571 --> https://bugzilla.kernel.org/attachment.cgi?id=148571&action=edit lspci good
[Bug 83381] 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
https://bugzilla.kernel.org/show_bug.cgi?id=83381 --- Comment #1 from Zhou, Chao --- Created attachment 148561 --> https://bugzilla.kernel.org/attachment.cgi?id=148561&action=edit dmesg-good
[Bug 83381] 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
https://bugzilla.kernel.org/show_bug.cgi?id=83381 --- Comment #4 from Zhou, Chao --- Created attachment 148591 --> https://bugzilla.kernel.org/attachment.cgi?id=148591&action=edit dmesg-bad
[Bug 83381] 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
https://bugzilla.kernel.org/show_bug.cgi?id=83381 --- Comment #3 from Zhou, Chao --- Created attachment 148581 --> https://bugzilla.kernel.org/attachment.cgi?id=148581&action=edit lspci-t-good
[Bug 83381] 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
https://bugzilla.kernel.org/show_bug.cgi?id=83381 --- Comment #5 from Zhou, Chao --- Created attachment 148601 --> https://bugzilla.kernel.org/attachment.cgi?id=148601&action=edit lspci-bad
[Bug 83381] 4-ports 82576 detect 2 ports when add "intel_iommu=on pci=assign-busses".
https://bugzilla.kernel.org/show_bug.cgi?id=83381 --- Comment #6 from Zhou, Chao --- Created attachment 148611 --> https://bugzilla.kernel.org/attachment.cgi?id=148611&action=edit lspci-t-bad
Re: [PATCH v4] KVM: PPC: BOOKE: Emulate debug registers and exception
On Wed, 2014-08-27 at 13:23 +0200, Alexander Graf wrote: > > On 13.08.14 11:09, Bharat Bhushan wrote: > > This patch emulates debug registers and debug exception > > to support guest using debug resource. This enables running > > gdb/kgdb etc in guest. > > > > On BOOKE architecture we cannot share debug resources between QEMU and > > guest because: > > When QEMU is using debug resources then debug exception must > > be always enabled. To achieve this we set MSR_DE and also set > > MSRP_DEP so guest cannot change MSR_DE. > > > > When emulating debug resource for guest we want guest > > to control MSR_DE (enable/disable debug interrupt on need). > > > > So above mentioned two configuration cannot be supported > > at the same time. So the result is that we cannot share > > debug resources between QEMU and Guest on BOOKE architecture. > > > > In the current design QEMU gets priority over guest, this means that if > > QEMU is using debug resources then guest cannot use them and if guest is > > using debug resource then QEMU can overwrite them. > > > > Signed-off-by: Bharat Bhushan > > Scott, could you please recheck whether you're ok with it now? :) I'm OK with it. -Scott
RE: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it.
Paolo Bonzini wrote on 2014-08-27: > Il 27/08/2014 16:05, Wei Wang ha scritto: > > Guest may mask the IOAPIC entry before issue EOI. In such case, EOI > > will not be intercepted by the hypervisor, since the corresponding bit > > in eoi_exit_bitmap is not set after the masking of IOAPIC entry. > > > > The solution here is to OR eoi_exit_bitmap with tmr to make sure that > > all level-triggered interrupts have their bits in eoi_exit_bitmap set. > > This commit message does not explain why this change is necessary, and the > relationship between this patch and the previous one. > > For example: > > -- > Commit 0f6c0a740b7d (KVM: x86: always exit on EOIs for interrupts listed in > the IOAPIC redir table, 2014-07-30) fixed an APICv bug where an incorrect EOI > exit bitmap triggered an interrupt storm inside the guest. > > There is a corner case for which that patch would have disabled accelerated > EOI unnecessarily. Suppose you have: > > - a device that was the sole user of an INTx interrupt and is hot-unplugged > > - an OS that masks the INTx interrupt entry in the IOAPIC after the unplug > > - another device that uses MSI and is subsequently hot-plugged > > If the OS chooses to reuse the same LAPIC interrupt vector for the two > interrupts, the patch would have left the vector in the EOI exit bitmap, > because > KVM takes into account the stale entry in the IOAPIC redirection table. > > We do know exactly which masked interrupts are still in-service and thus > require broadcasting an EOI to the IOAPIC: this information is in the TMR. > So, > this patch ORs the EOI exit bitmap provided by the ioapic with the TMR > register. > Thanks to the previous patch, an active level-triggered interrupt will always > be > included in the EOI exit bitmap. > -- > > However, see below. 
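The OR step described in the proposed commit message can be illustrated in isolation. The sketch below is a standalone approximation, not the kernel code from the patch: it assumes the TMR is available as eight 32-bit words covering vectors 0-255 and the EOI exit bitmap as four 64-bit words, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TMR_WORDS     8  /* 8 x 32 bits = 256 interrupt vectors */
#define EXITMAP_WORDS 4  /* 4 x 64 bits = 256 interrupt vectors */

/*
 * OR every level-triggered vector recorded in the TMR into the EOI
 * exit bitmap, so an EOI for such a vector is always intercepted.
 * Bits already set in the bitmap are preserved.
 */
static void merge_tmr_into_eoi_exitmap(uint64_t exitmap[EXITMAP_WORDS],
                                       const uint32_t tmr[TMR_WORDS])
{
    assert(exitmap != NULL && tmr != NULL);
    for (size_t i = 0; i < TMR_WORDS; i++) {
        /* TMR word i covers vectors 32*i .. 32*i+31: even words go to
         * the low half of a 64-bit bitmap word, odd words to the high half. */
        exitmap[i / 2] |= (uint64_t)tmr[i] << (32 * (i % 2));
    }
}

/* Nonzero when an EOI for the given vector would cause an exit. */
static int vector_exits_on_eoi(const uint64_t exitmap[EXITMAP_WORDS],
                               unsigned vector)
{
    assert(vector < 256);
    return (int)((exitmap[vector / 64] >> (vector % 64)) & 1u);
}
```

Unlike the pointer cast in the quoted patch, the explicit shift here does not depend on byte order; on x86 both approaches place each TMR bit at the same vector position.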
> > > Tested-by: Rongrong Liu > > Signed-off-by: Yang Zhang > > Signed-off-by: Wei Wang > > --- > > arch/x86/kvm/lapic.c | 12 > > arch/x86/kvm/lapic.h |1 + > > arch/x86/kvm/x86.c |1 + > > virt/kvm/ioapic.c|7 --- > > 4 files changed, 18 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index > > 8c1162d..0fcac3c 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -539,6 +539,18 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, > u32 *tmr) > > } > > } > > > > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 > > +*eoi_exit_bitmap) { > > + struct kvm_lapic *apic = vcpu->arch.apic; > > + u32 i; > > + u32 tmr; > > + > > + for (i = 0; i < 8; i++) { > > + tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i); > > + *((u32 *)eoi_exit_bitmap + i) |= tmr; > > + } > > +} > > + > > static void apic_update_ppr(struct kvm_lapic *apic) { > > u32 tpr, isrv, ppr, old_ppr; > > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index > > 6a11845..d2b96f2 100644 > > --- a/arch/x86/kvm/lapic.h > > +++ b/arch/x86/kvm/lapic.h > > @@ -55,6 +55,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); > > > > void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); void > > kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); > > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 > > +*eoi_exit_bitmap); > > int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); > > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int > > kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index > > d401684..d23b558 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu > > *vcpu) > > > > kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); > > kvm_apic_update_tmr(vcpu, tmr); > > + kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap); > > 
kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); } > > > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index > > e8ce34c..ed15936 100644 > > --- a/virt/kvm/ioapic.c > > +++ b/virt/kvm/ioapic.c > > @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, > u64 *eoi_exit_bitmap, > > spin_lock(&ioapic->lock); > > for (index = 0; index < IOAPIC_NUM_PINS; index++) { > > e = &ioapic->redirtbl[index]; > > - if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > > - kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) > || > > - index == RTC_GSI) { > > + if ((!e->fields.mask > > + && e->fields.trig_mode == IOAPIC_LEVEL_TRIG) > > + || kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > > + index) || index == RTC_GSI) { > > if (kvm_apic_match_dest(vcpu, NULL, 0, > > e->fields.dest_id, e->fields.dest_mode)) { > > __set_bit(e->fields.