Possible approaches to limit csw overhead
Hello, I have a rather practical question: is it possible to limit the number of VM-initiated events for a single VM? As an example, a VM which experienced an OOM and is effectively stuck dead generates a lot of unnecessary context switches, triggering do_raw_spin_lock very often and therefore increasing the overall compute workload. This could be done via reactive limitation of the CPU quota via cgroups, but such a method is quite impractical because every orchestration solution will need to implement its own piece of code to detect such VM states and act properly. I wonder if there may be a proposal which would do this job better than a userspace-implemented perf statistics loop. Thanks! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
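[Editor's illustration] As a concrete sketch of the reactive cgroup workaround mentioned in the message above: the snippet below caps a VM by writing a CPU bandwidth quota into its cgroup v1 "cpu" controller (cgroup v2 exposes the same knob as a single cpu.max file). The cgroup directory, quota values, and helper names are illustrative assumptions, not an existing tool or API.

/*
 * Hypothetical userspace sketch: throttle a runaway VM via the cgroup v1
 * CPU bandwidth controller. Paths and names are examples only.
 */
#include <stdio.h>

static int write_cgroup_value(const char *cgroup_dir, const char *file, long value)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%ld\n", value);
    fclose(f);
    return 0;
}

/* e.g. limit_vm_cpu("/sys/fs/cgroup/cpu/machine/vm-123", 10000, 100000)
 * caps the VM at roughly 10% of one CPU. */
static int limit_vm_cpu(const char *cgroup_dir, long quota_us, long period_us)
{
    if (write_cgroup_value(cgroup_dir, "cpu.cfs_period_us", period_us))
        return -1;
    return write_cgroup_value(cgroup_dir, "cpu.cfs_quota_us", quota_us);
}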
Re: nested KVM slower than QEMU with gnumach guest kernel
Jan Kiszka, le Mon 17 Nov 2014 07:28:23 +0100, a écrit : > > AIUI, the external interrupt is 0xf6, i.e. Linux' IRQ_WORK_VECTOR. I > > however don't see any of them, neither in L0's /proc/interrupts, nor in > > L1's /proc/interrupts... > > I suppose this is a SMP host and guest? L0 is a hyperthreaded quad-core, but L1 is only 1 VCPU. In the trace, L1 apparently happens to have always been scheduled on the same L0 CPU: trace-cmd tells me that CPUs [0-2,4-7] are empty. Samuel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nested KVM slower than QEMU with gnumach guest kernel
On Sun, Nov 16, 2014 at 11:18:28PM +0100, Samuel Thibault wrote: > Hello, > > Jan Kiszka, le Wed 12 Nov 2014 00:42:52 +0100, a écrit : > > On 2014-11-11 19:55, Samuel Thibault wrote: > > > jenkins.debian.net is running inside a KVM VM, and it runs nested > > > KVM guests for its installation attempts. This goes fine with Linux > > > kernels, but it is extremely slow with gnumach kernels. > > > You can try to catch a trace (ftrace) on the physical host. > > > > I suspect the setup forces a lot of instruction emulation, either on L0 > > or L1. And that is slower than QEMU is KVM does not optimize like QEMU does. > > Here is a sample of trace-cmd output dump: the same kind of pattern > repeats over and over, with EXTERNAL_INTERRUPT happening mostly > every other microsecond: > > qemu-system-x86-9752 [003] 4106.187755: kvm_exit: reason > EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6 > qemu-system-x86-9752 [003] 4106.187756: kvm_entry:vcpu 0 > qemu-system-x86-9752 [003] 4106.187757: kvm_exit: reason > EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6 > qemu-system-x86-9752 [003] 4106.187758: kvm_entry:vcpu 0 > qemu-system-x86-9752 [003] 4106.187759: kvm_exit: reason > EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6 > qemu-system-x86-9752 [003] 4106.187760: kvm_entry:vcpu 0 > > The various functions being interrupted are vmx_vcpu_run > (0xa02848b1 and 0xa0284972), handle_io > (0xa027ee62), vmx_get_cpl (0xa027a7de), > load_vmc12_host_state (0xa027ea31), native_read_tscp > (0x81050a84), native_write_msr_safe (0x81050aa6), > vmx_decache_cr0_guest_bits (0xa027a384), > vmx_handle_external_intr (0xa027a54d). > > AIUI, the external interrupt is 0xf6, i.e. Linux' IRQ_WORK_VECTOR. I > however don't see any of them, neither in L0's /proc/interrupts, nor in > L1's /proc/interrupts... > Do you know how gnumach timekeeping works? Does it have a timer that fires each 1ms? Which clock device is it using? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nested KVM slower than QEMU with gnumach guest kernel
Gleb Natapov, le Mon 17 Nov 2014 10:58:45 +0200, a écrit : > Do you know how gnumach timekeeping works? Does it have a timer that fires > each 1ms? > Which clock device is it using? It uses the PIT every 10ms, in square mode (PIT_C0|PIT_SQUAREMODE|PIT_READMODE = 0x36). Samuel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
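[Editor's illustration] For context, 0x36 selects PIT channel 0, low-byte/high-byte access, mode 3 (square wave). A minimal sketch of programming a 10 ms tick that way is shown below; it is illustrative userspace C, not gnumach source, and assumes x86 port I/O access has already been granted (e.g. via ioperm()).

#include <sys/io.h>             /* outb(); requires ioperm() and root */

#define PIT_INPUT_HZ  1193182   /* PIT input clock in Hz */
#define TICK_HZ       100       /* 100 Hz -> 10 ms period */
#define PIT_CMD       0x43      /* mode/command register */
#define PIT_CH0       0x40      /* channel 0 data port */
#define PIT_MODE      0x36      /* channel 0, lobyte/hibyte, mode 3 (square wave) */

static void pit_program_10ms_tick(void)
{
    unsigned short divisor = PIT_INPUT_HZ / TICK_HZ;  /* 11931 */

    outb(PIT_MODE, PIT_CMD);
    outb(divisor & 0xff, PIT_CH0);    /* low byte first ... */
    outb(divisor >> 8, PIT_CH0);      /* ... then high byte */
}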
Re: nested KVM slower than QEMU with gnumach guest kernel
On 2014-11-17 10:03, Samuel Thibault wrote: > Gleb Natapov, le Mon 17 Nov 2014 10:58:45 +0200, a écrit : >> Do you know how gnumach timekeeping works? Does it have a timer that fires >> each 1ms? >> Which clock device is it using? > > It uses the PIT every 10ms, in square mode > (PIT_C0|PIT_SQUAREMODE|PIT_READMODE = 0x36). Wow... how retro. That feature might be unsupported - does user space irqchip work better? Jan
Re: nested KVM slower than QEMU with gnumach guest kernel
Jan Kiszka, le Mon 17 Nov 2014 10:04:37 +0100, a écrit : > On 2014-11-17 10:03, Samuel Thibault wrote: > > Gleb Natapov, le Mon 17 Nov 2014 10:58:45 +0200, a écrit : > >> Do you know how gnumach timekeeping works? Does it have a timer that fires > >> each 1ms? > >> Which clock device is it using? > > > > It uses the PIT every 10ms, in square mode > > (PIT_C0|PIT_SQUAREMODE|PIT_READMODE = 0x36). > > Wow... how retro. That feature might be unsupported - does user space > irqchip work better? I had indeed tried giving -machine kernel_irqchip=off to the L2 kvm, with the same bad performance and external_interrupt in the trace. Samuel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nested KVM slower than QEMU with gnumach guest kernel
On Mon, Nov 17, 2014 at 10:10:25AM +0100, Samuel Thibault wrote: > Jan Kiszka, le Mon 17 Nov 2014 10:04:37 +0100, a écrit : > > On 2014-11-17 10:03, Samuel Thibault wrote: > > > Gleb Natapov, le Mon 17 Nov 2014 10:58:45 +0200, a écrit : > > >> Do you know how gnumach timekeeping works? Does it have a timer that > > >> fires each 1ms? > > >> Which clock device is it using? > > > > > > It uses the PIT every 10ms, in square mode > > > (PIT_C0|PIT_SQUAREMODE|PIT_READMODE = 0x36). > > > > Wow... how retro. That feature might be unsupported - does user space > > irqchip work better? > > I had indeed tried giving -machine kernel_irqchip=off to the L2 kvm, > with the same bad performance and external_interrupt in the trace. > They will always be in the trace, but do you see them each ms or each 10ms with user space irqchip? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH 1/2] kvm: x86: mmu: return zero if s > e in rsvd_bits()
On 17/11/2014 02:34, Chen, Tiejun wrote: > On 2014/11/14 18:06, Paolo Bonzini wrote: >> >> >> On 14/11/2014 10:31, Tiejun Chen wrote: >>> In some real scenarios 'start' may not be less than 'end' like >>> maxphyaddr = 52. >>> >>> Signed-off-by: Tiejun Chen >>> --- >>> arch/x86/kvm/mmu.h | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h >>> index bde8ee7..0e98b5e 100644 >>> --- a/arch/x86/kvm/mmu.h >>> +++ b/arch/x86/kvm/mmu.h >>> @@ -58,6 +58,8 @@ >>> >>> static inline u64 rsvd_bits(int s, int e) >>> { >>> +if (unlikely(s > e)) >>> +return 0; >>> return ((1ULL << (e - s + 1)) - 1) << s; >>> } >>> >>> >> >> s == e + 1 is supported: >> >> (1ULL << (e - (e + 1) + 1)) - 1) << s == > > (1ULL << (e - (e + 1) + 1)) - 1) << s > = (1ULL << (e - e - 1) + 1)) - 1) << s > = (1ULL << (-1) + 1)) - 1) << s no, ((1ULL << (-1 + 1)) - 1) << s > = (1ULL << (0) - 1) << s ((1ULL << (0)) - 1) << s > = (1ULL << (- 1) << s (1 - 1) << s 0 << s Paolo > > Am I missing something? > > Thanks > Tiejun > >> (1ULL << 0) << s == >> 0 >> >> Is there any case where s is even bigger? >> >> Paolo >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
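[Editor's illustration] Worked through as a standalone toy program (not kernel code), the arithmetic Paolo is pointing at looks like this: for s == e + 1 the shift count collapses to 0 and the mask is already 0, so no extra 's > e' check is needed for that case.

#include <stdio.h>

/* Same expression as the kernel helper under discussion. */
static unsigned long long rsvd_bits(int s, int e)
{
    return ((1ULL << (e - s + 1)) - 1) << s;
}

int main(void)
{
    /* maxphyaddr == 52 case: s == e + 1 already yields an empty mask. */
    printf("rsvd_bits(52, 51) = %#llx\n", rsvd_bits(52, 51)); /* prints 0 */

    /* Normal case: bits 12..51 set. */
    printf("rsvd_bits(12, 51) = %#llx\n", rsvd_bits(12, 51));
    return 0;
}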
Re: [PATCH 0/3] KVM: simplification to the memslots code
On 17/11/2014 02:56, Takuya Yoshikawa wrote: >> > here are a few small patches that simplify __kvm_set_memory_region >> > and associated code. Can you please review them? > Ah, already queued. Sorry for being late to respond. While they are not in kvm/next, there's time to add Reviewed-by's and all that. kvm/queue basically means "I want Fengguang to compile-test them, some testing done on x86_64". Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH 1/2] kvm: x86: mmu: return zero if s > e in rsvd_bits()
On 2014/11/17 17:22, Paolo Bonzini wrote: On 17/11/2014 02:34, Chen, Tiejun wrote: On 2014/11/14 18:06, Paolo Bonzini wrote: On 14/11/2014 10:31, Tiejun Chen wrote: In some real scenarios 'start' may not be less than 'end' like maxphyaddr = 52. Signed-off-by: Tiejun Chen --- arch/x86/kvm/mmu.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index bde8ee7..0e98b5e 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -58,6 +58,8 @@ static inline u64 rsvd_bits(int s, int e) { +if (unlikely(s > e)) +return 0; return ((1ULL << (e - s + 1)) - 1) << s; } s == e + 1 is supported: (1ULL << (e - (e + 1) + 1)) - 1) << s == (1ULL << (e - (e + 1) + 1)) - 1) << s = (1ULL << (e - e - 1) + 1)) - 1) << s = (1ULL << (-1) + 1)) - 1) << s no, You're right since I'm seeing "()" wrongly. Sorry to bother you. Thanks Tiejun -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] KVM: simplification to the memslots code
On 2014/11/17 18:23, Paolo Bonzini wrote: > > > On 17/11/2014 02:56, Takuya Yoshikawa wrote: here are a few small patches that simplify __kvm_set_memory_region and associated code. Can you please review them? >> Ah, already queued. Sorry for being late to respond. > > While they are not in kvm/next, there's time to add Reviewed-by's and > all that. kvm/queue basically means "I want Fengguang to compile-test > them, some testing done on x86_64". > > Paolo > OK. I reviewed patches 2/3 and 3/3 and saw no problems there, only some improvements. Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nested KVM slower than QEMU with gnumach guest kernel
Gleb Natapov, le Mon 17 Nov 2014 11:21:22 +0200, a écrit : > On Mon, Nov 17, 2014 at 10:10:25AM +0100, Samuel Thibault wrote: > > Jan Kiszka, le Mon 17 Nov 2014 10:04:37 +0100, a écrit : > > > On 2014-11-17 10:03, Samuel Thibault wrote: > > > > Gleb Natapov, le Mon 17 Nov 2014 10:58:45 +0200, a écrit : > > > >> Do you know how gnumach timekeeping works? Does it have a timer that > > > >> fires each 1ms? > > > >> Which clock device is it using? > > > > > > > > It uses the PIT every 10ms, in square mode > > > > (PIT_C0|PIT_SQUAREMODE|PIT_READMODE = 0x36). > > > > > > Wow... how retro. That feature might be unsupported - does user space > > > irqchip work better? > > > > I had indeed tried giving -machine kernel_irqchip=off to the L2 kvm, > > with the same bad performance and external_interrupt in the trace. > > > They will always be in the trace, but do you see them each ms or each 10ms > with user space irqchip? The external interrupts are every 1 *microsecond*, not millisecond, with irqchip=off or not. Samuel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nested KVM slower than QEMU with gnumach guest kernel
Also, I have made gnumach show a timer counter, it does get PIT interrupts every 10ms as expected, not more often. Samuel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost + multiqueue + RSS question.
On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: > On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: > > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: > > > Hi Michael, > > > > > > I am playing with vhost multiqueue capability and have a question about > > > vhost multiqueue and RSS (receive side steering). My setup has Mellanox > > > ConnectX-3 NIC which supports multiqueue and RSS. Network related > > > parameters for qemu are: > > > > > >-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 > > >-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 > > > > > > In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue. > > > > > > I am running one tcp stream into the guest using iperf. Since there is > > > only one tcp stream I expect it to be handled by one queue only but > > > this seams to be not the case. ethtool -S on a host shows that the > > > stream is handled by one queue in the NIC, just like I would expect, > > > but in a guest all 4 virtio-input interrupt are incremented. Am I > > > missing any configuration? > > > > I don't see anything obviously wrong with what you describe. > > Maybe, somehow, same irqfd got bound to multiple MSI vectors? > It does not look like this is what is happening judging by the way > interrupts are distributed between queues. They are not distributed > uniformly and often I see one queue gets most interrupt and others get > much less and then it changes. Weird. It would happen if you transmitted from multiple CPUs. You did pin iperf to a single CPU within guest, did you not? > -- > Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
Hi Paolo, On 11/11/14, 1:28 AM, Paolo Bonzini wrote: On 10/11/2014 15:23, Avi Kivity wrote: It's not surprising [1]. Since the meaning of some PTE bits change [2], the TLB has to be flushed. In VMX we have VPIDs, so we only need to flush if EFER changed between two invocations of the same VPID, which isn't the case. Is a TLB flush needed if the guest is UP? Regards, Wanpeng Li [1] after the fact [2] although those bits were reserved with NXE=0, so they shouldn't have any TLB footprint You're right that this is not that surprising after the fact, and that both Sandy Bridge and Ivy Bridge have VPIDs (even the non-Xeon ones). This is also why I'm curious about the Nehalem. However note that even toggling the SCE bit is flushing the TLB. The NXE bit is not being toggled here! That's the more surprising part. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On 17/11/2014 12:17, Wanpeng Li wrote: >> >>> It's not surprising [1]. Since the meaning of some PTE bits change [2], >>> the TLB has to be flushed. In VMX we have VPIDs, so we only need to flush >>> if EFER changed between two invocations of the same VPID, which isn't >>> the case. > > Is a TLB flush needed if the guest is UP? The wrmsr is in the host, and the TLB flush is done in the processor microcode. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost + multiqueue + RSS question.
On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: > > On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: > > > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: > > > > Hi Michael, > > > > > > > > I am playing with vhost multiqueue capability and have a question about > > > > vhost multiqueue and RSS (receive side steering). My setup has Mellanox > > > > ConnectX-3 NIC which supports multiqueue and RSS. Network related > > > > parameters for qemu are: > > > > > > > >-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 > > > >-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 > > > > > > > > In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue. > > > > > > > > I am running one tcp stream into the guest using iperf. Since there is > > > > only one tcp stream I expect it to be handled by one queue only but > > > > this seams to be not the case. ethtool -S on a host shows that the > > > > stream is handled by one queue in the NIC, just like I would expect, > > > > but in a guest all 4 virtio-input interrupt are incremented. Am I > > > > missing any configuration? > > > > > > I don't see anything obviously wrong with what you describe. > > > Maybe, somehow, same irqfd got bound to multiple MSI vectors? > > It does not look like this is what is happening judging by the way > > interrupts are distributed between queues. They are not distributed > > uniformly and often I see one queue gets most interrupt and others get > > much less and then it changes. > > Weird. It would happen if you transmitted from multiple CPUs. > You did pin iperf to a single CPU within guest, did you not? > No, I didn't because I didn't expect it to matter for input interrupts. When I run iperf on the host, the rx queue that receives all the packets depends only on the connection itself, not on the cpu iperf is running on (I tested that). When I pin iperf in a guest I do indeed see that all interrupts are arriving to the same irq vector. Is the number after virtio-input in /proc/interrupts any indication of the queue a packet arrived on (on a host I can use ethtool -S to check what queue receives packets, but unfortunately this does not work for a virtio nic in a guest)? Because if it is, the way RSS works in virtio is not how it works on the host and not what I would expect after reading about RSS. The queue a packet arrives on should be calculated by hashing fields from the packet header only. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC v2 0/9] KVM-VFIO IRQ forward control
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Alex Williamson > Sent: Thursday, September 11, 2014 1:10 PM > To: Christoffer Dall > Cc: Eric Auger; eric.au...@st.com; marc.zyng...@arm.com; > linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; > kvm@vger.kernel.org; joel.sch...@amd.com; kim.phill...@freescale.com; > pau...@samba.org; g...@kernel.org; pbonz...@redhat.com; > linux-ker...@vger.kernel.org; patc...@linaro.org; will.dea...@arm.com; > a.mota...@virtualopensystems.com; a.r...@virtualopensystems.com; > john.li...@huawei.com > Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control > > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote: > > On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote: > > > On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote: > > > > This RFC proposes an integration of "ARM: Forwarding physical > > > > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in > > > > KVM. > > > > > > > > It enables to transform a VFIO platform driver IRQ into a forwarded > > > > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM > > > > switch can be avoided on guest virtual IRQ completion. Before this > > > > patch, a maintenance IRQ was triggered on the virtual IRQ completion. > > > > > > > > When the IRQ is forwarded, the VFIO platform driver does not need to > > > > disable the IRQ anymore. Indeed when returning from the IRQ handler > > > > the IRQ is not deactivated. Only its priority is lowered. This means > > > > the same IRQ cannot hit before the guest completes the virtual IRQ > > > > and the GIC automatically deactivates the corresponding physical IRQ. > > > > > > > > Besides, the injection still is based on irqfd triggering. The only > > > > impact on irqfd process is resamplefd is not called anymore on > > > > virtual IRQ completion since this latter becomes "transparent". > > > > > > > > The current integration is based on an extension of the KVM-VFIO > > > > device, previously used by KVM to interact with VFIO groups. The > > > > patch serie now enables KVM to directly interact with a VFIO > > > > platform device. The VFIO external API was extended for that purpose. > > > > > > > > Th KVM-VFIO device can get/put the vfio platform device, check its > > > > integrity and type, get the IRQ number associated to an IRQ index. > > > > > > > > The IRQ forward programming is architecture specific (virtual interrupt > > > > controller programming basically). However the whole infrastructure is > > > > kept generic. > > > > > > > > from a user point of view, the functionality is provided through new > > > > KVM-VFIO device commands, > KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ > > > > and the capability can be checked with KVM_HAS_DEVICE_ATTR. > > > > Assignment can only be changed when the physical IRQ is not active. > > > > It is the responsability of the user to do this check. 
> > > > > > > > This patch serie has the following dependencies: > > > > - "ARM: Forwarding physical interrupts to a guest VM" > > > > (http://lwn.net/Articles/603514/) in > > > > - [PATCH v3] irqfd for ARM > > > > - and obviously the VFIO platform driver serie: > > > > [RFC PATCH v6 00/20] VFIO support for platform devices on ARM > > > > https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html > > > > > > > > Integrated pieces can be found at > > > > ssh://git.linaro.org/people/eric.auger/linux.git > > > > on branch 3.17rc3_irqfd_forward_integ_v2 > > > > > > > > This was was tested on Calxeda Midway, assigning the xgmac main IRQ. > > > > > > > > v1 -> v2: > > > > - forward control is moved from architecture specific file into generic > > > > vfio.c module. > > > > only kvm_arch_set_fwd_state remains architecture specific > > > > - integrate Kim's patch which enables KVM-VFIO for ARM > > > > - fix vgic state bypass in vgic_queue_hwirq > > > > - struct kvm_arch_forwarded_irq moved from > arch/arm/include/uapi/asm/kvm.h > > > > to include/uapi/linux/kvm.h > > > > also irq_index renamed into index and guest_irq renamed into gsi > > > > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD > > > > - vfio_external_get_base_device renamed into vfio_external_base_device > > > > - vfio_external_get_type removed > > > > - kvm_vfio_external_get_base_device renamed into > kvm_vfio_external_base_device > > > > - __KVM_HAVE_ARCH_KVM_VFIO renamed into > __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > > > > > > > > Eric Auger (8): > > > > KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded > > > > IRQ > > > > KVM: ARM: VGIC: add forwarded irq rbtree lock > > > > VFIO: platform: handler tests whether the IRQ is forwarded > > > > KVM: KVM-VFIO: update user API to program forwarded IRQ > > > > VFIO: Extend external user API > > > > KVM: KVM-VFIO: add new VFIO external API hooks > > > > KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ > forwarding > > >
[v2][PATCH] kvm: x86: mmio: fix setting the present bit of mmio spte
In non-ept 64-bit of PAE case maxphyaddr may be 52bit as well, so we also need to disable mmio page fault. Here we can check MMIO_SPTE_GEN_HIGH_SHIFT directly to determine if we should set the present bit, and bring a little cleanup. Signed-off-by: Tiejun Chen --- v2: * Correct codes comments * Need to use "|=" to set the present bit arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu.c | 25 + arch/x86/kvm/x86.c | 30 -- 3 files changed, 26 insertions(+), 30 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index dc932d3..667f2b6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -809,6 +809,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask); void kvm_mmu_zap_all(struct kvm *kvm); +void kvm_set_mmio_spte_mask(void); void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm); unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm); void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages); diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ac1c4de..fe9a917 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -295,6 +295,31 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte) return likely(kvm_gen == spte_gen); } +/* + * Set the reserved bits and the present bit of an paging-structure + * entry to generate page fault with PFER.RSV = 1. + */ +void kvm_set_mmio_spte_mask(void) +{ + u64 mask; + int maxphyaddr = boot_cpu_data.x86_phys_bits; + + /* Mask the reserved physical address bits. */ + mask = rsvd_bits(maxphyaddr, MMIO_SPTE_GEN_HIGH_SHIFT - 1); + + /* Magic bits are always reserved to identify mmio spte. +* On 32 bit systems we have bit 62. +*/ + mask |= 0x3ull << 62; + + /* Set the present bit to enable mmio page fault. */ + if (maxphyaddr < MMIO_SPTE_GEN_HIGH_SHIFT) + mask |= PT_PRESENT_MASK; + + kvm_mmu_set_mmio_spte_mask(mask); +} +EXPORT_SYMBOL_GPL(kvm_set_mmio_spte_mask); + void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, u64 dirty_mask, u64 nx_mask, u64 x_mask) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f85da5c..550f179 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5596,36 +5596,6 @@ void kvm_after_handle_nmi(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_after_handle_nmi); -static void kvm_set_mmio_spte_mask(void) -{ - u64 mask; - int maxphyaddr = boot_cpu_data.x86_phys_bits; - - /* -* Set the reserved bits and the present bit of an paging-structure -* entry to generate page fault with PFER.RSV = 1. -*/ -/* Mask the reserved physical address bits. */ - mask = rsvd_bits(maxphyaddr, 51); - - /* Bit 62 is always reserved for 32bit host. */ - mask |= 0x3ull << 62; - - /* Set the present bit. */ - mask |= 1ull; - -#ifdef CONFIG_X86_64 - /* -* If reserved bit is not supported, clear the present bit to disable -* mmio page fault. -*/ - if (maxphyaddr == 52) - mask &= ~1ull; -#endif - - kvm_mmu_set_mmio_spte_mask(mask); -} - #ifdef CONFIG_X86_64 static void pvclock_gtod_update_fn(struct work_struct *work) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v2][PATCH] kvm: x86: mmio: fix setting the present bit of mmio spte
On 17/11/2014 12:31, Tiejun Chen wrote: > In non-ept 64-bit of PAE case maxphyaddr may be 52bit as well, There is no such thing as 64-bit PAE. On 32-bit PAE hosts, PTEs have bit 62 reserved, as in your patch: > + /* Magic bits are always reserved for 32bit host. */ > + mask |= 0x3ull << 62; so there is no need to disable the MMIO page fault. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost + multiqueue + RSS question.
On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote: > On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote: > > On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: > > > On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: > > > > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: > > > > > Hi Michael, > > > > > > > > > > I am playing with vhost multiqueue capability and have a question > > > > > about > > > > > vhost multiqueue and RSS (receive side steering). My setup has > > > > > Mellanox > > > > > ConnectX-3 NIC which supports multiqueue and RSS. Network related > > > > > parameters for qemu are: > > > > > > > > > >-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 > > > > >-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 > > > > > > > > > > In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue. > > > > > > > > > > I am running one tcp stream into the guest using iperf. Since there is > > > > > only one tcp stream I expect it to be handled by one queue only but > > > > > this seams to be not the case. ethtool -S on a host shows that the > > > > > stream is handled by one queue in the NIC, just like I would expect, > > > > > but in a guest all 4 virtio-input interrupt are incremented. Am I > > > > > missing any configuration? > > > > > > > > I don't see anything obviously wrong with what you describe. > > > > Maybe, somehow, same irqfd got bound to multiple MSI vectors? > > > It does not look like this is what is happening judging by the way > > > interrupts are distributed between queues. They are not distributed > > > uniformly and often I see one queue gets most interrupt and others get > > > much less and then it changes. > > > > Weird. It would happen if you transmitted from multiple CPUs. > > You did pin iperf to a single CPU within guest, did you not? > > > No, I didn't because I didn't expect it to matter for input interrupts. > When I run iperf on a host rx queue that receives all packets depends > only on a connection itself, not on a cpu iperf is running on (I tested > that). This really depends on the type of networking card you have on the host, and how it's configured. I think you will get something more closely resembling this behaviour if you enable RFS in host. > When I pin iperf in a guest I do indeed see that all interrupts > are arriving to the same irq vector. Is a number after virtio-input > in /proc/interrupt any indication of a queue a packet arrived to (on > a host I can use ethtool -S to check what queue receives packets, but > unfortunately this does not work for virtio nic in a guest)? I think it is. > Because if > it is the way RSS works in virtio is not how it works on a host and not > what I would expect after reading about RSS. The queue a packets arrives > to should be calculated by hashing fields from a packet header only. Yes, what virtio has is not RSS - it's an accelerated RFS really. The point is to try and take application locality into account. > -- > Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
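[Editor's illustration] To make the distinction concrete: with pure receive-side scaling (RSS) the rx queue is chosen from a hash of packet-header fields alone, so a single TCP stream always lands on one queue, whereas the accelerated-RFS-style steering described above also follows where the consuming application runs. Below is a rough sketch of the header-hash-only variant; the struct, hash, and function names are illustrative (real NICs typically use a Toeplitz hash with a secret key), not any driver's code.

#include <stdint.h>

struct flow_key {
    uint32_t saddr, daddr;      /* IPv4 source/destination address */
    uint16_t sport, dport;      /* TCP/UDP source/destination port */
};

/* Toy mixing function standing in for a real Toeplitz hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr ^ k->daddr ^ (((uint32_t)k->sport << 16) | k->dport);

    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

/* Pure RSS: the queue depends only on the packet header, never on the consumer. */
static unsigned int rss_pick_queue(const struct flow_key *k, unsigned int nqueues)
{
    return flow_hash(k) % nqueues;
}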
Re: Seeking a KVM benchmark
Hi Paolo, On 11/17/14, 7:18 PM, Paolo Bonzini wrote: On 17/11/2014 12:17, Wanpeng Li wrote: It's not surprising [1]. Since the meaning of some PTE bits change [2], the TLB has to be flushed. In VMX we have VPIDs, so we only need to flush if EFER changed between two invocations of the same VPID, which isn't the case. Is a TLB flush needed if the guest is UP? The wrmsr is in the host, and the TLB flush is done in the processor microcode. Sorry, maybe I didn't state my question clearly. As Avi mentioned above, "In VMX we have VPIDs, so we only need to flush if EFER changed between two invocations of the same VPID", so there is only one VPID if the guest is UP. My question is whether a TLB flush is needed when the guest's EFER has been changed. Regards, Wanpeng Li Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On 17/11/2014 13:00, Wanpeng Li wrote: > Sorry, maybe I didn't state my question clearly. As Avi mentioned above > "In VMX we have VPIDs, so we only need to flush if EFER changed between > two invocations of the same VPID", so there is only one VPID if the > guest is UP, my question is if there need a TLB flush when guest's EFER > has been changed? Yes, because the meaning of the page table entries has changed. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
Hi Paolo, On 11/17/14, 8:04 PM, Paolo Bonzini wrote: On 17/11/2014 13:00, Wanpeng Li wrote: Sorry, maybe I didn't state my question clearly. As Avi mentioned above "In VMX we have VPIDs, so we only need to flush if EFER changed between two invocations of the same VPID", so there is only one VPID if the guest is UP, my question is if there need a TLB flush when guest's EFER has been changed? Yes, because the meaning of the page table entries has changed. So both VMX EFER writes and non-VMX EFER writes cause a TLB flush for UP guest, is there still a performance improvement in this case? Regards, Wanpeng Li Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On 17/11/2014 13:14, Wanpeng Li wrote: >> >>> Sorry, maybe I didn't state my question clearly. As Avi mentioned above >>> "In VMX we have VPIDs, so we only need to flush if EFER changed between >>> two invocations of the same VPID", so there is only one VPID if the >>> guest is UP, my question is if there need a TLB flush when guest's EFER >>> has been changed? >> Yes, because the meaning of the page table entries has changed. > > So both VMX EFER writes and non-VMX EFER writes cause a TLB flush for UP > guest, is there still a performance improvement in this case? Note that the guest's EFER does not change, so no TLB flush happens. The guest EFER, however, is different from the host's, so if you change it with a wrmsr in the host you will get a TLB flush on every userspace exit. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
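[Editor's illustration] As a side note, the reason the flush is avoidable in principle is that an MSR write is only needed when the value actually changes, which is what Avi's earlier remark alludes to. The toy model below is plain C and not the real KVM code path; in the kernel the store would be a wrmsrl(MSR_EFER, ...), and that write is what triggers the implicit TLB flush.

#include <stdbool.h>
#include <stdint.h>

static uint64_t shadow_efer;    /* last value we believe is in the MSR */

/* Returns true when a real (flush-inducing) MSR write would be issued. */
static bool write_efer_if_changed(uint64_t new_efer)
{
    if (shadow_efer == new_efer)
        return false;           /* skip the wrmsr, so no implicit TLB flush */

    shadow_efer = new_efer;     /* real code: wrmsrl(MSR_EFER, new_efer); */
    return true;
}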
Re: vhost + multiqueue + RSS question.
On Mon, Nov 17, 2014 at 01:58:20PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote: > > On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote: > > > On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: > > > > On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: > > > > > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: > > > > > > Hi Michael, > > > > > > > > > > > > I am playing with vhost multiqueue capability and have a question > > > > > > about > > > > > > vhost multiqueue and RSS (receive side steering). My setup has > > > > > > Mellanox > > > > > > ConnectX-3 NIC which supports multiqueue and RSS. Network related > > > > > > parameters for qemu are: > > > > > > > > > > > >-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 > > > > > >-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 > > > > > > > > > > > > In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue. > > > > > > > > > > > > I am running one tcp stream into the guest using iperf. Since there > > > > > > is > > > > > > only one tcp stream I expect it to be handled by one queue only but > > > > > > this seams to be not the case. ethtool -S on a host shows that the > > > > > > stream is handled by one queue in the NIC, just like I would expect, > > > > > > but in a guest all 4 virtio-input interrupt are incremented. Am I > > > > > > missing any configuration? > > > > > > > > > > I don't see anything obviously wrong with what you describe. > > > > > Maybe, somehow, same irqfd got bound to multiple MSI vectors? > > > > It does not look like this is what is happening judging by the way > > > > interrupts are distributed between queues. They are not distributed > > > > uniformly and often I see one queue gets most interrupt and others get > > > > much less and then it changes. > > > > > > Weird. It would happen if you transmitted from multiple CPUs. > > > You did pin iperf to a single CPU within guest, did you not? > > > > > No, I didn't because I didn't expect it to matter for input interrupts. > > When I run iperf on a host rx queue that receives all packets depends > > only on a connection itself, not on a cpu iperf is running on (I tested > > that). > > This really depends on the type of networking card you have > on the host, and how it's configured. > > I think you will get something more closely resembling this > behaviour if you enable RFS in host. > > > When I pin iperf in a guest I do indeed see that all interrupts > > are arriving to the same irq vector. Is a number after virtio-input > > in /proc/interrupt any indication of a queue a packet arrived to (on > > a host I can use ethtool -S to check what queue receives packets, but > > unfortunately this does not work for virtio nic in a guest)? > > I think it is. > > > Because if > > it is the way RSS works in virtio is not how it works on a host and not > > what I would expect after reading about RSS. The queue a packets arrives > > to should be calculated by hashing fields from a packet header only. > > Yes, what virtio has is not RSS - it's an accelerated RFS really. > OK, if what virtio has is RFS and not RSS my test results make sense. Thanks! -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm-unit-tests PATCH 0/6] arm: enable MMU
On 30/10/2014 16:56, Andrew Jones wrote: > This first patch of this series fixes a bug caused by attempting > to use spinlocks without enabling the MMU. The next three do some > prep for the fifth, and also fix arm's PAGE_ALIGN. The fifth is > prep for the sixth, which finally turns the MMU on for arm unit > tests. > > Andrew Jones (6): > arm: fix crash on cubietruck > lib: add ALIGN() macro > lib: steal const.h from kernel > arm: apply ALIGN() and const.h to arm files > arm: import some Linux page table API > arm: turn on the MMU > > arm/cstart.S| 33 +++ > config/config-arm.mak | 3 ++- > lib/alloc.c | 4 +-- > lib/arm/asm/mmu.h | 43 ++ > lib/arm/asm/page.h | 43 +++--- > lib/arm/asm/pgtable-hwdef.h | 65 > + > lib/arm/mmu.c | 53 > lib/arm/processor.c | 11 > lib/arm/setup.c | 3 +++ > lib/arm/spinlock.c | 7 + > lib/asm-generic/page.h | 17 ++-- > lib/const.h | 11 > lib/libcflat.h | 4 +++ > 13 files changed, 275 insertions(+), 22 deletions(-) > create mode 100644 lib/arm/asm/mmu.h > create mode 100644 lib/arm/asm/pgtable-hwdef.h > create mode 100644 lib/arm/mmu.c > create mode 100644 lib/const.h > Tested on CubieTruck and applied, thanks. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
Hi Feng, I will submit a PATCH v3 release end of this week. Best Regards Eric On 11/17/2014 12:25 PM, Wu, Feng wrote: > > >> -Original Message- >> From: linux-kernel-ow...@vger.kernel.org >> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Alex Williamson >> Sent: Thursday, September 11, 2014 1:10 PM >> To: Christoffer Dall >> Cc: Eric Auger; eric.au...@st.com; marc.zyng...@arm.com; >> linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; >> kvm@vger.kernel.org; joel.sch...@amd.com; kim.phill...@freescale.com; >> pau...@samba.org; g...@kernel.org; pbonz...@redhat.com; >> linux-ker...@vger.kernel.org; patc...@linaro.org; will.dea...@arm.com; >> a.mota...@virtualopensystems.com; a.r...@virtualopensystems.com; >> john.li...@huawei.com >> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control >> >> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote: >>> On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote: On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote: > This RFC proposes an integration of "ARM: Forwarding physical > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in > KVM. > > It enables to transform a VFIO platform driver IRQ into a forwarded > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM > switch can be avoided on guest virtual IRQ completion. Before this > patch, a maintenance IRQ was triggered on the virtual IRQ completion. > > When the IRQ is forwarded, the VFIO platform driver does not need to > disable the IRQ anymore. Indeed when returning from the IRQ handler > the IRQ is not deactivated. Only its priority is lowered. This means > the same IRQ cannot hit before the guest completes the virtual IRQ > and the GIC automatically deactivates the corresponding physical IRQ. > > Besides, the injection still is based on irqfd triggering. The only > impact on irqfd process is resamplefd is not called anymore on > virtual IRQ completion since this latter becomes "transparent". > > The current integration is based on an extension of the KVM-VFIO > device, previously used by KVM to interact with VFIO groups. The > patch serie now enables KVM to directly interact with a VFIO > platform device. The VFIO external API was extended for that purpose. > > Th KVM-VFIO device can get/put the vfio platform device, check its > integrity and type, get the IRQ number associated to an IRQ index. > > The IRQ forward programming is architecture specific (virtual interrupt > controller programming basically). However the whole infrastructure is > kept generic. > > from a user point of view, the functionality is provided through new > KVM-VFIO device commands, >> KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ > and the capability can be checked with KVM_HAS_DEVICE_ATTR. > Assignment can only be changed when the physical IRQ is not active. > It is the responsability of the user to do this check. > > This patch serie has the following dependencies: > - "ARM: Forwarding physical interrupts to a guest VM" > (http://lwn.net/Articles/603514/) in > - [PATCH v3] irqfd for ARM > - and obviously the VFIO platform driver serie: > [RFC PATCH v6 00/20] VFIO support for platform devices on ARM > https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html > > Integrated pieces can be found at > ssh://git.linaro.org/people/eric.auger/linux.git > on branch 3.17rc3_irqfd_forward_integ_v2 > > This was was tested on Calxeda Midway, assigning the xgmac main IRQ. > > v1 -> v2: > - forward control is moved from architecture specific file into generic > vfio.c module. 
> only kvm_arch_set_fwd_state remains architecture specific > - integrate Kim's patch which enables KVM-VFIO for ARM > - fix vgic state bypass in vgic_queue_hwirq > - struct kvm_arch_forwarded_irq moved from >> arch/arm/include/uapi/asm/kvm.h > to include/uapi/linux/kvm.h > also irq_index renamed into index and guest_irq renamed into gsi > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD > - vfio_external_get_base_device renamed into vfio_external_base_device > - vfio_external_get_type removed > - kvm_vfio_external_get_base_device renamed into >> kvm_vfio_external_base_device > - __KVM_HAVE_ARCH_KVM_VFIO renamed into >> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > > Eric Auger (8): > KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded > IRQ > KVM: ARM: VGIC: add forwarded irq rbtree lock > VFIO: platform: handler tests whether the IRQ is forwarded > KVM: KVM-VFIO: update user API to program forwarded IRQ > VFIO: Extend external user API > KVM: KVM-VFIO: add new VFIO external API hooks > KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ >> forwarding > co
RE: [RFC v2 0/9] KVM-VFIO IRQ forward control
> -Original Message- > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On > Behalf Of Eric Auger > Sent: Monday, November 17, 2014 9:42 PM > To: Wu, Feng; Alex Williamson; Christoffer Dall > Cc: eric.au...@st.com; marc.zyng...@arm.com; > linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; > kvm@vger.kernel.org; joel.sch...@amd.com; kim.phill...@freescale.com; > pau...@samba.org; g...@kernel.org; pbonz...@redhat.com; > linux-ker...@vger.kernel.org; patc...@linaro.org; will.dea...@arm.com; > a.mota...@virtualopensystems.com; a.r...@virtualopensystems.com; > john.li...@huawei.com > Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control > > Hi Feng, > > I will submit a PATCH v3 release end of this week. > > Best Regards > > Eric Thanks for the update, Eric! Thanks, Feng > > On 11/17/2014 12:25 PM, Wu, Feng wrote: > > > > > >> -Original Message- > >> From: linux-kernel-ow...@vger.kernel.org > >> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Alex Williamson > >> Sent: Thursday, September 11, 2014 1:10 PM > >> To: Christoffer Dall > >> Cc: Eric Auger; eric.au...@st.com; marc.zyng...@arm.com; > >> linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; > >> kvm@vger.kernel.org; joel.sch...@amd.com; kim.phill...@freescale.com; > >> pau...@samba.org; g...@kernel.org; pbonz...@redhat.com; > >> linux-ker...@vger.kernel.org; patc...@linaro.org; will.dea...@arm.com; > >> a.mota...@virtualopensystems.com; a.r...@virtualopensystems.com; > >> john.li...@huawei.com > >> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control > >> > >> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote: > >>> On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote: > On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote: > > This RFC proposes an integration of "ARM: Forwarding physical > > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in > > KVM. > > > > It enables to transform a VFIO platform driver IRQ into a forwarded > > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM > > switch can be avoided on guest virtual IRQ completion. Before this > > patch, a maintenance IRQ was triggered on the virtual IRQ completion. > > > > When the IRQ is forwarded, the VFIO platform driver does not need to > > disable the IRQ anymore. Indeed when returning from the IRQ handler > > the IRQ is not deactivated. Only its priority is lowered. This means > > the same IRQ cannot hit before the guest completes the virtual IRQ > > and the GIC automatically deactivates the corresponding physical IRQ. > > > > Besides, the injection still is based on irqfd triggering. The only > > impact on irqfd process is resamplefd is not called anymore on > > virtual IRQ completion since this latter becomes "transparent". > > > > The current integration is based on an extension of the KVM-VFIO > > device, previously used by KVM to interact with VFIO groups. The > > patch serie now enables KVM to directly interact with a VFIO > > platform device. The VFIO external API was extended for that purpose. > > > > Th KVM-VFIO device can get/put the vfio platform device, check its > > integrity and type, get the IRQ number associated to an IRQ index. > > > > The IRQ forward programming is architecture specific (virtual interrupt > > controller programming basically). However the whole infrastructure is > > kept generic. 
> > > > from a user point of view, the functionality is provided through new > > KVM-VFIO device commands, > >> KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ > > and the capability can be checked with KVM_HAS_DEVICE_ATTR. > > Assignment can only be changed when the physical IRQ is not active. > > It is the responsability of the user to do this check. > > > > This patch serie has the following dependencies: > > - "ARM: Forwarding physical interrupts to a guest VM" > > (http://lwn.net/Articles/603514/) in > > - [PATCH v3] irqfd for ARM > > - and obviously the VFIO platform driver serie: > > [RFC PATCH v6 00/20] VFIO support for platform devices on ARM > > > https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html > > > > Integrated pieces can be found at > > ssh://git.linaro.org/people/eric.auger/linux.git > > on branch 3.17rc3_irqfd_forward_integ_v2 > > > > This was was tested on Calxeda Midway, assigning the xgmac main IRQ. > > > > v1 -> v2: > > - forward control is moved from architecture specific file into generic > > vfio.c module. > > only kvm_arch_set_fwd_state remains architecture specific > > - integrate Kim's patch which enables KVM-VFIO for ARM > > - fix vgic state bypass in vgic_queue_hwirq > > - struct kvm_arch_forwarded_irq moved from > >> arch/arm/include/uapi/asm/kvm.h > > to include/uapi/linux/
[PATCH 1/3] kvm: add a memslot flag for incoherent memory regions
Memory regions may be incoherent with the caches, typically when the guest has mapped a host system RAM backed memory region as uncached. Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots and handle them appropriately when mapping them. Signed-off-by: Ard Biesheuvel --- include/linux/kvm_host.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a6059bdf7b03..e4d8f705fecd 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -43,6 +43,7 @@ * include/linux/kvm_h. */ #define KVM_MEMSLOT_INVALID(1UL << 16) +#define KVM_MEMSLOT_INCOHERENT (1UL << 17) /* Two fragments for cross MMIO pages. */ #define KVM_MAX_MMIO_FRAGMENTS 2 -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] arm, arm64: KVM: allow forced dcache flush on page faults
From: Laszlo Ersek To allow handling of incoherent memslots in a subsequent patch, this patch adds a paramater 'ipa_uncached' to cache_coherent_guest_page() so that we can instruct it to flush the page's contents to DRAM even if the guest has caching globally enabled. Signed-off-by: Laszlo Ersek Signed-off-by: Ard Biesheuvel --- arch/arm/include/asm/kvm_mmu.h | 5 +++-- arch/arm/kvm/mmu.c | 9 +++-- arch/arm64/include/asm/kvm_mmu.h | 5 +++-- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index acb0d5712716..f867060035ec 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -161,9 +161,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu) } static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva, -unsigned long size) +unsigned long size, +bool ipa_uncached) { - if (!vcpu_has_cache_enabled(vcpu)) + if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached) kvm_flush_dcache_to_poc((void *)hva, size); /* diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index b007438242e2..cb924c6d56a6 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -852,6 +852,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct vm_area_struct *vma; pfn_t pfn; pgprot_t mem_type = PAGE_S2; + bool fault_ipa_uncached; write_fault = kvm_is_write_fault(vcpu); if (fault_status == FSC_PERM && !write_fault) { @@ -918,6 +919,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (!hugetlb && !force_pte) hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); + fault_ipa_uncached = false; + if (hugetlb) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); @@ -925,7 +928,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_set_s2pmd_writable(&new_pmd); kvm_set_pfn_dirty(pfn); } - coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE); + coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE, + fault_ipa_uncached); ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { pte_t new_pte = pfn_pte(pfn, mem_type); @@ -933,7 +937,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_set_s2pte_writable(&new_pte); kvm_set_pfn_dirty(pfn); } - coherent_cache_guest_page(vcpu, hva, PAGE_SIZE); + coherent_cache_guest_page(vcpu, hva, PAGE_SIZE, + fault_ipa_uncached); ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE)); } diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 0caf7a59f6a1..123b521a9908 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -243,9 +243,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu) } static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva, -unsigned long size) +unsigned long size, +bool ipa_uncached) { - if (!vcpu_has_cache_enabled(vcpu)) + if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached) kvm_flush_dcache_to_poc((void *)hva, size); if (!icache_is_aliasing()) {/* PIPT */ -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
Readonly memslots are often used to implement emulation of ROMs and NOR flashes, in which case the guest may legally map these regions as uncached. To deal with the incoherency associated with uncached guest mappings, treat all readonly memslots as incoherent, and ensure that pages that belong to regions tagged as such are flushed to DRAM before being passed to the guest. Signed-off-by: Ard Biesheuvel --- arch/arm/kvm/mmu.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index cb924c6d56a6..f2a9874ff5cb 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -919,7 +919,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (!hugetlb && !force_pte) hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); - fault_ipa_uncached = false; + fault_ipa_uncached = memslot->flags & KVM_MEMSLOT_INCOHERENT; if (hugetlb) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); @@ -1298,11 +1298,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, hva = vm_end; } while (hva < reg_end); - if (ret) { - spin_lock(&kvm->mmu_lock); + spin_lock(&kvm->mmu_lock); + if (ret) unmap_stage2_range(kvm, mem->guest_phys_addr, mem->memory_size); - spin_unlock(&kvm->mmu_lock); - } + else + stage2_flush_memslot(kvm, memslot); + spin_unlock(&kvm->mmu_lock); return ret; } @@ -1314,6 +1315,15 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned long npages) { + /* +* Readonly memslots are not incoherent with the caches by definition, +* but in practice, they are used mostly to emulate ROMs or NOR flashes +* that the guest may consider devices and hence map as uncached. +* To prevent incoherency issues in these cases, tag all readonly +* regions as incoherent. +*/ + if (slot->flags & KVM_MEM_READONLY) + slot->flags |= KVM_MEMSLOT_INCOHERENT; return 0; } -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.
On Wed, Nov 12, 2014 at 02:11:23PM +1030, Rusty Russell wrote: > Amos Kong writes: > > From: Rusty Russell > > > > current_rng holds one reference, and we bump it every time we want > > to do a read from it. > > > > This means we only hold the rng_mutex to grab or drop a reference, > > so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't > > block on read of /dev/hwrng. > > > > Using a kref is overkill (we're always under the rng_mutex), but > > a standard pattern. > > > > This also solves the problem that the hwrng_fillfn thread was > > accessing current_rng without a lock, which could change (eg. to NULL) > > underneath it. > > > > v4: decrease last reference for triggering the cleanup > > This doesn't make any sense: > > > +static void drop_current_rng(void) > > +{ > > + struct hwrng *rng = current_rng; > > + > > + BUG_ON(!mutex_is_locked(&rng_mutex)); > > + if (!current_rng) > > + return; > > + > > + /* release current_rng reference */ > > + kref_put(&current_rng->ref, cleanup_rng); > > + current_rng = NULL; > > + > > + /* decrease last reference for triggering the cleanup */ > > + kref_put(&rng->ref, cleanup_rng); > > +} > > Why would it drop the refcount twice? This doesn't make sense. > > Hmm, because you added kref_init, which initializes the reference count > to 1, you created this bug. I saw some kernel code uses kref_* helper functions, the reference counter is initialized to 1. Some code didn't use the helper functions to increase/decrease the reference counter. So I will drop kref_init() and the second kref_put(). > Leave out the kref_init, and let it naturally be 0 (until, and if, it > becomes current_rng). Add a comment if you want. OK, thanks. > Thanks, > Rusty. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Amos.
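[Editor's sketch] To make the refcounting shape under discussion concrete, here is a minimal sketch of drop_current_rng() as Rusty's suggestion reads (illustration only, not the final driver code): with the extra kref_init() gone, current_rng holds exactly one reference, so exactly one kref_put() is needed. cleanup_rng() is assumed to be the kref release callback, as in the snippet quoted above.

/*
 * Sketch only: current_rng holds exactly one reference, taken when the
 * rng becomes current_rng, so tearing it down needs exactly one
 * kref_put().  cleanup_rng() is assumed to be the kref release callback.
 */
static void drop_current_rng(void)
{
	BUG_ON(!mutex_is_locked(&rng_mutex));
	if (!current_rng)
		return;

	/* drop the single reference held on behalf of current_rng; any
	 * readers still holding their own reference delay cleanup_rng()
	 * until the last kref_put() */
	kref_put(&current_rng->ref, cleanup_rng);
	current_rng = NULL;
}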
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 17/11/2014 15:58, Ard Biesheuvel wrote: > Readonly memslots are often used to implement emulation of ROMs and > NOR flashes, in which case the guest may legally map these regions as > uncached. > To deal with the incoherency associated with uncached guest mappings, > treat all readonly memslots as incoherent, and ensure that pages that > belong to regions tagged as such are flushed to DRAM before being passed > to the guest. On x86, the processor combines the cacheability values from the two levels of page tables. Is there no way to do the same on ARM? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
Hi Paolo, On 17/11/14 15:29, Paolo Bonzini wrote: > > > On 17/11/2014 15:58, Ard Biesheuvel wrote: >> Readonly memslots are often used to implement emulation of ROMs and >> NOR flashes, in which case the guest may legally map these regions as >> uncached. >> To deal with the incoherency associated with uncached guest mappings, >> treat all readonly memslots as incoherent, and ensure that pages that >> belong to regions tagged as such are flushed to DRAM before being passed >> to the guest. > > On x86, the processor combines the cacheability values from the two > levels of page tables. Is there no way to do the same on ARM? ARM is broadly similar, but there's a number of gotchas: - uncacheable (guest level) + cacheable (host level) -> uncacheable: the read request is going to be directly sent to RAM, bypassing the caches. - Userspace is going to use a cacheable view of the "NOR" pages, which is going to stick around in the cache (this is just memory, after all). The net result is that we need to detect those cases and make sure the guest sees the latest bit of data written by userland. We already have a similar mechanism when we fault pages in, but the guest has not enabled its caches yet. M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
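[Editor's sketch] The combining rule Marc describes can be pictured as "the weaker attribute wins". The toy model below is an illustration only, with made-up names, not kernel code; it shows why a cacheable stage-2 (host) mapping cannot undo an uncached stage-1 (guest) mapping, which is exactly why the flush to PoC is needed.

/* Illustrative model only -- not actual KVM/ARM code.  The effective
 * attribute of a guest access is roughly the more restrictive of the
 * stage-1 (guest) and stage-2 (host) attributes. */
enum mem_attr { ATTR_DEVICE, ATTR_NORMAL_NC, ATTR_NORMAL_WB };

static enum mem_attr combine_s1_s2(enum mem_attr s1, enum mem_attr s2)
{
	/* the less cacheable (lower) attribute wins */
	return s1 < s2 ? s1 : s2;
}

/*
 * combine_s1_s2(ATTR_DEVICE, ATTR_NORMAL_WB) == ATTR_DEVICE: the guest
 * read bypasses the caches and goes straight to DRAM, missing whatever
 * host userspace (e.g. QEMU's flash emulation) left dirty in the data
 * cache -- hence the flush to PoC before handing pages to the guest.
 */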
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/17/14 16:29, Paolo Bonzini wrote: > > > On 17/11/2014 15:58, Ard Biesheuvel wrote: >> Readonly memslots are often used to implement emulation of ROMs and >> NOR flashes, in which case the guest may legally map these regions as >> uncached. >> To deal with the incoherency associated with uncached guest mappings, >> treat all readonly memslots as incoherent, and ensure that pages that >> belong to regions tagged as such are flushed to DRAM before being passed >> to the guest. > > On x86, the processor combines the cacheability values from the two > levels of page tables. Is there no way to do the same on ARM? Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict (Device non-Gathering, non-Reordering, no Early Write Acknowledgement -- for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax host) memory attributes. When qemu writes, as part of emulating the flash programming commands, to the RAMBlock that *otherwise* backs the flash range (as a r/o memslot), those writes (from host userspace) tend to end up in dcache. But, when the guest flips back the flash to romd mode, and tries to read back the values from the flash as plain ROM, the dcache is completely bypassed due to the strict stage1 mapping, and the guest goes directly to DRAM. Where qemu's earlier writes are not yet / necessarily visible. Please see my original patch (which was incomplete) in the attachment, it has a very verbose commit message. Anyway, I'll let others explain; they can word it better than I can :) FWIW, Series Reviewed-by: Laszlo Ersek I ported this series to a 3.17.0+ based kernel, and tested it. It works fine. The ROM-like view of the NOR flash now reflects the previously programmed contents. Series Tested-by: Laszlo Ersek Thanks! Laszlo >From a2b4da9b03f03ccdb8b0988a5cc64d1967f00398 Mon Sep 17 00:00:00 2001 From: Laszlo Ersek Date: Sun, 16 Nov 2014 01:43:11 +0100 Subject: [PATCH] arm, arm64: KVM: clean cache on page fault also when IPA is uncached (WIP) This patch builds on Marc Zyngier's commit 2d58b733c87689d3d5144e4ac94ea861cc729145. (1) The guest bypasses the cache *not only* when the VCPU's dcache is disabled (see bit 0 and bit 2 in SCTLR_EL1, "MMU enable" and "Cache enable", respectively -- vcpu_has_cache_enabled()). The guest bypasses the cache *also* when the Stage 1 memory attributes say "device memory" about the Intermediate Page Address in question, independently of the Stage 2 memory attributes. Refer to: Table D5-38 Combining the stage 1 and stage 2 memory type assignments in the ARM ARM. (This is likely similar to MTRRs on x86.) (2) In edk2 (EFI Development Kit II), the ARM NOR flash driver, ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashFvbDxe.c uses the AddMemorySpace() and SetMemorySpaceAttributes() Global Coherency Domain Services of DXE (Driver eXecution Environment) to *justifiedly* set the attributes of the guest memory covering the flash chip to EFI_MEMORY_UC ("uncached"). 
According to the AArch64 bindings for UEFI (see "2.3.6.1 Memory types" in the UEFI-2.4A specification), EFI_MEMORY_UC is mapped as follows: EFI Memory Type EFI_MEMORY_UC (Not cacheable) -> ARM Memory Type Device-nGnRnE (Device non-Gathering, non-Reordering, no Early Write Acknowledgement), MAIR attribute encoding Attr [7:4] [3:0] = 0000 0000. This is correctly implemented in edk2, in the ArmConfigureMmu() function, via the ArmSetMAIR() call and the MAIR_ATTR() macro: The TT_ATTR_INDX_DEVICE_MEMORY (== 0) memory attribute index, which is used for EFI_MEMORY_UC memory, is associated with the MAIR_ATTR_DEVICE_MEMORY (== 0x00, see above) memory attribute value, in the MAIR_ELx register. As a consequence of (1) and (2), when edk2 code running in the guest accesses an IPA falling in the flash range, it will completely bypass the cache. Therefore, when such a page is faulted in in user_mem_abort(), we must flush the data cache; otherwise the guest will see stale data in the flash chip. This patch is not complete because I have no clue how to calculate the memory attribute for "fault_ipa" in user_mem_abort(). Right now I set "fault_ipa_uncached" to constant true, which might incur some performance penalty for data faults, but it certainly improves correctness -- the ArmVirtualizationPkg platform build of edk2 actually boots as a KVM guest on APM Mustang. Signed-off-by: Laszlo Ersek --- arch/arm/include/asm/kvm_mmu.h | 5 +++-- arch/arm64/include/asm/kvm_mmu.h | 5 +++-- arch/arm/kvm/mmu.c | 10 ++++++++-- 3 files changed, 14 insertions(+), 6 deletions(-) diff --git a/
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 17/11/2014 16:39, Marc Zyngier wrote: > ARM is broadly similar, but there's a number of gotchas: > - uncacheable (guest level) + cacheable (host level) -> uncacheable: the > read request is going to be directly sent to RAM, bypassing the caches. > - Userspace is going to use a cacheable view of the "NOR" pages, which > is going to stick around in the cache (this is just memory, after all). Ah, x86 also has uncacheable + cacheable -> uncacheable, but Intel also added a bit to ignore the guest-provided type. We use that bit for RAM-backed areas. Also, on x86 if the cache is disabled the processor will still snoop caches (including its own cache) and perform writeback+invalidate of the cache line before accessing main memory, if it's dirty. AMD does not have the aforementioned bit, but applies this same algorithm if the host says the page is writeback in the MTRR (memory type range register). The Intel solution is less tricky and has better performance. Paolo > The net result is that we need to detect those cases and make sure the > guest sees the latest bit of data written by userland. > > We already have a similar mechanism when we fault pages in, but the > guest has not enabled its caches yet. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
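[Editor's sketch] The "bit to ignore the guest-provided type" Paolo mentions is the ignore-PAT (IPAT) bit in the EPT entry. Below is a simplified sketch of that decision; ept_memtype_for() is a made-up name for illustration, while the real logic lives in KVM's vmx_get_mt_mask() and also accounts for cases like noncoherent DMA assignment, so treat this only as a picture of the idea.

/* Simplified sketch, not the verbatim kernel function: for RAM-backed
 * pages KVM forces writeback and sets the ignore-PAT bit so the
 * guest-programmed type is overridden; MMIO stays uncacheable. */
static u64 ept_memtype_for(bool is_mmio)
{
	if (is_mmio)
		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	/* RAM: force WB and ignore the guest PAT/MTRR type */
	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
}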
Where is the VM live migration code?
Hi, I saw this page: http://www.linux-kvm.org/page/Migration. It looks like Migration is a feature provided by KVM? But when I look at the Linux kernel source code, i.e., virt/kvm, and arch/x86/kvm, I don't see the code for this migration feature. So I wonder where is the source code for the live migration? Is it purely implemented in user space? Because I see there are the following files in the qemu source code: migration.c migration-exec.c migration-fd.c migration-rdma.c migration-tcp.c migration-unix.c If I wish to understand the implementation of migration in Qemu/KVM, are these above files the ones I should read? Thanks. -Jidong -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Where is the VM live migration code?
> Hi, > > I saw this page: > > http://www.linux-kvm.org/page/Migration. > > It looks like Migration is a feature provided by KVM? But when I look > at the Linux kernel source code, i.e., virt/kvm, and arch/x86/kvm, I > don't see the code for this migration feature. > Most of the live migration code is in QEMU: migration.c, savevm.c, arch_init.c, block-migration.c, and the save/load handlers of the individual devices, etc.; only dirty page logging/syncing is implemented in the kernel. The most important functions to read are migration_thread() and process_incoming_migration_co(). > So I wonder where is the source code for the live migration? Is it > purely implemented in user space? Because I see there are the > following files in the qemu source code: > > migration.c migration-exec.c migration-fd.c migration-rdma.c > migration-tcp.c migration-unix.c > > If I wish to understand the implementation of migration in Qemu/KVM, > are these above files the ones I should read? Thanks. > > -Jidong
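[Editor's sketch] For the kernel-side piece mentioned above (dirty page logging/syncing), the interface QEMU's migration code drives is the KVM memslot dirty-log API. The following is a bare-bones userspace sketch of that interface; error handling is omitted, and vm_fd, the slot geometry and the helper names are assumptions for illustration only.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch only: vm_fd is an already-created KVM VM fd, and slot/gpa/size/hva
 * describe a memslot that was registered earlier without the logging flag. */
static void enable_dirty_logging(int vm_fd, __u32 slot, __u64 gpa,
				 __u64 size, __u64 hva)
{
	struct kvm_userspace_memory_region region = {
		.slot            = slot,
		.flags           = KVM_MEM_LOG_DIRTY_PAGES, /* start logging */
		.guest_phys_addr = gpa,
		.memory_size     = size,
		.userspace_addr  = hva,
	};

	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}

/* Each call fills 'bitmap' with one bit per page dirtied since the previous
 * call -- this is what the migration thread iterates over and resends. */
static void sync_dirty_bitmap(int vm_fd, __u32 slot, void *bitmap)
{
	struct kvm_dirty_log log = {
		.slot         = slot,
		.dirty_bitmap = bitmap,
	};

	ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}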
Re: vhost + multiqueue + RSS question.
> On Mon, Nov 17, 2014 at 01:58:20PM +0200, Michael S. Tsirkin wrote: > > On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote: > > > On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote: > > > > On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: > > > > > On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: > > > > > > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: > > > > > > > Hi Michael, > > > > > > > > > > > > > > I am playing with vhost multiqueue capability and have a > > > > > > > question about > > > > > > > vhost multiqueue and RSS (receive side steering). My setup has > > > > > > > Mellanox > > > > > > > ConnectX-3 NIC which supports multiqueue and RSS. Network related > > > > > > > parameters for qemu are: > > > > > > > > > > > > > >-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 > > > > > > >-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 > > > > > > > > > > > > > > In a guest I ran "ethtool -L eth0 combined 4" to enable > > > > > > > multiqueue. > > > > > > > > > > > > > > I am running one tcp stream into the guest using iperf. Since > > > > > > > there is > > > > > > > only one tcp stream I expect it to be handled by one queue only > > > > > > > but > > > > > > > this seams to be not the case. ethtool -S on a host shows that the > > > > > > > stream is handled by one queue in the NIC, just like I would > > > > > > > expect, > > > > > > > but in a guest all 4 virtio-input interrupt are incremented. Am I > > > > > > > missing any configuration? > > > > > > > > > > > > I don't see anything obviously wrong with what you describe. > > > > > > Maybe, somehow, same irqfd got bound to multiple MSI vectors? > > > > > It does not look like this is what is happening judging by the way > > > > > interrupts are distributed between queues. They are not distributed > > > > > uniformly and often I see one queue gets most interrupt and others get > > > > > much less and then it changes. > > > > > > > > Weird. It would happen if you transmitted from multiple CPUs. > > > > You did pin iperf to a single CPU within guest, did you not? > > > > > > > No, I didn't because I didn't expect it to matter for input interrupts. > > > When I run iperf on a host rx queue that receives all packets depends > > > only on a connection itself, not on a cpu iperf is running on (I tested > > > that). > > > > This really depends on the type of networking card you have > > on the host, and how it's configured. > > > > I think you will get something more closely resembling this > > behaviour if you enable RFS in host. > > > > > When I pin iperf in a guest I do indeed see that all interrupts > > > are arriving to the same irq vector. Is a number after virtio-input > > > in /proc/interrupt any indication of a queue a packet arrived to (on > > > a host I can use ethtool -S to check what queue receives packets, but > > > unfortunately this does not work for virtio nic in a guest)? > > > > I think it is. > > > > > Because if > > > it is the way RSS works in virtio is not how it works on a host and not > > > what I would expect after reading about RSS. The queue a packets arrives > > > to should be calculated by hashing fields from a packet header only. > > > > Yes, what virtio has is not RSS - it's an accelerated RFS really. > > > OK, if what virtio has is RFS and not RSS my test results make sense. > Thanks! I think the RSS emulation for virtio-mq NIC is implemented in tun_select_queue(), am I missing something? 
Thanks, Zhang Haoyu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Where is the VM live migration code?
On Mon, Nov 17, 2014 at 5:29 PM, Zhang Haoyu wrote: >> Hi, >> >> I saw this page: >> >> http://www.linux-kvm.org/page/Migration. >> >> It looks like Migration is a feature provided by KVM? But when I look >> at the Linux kernel source code, i.e., virt/kvm, and arch/x86/kvm, I >> don't see the code for this migration feature. >> > Most of live migration code is in qemu migration.c, savevm.c, arch_init.c, > block-migration.c, and the other devices's save/load handler, .etc, > only log/sync dirty page implemented in kernel. > You can read the most important function migration_thread(), > process_incoming_migration_co(). > Great, thanks Haoyu! I will try to understand these parts of code first. -Jidong >> So I wonder where is the source code for the live migration? Is it >>purely implemented in user space? Because I see there are the >> following files in the qemu source code: >> >> migration.c migration-exec.c migration-fd.c migration-rdma.c >> migration-tcp.c migration-unix.c >> >> If I wish to understand the implementation of migration in Qemu/KVM, >> are these above files the ones I should read? Thanks. >> >> -Jidong > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
On Tue, Nov 11, 2014 at 2:48 PM, Anup Patel wrote: > Hi All, > > I have second thoughts about rebasing KVM PMU patches > to Marc's irq-forwarding patches. > > The PMU IRQs (when virtualized by KVM) are not exactly > forwarded IRQs because they are shared between Host > and Guest. > > Scenario1 > - > > We might have perf running on Host and no KVM guest > running. In this scenario, we won't get interrupts on Host > because the kvm_pmu_hyp_init() (similar to the function > kvm_timer_hyp_init() of Marc's IRQ-forwarding > implementation) has put all host PMU IRQs in forwarding > mode. > > The only way to solve this problem is to not set forwarding > mode for PMU IRQs in kvm_pmu_hyp_init() and instead > have special routines to turn on and turn off the forwarding > mode of PMU IRQs. These routines will be called from > kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ > forwarding state. > > Scenario2 > - > > We might have perf running on Host and Guest simultaneously, > which means it is quite likely that the PMU HW triggers an IRQ meant > for the Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" > and "kvm_pmu_sync_hwstate(vcpu);" (similar to the timer sync routine > of Marc's patchset which is called before local_irq_enable()). > > In this scenario, the updated kvm_pmu_sync_hwstate(vcpu) > will accidentally forward an IRQ meant for the Host to the Guest unless > we put in additional checks to inspect the VCPU PMU state. > > Am I missing any detail about IRQ forwarding for the above > scenarios? > > If not, then can we consider the current mask/unmask approach > for forwarding PMU IRQs? > > Marc?? Will?? > > Regards, > Anup Ping ??? -- Anup -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
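[Editor's sketch] To make the Scenario1 alternative concrete, the shape being proposed (toggle forwarding only around the actual guest run) would look roughly like the sketch below. kvm_pmu_set_irq_forward() is a hypothetical helper used purely for illustration; it stands in for whatever primitive flips the host PMU IRQ in and out of forwarding mode, while kvm_call_hyp()/__kvm_vcpu_run and kvm_pmu_sync_hwstate() are the entry points already named above.

/*
 * Illustration only: kvm_pmu_set_irq_forward() does not exist in
 * mainline.  Forwarding is enabled only while the guest is about to
 * run, so host perf keeps receiving PMU interrupts whenever no vcpu is
 * loaded (Scenario1).  Scenario2 still needs kvm_pmu_sync_hwstate()
 * to inspect the vcpu PMU state before deciding where a pending IRQ
 * belongs.
 */
static int run_vcpu_once(struct kvm_vcpu *vcpu)
{
	int ret;

	local_irq_disable();
	kvm_pmu_set_irq_forward(vcpu, true);	/* guest owns the PMU IRQ */

	ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);

	kvm_pmu_set_irq_forward(vcpu, false);	/* PMU IRQ back to the host */
	kvm_pmu_sync_hwstate(vcpu);
	local_irq_enable();

	return ret;
}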
Re: vhost + multiqueue + RSS question.
On 11/17/2014 07:58 PM, Michael S. Tsirkin wrote: > On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote: >> > On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote: >>> > > On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: > > > On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: > > > > > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: >> > > > > > Hi Michael, >> > > > > > >> > > > > > I am playing with vhost multiqueue capability and have a >> > > > > > question about >> > > > > > vhost multiqueue and RSS (receive side steering). My setup has >> > > > > > Mellanox >> > > > > > ConnectX-3 NIC which supports multiqueue and RSS. Network >> > > > > > related >> > > > > > parameters for qemu are: >> > > > > > >> > > > > >-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 >> > > > > >-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 >> > > > > > >> > > > > > In a guest I ran "ethtool -L eth0 combined 4" to enable >> > > > > > multiqueue. >> > > > > > >> > > > > > I am running one tcp stream into the guest using iperf. Since >> > > > > > there is >> > > > > > only one tcp stream I expect it to be handled by one queue >> > > > > > only but >> > > > > > this seams to be not the case. ethtool -S on a host shows that >> > > > > > the >> > > > > > stream is handled by one queue in the NIC, just like I would >> > > > > > expect, >> > > > > > but in a guest all 4 virtio-input interrupt are incremented. >> > > > > > Am I >> > > > > > missing any configuration? > > > > > > > > > > I don't see anything obviously wrong with what you describe. > > > > > Maybe, somehow, same irqfd got bound to multiple MSI vectors? > > > It does not look like this is what is happening judging by the way > > > interrupts are distributed between queues. They are not distributed > > > uniformly and often I see one queue gets most interrupt and others > > > get > > > much less and then it changes. >>> > > >>> > > Weird. It would happen if you transmitted from multiple CPUs. >>> > > You did pin iperf to a single CPU within guest, did you not? >>> > > >> > No, I didn't because I didn't expect it to matter for input interrupts. >> > When I run iperf on a host rx queue that receives all packets depends >> > only on a connection itself, not on a cpu iperf is running on (I tested >> > that). > This really depends on the type of networking card you have > on the host, and how it's configured. > > I think you will get something more closely resembling this > behaviour if you enable RFS in host. > >> > When I pin iperf in a guest I do indeed see that all interrupts >> > are arriving to the same irq vector. Is a number after virtio-input >> > in /proc/interrupt any indication of a queue a packet arrived to (on >> > a host I can use ethtool -S to check what queue receives packets, but >> > unfortunately this does not work for virtio nic in a guest)? > I think it is. > >> > Because if >> > it is the way RSS works in virtio is not how it works on a host and not >> > what I would expect after reading about RSS. The queue a packets arrives >> > to should be calculated by hashing fields from a packet header only. > Yes, what virtio has is not RSS - it's an accelerated RFS really. Strictly speaking, not aRFS. aRFS requires a programmable filter and needs driver to fill the filter on demand. For virtio-net, this is done automatically in host side (tun/tap). There's no guest involvement. > > The point is to try and take application locality into account. 
Yes, the locality was done through (consider an N-vcpu guest with N queues): - the virtio-net driver will provide a default 1:1 mapping between vcpu and txq through XPS - the virtio-net driver will also suggest a default irq affinity hint for a 1:1 mapping between vcpu and txq/rxq With all these, each vcpu gets its private txq/rxq pair. And the host side implementation (tun/tap) will make sure that if the packets of a flow were received from queue N, it will also use queue N to transmit the packets of this flow to the guest. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
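[Editor's sketch] A toy model of the host-side bookkeeping described above -- remember which queue a flow last transmitted on and receive on the same queue -- might look like the sketch below. It only illustrates the idea behind tun/tap's automatic queue selection; the names and table layout are made up, not the actual tun.c code.

#include <stdint.h>
#include <string.h>

#define NUM_QUEUES   4
#define FLOW_BUCKETS 1024
#define NO_QUEUE     UINT16_MAX

/* per-flow "last TX queue" table, keyed by a hash of the packet headers */
static uint16_t flow_to_queue[FLOW_BUCKETS];

static void flow_table_init(void)
{
	memset(flow_to_queue, 0xff, sizeof(flow_to_queue)); /* all NO_QUEUE */
}

/* transmit path: remember which queue (i.e. which vcpu) this flow used */
static void record_tx_queue(uint32_t flow_hash, uint16_t txq)
{
	flow_to_queue[flow_hash % FLOW_BUCKETS] = txq;
}

/* receive path: steer the packet back to that queue, falling back to a
 * plain hash spread if the flow has not been seen transmitting yet */
static uint16_t select_rx_queue(uint32_t flow_hash)
{
	uint16_t q = flow_to_queue[flow_hash % FLOW_BUCKETS];

	return q == NO_QUEUE ? flow_hash % NUM_QUEUES : q;
}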
Re: vhost + multiqueue + RSS question.
On 11/18/2014 09:37 AM, Zhang Haoyu wrote: >> On Mon, Nov 17, 2014 at 01:58:20PM +0200, Michael S. Tsirkin wrote: >>> On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote: On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: >> On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: >>> On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: Hi Michael, I am playing with vhost multiqueue capability and have a question about vhost multiqueue and RSS (receive side steering). My setup has Mellanox ConnectX-3 NIC which supports multiqueue and RSS. Network related parameters for qemu are: -netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 -device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue. I am running one tcp stream into the guest using iperf. Since there is only one tcp stream I expect it to be handled by one queue only but this seams to be not the case. ethtool -S on a host shows that the stream is handled by one queue in the NIC, just like I would expect, but in a guest all 4 virtio-input interrupt are incremented. Am I missing any configuration? >>> I don't see anything obviously wrong with what you describe. >>> Maybe, somehow, same irqfd got bound to multiple MSI vectors? >> It does not look like this is what is happening judging by the way >> interrupts are distributed between queues. They are not distributed >> uniformly and often I see one queue gets most interrupt and others get >> much less and then it changes. > Weird. It would happen if you transmitted from multiple CPUs. > You did pin iperf to a single CPU within guest, did you not? > No, I didn't because I didn't expect it to matter for input interrupts. When I run iperf on a host rx queue that receives all packets depends only on a connection itself, not on a cpu iperf is running on (I tested that). >>> This really depends on the type of networking card you have >>> on the host, and how it's configured. >>> >>> I think you will get something more closely resembling this >>> behaviour if you enable RFS in host. >>> When I pin iperf in a guest I do indeed see that all interrupts are arriving to the same irq vector. Is a number after virtio-input in /proc/interrupt any indication of a queue a packet arrived to (on a host I can use ethtool -S to check what queue receives packets, but unfortunately this does not work for virtio nic in a guest)? >>> I think it is. >>> Because if it is the way RSS works in virtio is not how it works on a host and not what I would expect after reading about RSS. The queue a packets arrives to should be calculated by hashing fields from a packet header only. >>> Yes, what virtio has is not RSS - it's an accelerated RFS really. >>> >> OK, if what virtio has is RFS and not RSS my test results make sense. >> Thanks! > I think the RSS emulation for virtio-mq NIC is implemented in > tun_select_queue(), > am I missing something? > > Thanks, > Zhang Haoyu > Yes, if RSS is the short for Receive Side Steering which is a generic technology. But RSS is usually short for Receive Side Scaling which was commonly technology used by Windows, it was implemented through a indirection table in the card which is obviously not supported in tun currently. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
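[Editor's sketch] For contrast with the flow-table approach sketched earlier in this thread, "classic" RSS in the Receive Side Scaling sense is stateless: the NIC hashes the packet's header fields and looks the result up in a fixed indirection table. Below is a minimal, generic illustration of that lookup; it is not tied to any particular NIC or driver, and the table size is just a typical example.

#include <stdint.h>

#define RSS_TABLE_SIZE 128	/* indirection table size, device dependent */

/* programmed once, e.g. round-robin across the RX queues; the hardware
 * then picks the queue purely from a Toeplitz-style hash of the packet
 * headers -- no per-flow state is kept */
static uint8_t rss_indirection[RSS_TABLE_SIZE];

static void rss_table_init(unsigned int num_queues)
{
	for (unsigned int i = 0; i < RSS_TABLE_SIZE; i++)
		rss_indirection[i] = i % num_queues;
}

static unsigned int rss_select_queue(uint32_t header_hash)
{
	return rss_indirection[header_hash % RSS_TABLE_SIZE];
}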
can I make this work… (Foundation for accessibility project)
This is a rather different use case than what you've been thinking of for KVM. It could mean significant improvement of the quality of life of disabled programmers like myself. It's difficult to convey what it's like to try to use computers with speech recognition for something other than writing, so bear with me when I say something is real but don't quite prove it yet. Also, please take it as read that the only really usable speech recognition environment out there is NaturallySpeaking, with Google close behind in terms of accuracy but not even on the same planet for the ability to extend for speech enabled applications. I'm trying to figure out ways of making it possible to drive Linux from Windows speech recognition (NaturallySpeaking). The goal is a system where Windows runs in a virtual machine (Linux host), audio is passed through from a USB headset to the Windows environment. And the output of the recognition engine is piped through some magic back to the Linux host. The hardest part of all of this, without question, is getting clean uninterrupted audio from the USB device all the way through to the Windows virtual machine. VirtualBox and VMware mostly fail at delivering reliable audio to the virtual machine. I expect KVM to not work right with regard to getting clean audio/real-time USB, but I'm asking in case I'm wrong. If it doesn't work or can't work yet, what would it take to make it possible for clean audio to be passed through to a guest? --- Why this is important, approaches that failed, why I think this will work. Boring accessibility info --- Attempts to make Windows or DOS based speech recognition drive Linux have a long and tortured history. Almost all of them involve some form of an open loop system that ignores system context and counts on the grammar to specify the context and the subsequent keystrokes injected into the target system. This model fails because it is effectively just speaking keyboard functions, which wastes the majority of the power of a good grammar in a speech recognition environment. The most common configuration for speech recognition in a virtualized environment today is that Windows is the host with speech recognition and Linux is the guest. It's just a reimplementation of the open-loop system described above where your dictation results are keystrokes injected into the virtual machine console window. Sometimes it works, sometimes it drops characters. One big failing of the Windows host/Linux guest environments is that, in addition to dropping characters, it seems to drop segments of the audio stream on the Windows side. It's common but not frequent for this to happen anyway when running Windows with any sort of CPU utilization, but it's almost guaranteed as soon as a virtual machine starts up. Another failing is that the context the recognition application is aware of is the window of the console. It knows nothing about the internal context of the virtual machine (what application has focus). And unfortunately it can't know anything more because of the way that NaturallySpeaking uses the local Windows context. Inverting the relationship between guest and host, where Linux is the host and Windows is the guest, solves at least the focus problem. In the virtual machine, you have a portal application that can control the perception of context and tunnel the character stream from the recognition engine into the host OS to drive it open loop. The portal application[1] can also communicate which grammar sequence has been parsed and what action should be taken on the host side.
At this point, we now have the capabilities of a closed-loop speech recognition environment where a grammar can read context to generate a new grammar to fit the application's state. This means smaller utterances which can be disambiguated, versus the more traditional large-utterance disambiguation technique. A couple of other advantages of Windows as a guest are that it only runs speech recognition in the portal. There are no browsers, no Flash, no JavaScript, no viruses and no other "stuff" taking up resources and distracting from speech recognition working as well as possible. The downside is that the host running the virtual machine needs to give the VM very high, almost real-time priority[2] so that it doesn't stall and speech recognition works as quickly and as accurately as possible. Hope I didn't bore you too badly. Thank you for reading and I hope we can make this work. --- eric [1] should I call it cake? [2] I'm looking at you, Firefox, sucking down 30% of the CPU doing nothing -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost + multiqueue + RSS question.
On Tue, Nov 18, 2014 at 11:41:11AM +0800, Jason Wang wrote: > On 11/18/2014 09:37 AM, Zhang Haoyu wrote: > >> On Mon, Nov 17, 2014 at 01:58:20PM +0200, Michael S. Tsirkin wrote: > >>> On Mon, Nov 17, 2014 at 01:22:07PM +0200, Gleb Natapov wrote: > On Mon, Nov 17, 2014 at 12:38:16PM +0200, Michael S. Tsirkin wrote: > > On Mon, Nov 17, 2014 at 09:44:23AM +0200, Gleb Natapov wrote: > >> On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote: > >>> On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote: > Hi Michael, > > I am playing with vhost multiqueue capability and have a question > about > vhost multiqueue and RSS (receive side steering). My setup has > Mellanox > ConnectX-3 NIC which supports multiqueue and RSS. Network related > parameters for qemu are: > > -netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4 > -device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10 > > In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue. > > I am running one tcp stream into the guest using iperf. Since there > is > only one tcp stream I expect it to be handled by one queue only but > this seams to be not the case. ethtool -S on a host shows that the > stream is handled by one queue in the NIC, just like I would expect, > but in a guest all 4 virtio-input interrupt are incremented. Am I > missing any configuration? > >>> I don't see anything obviously wrong with what you describe. > >>> Maybe, somehow, same irqfd got bound to multiple MSI vectors? > >> It does not look like this is what is happening judging by the way > >> interrupts are distributed between queues. They are not distributed > >> uniformly and often I see one queue gets most interrupt and others get > >> much less and then it changes. > > Weird. It would happen if you transmitted from multiple CPUs. > > You did pin iperf to a single CPU within guest, did you not? > > > No, I didn't because I didn't expect it to matter for input interrupts. > When I run iperf on a host rx queue that receives all packets depends > only on a connection itself, not on a cpu iperf is running on (I tested > that). > >>> This really depends on the type of networking card you have > >>> on the host, and how it's configured. > >>> > >>> I think you will get something more closely resembling this > >>> behaviour if you enable RFS in host. > >>> > When I pin iperf in a guest I do indeed see that all interrupts > are arriving to the same irq vector. Is a number after virtio-input > in /proc/interrupt any indication of a queue a packet arrived to (on > a host I can use ethtool -S to check what queue receives packets, but > unfortunately this does not work for virtio nic in a guest)? > >>> I think it is. > >>> > Because if > it is the way RSS works in virtio is not how it works on a host and not > what I would expect after reading about RSS. The queue a packets arrives > to should be calculated by hashing fields from a packet header only. > >>> Yes, what virtio has is not RSS - it's an accelerated RFS really. > >>> > >> OK, if what virtio has is RFS and not RSS my test results make sense. > >> Thanks! > > I think the RSS emulation for virtio-mq NIC is implemented in > > tun_select_queue(), > > am I missing something? > > > > Thanks, > > Zhang Haoyu > > > > Yes, if RSS is the short for Receive Side Steering which is a generic > technology. 
> But RSS is usually short for Receive Side Scaling which was > commonly technology used by Windows, it was implemented through a > indirection table in the card which is obviously not supported in tun > currently. Hmm, I had an impression that "Receive Side Steering" and "Receive Side Scaling" are interchangeable. The software implementation of RSS is called "Receive Packet Steering" according to Documentation/networking/scaling.txt, not "Receive Packet Scaling". Those damn TLAs are confusing. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html