RE: [PATCH v3 0/8] KVM-VFIO IRQ forward control
> -----Original Message-----
> From: Eric Auger [mailto:eric.au...@linaro.org]
> Sent: Monday, November 24, 2014 2:36 AM
> To: eric.au...@st.com; eric.au...@linaro.org; christoffer.d...@linaro.org;
> marc.zyng...@arm.com; linux-arm-ker...@lists.infradead.org;
> kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> alex.william...@redhat.com; joel.sch...@amd.com;
> kim.phill...@freescale.com; pau...@samba.org; g...@kernel.org;
> pbonz...@redhat.com; ag...@suse.de
> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; will.dea...@arm.com;
> a.mota...@virtualopensystems.com; a.r...@virtualopensystems.com;
> john.li...@huawei.com; ming@canonical.com; Wu, Feng
> Subject: [PATCH v3 0/8] KVM-VFIO IRQ forward control
>
> This series proposes an integration of "ARM: Forwarding physical
> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> KVM.
>
> It makes it possible to turn a VFIO platform driver IRQ into a
> forwarded IRQ.
>
> When a physical IRQ is forwarded (to a guest), the host does not
> deactivate it. Completion ownership is transferred to the
> guest. When the guest deactivates the associated virtual IRQ,
> the interrupt controller automatically completes the physical IRQ.
> Obviously this requires some dedicated HW support in the interrupt
> controller.
>
> The direct benefit is that, for a level sensitive IRQ, it avoids a
> VM exit on forwarded IRQ completion.
>
> When the IRQ is forwarded, the VFIO platform driver does not need to
> mask the physical IRQ anymore before signaling the eventfd. Indeed,
> genirq lowers the running priority, allowing other physical IRQs to
> hit, but not that one.
>
> Besides, the injection is still based on irqfd triggering. The only
> impact on the irqfd process is that resamplefd is not called anymore on
> virtual IRQ completion, since the latter becomes "transparent".
>
> The current integration is based on an extension of the KVM-VFIO
> device, previously used by KVM to interact with VFIO groups. The
> patch series now enables KVM to directly interact with a VFIO
> platform device. The VFIO external API was extended for that purpose.
>
> The KVM-VFIO device can get/put the vfio platform device, check its
> integrity and type, and get the IRQ number associated with an IRQ index.
>
> The IRQ forward programming is architecture specific (virtual interrupt
> controller programming basically). However the whole infrastructure is
> kept generic.
>
> From a user point of view, the functionality is provided through a
> new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and 2 associated
> attributes:
> - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> - KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>
> The capability can be checked with KVM_HAS_DEVICE_ATTR.
>
> Forwarding must be activated before the VFIO signaling mechanism is set
> using VFIO_DEVICE_SET_IRQS and unset while the signaling is disabled.
>
> ---
>
> This patch series has the following dependencies:
> - "ARM: Forwarding physical interrupts to a guest VM"
> (http://lwn.net/Articles/603514/)
> - [PATCH v9 00/19] VFIO support for platform and AMBA devices on ARM
> (http://www.spinics.net/lists/kvm-arm/msg11745.html)
> - [PATCH v2 0/6] vfio: type1: support for ARM SMMUS with
> VFIO_IOMMU_TYPE1
> (http://www.spinics.net/lists/kvm-arm/msg11738.html)
>
> Integrated pieces can be found at
> ssh://git.linaro.org/people/eric.auger/linux.git
> on branch irqfd_integ_v8
>
> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
>

Hi Eric,

Did you send out the latest QEMU part for this patch set? I noticed that
v6 of the QEMU part was sent out, but it seems some structures have
changed in this new version, such as struct kvm_arch_forwarded_irq
(subindex is added in this version), so a new QEMU patch set is also
needed.

Thanks,
Feng

> v2 -> v3:
> - kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP does
> not exist anymore)
> - a new struct local to vfio.c was introduced to wrap kvm_fwd_irq and make it
> linkable: kvm_vfio_fwd_irq_node
> - kvm_fwd_irq now is self-contained (includes struct vfio_device *)
> - a single list of kvm_vfio_fwd_irq_node is used instead of having
> a list of devices and a list of forward irq per device. Having 2 lists
> brought extra complexity.
> - the VFIO device ref counter is incremented each time a new IRQ is forwarded.
> It is not attempted anymore to hold a single reference whatever the number
> of forwarded IRQs.
> - subindex added on top of index to be closer to VFIO API
> - platform device check moved into the arm specific implementation
> - enable the KVM-VFIO device for arm64
> - forwarded state change can only happen while the VFIO IRQ handler is not
> set; in other words, when the VFIO IRQ signaling is not set.
>
> v1 -> v2:
> - forward control is moved from architecture specific file into generic
> vfio.c module.
> only kvm_arch_set_fwd_state remains architecture specific
> - integrate Kim's patch which enables KVM-VFIO for ARM
> - fix vgic s
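
The ordering rule from the cover letter (forwarding is set up before VFIO signaling is enabled and torn down only while signaling is disabled) and the v3 bookkeeping (one node and one device reference per forwarded IRQ) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: fwd_model, fwd_node, fwd_forward() and fwd_unforward() are hypothetical names rather than kernel symbols, the fixed-size array stands in for the kernel list, and the errno values are illustrative choices not taken from the patches.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FWD_MAX 8 /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for one forwarded IRQ (index/subindex as in the VFIO API). */
struct fwd_node {
    uint32_t index;
    uint32_t subindex;
    bool used;
};

/* Hypothetical model of one assigned device and its forwarded IRQs. */
struct fwd_model {
    struct fwd_node nodes[FWD_MAX];
    int dev_refcount;       /* one reference per forwarded IRQ */
    bool signaling_enabled; /* models a trigger installed via VFIO_DEVICE_SET_IRQS */
};

/* Look up an already forwarded IRQ; returns NULL if it is not forwarded. */
static struct fwd_node *fwd_find(struct fwd_model *m, uint32_t index, uint32_t subindex)
{
    for (size_t i = 0; i < FWD_MAX; i++)
        if (m->nodes[i].used && m->nodes[i].index == index &&
            m->nodes[i].subindex == subindex)
            return &m->nodes[i];
    return NULL;
}

/* Forward an IRQ; refused while signaling is active or if already forwarded. */
static int fwd_forward(struct fwd_model *m, uint32_t index, uint32_t subindex)
{
    if (m->signaling_enabled)
        return -EBUSY;
    if (fwd_find(m, index, subindex))
        return -EEXIST;
    for (size_t i = 0; i < FWD_MAX; i++) {
        if (!m->nodes[i].used) {
            m->nodes[i] = (struct fwd_node){ index, subindex, true };
            m->dev_refcount++; /* take one device reference per forwarded IRQ */
            return 0;
        }
    }
    return -ENOMEM;
}

/* Unforward an IRQ; also refused while signaling is active. */
static int fwd_unforward(struct fwd_model *m, uint32_t index, uint32_t subindex)
{
    struct fwd_node *n;

    if (m->signaling_enabled)
        return -EBUSY;
    n = fwd_find(m, index, subindex);
    if (!n)
        return -ENOENT;
    n->used = false;
    m->dev_refcount--; /* drop the reference taken in fwd_forward() */
    return 0;
}
```

In this model a caller forwards every IRQ it needs first, then sets signaling_enabled, which mirrors the "forward before VFIO_DEVICE_SET_IRQS" order described in the cover letter.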
Re: [PATCH v3 0/8] KVM-VFIO IRQ forward control
On 11/24/2014 09:14 AM, Wu, Feng wrote:
> Hi Eric,
>
> Did you send out the latest QEMU part for this patch set, I notice that v6 of
> The QEMU part is sent out, but seems some structure in this new version
> has been changed, such as, struct kvm_arch_forwarded_irq (subindex is added
> in this version), so a new patchset in QEMU is also needed.

Hi Feng,

v7 is available at:
http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03804.html. It
already illustrates KVM-VFIO device usage.

v8 which will indeed integrate subindex addition will be delivered this
week.

Best Regards

Eric

> Thanks,
> Feng
Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall wrote:
> On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote:
>> Hi Christoffer,
>>
>> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall wrote:
>> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote:
>> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall wrote:
>> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
>> >> >> Hi All,
>> >> >>
>> >> >> I have second thoughts about rebasing KVM PMU patches
>> >> >> to Marc's irq-forwarding patches.
>> >> >>
>> >> >> The PMU IRQs (when virtualized by KVM) are not exactly
>> >> >> forwarded IRQs because they are shared between Host
>> >> >> and Guest.
>> >> >>
>> >> >> Scenario1
>> >> >> ---------
>> >> >>
>> >> >> We might have perf running on Host and no KVM guest
>> >> >> running. In this scenario, we won't get interrupts on Host
>> >> >> because the kvm_pmu_hyp_init() (similar to the function
>> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding
>> >> >> implementation) has put all host PMU IRQs in forwarding
>> >> >> mode.
>> >> >>
>> >> >> The only way to solve this problem is to not set forwarding
>> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead
>> >> >> have special routines to turn on and turn off the forwarding
>> >> >> mode of PMU IRQs. These routines will be called from
>> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
>> >> >> forwarding state.
>> >> >>
>> >> >> Scenario2
>> >> >> ---------
>> >> >>
>> >> >> We might have perf running on Host and Guest simultaneously
>> >> >> which means it is quite likely that PMU HW triggers an IRQ meant
>> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);"
>> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine
>> >> >> of Marc's patchset which is called before local_irq_enable()).
>> >> >>
>> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
>> >> >> will accidentally forward IRQ meant for Host to Guest unless
>> >> >> we put additional checks to inspect VCPU PMU state.
>> >> >>
>> >> >> Am I missing any detail about IRQ forwarding for above
>> >> >> scenarios?
>> >> >>
>> >> > Hi Anup,
>> >>
>> >> Hi Christoffer,
>> >>
>> >> >
>> >> > I briefly discussed this with Marc. What I don't understand is how it
>> >> > would be possible to get an interrupt for the host while running the
>> >> > guest?
>> >> >
>> >> > The rationale behind my question is that whenever you're running the
>> >> > guest, the PMU should be programmed exclusively with guest state, and
>> >> > since the PMU is per core, any interrupts should be for the guest, where
>> >> > it would always be pending.
>> >>
>> >> Yes, that's right, the PMU is programmed exclusively for guest when
>> >> guest is running and for host when host is running.
>> >>
>> >> Let us assume a situation (Scenario2 mentioned previously)
>> >> where both host and guest are using PMU. When the guest is
>> >> running we come back to host mode due to variety of reasons
>> >> (stage2 fault, guest IO, regular host interrupt, host interrupt
>> >> meant for guest, ...) which means we will return from the
>> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the
>> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
>> >> At this point we would have restored back host PMU context and
>> >> any PMU counter used by host can trigger PMU overflow interrupt
>> >> for host. Now we will be having "kvm_pmu_sync_hwstate(vcpu);"
>> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the
>> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset)
>> >> which will try to detect PMU irq forwarding state in GIC hence it
>> >> can accidentally discover PMU irq pending for guest while this
>> >> PMU irq is actually meant for host.
>> >>
>> >> This above mentioned situation does not happen for timer
>> >> because virtual timer interrupts are exclusively used for guest.
>> >> The exclusive use of virtual timer interrupt for guest ensures that
>> >> the function kvm_timer_sync_hwstate() will always see correct
>> >> state of virtual timer IRQ from GIC.
>> >>
>> > I'm not quite following.
>> >
>> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemptible section,
>> > you would (1) capture the active state of the IRQ pertaining to the
>> > guest and (2) deactivate the IRQ on the host, then (3) switch the state of
>> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU
>> > you're running on.
>> >
>> > If the host PMU state restored in (3) causes the PMU to raise an
>> > interrupt, you'll take an interrupt after (4), which is for the host,
>> > and you'll handle it on the host.
>> >
>> We only switch PMU state in assembly code using
>> kvm_call_hyp(__kvm_vcpu_run, vcpu)
>> so whenever we are in kvm_arch_vcpu_ioctl_run() (i.e. host mode)
>> the current hardware PMU state is for host. This means whenever
>> we are in host mode the host PMU can change state of PMU IRQ
>> in GIC even if local IRQs are
Re: [question] lots of interrupts injected to vm when pressing some key w/o releasing
>On Thu, Nov 20, 2014 at 02:59:36PM +0800, Zhang Haoyu wrote:
>> >On 20/11/2014 03:20, Zhang Haoyu wrote:
>> >> Hi all,
>> >>
>> >> If I press one of the "Insert/Delete/Home/End/PageUp/PageDown/UpArrow/
>> >> DownArrow/LeftArrow/RightArrow" keys w/o releasing, then lots of interrupts
>> >> will be injected to vm(win7/win2008), about 8000/s, and the system becomes
>> >> very slow,
>> >> bringing very bad experience. But the other keys are okay.
>> >> And a linux guest does not have this problem.
>> >
>> >Do you have a trace for this? What version of QEMU and what UI backend?
>> >
>> Sorry for forgetting to mention test environment from the start.
>> Host: rhel7 with kernel-3.10.0-121
>> QEMU: qemu-2.0.2
>> Guest: win7(bad),win2008(bad),linux-kernel-3.10.0-121(good)
>>
>> No UI backend, directly start the VM via qemu command.
>>
>> perf top data when above problem happening:
>Trace it like this: http://www.linux-kvm.org/page/Tracing

trace data while pressing "downArrow" key w/o releasing: (the vm has 2 vcpus)
version = 6
CPU 2 is empty
cpus=16
kvm-16063 [000] 8312.322731: kvm_pio: pio_read at 0x64 size 1 count 1
kvm-16062 [003] 8312.322732: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16062 [003] 8312.322733: kvm_entry: vcpu 0
kvm-16063 [000] 8312.322733: kvm_userspace_exit: reason KVM_EXIT_IO (2)
kvm-16062 [003] 8312.322736: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16062 [003] 8312.322736: kvm_entry: vcpu 0
kvm-16063 [000] 8312.322736: kvm_entry: vcpu 1
kvm-16063 [000] 8312.322737: kvm_exit: reason CPUID rip 0x806e7fbb info 0 0
kvm-16063 [000] 8312.322738: kvm_cpuid: func 0 rax a rbx 756e6547 rcx 6c65746e rdx 49656e69
kvm-16062 [003] 8312.322738: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16063 [000] 8312.322738: kvm_entry: vcpu 1
kvm-16062 [003] 8312.322739: kvm_entry: vcpu 0
kvm-16063 [000] 8312.322739: kvm_exit: reason IO_INSTRUCTION rip 0x806edf72 info b008000b 0
kvm-16063 [000] 8312.322740: kvm_pio: pio_read at 0xb008 size 4 count 1
kvm-16063 [000] 8312.322740: kvm_entry: vcpu 1
kvm-16062 [003] 8312.322741: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16063 [000] 8312.322741: kvm_exit: reason IO_INSTRUCTION rip 0x806edf72 info b008000b 0
kvm-16062 [003] 8312.322741: kvm_entry: vcpu 0
kvm-16063 [000] 8312.322742: kvm_pio: pio_read at 0xb008 size 4 count 1
kvm-16063 [000] 8312.322742: kvm_entry: vcpu 1
kvm-16063 [000] 8312.322743: kvm_exit: reason IO_INSTRUCTION rip 0x806eda5a info 640008 0
kvm-16063 [000] 8312.322743: kvm_pio: pio_read at 0x64 size 1 count 1
kvm-16062 [003] 8312.322744: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16063 [000] 8312.322744: kvm_userspace_exit: reason KVM_EXIT_IO (2)
kvm-16062 [003] 8312.322744: kvm_entry: vcpu 0
kvm-16062 [003] 8312.322746: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16062 [003] 8312.322747: kvm_entry: vcpu 0
kvm-16063 [000] 8312.322747: kvm_entry: vcpu 1
kvm-16063 [000] 8312.322748: kvm_exit: reason CPUID rip 0x806e7fbb info 0 0
kvm-16063 [000] 8312.322748: kvm_cpuid: func 0 rax a rbx 756e6547 rcx 6c65746e rdx 49656e69
kvm-16063 [000] 8312.322749: kvm_entry: vcpu 1
kvm-16062 [003] 8312.322749: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16062 [003] 8312.322749: kvm_entry: vcpu 0
kvm-16063 [000] 8312.322749: kvm_exit: reason IO_INSTRUCTION rip 0x806edf72 info b008000b 0
kvm-16063 [000] 8312.322750: kvm_pio: pio_read at 0xb008 size 4 count 1
kvm-16063 [000] 8312.322751: kvm_entry: vcpu 1
kvm-16062 [003] 8312.322751: kvm_exit: reason PAUSE_INSTRUCTION rip 0x805466c0 info 0 0
kvm-16063 [000] 8312.322751: kvm_exit: reason IO_INSTRUCTION rip 0x806edf72 info b008000b 0
kvm-16062 [003] 8312.322752: kvm_entry: vcpu 0
kvm-16063 [000] 8312.322752: kvm_pio: pio_read at 0xb008 size 4 count 1
...

trace data while pressing "downArrow" key w/o releasing: (the vm has 1 vcpu)
version = 6
CPU 4 is empty
CPU 5 is empty
CPU 6 is empty
C
Re: linux-next: build failure after merge of the kvm tree
On 24/11/2014 07:19, Stephen Rothwell wrote:
> Hi all,
>
> After merging the kvm tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> ERROR: "get_xsave_addr" [arch/x86/kvm/kvm.ko] undefined!
>
> Caused by commit 1d7fe1d1fb18 ("KVM: x86: support XSAVES usage in
> the host").
>
> I have used the kvm tree from next-20141121 for today.

Indeed this patch wasn't supposed to be pushed yet. I will rewind
kvm/next by two commits. Sorry.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] arm/arm64: vgic: Remove unreachable irq_clear_pending
When 'injecting' an edge-triggered interrupt with a falling edge we
shouldn't clear the pending state on the distributor. In fact, we
don't, because the check in vgic_validate_injection would prevent us
from ever reaching this bit of code.

Remove the unreachable snippet.

Signed-off-by: Christoffer Dall
---
 virt/kvm/arm/vgic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 3aaca49..f45cf16 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1643,8 +1643,6 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
                         vgic_dist_irq_clear_level(vcpu, irq_num);
                         if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
                                 vgic_dist_irq_clear_pending(vcpu, irq_num);
-                } else {
-                        vgic_dist_irq_clear_pending(vcpu, irq_num);
                 }
         }
 
--
2.1.2.330.g565301e.dirty
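
The reachability argument in the commit message can be checked with a small stand-alone model. This is a sketch, not the kernel code: model_validate_injection() is a hypothetical helper, and it assumes the injection check accepts an edge-triggered interrupt only on a rising edge and a level-triggered interrupt only on a level change.

```c
#include <stdbool.h>

/*
 * Simplified model of the injection check (hypothetical helper, not the
 * kernel function). Assumption: an edge-triggered interrupt is accepted
 * only on a rising edge, a level-triggered one only on a level change.
 */
static bool model_validate_injection(bool edge_triggered, int level, int state)
{
    if (edge_triggered)
        return level > state; /* rising edge only */
    return level != state;    /* any level change */
}

/*
 * Returns true if the removed "else" branch (edge-triggered interrupt with
 * level == 0) could be reached for any current pending state (0 or 1).
 */
static bool removed_branch_reachable(void)
{
    for (int state = 0; state <= 1; state++)
        if (model_validate_injection(true, 0, state))
            return true;
    return false;
}
```

Under that assumption, level == 0 can never be greater than a pending state of 0 or 1, so a falling edge is always rejected before the update code runs.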
Re: [PATCH v4 0/3] irqfd support for arm/arm64
On Sun, Nov 23, 2014 at 06:56:57PM +0100, Eric Auger wrote:
> This patch series enables irqfd on arm and arm64.
>
> The irqfd framework makes it possible to inject a virtual IRQ into a
> guest upon an eventfd trigger. User space uses the KVM_IRQFD VM ioctl to
> provide KVM with a kvm_irqfd struct that associates a VM, an eventfd and
> a virtual IRQ number (aka. the gsi). When an actor signals the eventfd
> (typically a VFIO platform driver), the kvm irqfd subsystem injects the
> gsi into the VM.
>
> Resamplefd is also supported for level sensitive interrupts, i.e. the
> user can provide another eventfd that is triggered when the completion
> of the virtual IRQ (gsi) is detected by the GIC.
>
> The gsi must correspond to a shared peripheral interrupt (SPI), i.e. the
> GIC interrupt ID is gsi + 32.
>
> The rationale behind not supporting PPI irqfd injection is that
> any device using a PPI would be a private-to-the-CPU device (timer for
> instance), so its state would have to be context-switched along with the
> VCPU and would require in-kernel wiring anyhow. It is not a relevant use
> case for irqfds.
>
> This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
>
> No IRQ routing table is used, which makes it possible to remove
> CONFIG_HAVE_KVM_IRQCHIP.
>
> The code can be found at git://git.linaro.org/people/eric.auger/linux.git
> on branch irqfd_integ_v8
>
> This work was tested with Calxeda Midway xgmac main interrupt with
> qemu-system-arm and QEMU VFIO platform device. Also irqfd was proven
> functional on several vhost-net prototypes.
>
> v3 -> v4:
> - rebase on 3.18rc5
> - vgic dynamic instantiation brought new challenges:
> handling of irqfd injection when vgic is not ready
> - unset of CONFIG_HAVE_KVM_IRQCHIP in a separate patch
> - add arm64 enable
> - vgic.c style modifications according to Christoffer's comments
>

There also seems to be a different split of the patches here?

We've probably also reached the point where you need to start rebasing
on Andre's GICv3 patches, which I expect will go in first.

Thanks,
-Christoffer
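
The gsi-to-interrupt-ID rule quoted in this cover letter (GIC interrupt ID = gsi + 32, SPIs only) can be sketched as a small range-checked helper. This is only an illustration: gsi_to_gic_irq() and GIC_SPI_BASE are hypothetical names, and nr_irqs is assumed to be the total number of interrupt IDs handled by the virtual GIC.

```c
#include <stdint.h>

#define GIC_SPI_BASE 32u /* interrupt IDs 0..31 are SGIs and PPIs */

/*
 * Hypothetical helper: map an irqfd gsi to a GIC interrupt ID.
 * nr_irqs is the total number of interrupt IDs the (virtual) GIC handles.
 * Returns the interrupt ID, or -1 if the resulting SPI is out of range.
 */
static int64_t gsi_to_gic_irq(uint32_t gsi, uint32_t nr_irqs)
{
    /* Widen before adding so a huge gsi cannot wrap around. */
    uint64_t id = (uint64_t)gsi + GIC_SPI_BASE;

    if (id >= nr_irqs)
        return -1;
    return (int64_t)id;
}
```

With an example value of 128 interrupt IDs, gsi 0 maps to interrupt ID 32 and gsi 95 to ID 127, while gsi 96 is rejected.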
Re: [PATCH v4 1/3] KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
On Sun, Nov 23, 2014 at 06:56:58PM +0100, Eric Auger wrote:
> CONFIG_HAVE_KVM_IRQCHIP is needed to support IRQ routing (along
> with irq_comm.c and irqchip.c usage). This is not the case for
> arm/arm64 currently.
>
> This patch unsets the flag for both arm and arm64.
>
> Signed-off-by: Eric Auger

I don't fully understand why we used to have these and we don't need
them anymore. Was it just a stupid bug in the past?

Anyhow, looks reasonable:

Acked-by: Christoffer Dall

-Christoffer
Re: [PATCH v4 2/3] KVM: arm: add irqfd support
On Sun, Nov 23, 2014 at 06:56:59PM +0100, Eric Auger wrote:
> This patch enables irqfd on arm.
>
> Both irqfd and resamplefd are supported. Injection is implemented
> in vgic.c without routing.
>
> This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
>
> KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
> automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.
>
> Signed-off-by: Eric Auger
>
> ---
>
> v3 -> v4:
> - reword commit message
> - explain why we unlock the distributor before calling kvm_notify_acked_irq
> - rename is_assigned_irq into has_notifier
> - change EOI and injection kvm_debug format string
> - remove error local variable in kvm_set_irq
> - Move HAVE_KVM_IRQCHIP unset in a separate patch
> - The rationale behind not supporting PPI irqfd injection is that
> any device using a PPI would be a private-to-the-CPU device (timer for
> instance), so its state would have to be context-switched along with the
> VCPU and would require in-kernel wiring anyhow. It is not a relevant use
> case for irqfds.

this blob could go in the commit message.

> - handle case where the irqfd injection is attempted before the vgic is ready.
> in such a case the notifier, if any, is called immediately
> - use nr_irqs to test spi is within correct range
>
> v2 -> v3:
> - removal of irq.h from eventfd.c put in a separate patch to increase
> visibility
> - properly expose KVM_CAP_IRQFD capability in arm.c
> - remove CONFIG_HAVE_KVM_IRQCHIP meaningful only if irq_comm.c is used
>
> v1 -> v2:
> - rebase on 3.17rc1
> - move of the dist unlock in process_maintenance
> - remove of dist lock in __kvm_vgic_sync_hwstate
> - rewording of the commit message (add resamplefd reference)
> - remove irq.h
> ---
> Documentation/virtual/kvm/api.txt | 5 ++-
> arch/arm/include/uapi/asm/kvm.h | 3 ++
> arch/arm/kvm/Kconfig | 2 ++
> arch/arm/kvm/Makefile | 2 +-
> arch/arm/kvm/arm.c | 3 ++
> virt/kvm/arm/vgic.c | 72 ---
> 6 files changed, 81 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 7610eaa..4deccc0 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2206,7 +2206,7 @@ into the hash PTE second double word).
>  4.75 KVM_IRQFD
>
>  Capability: KVM_CAP_IRQFD
> -Architectures: x86 s390
> +Architectures: x86 s390 arm
>  Type: vm ioctl
>  Parameters: struct kvm_irqfd (in)
>  Returns: 0 on success, -1 on error
> @@ -2232,6 +2232,9 @@ Note that closing the resamplefd is not sufficient to disable the
>  irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
>  and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
>
> +On arm, the gsi must be a shared peripheral interrupt (SPI).
> +This means the corresponding programmed GIC interrupt ID is gsi+32.
> +

On ARM, the gsi field in the kvm_irqfd struct specifies the Shared
Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
given by gsi + 32.

>  4.76 KVM_PPC_ALLOCATE_HTAB
>
>  Capability: KVM_CAP_PPC_ALLOC_HTAB
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index 09ee408..77547bb 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -196,6 +196,9 @@ struct kvm_arch_memory_slot {
>  /* Highest supported SPI, from VGIC_NR_IRQS */
>  #define KVM_ARM_IRQ_GIC_MAX 127
>
> +/* One single KVM irqchip, ie. the VGIC */
> +#define KVM_NR_IRQCHIPS 1
> +
>  /* PSCI interface */
>  #define KVM_PSCI_FN_BASE 0x95c1ba5e
>  #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n))
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 9f581b1..e519a40 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -24,6 +24,7 @@ config KVM
>          select KVM_MMIO
>          select KVM_ARM_HOST
>          depends on ARM_VIRT_EXT && ARM_LPAE
> +        select HAVE_KVM_EVENTFD
>          ---help---
>            Support hosting virtualized guest machines. You will also
>            need to select one or more of the processor modules below.
> @@ -55,6 +56,7 @@ config KVM_ARM_MAX_VCPUS
>  config KVM_ARM_VGIC
>          bool "KVM support for Virtual GIC"
>          depends on KVM_ARM_HOST && OF
> +        select HAVE_KVM_IRQFD
>          default y
>          ---help---
>            Adds support for a hardware assisted, in-kernel GIC emulation.
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index f7057ed..859db09 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
>  AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
>
>  KVM := ../../../virt/kvm
> -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o
> +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
>
>  obj-y +=
Re: [PATCH v4 3/3] KVM: arm64: add irqfd support
On Sun, Nov 23, 2014 at 06:57:00PM +0100, Eric Auger wrote: > From: Joel Schopp > > This patch enables irqfd for arm64. > > Signed-off-by: Joel Schopp > Signed-off-by: Eric Auger > > --- > > [Eric Auger] > - originates from Joel's [RFC PATCH] arm64: KVM: add irqfd support > http://www.spinics.net/lists/kvm-arm/msg10798.html > - isolates modifications really related to irqfd > --- This looks overly complicated just to preserve authorship; if Joel is OK with it, I suggest squashing this into the previous patch. -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86: export get_xsave_addr
get_xsave_addr is the API to access XSAVE states, and KVM would like to use it. Export it. Cc: x...@kernel.org Cc: H. Peter Anvin Signed-off-by: Paolo Bonzini --- Peter, can you please ACK this for inclusion in the KVM tree? Thanks. --- arch/x86/kernel/xsave.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index 4c540c4719d8..0de1fae2bdf0 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -738,3 +738,4 @@ void *get_xsave_addr(struct xsave_struct *xsave, int xstate) return (void *)xsave + xstate_comp_offsets[feature]; } +EXPORT_SYMBOL_GPL(get_xsave_addr); -- 1.8.3.1
Re: [PATCH] KVM: x86: Remove FIXMEs in emulate.c
On 24/11/2014 05:25, Nicholas Krause wrote: > Remove fixme comments about needing fault addresses to be returned. These > are propagated from walk_addr_generic to gva_to_gpa and from there to > ops->read_std and ops->write_std. > > Signed-off-by: Nicholas Krause > --- > arch/x86/kvm/emulate.c | 4 > 1 file changed, 4 deletions(-) > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index 9f8a2fa..16b5d52 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -2751,7 +2751,6 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt, > ret = ops->read_std(ctxt, old_tss_base, &tss_seg, sizeof tss_seg, > &ctxt->exception); > if (ret != X86EMUL_CONTINUE) > - /* FIXME: need to provide precise fault address */ > return ret; > > save_state_to_tss32(ctxt, &tss_seg); > @@ -2760,13 +2759,11 @@ static int task_switch_32(struct x86_emulate_ctxt > *ctxt, > ret = ops->write_std(ctxt, old_tss_base + eip_offset, &tss_seg.eip, >ldt_sel_offset - eip_offset, &ctxt->exception); > if (ret != X86EMUL_CONTINUE) > - /* FIXME: need to provide precise fault address */ > return ret; > > ret = ops->read_std(ctxt, new_tss_base, &tss_seg, sizeof tss_seg, > &ctxt->exception); > if (ret != X86EMUL_CONTINUE) > - /* FIXME: need to provide precise fault address */ > return ret; > > if (old_tss_sel != 0x) { > @@ -2777,7 +2774,6 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt, >sizeof tss_seg.prev_task_link, >&ctxt->exception); > if (ret != X86EMUL_CONTINUE) > - /* FIXME: need to provide precise fault address */ > return ret; > } > > Thanks, this patch has been applied already. Paolo
Re: [PATCH v4 0/3] irqfd support for arm/arm64
On 11/24/2014 10:47 AM, Christoffer Dall wrote: > On Sun, Nov 23, 2014 at 06:56:57PM +0100, Eric Auger wrote: >> This patch series enables irqfd on arm and arm64. >> >> Irqfd framework enables to inject a virtual IRQ into a guest upon an >> eventfd trigger. User-side uses KVM_IRQFD VM ioctl to provide KVM with >> a kvm_irqfd struct that associates a VM, an eventfd, a virtual IRQ number >> (aka. the gsi). When an actor signals the eventfd (typically a VFIO >> platform driver), the kvm irqfd subsystem injects the gsi into the VM. >> >> Resamplefd also is supported for level sensitive interrupts, ie. the >> user can provide another eventfd that is triggered when the completion >> of the virtual IRQ (gsi) is detected by the GIC. >> >> The gsi must correspond to a shared peripheral interrupt (SPI), ie the >> GIC interrupt ID is gsi + 32. >> >> The rationale behind not supporting PPI irqfd injection is that >> any device using a PPI would be a private-to-the-CPU device (timer for >> instance), so its state would have to be context-switched along with the >> VCPU and would require in-kernel wiring anyhow. It is not a relevant use >> case for irqfds. >> >> this patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. >> >> No IRQ routing table is used, enabling to remove CONFIG_HAVE_KVM_IRQCHIP >> >> can be found at git://git.linaro.org/people/eric.auger/linux.git >> on branch irqfd_integ_v8 >> >> This work was tested with Calxeda Midway xgmac main interrupt with >> qemu-system-arm and QEMU VFIO platform device. Also irqfd was proven >> functional on several vhost-net prototypes. >> >> v3 -> v4: >> - rebase on 3.18rc5 >> - vgic dynamic instantiation brought new challenges: >> handling of irqfd injection when vgic is not ready >> - unset of CONFIG_HAVE_KVM_IRQCHIP in a separate patch >> - add arm64 enable >> - vgic.c style modifications according to Christoffer comments >> > > There also seems to be a different split of the patches here? 
Hi Christoffer, yes, I added arm64 support and moved the CONFIG_HAVE_KVM_IRQCHIP removal into a separate patch. > > We've probably also reached the point where you need to start rebasing > on Andre's GICv3 patches, which I expect will go in first. The patch applies without conflict on Andre's series. BR Eric > > Thanks, > -Christoffer
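The gsi-to-interrupt-ID rule stated in the cover letter (GIC interrupt ID = gsi + 32, SPIs only) can be sketched as follows. This is a minimal userspace illustration under stated assumptions, not the vgic.c code from the patch: the helper name `irqfd_gsi_to_gic_id` and the constant `SKETCH_NR_PRIVATE_IRQS` are hypothetical, and the `nr_irqs` parameter only stands in for the range check mentioned in the v3 -> v4 changelog.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16 SGIs + 16 PPIs precede the first SPI in the GIC ID space. */
#define SKETCH_NR_PRIVATE_IRQS 32u

/*
 * Hypothetical helper: translate an irqfd gsi (an SPI index) into a GIC
 * interrupt ID. Returns false if the result falls outside [32, nr_irqs),
 * including the case where gsi + 32 wraps around.
 */
static bool irqfd_gsi_to_gic_id(uint32_t gsi, uint32_t nr_irqs,
                                uint32_t *gic_id)
{
    uint32_t id = gsi + SKETCH_NR_PRIVATE_IRQS;

    if (id < SKETCH_NR_PRIVATE_IRQS || id >= nr_irqs)
        return false;

    *gic_id = id;
    return true;
}
```

With a distributor configured for 128 interrupts, gsi 0 would map to ID 32 and gsi 95 to ID 127, while gsi 96 would be rejected.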
Re: Another Obsolete Fix me in trace.h?
On 24/11/2014 05:36, nick wrote: > Greetings Again Gleb and others, > I am assuming in the code I am pasting below the fix me is obsolete now and I > can remove it. :) > Cheers Nick > TP_printk("%s (0x%x)", > __print_symbolic(__entry->exception, kvm_trace_sym_exc), >/* FIXME: don't print error_code if not present */ > __entry->has_error ? __entry->error_code : 0) > ); > No, it's not obsolete, the idea is to print only %s instead of %s (0x%x) if __entry->has_error is false. I don't know the trace API well enough to know if that is possible. Paolo
Re: [CFT PATCH 2/2] KVM: x86: support XSAVES usage in the host
On 24/11/2014 03:10, Wanpeng Li wrote: > Hi Paolo, > On Fri, Nov 21, 2014 at 07:31:18PM +0100, Paolo Bonzini wrote: > [...] >> +u64 feature = valid & -valid; >> +int index = fls64(feature) - 1; >> +void *src = get_xsave_addr(xsave, feature); >> + >> +if (src) { >> +u32 size, offset, ecx, edx; >> +cpuid_count(XSTATE_CPUID, index, >> +&size, &offset, &ecx, &edx); >> +memcpy(dest + offset, src, size); > > The offset you get is still for compact format No, it's not, or all old software using XSAVE/XRSTOR would be broken. The code in arch/x86/kernel/xsave.c agrees with me; compacted offsets (xstate_comp_offsets) are computed by summing sizes, while non-compacted offsets (xstate_offsets) come from CPUID. > , so you almost convert compact format to compact format instead of convert compact format to standard format. > In addition, I think convert standard format to compact format should be > implemented in put path. If I do that, userspace is broken because it expects standard format. Hence, passing XSAVE data to userspace in compact format can be done, but has to be guarded by an explicitly enabled capability (using KVM_ENABLE_CAP). I do not think that's useful, since no supervisor-specific states are defined yet, and anyway they can be passed using KVM_GET/SET_MSR because this is not a fast path. Paolo
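The distinction drawn above between compacted and standard offsets can be sketched in plain C. This is only an illustration under stated assumptions: the per-feature sizes and standard offsets in the two tables are example values (on real hardware they are read from CPUID leaf 0xD, as the quoted code does), the function names are hypothetical, and the sketch ignores any alignment rules that a compacted layout may impose.

```c
#include <stdint.h>

#define SKETCH_LEGACY_SIZE   512u  /* legacy FXSAVE region */
#define SKETCH_HEADER_SIZE    64u  /* XSAVE header */
#define SKETCH_NR_FEATURES     8

/* Example values only; real ones come from CPUID leaf 0xD, subleaf i. */
static const uint32_t sketch_size[SKETCH_NR_FEATURES] =
    { 0, 0, 256, 64, 64, 64, 512, 1024 };
static const uint32_t sketch_std_offset[SKETCH_NR_FEATURES] =
    { 0, 0, 576, 960, 1024, 1088, 1152, 1664 };

/* Index of the lowest set bit, using the "valid & -valid" trick. */
static int sketch_lowest_feature(uint64_t valid)
{
    uint64_t feature = valid & -valid;  /* isolates the lowest set bit */
    int index = 0;

    while (feature >>= 1)
        index++;
    return index;
}

/* Standard format: the offset is fixed per feature. */
static uint32_t sketch_standard_offset(int index)
{
    return sketch_std_offset[index];
}

/*
 * Compacted format: the offset is the sum of the sizes of all enabled
 * extended features (index >= 2) that come before this one.
 */
static uint32_t sketch_compacted_offset(uint64_t enabled, int index)
{
    uint32_t off = SKETCH_LEGACY_SIZE + SKETCH_HEADER_SIZE;
    int i;

    for (i = 2; i < index; i++)
        if (enabled & (1ULL << i))
            off += sketch_size[i];
    return off;
}
```

With features 0, 1, 2 and 5 enabled (mask 0x27), feature 5 sits at 832 in the compacted layout of this sketch but at 1088 in the standard one, which is why a copy loop that targets the standard format has to use the CPUID-reported offset for the destination.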
Re: Another Obsolete Fix me in trace.h?
On 2014-11-24 11:12, Paolo Bonzini wrote: > On 24/11/2014 05:36, nick wrote: >> Greetings Again Gleb and others, >> I am assuming in the code I am pasting below the fix me is obsolete now and >> I can remove it. :) >> Cheers Nick >> TP_printk("%s (0x%x)", >> __print_symbolic(__entry->exception, kvm_trace_sym_exc), >>/* FIXME: don't print error_code if not present */ >> __entry->has_error ? __entry->error_code : 0) >> ); >> > > No, it's not obsolete, the idea is to print only > >%s > > instead of > >%s (0x%x) > > if __entry->has_error is false. I don't know the trace API well enough > to know if that is possible. Last time I ran across such a scenario, it was not feasible and essentially required separate tracepoints. But maybe Steven knows a trick. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
Re: [PATCH] KVM: x86: export get_xsave_addr
On Mon, 24 Nov 2014, Paolo Bonzini wrote: > get_xsave_addr is the API to access XSAVE states, and KVM would > like to use it. Export it. > > Cc: x...@kernel.org > Cc: H. Peter Anvin > Signed-off-by: Paolo Bonzini > --- > Peter, can you please ACK this for inclusion in the KVM tree? Are you content with my acked-by as well? Acked-by: Thomas Gleixner
Re: [PATCH] KVM: x86: export get_xsave_addr
On 24/11/2014 11:45, Thomas Gleixner wrote: > On Mon, 24 Nov 2014, Paolo Bonzini wrote: >> get_xsave_addr is the API to access XSAVE states, and KVM would >> like to use it. Export it. >> >> Cc: x...@kernel.org >> Cc: H. Peter Anvin >> Signed-off-by: Paolo Bonzini >> --- >> Peter, can you please ACK this for inclusion in the KVM tree? > > Are you content with my acked-by as well? > > Acked-by: Thomas Gleixner Sure. :) Paolo
Re: [RFC 0/4] vgic additions for forwarded irq
On Sun, Nov 23, 2014 at 07:12:49PM +0100, Eric Auger wrote: > This series applies on top of "ARM: Forwarding physical > interrupts to a guest VM" (http://lwn.net/Articles/603514/) > series. Marc and Eric, Does it make sense to review and look at these patches given the current state of the forwarding patches, or should we wait until Marc respins that series? > > It brings some extra functionalities that were requested to > be able to inject virtual level sensitive IRQs triggered from > VFIO/irqfd. > > It adds: > - a specific handling of forwarded IRQ into the VGIC state machine. > - deactivation of physical IRQ and unforwarding on vgic destruction > - handling of forwarded IRQ injection before the vgic readiness: > this was needed because in a sample qemu/vfio use case, qemu > registers forwarded IRQ and set up VFIO signaling before the first > vcpu run and hence before vgic readiness. At that time some > physical IRQ may hit before the VGIC readiness. This is typically > observed with Calxeda xgmac on second QEMU run. this seems related to my note in the last patch? Same or different problem? -Christoffer > - rbtree lock addition. > > Integrated pieces can be found at > ssh://git.linaro.org/people/eric.auger/linux.git > on branch irqfd_integ_v8 > > The first 2 patch files were previously part of [RFC v2 0/9] > KVM-VFIO IRQ forward control (https://lkml.org/lkml/2014/9/1/347). > > > Eric Auger (4): > KVM: arm: vgic: fix state machine for forwarded IRQ > KVM: arm: vgic: add forwarded irq rbtree lock > KVM: arm: vgic: cleanup forwarded IRQs on destroy > KVM: arm: vgic: handle irqfd forwarded IRQ injection before vgic > readiness > > include/kvm/arm_vgic.h | 1 + > virt/kvm/arm/vgic.c| 128 > + > 2 files changed, 110 insertions(+), 19 deletions(-) > > -- > 1.9.1 >
Re: [RFC 0/4] vgic additions for forwarded irq
On 24/11/14 10:50, Christoffer Dall wrote: > On Sun, Nov 23, 2014 at 07:12:49PM +0100, Eric Auger wrote: >> This series applies on top of "ARM: Forwarding physical >> interrupts to a guest VM" (http://lwn.net/Articles/603514/) >> series. > > Marc and Eric, > > Does it make sense to review and look at these patches given the current > state of the forwarding patches, or should we wait until Marc respins > that series? I'm still in the process of respinning all of this (we've had a few iterations with tglx, but I got sidetracked with the ITS/v2m/MSI side of things). I'll get back onto this this week. Hopefully. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v4 2/3] KVM: arm: add irqfd support
On 11/24/2014 11:00 AM, Christoffer Dall wrote: > On Sun, Nov 23, 2014 at 06:56:59PM +0100, Eric Auger wrote: >> This patch enables irqfd on arm. >> >> Both irqfd and resamplefd are supported. Injection is implemented >> in vgic.c without routing. >> >> This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. >> >> KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability >> automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. >> >> Signed-off-by: Eric Auger >> >> --- >> >> v3 -> v4: >> - reword commit message >> - explain why we unlock the distributor before calling kvm_notify_acked_irq >> - rename is_assigned_irq into has_notifier >> - change EOI and injection kvm_debug format string >> - remove error local variable in kvm_set_irq >> - Move HAVE_KVM_IRQCHIP unset in a separate patch >> - The rationale behind not supporting PPI irqfd injection is that >> any device using a PPI would be a private-to-the-CPU device (timer for >> instance), so its state would have to be context-switched along with the >> VCPU and would require in-kernel wiring anyhow. It is not a relevant use >> case for irqfds. > > this blob could go in the commit message. OK > >> - handle case were the irqfd injection is attempted before the vgic is ready. 
>> in such a case the notifier, if any, is called immediatly >> - use nr_irqs to test spi is within correct range >> >> v2 -> v3: >> - removal of irq.h from eventfd.c put in a separate patch to increase >> visibility >> - properly expose KVM_CAP_IRQFD capability in arm.c >> - remove CONFIG_HAVE_KVM_IRQCHIP meaningfull only if irq_comm.c is used >> >> v1 -> v2: >> - rebase on 3.17rc1 >> - move of the dist unlock in process_maintenance >> - remove of dist lock in __kvm_vgic_sync_hwstate >> - rewording of the commit message (add resamplefd reference) >> - remove irq.h >> --- >> Documentation/virtual/kvm/api.txt | 5 ++- >> arch/arm/include/uapi/asm/kvm.h | 3 ++ >> arch/arm/kvm/Kconfig | 2 ++ >> arch/arm/kvm/Makefile | 2 +- >> arch/arm/kvm/arm.c| 3 ++ >> virt/kvm/arm/vgic.c | 72 >> --- >> 6 files changed, 81 insertions(+), 6 deletions(-) >> >> diff --git a/Documentation/virtual/kvm/api.txt >> b/Documentation/virtual/kvm/api.txt >> index 7610eaa..4deccc0 100644 >> --- a/Documentation/virtual/kvm/api.txt >> +++ b/Documentation/virtual/kvm/api.txt >> @@ -2206,7 +2206,7 @@ into the hash PTE second double word). >> 4.75 KVM_IRQFD >> >> Capability: KVM_CAP_IRQFD >> -Architectures: x86 s390 >> +Architectures: x86 s390 arm >> Type: vm ioctl >> Parameters: struct kvm_irqfd (in) >> Returns: 0 on success, -1 on error >> @@ -2232,6 +2232,9 @@ Note that closing the resamplefd is not sufficient to >> disable the >> irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment >> and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. >> >> +On arm, the gsi must be a shared peripheral interrupt (SPI). >> +This means the corresponding programmed GIC interrupt ID is gsi+32. >> + > > On ARM, the gsi field in the kvm_irqfd struct specifies the Shared > Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is > given by gsi + 32. 
OK > >> 4.76 KVM_PPC_ALLOCATE_HTAB >> >> Capability: KVM_CAP_PPC_ALLOC_HTAB >> diff --git a/arch/arm/include/uapi/asm/kvm.h >> b/arch/arm/include/uapi/asm/kvm.h >> index 09ee408..77547bb 100644 >> --- a/arch/arm/include/uapi/asm/kvm.h >> +++ b/arch/arm/include/uapi/asm/kvm.h >> @@ -196,6 +196,9 @@ struct kvm_arch_memory_slot { >> /* Highest supported SPI, from VGIC_NR_IRQS */ >> #define KVM_ARM_IRQ_GIC_MAX 127 >> >> +/* One single KVM irqchip, ie. the VGIC */ >> +#define KVM_NR_IRQCHIPS 1 >> + >> /* PSCI interface */ >> #define KVM_PSCI_FN_BASE0x95c1ba5e >> #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n)) >> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig >> index 9f581b1..e519a40 100644 >> --- a/arch/arm/kvm/Kconfig >> +++ b/arch/arm/kvm/Kconfig >> @@ -24,6 +24,7 @@ config KVM >> select KVM_MMIO >> select KVM_ARM_HOST >> depends on ARM_VIRT_EXT && ARM_LPAE >> +select HAVE_KVM_EVENTFD >> ---help--- >>Support hosting virtualized guest machines. You will also >>need to select one or more of the processor modules below. >> @@ -55,6 +56,7 @@ config KVM_ARM_MAX_VCPUS >> config KVM_ARM_VGIC >> bool "KVM support for Virtual GIC" >> depends on KVM_ARM_HOST && OF >> +select HAVE_KVM_IRQFD >> default y >> ---help--- >>Adds support for a hardware assisted, in-kernel GIC emulation. >> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile >> index f7057ed..859db09 100644 >> --- a/arch/arm/kvm/Makefile >> +++ b/arch/arm/kvm/Makefile >> @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) >> AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt) >> >> K
Re: [RFC 0/4] vgic additions for forwarded irq
On 11/24/2014 11:54 AM, Marc Zyngier wrote: > On 24/11/14 10:50, Christoffer Dall wrote: >> On Sun, Nov 23, 2014 at 07:12:49PM +0100, Eric Auger wrote: >>> This series applies on top of "ARM: Forwarding physical >>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) >>> series. >> >> Marc and Eric, >> >> Does it make sense to review and look at these patches given the current >> state of the forwarding patches, or should we wait until Marc respins >> that series? > > I'm still in the process of respinning all of this (we've had a few > iterations with tglx, but I got sidetracked with the ITS/v2m/MSI side of > things). Hi Marc, Christoffer, for your info, I integrated with the kvm-arm64/irq-forward branch prior to ITS integration, not with the RFC. Best Regards Eric > > I'll get back onto this this week. Hopefully. > > Thanks, > > M. >
Re: [PATCH v4 1/3] KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
On Sun, Nov 23, 2014 at 05:56:58PM +, Eric Auger wrote: > CONFIG_HAVE_KVM_IRQCHIP is needed to support IRQ routing (along > with irq_comm.c and irqchip.c usage). This is not the case for > arm/arm64 currently. > > This patch unsets the flag for both arm and arm64. > > Signed-off-by: Eric Auger > --- The arm64 change is fine by me. I assume this will go via the kvm tree eventually? Acked-by: Will Deacon Will > arch/arm/kvm/Kconfig | 2 -- > arch/arm64/kvm/Kconfig | 1 - > 2 files changed, 3 deletions(-) > > diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig > index 466bd29..9f581b1 100644 > --- a/arch/arm/kvm/Kconfig > +++ b/arch/arm/kvm/Kconfig > @@ -55,7 +55,6 @@ config KVM_ARM_MAX_VCPUS > config KVM_ARM_VGIC > bool "KVM support for Virtual GIC" > depends on KVM_ARM_HOST && OF > - select HAVE_KVM_IRQCHIP > default y > ---help--- > Adds support for a hardware assisted, in-kernel GIC emulation. > @@ -63,7 +62,6 @@ config KVM_ARM_VGIC > config KVM_ARM_TIMER > bool "KVM support for Architected Timers" > depends on KVM_ARM_VGIC && ARM_ARCH_TIMER > - select HAVE_KVM_IRQCHIP > default y > ---help--- > Adds support for the Architected Timers in virtual machines > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig > index 8ba85e9..279e1a0 100644 > --- a/arch/arm64/kvm/Kconfig > +++ b/arch/arm64/kvm/Kconfig > @@ -50,7 +50,6 @@ config KVM_ARM_MAX_VCPUS > config KVM_ARM_VGIC > bool > depends on KVM_ARM_HOST && OF > - select HAVE_KVM_IRQCHIP > ---help--- > Adds support for a hardware assisted, in-kernel GIC emulation. > > -- > 1.9.1 > >
Re: Exposing host debug capabilities to userspace
On 24 November 2014 at 11:16, Alex Bennée wrote: ^^^ :-) > Alex Bennée writes: >> * KVM ioctl KVM_GET_DEBUGREGS >> >> This is currently x86 only and looks like it's more aimed at debug >> registers than capability stuff. Also I'm not sure what the state of >> this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or >> is one an older deprecated x86 only API? > > I'm minded to re-use this ioctl and define it for ARM as reading the > host debug architecture state ID_AA64DFR0/1_EL1. Currently for x86 it's > used for getting vcpu debug registers which on ARM is handled via the > GET/SET one reg interface. This seems a bit odd. Either the x86 use of this ioctl is for accessing guest state, in which case using it on ARM for host state is a bit weird, or else why is x86 doing its debug via host state and ARM using guest state? It may well still be the best choice, but it just feels like maybe something isn't lined up right... -- PMM
Re: Exposing host debug capabilities to userspace
Fixed CC:kvmarm, Added: Alexander Graf, Fixed: my From: Replying to myself with additional information on each option Alex Bennée writes: > Hi, > > I've almost finished the ARMv8 guest debug support but I have one > problem left to solve. userspace needs to know how many hardware debug > registers are available for GDB to use. This information is available > from the ID_AA64DFR0_EL1 register. Currently I abuse GET_ONE_REG to > fetch its value however semantically this is poor as its API is for > getting guest state not host state and they could theoretically have > different values. > > So far the options I've examined are: > > * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1) Nope, guest state API > * ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH) Nope, ptrace requires attachment and you can't attach to your own thread group. > * KVM ioctl KVM_GET_DEBUGREGS > > This is currently x86 only and looks like it's more aimed at debug > registers than capability stuff. Also I'm not sure what the state of > this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or > is one an older deprecated x86 only API? I'm minded to re-use this ioctl and define it for ARM as reading the host debug architecture state ID_AA64DFR0/1_EL1. Currently for x86 it's used for getting vcpu debug registers which on ARM is handled via the GET/SET one reg interface. > * Export the information via sysfs > > I suppose the correct canonical non-subsystem specific way to make this > information available is to expose the data in some sort of sysfs node? > However I don't see any existing sysfs structure for the CPU. I suspect this would get complicated depending on the architecture. > * Expand /proc/cpuinfo > > I suspect adding extra text to be badly parsed by userspace is just > horrid and unacceptable behaviour ;-) > > * Add another KVM ioctl? > > This would have the downside of being specific to KVM and of course > proliferating the API space again. 
So unless there are any objections, my intention is to re-use the existing API calls for ARM architectures. -- Alex Bennée
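Whichever interface ends up exposing ID_AA64DFR0_EL1, userspace would still have to decode the raw value into register counts. A minimal sketch of that decoding follows, assuming the ARMv8 field layout in which BRPs is bits [15:12], WRPs is bits [23:20] and CTX_CMPs is bits [31:28], each encoding "count minus one". The struct and function names are hypothetical and not part of any proposed API; the sample value in the usage note is illustrative only.

```c
#include <stdint.h>

/* Hypothetical container for the decoded debug capabilities. */
struct sketch_debug_caps {
    unsigned int nr_breakpoints;  /* BRPs + 1 */
    unsigned int nr_watchpoints;  /* WRPs + 1 */
    unsigned int nr_ctx_cmps;     /* CTX_CMPs + 1 */
};

/* Extract one 4-bit ID register field starting at bit 'shift'. */
static unsigned int sketch_dfr0_field(uint64_t dfr0, unsigned int shift)
{
    return (unsigned int)((dfr0 >> shift) & 0xf);
}

/* Decode a raw ID_AA64DFR0_EL1 value into register counts. */
static struct sketch_debug_caps sketch_decode_dfr0(uint64_t dfr0)
{
    struct sketch_debug_caps caps;

    caps.nr_breakpoints = sketch_dfr0_field(dfr0, 12) + 1;
    caps.nr_watchpoints = sketch_dfr0_field(dfr0, 20) + 1;
    caps.nr_ctx_cmps    = sketch_dfr0_field(dfr0, 28) + 1;
    return caps;
}
```

For an illustrative raw value of 0x10305106 this yields 6 breakpoints, 4 watchpoints and 2 context-aware comparators. The decoding is the same whether the value describes host or guest state; the open question in the thread is only which interface should supply the host value.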
Re: [CFT PATCH 0/2] KVM: support XSAVES usage in the host
On 23/11/2014 09:16, Nadav Amit wrote: > I’ll try to check it tomorrow (I don’t have access to the failing machine at > the moment). Thanks, you'll need to squash this in: diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index 4c540c4719d8..0de1fae2bdf0 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -738,3 +738,4 @@ void *get_xsave_addr(struct xsave_struct *xsave, int xstate) return (void *)xsave + xstate_comp_offsets[feature]; } +EXPORT_SYMBOL_GPL(get_xsave_addr); Paolo
[PATCH v3 40/41] vhost/scsi: partial virtio 1.0 support
Include all endian conversions as required by virtio 1.0. Don't set virtio 1.0 yet, since that requires ANY_LAYOUT which we don't yet support. Signed-off-by: Michael S. Tsirkin --- drivers/vhost/scsi.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index a17f118..01c01cb 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -168,6 +168,7 @@ enum { VHOST_SCSI_VQ_IO = 2, }; +/* Note: can't set VIRTIO_F_VERSION_1 yet, since that implies ANY_LAYOUT. */ enum { VHOST_SCSI_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG) | (1ULL << VIRTIO_SCSI_F_T10_PI) @@ -577,8 +578,8 @@ tcm_vhost_allocate_evt(struct vhost_scsi *vs, return NULL; } - evt->event.event = event; - evt->event.reason = reason; + evt->event.event = cpu_to_vhost32(vq, event); + evt->event.reason = cpu_to_vhost32(vq, reason); vs->vs_events_nr++; return evt; @@ -636,7 +637,7 @@ again: } if (vs->vs_events_missed) { - event->event |= VIRTIO_SCSI_T_EVENTS_MISSED; + event->event |= cpu_to_vhost32(vq, VIRTIO_SCSI_T_EVENTS_MISSED); vs->vs_events_missed = false; } @@ -695,12 +696,13 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work) cmd, se_cmd->residual_count, se_cmd->scsi_status); memset(&v_rsp, 0, sizeof(v_rsp)); - v_rsp.resid = se_cmd->residual_count; + v_rsp.resid = cpu_to_vhost32(cmd->tvc_vq, se_cmd->residual_count); /* TODO is status_qualifier field needed? 
*/ v_rsp.status = se_cmd->scsi_status; - v_rsp.sense_len = se_cmd->scsi_sense_length; + v_rsp.sense_len = cpu_to_vhost32(cmd->tvc_vq, +se_cmd->scsi_sense_length); memcpy(v_rsp.sense, cmd->tvc_sense_buf, - v_rsp.sense_len); + se_cmd->scsi_sense_length); ret = copy_to_user(cmd->tvc_resp, &v_rsp, sizeof(v_rsp)); if (likely(ret == 0)) { struct vhost_scsi_virtqueue *q; @@ -1095,14 +1097,14 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) ", but wrong data_direction\n"); goto err_cmd; } - prot_bytes = v_req_pi.pi_bytesout; + prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesout); } else if (v_req_pi.pi_bytesin) { if (data_direction != DMA_FROM_DEVICE) { vq_err(vq, "Received non zero di_pi_niov" ", but wrong data_direction\n"); goto err_cmd; } - prot_bytes = v_req_pi.pi_bytesin; + prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesin); } if (prot_bytes) { int tmp = 0; @@ -1117,12 +1119,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) data_first += prot_niov; data_niov = data_num - prot_niov; } - tag = v_req_pi.tag; + tag = vhost64_to_cpu(vq, v_req_pi.tag); task_attr = v_req_pi.task_attr; cdb = &v_req_pi.cdb[0]; lun = ((v_req_pi.lun[2] << 8) | v_req_pi.lun[3]) & 0x3FFF; } else { - tag = v_req.tag; + tag = vhost64_to_cpu(vq, v_req.tag); task_attr = v_req.task_attr; cdb = &v_req.cdb[0]; lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF; -- MST
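The endian conversions in this patch follow one pattern: a field shared with the guest is stored little-endian when virtio 1.0 is negotiated and in native order otherwise. A userspace sketch of that pattern is shown below. The function names are hypothetical stand-ins for the `cpu_to_vhost32()` and `vhost32_to_cpu()` helpers used in the diff, and the `is_le` flag stands in for whatever per-virtqueue state the real helpers consult.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for cpu_to_vhost32(): when is_le is true, return a
 * value whose in-memory bytes are little-endian regardless of host byte
 * order; otherwise return the value unchanged (legacy, native order).
 */
static uint32_t sketch_cpu_to_vhost32(bool is_le, uint32_t val)
{
    uint8_t bytes[4];
    uint32_t out;

    if (!is_le)
        return val;

    bytes[0] = (uint8_t)(val & 0xff);
    bytes[1] = (uint8_t)((val >> 8) & 0xff);
    bytes[2] = (uint8_t)((val >> 16) & 0xff);
    bytes[3] = (uint8_t)((val >> 24) & 0xff);
    memcpy(&out, bytes, sizeof(out));
    return out;
}

/* Hypothetical stand-in for vhost32_to_cpu(): the inverse conversion. */
static uint32_t sketch_vhost32_to_cpu(bool is_le, uint32_t val)
{
    uint8_t bytes[4];

    if (!is_le)
        return val;

    memcpy(bytes, &val, sizeof(bytes));
    return (uint32_t)bytes[0] |
           ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}
```

Once a field has been converted for the guest it is no longer safe to use as a host-side length, which appears to be why the diff passes se_cmd->scsi_sense_length rather than the converted v_rsp.sense_len to memcpy().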
[PATCH v3 31/41] vhost/net: suppress compiler warning
len is always initialized since function is called with size > 0. Signed-off-by: Michael S. Tsirkin --- drivers/vhost/net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 984242e..54ffbb0 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -501,7 +501,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq, int headcount = 0; unsigned d; int r, nlogs = 0; - u32 len; + u32 uninitialized_var(len); while (datalen > 0 && headcount < quota) { if (unlikely(seg >= UIO_MAXIOV)) { -- MST
[PATCH v3 27/41] vhost: make features 64 bit
We need to use bit 32 for virtio 1.0. Signed-off-by: Michael S. Tsirkin --- drivers/vhost/vhost.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index b9032e8..1f321fd 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -106,7 +106,7 @@ struct vhost_virtqueue { /* Protected by virtqueue mutex. */ struct vhost_memory *memory; void *private_data; - unsigned acked_features; + u64 acked_features; /* Log write descriptors */ void __user *log_base; struct vhost_log *log; -- MST
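A short sketch of why the wider type matters: a feature bit at position 32 cannot be represented in a 32-bit mask, so testing it requires a 64-bit mask and a 64-bit shift. The helper names below are hypothetical, and the narrow variant assumes `unsigned int` is 32 bits wide.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VIRTIO_F_VERSION_1 32  /* the bit referred to above */

/* Test a feature bit in a 64-bit mask; note the 1ULL shift. */
static bool sketch_has_feature(uint64_t features, unsigned int bit)
{
    return (features & (1ULL << bit)) != 0;
}

/*
 * The same test against a 32-bit mask (assuming a 32-bit unsigned int):
 * bits 32 and above simply cannot be stored, so they always read as clear.
 */
static bool sketch_has_feature_32(unsigned int features, unsigned int bit)
{
    return bit < 32 && (features & (1U << bit)) != 0;
}
```

Assigning a mask with bit 32 set to a 32-bit field silently drops the bit, so with the old `unsigned acked_features` a negotiated virtio 1.0 feature would never be observed.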
[PATCH v3 29/41] vhost/net: larger header for virtio 1.0
Signed-off-by: Michael S. Tsirkin --- drivers/vhost/net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index cae22f9..1ac58d0 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1027,7 +1027,8 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features) size_t vhost_hlen, sock_hlen, hdr_len; int i; - hdr_len = (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ? + hdr_len = (features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | + (1ULL << VIRTIO_F_VERSION_1))) ? sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr); if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) { -- MST
[PATCH v3 26/41] vhost: virtio 1.0 endian-ness support
Signed-off-by: Michael S. Tsirkin --- drivers/vhost/vhost.c | 93 +++ 1 file changed, 56 insertions(+), 37 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index c90f437..4d379ed 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -33,8 +33,8 @@ enum { VHOST_MEMORY_F_LOG = 0x1, }; -#define vhost_used_event(vq) ((u16 __user *)&vq->avail->ring[vq->num]) -#define vhost_avail_event(vq) ((u16 __user *)&vq->used->ring[vq->num]) +#define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num]) +#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num]) static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt) @@ -1001,7 +1001,7 @@ EXPORT_SYMBOL_GPL(vhost_log_write); static int vhost_update_used_flags(struct vhost_virtqueue *vq) { void __user *used; - if (__put_user(vq->used_flags, &vq->used->flags) < 0) + if (__put_user(cpu_to_vhost16(vq, vq->used_flags), &vq->used->flags) < 0) return -EFAULT; if (unlikely(vq->log_used)) { /* Make sure the flag is seen before log. 
*/ @@ -1019,7 +1019,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq) static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event) { - if (__put_user(vq->avail_idx, vhost_avail_event(vq))) + if (__put_user(cpu_to_vhost16(vq, vq->avail_idx), vhost_avail_event(vq))) return -EFAULT; if (unlikely(vq->log_used)) { void __user *used; @@ -1038,6 +1038,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event) int vhost_init_used(struct vhost_virtqueue *vq) { + __virtio16 last_used_idx; int r; if (!vq->private_data) return 0; @@ -1046,7 +1047,13 @@ int vhost_init_used(struct vhost_virtqueue *vq) if (r) return r; vq->signalled_used_valid = false; - return get_user(vq->last_used_idx, &vq->used->idx); + if (!access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) + return -EFAULT; + r = __get_user(last_used_idx, &vq->used->idx); + if (r) + return r; + vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); + return 0; } EXPORT_SYMBOL_GPL(vhost_init_used); @@ -1087,16 +1094,16 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, /* Each buffer in the virtqueues is actually a chain of descriptors. This * function returns the next descriptor in the chain, * or -1U if we're at the end. */ -static unsigned next_desc(struct vring_desc *desc) +static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc) { unsigned int next; /* If this descriptor says it doesn't chain, we're done. */ - if (!(desc->flags & VRING_DESC_F_NEXT)) + if (!(desc->flags & cpu_to_vhost16(vq, VRING_DESC_F_NEXT))) return -1U; /* Check they're not leading us off end of descriptors. */ - next = desc->next; + next = vhost16_to_cpu(vq, desc->next); /* Make sure compiler knows to grab that: we don't want it changing! */ /* We will use the result as an index in an array, so most * architectures only need a compiler barrier here. 
*/ @@ -1113,18 +1120,19 @@ static int get_indirect(struct vhost_virtqueue *vq, { struct vring_desc desc; unsigned int i = 0, count, found = 0; + u32 len = vhost32_to_cpu(vq, indirect->len); int ret; /* Sanity check */ - if (unlikely(indirect->len % sizeof desc)) { + if (unlikely(len % sizeof desc)) { vq_err(vq, "Invalid length in indirect descriptor: " "len 0x%llx not multiple of 0x%zx\n", - (unsigned long long)indirect->len, + (unsigned long long)vhost32_to_cpu(vq, indirect->len), sizeof desc); return -EINVAL; } - ret = translate_desc(vq, indirect->addr, indirect->len, vq->indirect, + ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr), len, vq->indirect, UIO_MAXIOV); if (unlikely(ret < 0)) { vq_err(vq, "Translation failure %d in indirect.\n", ret); @@ -1135,7 +1143,7 @@ static int get_indirect(struct vhost_virtqueue *vq, * architectures only need a compiler barrier here. */ read_barrier_depends(); - count = indirect->len / sizeof desc; + count = len / sizeof desc; /* Buffers are chained via a 16 bit next field, so * we can have at most 2^16 of these. */ if (unlikely(count > USHRT_MAX + 1)) { @@ -1155,16 +1163,17 @@ static int get_indirect(struct vhost_virtqueue *vq, if (unlikely(memcpy_fromiovec((unsig
[PATCH v3 30/41] vhost/net: enable virtio 1.0
Signed-off-by: Michael S. Tsirkin
---
 drivers/vhost/net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1ac58d0..984242e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -61,7 +61,8 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
 enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
-			 (1ULL << VIRTIO_NET_F_MRG_RXBUF),
+			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+			 (1ULL << VIRTIO_F_VERSION_1),
 };
 
 enum {
-- 
MST
[PATCH v3 25/41] vhost/net: force len for TX to host endian
We use native endian-ness internally but never expose it to guest.

Signed-off-by: Michael S. Tsirkin
---
 drivers/vhost/net.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8dae2f7..dce5c58 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -48,15 +48,15 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
  * status internally; used for zerocopy tx only.
  */
 /* Lower device DMA failed */
-#define VHOST_DMA_FAILED_LEN	3
+#define VHOST_DMA_FAILED_LEN	((__force __virtio32)3)
 /* Lower device DMA done */
-#define VHOST_DMA_DONE_LEN	2
+#define VHOST_DMA_DONE_LEN	((__force __virtio32)2)
 /* Lower device DMA in progress */
-#define VHOST_DMA_IN_PROGRESS	1
+#define VHOST_DMA_IN_PROGRESS	((__force __virtio32)1)
 /* Buffer unused */
-#define VHOST_DMA_CLEAR_LEN	0
+#define VHOST_DMA_CLEAR_LEN	((__force __virtio32)0)
 
-#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+#define VHOST_DMA_IS_DONE(len) ((__force u32)(len) >= (__force u32)VHOST_DMA_DONE_LEN)
 
 enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
-- 
MST
[PATCH v3 28/41] vhost/net: virtio 1.0 byte swap
Signed-off-by: Michael S. Tsirkin
---
 drivers/vhost/net.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index dce5c58..cae22f9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -416,7 +416,7 @@ static void handle_tx(struct vhost_net *net)
 			struct ubuf_info *ubuf;
 			ubuf = nvq->ubuf_info + nvq->upend_idx;
 
-			vq->heads[nvq->upend_idx].id = head;
+			vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
@@ -500,6 +500,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 	int headcount = 0;
 	unsigned d;
 	int r, nlogs = 0;
+	u32 len;
 
 	while (datalen > 0 && headcount < quota) {
 		if (unlikely(seg >= UIO_MAXIOV)) {
@@ -527,13 +528,14 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 			nlogs += *log_num;
 			log += *log_num;
 		}
-		heads[headcount].id = d;
-		heads[headcount].len = iov_length(vq->iov + seg, in);
-		datalen -= heads[headcount].len;
+		heads[headcount].id = cpu_to_vhost32(vq, d);
+		len = iov_length(vq->iov + seg, in);
+		heads[headcount].len = cpu_to_vhost32(vq, len);
+		datalen -= len;
 		++headcount;
 		seg += in;
 	}
-	heads[headcount - 1].len += datalen;
+	heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
 	*iovcount = seg;
 	if (unlikely(log))
 		*log_num = nlogs;
-- 
MST
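
A note on the last hunk, with a small worked sketch. Before the patch the tail was trimmed with `+= datalen`, and by then datalen is zero or negative (the loop subtracted one whole buffer too many). The plain-C sketch below, which uses made-up names and no vhost types, reproduces that pre-patch arithmetic as `len + datalen`. The hunk as posted writes `len - datalen`, which gives a different value whenever the last buffer is only partly filled, so the sign may deserve a second look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Spread a packet of datalen bytes over receive buffers of the given
 * sizes. Writes the number of bytes used in each buffer to lens[] and
 * returns how many buffers were consumed, or 0 if they do not suffice.
 * split_rx_len() is a hypothetical helper, not a vhost function.
 */
static size_t split_rx_len(int datalen, const uint32_t *buf_sizes,
			   size_t nbufs, uint32_t *lens)
{
	size_t used = 0;
	uint32_t len = 0;

	assert(datalen > 0);
	while (datalen > 0 && used < nbufs) {
		len = buf_sizes[used];
		lens[used] = len;
		datalen -= (int)len;
		++used;
	}
	if (datalen > 0)
		return 0;	/* ran out of buffers */

	/* datalen <= 0 here; -datalen is the unused tail of the last buffer. */
	lens[used - 1] = (uint32_t)((int)len + datalen);
	return used;
}
```

For a 2000-byte packet and 1500-byte buffers this yields lengths 1500 and 500, which sum to the packet size.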
[PATCH v3 24/41] vhost: add memory access wrappers
Signed-off-by: Michael S. Tsirkin
---
 drivers/vhost/vhost.h | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 3eda654..b9032e8 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -174,6 +174,37 @@ enum {
 
 static inline int vhost_has_feature(struct vhost_virtqueue *vq, int bit)
 {
-	return vq->acked_features & (1 << bit);
+	return vq->acked_features & (1ULL << bit);
+}
+
+/* Memory accessors */
+static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
+{
+	return __virtio16_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio16 cpu_to_vhost16(struct vhost_virtqueue *vq, u16 val)
+{
+	return __cpu_to_virtio16(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline u32 vhost32_to_cpu(struct vhost_virtqueue *vq, __virtio32 val)
+{
+	return __virtio32_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio32 cpu_to_vhost32(struct vhost_virtqueue *vq, u32 val)
+{
+	return __cpu_to_virtio32(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline u64 vhost64_to_cpu(struct vhost_virtqueue *vq, __virtio64 val)
+{
+	return __virtio64_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
+}
+
+static inline __virtio64 cpu_to_vhost64(struct vhost_virtqueue *vq, u64 val)
+{
+	return __cpu_to_virtio64(vhost_has_feature(vq, VIRTIO_F_VERSION_1), val);
 }
 
 #endif
-- 
MST
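
To make the idea behind the accessors concrete, here is a user-space sketch under stated assumptions. struct vq_sketch, vq_has_feature(), vq16_to_cpu() and cpu_to_vq16() are hypothetical names, not the kernel helpers. The sketch assumes that a ring negotiated with VIRTIO_F_VERSION_1 is little-endian and that a legacy ring uses the same byte order as the host. The feature test returns bool on purpose: narrowing a 64-bit AND result to int can drop bit 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32	/* feature bit number named in the thread */

/* Hypothetical stand-in for the per-virtqueue state. */
struct vq_sketch {
	uint64_t acked_features;
};

/* Returns bool so that bit 32 survives; an int result would lose it. */
static bool vq_has_feature(const struct vq_sketch *vq, unsigned int bit)
{
	assert(bit < 64);
	return (vq->acked_features >> bit) & 1;
}

static bool host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Ring value to CPU value: swap only for a 1.0 ring on a big-endian host. */
static uint16_t vq16_to_cpu(const struct vq_sketch *vq, uint16_t ring_val)
{
	if (vq_has_feature(vq, VIRTIO_F_VERSION_1) && !host_is_little_endian())
		return swab16(ring_val);
	return ring_val;
}

/* CPU value to ring value: the same swap, applied in the other direction. */
static uint16_t cpu_to_vq16(const struct vq_sketch *vq, uint16_t cpu_val)
{
	return vq16_to_cpu(vq, cpu_val);
}
```

Converting a value to ring order and back returns the original value on any host, which is the property the accessor pairs in the patch rely on.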
Re: [PATCH] arm/arm64: vgic: Remove unreachable irq_clear_pending
Hej Christoffer,

On 24/11/14 09:41, Christoffer Dall wrote:
> When 'injecting' an edge-triggered interrupt with a falling edge we
> shouldn't clear the pending state on the distributor. In fact, we
> don't, because the check in vgic_validate_injection would prevent us
> from ever reaching this bit of code.
>
> Remove the unreachable snippet.
>
> Signed-off-by: Christoffer Dall

Acked-by: Andre Przywara

I agree on this. Would it make sense to rewrite this function a bit to
make it clearer what happens? I find the nesting of the
if-statements counter-intuitive: I'd prefer to first differentiate
between level and edge triggered and then only check the actual level in
the level-triggered branch. Not sure if it's worth the fuss, though.

Cheers,
Andre.

> ---
>  virt/kvm/arm/vgic.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 3aaca49..f45cf16 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1643,8 +1643,6 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  			vgic_dist_irq_clear_level(vcpu, irq_num);
>  			if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
>  				vgic_dist_irq_clear_pending(vcpu, irq_num);
> -		} else {
> -			vgic_dist_irq_clear_pending(vcpu, irq_num);
>  		}
>  	}
>
> --
Re: [question] lots of interrupts injected to vm when pressing some key w/o releasing
>Hi all,
>
>If I press the one of "Insert/Delete/Home/End/PageUp/PageDown/UpArrow/
>DownArrow/LeftArrow/RightArrow" key w/o releasing, then lots of interrupts
>will be injected to vm(win7/win2008), about 8000/s, the system become very
>slow, bringing very bad experience. But the other keys are okay.

Sorry for the wrong description: the interrupt rate is normal,
but a huge number of vmexits induced by PIO were produced.

Thanks,
Zhang Haoyu

>And, linux guest has no this problem.
>
>If I remove the commit of 0bc830b05c667218d703f2026ec866c49df974fc, then the
>problem disappeared, but win7 guest got stuck at booting stage.
>And so strange that If the vm has only one vcpu, then the problem also
>disappeared.
>
>Any ideas?
>
>Thanks,
>Zhang Haoyu
Re: Exposing host debug capabilities to userspace
On Mon, Nov 24, 2014 at 12:21 PM, Peter Maydell wrote:
> On 24 November 2014 at 11:16, Alex Bennée
> wrote:
>
> ^^^ :-)
>
>> Alex Bennée writes:
>>> * KVM ioctl KVM_GET_DEBUGREGS
>>>
>>> This is currently x86 only and looks like it's more aimed at debug
>>> registers than capability stuff. Also I'm not sure what the state of
>>> this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or
>>> is one an older deprecated x86 only API?
>>
>> I'm minded to re-use this ioctl and define it for ARM as reading the
>> host debug architecture state ID_AA64DFR0/1_EL1. Currently for x86 it's
>> used for getting vcpu debug registers which on ARM is handled via the
>> GET/SET one reg interface.
>
> This seems a bit odd. Either the x86 use of this ioctl is
> for accessing guest state, in which case using it on ARM
> for host state is a bit weird, or else why is x86 doing
> its debug via host state and ARM using guest state?
>
> It may well still be the best choice, but it just feels
> like maybe something isn't lined up right...
>
It seems weird, agreed, but it's somehow what we're left with, short of
adding a new ioctl.

I think part of the explanation may simply be that x86 solves this
problem in an inherently different way.

-Christoffer
Re: [question] lots of interrupts injected to vm when pressing some key w/o releasing
On 24/11/2014 13:17, Zhang Haoyu wrote:
>> Hi all,
>>
>> If I press the one of "Insert/Delete/Home/End/PageUp/PageDown/UpArrow/
>> DownArrow/LeftArrow/RightArrow" key w/o releasing, then lots of interrupts
>> will be injected to vm(win7/win2008), about 8000/s, the system become very
>> slow, bringing very bad experience. But the other keys are okay.
>
> Sorry for the wrong description: the interrupt rate is normal,
> but a huge number of vmexits induced by PIO were produced.

This is expected when running Windows without a paravirtualized time
counter (-cpu ...,hv_time).

Paolo
Re: Exposing host debug capabilities to userspace
On 24.11.14 12:35, Alex Bennée wrote:
>
> Fixed CC:kvmarm, Added: Alexander Graf, Fixed: my From:
>
> Replying to myself with additional information on each option
>
> Alex Bennée writes:
>
>> Hi,
>>
>> I've almost finished the ARMv8 guest debug support but I have one
>> problem left to solve. userspace needs to know how many hardware debug
>> registers are available for GDB to use. This information is available
>> from the ID_AA64DFR0_EL1 register. Currently I abuse GET_ONE_REG to
>> fetch it's value however semantically this is poor as it's API is for
>> getting guest state not host state and they could theoretically have
>> different values.
>>
>> So far the options I've examined are:
>>
>> * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)
> Nope, guest state API

What's the problem with using ONE_REG for this? After all, the total
number of available guest debug register is a guest vcpu property of
some sort.

Btw, looking through the mess that we have today with asymmetric
SET_GUEST_DEBUG and GET_DEBUGREGS ioctls I can't shake off the feeling
that we're best off just doing all of the debug register sync via
ONE_REGs. That way we at least have a guaranteed symmetric interface
that can get and set values, so we won't trip over it on live migration.

Alex
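
As background for the register being discussed: ID_AA64DFR0_EL1 encodes the breakpoint count in its BRPs field (bits [15:12]) and the watchpoint count in WRPs (bits [23:20]), each stored as the count minus one. A minimal decoding sketch follows. The function names are made up for illustration, and how the raw 64-bit value reaches userspace is exactly the open question in this thread, so it is simply taken as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Extract one 4-bit ID field starting at the given bit position. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);
	return (unsigned int)((reg >> shift) & 0xf);
}

/* BRPs, bits [15:12]: number of breakpoints minus one. */
static unsigned int dfr0_num_breakpoints(uint64_t dfr0)
{
	return id_field(dfr0, 12) + 1;
}

/* WRPs, bits [23:20]: number of watchpoints minus one. */
static unsigned int dfr0_num_watchpoints(uint64_t dfr0)
{
	return id_field(dfr0, 20) + 1;
}
```

A value with BRPs = 5 and WRPs = 3 therefore describes 6 breakpoints and 4 watchpoints.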
Re: Exposing host debug capabilities to userspace
On 24 November 2014 at 12:26, Alexander Graf wrote:
> On 24.11.14 12:35, Alex Bennée wrote:
>>> * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)
>> Nope, guest state API
>
> What's the problem with using ONE_REG for this? After all, the total
> number of available guest debug register is a guest vcpu property of
> some sort.

Yes, but we don't want to know about properties of the guest
vCPU. In an ideal world QEMU could reserve say half the debug
registers for debugging the VM on startup and have KVM expose
ID registers indicating to the guest that it only had the
other half...

-- PMM
Re: Exposing host debug capabilities to userspace
> On 24.11.2014 at 13:32, Peter Maydell wrote:
>
>> On 24 November 2014 at 12:26, Alexander Graf wrote:
>> On 24.11.14 12:35, Alex Bennée wrote:
>>>> * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)
>>> Nope, guest state API
>>
>> What's the problem with using ONE_REG for this? After all, the total
>> number of available guest debug register is a guest vcpu property of
>> some sort.
>
> Yes, but we don't want to know about properties of the guest
> vCPU. In an ideal world QEMU could reserve say half the debug
> registers for debugging the VM on startup and have KVM expose
> ID registers indicating to the guest that it only had the
> other half...

Yup, so create another (read-only) ONE_REG that exposes the number of
actual guest debug registers.

Alex

>
> -- PMM
Re: Exposing host debug capabilities to userspace
On 24 November 2014 at 12:41, Alexander Graf wrote:
>> On 24.11.2014 at 13:32, Peter Maydell wrote:
>> Yes, but we don't want to know about properties of the guest
>> vCPU. In an ideal world QEMU could reserve say half the debug
>> registers for debugging the VM on startup and have KVM expose
>> ID registers indicating to the guest that it only had the
>> other half...
>
> Yup, so create another (read-only) ONE_REG that exposes the number
> of actual guest debug registers.

I'm confused. ONE_REG is for guest state, and the ID register
by definition is how we tell the guest how many debug registers
it has. What we want to know (and perhaps even control) for
debugging the VM is how many debug registers the host has.

-- PMM
Re: Exposing host debug capabilities to userspace
On 24.11.14 13:44, Peter Maydell wrote:
> On 24 November 2014 at 12:41, Alexander Graf wrote:
>>> On 24.11.2014 at 13:32, Peter Maydell wrote:
>>> Yes, but we don't want to know about properties of the guest
>>> vCPU. In an ideal world QEMU could reserve say half the debug
>>> registers for debugging the VM on startup and have KVM expose
>>> ID registers indicating to the guest that it only had the
>>> other half...
>>
>> Yup, so create another (read-only) ONE_REG that exposes the number
>> of actual guest debug registers.
>
> I'm confused. ONE_REG is for guest state, and the ID register
> by definition is how we tell the guest how many debug registers
> it has. What we want to know (and perhaps even control) for
> debugging the VM is how many debug registers the host has.

No, we don't want to know how many debug registers the host has. We want
to know how many debug registers the guest has.

Imagine you're running on A57 today with 8 debug registers (no idea if
that's true, but assume it is). Tomorrow there will be a new core -
let's call it A67 - with 16 debug registers.

To make sure your legacy, badly written guest behaves exactly the same -
especially after live migration - you want to spawn a VM with -cpu A57.
That implies you want to expose 8 debug registers into the guest. So
debug register synchronization should only be aware of those 8 registers.

So what we really care about is the number of debug registers available
to a guest vcpu. That in turn means it's guest state and as such can
easily go into ONE_REG.

Alex
Re: [PATCH] arm/arm64: vgic: Remove unreachable irq_clear_pending
On Mon, Nov 24, 2014 at 12:04:11PM +, Andre Przywara wrote:
> Hej Christoffer,
>
> On 24/11/14 09:41, Christoffer Dall wrote:
> > When 'injecting' an edge-triggered interrupt with a falling edge we
> > shouldn't clear the pending state on the distributor. In fact, we
> > don't, because the check in vgic_validate_injection would prevent us
> > from ever reaching this bit of code.
> >
> > Remove the unreachable snippet.
> >
> > Signed-off-by: Christoffer Dall
>
> Acked-by: Andre Przywara
>
> I agree on this. Would it make sense to rewrite this function a bit to
> make it clearer what happens? I find the nesting of the
> if-statements counter-intuitive: I'd prefer to first differentiate
> between level and edge triggered and then only check the actual level in
> the level-triggered branch. Not sure if it's worth the fuss, though.
>
I disagree, and it's not ;)

-Christoffer
[patch] kvm: x86: potential shift wrapping bug
cs.base is declared as a __u64 variable and vector is a u32 so this
causes a static checker warning. I'm not very familiar with this code
but my understanding is that the user can set "sipi_vector" to any u32
value in kvm_vcpu_ioctl_x86_set_vcpu_events().

Signed-off-by: Dan Carpenter

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 34c8f94..6608115 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7000,7 +7000,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
 
 	kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
 	cs.selector = vector << 8;
-	cs.base = vector << 12;
+	cs.base = (u64)vector << 12;
 	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
 	kvm_rip_write(vcpu, 0);
 }
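
The wrapping the checker warns about can be shown in a few lines of user-space C. This is only an illustration with made-up function names: a 32-bit value shifted left by 12 is computed in 32 bits first, so the top bits are gone before the result is widened to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Shift happens in 32 bits; the result wraps and is widened afterwards. */
static uint64_t sipi_base_narrow(uint32_t vector)
{
	return vector << 12;
}

/* Widen first, then shift; no bits are lost. */
static uint64_t sipi_base_wide(uint32_t vector)
{
	uint64_t base = (uint64_t)vector << 12;

	assert((base >> 12) == vector);
	return base;
}
```

For a vector of 0x00100001 the narrow variant yields 0x1000 while the wide variant yields 0x100001000. For values that fit in 8 bits both agree.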
Re: Exposing host debug capabilities to userspace
Alexander Graf writes:

>> On 24.11.2014 at 13:32, Peter Maydell wrote:
>>
>>> On 24 November 2014 at 12:26, Alexander Graf wrote:
>>> On 24.11.14 12:35, Alex Bennée wrote:
>>>>> * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)
>>>> Nope, guest state API
>>>
>>> What's the problem with using ONE_REG for this? After all, the total
>>> number of available guest debug register is a guest vcpu property of
>>> some sort.
>>
>> Yes, but we don't want to know about properties of the guest
>> vCPU. In an ideal world QEMU could reserve say half the debug
>> registers for debugging the VM on startup and have KVM expose
>> ID registers indicating to the guest that it only had the
>> other half...
>
> Yup, so create another (read-only) ONE_REG that exposes the number of
> actual guest debug registers.

Does the GET/SET_ONE_REG have the concept of a read-only register? I'd
be worried by code just blindly iterating over the lists getting thrown
when one doesn't work.

>
> Alex
>
>>
>> -- PMM

-- 
Alex Bennée
Re: Exposing host debug capabilities to userspace
On Mon, Nov 24, 2014 at 1:51 PM, Alexander Graf wrote:
>
>
> On 24.11.14 13:44, Peter Maydell wrote:
>> On 24 November 2014 at 12:41, Alexander Graf wrote:
>>>> On 24.11.2014 at 13:32, Peter Maydell wrote:
>>>> Yes, but we don't want to know about properties of the guest
>>>> vCPU. In an ideal world QEMU could reserve say half the debug
>>>> registers for debugging the VM on startup and have KVM expose
>>>> ID registers indicating to the guest that it only had the
>>>> other half...
>>>
>>> Yup, so create another (read-only) ONE_REG that exposes the number
>>> of actual guest debug registers.
>>
>> I'm confused. ONE_REG is for guest state, and the ID register
>> by definition is how we tell the guest how many debug registers
>> it has. What we want to know (and perhaps even control) for
>> debugging the VM is how many debug registers the host has.
>
> No, we don't want to know how many debug registers the host has. We want
> to know how many debug registers the guest has.
>
> Imagine you're running on A57 today with 8 debug registers (no idea if
> that's true, but assume it is). Tomorrow there will be a new core -
> let's call it A67 - with 16 debug registers.
>
> To make sure your legacy, badly written guest behaves exactly the same -
> especially after live migration - you want to spawn a VM with -cpu A57.
> That implies you want to expose 8 debug registers into the guest. So
> debug register synchronization should only be aware of those 8 registers.
>
> So what we really care about is the number of debug registers available
> to a guest vcpu. That in turn means it's guest state and as such can
> easily go into ONE_REG.
>
We already export this for the guest via ONE_REG.

What we want to do is support gdbstubs in QEMU to debug the guest, and
to do this, QEMU needs to know how many hardware registers on the host
there is; the guest will never see this information.

So this is really about the host, the guest side is trivially handled
through ONE_REG.

-Christoffer
Re: Exposing host debug capabilities to userspace
On Mon, Nov 24, 2014 at 1:53 PM, Alex Bennée wrote:
>
> Alexander Graf writes:
>
>>> On 24.11.2014 at 13:32, Peter Maydell wrote:
>>>
>>>> On 24 November 2014 at 12:26, Alexander Graf wrote:
>>>> On 24.11.14 12:35, Alex Bennée wrote:
>>>>>> * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)
>>>>> Nope, guest state API
>>>>
>>>> What's the problem with using ONE_REG for this? After all, the total
>>>> number of available guest debug register is a guest vcpu property of
>>>> some sort.
>>>
>>> Yes, but we don't want to know about properties of the guest
>>> vCPU. In an ideal world QEMU could reserve say half the debug
>>> registers for debugging the VM on startup and have KVM expose
>>> ID registers indicating to the guest that it only had the
>>> other half...
>>
>> Yup, so create another (read-only) ONE_REG that exposes the number of
>> actual guest debug registers.
>
> Does the GET/SET_ONE_REG have the concept of a read-only register? I'd
> be worried by code just blindly iterating over the lists getting thrown
> when one doesn't work.
>
Yes, we have invariant cp15 registers.

-Christoffer
Re: Exposing host debug capabilities to userspace
On 24.11.14 13:53, Christoffer Dall wrote:
> On Mon, Nov 24, 2014 at 1:51 PM, Alexander Graf wrote:
>>
>>
>> On 24.11.14 13:44, Peter Maydell wrote:
>>> On 24 November 2014 at 12:41, Alexander Graf wrote:
>>>>> On 24.11.2014 at 13:32, Peter Maydell wrote:
>>>>> Yes, but we don't want to know about properties of the guest
>>>>> vCPU. In an ideal world QEMU could reserve say half the debug
>>>>> registers for debugging the VM on startup and have KVM expose
>>>>> ID registers indicating to the guest that it only had the
>>>>> other half...
>>>>
>>>> Yup, so create another (read-only) ONE_REG that exposes the number
>>>> of actual guest debug registers.
>>>
>>> I'm confused. ONE_REG is for guest state, and the ID register
>>> by definition is how we tell the guest how many debug registers
>>> it has. What we want to know (and perhaps even control) for
>>> debugging the VM is how many debug registers the host has.
>>
>> No, we don't want to know how many debug registers the host has. We want
>> to know how many debug registers the guest has.
>>
>> Imagine you're running on A57 today with 8 debug registers (no idea if
>> that's true, but assume it is). Tomorrow there will be a new core -
>> let's call it A67 - with 16 debug registers.
>>
>> To make sure your legacy, badly written guest behaves exactly the same -
>> especially after live migration - you want to spawn a VM with -cpu A57.
>> That implies you want to expose 8 debug registers into the guest. So
>> debug register synchronization should only be aware of those 8 registers.
>>
>> So what we really care about is the number of debug registers available
>> to a guest vcpu. That in turn means it's guest state and as such can
>> easily go into ONE_REG.
>>
> We already export this for the guest via ONE_REG.
>
> What we want to do is support gdbstubs in QEMU to debug the guest, and
> to do this, QEMU needs to know how many hardware registers on the host
> there is; the guest will never see this information.
>
> So this is really about the host, the guest side is trivially handled
> through ONE_REG.

That's the cp15 register that happens to get exposed to the guest. You
can just add another ONE_REG that does not have a cp15 equivalent to
expose the number of the vcpu's actually available debug registers.

Alex
Re: Exposing host debug capabilities to userspace
On Mon, Nov 24, 2014 at 1:56 PM, Alexander Graf wrote:
>
>
> On 24.11.14 13:53, Christoffer Dall wrote:
>> On Mon, Nov 24, 2014 at 1:51 PM, Alexander Graf wrote:
>>>
>>>
>>> On 24.11.14 13:44, Peter Maydell wrote:
>>>> On 24 November 2014 at 12:41, Alexander Graf wrote:
>>>>>> On 24.11.2014 at 13:32, Peter Maydell wrote:
>>>>>> Yes, but we don't want to know about properties of the guest
>>>>>> vCPU. In an ideal world QEMU could reserve say half the debug
>>>>>> registers for debugging the VM on startup and have KVM expose
>>>>>> ID registers indicating to the guest that it only had the
>>>>>> other half...
>>>>>
>>>>> Yup, so create another (read-only) ONE_REG that exposes the number
>>>>> of actual guest debug registers.
>>>>
>>>> I'm confused. ONE_REG is for guest state, and the ID register
>>>> by definition is how we tell the guest how many debug registers
>>>> it has. What we want to know (and perhaps even control) for
>>>> debugging the VM is how many debug registers the host has.
>>>
>>> No, we don't want to know how many debug registers the host has. We want
>>> to know how many debug registers the guest has.
>>>
>>> Imagine you're running on A57 today with 8 debug registers (no idea if
>>> that's true, but assume it is). Tomorrow there will be a new core -
>>> let's call it A67 - with 16 debug registers.
>>>
>>> To make sure your legacy, badly written guest behaves exactly the same -
>>> especially after live migration - you want to spawn a VM with -cpu A57.
>>> That implies you want to expose 8 debug registers into the guest. So
>>> debug register synchronization should only be aware of those 8 registers.
>>>
>>> So what we really care about is the number of debug registers available
>>> to a guest vcpu. That in turn means it's guest state and as such can
>>> easily go into ONE_REG.
>>>
>> We already export this for the guest via ONE_REG.
>>
>> What we want to do is support gdbstubs in QEMU to debug the guest, and
>> to do this, QEMU needs to know how many hardware registers on the host
>> there is; the guest will never see this information.
>>
>> So this is really about the host, the guest side is trivially handled
>> through ONE_REG.
>
> That's the cp15 register that happens to get exposed to the guest. You
> can just add another ONE_REG that does not have a cp15 equivalent to
> expose the number of the vcpu's actually available debug registers.
>
>
The fact that we currently map the guest vcpu registers to that of the
host doesn't mean that will always be the case (which you argued for
above).

If you migrate a VM from a CPU with 16 debug registers (or emulate a CPU
with 16 debug registers) on a physical CPU with only 8 debug registers,
you cannot tell QEMU that it has 16 debug registers on the hardware to
configure to debug the guest, which is what ONE_REG would give you.

You could add another set of registers to ONE_REG which says "these are
the host versions", or you could say that you don't ever support a setup
where the guest number of debug registers is not the same as the host
number, but both would be wrong.

-Christoffer
Re: [patch] kvm: x86: potential shift wrapping bug
On 24/11/2014 13:53, Dan Carpenter wrote:
> cs.base is declared as a __u64 variable and vector is a u32 so this
> causes a static checker warning. I'm not very familiar with this code
> but my understanding is that the user can set "sipi_vector" to any u32
> value in kvm_vcpu_ioctl_x86_set_vcpu_events().

The user can do so, but it should not set it to any value greater than
255. So the right fix is to cast to (u8).

Thanks for the report!

Paolo

> Signed-off-by: Dan Carpenter
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 34c8f94..6608115 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7000,7 +7000,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, unsigned int vector)
>
> 	kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
> 	cs.selector = vector << 8;
> -	cs.base = vector << 12;
> +	cs.base = (u64)vector << 12;
> 	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
> 	kvm_rip_write(vcpu, 0);
> }
>
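
For anyone following along, here is a small userspace sketch of the three variants discussed in this message. It is not kernel code and not part of either patch: the function names are made up for illustration, and uint32_t/uint64_t/uint8_t stand in for the kernel's u32/u64/u8. It only shows how the shift behaves for an out-of-range vector.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stdint.h>

/* Original expression: the shift happens in 32 bits and is widened
 * afterwards, so anything shifted past bit 31 is silently dropped. */
static uint64_t base_shift32(uint32_t vector)
{
	return vector << 12;
}

/* Variant from the posted patch: widen first, then shift. No bits are
 * lost, but an out-of-range vector produces a base above 1 MiB. */
static uint64_t base_shift64(uint32_t vector)
{
	return (uint64_t)vector << 12;
}

/* Variant suggested in the reply: truncate to an 8-bit vector first,
 * so the resulting base always stays below 1 MiB. */
static uint64_t base_shift_u8(uint32_t vector)
{
	return (uint64_t)(uint8_t)vector << 12;
}
```

With a bogus vector of 0x00100123 the three helpers return 0x123000, 0x100123000 and 0x23000 respectively; for a valid vector such as 0x9A they all agree on 0x9A000.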
Re: Exposing host debug capabilities to userspace
Alex Bennée writes:

> Alex Bennée writes:
>
>> Hi,
>>
>> I've almost finished the ARMv8 guest debug support but I have one
>> problem left to solve. userspace needs to know how many hardware debug
>> registers are available for GDB to use. This information is available
>> from the ID_AA64DFR0_EL1 register.
>> So far the options I've examined are:
>>
>> * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)
>> * ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH)
>> * KVM ioctl KVM_GET_DEBUGREGS
>> * Export the information via sysfs
>> * Expand /proc/cpuinfo
>> * Add another KVM ioctl?

Alexander Graf pointed out that KVM_CHECK_EXTENSION can return any
positive number for success. How about using:

  max_hw_bps = kvm_check_extension(kvm_state, KVM_CAP_GUEST_DEBUG_HW_BPS);
  max_hw_wps = kvm_check_extension(kvm_state, KVM_CAP_GUEST_DEBUG_HW_WPS);

Seems pretty sane, doesn't change the semantics of an API and is
architecture agnostic if others need the number?

--
Alex Bennée
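
As background for the ID_AA64DFR0_EL1 option mentioned above, a minimal sketch of how the breakpoint and watchpoint counts could be pulled out of a raw register value. The helper and macro names are hypothetical. The field positions (BRPs in bits [15:12], WRPs in bits [23:20], each encoded as count minus one) are taken from the ARMv8 architecture manual, not from this thread. How the raw value is obtained is left open here, since that is exactly what the thread is debating.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stdint.h>

/* Field layout of ID_AA64DFR0_EL1 as assumed in this sketch. */
#define DFR0_BRPS_SHIFT 12
#define DFR0_WRPS_SHIFT 20
#define DFR0_FIELD_MASK 0xfULL

/* Number of hardware breakpoints: the BRPs field stores (count - 1). */
static unsigned int dfr0_num_breakpoints(uint64_t dfr0)
{
	return (unsigned int)((dfr0 >> DFR0_BRPS_SHIFT) & DFR0_FIELD_MASK) + 1;
}

/* Number of hardware watchpoints: the WRPs field stores (count - 1). */
static unsigned int dfr0_num_watchpoints(uint64_t dfr0)
{
	return (unsigned int)((dfr0 >> DFR0_WRPS_SHIFT) & DFR0_FIELD_MASK) + 1;
}
```

For an example raw value of 0x10305106 this yields 6 breakpoints and 4 watchpoints. The KVM_CHECK_EXTENSION proposal would hide this decoding behind a plain integer return.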
Re: Exposing host debug capabilities to userspace
On 24.11.14 14:10, Christoffer Dall wrote: > On Mon, Nov 24, 2014 at 1:56 PM, Alexander Graf wrote: >> >> >> On 24.11.14 13:53, Christoffer Dall wrote: >>> On Mon, Nov 24, 2014 at 1:51 PM, Alexander Graf wrote: On 24.11.14 13:44, Peter Maydell wrote: > On 24 November 2014 at 12:41, Alexander Graf wrote: >>> Am 24.11.2014 um 13:32 schrieb Peter Maydell : >>> Yes, but we don't want to know about properties of the guest >>> vCPU. In an ideal world QEMU could reserve say half the debug >>> registers for debugging the VM on startup and have KVM expose >>> ID registers indicating to the guest that it only had the >>> other half... >> >> Yup, so create another (read-only) ONE_REG that exposes the number >> of actual guest debug registers. > > I'm confused. ONE_REG is for guest state, and the ID register > by definition is how we tell the guest how many debug registers > it has. What we want to know (and perhaps even control) for > debugging the VM is how many debug registers the host has. No, we don't want to know how many debug registers the host has. We want to know how many debug registers the guest has. Imagine you're running on A57 today with 8 debug registers (no idea if that's true, but assume it is). Tomorrow there will be a new core - let's call it A67 - with 16 debug registers. To make sure your legacy, badly written guest behaves exactly the same - especially after live migration - you want to spawn a VM with -cpu A57. That implies you want to expose 8 debug registers into the guest. So debug register synchronization should only be aware of those 8 registers. So what we really care about is the number of debug registers available to a guest vcpu. That in turn means it's guest state and as such can easily go into ONE_REG. >>> We already export this for the guest via ONE_REG. 
>>>
>>> What we want to do is support gdbstubs in QEMU to debug the guest, and
>>> to do this, QEMU needs to know how many hardware registers on the host
>>> there is; the guest will never see this information.
>>>
>>> So this is really about the host, the guest side is trivially handled
>>> through ONE_REG.
>>
>> That's the cp15 register that happens to get exposed to the guest. You
>> can just add another ONE_REG that does not have a cp15 equivalent to
>> expose the number of the vcpu's actually available debug registers.
>>
>>
> The fact that we currently map the guest vcpu registers to that of the
> host doesn't mean that will always be the case (which you argued for
> above).
>
> If you migrate a VM from a CPU with 16 debug registers (or emulate a
> CPU with 16 debug registers) on a physical CPU with only 8 debug
> registers, you cannot tell QEMU that it has 16 debug registers on the
> hardware to configure to debug the guest, which is what ONE_REG would
> give you.
>
> You could add another set of registers to ONE_REG which says "these
> are the host versions", or you could say that you don't ever support a
> setup where the guest number of debug registers is not the same as the
> host number, but both would be wrong.

Why would QEMU ever want to access debug registers that it didn't ask
for in the first place? If we simply limit ourselves to the register set
the vcpu has we don't have any problems and the complexity matrix
shrinks noticeably, no?

Alex
Re: [PATCH] kvm: x86: move assigned-dev.c and iommu.c to arch/x86/
2014-11-22 17:22+0100, Paolo Bonzini: > On 21/11/2014 22:21, Radim Krčmář wrote: > > - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier > > kvm_assign_device and kvm_deassign_device can also be moved to arch/x86, > in a new assigned-dev.h header. The header could include struct > kvm_assigned_dev_kernel too but, if you change the argument of > kvm_assign_device and kvm_deassign_device to struct pci_dev, the struct > can move directly to assigned-dev.c and remain hidden there. Thanks! > It's fine to do this as a follow up. With assigned-dev.h, we could remove everything from x86/kvm_host.h too, which would be better as a replacement to the original ---8<--- kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ Now that ia64 is gone, we can hide deprecated device assignment in x86. kvm_vm_ioctl_assigned_device() is newly called from kvm_arch_vm_ioctl(). Definitions and declarations have been consolidated into assigned-dev.h. Remaining kvm_iommu_(un)map_pages() would require new code to be moved. 
Signed-off-by: Radim Krčmář --- arch/x86/kvm/Makefile | 2 +- {virt => arch/x86}/kvm/assigned-dev.c | 1 + arch/x86/kvm/assigned-dev.h | 59 +++ {virt => arch/x86}/kvm/iommu.c| 1 + arch/x86/kvm/x86.c| 3 +- include/linux/kvm_host.h | 55 virt/kvm/kvm_main.c | 2 -- 7 files changed, 64 insertions(+), 59 deletions(-) rename {virt => arch/x86}/kvm/assigned-dev.c (99%) create mode 100644 arch/x86/kvm/assigned-dev.h rename {virt => arch/x86}/kvm/iommu.c (99%) diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index ee1cd92..08f790d 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -9,11 +9,11 @@ KVM := ../../../virt/kvm kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o -kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= $(KVM)/assigned-dev.o $(KVM)/iommu.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o +kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= assigned-dev.o iommu.o kvm-intel-y+= vmx.o kvm-amd-y += svm.o diff --git a/virt/kvm/assigned-dev.c b/arch/x86/kvm/assigned-dev.c similarity index 99% rename from virt/kvm/assigned-dev.c rename to arch/x86/kvm/assigned-dev.c index e05000e..0cfd54f5f 100644 --- a/virt/kvm/assigned-dev.c +++ b/arch/x86/kvm/assigned-dev.c @@ -20,6 +20,7 @@ #include #include #include "irq.h" +#include "assigned-dev.h" static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head, int assigned_dev_id) diff --git a/arch/x86/kvm/assigned-dev.h b/arch/x86/kvm/assigned-dev.h new file mode 100644 index 000..4caabc5 --- /dev/null +++ b/arch/x86/kvm/assigned-dev.h @@ -0,0 +1,59 @@ +#ifndef ARCH_X86_KVM_ASSIGNED_DEV_H +#define ARCH_X86_KVM_ASSIGNED_DEV_H + +#include + +#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT +struct kvm_assigned_dev_kernel { + struct kvm_irq_ack_notifier ack_notifier; + struct list_head list; + int assigned_dev_id; + int host_segnr; + int host_busnr; + int host_devfn; + unsigned 
int entries_nr; + int host_irq; + bool host_irq_disabled; + bool pci_2_3; + struct msix_entry *host_msix_entries; + int guest_irq; + struct msix_entry *guest_msix_entries; + unsigned long irq_requested_type; + int irq_source_id; + int flags; + struct pci_dev *dev; + struct kvm *kvm; + spinlock_t intx_lock; + spinlock_t intx_mask_lock; + char irq_name[32]; + struct pci_saved_state *pci_saved_state; +}; + +int kvm_assign_device(struct kvm *kvm, + struct kvm_assigned_dev_kernel *assigned_dev); +int kvm_deassign_device(struct kvm *kvm, + struct kvm_assigned_dev_kernel *assigned_dev); + +int kvm_iommu_map_guest(struct kvm *kvm); +int kvm_iommu_unmap_guest(struct kvm *kvm); + +long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, + unsigned long arg); + +void kvm_free_all_assigned_devices(struct kvm *kvm); +#else +static inline int kvm_iommu_unmap_guest(struct kvm *kvm) +{ + return 0; +} + +static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, + unsigned long arg) +{ + return -ENOTTY; +} + +static inline void kvm_free_all_assigned_devices(struct kvm *kvm) {} +#endif /* CONFIG_KVM_DEVICE_ASSIGNMENT */ + +#endif /* ARCH_X86_KVM_ASSIGNED_DEV_H */ diff --git a/virt/kvm/iommu.c b/arch/x86/kvm/iommu.c similarity index 99% rename from virt/kvm/iommu.c rename to arch/x
Re: [PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
On 11/20/2014 02:42 PM, Paolo Bonzini wrote: > ia64 does not need them anymore. > > Signed-off-by: Paolo Bonzini > --- > arch/x86/include/asm/kvm_host.h | 16 > arch/x86/kvm/Makefile | 5 ++--- > {virt => arch/x86}/kvm/ioapic.c | 0 > {virt => arch/x86}/kvm/ioapic.h | 1 - > {virt => arch/x86}/kvm/irq_comm.c | 4 ++-- > arch/x86/kvm/x86.c| 1 + > include/linux/kvm_host.h | 22 -- > virt/kvm/eventfd.c| 7 --- > virt/kvm/kvm_main.c | 3 --- > 9 files changed, 29 insertions(+), 30 deletions(-) > rename {virt => arch/x86}/kvm/ioapic.c (100%) > rename {virt => arch/x86}/kvm/ioapic.h (98%) > rename {virt => arch/x86}/kvm/irq_comm.c (98%) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 769db36a3001..76ff3e2d8fd2 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -603,6 +603,9 @@ struct kvm_arch { > > struct kvm_xen_hvm_config xen_hvm_config; > > + /* reads protected by irq_srcu, writes by irq_lock */ > + struct hlist_head mask_notifier_list; > + > /* fields used by HYPER-V emulation */ > u64 hv_guest_os_id; > u64 hv_hypercall; > @@ -819,6 +822,19 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, > const void *val, int bytes); > u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); > > +struct kvm_irq_mask_notifier { > + void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked); > + int irq; > + struct hlist_node link; > +}; > + > +void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, > + struct kvm_irq_mask_notifier *kimn); > +void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, > + struct kvm_irq_mask_notifier *kimn); > +void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin, > + bool mask); > + > extern bool tdp_enabled; > > u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile > index 25d22b2d6509..ee1cd92b03be 100644 > --- a/arch/x86/kvm/Makefile > +++ b/arch/x86/kvm/Makefile > @@ 
-7,14 +7,13 @@ CFLAGS_vmx.o := -I. > > KVM := ../../../virt/kvm > > -kvm-y+= $(KVM)/kvm_main.o $(KVM)/ioapic.o \ > - $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \ > +kvm-y+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ > $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o > kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += $(KVM)/assigned-dev.o $(KVM)/iommu.o > kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o > > kvm-y+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ > -i8254.o cpuid.o pmu.o > +i8254.o ioapic.o irq_comm.o cpuid.o pmu.o > kvm-intel-y += vmx.o > kvm-amd-y+= svm.o > > diff --git a/virt/kvm/ioapic.c b/arch/x86/kvm/ioapic.c > similarity index 100% > rename from virt/kvm/ioapic.c > rename to arch/x86/kvm/ioapic.c > diff --git a/virt/kvm/ioapic.h b/arch/x86/kvm/ioapic.h > similarity index 98% > rename from virt/kvm/ioapic.h > rename to arch/x86/kvm/ioapic.h > index dc3baa3a538f..deac8d509f2a 100644 > --- a/virt/kvm/ioapic.h > +++ b/arch/x86/kvm/ioapic.h > @@ -96,7 +96,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct > kvm_lapic *src, > struct kvm_lapic_irq *irq, unsigned long *dest_map); > int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); > int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); > -void kvm_vcpu_request_scan_ioapic(struct kvm *kvm); > void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, > u32 *tmr); > > diff --git a/virt/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c > similarity index 98% > rename from virt/kvm/irq_comm.c > rename to arch/x86/kvm/irq_comm.c > index 1345bde064f5..e9c135b639aa 100644 > --- a/virt/kvm/irq_comm.c > +++ b/arch/x86/kvm/irq_comm.c > @@ -234,7 +234,7 @@ void kvm_register_irq_mask_notifier(struct kvm *kvm, int > irq, > { > mutex_lock(&kvm->irq_lock); > kimn->irq = irq; > - hlist_add_head_rcu(&kimn->link, &kvm->mask_notifier_list); > + hlist_add_head_rcu(&kimn->link, &kvm->arch.mask_notifier_list); > mutex_unlock(&kvm->irq_lock); > } > > @@ -256,7 +256,7 @@ void 
kvm_fire_mask_notifiers(struct kvm *kvm, unsigned > irqchip, unsigned pin, > idx = srcu_read_lock(&kvm->irq_srcu); > gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin); > if (gsi != -1) > - hlist_for_each_entry_rcu(kimn, &kvm->mask_notifier_list, link) > + hlist_for_each_entry_rcu(kimn, &kvm->arch.mask_notifier_list, > link) > if (kimn->irq == gsi) >
Re: [PATCH v3 26/41] vhost: virtio 1.0 endian-ness support
Hi Michael, Do you have a tree from where I could pull these patches ? Thanks, C. On 11/24/2014 12:54 PM, Michael S. Tsirkin wrote: > Signed-off-by: Michael S. Tsirkin > --- > drivers/vhost/vhost.c | 93 > +++ > 1 file changed, 56 insertions(+), 37 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index c90f437..4d379ed 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -33,8 +33,8 @@ enum { > VHOST_MEMORY_F_LOG = 0x1, > }; > > -#define vhost_used_event(vq) ((u16 __user *)&vq->avail->ring[vq->num]) > -#define vhost_avail_event(vq) ((u16 __user *)&vq->used->ring[vq->num]) > +#define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num]) > +#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num]) > > static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, > poll_table *pt) > @@ -1001,7 +1001,7 @@ EXPORT_SYMBOL_GPL(vhost_log_write); > static int vhost_update_used_flags(struct vhost_virtqueue *vq) > { > void __user *used; > - if (__put_user(vq->used_flags, &vq->used->flags) < 0) > + if (__put_user(cpu_to_vhost16(vq, vq->used_flags), &vq->used->flags) < > 0) > return -EFAULT; > if (unlikely(vq->log_used)) { > /* Make sure the flag is seen before log. 
*/ > @@ -1019,7 +1019,7 @@ static int vhost_update_used_flags(struct > vhost_virtqueue *vq) > > static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 > avail_event) > { > - if (__put_user(vq->avail_idx, vhost_avail_event(vq))) > + if (__put_user(cpu_to_vhost16(vq, vq->avail_idx), > vhost_avail_event(vq))) > return -EFAULT; > if (unlikely(vq->log_used)) { > void __user *used; > @@ -1038,6 +1038,7 @@ static int vhost_update_avail_event(struct > vhost_virtqueue *vq, u16 avail_event) > > int vhost_init_used(struct vhost_virtqueue *vq) > { > + __virtio16 last_used_idx; > int r; > if (!vq->private_data) > return 0; > @@ -1046,7 +1047,13 @@ int vhost_init_used(struct vhost_virtqueue *vq) > if (r) > return r; > vq->signalled_used_valid = false; > - return get_user(vq->last_used_idx, &vq->used->idx); > + if (!access_ok(VERIFY_READ, &vq->used->idx, sizeof vq->used->idx)) > + return -EFAULT; > + r = __get_user(last_used_idx, &vq->used->idx); > + if (r) > + return r; > + vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); > + return 0; > } > EXPORT_SYMBOL_GPL(vhost_init_used); > > @@ -1087,16 +1094,16 @@ static int translate_desc(struct vhost_virtqueue *vq, > u64 addr, u32 len, > /* Each buffer in the virtqueues is actually a chain of descriptors. This > * function returns the next descriptor in the chain, > * or -1U if we're at the end. */ > -static unsigned next_desc(struct vring_desc *desc) > +static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc > *desc) > { > unsigned int next; > > /* If this descriptor says it doesn't chain, we're done. */ > - if (!(desc->flags & VRING_DESC_F_NEXT)) > + if (!(desc->flags & cpu_to_vhost16(vq, VRING_DESC_F_NEXT))) > return -1U; > > /* Check they're not leading us off end of descriptors. */ > - next = desc->next; > + next = vhost16_to_cpu(vq, desc->next); > /* Make sure compiler knows to grab that: we don't want it changing! 
*/ > /* We will use the result as an index in an array, so most >* architectures only need a compiler barrier here. */ > @@ -1113,18 +1120,19 @@ static int get_indirect(struct vhost_virtqueue *vq, > { > struct vring_desc desc; > unsigned int i = 0, count, found = 0; > + u32 len = vhost32_to_cpu(vq, indirect->len); > int ret; > > /* Sanity check */ > - if (unlikely(indirect->len % sizeof desc)) { > + if (unlikely(len % sizeof desc)) { > vq_err(vq, "Invalid length in indirect descriptor: " > "len 0x%llx not multiple of 0x%zx\n", > -(unsigned long long)indirect->len, > +(unsigned long long)vhost32_to_cpu(vq, indirect->len), > sizeof desc); > return -EINVAL; > } > > - ret = translate_desc(vq, indirect->addr, indirect->len, vq->indirect, > + ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr), len, > vq->indirect, >UIO_MAXIOV); > if (unlikely(ret < 0)) { > vq_err(vq, "Translation failure %d in indirect.\n", ret); > @@ -1135,7 +1143,7 @@ static int get_indirect(struct vhost_virtqueue *vq, >* architectures only need a compiler barrier here. */ > read_barrier_depends(); > > - count = indirect->len / sizeof desc; > + count = len / sizeof desc; > /* Buffers are chained via a 16
Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
On Mon, Nov 24, 2014 at 02:14:48PM +0530, Anup Patel wrote: > On Fri, Nov 21, 2014 at 5:19 PM, Christoffer Dall > wrote: > > On Fri, Nov 21, 2014 at 04:06:05PM +0530, Anup Patel wrote: > >> Hi Christoffer, > >> > >> On Fri, Nov 21, 2014 at 3:29 PM, Christoffer Dall > >> wrote: > >> > On Thu, Nov 20, 2014 at 08:17:32PM +0530, Anup Patel wrote: > >> >> On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall > >> >> wrote: > >> >> > On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote: > >> >> >> Hi All, > >> >> >> > >> >> >> I have second thoughts about rebasing KVM PMU patches > >> >> >> to Marc's irq-forwarding patches. > >> >> >> > >> >> >> The PMU IRQs (when virtualized by KVM) are not exactly > >> >> >> forwarded IRQs because they are shared between Host > >> >> >> and Guest. > >> >> >> > >> >> >> Scenario1 > >> >> >> - > >> >> >> > >> >> >> We might have perf running on Host and no KVM guest > >> >> >> running. In this scenario, we wont get interrupts on Host > >> >> >> because the kvm_pmu_hyp_init() (similar to the function > >> >> >> kvm_timer_hyp_init() of Marc's IRQ-forwarding > >> >> >> implementation) has put all host PMU IRQs in forwarding > >> >> >> mode. > >> >> >> > >> >> >> The only way solve this problem is to not set forwarding > >> >> >> mode for PMU IRQs in kvm_pmu_hyp_init() and instead > >> >> >> have special routines to turn on and turn off the forwarding > >> >> >> mode of PMU IRQs. These routines will be called from > >> >> >> kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ > >> >> >> forwarding state. > >> >> >> > >> >> >> Scenario2 > >> >> >> - > >> >> >> > >> >> >> We might have perf running on Host and Guest simultaneously > >> >> >> which means it is quite likely that PMU HW trigger IRQ meant > >> >> >> for Host between "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" > >> >> >> and "kvm_pmu_sync_hwstate(vcpu);" (similar to timer sync routine > >> >> >> of Marc's patchset which is called before local_irq_enable()). 
> >> >> >> > >> >> >> In this scenario, the updated kvm_pmu_sync_hwstate(vcpu) > >> >> >> will accidentally forward IRQ meant for Host to Guest unless > >> >> >> we put additional checks to inspect VCPU PMU state. > >> >> >> > >> >> >> Am I missing any detail about IRQ forwarding for above > >> >> >> scenarios? > >> >> >> > >> >> > Hi Anup, > >> >> > >> >> Hi Christoffer, > >> >> > >> >> > > >> >> > I briefly discussed this with Marc. What I don't understand is how it > >> >> > would be possible to get an interrupt for the host while running the > >> >> > guest? > >> >> > > >> >> > The rationale behind my question is that whenever you're running the > >> >> > guest, the PMU should be programmed exclusively with guest state, and > >> >> > since the PMU is per core, any interrupts should be for the guest, > >> >> > where > >> >> > it would always be pending. > >> >> > >> >> Yes, thats right PMU is programmed exclusively for guest when > >> >> guest is running and for host when host is running. > >> >> > >> >> Let us assume a situation (Scenario2 mentioned previously) > >> >> where both host and guest are using PMU. When the guest is > >> >> running we come back to host mode due to variety of reasons > >> >> (stage2 fault, guest IO, regular host interrupt, host interrupt > >> >> meant for guest, ) which means we will return from the > >> >> "ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);" statement in the > >> >> kvm_arch_vcpu_ioctl_run() function with local IRQs disabled. > >> >> At this point we would have restored back host PMU context and > >> >> any PMU counter used by host can trigger PMU overflow interrup > >> >> for host. 
Now we will be having "kvm_pmu_sync_hwstate(vcpu);" > >> >> in the kvm_arch_vcpu_ioctl_run() function (similar to the > >> >> kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset) > >> >> which will try to detect PMU irq forwarding state in GIC hence it > >> >> can accidentally discover PMU irq pending for guest while this > >> >> PMU irq is actually meant for host. > >> >> > >> >> This above mentioned situation does not happen for timer > >> >> because virtual timer interrupts are exclusively used for guest. > >> >> The exclusive use of virtual timer interrupt for guest ensures that > >> >> the function kvm_timer_sync_hwstate() will always see correct > >> >> state of virtual timer IRQ from GIC. > >> >> > >> > I'm not quite following. > >> > > >> > When you call kvm_pmu_sync_hwstate(vcpu) in the non-preemtible section, > >> > you would (1) capture the active state of the IRQ pertaining to the > >> > guest and (2) deactive the IRQ on the host, then (3) switch the state of > >> > the PMU to the host state, and finally (4) re-enable IRQs on the CPU > >> > you're running on. > >> > > >> > If the host PMU state restored in (3) causes the PMU to raise an > >> > interrupt, you'll take an interrupt after (4), which is for the host, > >> > and you'll handle it on the host. > >> > > >> We only switch PMU state in assembly
Re: Exposing host debug capabilities to userspace
On Mon, Nov 24, 2014 at 03:07:35PM +0100, Alexander Graf wrote: > > > On 24.11.14 14:10, Christoffer Dall wrote: > > On Mon, Nov 24, 2014 at 1:56 PM, Alexander Graf wrote: > >> > >> > >> On 24.11.14 13:53, Christoffer Dall wrote: > >>> On Mon, Nov 24, 2014 at 1:51 PM, Alexander Graf wrote: > > > On 24.11.14 13:44, Peter Maydell wrote: > > On 24 November 2014 at 12:41, Alexander Graf wrote: > >>> Am 24.11.2014 um 13:32 schrieb Peter Maydell > >>> : > >>> Yes, but we don't want to know about properties of the guest > >>> vCPU. In an ideal world QEMU could reserve say half the debug > >>> registers for debugging the VM on startup and have KVM expose > >>> ID registers indicating to the guest that it only had the > >>> other half... > >> > >> Yup, so create another (read-only) ONE_REG that exposes the number > >> of actual guest debug registers. > > > > I'm confused. ONE_REG is for guest state, and the ID register > > by definition is how we tell the guest how many debug registers > > it has. What we want to know (and perhaps even control) for > > debugging the VM is how many debug registers the host has. > > No, we don't want to know how many debug registers the host has. We want > to know how many debug registers the guest has. > > Imagine you're running on A57 today with 8 debug registers (no idea if > that's true, but assume it is). Tomorrow there will be a new core - > let's call it A67 - with 16 debug registers. > > To make sure your legacy, badly written guest behaves exactly the same - > especially after live migration - you want to spawn a VM with -cpu A57. > That implies you want to expose 8 debug registers into the guest. So > debug register synchronization should only be aware of those 8 registers. > > So what we really care about is the number of debug registers available > to a guest vcpu. That in turn means it's guest state and as such can > easily go into ONE_REG. > > >>> We already export this for the guest via ONE_REG. 
> >>> > >>> What we want to do is support gdbstubs in QEMU to debug the guest, and > >>> to do this, QEMU needs to know how many hardware registers on the host > >>> there is; the guest will never see this information. > >>> > >>> So this is really about the host, the guest side is trivially handled > >>> through ONE_REG. > >> > >> That's the cp15 register that happens to get exposed to the guest. You > >> can just add another ONE_REG that does not have a cp15 equivalent to > >> expose the number of the vcpu's actually available debug registers. > >> > >> > > The fact that we currently map the guest vcpu registers to that of the > > host doesn't mean that will always be the case (which you argued for > > above). > > > > If you migrate a VM from a CPU with 16 debug registers (or emulate a > > CPU with 16 debug registers) on a physical CPU with only 8 debug > > registers, you cannot tell QEMU that it has 16 debug registers on the > > hardware to configure to debug the guest, which is what ONE_REG would > > give you. > > > > You could add another set of registers to ONE_REG which says "these > > are the host versions", or you could say that you don't ever support a > > setup where the guest number of debug registers is not the same as the > > host number, but both would be wrong. > > Why would QEMU ever want to access debug registers that it didn't ask > for in the first place? If we simply limit ourselves to the register set > the vcpu has we don't have any problems and the complexity matrix > shrinks noticably, no? > (For others following this, not hanging out on #kvm-arm, I provide a summary). We had a lenghty IRC discussion on this, for the curious, go read it here: http://irclogs.linaro.org/2014/11/24/%23kvm-arm.html The point of confusion is that other KVM architectures use the ONE_REG interface to set the debug registers, for both guest and host state, and in fact this can be overlayed. 
The proposed idea of using SET_DEBUGREGS was flawed, because this also
pertains to guest state and is a deprecated ABI.

So that left us with two choices:

(1) Implement a new ABI to retrieve and set host debugging independently
    of the guest state.

(2) Overlay the host debugging of a guest with the guest's internal
    debugging state (a process inside the guest is debugging a process).

Option (1) has the usual drawback of having to design an ABI, consuming
ioctl number space, etc. But it also has the (more severe) drawback that
debugging the guest using QEMU gdbstubs breaks guest debugging, because
we would keep completely distinct register sets.

Option (2) feels semantically slightly weird in the cases where the
number of guest debug regs differs from the number of host debug regs,
but as Alex Graf pointed out, we probably wouldn't dream of supporting
more guest debug registers than the host has (refuse migration,
Re: [CFT PATCH 0/2] KVM: support XSAVES usage in the host
> On Nov 24, 2014, at 13:39, Paolo Bonzini wrote:
>
> On 23/11/2014 09:16, Nadav Amit wrote:
>> I’ll try to check it tomorrow (I don’t have access to the failing machine at
>> the moment).
>
> Thanks, you'll need to squash this in:
>
> diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
> index 4c540c4719d8..0de1fae2bdf0 100644
> --- a/arch/x86/kernel/xsave.c
> +++ b/arch/x86/kernel/xsave.c
> @@ -738,3 +738,4 @@ void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
>
> 	return (void *)xsave + xstate_comp_offsets[feature];
> }
> +EXPORT_SYMBOL_GPL(get_xsave_addr);

I tested the patches but there are still problems.

Since kvm_load_guest_fpu is called before the guest_fpu is ever stored,
there are 2 more problems that currently cause #GP:
1. XCOMP_BV[63] = 0
2. XSTATE_BV sets a bit (including bit 63) that is not set in XCOMP_BV
   (XCOMP_BV is initialised to zero).

[see SDM 13.11 "OPERATION OF XRSTORS"]

Once I initialise XCOMP_BV to (1ull << 63) | XSTATE_BV, the guest runs
successfully. I have not checked any other qemu functionality that might
be affected by the patch.

Nadav
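
To make the two failing conditions above concrete, here is a userspace sketch of the header check and of the workaround value. It is an illustration only: the helper names are invented, it models just the two conditions listed in this message (the SDM lists further ones for XRSTORS), and it does not touch any real xsave area.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of XCOMP_BV marks the compacted format expected by XRSTORS. */
#define XCOMP_BV_COMPACTED (1ULL << 63)

/* Workaround value from the message: (1ull << 63) | XSTATE_BV. */
static uint64_t xcomp_bv_for(uint64_t xstate_bv)
{
	return XCOMP_BV_COMPACTED | xstate_bv;
}

/* Returns true when neither of the two #GP conditions listed above
 * applies: bit 63 of XCOMP_BV is set, and every bit set in XSTATE_BV
 * is also set in XCOMP_BV. */
static bool xrstors_header_ok(uint64_t xstate_bv, uint64_t xcomp_bv)
{
	if (!(xcomp_bv & XCOMP_BV_COMPACTED))
		return false;
	return (xstate_bv & ~xcomp_bv) == 0;
}
```

A zero-initialised XCOMP_BV fails the check for any XSTATE_BV, while xcomp_bv_for(xstate_bv) passes it by construction, which matches the behaviour reported above.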
Re: [PATCH v4 2/3] KVM: arm: add irqfd support
On Mon, Nov 24, 2014 at 12:02 PM, Eric Auger wrote: > On 11/24/2014 11:00 AM, Christoffer Dall wrote: >> On Sun, Nov 23, 2014 at 06:56:59PM +0100, Eric Auger wrote: >>> This patch enables irqfd on arm. >>> >>> Both irqfd and resamplefd are supported. Injection is implemented >>> in vgic.c without routing. >>> >>> This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. >>> >>> KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability >>> automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. >>> >>> Signed-off-by: Eric Auger >>> >>> --- >>> >>> v3 -> v4: >>> - reword commit message >>> - explain why we unlock the distributor before calling kvm_notify_acked_irq >>> - rename is_assigned_irq into has_notifier >>> - change EOI and injection kvm_debug format string >>> - remove error local variable in kvm_set_irq >>> - Move HAVE_KVM_IRQCHIP unset in a separate patch >>> - The rationale behind not supporting PPI irqfd injection is that >>> any device using a PPI would be a private-to-the-CPU device (timer for >>> instance), so its state would have to be context-switched along with the >>> VCPU and would require in-kernel wiring anyhow. It is not a relevant use >>> case for irqfds. >> >> this blob could go in the commit message. > OK >> >>> - handle case were the irqfd injection is attempted before the vgic is >>> ready. 
>>> in such a case the notifier, if any, is called immediatly >>> - use nr_irqs to test spi is within correct range >>> >>> v2 -> v3: >>> - removal of irq.h from eventfd.c put in a separate patch to increase >>> visibility >>> - properly expose KVM_CAP_IRQFD capability in arm.c >>> - remove CONFIG_HAVE_KVM_IRQCHIP meaningfull only if irq_comm.c is used >>> >>> v1 -> v2: >>> - rebase on 3.17rc1 >>> - move of the dist unlock in process_maintenance >>> - remove of dist lock in __kvm_vgic_sync_hwstate >>> - rewording of the commit message (add resamplefd reference) >>> - remove irq.h >>> --- >>> Documentation/virtual/kvm/api.txt | 5 ++- >>> arch/arm/include/uapi/asm/kvm.h | 3 ++ >>> arch/arm/kvm/Kconfig | 2 ++ >>> arch/arm/kvm/Makefile | 2 +- >>> arch/arm/kvm/arm.c| 3 ++ >>> virt/kvm/arm/vgic.c | 72 >>> --- >>> 6 files changed, 81 insertions(+), 6 deletions(-) >>> >>> diff --git a/Documentation/virtual/kvm/api.txt >>> b/Documentation/virtual/kvm/api.txt >>> index 7610eaa..4deccc0 100644 >>> --- a/Documentation/virtual/kvm/api.txt >>> +++ b/Documentation/virtual/kvm/api.txt >>> @@ -2206,7 +2206,7 @@ into the hash PTE second double word). >>> 4.75 KVM_IRQFD >>> >>> Capability: KVM_CAP_IRQFD >>> -Architectures: x86 s390 >>> +Architectures: x86 s390 arm >>> Type: vm ioctl >>> Parameters: struct kvm_irqfd (in) >>> Returns: 0 on success, -1 on error >>> @@ -2232,6 +2232,9 @@ Note that closing the resamplefd is not sufficient to >>> disable the >>> irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment >>> and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. >>> >>> +On arm, the gsi must be a shared peripheral interrupt (SPI). >>> +This means the corresponding programmed GIC interrupt ID is gsi+32. >>> + >> >> On ARM, the gsi field in the kvm_irqfd struct specifies the Shared >> Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is >> given by gsi + 32. 
> OK >> >>> 4.76 KVM_PPC_ALLOCATE_HTAB >>> >>> Capability: KVM_CAP_PPC_ALLOC_HTAB >>> diff --git a/arch/arm/include/uapi/asm/kvm.h >>> b/arch/arm/include/uapi/asm/kvm.h >>> index 09ee408..77547bb 100644 >>> --- a/arch/arm/include/uapi/asm/kvm.h >>> +++ b/arch/arm/include/uapi/asm/kvm.h >>> @@ -196,6 +196,9 @@ struct kvm_arch_memory_slot { >>> /* Highest supported SPI, from VGIC_NR_IRQS */ >>> #define KVM_ARM_IRQ_GIC_MAX 127 >>> >>> +/* One single KVM irqchip, ie. the VGIC */ >>> +#define KVM_NR_IRQCHIPS 1 >>> + >>> /* PSCI interface */ >>> #define KVM_PSCI_FN_BASE0x95c1ba5e >>> #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n)) >>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig >>> index 9f581b1..e519a40 100644 >>> --- a/arch/arm/kvm/Kconfig >>> +++ b/arch/arm/kvm/Kconfig >>> @@ -24,6 +24,7 @@ config KVM >>> select KVM_MMIO >>> select KVM_ARM_HOST >>> depends on ARM_VIRT_EXT && ARM_LPAE >>> +select HAVE_KVM_EVENTFD >>> ---help--- >>>Support hosting virtualized guest machines. You will also >>>need to select one or more of the processor modules below. >>> @@ -55,6 +56,7 @@ config KVM_ARM_MAX_VCPUS >>> config KVM_ARM_VGIC >>> bool "KVM support for Virtual GIC" >>> depends on KVM_ARM_HOST && OF >>> +select HAVE_KVM_IRQFD >>> default y >>> ---help--- >>>Adds support for a hardware assisted, in-kernel GIC emulation. >>> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile >>> index f7057ed..859db09 100644 >>> --- a/arch/arm/kvm/M
Re: [CFT PATCH 0/2] KVM: support XSAVES usage in the host
On 24/11/2014 16:28, Nadav Amit wrote:
> Since kvm_load_guest_fpu is called before the guest_fpu is ever stored, there
> are 2 more problems that currently cause #GP:
> 1. XCOMP_BV[63] = 0
> 2. XSTATE_BV sets a bit (including bit 63) that is not set in XCOMP_BV
> (XCOMP_BV is initialised to zero).
>
> [see SDM 13.11 "OPERATION OF XRSTORS"]
>
> Once I initialise XCOMP_BV to (1ull << 63) | XSTATE_BV, the guest runs
> successfully.
> I have not checked any other qemu functionality that might be affected by the
> patch.

Ah, so the problem is with KVM_SET_XSAVE. Thanks!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
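The two conditions quoted above can be written down as a small check. This is only a sketch under assumptions: `toy_xsave_hdr`, `xrstors_header_ok` and `init_compacted_hdr` are hypothetical names rather than kernel code, and only the two rules mentioned in the mail are modelled (the SDM section lists further checks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of XCOMP_BV marks the compacted format. */
#define TOY_COMPACTION_ENABLED (1ULL << 63)

/* Hypothetical stand-in for the two header fields discussed above. */
struct toy_xsave_hdr {
    uint64_t xstate_bv;
    uint64_t xcomp_bv;
};

/* Returns false for the two header states that the mail says cause #GP. */
static bool xrstors_header_ok(const struct toy_xsave_hdr *hdr)
{
    /* Problem 1: XCOMP_BV[63] = 0 */
    if (!(hdr->xcomp_bv & TOY_COMPACTION_ENABLED))
        return false;
    /* Problem 2: XSTATE_BV sets a bit that is not set in XCOMP_BV */
    if (hdr->xstate_bv & ~hdr->xcomp_bv)
        return false;
    return true;
}

/* Initialise XCOMP_BV to (1ull << 63) | XSTATE_BV, as described above. */
static void init_compacted_hdr(struct toy_xsave_hdr *hdr, uint64_t xstate_bv)
{
    /* Bit 63 is the format flag, not a state component. */
    assert(!(xstate_bv & TOY_COMPACTION_ENABLED));
    hdr->xstate_bv = xstate_bv;
    hdr->xcomp_bv = TOY_COMPACTION_ENABLED | xstate_bv;
}
```

A zero-initialised XCOMP_BV fails the first check no matter what XSTATE_BV holds, which matches the observation that the header has to be set up before the first restore.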
[CFT PATCH v2 1/2] kvm: x86: mask out XSAVES
This feature is not supported inside KVM guests yet, because we
do not emulate MSR_IA32_XSS. Mask it out.

Cc: Nadav Amit
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/cpuid.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 20d83217fb1d..a4f5ac46226c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -320,6 +320,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) |
 		F(AVX512CD);
 
+	/* cpuid 0xD.1.eax */
+	const u32 kvm_supported_word10_x86_features =
+		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1);
+
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
 
@@ -456,13 +460,18 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->eax &= supported;
 		entry->edx &= supported >> 32;
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+		if (!supported)
+			break;
+
 		for (idx = 1, i = 1; idx < 64; ++idx) {
 			u64 mask = ((u64)1 << idx);
 			if (*nent >= maxnent)
 				goto out;
 
 			do_cpuid_1_ent(&entry[i], function, idx);
-			if (entry[i].eax == 0 || !(supported & mask))
+			if (idx == 1)
+				entry[i].eax &= kvm_supported_word10_x86_features;
+			else if (entry[i].eax == 0 || !(supported & mask))
 				continue;
 			entry[i].flags |=
 			       KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
--
1.8.3.1
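The masking step in this patch boils down to an AND with a whitelist. A minimal sketch, assuming the usual bit layout of CPUID.(EAX=0xD,ECX=1):EAX (XSAVEOPT bit 0, XSAVEC bit 1, XGETBV1 bit 2, XSAVES bit 3); the `TOY_*` names and `mask_guest_xsave_features` are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in CPUID.(EAX=0xD,ECX=1):EAX. */
#define TOY_XSAVEOPT (1u << 0)
#define TOY_XSAVEC   (1u << 1)
#define TOY_XGETBV1  (1u << 2)
#define TOY_XSAVES   (1u << 3)

/*
 * Keep only the features on the whitelist. XSAVES is deliberately left
 * out, so it is hidden from the guest even when the host reports it.
 */
static uint32_t mask_guest_xsave_features(uint32_t host_eax)
{
    const uint32_t supported = TOY_XSAVEOPT | TOY_XSAVEC | TOY_XGETBV1;
    uint32_t guest_eax = host_eax & supported;

    assert(!(guest_eax & TOY_XSAVES));
    return guest_eax;
}
```

A whitelist has the side effect that any feature bit added to this leaf later stays hidden until someone lists it explicitly.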
[CFT PATCH v2 2/2] KVM: x86: support XSAVES usage in the host
Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
struct xsave_struct might be using the compacted format. Convert
in order to preserve userspace ABI.

Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
but the kernel will pass it to XRSTORS, and we need to convert back.

Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Nadav Amit
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/x86.c | 87 +-
 1 file changed, 80 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 08b5657e57ed..373b0ab9a32e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3132,15 +3132,89 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+#define XSTATE_COMPACTION_ENABLED (1ULL << 63)
+
+static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
+{
+	struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+	u64 xstate_bv = vcpu->arch.guest_supported_xcr0 | XSTATE_FPSSE;
+	u64 valid;
+
+	/*
+	 * Copy legacy XSAVE area, to avoid complications with CPUID
+	 * leaves 0 and 1 in the loop below.
+	 */
+	memcpy(dest, xsave, XSAVE_HDR_OFFSET);
+
+	/* Set XSTATE_BV */
+	*(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;
+
+	/*
+	 * Copy each region from the possibly compacted offset to the
+	 * non-compacted offset.
+	 */
+	valid = xstate_bv & ~XSTATE_FPSSE;
+	while (valid) {
+		u64 feature = valid & -valid;
+		int index = fls64(feature) - 1;
+		void *src = get_xsave_addr(xsave, feature);
+
+		if (src) {
+			u32 size, offset, ecx, edx;
+			cpuid_count(XSTATE_CPUID, index,
+				    &size, &offset, &ecx, &edx);
+			memcpy(dest + offset, src, size);
+		}
+
+		valid -= feature;
+	}
+}
+
+static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
+{
+	struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+	u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
+	u64 valid;
+
+	/*
+	 * Copy legacy XSAVE area, to avoid complications with CPUID
+	 * leaves 0 and 1 in the loop below.
+	 */
+	memcpy(xsave, src, XSAVE_HDR_OFFSET);
+
+	/* Set XSTATE_BV and possibly XCOMP_BV. */
+	xsave->xsave_hdr.xstate_bv = xstate_bv;
+	if (cpu_has_xsaves)
+		xsave->xsave_hdr.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+
+	/*
+	 * Copy each region from the non-compacted offset to the
+	 * possibly compacted offset.
+	 */
+	valid = xstate_bv & ~XSTATE_FPSSE;
+	while (valid) {
+		u64 feature = valid & -valid;
+		int index = fls64(feature) - 1;
+		void *dest = get_xsave_addr(xsave, feature);
+
+		if (dest) {
+			u32 size, offset, ecx, edx;
+			cpuid_count(XSTATE_CPUID, index,
+				    &size, &offset, &ecx, &edx);
+			memcpy(dest, src + offset, size);
+		} else
+			WARN_ON_ONCE(1);
+
+		valid -= feature;
+	}
+}
+
 static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 					 struct kvm_xsave *guest_xsave)
 {
 	if (cpu_has_xsave) {
-		memcpy(guest_xsave->region,
-			&vcpu->arch.guest_fpu.state->xsave,
-			vcpu->arch.guest_xstate_size);
-		*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] &=
-			vcpu->arch.guest_supported_xcr0 | XSTATE_FPSSE;
+		memset(guest_xsave, 0, sizeof(struct kvm_xsave));
+		fill_xsave((u8 *) guest_xsave->region, vcpu);
 	} else {
 		memcpy(guest_xsave->region,
 			&vcpu->arch.guest_fpu.state->fxsave,
@@ -3164,8 +3238,7 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 		 */
 		if (xstate_bv & ~kvm_supported_xcr0())
 			return -EINVAL;
-		memcpy(&vcpu->arch.guest_fpu.state->xsave,
-			guest_xsave->region, vcpu->arch.guest_xstate_size);
+		load_xsave(vcpu, (u8 *)guest_xsave->region);
 	} else {
 		if (xstate_bv & ~XSTATE_FPSSE)
 			return -EINVAL;
--
1.8.3.1
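The copy loops in this patch walk the feature mask one set bit at a time (`valid & -valid` isolates the lowest set bit) and move each component between a packed and a fixed-offset layout. The sketch below shows only that idea in user-space C. It is a toy under stated assumptions: `toy_component`, `bit_index` and `toy_expand` are hypothetical, the layout table stands in for the CPUID leaf 0xD lookups, and the legacy area, header and alignment rules of the real format are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-component layout; stands in for the CPUID lookups. */
struct toy_component {
    uint32_t size;   /* bytes occupied by the component */
    uint32_t offset; /* offset in the non-compacted layout */
};

/* Index of the single bit set in 'feature'. */
static int bit_index(uint64_t feature)
{
    int index = 0;

    assert(feature != 0);
    while ((feature & 1) == 0) {
        feature >>= 1;
        index++;
    }
    return index;
}

/*
 * Copy components stored back to back in 'src' (toy "compacted" form)
 * to their fixed offsets in 'dest'. Returns the number of packed bytes
 * consumed from 'src'.
 */
static size_t toy_expand(uint8_t *dest, const uint8_t *src, uint64_t valid,
                         const struct toy_component *layout)
{
    size_t packed = 0;

    while (valid) {
        uint64_t feature = valid & -valid; /* lowest set bit */
        int index = bit_index(feature);

        memcpy(dest + layout[index].offset, src + packed,
               layout[index].size);
        packed += layout[index].size;
        valid -= feature;                  /* clear it and continue */
    }
    return packed;
}
```

The reverse direction, as in load_xsave(), would use the same loop with source and destination swapped.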
[CFT PATCH v2 0/2] KVM: support XSAVES usage in the host
The first patch ensures that XSAVES is not exposed in the guest until
we emulate MSR_IA32_XSS. The second exports XSAVE data in the correct
format.

I tested these on a non-XSAVES system so they should not be completely
broken, but I need some help. I am not even sure which XSAVE states
are _not_ enabled, and thus compacted, in Linux.

Note that these patches do not add support for XSAVES in the guest yet,
since MSR_IA32_XSS is not emulated.

If they fix the bug Nadav reported, I'll add Reported-by and commit.

Thanks,

Paolo

v1->v2: also adjust KVM_SET_XSAVE

Paolo Bonzini (2):
  kvm: x86: mask out XSAVES
  KVM: x86: support XSAVES usage in the host

 arch/x86/kvm/cpuid.c | 11 ++-
 arch/x86/kvm/x86.c   | 87 +++-
 2 files changed, 90 insertions(+), 8 deletions(-)

--
1.8.3.1
Re: [PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
On 24/11/2014 15:26, Eric Auger wrote:
>> > +#ifdef __KVM_HAVE_IOAPIC
>> > +void kvm_vcpu_request_scan_ioapic(struct kvm *kvm);
>> > +#else
>> > +static inline void kvm_vcpu_request-scan_ioapic(struct kvm *kvm)
> Hi Paolo,
>
> you have a typo above: "-" instead of "_".

Indeed, thanks. The fixed version is already on kernel.org.

Paolo
Re: [CFT PATCH 0/2] KVM: support XSAVES usage in the host
On 24/11/2014 16:28, Nadav Amit wrote:
>
> Since kvm_load_guest_fpu is called before the guest_fpu is ever stored, there
> are 2 more problems that currently cause #GP:
> 1. XCOMP_BV[63] = 0
> 2. XSTATE_BV sets a bit (including bit 63) that is not set in XCOMP_BV
> (XCOMP_BV is initialised to zero).
>
> [see SDM 13.11 "OPERATION OF XRSTORS"]
>
> Once I initialise XCOMP_BV to (1ull << 63) | XSTATE_BV, the guest runs
> successfully.
> I have not checked any other qemu functionality that might be affected by the
> patch.

I posted patches that assume that QEMU calls KVM_SET_XSAVE early enough.
If this is not the case, can you cook up and post a patch to
kvm_arch_vcpu_init that fixes the remaining problem?

Paolo
Re: [PATCH v4 0/3] irqfd support for arm/arm64
Hi, On 24/11/14 10:10, Eric Auger wrote: > On 11/24/2014 10:47 AM, Christoffer Dall wrote: >> On Sun, Nov 23, 2014 at 06:56:57PM +0100, Eric Auger wrote: >>> This patch series enables irqfd on arm and arm64. >>> >>> Irqfd framework enables to inject a virtual IRQ into a guest upon an >>> eventfd trigger. User-side uses KVM_IRQFD VM ioctl to provide KVM with >>> a kvm_irqfd struct that associates a VM, an eventfd, a virtual IRQ number >>> (aka. the gsi). When an actor signals the eventfd (typically a VFIO >>> platform driver), the kvm irqfd subsystem injects the gsi into the VM. >>> >>> Resamplefd also is supported for level sensitive interrupts, ie. the >>> user can provide another eventfd that is triggered when the completion >>> of the virtual IRQ (gsi) is detected by the GIC. >>> >>> The gsi must correspond to a shared peripheral interrupt (SPI), ie the >>> GIC interrupt ID is gsi + 32. >>> >>> The rationale behind not supporting PPI irqfd injection is that >>> any device using a PPI would be a private-to-the-CPU device (timer for >>> instance), so its state would have to be context-switched along with the >>> VCPU and would require in-kernel wiring anyhow. It is not a relevant use >>> case for irqfds. >>> >>> this patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. >>> >>> No IRQ routing table is used, enabling to remove CONFIG_HAVE_KVM_IRQCHIP >>> >>> can be found at git://git.linaro.org/people/eric.auger/linux.git >>> on branch irqfd_integ_v8 >>> >>> This work was tested with Calxeda Midway xgmac main interrupt with >>> qemu-system-arm and QEMU VFIO platform device. Also irqfd was proven >>> functional on several vhost-net prototypes. 
>>> >>> v3 -> v4: >>> - rebase on 3.18rc5 >>> - vgic dynamic instantiation brought new challenges: >>> handling of irqfd injection when vgic is not ready >>> - unset of CONFIG_HAVE_KVM_IRQCHIP in a separate patch >>> - add arm64 enable >>> - vgic.c style modifications according to Christoffer comments >>> >> >> There also seems to be a different split of the patches here? > Hi Christoffer, > yes I added arm64b support and moved CONFIG_HAVE_KVM_IRQCHIP removal in > a separate patch. >> >> We've probably also reached the point where you need to start rebasing >> on Andre's GICv3 patches, which I expect will go in first. > the patch applies without conflict on Andre's series. Yes, I can confirm this. I also manually checked the patches to spot any fallouts due to the vgic.c file split, but Eric's patches are not affected by this. Cheers, Andre. > > BR > > Eric >> >> Thanks, >> -Christoffer >> > > ___ > kvmarm mailing list > kvm...@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
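The gsi-to-interrupt-ID rule stated in the cover letter (SPIs only, GIC interrupt ID = gsi + 32) can be sketched as a small range-checked helper. This is an illustration under assumptions, not the vgic.c code: `gsi_to_gic_irq` and `TOY_SPI_BASE` are hypothetical names, and `nr_irqs` is taken to be the total number of interrupt IDs the virtual GIC was configured with.

```c
#include <assert.h>
#include <stdint.h>

/* IDs 0..31 are private to a CPU (SGIs and PPIs); SPIs start at 32. */
#define TOY_SPI_BASE 32u

/*
 * Map an irqfd gsi to a GIC interrupt ID. Returns -1 when the resulting
 * ID would fall outside the configured range.
 */
static int gsi_to_gic_irq(uint32_t gsi, uint32_t nr_irqs)
{
    assert(nr_irqs >= TOY_SPI_BASE);

    /* Compare before adding so a huge gsi cannot wrap around. */
    if (gsi >= nr_irqs - TOY_SPI_BASE)
        return -1;
    return (int)(gsi + TOY_SPI_BASE);
}
```

With 128 interrupt IDs, gsi values 0 to 95 are accepted and map to IDs 32 to 127. A PPI cannot be named at all, which is consistent with the rationale given for not supporting PPI injection.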
Re: [CFT PATCH 0/2] KVM: support XSAVES usage in the host
> On Nov 24, 2014, at 19:53, Paolo Bonzini wrote:
>
> On 24/11/2014 16:28, Nadav Amit wrote:
>>
>> Since kvm_load_guest_fpu is called before the guest_fpu is ever stored,
>> there are 2 more problems that currently cause #GP:
>> 1. XCOMP_BV[63] = 0
>> 2. XSTATE_BV sets a bit (including bit 63) that is not set in XCOMP_BV
>> (XCOMP_BV is initialised to zero).
>>
>> [see SDM 13.11 "OPERATION OF XRSTORS"]
>>
>> Once I initialise XCOMP_BV to (1ull << 63) | XSTATE_BV, the guest runs
>> successfully.
>> I have not checked any other qemu functionality that might be affected by
>> the patch.
>
> I posted patches that assume that QEMU calls KVM_SET_XSAVE early enough.
> If this is not the case, can you cook up and post a patch to
> kvm_arch_vcpu_init that fixes the remaining problem?

Sure. I will try to do so tomorrow.

Nadav
Re: [PATCH v14 3/7] KVM: x86: switch to kvm_get_dirty_log_protect
On 11/22/2014 11:19 AM, Christoffer Dall wrote: > On Thu, Nov 13, 2014 at 05:57:44PM -0800, Mario Smarduch wrote: >> From: Paolo Bonzini >> >> We now have a generic function that does most of the work of >> kvm_vm_ioctl_get_dirty_log, now use it. >> >> Signed-off-by: Mario Smarduch >> --- >> arch/x86/include/asm/kvm_host.h |3 -- >> arch/x86/kvm/Kconfig|1 + >> arch/x86/kvm/mmu.c |4 +-- >> arch/x86/kvm/x86.c | 64 >> ++- >> 4 files changed, 12 insertions(+), 60 deletions(-) >> >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h >> index 7c492ed..934dc24 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 >> accessed_mask, >> >> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); >> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); >> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, >> - struct kvm_memory_slot *slot, >> - gfn_t gfn_offset, unsigned long mask); >> void kvm_mmu_zap_all(struct kvm *kvm); >> void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm); >> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm); >> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig >> index f9d16ff..d073594 100644 >> --- a/arch/x86/kvm/Kconfig >> +++ b/arch/x86/kvm/Kconfig >> @@ -39,6 +39,7 @@ config KVM >> select PERF_EVENTS >> select HAVE_KVM_MSI >> select HAVE_KVM_CPU_RELAX_INTERCEPT >> +select KVM_GENERIC_DIRTYLOG_READ_PROTECT >> select KVM_VFIO >> ---help--- >>Support hosting fully virtualized guest machines using hardware >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c >> index 9314678..bf6b82c 100644 >> --- a/arch/x86/kvm/mmu.c >> +++ b/arch/x86/kvm/mmu.c >> @@ -1224,7 +1224,7 @@ static bool __rmap_write_protect(struct kvm *kvm, >> unsigned long *rmapp, >> } >> >> /** >> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages >> + * kvm_arch_mmu_write_protect_pt_masked - write 
protect selected PT level >> pages >> * @kvm: kvm instance >> * @slot: slot to protect >> * @gfn_offset: start of the BITS_PER_LONG pages we care about >> @@ -1233,7 +1233,7 @@ static bool __rmap_write_protect(struct kvm *kvm, >> unsigned long *rmapp, >> * Used when we do not need to care about huge page mappings: e.g. during >> dirty >> * logging we do not have any such mappings. >> */ >> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, >> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm, >> struct kvm_memory_slot *slot, >> gfn_t gfn_offset, unsigned long mask) >> { >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index 8f1e22d..9f8ae9a 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -3606,77 +3606,31 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm, >> * >> * 1. Take a snapshot of the bit and clear it if needed. >> * 2. Write protect the corresponding page. >> - * 3. Flush TLB's if needed. >> - * 4. Copy the snapshot to the userspace. >> + * 3. Copy the snapshot to the userspace. >> + * 4. Flush TLB's if needed. >> * >> - * Between 2 and 3, the guest may write to the page using the remaining TLB >> - * entry. This is not a problem because the page will be reported dirty at >> - * step 4 using the snapshot taken before and step 3 ensures that successive >> - * writes will be logged for the next call. >> + * Between 2 and 4, the guest may write to the page using the remaining TLB >> + * entry. This is not a problem because the page is reported dirty using >> + * the snapshot taken before and step 4 ensures that writes done after >> + * exiting to userspace will be logged for the next call. >> */ > > this seems to duplicate the comment in virt/kvm/kvm_main.c, but > whatever. Reuses most of that text but differs slightly, the _protect version is a subset of this one. > > FWIW: > Acked-by: Christoffer Dall > Thanks. 
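Step 1 of the ordering in the quoted comment ("take a snapshot of the bit and clear it if needed") is what keeps concurrent VCPU writes from being lost. A minimal user-space sketch of that step, using a C11 atomic exchange as a stand-in for whatever primitive the kernel uses; `snapshot_and_clear` is a hypothetical name and this does not reproduce kvm_get_dirty_log_protect():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * For each word of the dirty bitmap, atomically fetch the current value
 * and replace it with zero. A bit set by another thread either lands in
 * this snapshot or stays in the bitmap for the next call; it is never
 * dropped. Returns 1 if any bit was set.
 */
static int snapshot_and_clear(atomic_ulong *dirty, unsigned long *snapshot,
                              size_t nwords)
{
    int any_dirty = 0;

    assert(dirty != NULL && snapshot != NULL);
    for (size_t i = 0; i < nwords; i++) {
        if (atomic_load(&dirty[i]) == 0) {
            snapshot[i] = 0;    /* nothing to clear in this word */
            continue;
        }
        snapshot[i] = atomic_exchange(&dirty[i], 0);
        if (snapshot[i])
            any_dirty = 1;
        /* Step 2 would write-protect the pages named by snapshot[i]. */
    }
    return any_dirty;
}
```

The return value plays the role of the `is_dirty` flag that decides whether the final TLB flush is needed.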
Re: [PATCH v14 6/7] KVM: arm: dirty logging write protect support
On 11/22/2014 11:40 AM, Christoffer Dall wrote: > On Thu, Nov 13, 2014 at 05:57:47PM -0800, Mario Smarduch wrote: >> Add support to track dirty pages between user space KVM_GET_DIRTY_LOG ioctl >> calls. We call kvm_get_dirty_log_protect() function to do most of the work. >> >> Reviewed-by: Marc Zyngier >> Signed-off-by: Mario Smarduch >> --- >> arch/arm/kvm/Kconfig |1 + >> arch/arm/kvm/arm.c | 46 ++ >> arch/arm/kvm/mmu.c | 22 ++ >> 3 files changed, 69 insertions(+) >> >> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig >> index f27f336..a8d1ace 100644 >> --- a/arch/arm/kvm/Kconfig >> +++ b/arch/arm/kvm/Kconfig >> @@ -24,6 +24,7 @@ config KVM >> select HAVE_KVM_ARCH_TLB_FLUSH_ALL >> select KVM_MMIO >> select KVM_ARM_HOST >> +select KVM_GENERIC_DIRTYLOG_READ_PROTECT >> depends on ARM_VIRT_EXT && ARM_LPAE >> ---help--- >>Support hosting virtualized guest machines. You will also >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c >> index a99e0cd..040c0f3 100644 >> --- a/arch/arm/kvm/arm.c >> +++ b/arch/arm/kvm/arm.c >> @@ -737,9 +737,55 @@ long kvm_arch_vcpu_ioctl(struct file *filp, >> } >> } >> >> +/** >> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a >> slot >> + * @kvm:kvm instance >> + * @log:slot id and address to which we copy the log >> + * >> + * We need to keep it in mind that VCPU threads can write to the bitmap >> + * concurrently. So, to avoid losing data, we keep the following order for >> + * each bit: >> + * >> + * 1. Take a snapshot of the bit and clear it if needed. >> + * 2. Write protect the corresponding page. >> + * 3. Copy the snapshot to the userspace. >> + * 4. Flush TLB's if needed. >> + * >> + * Steps 1,2,3 are handled by kvm_get_dirty_log_protect(). >> + * Between 2 and 4, the guest may write to the page using the remaining TLB >> + * entry. 
This is not a problem because the page is reported dirty using >> + * the snapshot taken before and step 4 ensures that writes done after >> + * exiting to userspace will be logged for the next call. >> + */ >> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) >> { >> +#ifdef CONFIG_ARM >> +int r; >> +bool is_dirty = false; >> + >> +mutex_lock(&kvm->slots_lock); >> + >> +r = kvm_get_dirty_log_protect(kvm, log, &is_dirty); >> +if (r) >> +goto out; >> + >> +/* >> + * kvm_get_dirty_log_protect() may fail and we may skip TLB flush >> + * leaving few stale spte TLB entries which is harmless, given we're >> + * just write protecting spte's, so few stale TLB's will be left in >> + * original R/W state. And since the bitmap is corrupt userspace will >> + * error out anyway (i.e. during migration or dirty page loging for > > s/loging/logging/ > > Hmmm, where is this behavior specified in the ABI? If you call > KVM_GET_DIRTY_LOG subsequently, you will now potentially have unreported > dirty pages, which can be completely avoided by removing the > if-statement and the goto above. Why not simply do that and get rid of > this comment? Yeah that makes sense, the comment is an overkill for these few lines. > >> + * other reasons) terminating dirty page logging. 
>> + */ >> +if (is_dirty) >> +kvm_flush_remote_tlbs(kvm); >> +out: >> +mutex_unlock(&kvm->slots_lock); >> + >> +return r; >> +#else /* ARM64 */ >> return -EINVAL; >> +#endif >> } >> >> static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm, >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 1e8b6a9..8137455 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -870,6 +870,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot) >> spin_unlock(&kvm->mmu_lock); >> kvm_flush_remote_tlbs(kvm); >> } >> + >> +/** >> + * kvm_arch_mmu_write_protect_pt_masked() - write protect dirty pages >> + * @kvm:The KVM pointer >> + * @slot: The memory slot associated with mask >> + * @gfn_offset: The gfn offset in memory slot >> + * @mask: The mask of dirty pages at offset 'gfn_offset' in this memory >> + * slot to be write protected >> + * >> + * Walks bits set in mask write protects the associated pte's. Caller must >> + * acquire kvm_mmu_lock. >> + */ >> +void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm, >> +struct kvm_memory_slot *slot, >> +gfn_t gfn_offset, unsigned long mask) >> +{ >> +phys_addr_t base_gfn = slot->base_gfn + gfn_offset; >> +phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT; >> +phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT; >> + >> +stage2_wp_range(kvm, start, end); >> +} >> #endif >> >> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >> -- >> 1.7.9.5 >> -- To unsubscribe from this list: send the line "unsubscribe kvm"
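The address arithmetic in kvm_arch_mmu_write_protect_pt_masked() turns a dirty mask into one contiguous range bounded by the lowest and highest set bits. A self-contained sketch of that computation, with portable loops in place of `__ffs()`/`__fls()` and an assumed 4 KiB page size; `dirty_mask_to_range` and the other names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Index of the lowest set bit; mask must be non-zero. */
static unsigned lowest_set_bit(unsigned long mask)
{
    unsigned i = 0;

    while (!(mask & 1UL)) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Index of the highest set bit; mask must be non-zero. */
static unsigned highest_set_bit(unsigned long mask)
{
    unsigned i = 0;

    while (mask >>= 1)
        i++;
    return i;
}

/* Compute the [start, end) address range covering all set bits. */
static void dirty_mask_to_range(uint64_t base_gfn, unsigned long mask,
                                uint64_t *start, uint64_t *end)
{
    assert(mask != 0);
    *start = (base_gfn + lowest_set_bit(mask)) << TOY_PAGE_SHIFT;
    *end = (base_gfn + highest_set_bit(mask) + 1) << TOY_PAGE_SHIFT;
}
```

Because the range is contiguous, pages whose bits are clear but lie between the lowest and highest set bits fall inside it as well.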
Re: [PATCH v14 5/7] KVM: arm: Add initial dirty page locking support
On 11/22/2014 11:33 AM, Christoffer Dall wrote: > On Thu, Nov 13, 2014 at 05:57:46PM -0800, Mario Smarduch wrote: >> Add support for initial write protection of VM memslots. This patch >> series assumes that huge PUDs will not be used in 2nd stage tables, which is >> always valid on ARMv7. >> >> Signed-off-by: Mario Smarduch >> --- >> arch/arm/include/asm/kvm_host.h |2 + >> arch/arm/include/asm/kvm_mmu.h| 20 + >> arch/arm/include/asm/pgtable-3level.h |1 + >> arch/arm/kvm/mmu.c| 138 >> + >> 4 files changed, 161 insertions(+) >> >> diff --git a/arch/arm/include/asm/kvm_host.h >> b/arch/arm/include/asm/kvm_host.h >> index 3da6ea7..8fa6238 100644 >> --- a/arch/arm/include/asm/kvm_host.h >> +++ b/arch/arm/include/asm/kvm_host.h >> @@ -245,4 +245,6 @@ static inline void vgic_arch_setup(const struct >> vgic_params *vgic) >> int kvm_perf_init(void); >> int kvm_perf_teardown(void); >> >> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot); >> + >> #endif /* __ARM_KVM_HOST_H__ */ >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h >> index 5cc0b0f..08ab5e8 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd) >> pmd_val(*pmd) |= L_PMD_S2_RDWR; >> } >> >> +static inline void kvm_set_s2pte_readonly(pte_t *pte) >> +{ >> +pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY; >> +} >> + >> +static inline bool kvm_s2pte_readonly(pte_t *pte) >> +{ >> +return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY; >> +} >> + >> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd) >> +{ >> +pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY; >> +} >> + >> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd) >> +{ >> +return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY; >> +} >> + >> /* Open coded p*d_addr_end that can deal with 64bit addresses */ >> #define kvm_pgd_addr_end(addr, end) \ >> ({ u64 __boundary = 
((addr) + PGDIR_SIZE) & PGDIR_MASK;\ >> diff --git a/arch/arm/include/asm/pgtable-3level.h >> b/arch/arm/include/asm/pgtable-3level.h >> index 06e0bc0..d29c880 100644 >> --- a/arch/arm/include/asm/pgtable-3level.h >> +++ b/arch/arm/include/asm/pgtable-3level.h >> @@ -130,6 +130,7 @@ >> #define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6) /* >> HAP[1] */ >> #define L_PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* >> HAP[2:1] */ >> >> +#define L_PMD_S2_RDONLY (_AT(pmdval_t, 1) << 6) /* >> HAP[1] */ >> #define L_PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* >> HAP[2:1] */ >> >> /* >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 16e7994..1e8b6a9 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -45,6 +45,7 @@ static phys_addr_t hyp_idmap_vector; >> #define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t)) >> >> #define kvm_pmd_huge(_x)(pmd_huge(_x) || pmd_trans_huge(_x)) >> +#define kvm_pud_huge(_x)pud_huge(_x) >> >> static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa) >> { >> @@ -746,6 +747,131 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, >> phys_addr_t *ipap) >> return false; >> } >> >> +#ifdef CONFIG_ARM >> +/** >> + * stage2_wp_ptes - write protect PMD range >> + * @pmd:pointer to pmd entry >> + * @addr: range start address >> + * @end:range end address >> + */ >> +static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end) >> +{ >> +pte_t *pte; >> + >> +pte = pte_offset_kernel(pmd, addr); >> +do { >> +if (!pte_none(*pte)) { >> +if (!kvm_s2pte_readonly(pte)) >> +kvm_set_s2pte_readonly(pte); >> +} > > incorrect indentation of the closing brace got it. 
> >> +} while (pte++, addr += PAGE_SIZE, addr != end); >> +} >> + >> +/** >> + * stage2_wp_pmds - write protect PUD range >> + * @pud:pointer to pud entry >> + * @addr: range start address >> + * @end:range end address >> + */ >> +static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end) >> +{ >> +pmd_t *pmd; >> +phys_addr_t next; >> + >> +pmd = pmd_offset(pud, addr); >> + >> +do { >> +next = kvm_pmd_addr_end(addr, end); >> +if (!pmd_none(*pmd)) { >> +if (kvm_pmd_huge(*pmd)) { >> +if (!kvm_s2pmd_readonly(pmd)) >> +kvm_set_s2pmd_readonly(pmd); >> +} else { >> +stage2_wp_ptes(pmd, addr, next); >> +} >> +} >> +} while (pmd++, addr = next, addr != end); >> +} >> + >> +/** >> + * stage2_wp_puds - write protect PGD ran
[PULL v2 02/15] pc: kvm: check if KVM has free memory slots to avoid abort()
From: Igor Mammedov

When more memory devices are used than available
KVM memory slots, QEMU crashes with:

kvm_alloc_slot: no free slot available
Aborted (core dumped)

Fix this by checking that KVM has a free slot
before attempting to map memory in guest address space.

Signed-off-by: Igor Mammedov
Acked-by: Paolo Bonzini
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 include/sysemu/kvm.h |  1 +
 hw/i386/pc.c         |  5 +
 kvm-all.c            | 18 +-
 kvm-stub.c           |  5 +
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index b0cd657..22e42ef 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -163,6 +163,7 @@ extern KVMState *kvm_state;
 
 /* external API */
 
+bool kvm_has_free_slot(MachineState *ms);
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1205db8..ce7b752 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1598,6 +1598,11 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
         goto out;
     }
 
+    if (kvm_enabled() && !kvm_has_free_slot(machine)) {
+        error_setg(&local_err, "hypervisor has no free memory slots left");
+        goto out;
+    }
+
     memory_region_add_subregion(&pcms->hotplug_memory,
                                 addr - pcms->hotplug_memory_base, mr);
     vmstate_register_ram(mr, dev);
diff --git a/kvm-all.c b/kvm-all.c
index 596e7ce..937bc9d 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -132,7 +132,7 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
     KVM_CAP_LAST_INFO
 };
 
-static KVMSlot *kvm_alloc_slot(KVMState *s)
+static KVMSlot *kvm_get_free_slot(KVMState *s)
 {
     int i;
 
@@ -142,6 +142,22 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
         }
     }
 
+    return NULL;
+}
+
+bool kvm_has_free_slot(MachineState *ms)
+{
+    return kvm_get_free_slot(KVM_STATE(ms->accelerator));
+}
+
+static KVMSlot *kvm_alloc_slot(KVMState *s)
+{
+    KVMSlot *slot = kvm_get_free_slot(s);
+
+    if (slot) {
+        return slot;
+    }
+
     fprintf(stderr, "%s: no free slot available\n", __func__);
     abort();
 }
diff --git a/kvm-stub.c b/kvm-stub.c
index 43fc0dd..7ba90c5 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -147,4 +147,9 @@ int kvm_irqchip_remove_irqfd_notifier(KVMState *s, EventNotifier *n, int virq)
 {
     return -ENOSYS;
 }
+
+bool kvm_has_free_slot(MachineState *ms)
+{
+    return false;
+}
 #endif
--
MST
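The refactoring in this patch separates a lookup that can fail quietly from an allocator that aborts, so that a caller can check for a free slot first and report an error instead of crashing. A stand-alone sketch of that shape; all `toy_*` names are hypothetical, and treating `memory_size == 0` as "free" is an assumption about how slots are marked:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SLOTS 4

struct toy_slot {
    size_t memory_size; /* assumed: 0 means the slot is unused */
};

struct toy_state {
    struct toy_slot slots[TOY_NR_SLOTS];
};

/* Lookup only: returns NULL instead of aborting when nothing is free. */
static struct toy_slot *toy_get_free_slot(struct toy_state *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < TOY_NR_SLOTS; i++) {
        if (s->slots[i].memory_size == 0)
            return &s->slots[i];
    }
    return NULL;
}

/* Cheap predicate a hotplug path can call before committing. */
static bool toy_has_free_slot(struct toy_state *s)
{
    return toy_get_free_slot(s) != NULL;
}

/* Caller-side check: fail with an error code rather than crash. */
static int toy_plug_memory(struct toy_state *s, size_t size)
{
    struct toy_slot *slot;

    if (size == 0 || !toy_has_free_slot(s))
        return -1;
    slot = toy_get_free_slot(s);
    slot->memory_size = size;
    return 0;
}
```

The predicate only helps if nothing else takes the last slot between the check and the actual mapping, so the aborting allocator stays in place as a last line of defence.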
[PULL v2 05/15] memory: expose alignment used for allocating RAM as MemoryRegion API
From: Igor Mammedov introduce memory_region_get_alignment() that returns underlying memory block alignment or 0 if it's not relevant/implemented for backend. Signed-off-by: Igor Mammedov Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- include/exec/exec-all.h | 2 +- include/exec/memory.h | 2 ++ include/qemu/osdep.h| 3 ++- exec.c | 9 ++--- memory.c| 5 + target-s390x/kvm.c | 2 +- util/oslib-posix.c | 5 - util/oslib-win32.c | 2 +- 8 files changed, 22 insertions(+), 8 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 421a142..0844885 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -333,7 +333,7 @@ extern uintptr_t tci_tb_ptr; #if !defined(CONFIG_USER_ONLY) -void phys_mem_set_alloc(void *(*alloc)(size_t)); +void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align)); struct MemoryRegion *iotlb_to_region(AddressSpace *as, hwaddr index); bool io_mem_read(struct MemoryRegion *mr, hwaddr addr, diff --git a/include/exec/memory.h b/include/exec/memory.h index 74a58b4..f64ab5e 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -146,6 +146,7 @@ struct MemoryRegion { hwaddr addr; void (*destructor)(MemoryRegion *mr); ram_addr_t ram_addr; +uint64_t align; bool subpage; bool terminates; bool romd_mode; @@ -838,6 +839,7 @@ void memory_region_add_subregion_overlap(MemoryRegion *mr, */ ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr); +uint64_t memory_region_get_alignment(const MemoryRegion *mr); /** * memory_region_del_subregion: Remove a subregion. 
* diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index c032434..b3300cc 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #ifdef __OpenBSD__ #include @@ -103,7 +104,7 @@ typedef signed int int_fast16_t; int qemu_daemon(int nochdir, int noclose); void *qemu_try_memalign(size_t alignment, size_t size); void *qemu_memalign(size_t alignment, size_t size); -void *qemu_anon_ram_alloc(size_t size); +void *qemu_anon_ram_alloc(size_t size, uint64_t *align); void qemu_vfree(void *ptr); void qemu_anon_ram_free(void *ptr, size_t size); diff --git a/exec.c b/exec.c index f0e2bd3..71ac104 100644 --- a/exec.c +++ b/exec.c @@ -909,14 +909,15 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end, uint16_t section); static subpage_t *subpage_init(AddressSpace *as, hwaddr base); -static void *(*phys_mem_alloc)(size_t size) = qemu_anon_ram_alloc; +static void *(*phys_mem_alloc)(size_t size, uint64_t *align) = + qemu_anon_ram_alloc; /* * Set a custom physical guest memory alloator. * Accelerators with unusual needs may need this. Hopefully, we can * get rid of it eventually. 
  */
-void phys_mem_set_alloc(void *(*alloc)(size_t))
+void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align))
 {
     phys_mem_alloc = alloc;
 }
@@ -1098,6 +1099,7 @@ static void *file_ram_alloc(RAMBlock *block,
         error_propagate(errp, local_err);
         goto error;
     }
+    block->mr->align = hpagesize;
 
     if (memory < hpagesize) {
         error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
@@ -1309,7 +1311,8 @@ static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp)
         if (xen_enabled()) {
             xen_ram_alloc(new_block->offset, new_block->length, new_block->mr);
         } else {
-            new_block->host = phys_mem_alloc(new_block->length);
+            new_block->host = phys_mem_alloc(new_block->length,
+                                             &new_block->mr->align);
             if (!new_block->host) {
                 error_setg_errno(errp, errno,
                                  "cannot set up guest memory '%s'",
diff --git a/memory.c b/memory.c
index 0f4fdc7..15cf9eb 100644
--- a/memory.c
+++ b/memory.c
@@ -1749,6 +1749,11 @@ ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr)
     return mr->ram_addr;
 }
 
+uint64_t memory_region_get_alignment(const MemoryRegion *mr)
+{
+    return mr->align;
+}
+
 static int cmp_flatrange_addr(const void *addr_, const void *fr_)
 {
     const AddrRange *addr = addr_;
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index d247471..50709ba 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -404,7 +404,7 @@ int kvm_arch_get_registers(CPUState *cs)
  * to grow. We also have to use MAP parameters that avoid
  * read-only mapping of guest pages.
  */
-static void *legacy_s390_alloc(size_t size)
+static void *legacy_s390_alloc(size_t size, uint64_t *align)
 {
     void *mem;
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 8c9d80e..16fcec2 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -124,7 +124,7 @@ void *qemu_memalign(size_t ali
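For readers following the API change: the allocator callback now takes an out-parameter through which it reports the alignment it used, and ram_block_add() stores that value in the MemoryRegion so that memory_region_get_alignment() can return it. The following is a minimal userspace sketch of that calling convention, not QEMU code. DemoRegion, demo_alloc, demo_region_init, demo_region_get_alignment and the 2 MiB figure are all made up for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for MemoryRegion: only the field the patch adds. */
typedef struct DemoRegion {
    void *host;
    uint64_t align;   /* 0 means "not relevant/implemented for this backend" */
} DemoRegion;

/* Same shape as the new allocator callback: size in, alignment out. */
typedef void *(*demo_alloc_fn)(size_t size, uint64_t *align);

#define DEMO_ALIGN (2u * 1024 * 1024)   /* assumed 2 MiB, illustration only */

/* Allocate aligned memory and report the alignment that was used. */
static void *demo_alloc(size_t size, uint64_t *align)
{
    void *ptr = NULL;

    if (posix_memalign(&ptr, DEMO_ALIGN, size) != 0) {
        return NULL;
    }
    if (align) {
        *align = DEMO_ALIGN;
    }
    return ptr;
}

/* Mirrors the ram_block_add() call site: the callback fills in mr->align. */
static int demo_region_init(DemoRegion *mr, size_t size, demo_alloc_fn alloc)
{
    mr->align = 0;
    mr->host = alloc(size, &mr->align);
    return mr->host ? 0 : -1;
}

/* Mirrors the new accessor: just return the recorded alignment. */
static uint64_t demo_region_get_alignment(const DemoRegion *mr)
{
    return mr->align;
}
```

A backend that has no meaningful alignment would simply leave *align untouched, so the accessor keeps returning 0, which matches the "0 if it's not relevant/implemented" wording of the commit message.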
Re: [RESEND PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling
On 11/22/2014 11:53 AM, Christoffer Dall wrote: > On Fri, Nov 14, 2014 at 01:54:44PM -0800, Mario Smarduch wrote: >> This patch adds support for handling 2nd stage page faults during migration, >> it disables faulting in huge pages, and dissolves huge pages to page tables. >> In case migration is canceled huge pages are used again. >> >> Resending to addresse Marc's comments to simplify stage2_set_pte() handling >> of logging, and mapping device memory - flags. >> >> Reviewed-by: Marc Zyngier >> Reviewed-by: Christoffer Dall >> Signed-off-by: Mario Smarduch >> --- >> arch/arm/kvm/mmu.c | 61 >> +--- >> 1 file changed, 53 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 8137455..d29de77 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -47,6 +47,18 @@ static phys_addr_t hyp_idmap_vector; >> #define kvm_pmd_huge(_x)(pmd_huge(_x) || pmd_trans_huge(_x)) >> #define kvm_pud_huge(_x)pud_huge(_x) >> >> +#define KVM_S2PTE_FLAG_IS_IOMAP (1UL << 0) >> +#define KVM_S2PTE_FLAG_LOGGING_ACTIVE (1UL << 1) >> + >> +static bool kvm_get_logging_state(struct kvm_memory_slot *memslot) >> +{ >> +#ifdef CONFIG_ARM >> +return !!memslot->dirty_bitmap; >> +#else >> +return false; >> +#endif >> +} >> + >> static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa) >> { >> /* >> @@ -626,10 +638,13 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct >> kvm_mmu_memory_cache >> } >> >> static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache >> *cache, >> - phys_addr_t addr, const pte_t *new_pte, bool iomap) >> + phys_addr_t addr, const pte_t *new_pte, >> + unsigned long flags) >> { >> pmd_t *pmd; >> pte_t *pte, old_pte; >> +unsigned long iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP; >> +unsigned long logging_active = flags & KVM_S2PTE_FLAG_LOGGING_ACTIVE; >> >> /* Create stage-2 page table mapping - Level 1 */ >> pmd = stage2_get_pmd(kvm, cache, addr); >> @@ -641,6 +656,18 @@ static int 
stage2_set_pte(struct kvm *kvm, struct
>> kvm_mmu_memory_cache *cache,
>>                 return 0;
>>         }
>>
>> +        /*
>> +         * While dirty memory logging, clear PMD entry for huge page and split
>> +         * into smaller pages, to track dirty memory at page granularity.
>> +         */
>> +        if (logging_active && kvm_pmd_huge(*pmd)) {
>> +                phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
>
> just noticed this: this is not an IPA is it? pmd_pfn should give us the
> host pfn. I think you need to manipulate @addr instead.
>
> Did I manage to confuse myself?

No, you're right, that is a *bad mistake* on my part. I broke it between
v8 and v9, probably with an absent-minded cut and paste.

Also, when the pmd is cleared, should that be flushed to the level where
the pmd is visible to page table walks? Or am I confusing something here?

Thanks.

>
> (Yeah, I know I said I reviewed this one already)
>
>> +
>> +                pmd_clear(pmd);
>> +                kvm_tlb_flush_vmid_ipa(kvm, ipa);
>> +                put_page(virt_to_page(pmd));
>> +        }
>> +
>>         /* Create stage-2 page mappings - Level 2 */
>>         if (pmd_none(*pmd)) {
>>                 if (!cache)
>> @@ -693,7 +720,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t
>> guest_ipa,
>>                 if (ret)
>>                         goto out;
>>                 spin_lock(&kvm->mmu_lock);
>> -               ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
>> +               ret = stage2_set_pte(kvm, &cache, addr, &pte,
>> +                                    KVM_S2PTE_FLAG_IS_IOMAP);
>>                 spin_unlock(&kvm->mmu_lock);
>>                 if (ret)
>>                         goto out;
>> @@ -908,6 +936,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>> phys_addr_t fault_ipa,
>>         struct vm_area_struct *vma;
>>         pfn_t pfn;
>>         pgprot_t mem_type = PAGE_S2;
>> +       unsigned long logging_active = 0;
>> +
>> +       if (kvm_get_logging_state(memslot))
>> +               logging_active = KVM_S2PTE_FLAG_LOGGING_ACTIVE;
>>
>>         write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>         if (fault_status == FSC_PERM && !write_fault) {
>> @@ -918,7 +950,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>> phys_addr_t fault_ipa,
>>         /* Let's check if we will get back a huge page backed by hugetlbfs */
>>
down_read(&current->mm->mmap_sem);
>>         vma = find_vma_intersection(current->mm, hva, hva + 1);
>> -       if (is_vm_hugetlb_page(vma)) {
>> +       if (is_vm_hugetlb_page(vma) && !logging_active) {
>>                 hugetlb = true;
>>                 gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>         } else {
>> @@ -964,7 +996,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>> phys_addr_t fault_ipa,
>>         spin_lock(&kvm->mmu_lock);
>>         if (mmu_notifier_retry(kvm, mmu_seq))
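To make the review point in this message concrete: pmd_pfn(*pmd) << PAGE_SHIFT produces an address derived from the host pfn, while the TLB flush by IPA wants a guest address, which stage2_set_pte() already receives as @addr. The sketch below only shows the arithmetic, assuming 4 KiB pages and 2 MiB PMD blocks. The DEMO_ constants and function names are local to this example and are not kernel macros, and whether the next revision flushes @addr itself or the block-aligned base is left open in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                      /* assumed 4 KiB pages */
#define DEMO_PMD_SHIFT  21                      /* assumed 2 MiB PMD blocks */
#define DEMO_PMD_SIZE   (UINT64_C(1) << DEMO_PMD_SHIFT)
#define DEMO_PMD_MASK   (~(DEMO_PMD_SIZE - 1))

/* Base IPA of the huge-page block that contains the faulting IPA. */
static uint64_t demo_block_base_ipa(uint64_t fault_ipa)
{
    return fault_ipa & DEMO_PMD_MASK;
}

/* What the reviewed hunk computed: an address built from the host pfn. */
static uint64_t demo_host_pa_from_pfn(uint64_t host_pfn)
{
    return host_pfn << DEMO_PAGE_SHIFT;
}
```

The two results live in different address spaces, so a flush keyed on the second value would in general target an unrelated IPA.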
KVM call for agenda for 2014-11-24
Hi

Please, send any topic that you are interested in covering.

Thanks, Juan.

Call details:

15:00 CEST
13:00 UTC
09:00 EDT

Every two weeks

By popular demand, a google calendar public entry with it

https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

(Let me know if you have any problems with the calendar entry)

If you need phone number details, contact me privately

Thanks, Juan.
Re: [PATCH v3 26/41] vhost: virtio 1.0 endian-ness support
On Mon, Nov 24, 2014 at 03:28:26PM +0100, Cedric Le Goater wrote: > Hi Michael, > > Do you have a tree from where I could pull these patches ? > > Thanks, > > C. Yes - vhost-next that linux-next includes. -- MST
Re: [PATCH v3 6/8] KVM: kvm-vfio: wrapper to VFIO external API device helpers
On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote: > Provide wrapper functions that allow KVM-VFIO device code to > interact with a vfio device: > - kvm_vfio_device_get_external_user gets a handle to a struct > vfio_device from the vfio device file descriptor and increments > its reference counter, > - kvm_vfio_device_put_external_user decrements the reference counter > to a vfio device, > - kvm_vfio_external_base_device returns a handle to the struct device > of the vfio device. > > The KVM-VFIO device uses the VFIO external API device functions. > > Signed-off-by: Eric Auger > > --- > > v2 -> v3: > reword the commit message and title > > v1 -> v2: > - kvm_vfio_external_get_base_device renamed into > kvm_vfio_external_base_device > - kvm_vfio_external_get_type removed > --- > arch/arm/include/asm/kvm_host.h | 5 + > virt/kvm/vfio.c | 45 > + > 2 files changed, 50 insertions(+) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 53036e2..bca5b79 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -169,6 +169,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long > hva, pte_t pte); > unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu); > int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices); > > +struct vfio_device; > +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep); > +void kvm_vfio_device_put_external_user(struct vfio_device *vdev); > +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev); > + This doesn't look right, why doesn't kvm-vfio send the struct dev to the arch callback? Then the below functions can be static. Nothing outside of kvm-vfio device should need to do anything with vfio_device references. 
> /* We do not have shadow page tables, hence the empty hooks */ > static inline int kvm_age_hva(struct kvm *kvm, unsigned long start, > unsigned long end) > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > index 620e37f..6f0cc34 100644 > --- a/virt/kvm/vfio.c > +++ b/virt/kvm/vfio.c > @@ -60,6 +60,51 @@ static void kvm_vfio_group_put_external_user(struct > vfio_group *vfio_group) > symbol_put(vfio_group_put_external_user); > } > > +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep) > +{ > + struct vfio_device *vdev; > + struct vfio_device *(*fn)(struct file *); > + > + fn = symbol_get(vfio_device_get_external_user); > + if (!fn) > + return ERR_PTR(-EINVAL); > + > + vdev = fn(filep); > + > + symbol_put(vfio_device_get_external_user); > + > + return vdev; > +} > + > +void kvm_vfio_device_put_external_user(struct vfio_device *vdev) > +{ > + void (*fn)(struct vfio_device *); > + > + fn = symbol_get(vfio_device_put_external_user); > + if (!fn) > + return; > + > + fn(vdev); > + > + symbol_put(vfio_device_put_external_user); > +} > + > +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev) > +{ > + struct device *(*fn)(struct vfio_device *); > + struct device *dev; > + > + fn = symbol_get(vfio_external_base_device); > + if (!fn) > + return NULL; > + > + dev = fn(vdev); > + > + symbol_put(vfio_external_base_device); > + > + return dev; > +} > + > static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group) > { > long (*fn)(struct vfio_group *, unsigned long); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 8/8] KVM: arm: kvm-vfio: forwarding control
On Sun, 2014-11-23 at 19:36 +0100, Eric Auger wrote: > This patch sets __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and implements > kvm_arch_vfio_set_forward for ARM. > > As a result the KVM-VFIO device now allows to forward/unforward a > VFIO device IRQ on ARM. > > kvm_arch_vfio_set_forward programs both genirq and the VGIC to control > where the physical IRQ deactivation is initiated. > - forwarded case: deactivation is initiated by the guest; when it > completes the virtual IRQ, the GIC automatically deactivates the > physical IRQ. > - not forwarded case: the physical IRQ deactivation is handled by the host > > Signed-off-by: Eric Auger > > --- > > v2 -> v3: > - renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward > - takes a bool arg instead of kvm_fwd_irq_action enum > - removal of KVM_VFIO_IRQ_CLEANUP > - platform device check now happens here > - more precise errors returned > - irq_eoi handled externally to this patch (VGIC) > - correct enable_irq bug done twice > - reword the commit message > - correct check of platform_bus_type > - use raw_spin_lock_irqsave and check the validity of the handler > --- > arch/arm/include/asm/kvm_host.h | 2 + > arch/arm/kvm/Makefile | 2 +- > arch/arm/kvm/kvm_vfio_arm.c | 101 > > 3 files changed, 104 insertions(+), 1 deletion(-) > create mode 100644 arch/arm/kvm/kvm_vfio_arm.c > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index bca5b79..447f90c 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -27,6 +27,8 @@ > #include > #include > > +#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > + > #if defined(CONFIG_KVM_ARM_MAX_VCPUS) > #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS > #else > diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile > index ea1fa76..26a5a42 100644 > --- a/arch/arm/kvm/Makefile > +++ b/arch/arm/kvm/Makefile > @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o > $(KVM)/eventfd.o $(KVM)/vf > > obj-y += 
kvm-arm.o init.o interrupts.o > obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o > -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o > +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o > kvm_vfio_arm.o > obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o > obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o > obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o > diff --git a/arch/arm/kvm/kvm_vfio_arm.c b/arch/arm/kvm/kvm_vfio_arm.c > new file mode 100644 > index 000..af2c501 > --- /dev/null > +++ b/arch/arm/kvm/kvm_vfio_arm.c > @@ -0,0 +1,101 @@ > +/* > + * Copyright (C) 2014 Linaro Ltd. > + * Authors: Eric Auger > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/** > + * kvm_arch_vfio_set_forward - Change the forward state of an IRQ > + * > + * @fwd_irq: handle to the forward irq struct > + * @forward: target forwarding state > + * > + * If forward is true, programs genirq and VGIC so that physical IRQ > + * deactivation ownership is transferred to the guest (using GIC HW feature). > + * When forward is false, standard behavior is restored, ie. host > + * deactivates the physical IRQ. 
> + * returns: > + * -EINVAL if the vfio device is not a platform device > + * -ENOENT if the irq could not be identified > + * -EBUSY if physical IRQ is in progress > + * -ENOENT if the VGIC has a physical/virtual IRQ mapping that is not > + * consistent with the request. > + */ > +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq, > + bool forward) > +{ > + int hwirq; > + int ret = -EBUSY; > + struct irq_desc *desc; > + struct irq_data *d; > + struct platform_device *platdev; > + struct device *dev = kvm_vfio_external_base_device(fwd_irq->vdev); Just pass the dev > + unsigned long flags; > + /* > + * We don't have to garantee the vcpu handle is non void since the s/garantee/guarantee/ > + * vfio device holds a reference to the kvm struct > + */ > + struct kvm_vcpu *vcpu = kvm_get_vcpu(fwd_irq->kvm, 0); > + > + if (dev->bus == &platform_bus_type) { > + platdev = to_platform_device(dev); > + hwirq = platform_get_irq(platdev, fwd_irq->index); > + if (hwirq < 0) > + return -EINVAL; > +
Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control
On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote: > This patch introduces a new KVM_DEV_VFIO_DEVICE group. > > This is a new control channel which enables KVM to cooperate with > viable VFIO devices. > > Functions are introduced to check the validity of a VFIO device > file descriptor, increment/decrement the ref counter of the VFIO > device. > > The patch introduces 2 attributes for this new device group: > KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. > Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and > unset respectively unset the feature. > > The VFIO device stores a list of registered forwarded IRQs. The reference > counter of the device is incremented each time a new IRQ is forwarded. > Reference counter is decremented when the IRQ forwarding is unset. > > The forwarding programmming is architecture specific, implemented in > kvm_arch_set_fwd_state function. Architecture specific implementation is > enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those > functions are void. > > Signed-off-by: Eric Auger > > --- > > v2 -> v3: > - add API comments in kvm_host.h > - improve the commit message > - create a private kvm_vfio_fwd_irq struct > - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This > latter action will be handled in vgic. > - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is > to move platform specific stuff in architecture specific code. > - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward > - increment the ref counter each time we do an IRQ forwarding and decrement > this latter each time one IRQ forward is unset. Simplifies the whole > ref counting. 
> - simplification of list handling: create, search, removal > > v1 -> v2: > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > - original patch file separated into 2 parts: generic part moved in vfio.c > and ARM specific part(kvm_arch_set_fwd_state) > --- > include/linux/kvm_host.h | 28 ++ > virt/kvm/vfio.c | 249 > ++- > 2 files changed, 274 insertions(+), 3 deletions(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index ea53b04..0b9659d 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1076,6 +1076,15 @@ struct kvm_device_ops { > unsigned long arg); > }; > > +/* internal self-contained structure describing a forwarded IRQ */ > +struct kvm_fwd_irq { > + struct kvm *kvm; /* VM to inject the GSI into */ > + struct vfio_device *vdev; /* vfio device the IRQ belongs to */ > + __u32 index; /* VFIO device IRQ index */ > + __u32 subindex; /* VFIO device IRQ subindex */ > + __u32 gsi; /* gsi, ie. virtual IRQ number */ > +}; > + > void kvm_device_get(struct kvm_device *dev); > void kvm_device_put(struct kvm_device *dev); > struct kvm_device *kvm_device_from_filp(struct file *filp); > @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type); > extern struct kvm_device_ops kvm_mpic_ops; > extern struct kvm_device_ops kvm_xics_ops; > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD > +/** > + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ > + * > + * @fwd_irq: handle to the forwarded irq struct > + * @forward: true means forwarded, false means not forwarded > + * returns 0 on success, < 0 on failure > + */ > +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq, > + bool forward); We could add a struct device* to the args list or into struct kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code has no business dealing with references to the vfio_device. 
> + > +#else > +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq, > + bool forward) > +{ > + return 0; > +} > +#endif > + > #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT > > static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > index 6f0cc34..af178bb 100644 > --- a/virt/kvm/vfio.c > +++ b/virt/kvm/vfio.c > @@ -25,8 +25,16 @@ struct kvm_vfio_group { > struct vfio_group *vfio_group; > }; > > +/* private linkable kvm_fwd_irq struct */ > +struct kvm_vfio_fwd_irq_node { > + struct list_head link; > + struct kvm_fwd_irq fwd_irq; > +}; > + > struct kvm_vfio { > struct list_head group_list; > + /* list of registered VFIO forwarded IRQs */ > + struct list_head fwd_node_list; > struct mutex lock; > bool noncoherent; > }; > @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, > long attr, u64 arg) > return -ENXIO; > } > > +/** > + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device > + * > + * Checks it is a valid vfio device and increments its reference counter > + * @fd: file descriptor of t
Re: Another Obsolete Fix me in trace.h?
2014-11-24 11:40+0100, Jan Kiszka:
> On 2014-11-24 11:12, Paolo Bonzini wrote:
> > On 24/11/2014 05:36, nick wrote:
> >> Greetings Again Gleb and others,
> >> I am assuming in the code I am pasting below the fix me is obsolete now
> >> and I can remove it. :)
> >> Cheers Nick
> >>         TP_printk("%s (0x%x)",
> >>                   __print_symbolic(__entry->exception, kvm_trace_sym_exc),
> >>                   /* FIXME: don't print error_code if not present */
> >>                   __entry->has_error ? __entry->error_code : 0)
> >> );
> >>
> >
> > No, it's not obsolete, the idea is to print only
> >
> >    %s
> >
> > instead of
> >
> >    %s (0x%x)
> >
> > if __entry->has_error is false. I don't know the trace API well enough
> > to know if that is possible.
>
> Last time I ran across such a scenario, it was not feasible and
> essentially required separate tracepoints. But maybe Steven knows a trick.

The format string has to be a string literal[1]; we could change it to
allow expressions[2], but what we want is almost possible through a
direct call to trace_seq_printf()[3].

The raw result would look like

  #define __print(fmt, args...) ({ \
          const char *buf_start = trace_seq_buffer_ptr(p); \
          trace_seq_printf(p, fmt, args); \
          trace_seq_putc(p, '\0'); \
          buf_start; \
  })

  TP_printk("%s%s", [...],
            __entry->has_error ? __print("(0x%x)", __entry->error_code) : "")

and would be acceptable if something __print-like made it into an ftrace
helper[4]. (Userspace won't be able to nicely print it otherwise.)

---
1: #define TP_printk(fmt, args...) fmt "\n", args
2: TP_printk(__entry->has_error ? "%s (0x%x)" : "%s", [...]
3: Already in scsi_dispatch_cmd_start or kvm_mmu_get_page tracepoints.
4: Like __print_hex or __print_symbolic.
Using blktap2/blktap3 under KVM?
Hi,

I was wondering whether it is possible to use blktap2 or blktap3 (from the Xen tools) as block devices for KVM. Both are Xen tools that provide block devices, and both live outside of Xen itself. According to the blktap3 wiki page, blktap3 does not even require any kernel support in Xen Dom0 (http://wiki.xen.org/wiki/Blktap3). I also found an earlier discussion that seems to indicate that KVM can work in combination with blktap2 (http://www.gossamer-threads.com/lists/xen/devel/263167).

If this is possible, are there any hints on how to configure KVM to use blktap2/blktap3?

Thanks,
Zhuan
Re: [PATCH v2 3/3] KVM: arm/arm64: Enable Dirty Page logging for ARMv8
On 11/22/2014 12:02 PM, Christoffer Dall wrote:
> On Sat, Nov 15, 2014 at 12:19:10AM -0800, m.smard...@samsung.com wrote:
>> From: Mario Smarduch
>>
>> This patch enables ARMv8 ditry page logging support. Plugs ARMv8 into generic
>> layer through Kconfig symbol, and drops earlier ARM64 constraints to enable
>> logging at architecture layer.
>>
>> Signed-off-by: Mario Smarduch
>
> Just reminding you again of what I said in the previous thread (think
> that was before you sent this out), that you need to handle the pud_huge
> case in arch/arm/kvm/mmu.c for ARMv8 here.
>
> -Christoffer
>

Yes, so handling similar to what unmap_puds() does when it encounters a
PUD block? Should the next revision be rebased on 'queued' 3.18.0-rc2?

Thanks.
Re: Another Obsolete Fix me in trace.h?
On Mon, 24 Nov 2014 22:00:01 +0100 Radim Krčmář wrote: > 2014-11-24 11:40+0100, Jan Kiszka: > > On 2014-11-24 11:12, Paolo Bonzini wrote: > > > On 24/11/2014 05:36, nick wrote: > > >> Greetings Again Gleb and others, > > >> I am assuming in the code I am pasting below the fix me is obsolete now > > >> and I can remove it. :) > > >> Cheers Nick > > >> TP_printk("%s (0x%x)", > > >> __print_symbolic(__entry->exception, > > >> kvm_trace_sym_exc), > > >>/* FIXME: don't print error_code if not present */ > > >> __entry->has_error ? __entry->error_code : 0) > > >> ); > > >> > > > > > > No, it's not obsolete, the idea is to print only > > > > > >%s > > > > > > instead of > > > > > >%s (0x%x) > > > > > > if __entry->has_error is false. I don't know the trace API well enough > > > to know if that is possible. > > > > Last time I ran across such a scenario, it was not feasible and > > essentially required separate tracepoints. But maybe Steven knows a trick. > > The format string has to be a string literal[1]; we could change it to > allow expressions[2], but what we want is almost possible through a > direct call to trace_seq_printf()[3]. > > The raw result would look like > > #define __print(fmt, args...) ({ \ > const char *buf_start = trace_seq_buffer_ptr(p); \ > trace_seq_printf(p, fmt, args); \ > trace_seq_putc(p, '\0'); \ > buf_start; \ > }) > > TP_printk("%s%s", [...], > __entry->has_error ? __print("(0x%x)", __entry->error_code) : "") > > and would be acceptable if something __print-like made it into a ftrace > helper[4]. (Userspace won't be able to nicely print it otherwise.) You mean if we add something like a __print_conditional(cond, fmt, ...); For this case you would have: TP_printk("%s%s", [...], __print_conditional(__entry->has_error, " (0x%x)", __entry->error_code)); Where __print_conditional() will return "" when "cond" is false, and will return the formatted string otherwise. That wouldn't be too hard to implement. 
-- Steve > > > --- > 1: #define TP_printk(fmt, args...) fmt "\n", args > 2: TP_printk(__entry->has_error ? "%s (0x%x)" : "%s", [...] > 3: Already in scsi_dispatch_cmd_start or kvm_mmu_get_page tracepoints. > 4: Like __print_hex or print_symbolic.
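The helper proposed in this message can be approximated in plain userspace C to show the intended behaviour: return "" when the condition is false, otherwise format into a scratch buffer and return a pointer to the result. This is only a sketch under stated assumptions. struct demo_seq stands in for the kernel's trace_seq, and demo_print_conditional and demo_format_exception are hypothetical names; no such ftrace helper is implied to exist.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct trace_seq. */
struct demo_seq {
    char buf[256];
    size_t len;
};

/*
 * Returns "" when cond is false; otherwise formats into the scratch
 * buffer and returns a pointer to the NUL-terminated result.
 */
static const char *demo_print_conditional(struct demo_seq *s, int cond,
                                          const char *fmt, ...)
{
    va_list ap;
    char *start = s->buf + s->len;
    size_t room = sizeof(s->buf) - s->len;
    int n;

    if (!cond)
        return "";

    va_start(ap, fmt);
    n = vsnprintf(start, room, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= room)
        return "";              /* no room: print nothing rather than truncate */

    s->len += (size_t)n + 1;    /* keep the NUL, like trace_seq_putc(p, '\0') */
    return start;
}

/* Usage in the shape of the "%s%s" format discussed in the thread. */
static void demo_format_exception(struct demo_seq *s, char *out, size_t outlen,
                                  const char *name, int has_error,
                                  unsigned int error_code)
{
    snprintf(out, outlen, "%s%s", name,
             demo_print_conditional(s, has_error, " (0x%x)", error_code));
}
```

Because the optional part is reduced to a single "%s" argument, the outer format string can stay a string literal, which is the constraint noted in footnote 1.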
[RFC PATCH 0/5] ARM: KVM: Enable the ioeventfd capability of KVM on ARM
The IOEVENTFD KVM capability is a prerequisite for vhost support, and is also used to implement improved interrupt handling in VFIO drivers.

This series enables the ioeventfd KVM capability on ARM.

The implementation routes MMIO accesses in the IO abort handler to the KVM IO bus. If there is already a registered ioeventfd handler for this address, the file descriptor will be triggered.

We extended the KVM IO bus API to expose the VCPU struct pointer. The VGIC MMIO access is now done through this API. For this to work, the VGIC registers a kvm_io_device which represents the whole MMIO region.

The patches are implemented on top of Andre's latest vGICv3 work from here:
http://www.linux-arm.org/git?p=linux-ap.git;a=shortlog;h=refs/heads/kvm-gicv3/v4

The code was tested on Dual Cortex-A15 Exynos5250 (ARM Chromebook).

---

Nikolay Nikolaev (5):
      KVM: redesing kvm_io_bus_ API to pass VCPU structure to the callbacks.
      ARM: on IO mem abort - route the call to KVM MMIO bus
      KVM: ARM VGIC add kvm_io_bus_ frontend
      ARM: enable linking against eventfd and irqchip
      ARM: enable KVM_CAP_IOEVENTFD

 arch/arm/kvm/Kconfig       |    1 +
 arch/arm/kvm/Makefile      |    2 +
 arch/arm/kvm/arm.c         |    3 ++
 arch/arm/kvm/mmio.c        |   32
 arch/ia64/kvm/kvm-ia64.c   |    4 +-
 arch/powerpc/kvm/powerpc.c |    4 +-
 arch/s390/kvm/diag.c       |    2 +
 arch/x86/kvm/vmx.c         |    2 +
 arch/x86/kvm/x86.c         |   11 +++---
 include/kvm/arm_vgic.h     |    3 +-
 include/linux/kvm_host.h   |   10 +++--
 virt/kvm/arm/vgic.c        |   88 +---
 virt/kvm/coalesced_mmio.c  |    5 ++-
 virt/kvm/eventfd.c         |    4 +-
 virt/kvm/iodev.h           |   23
 virt/kvm/kvm_main.c        |   32
 16 files changed, 163 insertions(+), 63 deletions(-)

--
Signature
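The routing described in this cover letter (look up the faulting address on a bus of registered ranges and, on a match, signal the associated eventfd instead of exiting to userspace) can be illustrated with a small userspace sketch. It uses the real Linux eventfd(2) interface, but struct demo_ioeventfd, demo_bus_write and demo_drain are hypothetical names and the lookup is a plain linear scan. It is not the kvm_io_bus implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical registration record: one guest-physical range, one eventfd. */
struct demo_ioeventfd {
    uint64_t addr;
    uint32_t len;
    int fd;
};

/*
 * Walk the registered ranges. If the write falls inside one, signal its
 * eventfd and report the access as handled (0). Anything else returns -1,
 * standing in for "fall back to userspace emulation".
 */
static int demo_bus_write(const struct demo_ioeventfd *devs, int ndevs,
                          uint64_t addr, uint32_t len)
{
    for (int i = 0; i < ndevs; i++) {
        const struct demo_ioeventfd *d = &devs[i];

        if (addr >= d->addr && addr + len <= d->addr + d->len) {
            uint64_t one = 1;

            if (write(d->fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
                return -1;
            return 0;
        }
    }
    return -1;
}

/* Read and reset the eventfd counter; returns 0 if nothing was signalled. */
static uint64_t demo_drain(int fd)
{
    uint64_t count = 0;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

A consumer such as a vhost worker would sit on the other end of the file descriptor; here demo_drain() plays that role, and with a non-blocking eventfd it returns the number of writes that matched since the last read.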
[RFC PATCH 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend
In io_mem_abort remove the call to vgic_handle_mmio. The target is to have
a single MMIO handling path - that is through the kvm_io_bus_ API.

Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region.
Both read and write calls are redirected to vgic_io_dev_access where
kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio.

Signed-off-by: Nikolay Nikolaev
---
 arch/arm/kvm/mmio.c    |    3 --
 include/kvm/arm_vgic.h |    3 +-
 virt/kvm/arm/vgic.c    |   88
 3 files changed, 74 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 81230da..1c44a2b 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -227,9 +227,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	if (mmio.is_write)
 		mmio_write_buf(mmio.data, mmio.len, data);
 
-	if (vgic_handle_mmio(vcpu, run, &mmio))
-		return 1;
-
 	if (handle_kernel_mmio(vcpu, run, &mmio))
 		return 1;
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index e452ef7..d9b7d2a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -233,6 +233,7 @@ struct vgic_dist {
 	unsigned long		*irq_pending_on_cpu;
 
 	struct vgic_vm_ops	vm_ops;
+	struct kvm_io_device	*io_dev;
 #endif
 };
 
@@ -307,8 +308,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 			bool level);
 void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
-bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		      struct kvm_exit_mmio *mmio);
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.ready)
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 1213da5..3da1115 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -31,6 +31,9 @@
 #include
 #include
 #include
+#include
+
+#include "iodev.h"
 
 /*
  * How the whole thing works (courtesy of Christoffer Dall):
@@ -775,28 +778,81 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	return true;
 }
 
-/**
- * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
- * @vcpu:      pointer to the vcpu performing the access
- * @run:       pointer to the kvm_run structure
- * @mmio:      pointer to the data describing the access
- *
- * returns true if the MMIO access has been performed in kernel space,
- * and false if it needs to be emulated in user space.
- * Calls the actual handling routine for the selected VGIC model.
- */
-bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		      struct kvm_exit_mmio *mmio)
+static int vgic_io_dev_access(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
+			    gpa_t addr, int len, void *val, bool is_write)
 {
-	if (!irqchip_in_kernel(vcpu->kvm))
-		return false;
+	struct kvm_exit_mmio mmio;
+	bool ret;
+
+	mmio = (struct kvm_exit_mmio) {
+		.phys_addr = addr,
+		.len = len,
+		.is_write = is_write,
+	};
+
+	if (is_write)
+		memcpy(mmio.data, val, len);
 
 	/*
 	 * This will currently call either vgic_v2_handle_mmio() or
 	 * vgic_v3_handle_mmio(), which in turn will call
 	 * vgic_handle_mmio_range() defined above.
 	 */
-	return vcpu->kvm->arch.vgic.vm_ops.handle_mmio(vcpu, run, mmio);
+	ret = vcpu->kvm->arch.vgic.vm_ops.handle_mmio(vcpu, vcpu->run, &mmio);
+
+	if (!is_write)
+		memcpy(val, mmio.data, len);
+
+	return ret ? 0 : 1;
+}
+
+static int vgic_io_dev_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
+			  gpa_t addr, int len, void *val)
+{
+	return vgic_io_dev_access(vcpu, this, addr, len, val, false);
+}
+
+static int vgic_io_dev_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
+			   gpa_t addr, int len, const void *val)
+{
+	return vgic_io_dev_access(vcpu, this, addr, len, (void *)val, true);
+}
+
+static const struct kvm_io_device_ops vgic_io_dev_ops = {
+	.read       = vgic_io_dev_read,
+	.write      = vgic_io_dev_write,
+};
+
+static int vgic_register_kvm_io_dev(struct kvm *kvm)
+{
+	struct kvm_io_device *dev;
+	int ret;
+
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	unsigned long base = dist->vgic_dist_base;
+
+	dev = kzalloc(sizeof(struct kvm_io_device), GFP_KERNEL);
+	if (!dev)
+		return -ENOMEM;
+
+	kvm_iodevice_init(dev, &vgic_io_dev_ops);
+
+	mutex_lock(&kvm->slots_lock);
+
+	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS,
+			base, KVM_VGIC_V2_DIST_SIZE, dev);
+	if (ret < 0)
+		goto out_free_d
[RFC PATCH 5/5] ARM: enable KVM_CAP_IOEVENTFD
KVM on arm will support the eventfd extension.

Signed-off-by: Nikolay Nikolaev
---
 arch/arm/kvm/arm.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c3d0fbd..266b618 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -197,6 +197,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MAX_VCPUS:
 		r = KVM_MAX_VCPUS;
 		break;
+	case KVM_CAP_IOEVENTFD:
+		r = 1;
+		break;
 	default:
 		r = kvm_arch_dev_ioctl_check_extension(ext);
 		break;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 4/5] ARM: enable linking against eventfd and irqchip
This enables compilation of the eventfd feature on ARM.

Signed-off-by: Nikolay Nikolaev
---
 arch/arm/kvm/Kconfig  |    1 +
 arch/arm/kvm/Makefile |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 466bd29..a4b0312 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
 	bool "Kernel-based Virtual Machine (KVM) support"
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
+	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select KVM_MMIO
 	select KVM_ARM_HOST
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 443b8be..539c1a5 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
 
 KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
[RFC PATCH 1/5] KVM: redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.
This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.

Signed-off-by: Nikolay Nikolaev
---
 arch/ia64/kvm/kvm-ia64.c   |    4 ++--
 arch/powerpc/kvm/powerpc.c |    4 ++--
 arch/s390/kvm/diag.c       |    2 +-
 arch/x86/kvm/vmx.c         |    2 +-
 arch/x86/kvm/x86.c         |   11 ++-
 include/linux/kvm_host.h   |   10 +-
 virt/kvm/coalesced_mmio.c  |    5 +++--
 virt/kvm/eventfd.c         |    4 ++--
 virt/kvm/iodev.h           |   23 +++
 virt/kvm/kvm_main.c        |   32
 10 files changed, 53 insertions(+), 44 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index ec6b9ac..16f1d26 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -246,10 +246,10 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	return 0;
 mmio:
 	if (p->dir)
-		r = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, p->addr,
+		r = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, p->addr,
 				    p->size, &p->data);
 	else
-		r = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, p->addr,
+		r = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, p->addr,
 				     p->size, &p->data);
 	if (r)
 		printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index c1f8f53..5ac065b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -814,7 +814,7 @@ int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	ret = kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+	ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, run->mmio.phys_addr,
 			      bytes, &run->mmio.data);
 
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
@@ -887,7 +887,7 @@ int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	ret = kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, run->mmio.phys_addr,
+	ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, run->mmio.phys_addr,
 			       bytes, &run->mmio.data);
 
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 9254aff..329ec75 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -213,7 +213,7 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
 	 * - gpr 3 contains the virtqueue index (passed as datamatch)
 	 * - gpr 4 contains the index on the bus (optionally)
 	 */
-	ret = kvm_io_bus_write_cookie(vcpu->kvm, KVM_VIRTIO_CCW_NOTIFY_BUS,
+	ret = kvm_io_bus_write_cookie(vcpu, KVM_VIRTIO_CCW_NOTIFY_BUS,
 				      vcpu->run->s.regs.gprs[2] & 0x,
 				      8, &vcpu->run->s.regs.gprs[3],
 				      vcpu->run->s.regs.gprs[4]);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3e556c6..e6d9f01 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5620,7 +5620,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	gpa_t gpa;
 
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-	if (!kvm_io_bus_write(vcpu->kvm, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 		skip_emulated_instruction(vcpu);
 		return 1;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0033df3..bbf9375 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4053,7 +4053,7 @@ static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
 		n = min(len, 8);
 		if (!(vcpu->arch.apic &&
 		      !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, n, v))
-		    && kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, n, v))
+		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
 			break;
 		handled += n;
 		addr += n;
@@ -4072,8 +4072,9 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
 	do {
 		n = min(len, 8);
 		if (!(vcpu->arch.apic &&
-		      !kvm_iodevice_read(&vcpu->arch.apic->dev, addr, n, v))
-		    && kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, n, v))
+		      !kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev,
+					 addr, n, v))
+		    && kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, n, v))
 			break;
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ, n, addr, *(u64 *)v);
 		handled += n;
@@ -4565,10 +4566,10 @@ static int kernel_pio(struct kvm_vcpu *vcpu,
[RFC PATCH 2/5] ARM: on IO mem abort - route the call to KVM MMIO bus
On IO memory abort, try to handle the MMIO access through the KVM
registered read/write callbacks. This is done by invoking the relevant
kvm_io_bus_* API.

Signed-off-by: Nikolay Nikolaev
---
 arch/arm/kvm/mmio.c |   33 +
 1 file changed, 33 insertions(+)

diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 4cb5a93..81230da 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -162,6 +162,36 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return 0;
 }
 
+/**
+ * kvm_handle_mmio - handle an in-kernel MMIO access
+ * @vcpu:	pointer to the vcpu performing the access
+ * @run:	pointer to the kvm_run structure
+ * @mmio:	pointer to the data describing the access
+ *
+ * returns true if the MMIO access has been performed in kernel space,
+ * and false if it needs to be emulated in user space.
+ */
+static bool handle_kernel_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+		struct kvm_exit_mmio *mmio)
+{
+	int ret;
+
+	if (mmio->is_write) {
+		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, mmio->phys_addr,
+				mmio->len, &mmio->data);
+
+	} else {
+		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, mmio->phys_addr,
+				mmio->len, &mmio->data);
+	}
+	if (!ret) {
+		kvm_prepare_mmio(run, mmio);
+		kvm_handle_mmio_return(vcpu, run);
+	}
+
+	return !ret;
+}
+
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		 phys_addr_t fault_ipa)
 {
@@ -200,6 +230,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	if (vgic_handle_mmio(vcpu, run, &mmio))
 		return 1;
 
+	if (handle_kernel_mmio(vcpu, run, &mmio))
+		return 1;
+
 	kvm_prepare_mmio(run, &mmio);
 	return 0;
 }
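The return-value convention in handle_kernel_mmio() is easy to misread: the bus call returns 0 when some registered device claimed the access, and the helper inverts that into "true means handled in kernel". A toy userspace model of that convention follows. All toy_* names are invented for illustration, and the -EOPNOTSUPP value is this sketch's choice of "nobody claimed it", not a statement about the kernel's exact error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one device registered on the MMIO bus. */
struct toy_dev {
	uint64_t base;		/* first guest address covered */
	size_t len;		/* size of the region, at most sizeof(regs) */
	uint8_t regs[16];	/* backing storage for the region */
};

/* Models the bus write: 0 if a device claimed the access, negative otherwise. */
static int toy_bus_write(struct toy_dev *devs, size_t ndevs,
			 uint64_t addr, size_t len, const void *val)
{
	for (size_t i = 0; i < ndevs; i++) {
		struct toy_dev *d = &devs[i];

		/* The whole access must fall inside the device region. */
		if (addr >= d->base && len <= d->len &&
		    addr - d->base <= d->len - len) {
			memcpy(&d->regs[addr - d->base], val, len);
			return 0;
		}
	}
	return -EOPNOTSUPP;
}

/* Same inversion as the patch: true means no exit to userspace is needed. */
static bool toy_handle_kernel_mmio(struct toy_dev *devs, size_t ndevs,
				   uint64_t addr, size_t len, const void *val)
{
	return !toy_bus_write(devs, ndevs, addr, len, val);
}
```

In the patch, a false result lets io_mem_abort() fall through to kvm_prepare_mmio() and return 0, which hands the access to userspace.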
Re: Another Obsolete Fix me in trace.h?
2014-11-24 16:19-0500, Steven Rostedt:
> On Mon, 24 Nov 2014 22:00:01 +0100
> Radim Krčmář wrote:
>
> > 2014-11-24 11:40+0100, Jan Kiszka:
> > The format string has to be a string literal[1]; we could change it to
> > allow expressions[2], but what we want is almost possible through a
> > direct call to trace_seq_printf()[3].
> >
> > The raw result would look like
> >
> >   #define __print(fmt, args...) ({ \
> >     const char *buf_start = trace_seq_buffer_ptr(p); \
> >     trace_seq_printf(p, fmt, args); \
> >     trace_seq_putc(p, '\0'); \
> >     buf_start; \
> >   })
> >
> >   TP_printk("%s%s", [...],
> >     __entry->has_error ? __print("(0x%x)", __entry->error_code) :
> >     "")
> >
> > and would be acceptable if something __print-like made it into a ftrace
> > helper[4]. (Userspace won't be able to nicely print it otherwise.)
>
> You mean if we add something like a __print_conditional(cond, fmt, ...);

The benefit of _conditional is cleaner code?
(_conditional would be possible as a #define on top of generic print,
the ternary seems to be parsed correctly.)

> For this case you would have:
>
>   TP_printk("%s%s", [...],
>     __print_conditional(__entry->has_error, " (0x%x)",
>                         __entry->error_code));
>
> Where __print_conditional() will return "" when "cond" is false, and
> will return the formatted string otherwise.

(This might introduce 'const char empty[] = ""'.)

> That wouldn't be too hard to implement. I'll look at the patch tomorrow.

Thanks.
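To make the proposed semantics concrete, here is a small userspace model of a __print_conditional()-style helper. It is only a sketch: the real thing would write into the trace_seq buffer, whereas this hypothetical print_conditional() formats into a caller-supplied buffer, and both the name and the signature are assumptions.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the proposed helper: format into buf
 * when cond is true, otherwise return the empty string.
 */
static const char *print_conditional(char *buf, size_t size, bool cond,
				     const char *fmt, ...)
{
	va_list ap;

	if (!cond || size == 0)
		return "";

	va_start(ap, fmt);
	vsnprintf(buf, size, fmt, ap);	/* always NUL-terminates for size > 0 */
	va_end(ap);
	return buf;
}
```

The caller can then always pass the result to a "%s" conversion, which is what the TP_printk("%s%s", ...) examples above rely on.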
Re: [RESEND PATCH v14 7/7] KVM: arm: page logging 2nd stage fault handling
On 11/24/2014 12:02 PM, Mario Smarduch wrote:
> On 11/22/2014 11:53 AM, Christoffer Dall wrote:
>> On Fri, Nov 14, 2014 at 01:54:44PM -0800, Mario Smarduch wrote:
>>> This patch adds support for handling 2nd stage page faults during migration,
>>> it disables faulting in huge pages, and dissolves huge pages to page tables.
>>> In case migration is canceled huge pages are used again.
>>>
>>> Resending to address Marc's comments to simplify stage2_set_pte() handling
>>> of logging, and mapping device memory - flags.
>>>
>>> Reviewed-by: Marc Zyngier
>>> Reviewed-by: Christoffer Dall
>>> Signed-off-by: Mario Smarduch
>>> ---
>>>  arch/arm/kvm/mmu.c | 61 +---
>>>  1 file changed, 53 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 8137455..d29de77 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -47,6 +47,18 @@ static phys_addr_t hyp_idmap_vector;
>>>  #define kvm_pmd_huge(_x)	(pmd_huge(_x) || pmd_trans_huge(_x))
>>>  #define kvm_pud_huge(_x)	pud_huge(_x)
>>>
>>> +#define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
>>> +#define KVM_S2PTE_FLAG_LOGGING_ACTIVE	(1UL << 1)
>>> +
>>> +static bool kvm_get_logging_state(struct kvm_memory_slot *memslot)
>>> +{
>>> +#ifdef CONFIG_ARM
>>> +	return !!memslot->dirty_bitmap;
>>> +#else
>>> +	return false;
>>> +#endif
>>> +}
>>> +
>>>  static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>>>  {
>>>  	/*
>>> @@ -626,10 +638,13 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>>  }
>>>
>>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>> -			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
>>> +			  phys_addr_t addr, const pte_t *new_pte,
>>> +			  unsigned long flags)
>>>  {
>>>  	pmd_t *pmd;
>>>  	pte_t *pte, old_pte;
>>> +	unsigned long iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
>>> +	unsigned long logging_active = flags & KVM_S2PTE_FLAG_LOGGING_ACTIVE;
>>>
>>>  	/* Create stage-2 page table mapping - Level 1 */
>>>  	pmd = stage2_get_pmd(kvm, cache, addr);
>>> @@ -641,6 +656,18 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>>  		return 0;
>>>  	}
>>>
>>> +	/*
>>> +	 * While dirty memory logging, clear PMD entry for huge page and split
>>> +	 * into smaller pages, to track dirty memory at page granularity.
>>> +	 */
>>> +	if (logging_active && kvm_pmd_huge(*pmd)) {
>>> +		phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
>>
>> just noticed this: this is not an IPA is it? pmd_pfn should give us the
>> host pfn. I think you need to manipulate @addr instead.
>>
>> Did I manage to confuse myself?
> No you're right, a *bad mistake* on my part I broke it between v8 and
> v9. Not sure how it happened absent minded cut and paste?
>
> Also when the pmd is cleared, should that be flushed
> to level where the pmd is visible to page table walks?
> Or am I confusing something here?

Hi Christoffer,
disregard question, sorry for the unnecessary traffic.

>
> Thanks.
>>
>> (Yeah, I know I said I reviewed this one already)
>>
>>> +
>>> +		pmd_clear(pmd);
>>> +		kvm_tlb_flush_vmid_ipa(kvm, ipa);
>>> +		put_page(virt_to_page(pmd));
>>> +	}
>>> +
>>>  	/* Create stage-2 page mappings - Level 2 */
>>>  	if (pmd_none(*pmd)) {
>>>  		if (!cache)
>>> @@ -693,7 +720,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>>>  		if (ret)
>>>  			goto out;
>>>  		spin_lock(&kvm->mmu_lock);
>>> -		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
>>> +		ret = stage2_set_pte(kvm, &cache, addr, &pte,
>>> +				     KVM_S2PTE_FLAG_IS_IOMAP);
>>>  		spin_unlock(&kvm->mmu_lock);
>>>  		if (ret)
>>>  			goto out;
>>> @@ -908,6 +936,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>  	struct vm_area_struct *vma;
>>>  	pfn_t pfn;
>>>  	pgprot_t mem_type = PAGE_S2;
>>> +	unsigned long logging_active = 0;
>>> +
>>> +	if (kvm_get_logging_state(memslot))
>>> +		logging_active = KVM_S2PTE_FLAG_LOGGING_ACTIVE;
>>>
>>>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>>  	if (fault_status == FSC_PERM && !write_fault) {
>>> @@ -918,7 +950,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>  	/* Let's check if we will get back a huge page backed by hugetlbfs */
>>>  	down_read(&current->mm->mmap_sem);
>>>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
>>> -	if (is_vm_hugetlb_page(vma)) {
>>> +	if (is_vm_hugetlb_page(vma) && !logging_active) {
>>>  		hugetlb = true;
>>>  		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
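Christoffer's review point is that the address passed to the TLB flush must be a guest IPA, while pmd_pfn() yields a host page frame number. The following standalone sketch contrasts the two computations. It is one possible reading of "manipulate @addr instead", not the fix that was eventually posted; the TOY_* constants assume 4 KiB pages with 2 MiB PMD-level blocks, and all names are invented for illustration.

```c
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 2 MiB huge pages at PMD level. */
#define TOY_PAGE_SHIFT	12
#define TOY_PMD_SHIFT	21
#define TOY_PMD_MASK	(~((UINT64_C(1) << TOY_PMD_SHIFT) - 1))

/* Guest IPA of the huge-page block that contains the faulting address. */
static uint64_t toy_huge_block_ipa(uint64_t fault_ipa)
{
	return fault_ipa & TOY_PMD_MASK;
}

/*
 * What the reviewed line computes instead: a host physical address
 * derived from the pfn stored in the PMD entry.
 */
static uint64_t toy_host_pa_from_pfn(uint64_t host_pfn)
{
	return host_pfn << TOY_PAGE_SHIFT;
}
```

The two results live in different address spaces, so flushing by the second value would in general target the wrong stage-2 translation.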
Re: [question] lots of interrupts injected to vm when pressing somekey w/o releasing
>>> Hi all,
>>>
>>> If I press the one of "Insert/Delete/Home/End/PageUp/PageDown/UpArrow/
>>> DownArrow/LeftArrow/RightArrow" key w/o releasing, then lots of interrupts
>>> will be injected to vm(win7/win2008), about 8000/s, the system become very
>>> slow,
>>> bringing very bad experience. But the other keys are okay.
>>
>> Sorry for wrong description, the interrupt rate is normal,
>> but huge numbers of vmexit induced by PIO were produced.
>
>This is expected when running Windows without paravirtualized time
>counter (-cpu ...,hv_time).
>
I tested win-server-2008 with "-cpu core2duo,hv_spinlocks=0x,hv_relaxed,hv_time",
this problem still happened, about 200,000 vmexits per-second,
bringing very bad experience, just like being stuck.

Thanks,
Zhang Haoyu

>Paolo
RE: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control
> -Original Message-
> From: Eric Auger [mailto:eric.au...@linaro.org]
> Sent: Monday, November 24, 2014 2:36 AM
> To: eric.au...@st.com; eric.au...@linaro.org; christoffer.d...@linaro.org;
> marc.zyng...@arm.com; linux-arm-ker...@lists.infradead.org;
> kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> alex.william...@redhat.com; joel.sch...@amd.com;
> kim.phill...@freescale.com; pau...@samba.org; g...@kernel.org;
> pbonz...@redhat.com; ag...@suse.de
> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; will.dea...@arm.com;
> a.mota...@virtualopensystems.com; a.r...@virtualopensystems.com;
> john.li...@huawei.com; ming@canonical.com; Wu, Feng
> Subject: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control
>
> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>
> This is a new control channel which enables KVM to cooperate with
> viable VFIO devices.
>
> Functions are introduced to check the validity of a VFIO device
> file descriptor, increment/decrement the ref counter of the VFIO
> device.
>
> The patch introduces 2 attributes for this new device group:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and,
> respectively, to unset the feature.
>
> The VFIO device stores a list of registered forwarded IRQs. The reference
> counter of the device is incremented each time a new IRQ is forwarded.
> Reference counter is decremented when the IRQ forwarding is unset.
>
> The forwarding programming is architecture specific, implemented in
> kvm_arch_set_fwd_state function. Architecture specific implementation is
> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
> functions are void.
>
> Signed-off-by: Eric Auger
>
> ---
>
> v2 -> v3:
> - add API comments in kvm_host.h
> - improve the commit message
> - create a private kvm_vfio_fwd_irq struct
> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
>   latter action will be handled in vgic.
> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
>   to move platform specific stuff in architecture specific code.
> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
> - increment the ref counter each time we do an IRQ forwarding and decrement
>   this latter each time one IRQ forward is unset. Simplifies the whole
>   ref counting.
> - simplification of list handling: create, search, removal
>
> v1 -> v2:
> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
>   __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> - original patch file separated into 2 parts: generic part moved in vfio.c
>   and ARM specific part(kvm_arch_set_fwd_state)
> ---
>  include/linux/kvm_host.h | 28 ++
>  virt/kvm/vfio.c | 249 ++-
>  2 files changed, 274 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea53b04..0b9659d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
>  		      unsigned long arg);
>  };
>
> +/* internal self-contained structure describing a forwarded IRQ */
> +struct kvm_fwd_irq {
> +	struct kvm *kvm; /* VM to inject the GSI into */
> +	struct vfio_device *vdev; /* vfio device the IRQ belongs to */
> +	__u32 index; /* VFIO device IRQ index */
> +	__u32 subindex; /* VFIO device IRQ subindex */
> +	__u32 gsi; /* gsi, ie. virtual IRQ number */
> +};
> +
>  void kvm_device_get(struct kvm_device *dev);
>  void kvm_device_put(struct kvm_device *dev);
>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
>  extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
>
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +/**
> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
> + *
> + * @fwd_irq: handle to the forwarded irq struct
> + * @forward: true means forwarded, false means not forwarded
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> +			      bool forward);
> +
> +#else
> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> +					    bool forward)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>
>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 6f0cc34..af178bb 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
>  	struct vfio_group *vfio_group;
>  };
>
> +/* private linkable kvm_fwd_irq struct */
> +struct kvm_vfio_fwd_irq_node {
> +	struct list_head link;
> +	struct kvm_fwd_irq fwd_irq;
> +};
> +
>  struct kvm_vfio {
>  	struct lis
Re: [question] lots of interrupts injected to vm when pressing somekey w/o releasing
On 25/11/2014 02:54, Zhang Haoyu wrote:
> I tested win-server-2008 with "-cpu
> core2duo,hv_spinlocks=0x,hv_relaxed,hv_time",
> this problem still happened, about 200,000 vmexits per-second,
> bringing very bad experience, just like being stuck.

Please upload a full trace somewhere, or at least the "perf report" output.

Paolo
Re: [PATCH v4 4/6] hw_random: fix unregister race.
On Wed, Nov 12, 2014 at 02:33:00PM +1030, Rusty Russell wrote:
> Amos Kong writes:
> > From: Rusty Russell
> >
> > The previous patch added one potential problem: we can still be
> > reading from a hwrng when it's unregistered. Add a wait for zero
> > in the hwrng_unregister path.
> >
> > v4: add cleanup_done flag to insure that cleanup is done
>
> That's a bit weird. The usual pattern would be to hold a reference
> until we're actually finished, but this reference is a bit weird.

The cleanup function is a callback function of kref_put(), so we can't
use the same reference count inside the cleanup function.

> We hold the mutex across cleanup, so we could grab that but we have to
> take care sleeping inside wait_event, otherwise Peter will have to fix
> my code again :)

We didn't hold rng_mutex inside cleanup_rng(), am I missing something?

> AFAICT the wake_woken() stuff isn't merged yet, so your patch will
> have to do for now.

Can you provide a link to the patches or mails here? I searched but
found nothing about wake_woken.

> > @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
> >
> >  	if (rng->cleanup)
> >  		rng->cleanup(rng);
> > +	rng->cleanup_done = true;
> > +	wake_up_all(&rng_done);
> >  }
> >
> >  static void set_current_rng(struct hwrng *rng)
> > @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng)
> >  		kthread_stop(hwrng_fill);
> >  	} else
> >  		mutex_unlock(&rng_mutex);
> > +
> > +	/* Just in case rng is reading right now, wait. */
> > +	wait_event(rng_done, rng->cleanup_done &&
> > +		   atomic_read(&rng->ref.refcount) == 0);
> > +
>
> The atomic_read() isn't necessary here.
>
> However, you should probably init cleanup_done in hwrng_register().
> (Probably noone does unregister then register, but let's be clear).

Got it.

> Thanks,
> Rusty.
>
> >  }
> >  EXPORT_SYMBOL_GPL(hwrng_unregister);
> >
> > diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
> > index c212e71..7832e50 100644
> > --- a/include/linux/hw_random.h
> > +++ b/include/linux/hw_random.h
> > @@ -46,6 +46,7 @@ struct hwrng {
> >  	/* internal. */
> >  	struct list_head list;
> >  	struct kref ref;
> > +	bool cleanup_done;
> >  };
> >
> >  /** Register a new Hardware Random Number Generator driver. */
> > --
> > 1.9.3

--
Amos.
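The pattern under discussion (the last reference drop runs cleanup and wakes whoever is blocked in unregister) can be modelled in userspace with a mutex and a condition variable. The sketch below is only an analogy: the kernel code uses kref and wait_event(), while every toy_* name here is invented, and pthreads stand in for the kernel primitives.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct hwrng's lifetime state. */
struct toy_rng {
	pthread_mutex_t lock;
	pthread_cond_t done;	/* plays the role of the rng_done waitqueue */
	int refs;		/* plays the role of the kref */
	bool cleanup_done;
};

/* Registration: one reference held, cleanup_done explicitly reset. */
static void toy_rng_init(struct toy_rng *rng)
{
	pthread_mutex_init(&rng->lock, NULL);
	pthread_cond_init(&rng->done, NULL);
	rng->refs = 1;
	rng->cleanup_done = false;
}

/* A reader takes a reference before using the device. */
static void toy_rng_get(struct toy_rng *rng)
{
	pthread_mutex_lock(&rng->lock);
	rng->refs++;
	pthread_mutex_unlock(&rng->lock);
}

/* Drop a reference; the last put marks cleanup done and wakes waiters. */
static void toy_rng_put(struct toy_rng *rng)
{
	pthread_mutex_lock(&rng->lock);
	if (--rng->refs == 0) {
		rng->cleanup_done = true;
		pthread_cond_broadcast(&rng->done);
	}
	pthread_mutex_unlock(&rng->lock);
}

/* Unregister: drop the registration reference, then wait for cleanup. */
static void toy_rng_unregister(struct toy_rng *rng)
{
	toy_rng_put(rng);
	pthread_mutex_lock(&rng->lock);
	while (!rng->cleanup_done)
		pthread_cond_wait(&rng->done, &rng->lock);
	pthread_mutex_unlock(&rng->lock);
}
```

In this model the wait condition is cleanup_done alone, which matches Rusty's remark that the extra atomic_read() is unnecessary, and toy_rng_init() resets the flag in the way he suggests for hwrng_register().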