When I boot two virtio-rng devices, guest will hang
QEMU commandline: ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 2000 -drive file=/images/nolvm.qcow2 --kernel /home/devel/linux/arch/x86/boot/bzImage -append "ro root=/dev/sda1 console=ttyS0,115200" -monitor unix:/tmp/m,nowait,server -device virtio-net-pci,netdev=h0,vectors=17,mq=on,id=n0 -netdev tap,id=h0,queues=8 -device virtio-net-pci,netdev=h1,vectors=0,mq=on,id=n1 -netdev tap,id=h1,queues=8 -vnc :0 -mon chardev=qmp,mode=control,pretty=on -chardev socket,id=qmp,host=localhost,port=1234,server,nowait -serial stdio -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0,id=h0 -object rng-random,filename=/dev/urandom,id=rng1 -device virtio-rng-pci,rng=rng1,id=h1 It works when I only add one virtio-rng device. Did you touch this problem? I'm using latest net-next/master (ac3d2e5a9ef2f4d8f57c50070c4883ecb7cec29f) [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.16.0-rc6+ (amos@z) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #363 SMP Mon Jul 28 15:05:44 CST 2014 [0.00] Command line: ro root=/dev/sda1 console=ttyS0,115200 [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x7cffdfff] usable [0.00] BIOS-e820: [mem 0x7cffe000-0x7cff] reserved [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved [0.00] BIOS-e820: [mem 0xfffc-0x] reserved [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.8 present. [0.00] Hypervisor detected: KVM [0.00] AGP: No AGP bridge found [0.00] e820: last_pfn = 0x7cffe max_arch_pfn = 0x4 [0.00] PAT not supported by CPU. [0.00] found SMP MP-table at [mem 0x000f0e90-0x000f0e9f] mapped at [880f0e90] [0.00] init_memory_mapping: [mem 0x-0x000f] [0.00] init_memory_mapping: [mem 0x7cc0-0x7cdf] [0.00] init_memory_mapping: [mem 0x7c00-0x7cbf] [0.00] init_memory_mapping: [mem 0x0010-0x7bff] [0.00] init_memory_mapping: [mem 0x7ce0-0x7cffdfff] [0.00] ACPI: Early table checksum verification disabled [0.00] ACPI: RSDP 0x000F0C90 14 (v00 BOCHS ) [0.00] ACPI: RSDT 0x7CFFFEF0 34 (v01 BOCHS BXPCRSDT 0001 BXPC 0001) [0.00] ACPI: FACP 0x7CFFF1D3 74 (v01 BOCHS BXPCFACP 0001 BXPC 0001) [0.00] ACPI: DSDT 0x7CFFE040 001193 (v01 BOCHS BXPCDSDT 0001 BXPC 0001) [0.00] ACPI: FACS 0x7CFFE000 40 [0.00] ACPI: SSDT 0x7CFFF247 000BF9 (v01 BOCHS BXPCSSDT 0001 BXPC 0001) [0.00] ACPI: APIC 0x7CFFFE40 78 (v01 BOCHS BXPCAPIC 0001 BXPC 0001) [0.00] ACPI: HPET 0x7CFFFEB8 38 (v01 BOCHS BXPCHPET 0001 BXPC 0001) [0.00] No NUMA configuration found [0.00] Faking a node at [mem 0x-0x7cffdfff] [0.00] Initmem setup node 0 [mem 0x-0x7cffdfff] [0.00] NODE_DATA [mem 0x7cfea000-0x7cffdfff] [0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00 [0.00] kvm-clock: cpu 0, msr 0:7cfe8001, primary cpu clock [0.00] Zone ranges: [0.00] DMA [mem 0x1000-0x00ff] [0.00] DMA32[mem 0x0100-0x] [0.00] Normal empty [0.00] Movable zone start for each node [0.00] Early memory node ranges [0.00] node 0: [mem 0x1000-0x0009efff] [0.00] node 0: [mem 0x0010-0x7cffdfff] [0.00] ACPI: PM-Timer IO Port: 0x608 [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) [0.00] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) [0.00] ACPI: IOAPIC (id[0x00] address[0xfec0] gsi_base[0]) [0.00] IOAPIC[0]: apic_id 0, version 17, address 0xfec0, GSI 0-23 [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) [0.00] Using ACPI (MADT) for SMP configuration information [0.00] ACPI: HPET id: 0x8086a201 base: 0xfed0 [0.00] smpboot: Allowing 1 CPUs, 0 hotplug CPUs [0.00] PM: Registered nosave memory: [mem 0x0009f000-0x0009] [0.00] PM
Re: When I boot two virtio-rng devices, guest will hang
On (Mon) 28 Jul 2014 [15:32:42], Amos Kong wrote: > QEMU commandline: > > ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 2000 -drive > file=/images/nolvm.qcow2 --kernel /home/devel/linux/arch/x86/boot/bzImage > -append "ro root=/dev/sda1 console=ttyS0,115200" -monitor > unix:/tmp/m,nowait,server -device > virtio-net-pci,netdev=h0,vectors=17,mq=on,id=n0 -netdev tap,id=h0,queues=8 > -device virtio-net-pci,netdev=h1,vectors=0,mq=on,id=n1 -netdev > tap,id=h1,queues=8 -vnc :0 -mon chardev=qmp,mode=control,pretty=on -chardev > socket,id=qmp,host=localhost,port=1234,server,nowait -serial stdio -object > rng-random,filename=/dev/urandom,id=rng0 -device > virtio-rng-pci,rng=rng0,id=h0 -object > rng-random,filename=/dev/urandom,id=rng1 -device > virtio-rng-pci,rng=rng1,id=h1 > > It works when I only add one virtio-rng device. Did you touch this > problem? > > I'm using latest net-next/master (ac3d2e5a9ef2f4d8f57c50070c4883ecb7cec29f) > [0.223503] Non-volatile memory driver v1.3 > [1.172293] tsc: Refined TSC clocksource calibration: 2893.436 MHz > qemu: terminating on signal 2 <-- (I have to cancel QEMU > process by Ctrl + C) This looks similar to what I saw when driver asks for randomness from the host before probe is completed. Does the following patch help? While the driver is setup (DRIVER_OK is set), the vqs aren't marked usable before the probe for the other device finishes as well. That's not a scenario I considered. We can try to put this patch in 3.16, but it won't be applicable for 3.17, where this is handled the proper way. Also, 3.15 isn't affected, since that doesn't have multi-device support. Also attaching a patch that enables traces in qemu so it's easier to see what's going on. diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c index e9b15bc..124cac5 100644 --- a/drivers/char/hw_random/virtio-rng.c +++ b/drivers/char/hw_random/virtio-rng.c @@ -34,12 +34,11 @@ struct virtrng_info { unsigned int data_avail; struct completion have_data; bool busy; + bool probe_done; char name[25]; int index; }; -static bool probe_done; - static void random_recv_done(struct virtqueue *vq) { struct virtrng_info *vi = vq->vdev->priv; @@ -73,7 +72,7 @@ static int virtio_read(struct hwrng *rng, void *buf, -size_t size, bool wait) * Don't ask host for data till we're setup. This call can * happen during hwrng_register(), after commit d9e7972619. */ - if (unlikely(!probe_done)) + if (unlikely(!vi->probe_done)) return 0; if (!vi->busy) { @@ -146,7 +145,7 @@ static int probe_common(struct virtio_device *vdev) return err; } - probe_done = true; + vi->probe_done = true; return 0; } Amit >From ec1aa555b67628beefa0ac6902baa2cc2e156f58 Mon Sep 17 00:00:00 2001 Message-Id: From: Amit Shah Date: Mon, 21 Jul 2014 14:46:28 +0530 Subject: [PATCH 1/1] virtio-rng: add some trace events Add some trace events to virtio-rng for easier debugging Signed-off-by: Amit Shah --- hw/virtio/virtio-rng.c | 7 +++ trace-events | 5 + 2 files changed, 12 insertions(+) diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c index 7c5a675..4a6472b 100644 --- a/hw/virtio/virtio-rng.c +++ b/hw/virtio/virtio-rng.c @@ -16,6 +16,7 @@ #include "hw/virtio/virtio-rng.h" #include "sysemu/rng.h" #include "qom/object_interfaces.h" +#include "trace.h" static bool is_guest_ready(VirtIORNG *vrng) { @@ -24,6 +25,7 @@ static bool is_guest_ready(VirtIORNG *vrng) && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) { return true; } +trace_virtio_rng_guest_not_ready(vrng); return false; } @@ -62,6 +64,7 @@ static void chr_read(void *opaque, const void *buf, size_t size) offset += len; virtqueue_push(vrng->vq, &elem, len); +trace_virtio_rng_pushed(vrng, len); } virtio_notify(vdev, vrng->vq); } @@ -81,7 +84,11 @@ static void virtio_rng_process(VirtIORNG *vrng) quota = MIN((uint64_t)vrng->quota_remaining, (uint64_t)UINT32_MAX); } size = get_request_size(vrng->vq, quota); + +trace_virtio_rng_request(vrng, size, quota); + size = MIN(vrng->quota_remaining, size); + if (size) { rng_backend_request_entropy(vrng->rng, size, chr_read, vrng); } diff --git a/trace-events b/trace-events index 11a17a8..99f39ac 100644 --- a/trace-events +++ b/trace-events @@ -41,6 +41,11 @@ virtio_irq(void *vq) "vq %p" virtio_notify(void *vdev, void *vq) "vdev %p vq %p" virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u" +# hw/virtio/virtio-rng.c +virtio_rng_guest_not_ready(void *rng) "rng %p: guest not ready" +virtio_rng_pushed(void *rng, size_t len) "rng %p: %zd bytes pushed" +virtio_rng_request(void *rng, size_t size, unsigned quota) "rng %p: %zd bytes
RE: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
> -Original Message- > From: Wood Scott-B07421 > Sent: Saturday, July 26, 2014 3:11 AM > To: Caraman Mihai Claudiu-B02008 > Cc: Alexander Graf; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > linuxppc-...@lists.ozlabs.org > Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for > SPE/FP/AltiVec int numbers > > On Thu, 2014-07-24 at 04:16 -0500, Caraman Mihai Claudiu-B02008 wrote: > > Scott, Alex's request to define SPE handlers only for e500v2 implies > changes > > in 32-bit FSL kernel to have exclusive configurations for e200/e500v2 > and > > e500mc/e5500. We would probably need something like this, what's your > take on it? > > That is already a compile-time decision. Yes, but is not fully implemented. > > > diff --git a/arch/powerpc/kernel/head_fsl_booke.S > b/arch/powerpc/kernel/head_fsl_booke.S > > index b497188..9d41015 100644 > > --- a/arch/powerpc/kernel/head_fsl_booke.S > > +++ b/arch/powerpc/kernel/head_fsl_booke.S > > @@ -613,6 +613,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) > > mfspr r10, SPRN_SPRG_RSCRATCH0 > > b InstructionStorage > > > > +/* Define SPE handlers only for e500v2 and e200 */ > > +#ifndef CONFIG_PPC_E500MC > > #ifdef CONFIG_SPE > > /* SPE Unavailable */ > > START_EXCEPTION(SPEUnavailable) > > @@ -626,7 +628,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) > > EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \ > > unknown_exception, EXC_XFER_EE) > > #endif /* CONFIG_SPE */ > > +#endif > > This is redundant: > > config SPE > bool "SPE Support" > depends on E200 || (E500 && !PPC_E500MC) > default y I see what you mean, but it's not redundant. Alex was looking to remove SPE handlers for e500mc+ and the proposal handled !SPE case. In the new light I find this variant more readable: +/* Define SPE handlers for e200 and e500v2 */ #ifdef CONFIG_SPE /* SPE Unavailable */ START_EXCEPTION(SPEUnavailable) @@ -622,11 +623,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) b fast_exception_return 1: addir3,r1,STACK_FRAME_OVERHEAD EXC_XFER_EE_LITE(0x2010, KernelSPE) -#else +#elif defined(CONFIG_E200) || \ + (defined(CONFIG_E500) && !defined(CONFIG_PPC_E500MC)) EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \ unknown_exception, EXC_XFER_EE) #endif /* CONFIG_SPE */ > > > diff --git a/arch/powerpc/kernel/cputable.c > b/arch/powerpc/kernel/cputable.c > > index c1faade..3ab65c2 100644 > > --- a/arch/powerpc/kernel/cputable.c > > +++ b/arch/powerpc/kernel/cputable.c > > @@ -2030,6 +2030,7 @@ static struct cpu_spec __initdata cpu_specs[] = { > > #endif /* CONFIG_PPC32 */ > > #ifdef CONFIG_E500 > > #ifdef CONFIG_PPC32 > > +#ifndef CONFIG_PPC_E500MC > > { /* e500 */ > > .pvr_mask = 0x, > > .pvr_value = 0x8020, > > @@ -2069,6 +2070,7 @@ static struct cpu_spec __initdata cpu_specs[] = { > > .machine_check = machine_check_e500, > > .platform = "ppc8548", > > }, > > +#endif /* !CONFIG_PPC_E500MC */ > > { /* e500mc */ > > .pvr_mask = 0x, > > .pvr_value = 0x8023, > > > > This looks a bit strange -- e500v2 gets excluded if PPC_E500MC, but > e500mc gets included in !PPC_E500MC? Right, my main purpose was to get rid of __setup_e500_ivors on PPC_E500MC which refers SPEUnavailable. I will add an #else to exclude e500mc. The "generic E500 PPC" default cpu advertises PPC_FEATURE_HAS_SPE_COMP. Do we want to excluded it if PPC_E500MC? Is this cpu useful without cpu_setup() and if so why don't we also have a default for 64-bit? -Mike
Re: When I boot two virtio-rng devices, guest will hang
On (Mon) 28 Jul 2014 [16:49:20], Amos Kong wrote: > On Mon, Jul 28, 2014 at 01:25:14PM +0530, Amit Shah wrote: > > On (Mon) 28 Jul 2014 [15:32:42], Amos Kong wrote: > > > QEMU commandline: > > > > > > ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 2000 -drive > > > file=/images/nolvm.qcow2 --kernel /home/devel/linux/arch/x86/boot/bzImage > > > -append "ro root=/dev/sda1 console=ttyS0,115200" -monitor > > > unix:/tmp/m,nowait,server -device > > > virtio-net-pci,netdev=h0,vectors=17,mq=on,id=n0 -netdev > > > tap,id=h0,queues=8 -device virtio-net-pci,netdev=h1,vectors=0,mq=on,id=n1 > > > -netdev tap,id=h1,queues=8 -vnc :0 -mon > > > chardev=qmp,mode=control,pretty=on -chardev > > > socket,id=qmp,host=localhost,port=1234,server,nowait -serial stdio > > > -object rng-random,filename=/dev/urandom,id=rng0 -device > > > virtio-rng-pci,rng=rng0,id=h0 -object > > > rng-random,filename=/dev/urandom,id=rng1 -device > > > virtio-rng-pci,rng=rng1,id=h1 > > > > > > It works when I only add one virtio-rng device. Did you touch this > > > problem? > > > > > > I'm using latest net-next/master > > > (ac3d2e5a9ef2f4d8f57c50070c4883ecb7cec29f) > > Hi Amit, > > > > > > [0.223503] Non-volatile memory driver v1.3 > > > [1.172293] tsc: Refined TSC clocksource calibration: 2893.436 MHz > > > qemu: terminating on signal 2 <-- (I have to cancel > > > QEMU process by Ctrl + C) > > > > This looks similar to what I saw when driver asks for randomness from the > > host > > before probe is completed. > > > > Does the following patch help? > > This patch was already inclued in latest net-next/master > patch commit: e052dbf554610e2104c5a7518c4d8374bed701bb No, it's a different one, goes on top of the commit you referenced. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: When I boot two virtio-rng devices, guest will hang
On Mon, Jul 28, 2014 at 01:25:14PM +0530, Amit Shah wrote: > On (Mon) 28 Jul 2014 [15:32:42], Amos Kong wrote: > > QEMU commandline: > > > > ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 2000 -drive > > file=/images/nolvm.qcow2 --kernel /home/devel/linux/arch/x86/boot/bzImage > > -append "ro root=/dev/sda1 console=ttyS0,115200" -monitor > > unix:/tmp/m,nowait,server -device > > virtio-net-pci,netdev=h0,vectors=17,mq=on,id=n0 -netdev tap,id=h0,queues=8 > > -device virtio-net-pci,netdev=h1,vectors=0,mq=on,id=n1 -netdev > > tap,id=h1,queues=8 -vnc :0 -mon chardev=qmp,mode=control,pretty=on > > -chardev socket,id=qmp,host=localhost,port=1234,server,nowait -serial stdio > > -object rng-random,filename=/dev/urandom,id=rng0 -device > > virtio-rng-pci,rng=rng0,id=h0 -object > > rng-random,filename=/dev/urandom,id=rng1 -device > > virtio-rng-pci,rng=rng1,id=h1 > > > > It works when I only add one virtio-rng device. Did you touch this > > problem? > > > > I'm using latest net-next/master (ac3d2e5a9ef2f4d8f57c50070c4883ecb7cec29f) Hi Amit, > > > [0.223503] Non-volatile memory driver v1.3 > > [1.172293] tsc: Refined TSC clocksource calibration: 2893.436 MHz > > qemu: terminating on signal 2 <-- (I have to cancel QEMU > > process by Ctrl + C) > > This looks similar to what I saw when driver asks for randomness from the host > before probe is completed. > > Does the following patch help? This patch was already inclued in latest net-next/master patch commit: e052dbf554610e2104c5a7518c4d8374bed701bb > While the driver is setup (DRIVER_OK is set), the vqs aren't marked > usable before the probe for the other device finishes as well. That's > not a scenario I considered. We can try to put this patch in 3.16, > but it won't be applicable for 3.17, where this is handled the proper > way. Also, 3.15 isn't affected, since that doesn't have multi-device > support. > > Also attaching a patch that enables traces in qemu so it's easier to > see what's going on. > > diff --git a/drivers/char/hw_random/virtio-rng.c > b/drivers/char/hw_random/virtio-rng.c > index e9b15bc..124cac5 100644 > --- a/drivers/char/hw_random/virtio-rng.c > +++ b/drivers/char/hw_random/virtio-rng.c > @@ -34,12 +34,11 @@ struct virtrng_info { > unsigned int data_avail; > struct completion have_data; > bool busy; > + bool probe_done; > char name[25]; > int index; > }; > > -static bool probe_done; > - > static void random_recv_done(struct virtqueue *vq) > { > struct virtrng_info *vi = vq->vdev->priv; > @@ -73,7 +72,7 @@ static int virtio_read(struct hwrng *rng, void *buf, > -size_t size, bool wait) > * Don't ask host for data till we're setup. This call can > * happen during hwrng_register(), after commit d9e7972619. > */ > - if (unlikely(!probe_done)) > + if (unlikely(!vi->probe_done)) > return 0; > > if (!vi->busy) { > @@ -146,7 +145,7 @@ static int probe_common(struct virtio_device > *vdev) > return err; > } > > - probe_done = true; > + vi->probe_done = true; > return 0; > } > > > Amit > >From ec1aa555b67628beefa0ac6902baa2cc2e156f58 Mon Sep 17 00:00:00 2001 > Message-Id: > > From: Amit Shah > Date: Mon, 21 Jul 2014 14:46:28 +0530 > Subject: [PATCH 1/1] virtio-rng: add some trace events > > Add some trace events to virtio-rng for easier debugging > > Signed-off-by: Amit Shah > --- > hw/virtio/virtio-rng.c | 7 +++ > trace-events | 5 + > 2 files changed, 12 insertions(+) > > diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c > index 7c5a675..4a6472b 100644 > --- a/hw/virtio/virtio-rng.c > +++ b/hw/virtio/virtio-rng.c > @@ -16,6 +16,7 @@ > #include "hw/virtio/virtio-rng.h" > #include "sysemu/rng.h" > #include "qom/object_interfaces.h" > +#include "trace.h" > > static bool is_guest_ready(VirtIORNG *vrng) > { > @@ -24,6 +25,7 @@ static bool is_guest_ready(VirtIORNG *vrng) > && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) { > return true; > } > +trace_virtio_rng_guest_not_ready(vrng); > return false; > } > > @@ -62,6 +64,7 @@ static void chr_read(void *opaque, const void *buf, size_t > size) > offset += len; > > virtqueue_push(vrng->vq, &elem, len); > +trace_virtio_rng_pushed(vrng, len); > } > virtio_notify(vdev, vrng->vq); > } > @@ -81,7 +84,11 @@ static void virtio_rng_process(VirtIORNG *vrng) > quota = MIN((uint64_t)vrng->quota_remaining, (uint64_t)UINT32_MAX); > } > size = get_request_size(vrng->vq, quota); > + > +trace_virtio_rng_request(vrng, size, quota); > + > size = MIN(vrng->quota_remaining, size); > + > if (size) { > rng_backend_request_entropy(vrng->rng, size, chr_read, vrng); > } > diff --git a/trace-events b/trace-events > index 11a17a8.
Re: deadline of CFP for kvm forum
Il 28/07/2014 04:08, Zhang, Yang Z ha scritto: > Hi all, > > I see the deadline of CFP for KVM forum is July 27, 2014. But I found > there is no kvm forum selection list when I tried to submit a > presentation yesterday. Is the CFP closed early than expected? > > BTW, it is in July 27 in US when sending this mail. The deadline was at midnight UTC. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] target-mips: Ignore unassigned accesses with KVM
MIPS registers an unassigned access handler which raises a guest bus error exception. However this causes QEMU to crash when KVM is enabled as it isn't called from the main execution loop so longjmp() gets called without a corresponding setjmp(). Until the KVM API can be updated to trigger a guest exception in response to an MMIO exit, prevent the bus error exception being raised from mips_cpu_unassigned_access() if KVM is enabled. The check is at run time since the do_unassigned_access callback is initialised before it is known whether KVM will be enabled. The problem can be triggered with Malta emulation by making the guest write to the reset region at physical address 0x1bf0, since it is marked read-only which is treated as unassigned for writes. Signed-off-by: James Hogan Cc: Aurelien Jarno Cc: Peter Maydell Cc: Paolo Bonzini Cc: Gleb Natapov Cc: Christoffer Dall Cc: Sanjay Lal --- target-mips/op_helper.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index 27651a4a00c1..df97b35f8701 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -21,6 +21,7 @@ #include "qemu/host-utils.h" #include "exec/helper-proto.h" #include "exec/cpu_ldst.h" +#include "sysemu/kvm.h" #ifndef CONFIG_USER_ONLY static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global); @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr addr, MIPSCPU *cpu = MIPS_CPU(cs); CPUMIPSState *env = &cpu->env; +/* + * Raising an exception with KVM enabled will crash because it won't be from + * the main execution loop so the longjmp won't have a matching setjmp. + * Until we can trigger a bus error exception through KVM lets just ignore + * the access. + */ +if (kvm_enabled()) { +return; +} + if (is_exec) { helper_raise_exception(env, EXCP_IBE); } else { -- 1.8.5.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/5] Improve PPC instruction emulation
On 19.07.14 12:14, Paul Mackerras wrote: This series aims to increase the range of instructions that KVM on PPC can emulate and reduce code duplication by using the existing instruction emulation code from arch/powerpc/lib/sstep.c for KVM. The ultimate goal is to make PR KVM run faster on the kind of instruction sequences that we get in Linux's first-level interrupt handlers, where we have privileged instructions such as move to/from SPR, mtmsrd, rfid, etc., intermingled with ordinary unprivileged loads, stores, arithmetic instructions, etc. If KVM could emulate those ordinary instructions as well as the privileged ones, we could emulate these code sequences without incurring the cost to exit and re-enter the guest for every single privileged instruction. That would be a speedup provided the guest entry/exit cost was greater than the cost of emulating a few ordinary instructions. This series doesn't get to that ultimate goal but does lay the groundwork. It splits the emulate_step() function into two parts, analyse_instr() and emulate_step(), and uses analyse_instr() in kvmppc_emulate_instruction(). This means that KVM needs to store its vcpu integer register state in a struct pt_regs like the rest of the kernel does. We also need to make kvmppc_handle_load() and kvmppc_handle_store() handle loads and stores to ordinary guest memory as well as emulated MMIO. Please take a look at my other patch set that implemented instruction emulation. There we split the code paths between MMIO emulation and normal instruction emulation. I really think that approach is a prerequisite to doing full instruction emulation in longer code snippets. Obviously the generic load/store should then handle MMIO as well as generic memory operations. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/5] powerpc: Split out instruction analysis part of emulate_step()
On 19.07.14 12:14, Paul Mackerras wrote: This splits out the instruction analysis part of emulate_step() into a separate analyse_instr() function, which decodes the instruction, but doesn't execute any load or store instructions. It does execute integer instructions and branches which can be executed purely by updating register values in the pt_regs struct. For other instructions, it returns the instruction type and other details in a new instruction_op struct. emulate_step() then uses that information to execute loads, stores, cache operations, mfmsr, mtmsr[d], and (on 64-bit) sc instructions. Signed-off-by: Paul Mackerras This should definitely get posted on the normal Linux PPC list. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: When I boot two virtio-rng devices, guest will hang
On Mon, Jul 28, 2014 at 02:42:13PM +0530, Amit Shah wrote: > On (Mon) 28 Jul 2014 [16:49:20], Amos Kong wrote: > > On Mon, Jul 28, 2014 at 01:25:14PM +0530, Amit Shah wrote: > > > On (Mon) 28 Jul 2014 [15:32:42], Amos Kong wrote: > > > > QEMU commandline: > > > > > > > > ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 2000 -drive > > > > file=/images/nolvm.qcow2 --kernel > > > > /home/devel/linux/arch/x86/boot/bzImage -append "ro root=/dev/sda1 > > > > console=ttyS0,115200" -monitor unix:/tmp/m,nowait,server -device > > > > virtio-net-pci,netdev=h0,vectors=17,mq=on,id=n0 -netdev > > > > tap,id=h0,queues=8 -device > > > > virtio-net-pci,netdev=h1,vectors=0,mq=on,id=n1 -netdev > > > > tap,id=h1,queues=8 -vnc :0 -mon chardev=qmp,mode=control,pretty=on > > > > -chardev socket,id=qmp,host=localhost,port=1234,server,nowait -serial > > > > stdio -object rng-random,filename=/dev/urandom,id=rng0 -device > > > > virtio-rng-pci,rng=rng0,id=h0 -object > > > > rng-random,filename=/dev/urandom,id=rng1 -device > > > > virtio-rng-pci,rng=rng1,id=h1 > > > > > > > > It works when I only add one virtio-rng device. Did you touch this > > > > problem? > > > > > > > > I'm using latest net-next/master > > > > (ac3d2e5a9ef2f4d8f57c50070c4883ecb7cec29f) > > > > Hi Amit, > > > > > > > > > [0.223503] Non-volatile memory driver v1.3 > > > > [1.172293] tsc: Refined TSC clocksource calibration: 2893.436 MHz > > > > qemu: terminating on signal 2 <-- (I have to cancel > > > > QEMU process by Ctrl + C) > > > > > > This looks similar to what I saw when driver asks for randomness from the > > > host > > > before probe is completed. > > > > > > Does the following patch help? > > > > This patch was already inclued in latest net-next/master > > patch commit: e052dbf554610e2104c5a7518c4d8374bed701bb > > No, it's a different one, goes on top of the commit you referenced. Thanks. This patch fixed the problem. -- Amos. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 4/5] KVM: PPC: Use analyse_instr() in kvmppc_emulate_instruction()
On 19.07.14 12:14, Paul Mackerras wrote: This changes kvmppc_emulate_instruction() to use the common instruction decoding code from arch/powerpc/lib/sstep.c. This expands the set of instructions that we recognize to include all of the integer load and store instructions except for the string (lsw*, stsw*) and multiple (lmw, stmw) instructions and reduces the total amount of code. This removes kvmppc_handle_loads() and instead adds a 'sign_extend' parameter to kvmppc_handle_load(). (In fact kvmppc_handle_loads() could not have worked previously; it sets vcpu->arch.mmio_sign_extend to 1 before calling kvmppc_handle_load, which sets it back to 0.) The instruction emulation for specific CPU flavours is largely unchanged, except that emulation of mfmsr, eieio, and (for book 3S) mtmsr[d] has moved into the generic code, and tlbsync emulation into the CPU-specific code. At this point the code still assumes that the instruction caused the most recent guest exit, and that if the instruction is a load or store, it must have been an emulated MMIO access and vcpu->arch.paddr_accessed already contains the guest physical address being accessed. Signed-off-by: Paul Mackerras --- arch/powerpc/include/asm/kvm_ppc.h | 5 +- arch/powerpc/kvm/book3s_emulate.c| 22 +-- arch/powerpc/kvm/book3s_paired_singles.c | 6 +- arch/powerpc/kvm/booke_emulate.c | 8 +- arch/powerpc/kvm/emulate.c | 317 ++- arch/powerpc/kvm/powerpc.c | 17 +- 6 files changed, 110 insertions(+), 265 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 246fb9a..9318cf3 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -54,10 +54,7 @@ extern void kvmppc_handler_highmem(void); extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu); extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int rt, unsigned int bytes, - int is_default_endian); -extern int kvmppc_handle_loads(struct kvm_run *run, struct kvm_vcpu *vcpu, - unsigned int rt, unsigned int bytes, - int is_default_endian); + int is_default_endian, int sign_extend); extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, u64 val, unsigned int bytes, int is_default_endian); diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 84fddcd..f74b8d5 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -129,24 +129,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, break; case 31: switch (get_xop(inst)) { - case OP_31_XOP_MFMSR: - kvmppc_set_gpr(vcpu, rt, kvmppc_get_msr(vcpu)); - break; - case OP_31_XOP_MTMSRD: - { - ulong rs_val = kvmppc_get_gpr(vcpu, rs); - if (inst & 0x1) { - ulong new_msr = kvmppc_get_msr(vcpu); - new_msr &= ~(MSR_RI | MSR_EE); - new_msr |= rs_val & (MSR_RI | MSR_EE); - kvmppc_set_msr_fast(vcpu, new_msr); - } else - kvmppc_set_msr(vcpu, rs_val); - break; - } - case OP_31_XOP_MTMSR: - kvmppc_set_msr(vcpu, kvmppc_get_gpr(vcpu, rs)); - break; case OP_31_XOP_MFSR: { int srnum; @@ -189,6 +171,8 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, vcpu->arch.mmu.tlbie(vcpu, addr, large); break; } + case OP_31_XOP_TLBSYNC: + break; #ifdef CONFIG_PPC_BOOK3S_64 case OP_31_XOP_FAKE_SC1: { @@ -217,8 +201,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, break; } #endif - case OP_31_XOP_EIOIO: - break; case OP_31_XOP_SLBMTE: if (!vcpu->arch.mmu.slbmte) return EMULATE_FAIL; diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c index 6c8011f..a5bde19 100644 --- a/arch/powerpc/kvm/book3s_paired_singles.c +++ b/arch/powerpc/kvm/book3s_paired_singles.c @@ -200,7 +200,7 @@ static int kvmppc_emulate_fpr_load(struct kvm_run *run, struct kvm_vcpu *vcpu, goto done_load; } else if
Re: [RFC PATCH 5/5] KVM: PPC: Book3S: Make kvmppc_handle_load/store handle any load or store
On 19.07.14 12:14, Paul Mackerras wrote: At present, kvmppc_handle_load and kvmppc_handle_store only handle emulated MMIO loads and stores. This extends them to be able to handle loads and stores to guest memory as well. This is so that kvmppc_emulate_instruction can be used to emulate loads and stores in cases other than when an attempt to execute the instruction by the CPU has resulted in an interrupt. To avoid having to look up the translation for the effective address again in kvmppc_handle_load/store when the caller of kvmppc_emulate_mmio has already done it, we arrange to pass down the translation in a new struct kvmppc_translated_address, which is a new argument to kvmppc_emulate_mmio() and kvmppc_emulate_instruction(). This also enables us to check that the guest hasn't replaced a load with a store instruction. This also makes the register updates for the paired-single FPU registers match for emulated MMIO accesses what is done for accesses to normal memory. The new code for accessing normal guest memory uses kvmppc_ld and kvmppc_st, which call kvmppc_xlate, which is only defined for Book 3S. For Book E, kvmppc_handle_load/store still only work for emulated MMIO. Signed-off-by: Paul Mackerras Please check out my other patch set where I made kvmppc_ld/st available for BookE and also split the MMIO path off completely. Since we do want to take the shortcut through paddr that we only know for memory traps, I really think we're better off treating that whole optimized code path as a separate piece. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication
On 19.07.14 09:59, Paul Mackerras wrote: At present, kvmppc_ld calls kvmppc_xlate, and if kvmppc_xlate returns any error indication, it returns -ENOENT, which is taken to mean an HPTE not found error. However, the error could have been a segment found (no SLB entry) or a permission error. Similarly, kvmppc_pte_to_hva currently does permission checking, but any error from it is taken by kvmppc_ld to mean that the access is an emulated MMIO access. Also, kvmppc_ld does no execute permission checking. This fixes these problems by (a) returning any error from kvmppc_xlate directly, (b) moving the permission check from kvmppc_pte_to_hva into kvmppc_ld, and (c) adding an execute permission check to kvmppc_ld. This is similar to what was done for kvmppc_st() by commit 82ff911317c3 ("KVM: PPC: Deflect page write faults properly in kvmppc_st"). Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s.c | 25 - 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 31facfc..087f8f9 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -413,17 +413,10 @@ static hva_t kvmppc_bad_hva(void) return PAGE_OFFSET; } -static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte, - bool read) +static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte) { hva_t hpage; - if (read && !pte->may_read) - goto err; - - if (!read && !pte->may_write) - goto err; - hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT); if (kvm_is_error_hva(hpage)) goto err; @@ -462,15 +455,23 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, { struct kvmppc_pte pte; hva_t hva = *eaddr; + int rc; vcpu->stat.ld++; - if (kvmppc_xlate(vcpu, *eaddr, data, false, &pte)) - goto nopte; + rc = kvmppc_xlate(vcpu, *eaddr, data, false, &pte); + if (rc) + return rc; *eaddr = pte.raddr; - hva = kvmppc_pte_to_hva(vcpu, &pte, true); + if (!pte.may_read) + return -EPERM; + + if (!data && !pte.may_execute) + return -ENOEXEC; We should probably do a full audit of all that code and decide who is responsible for returning errors where. IIRC our MMU frontends already check pte.may* and return error codes respectively. However, double-checking doesn't hurt for now, so I've applied this patch regardless. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Fixes for PPC Book3S KVM
On 19.07.14 09:59, Paul Mackerras wrote: Here are three small fixes for the PPC Book 3S code. The first should go into 3.16 if possible, I think, or if not, certainly 3.17. The remaining two are less urgent and should go into 3.17. The patch series is against Alex Graf's kvm-ppc-queue branch. Thanks, applied all to kvm-ppc-queue with the first two marked as CC: stable. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
On 06.06.14 18:27, Aneesh Kumar K.V wrote: Alexander Graf writes: On 05.06.14 14:08, Aneesh Kumar K.V wrote: virtual time base register is a per VM, per cpu register that needs to be saved and restored on vm exit and entry. Writing to VTB is not allowed in the privileged mode. Signed-off-by: Aneesh Kumar K.V For some reason BUG() doesn't always trigger the "execution stops here" logic in gcc. So I've squashed this patch into yours. Alex diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 3e7085d..99de6ad 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1206,6 +1206,7 @@ static inline unsigned long mfvtb (void) * capture that. */ BUG(); +return 0; } #ifdef __powerpc64__ you can then drop the include header change. ie, #include Yeah, things are even worse than I thought. I've eventually squashed the following in. a NOP'ed mfspr() won't keep the branch from blr'ing, so we'd never hit the BUG() anyway. Alex diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 1f34ef7..c8f3381 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -25,7 +25,6 @@ #ifdef CONFIG_8xx #include #endif /* CONFIG_8xx */ -#include #define MSR_SF_LG 63 /* Enable 64 bit mode */ #define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */ @@ -1210,12 +1209,6 @@ static inline unsigned long mfvtb (void) if (cpu_has_feature(CPU_FTR_ARCH_207S)) return mfspr(SPRN_VTB); #endif - /* -* The above mfspr will be a no-op on anything before Power8 -* That can result in random values returned. We need to -* capture that. -*/ - BUG(); return 0; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC 0/5] s390x/kvm: track the logical cpu state in QEMU and propagate it to kvm
On 10.07.14 15:27, David Hildenbrand wrote: This is the qemu part of kernel series "Let user space control the cpu states" Christian Borntraeger (1): update linux headers with with cpustate changes David Hildenbrand (4): s390x/kvm: introduce proper states for s390 cpus s390x/kvm: proper use of the cpu states OPERATING and STOPPED s390x/kvm: test whether a cpu is STOPPED when checking "has_work" s390x/kvm: propagate s390 cpu state to kvm hw/s390x/ipl.c| 2 +- hw/s390x/s390-virtio.c| 32 -- linux-headers/linux/kvm.h | 7 ++- target-s390x/cpu.c| 106 +++--- target-s390x/cpu.h| 33 +-- target-s390x/helper.c | 11 ++--- target-s390x/kvm.c| 49 ++--- trace-events | 6 +++ 8 files changed, 179 insertions(+), 67 deletions(-) Looks good to me -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html @all thought it was the final internal review :) It's a perfectly good thing to say "looks good to me" in public too. The only major difference is that usually you would say "reviewed-by" ;). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 0/5] s390x/kvm: track the logical cpu state in QEMU and propagate it to kvm
On 28.07.2014, at 15:43, Alexander Graf wrote: > > On 10.07.14 15:27, David Hildenbrand wrote: This is the qemu part of kernel series "Let user space control the cpu states" Christian Borntraeger (1): update linux headers with with cpustate changes David Hildenbrand (4): s390x/kvm: introduce proper states for s390 cpus s390x/kvm: proper use of the cpu states OPERATING and STOPPED s390x/kvm: test whether a cpu is STOPPED when checking "has_work" s390x/kvm: propagate s390 cpu state to kvm hw/s390x/ipl.c| 2 +- hw/s390x/s390-virtio.c| 32 -- linux-headers/linux/kvm.h | 7 ++- target-s390x/cpu.c| 106 +++--- target-s390x/cpu.h| 33 +-- target-s390x/helper.c | 11 ++--- target-s390x/kvm.c| 49 ++--- trace-events | 6 +++ 8 files changed, 179 insertions(+), 67 deletions(-) >>> Looks good to me >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe kvm" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> @all thought it was the final internal review :) > > It's a perfectly good thing to say "looks good to me" in public too. The only > major difference is that usually you would say "reviewed-by" ;). Meh - only realized after I sent this that all those patches are From: you. Then of course it's not useful ;) Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only "disabled wait" is processed within QEMU. Signed-off-by: David Hildenbrand Reviewed-by: Cornelia Huck Reviewed-by: Christian Borntraeger Signed-off-by: Christian Borntraeger This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work -> has_work. Why don't we get one for s390x? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. arch/powerpc/include/asm/kvm_ppc.h | 3 + arch/powerpc/kvm/booke.c | 27 +++ arch/powerpc/kvm/booke_emulate.c | 157 + 3 files changed, 187 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e2fd5a1..f3f7611 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -173,6 +173,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq); extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq); +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu); +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu); + union kvmppc_one_reg { u32 wval; u64 dval; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index fadfe76..c2471ed 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -264,6 +264,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu) clear_bit(BOOKE_IRQPRIO_WATCHDOG, &vcpu->arch.pending_exceptions); } +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu) +{ + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG); +} + +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu) +{ + clear_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions); +} + static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1) { #ifdef CONFIG_KVM_BOOKE_HV @@ -783,6 +793,23 @@ static int kvmppc_handle_debug(struct kvm_run *run, struct kvm_vcpu *vcpu) struct debug_reg *dbg_reg = &(vcpu->arch.shadow_dbg_reg); u32 dbsr = vcpu->arch.dbsr; + if (vcpu->guest_debug == 0) { + /* Debug resources belong to Guest */ + if (dbsr && (vcpu->arch.shared->msr & MSR_DE)) + kvmppc_core_queue_debug(vcpu); + + /* Inject a program interrupt if trap debug is not allowed */ + if ((dbsr & DBSR_TIE) && !(vcpu->arch.shared->msr & MSR_DE)) + kvmppc_core_queue_program(vcpu, ESR_PTR); In that case we would've received a program interrupt and never entered this code path, no? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
On 27.06.14 15:35, Aneesh Kumar K.V wrote: When calculating the lower bits of AVA field, use the shift count based on the base page size. Also add the missing segment size and remove stale comment. Signed-off-by: Aneesh Kumar K.V Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug
On 11.07.14 10:38, Bharat Bhushan wrote: When userspace (QEMU) is using the debug resource to debug guest then we want MSR_DE to be always set. This patch adds missing MSR_DE setting in "rfci" instruction. Signed-off-by: Bharat Bhushan Shouldn't this be in kvmppc_set_msr() instead then to catch all users? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
> > On 10.07.14 15:10, Christian Borntraeger wrote: > > From: David Hildenbrand > > > > If a cpu is stopped, it must never be allowed to run and no interrupt may > > wake it > > up. A cpu also has to be unhalted if it is halted and has work to do - this > > scenario wasn't hit in kvm case yet, as only "disabled wait" is processed > > within > > QEMU. > > > > Signed-off-by: David Hildenbrand > > Reviewed-by: Cornelia Huck > > Reviewed-by: Christian Borntraeger > > Signed-off-by: Christian Borntraeger > > This looks like it's something that generic infrastructure should take > care of, no? How does this work for the other archs? They always get an > interrupt on the transition between !has_work -> has_work. Why don't we > get one for s390x? > > > Alex > > Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to "kvm_arch_process_async_events()". We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to "suppress" the interrupt for such a transition you mentioned. Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs->halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as "stopped cpus", only "halted/waiting cpus". David -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/33] KVM: PPC: Fix IRQ race in magic page code
On 25.06.14 02:21, Scott Wood wrote: On Wed, 2014-06-25 at 01:40 +0200, Alexander Graf wrote: On 25.06.14 01:15, Scott Wood wrote: On Wed, 2014-06-25 at 00:41 +0200, Alexander Graf wrote: On 24.06.14 20:53, Scott Wood wrote: The timer interrupt works, but I'm not fully convinced that it's a good idea for things like MC events which we also block during critical sections on e500v2. Are you concerned about the guest seeing machine checks that are (more) asynchronous with the error condition? e500v2 machine checks are always asynchronous. From the core manual: Machine check interrupts are typically caused by a hardware or memory subsystem failure or by an attempt to access an invalid address. They may be caused indirectly by execution of an instruction, but may not be recognized or reported until long after the processor has executed past the instruction that caused the machine check. As such, machine check interrupts are not thought of as synchronous or asynchronous nor as precise or imprecise. I don't think the lag would be a problem, and certainly it's better than the current situation. So what value would you set the timer to? If the value is too small, we never finish the critical section. If it's too big, we add lots of jitter. Maybe something like 100us? Single stepping would be better, though. Single stepping is hard enough to get right on interaction between QEMU, KVM and the guest. I didn't really want to make that stuff any more complicated. I'm not sure that it would add much complexity. We'd just need to check whether any source other than the magic page turned wants DCBR0_IC on, to determine whether to exit to userspace or not. What if the guest is single stepping itself? How do we determine when to unset the bit again? When we get out of the critical section? How do we know what the value was before we set it? Keep track of each requester of single stepping separately, and only ever set the real bit by ORing them. Considering that Paul started working on integrating the in-kernel emulator with KVM I think we're best off to just wait for that one and then use it :). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
Il 28/07/2014 16:16, David Hildenbrand ha scritto: > Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a > SIGP START to that vcpu). > > I am not sure if such a mechanism/scenario is applicable to any other arch. > They > all seem to reset the cs->halted flag if they know they are able to run (e.g. > due to an interrupt) - they have no such thing as "stopped cpus", only > "halted/waiting cpus". On x86, INIT_RECEIVED is pretty much a stopped CPU. It can only run (and receive interrupts) after getting a special startup interrupt ("SIPI"). Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
On 28.07.2014, at 16:16, David Hildenbrand wrote: >> >> On 10.07.14 15:10, Christian Borntraeger wrote: >>> From: David Hildenbrand >>> >>> If a cpu is stopped, it must never be allowed to run and no interrupt may >>> wake it >>> up. A cpu also has to be unhalted if it is halted and has work to do - this >>> scenario wasn't hit in kvm case yet, as only "disabled wait" is processed >>> within >>> QEMU. >>> >>> Signed-off-by: David Hildenbrand >>> Reviewed-by: Cornelia Huck >>> Reviewed-by: Christian Borntraeger >>> Signed-off-by: Christian Borntraeger >> >> This looks like it's something that generic infrastructure should take >> care of, no? How does this work for the other archs? They always get an >> interrupt on the transition between !has_work -> has_work. Why don't we >> get one for s390x? >> >> >> Alex >> >> > > Well, we have the special case on s390 as a CPU that is in the STOPPED or the > CHECK STOP state may never run - even if there is an interrupt. It's > basically like this CPU has been switched off. > > Imagine that it is tried to inject an interrupt into a stopped vcpu. It > will kick the stopped vcpu and thus lead to a call to > "kvm_arch_process_async_events()". We have to deny that this vcpu will ever > run as long as it is stopped. It's like a way to "suppress" the > interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). > > Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a > SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. > I am not sure if such a mechanism/scenario is applicable to any other arch. > They > all seem to reset the cs->halted flag if they know they are able to run (e.g. > due to an interrupt) - they have no such thing as "stopped cpus", only > "halted/waiting cpus". There's not really much difference between the two. The only difference from a software point of view is that a "stopped" CPU has its external interrupt bits masked off, no? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] KVM: PPC: Separate loadstore emulation from priv emulation
Today the instruction emulator can get called via 2 separate code paths. It can either be called by MMIO emulation detection code or by privileged instruction traps. This is bad, as both code paths prepare the environment differently. For MMIO emulation we already know the virtual address we faulted on, so instructions there don't have to actually fetch that information. Split out the two separate use cases into separate files. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_ppc.h | 1 + arch/powerpc/kvm/Makefile| 4 +- arch/powerpc/kvm/emulate.c | 192 + arch/powerpc/kvm/emulate_loadstore.c | 266 +++ arch/powerpc/kvm/powerpc.c | 2 +- 5 files changed, 272 insertions(+), 193 deletions(-) create mode 100644 arch/powerpc/kvm/emulate_loadstore.c diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 17fa277..2214ee6 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -86,6 +86,7 @@ extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu); +extern int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu); extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu); extern void kvmppc_emulate_dec(struct kvm_vcpu *vcpu); extern u32 kvmppc_get_dec(struct kvm_vcpu *vcpu, u64 tb); diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 777f894..1ccd7a1 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -13,8 +13,9 @@ common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ CFLAGS_e500_mmu.o := -I. CFLAGS_e500_mmu_host.o := -I. CFLAGS_emulate.o := -I. +CFLAGS_emulate_loadstore.o := -I. -common-objs-y += powerpc.o emulate.o +common-objs-y += powerpc.o emulate.o emulate_loadstore.o obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o obj-$(CONFIG_KVM_BOOK3S_HANDLER) += book3s_exports.o @@ -91,6 +92,7 @@ kvm-book3s_64-module-objs += \ $(KVM)/eventfd.o \ powerpc.o \ emulate.o \ + emulate_loadstore.o \ book3s.o \ book3s_64_vio.o \ book3s_rtas.o \ diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index c5c64b6..e96b50d 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -207,25 +207,12 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) return emulated; } -/* XXX to do: - * lhax - * lhaux - * lswx - * lswi - * stswx - * stswi - * lha - * lhau - * lmw - * stmw - * - */ /* XXX Should probably auto-generate instruction decoding for a particular core * from opcode tables in the future. */ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) { u32 inst; - int ra, rs, rt, sprn; + int rs, rt, sprn; enum emulation_result emulated; int advance = 1; @@ -238,7 +225,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) pr_debug("Emulating opcode %d / %d\n", get_op(inst), get_xop(inst)); - ra = get_ra(inst); rs = get_rs(inst); rt = get_rt(inst); sprn = get_sprn(inst); @@ -270,200 +256,24 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) #endif advance = 0; break; - case OP_31_XOP_LWZX: - emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1); - break; - - case OP_31_XOP_LBZX: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - break; - - case OP_31_XOP_LBZUX: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - - case OP_31_XOP_STWX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), - 4, 1); - break; - - case OP_31_XOP_STBX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), - 1, 1); - break; - - case OP_31_XOP_STBUX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), - 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed); - break; - -
[PATCH 3/7] KVM: PPC: Remove kvmppc_bad_hva()
We have a proper define for invalid HVA numbers. Use those instead of the ppc specific kvmppc_bad_hva(). Signed-off-by: Alexander Graf --- arch/powerpc/kvm/powerpc.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2c5a1c3..3d59730 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -309,11 +309,6 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio); -static hva_t kvmppc_bad_hva(void) -{ - return PAGE_OFFSET; -} - static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte) { hva_t hpage; @@ -324,7 +319,7 @@ static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte) return hpage | (pte->raddr & ~PAGE_MASK); err: - return kvmppc_bad_hva(); + return KVM_HVA_ERR_BAD; } int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] KVM: PPC: Handle magic page in kvmppc_ld/st
We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may as well access the magic page. Special case it out so that we can properly access it. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 7 +++ arch/powerpc/include/asm/kvm_booke.h | 10 ++ arch/powerpc/kvm/powerpc.c| 22 ++ 3 files changed, 39 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 172fd6d..6166791 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -286,6 +286,13 @@ static inline bool is_kvmppc_resume_guest(int r) return (r == RESUME_GUEST || r == RESUME_GUEST_NV); } +static inline bool is_kvmppc_hv_enabled(struct kvm *kvm); +static inline bool kvmppc_supports_magic_page(struct kvm_vcpu *vcpu) +{ + /* Only PR KVM supports the magic page */ + return !is_kvmppc_hv_enabled(vcpu->kvm); +} + /* Magic register values loaded into r3 and r4 before the 'sc' assembly * instruction for the OSI hypercalls */ #define OSI_SC_MAGIC_R30x113724FA diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index cbb1990..f7aa5cc 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -103,4 +103,14 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu) { return vcpu->arch.fault_dear; } + +static inline bool kvmppc_supports_magic_page(struct kvm_vcpu *vcpu) +{ + /* Magic page is only supported on e500v2 */ +#ifdef CONFIG_KVM_E500V2 + return true; +#else + return false; +#endif +} #endif /* __ASM_KVM_BOOKE_H__ */ diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index be40886..544d1d3 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -312,6 +312,7 @@ EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio); int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data) { + ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK; struct kvmppc_pte pte; int r; @@ -327,6 +328,16 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, if (!pte.may_write) return -EPERM; + /* Magic page override */ + if (kvmppc_supports_magic_page(vcpu) && mp_pa && + ((pte.raddr & KVM_PAM & PAGE_MASK) == mp_pa) && + !(kvmppc_get_msr(vcpu) & MSR_PR)) { + void *magic = vcpu->arch.shared; + magic += pte.eaddr & 0xfff; + memcpy(magic, ptr, size); + return EMULATE_DONE; + } + if (kvm_write_guest(vcpu->kvm, pte.raddr, ptr, size)) return EMULATE_DO_MMIO; @@ -337,6 +348,7 @@ EXPORT_SYMBOL_GPL(kvmppc_st); int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data) { + ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK; struct kvmppc_pte pte; int rc; @@ -355,6 +367,16 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, if (!data && !pte.may_execute) return -ENOEXEC; + /* Magic page override */ + if (kvmppc_supports_magic_page(vcpu) && mp_pa && + ((pte.raddr & KVM_PAM & PAGE_MASK) == mp_pa) && + !(kvmppc_get_msr(vcpu) & MSR_PR)) { + void *magic = vcpu->arch.shared; + magic += pte.eaddr & 0xfff; + memcpy(ptr, magic, size); + return EMULATE_DONE; + } + if (kvm_read_guest(vcpu->kvm, pte.raddr, ptr, size)) return EMULATE_DO_MMIO; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] KVM: PPC: Use kvm_read_guest in kvmppc_ld
We have a nice and handy helper to read from guest physical address space, so we should make use of it in kvmppc_ld as we already do for its counterpart in kvmppc_st. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/powerpc.c | 27 ++- 1 file changed, 2 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 3d59730..be40886 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -309,19 +309,6 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio); -static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte) -{ - hva_t hpage; - - hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT); - if (kvm_is_error_hva(hpage)) - goto err; - - return hpage | (pte->raddr & ~PAGE_MASK); -err: - return KVM_HVA_ERR_BAD; -} - int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data) { @@ -351,7 +338,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data) { struct kvmppc_pte pte; - hva_t hva = *eaddr; int rc; vcpu->stat.ld++; @@ -369,19 +355,10 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, if (!data && !pte.may_execute) return -ENOEXEC; - hva = kvmppc_pte_to_hva(vcpu, &pte); - if (kvm_is_error_hva(hva)) - goto mmio; - - if (copy_from_user(ptr, (void __user *)hva, size)) { - printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva); - goto mmio; - } + if (kvm_read_guest(vcpu->kvm, pte.raddr, ptr, size)) + return EMULATE_DO_MMIO; return EMULATE_DONE; - -mmio: - return EMULATE_DO_MMIO; } EXPORT_SYMBOL_GPL(kvmppc_ld); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] KVM: PPC: Move kvmppc_ld/st to common code
We have enough common infrastructure now to resolve GVA->GPA mappings at runtime. With this we can move our book3s specific helpers to load / store in guest virtual address space to common code as well. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h | 4 +- arch/powerpc/include/asm/kvm_ppc.h| 4 ++ arch/powerpc/kvm/book3s.c | 81 --- arch/powerpc/kvm/powerpc.c| 81 +++ 5 files changed, 88 insertions(+), 84 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index a86ca65..172fd6d 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -148,8 +148,8 @@ extern void kvmppc_mmu_hpte_sysexit(void); extern int kvmppc_mmu_hv_init(void); extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); +/* XXX remove this export when load_last_inst() is generic */ extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); -extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); extern void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 11385bb..66f5b59 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -111,15 +111,15 @@ struct kvm_vcpu_stat { u32 halt_wakeup; u32 dbell_exits; u32 gdbell_exits; + u32 ld; + u32 st; #ifdef CONFIG_PPC_BOOK3S u32 pf_storage; u32 pf_instruc; u32 sp_storage; u32 sp_instruc; u32 queue_intr; - u32 ld; u32 ld_slow; - u32 st; u32 st_slow; #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 1a60af9..17fa277 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -80,6 +80,10 @@ extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type, u32 *inst); +extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, +bool data); +extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, +bool data); extern int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu); extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 0b6c84e..de8da33 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -410,87 +410,6 @@ int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid, return r; } -static hva_t kvmppc_bad_hva(void) -{ - return PAGE_OFFSET; -} - -static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte) -{ - hva_t hpage; - - hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT); - if (kvm_is_error_hva(hpage)) - goto err; - - return hpage | (pte->raddr & ~PAGE_MASK); -err: - return kvmppc_bad_hva(); -} - -int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, - bool data) -{ - struct kvmppc_pte pte; - int r; - - vcpu->stat.st++; - - r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, -XLATE_WRITE, &pte); - if (r < 0) - return r; - - *eaddr = pte.raddr; - - if (!pte.may_write) - return -EPERM; - - if (kvm_write_guest(vcpu->kvm, pte.raddr, ptr, size)) - return EMULATE_DO_MMIO; - - return EMULATE_DONE; -} -EXPORT_SYMBOL_GPL(kvmppc_st); - -int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, - bool data) -{ - struct kvmppc_pte pte; - hva_t hva = *eaddr; - int rc; - - vcpu->stat.ld++; - - rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, - XLATE_READ, &pte); - if (rc) - return rc; - - *eaddr = pte.raddr; - - if (!pte.may_read) - return -EPERM; - - if (!data && !pte.may_execute) - return -ENOEXEC; - - hva = kvmppc_pte_to_hva(vcpu, &pte); - if (kvm_is_error_hva(hva)) - goto mmio; - - if (copy_from_user(ptr, (void __user *)hva, size)) { - printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva); - goto mmio; - } - - retur
[PATCH 7/7] KVM: PPC: Expose helper functions for data/inst faults
We're going to implement guest code interpretation in KVM for some rare corner cases. This code needs to be able to inject data and instruction faults into the guest when it encounters them. Expose generic APIs to do this in a reasonably subarch agnostic fashion. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_ppc.h | 8 arch/powerpc/kvm/book3s.c | 17 + arch/powerpc/kvm/booke.c | 16 ++-- 3 files changed, 35 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 2214ee6..cbee453 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -132,6 +132,14 @@ extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu); extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq); extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu); +extern void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, ulong dear_flags, + ulong esr_flags); +extern void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, + ulong dear_flags, + ulong esr_flags); +extern void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu); +extern void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, + ulong esr_flags); extern void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu); extern int kvmppc_core_check_requests(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index de8da33..dd03f6b 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -230,6 +230,23 @@ void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu) kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL_LEVEL); } +void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar, + ulong flags) +{ + kvmppc_set_dar(vcpu, dar); + kvmppc_set_dsisr(vcpu, flags); + kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE); +} + +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags) +{ + u64 msr = kvmppc_get_msr(vcpu); + msr &= ~(SRR1_ISI_NOPT | SRR1_ISI_N_OR_G | SRR1_ISI_PROT); + msr |= flags & (SRR1_ISI_NOPT | SRR1_ISI_N_OR_G | SRR1_ISI_PROT); + kvmppc_set_msr_fast(vcpu, msr); + kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE); +} + int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) { int deliver = 1; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 2f697b4..f30948a 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -185,24 +185,28 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, set_bit(priority, &vcpu->arch.pending_exceptions); } -static void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, -ulong dear_flags, ulong esr_flags) +void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, +ulong dear_flags, ulong esr_flags) { vcpu->arch.queued_dear = dear_flags; vcpu->arch.queued_esr = esr_flags; kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DTLB_MISS); } -static void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, - ulong dear_flags, ulong esr_flags) +void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, + ulong dear_flags, ulong esr_flags) { vcpu->arch.queued_dear = dear_flags; vcpu->arch.queued_esr = esr_flags; kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE); } -static void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, - ulong esr_flags) +void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu) +{ + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_ITLB_MISS); +} + +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong esr_flags) { vcpu->arch.queued_esr = esr_flags; kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_INST_STORAGE); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] KVM: PPC: Implement kvmppc_xlate for all targets
We have a nice API to find the translated GPAs of a GVA including protection flags. So far we only use it on Book3S, but there's no reason the same shouldn't be used on BookE as well. Implement a kvmppc_xlate() version for BookE and clean it up to make it more readable in general. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_ppc.h | 13 ++ arch/powerpc/kvm/book3s.c | 12 ++--- arch/powerpc/kvm/booke.c | 51 ++ 3 files changed, 72 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e381363..1a60af9 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -52,6 +52,16 @@ enum instruction_type { INST_SC,/* system call */ }; +enum xlate_instdata { + XLATE_INST, /* translate instruction address */ + XLATE_DATA /* translate data address */ +}; + +enum xlate_readwrite { + XLATE_READ, /* check for read permissions */ + XLATE_WRITE /* check for write permissions */ +}; + extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu); extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu); extern void kvmppc_handler_highmem(void); @@ -94,6 +104,9 @@ extern gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index, gva_t eaddr); extern void kvmppc_mmu_dtlb_miss(struct kvm_vcpu *vcpu); extern void kvmppc_mmu_itlb_miss(struct kvm_vcpu *vcpu); +extern int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, + enum xlate_instdata xlid, enum xlate_readwrite xlrw, + struct kvmppc_pte *pte); extern struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index a3cbada..0b6c84e 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -380,9 +380,11 @@ pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing, } EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn); -static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data, - bool iswrite, struct kvmppc_pte *pte) +int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid, +enum xlate_readwrite xlrw, struct kvmppc_pte *pte) { + bool data = (xlid == XLATE_DATA); + bool iswrite = (xlrw == XLATE_WRITE); int relocated = (kvmppc_get_msr(vcpu) & (data ? MSR_DR : MSR_IR)); int r; @@ -434,7 +436,8 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, vcpu->stat.st++; - r = kvmppc_xlate(vcpu, *eaddr, data, true, &pte); + r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, +XLATE_WRITE, &pte); if (r < 0) return r; @@ -459,7 +462,8 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, vcpu->stat.ld++; - rc = kvmppc_xlate(vcpu, *eaddr, data, false, &pte); + rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, + XLATE_READ, &pte); if (rc) return rc; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 97bcde2..2f697b4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1785,6 +1785,57 @@ void kvm_guest_protect_msr(struct kvm_vcpu *vcpu, ulong prot_bitmap, bool set) #endif } +int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid, +enum xlate_readwrite xlrw, struct kvmppc_pte *pte) +{ + int gtlb_index; + gpa_t gpaddr; + +#ifdef CONFIG_KVM_E500V2 + if (!(vcpu->arch.shared->msr & MSR_PR) && + (eaddr & PAGE_MASK) == vcpu->arch.magic_page_ea) { + pte->eaddr = eaddr; + pte->raddr = (vcpu->arch.magic_page_pa & PAGE_MASK) | +(eaddr & ~PAGE_MASK); + pte->vpage = eaddr >> PAGE_SHIFT; + pte->may_read = true; + pte->may_write = true; + pte->may_execute = true; + + return 0; + } +#endif + + /* Check the guest TLB. */ + switch (xlid) { + case XLATE_INST: + gtlb_index = kvmppc_mmu_itlb_index(vcpu, eaddr); + break; + case XLATE_DATA: + gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); + break; + default: + BUG(); + } + + /* Do we have a TLB entry at all? */ + if (gtlb_index < 0) + return -ENOENT; + + gpaddr = kvmppc_mmu_xlate(vcpu, gtlb_index, eaddr); + + pte->eaddr = eaddr; + pte->raddr = (gpaddr & PAGE_MASK) | (eaddr & ~PAGE_MASK); + pte->v
[PATCH 0/7] KVM: PPC: Prepare for shared kernel / kvm emulation code
Paul recently posted a nice patch set that started to make kvm code use the already existing in-kernel instruction emulator. Some bits from my previous attempt to implement yet another instruction emulator can and should be reused for that approach as well. So this patch set gathers all the bits that make sense as a basis for any code that wants to implement more generic instruction emulation in KVM. And even if we may never get this awesome emulation, at least it also is a nice code cleanup to make things more understandable and common across booke and book3s. Alexander Graf (7): KVM: PPC: Implement kvmppc_xlate for all targets KVM: PPC: Move kvmppc_ld/st to common code KVM: PPC: Remove kvmppc_bad_hva() KVM: PPC: Use kvm_read_guest in kvmppc_ld KVM: PPC: Handle magic page in kvmppc_ld/st KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Expose helper functions for data/inst faults arch/powerpc/include/asm/kvm_book3s.h | 9 +- arch/powerpc/include/asm/kvm_booke.h | 10 ++ arch/powerpc/include/asm/kvm_host.h | 4 +- arch/powerpc/include/asm/kvm_ppc.h| 26 arch/powerpc/kvm/Makefile | 4 +- arch/powerpc/kvm/book3s.c | 102 +++-- arch/powerpc/kvm/booke.c | 67 - arch/powerpc/kvm/emulate.c| 192 +--- arch/powerpc/kvm/emulate_loadstore.c | 266 ++ arch/powerpc/kvm/powerpc.c| 77 +- 10 files changed, 474 insertions(+), 283 deletions(-) create mode 100644 arch/powerpc/kvm/emulate_loadstore.c -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
> > On 28.07.2014, at 16:16, David Hildenbrand wrote: > > >> > >> On 10.07.14 15:10, Christian Borntraeger wrote: > >>> From: David Hildenbrand > >>> > >>> If a cpu is stopped, it must never be allowed to run and no interrupt may > >>> wake it > >>> up. A cpu also has to be unhalted if it is halted and has work to do - > >>> this > >>> scenario wasn't hit in kvm case yet, as only "disabled wait" is processed > >>> within > >>> QEMU. > >>> > >>> Signed-off-by: David Hildenbrand > >>> Reviewed-by: Cornelia Huck > >>> Reviewed-by: Christian Borntraeger > >>> Signed-off-by: Christian Borntraeger > >> > >> This looks like it's something that generic infrastructure should take > >> care of, no? How does this work for the other archs? They always get an > >> interrupt on the transition between !has_work -> has_work. Why don't we > >> get one for s390x? > >> > >> > >> Alex > >> > >> > > > > Well, we have the special case on s390 as a CPU that is in the STOPPED or > > the > > CHECK STOP state may never run - even if there is an interrupt. It's > > basically like this CPU has been switched off. > > > > Imagine that it is tried to inject an interrupt into a stopped vcpu. It > > will kick the stopped vcpu and thus lead to a call to > > "kvm_arch_process_async_events()". We have to deny that this vcpu will ever > > run as long as it is stopped. It's like a way to "suppress" the > > interrupt for such a transition you mentioned. > > An interrupt kick usually just means we go back into the main loop. From > there we check the interrupt bitmap which interrupt to handle. Check out the > handling code here: > > > http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 > > If you just check for the stopped state in here, do_interrupt() will never > get called and thus the CPU shouldn't ever get executed. Unless I'm heavily > mistaken :). So you would rather move the check out of has_work() into the main loop in cpu-exec.c and directly into kvm_arch_process_async_events()? This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on any CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to run. Is okay? Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although we are idle (because we are idle when we are stopped)? My qemu kvm knowledge is way better than the qemu emulation knowledge, so I appreciate any insights :) > > > > > Later, another vcpu might decide to turn that vcpu back on (by e.g. sending > > a > > SIGP START to that vcpu). > > Yes, in that case that other CPU generates a signal (a different bit in > interrupt_request) and the first CPU would see that it has to wake up and > wake up. > > > I am not sure if such a mechanism/scenario is applicable to any other arch. > > They > > all seem to reset the cs->halted flag if they know they are able to run > > (e.g. > > due to an interrupt) - they have no such thing as "stopped cpus", only > > "halted/waiting cpus". > > There's not really much difference between the two. The only difference from > a software point of view is that a "stopped" CPU has its external interrupt > bits masked off, no? Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like a "SIPI"++ as it performs a psw exchange - "NMI"). So we basically have two paths that can lead to a state change. All interrupt bits may be in any combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be denied). The other thing may be that on s390, each vcpu (including itself) can put another vcpu into the STOPPED state - I assume that this is different for x86 " INIT_RECEIVED". For this reason we have to watch out for bad race conditions (e.g. multiple vcpus working on another vcpu)... David > > > Alex > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
> > > > On 28.07.2014, at 16:16, David Hildenbrand wrote: > > > > >> > > >> On 10.07.14 15:10, Christian Borntraeger wrote: > > >>> From: David Hildenbrand > > >>> > > >>> If a cpu is stopped, it must never be allowed to run and no interrupt > > >>> may wake it > > >>> up. A cpu also has to be unhalted if it is halted and has work to do - > > >>> this > > >>> scenario wasn't hit in kvm case yet, as only "disabled wait" is > > >>> processed within > > >>> QEMU. > > >>> > > >>> Signed-off-by: David Hildenbrand > > >>> Reviewed-by: Cornelia Huck > > >>> Reviewed-by: Christian Borntraeger > > >>> Signed-off-by: Christian Borntraeger > > >> > > >> This looks like it's something that generic infrastructure should take > > >> care of, no? How does this work for the other archs? They always get an > > >> interrupt on the transition between !has_work -> has_work. Why don't we > > >> get one for s390x? > > >> > > >> > > >> Alex > > >> > > >> > > > > > > Well, we have the special case on s390 as a CPU that is in the STOPPED or > > > the > > > CHECK STOP state may never run - even if there is an interrupt. It's > > > basically like this CPU has been switched off. > > > > > > Imagine that it is tried to inject an interrupt into a stopped vcpu. It > > > will kick the stopped vcpu and thus lead to a call to > > > "kvm_arch_process_async_events()". We have to deny that this vcpu will > > > ever > > > run as long as it is stopped. It's like a way to "suppress" the > > > interrupt for such a transition you mentioned. > > > > An interrupt kick usually just means we go back into the main loop. From > > there we check the interrupt bitmap which interrupt to handle. Check out > > the handling code here: > > > > > > http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 > > > > If you just check for the stopped state in here, do_interrupt() will never > > get called and thus the CPU shouldn't ever get executed. Unless I'm heavily > > mistaken :). > > So you would rather move the check out of has_work() into the main loop in > cpu-exec.c and directly into kvm_arch_process_async_events()? > > This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on > any > CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to > run. Is okay? > > Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although > we > are idle (because we are idle when we are stopped)? > > My qemu kvm knowledge is way better than the qemu emulation knowledge, so I > appreciate any insights :) > > > > > > > > > Later, another vcpu might decide to turn that vcpu back on (by e.g. > > > sending a > > > SIGP START to that vcpu). > > > > Yes, in that case that other CPU generates a signal (a different bit in > > interrupt_request) and the first CPU would see that it has to wake up and > > wake up. > > > > > I am not sure if such a mechanism/scenario is applicable to any other > > > arch. They > > > all seem to reset the cs->halted flag if they know they are able to run > > > (e.g. > > > due to an interrupt) - they have no such thing as "stopped cpus", only > > > "halted/waiting cpus". > > > > There's not really much difference between the two. The only difference > > from a software point of view is that a "stopped" CPU has its external > > interrupt bits masked off, no? > > Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt > like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like > a "SIPI"++ as it performs a psw exchange - "NMI"). So we basically have two > paths that can lead to a state change. All interrupt bits may be in any > combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START > be > denied). > > The other thing may be that on s390, each vcpu (including itself) can put > another vcpu into the STOPPED state - I assume that this is different for x86 > " > INIT_RECEIVED". For this reason we have to watch out for bad race conditions > (e.g. multiple vcpus working on another vcpu)... Ah, sorry, just to clearify, a vcpu always sets itself to STOPPED, its the other vcpus that trigger it (= interrupt-like). David > > David > > > > > > > Alex > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
On 28.07.14 17:03, David Hildenbrand wrote: On 28.07.2014, at 16:16, David Hildenbrand wrote: On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only "disabled wait" is processed within QEMU. Signed-off-by: David Hildenbrand Reviewed-by: Cornelia Huck Reviewed-by: Christian Borntraeger Signed-off-by: Christian Borntraeger This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work -> has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to "kvm_arch_process_async_events()". We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to "suppress" the interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). So you would rather move the check out of has_work() into the main loop in cpu-exec.c and directly into kvm_arch_process_async_events()? This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on any CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to run. Is okay? Not really I think. We could create a new interrupt_request bit called CPU_INTERRUPT_STOPPED that doesn't get unset automatically and simply sets cpu->halted = 1 (similar to CPU_INTERRUPT_HALT). Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although we are idle (because we are idle when we are stopped)? My qemu kvm knowledge is way better than the qemu emulation knowledge, so I appreciate any insights :) Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs->halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as "stopped cpus", only "halted/waiting cpus". There's not really much difference between the two. The only difference from a software point of view is that a "stopped" CPU has its external interrupt bits masked off, no? Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like a "SIPI"++ as it performs a psw exchange - "NMI"). So we basically have two paths that can lead to a state change. All interrupt bits may be in any combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be denied). That's perfectly normal behavior. Just make it two different interrupt types: if (interrupt_request & CPU_INTERRUPT_STOPPED) { /* Go back to halted state */ ... } else if (interrupt_request & CPU_INTERRUPT_SIGP) { env->interrupt_request &= ~CPU_INTERRUPT_STOPPED; /* swap PSW */ ... } else if ((interrupt_request & CPU_INTERRUPT_HARD) && (env->psw.mask & PSW_MASK_EXT)) { ... } The other thing may be that on s390, each vcpu (including itself) can put another vcpu into the STOPPED state - I assume that this is different for x86 " INIT_RECEIVED". For this reason we have to watch out for bad race conditions (e.g. multiple vcpus working on another vcpu)... TCG is single-threaded :). And if you stick to the interrupt logic above all should be good. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Verifying Execution Integrity in Untrusted hypervisors
On 07/25/2014 03:11 PM, Shiva V wrote: > Hello, > I am exploring on finding a way to ensure runtime integrity of > > a executable in untrusted hypervisors. > > In particular, this is my requirements: > > 1. I have a 2 virtual machines. (A, B). > > 2. VM-A is running some service (exe) inside it. For example any resource > > accounting service intended to monitor for VM-B. > > 3. I need a way to verify run time integrity from VM-B of the executable > > running inside VM-A. > > 4. Both the vm's are not privileged vm's and are just normal client virtual > > machines. > > 5. Underlying hypervisor is untrusted. If the hypervisor is untrusted you have broken the root of trust and are going to be pretty out of luck. Any solution will require a level below the hypervisor that you trust. An example would be hardware that isolates memory from the hypervisor, ie https://www.google.com/patents/WO2013054528A1?cl=en&dq=Joel+Schopp&hl=en&sa=X&ei=YYPWU6aVJNProATe5IHACQ&ved=0CDMQ6AEwAw Another approach might be to start with something like a TPM and a trusted runtime UEFI. You could then have the guest call UEFI to do measurement with the TPM and use that for remote attestation. With such a method you could probably get to the point that you could measure something in guest memory at run-time, but you would have no assurance it hadn't been modified prior or after and was just temporarily correct, it would be a very point in time measurement. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Remove 440 support
The 440 target hasn't been properly functioning for a few releases and before I was the only one who fixes a very serious bug that indicates to me that nobody used it before either. Furthermore KVM on 440 is slow to the extent of unusable. We don't have to carry along completely unused code. Remove 440 and give us one less thing to worry about. Signed-off-by: Alexander Graf diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX index 6db73df..a68784d 100644 --- a/Documentation/powerpc/00-INDEX +++ b/Documentation/powerpc/00-INDEX @@ -17,8 +17,6 @@ firmware-assisted-dump.txt - Documentation on the firmware assisted dump mechanism "fadump". hvcs.txt - IBM "Hypervisor Virtual Console Server" Installation Guide -kvm_440.txt - - Various notes on the implementation of KVM for PowerPC 440. mpc52xx.txt - Linux 2.6.x on MPC52xx family pmu-ebb.txt diff --git a/Documentation/powerpc/kvm_440.txt b/Documentation/powerpc/kvm_440.txt deleted file mode 100644 index c02a003..000 --- a/Documentation/powerpc/kvm_440.txt +++ /dev/null @@ -1,41 +0,0 @@ -Hollis Blanchard -15 Apr 2008 - -Various notes on the implementation of KVM for PowerPC 440: - -To enforce isolation, host userspace, guest kernel, and guest userspace all -run at user privilege level. Only the host kernel runs in supervisor mode. -Executing privileged instructions in the guest traps into KVM (in the host -kernel), where we decode and emulate them. Through this technique, unmodified -440 Linux kernels can be run (slowly) as guests. Future performance work will -focus on reducing the overhead and frequency of these traps. - -The usual code flow is started from userspace invoking an "run" ioctl, which -causes KVM to switch into guest context. We use IVPR to hijack the host -interrupt vectors while running the guest, which allows us to direct all -interrupts to kvmppc_handle_interrupt(). At this point, we could either -- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or -- let the host interrupt handler run (e.g. when the decrementer fires), or -- return to host userspace (e.g. when the guest performs device MMIO) - -Address spaces: We take advantage of the fact that Linux doesn't use the AS=1 -address space (in host or guest), which gives us virtual address space to use -for guest mappings. While the guest is running, the host kernel remains mapped -in AS=0, but the guest can only use AS=1 mappings. - -TLB entries: The TLB entries covering the host linear mapping remain -present while running the guest. This reduces the overhead of lightweight -exits, which are handled by KVM running in the host kernel. We keep three -copies of the TLB: - - guest TLB: contents of the TLB as the guest sees it - - shadow TLB: the TLB that is actually in hardware while guest is running - - host TLB: to restore TLB state when context switching guest -> host -When a TLB miss occurs because a mapping was not present in the shadow TLB, -but was present in the guest TLB, KVM handles the fault without invoking the -guest. Large guest pages are backed by multiple 4KB shadow pages through this -mechanism. - -IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network -and block IO, so those drivers must be enabled in the guest. It's possible -that some qemu device emulation (e.g. e1000 or rtl8139) may also work with -little effort. diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 790352f..93500f6 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -202,9 +202,7 @@ config PPC_EARLY_DEBUG_BEAT config PPC_EARLY_DEBUG_44x bool "Early serial debugging for IBM/AMCC 44x CPUs" - # PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water - # mark, which doesn't work with current 440 KVM. - depends on 44x && !KVM + depends on 44x help Select this to enable early debugging for IBM 44x chips via the inbuilt serial port. If you enable this, ensure you set diff --git a/arch/powerpc/configs/ppc44x_defconfig b/arch/powerpc/configs/ppc44x_defconfig index ccf66b9..924e10d 100644 --- a/arch/powerpc/configs/ppc44x_defconfig +++ b/arch/powerpc/configs/ppc44x_defconfig @@ -127,4 +127,3 @@ CONFIG_CRYPTO_PCBC=y # CONFIG_CRYPTO_ANSI_CPRNG is not set # CONFIG_CRYPTO_HW is not set CONFIG_VIRTUALIZATION=y -CONFIG_KVM_440=y diff --git a/arch/powerpc/include/asm/kvm_44x.h b/arch/powerpc/include/asm/kvm_44x.h deleted file mode 100644 index a0e5761..000 --- a/arch/powerpc/include/asm/kvm_44x.h +++ /dev/null @@ -1,67 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License, version 2, as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PA
[PATCH] KVM: PPC: Remove DCR handling
DCR handling was only needed for 440 KVM. Since we removed it, we can also remove handling of DCR accesses. Signed-off-by: Alexander Graf --- Documentation/virtual/kvm/api.txt | 6 +++--- arch/powerpc/include/asm/kvm_host.h | 4 arch/powerpc/include/asm/kvm_ppc.h | 1 - arch/powerpc/kvm/booke.c| 5 - arch/powerpc/kvm/powerpc.c | 10 -- arch/powerpc/kvm/timing.c | 1 - arch/powerpc/kvm/timing.h | 3 --- include/uapi/linux/kvm.h| 4 ++-- 8 files changed, 5 insertions(+), 29 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 8898caf..a21ff22 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2613,8 +2613,8 @@ The 'data' member contains, in its first 'len' bytes, the value as it would appear if the VCPU performed a load or store of the appropriate width directly to the byte array. -NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR, - KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding +NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI KVM_EXIT_PAPR and + KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace @@ -2685,7 +2685,7 @@ Principles of Operation Book in the Chapter for Dynamic Address Translation __u8 is_write; } dcr; -powerpc specific. +Deprecated - was used for 440 KVM. /* KVM_EXIT_OSI */ struct { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 66f5b59..98d9dd5 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -94,7 +94,6 @@ struct kvm_vm_stat { struct kvm_vcpu_stat { u32 sum_exits; u32 mmio_exits; - u32 dcr_exits; u32 signal_exits; u32 light_exits; /* Account for special types of light exits: */ @@ -126,7 +125,6 @@ struct kvm_vcpu_stat { enum kvm_exit_types { MMIO_EXITS, - DCR_EXITS, SIGNAL_EXITS, ITLB_REAL_MISS_EXITS, ITLB_VIRT_MISS_EXITS, @@ -601,8 +599,6 @@ struct kvm_vcpu_arch { u8 io_gpr; /* GPR used as IO source/target */ u8 mmio_is_bigendian; u8 mmio_sign_extend; - u8 dcr_needed; - u8 dcr_is_write; u8 osi_needed; u8 osi_enabled; u8 papr_enabled; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index cbee453..8e36c1e 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -41,7 +41,6 @@ enum emulation_result { EMULATE_DONE, /* no further processing */ EMULATE_DO_MMIO, /* kvm_run filled with MMIO request */ - EMULATE_DO_DCR, /* kvm_run filled with DCR request */ EMULATE_FAIL, /* can't emulate this instruction */ EMULATE_AGAIN,/* something went wrong. go again */ EMULATE_EXIT_USER,/* emulation requires exit to user-space */ diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index f30948a..b4c89fa 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -51,7 +51,6 @@ unsigned long kvmppc_booke_handlers; struct kvm_stats_debugfs_item debugfs_entries[] = { { "mmio", VCPU_STAT(mmio_exits) }, - { "dcr",VCPU_STAT(dcr_exits) }, { "sig",VCPU_STAT(signal_exits) }, { "itlb_r", VCPU_STAT(itlb_real_miss_exits) }, { "itlb_v", VCPU_STAT(itlb_virt_miss_exits) }, @@ -709,10 +708,6 @@ static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu) case EMULATE_AGAIN: return RESUME_GUEST; - case EMULATE_DO_DCR: - run->exit_reason = KVM_EXIT_DCR; - return RESUME_HOST; - case EMULATE_FAIL: printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n", __func__, vcpu->arch.pc, vcpu->arch.last_inst); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index c14ed15..288b4bb 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -743,12 +743,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) #endif } -static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu, - struct kvm_run *run) -{ - kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, run->dcr.data); -} - static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run) { @@ -945,10 +939,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) if (!vcpu->mmio_is_write) kvmppc_complete_mmio_load(vcpu, run);
Re: Verifying Execution Integrity in Untrusted hypervisors
On 2014-07-28 19:17, Joel Schopp wrote: > > On 07/25/2014 03:11 PM, Shiva V wrote: >> Hello, >> I am exploring on finding a way to ensure runtime integrity of >> >> a executable in untrusted hypervisors. >> >> In particular, this is my requirements: >> >> 1. I have a 2 virtual machines. (A, B). >> >> 2. VM-A is running some service (exe) inside it. For example any resource >> >> accounting service intended to monitor for VM-B. >> >> 3. I need a way to verify run time integrity from VM-B of the executable >> >> running inside VM-A. >> >> 4. Both the vm's are not privileged vm's and are just normal client virtual >> >> machines. >> >> 5. Underlying hypervisor is untrusted. > If the hypervisor is untrusted you have broken the root of trust and are > going to be pretty out of luck. > > Any solution will require a level below the hypervisor that you trust. > An example would be hardware that isolates memory from the hypervisor, > ie > https://www.google.com/patents/WO2013054528A1?cl=en&dq=Joel+Schopp&hl=en&sa=X&ei=YYPWU6aVJNProATe5IHACQ&ved=0CDMQ6AEwAw The hypervisor has full control of and insight into the guest vCPU state. Only protecting some portions of guest memory seems insufficient. We rather need encryption of every data that leaves the CPU or moves from guest to host mode (and decryption the other way around). I guess that would have quite some performance impact and is far from being easy to integrate into modern processors. But, who knows... Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Verifying Execution Integrity in Untrusted hypervisors
Il 28/07/2014 20:31, Jan Kiszka ha scritto: > The hypervisor has full control of and insight into the guest vCPU > state. Only protecting some portions of guest memory seems insufficient. > > We rather need encryption of every data that leaves the CPU or moves > from guest to host mode (and decryption the other way around). I guess > that would have quite some performance impact and is far from being easy > to integrate into modern processors. But, who knows... Intel SGX sounds somewhat like what you describe, but I'm not sure how it's going to be virtualized. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] PCI: introduce helper functions for device flag operation
On Wed, 2014-07-23 at 00:19 +0800, Ethan Zhao wrote: > This patch introduced three helper functions to hide direct > device flag operation. > > void pci_set_dev_assigned(struct pci_dev *pdev); > void pci_set_dev_deassigned(struct pci_dev *pdev); > bool pci_is_dev_assigned(struct pci_dev *pdev); > > Signed-off-by: Ethan Zhao > --- > include/linux/pci.h | 13 + > 1 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index aab57b4..5f6f8fa 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1129,6 +1129,19 @@ resource_size_t pcibios_window_alignment(struct > pci_bus *bus, > > int pci_set_vga_state(struct pci_dev *pdev, bool decode, > unsigned int command_bits, u32 flags); > +/* helper functions for operation of device flag */ > +static inline void pci_set_dev_assigned(struct pci_dev *pdev) > +{ > + pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; > +} > +static inline void pci_set_dev_deassigned(struct pci_dev *pdev) > +{ > + pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; > +} I think pci_clear_dev_assigned() would make more sense, we're not setting a flag named DEASSIGNED. > +static inline bool pci_is_dev_assigned(struct pci_dev *pdev) > +{ > + return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED ? true : false; > +} The ternary operation isn't necessary. Thanks, Alex > /* kmem_cache style wrapper around pci_alloc_consistent() */ > > #include -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Verifying Execution Integrity in Untrusted hypervisors
On Mon, Jul 28, 2014 at 1:27 PM, Paolo Bonzini wrote: > Il 28/07/2014 20:31, Jan Kiszka ha scritto: >> The hypervisor has full control of and insight into the guest vCPU >> state. Only protecting some portions of guest memory seems insufficient. >> >> We rather need encryption of every data that leaves the CPU or moves >> from guest to host mode (and decryption the other way around). I guess >> that would have quite some performance impact and is far from being easy >> to integrate into modern processors. But, who knows... > > Intel SGX sounds somewhat like what you describe, but I'm not sure how > it's going to be virtualized. > Right. It's possible to virtualize (or pass-through) SGX without losing the security feature. With SGX, you can create secure (encrypted) islands on processes in VMs as well. But I'm not sure if it's useful for solving the problem described. -- Jun Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-mips: Ignore unassigned accesses with KVM
On Mon, Jul 28, 2014 at 12:37:50PM +0100, James Hogan wrote: > MIPS registers an unassigned access handler which raises a guest bus > error exception. However this causes QEMU to crash when KVM is enabled > as it isn't called from the main execution loop so longjmp() gets called > without a corresponding setjmp(). > > Until the KVM API can be updated to trigger a guest exception in > response to an MMIO exit, prevent the bus error exception being raised > from mips_cpu_unassigned_access() if KVM is enabled. > > The check is at run time since the do_unassigned_access callback is > initialised before it is known whether KVM will be enabled. > > The problem can be triggered with Malta emulation by making the guest > write to the reset region at physical address 0x1bf0, since it is > marked read-only which is treated as unassigned for writes. > > Signed-off-by: James Hogan > Cc: Aurelien Jarno > Cc: Peter Maydell > Cc: Paolo Bonzini > Cc: Gleb Natapov > Cc: Christoffer Dall > Cc: Sanjay Lal > --- > target-mips/op_helper.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c > index 27651a4a00c1..df97b35f8701 100644 > --- a/target-mips/op_helper.c > +++ b/target-mips/op_helper.c > @@ -21,6 +21,7 @@ > #include "qemu/host-utils.h" > #include "exec/helper-proto.h" > #include "exec/cpu_ldst.h" > +#include "sysemu/kvm.h" > > #ifndef CONFIG_USER_ONLY > static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global); > @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr > addr, > MIPSCPU *cpu = MIPS_CPU(cs); > CPUMIPSState *env = &cpu->env; > > +/* > + * Raising an exception with KVM enabled will crash because it won't be > from > + * the main execution loop so the longjmp won't have a matching setjmp. > + * Until we can trigger a bus error exception through KVM lets just > ignore > + * the access. > + */ > +if (kvm_enabled()) { > +return; > +} > + > if (is_exec) { > helper_raise_exception(env, EXCP_IBE); > } else { Reviewed-by: Aurelien Jarno Note that even if the test is added for each exception, it is light enough compared to triggering and handling an exception so that it has no impact on performance. Paolo, do you want to take this patch in your kvm tree? -- Aurelien Jarno GPG: 4096R/1DDD8C9B aurel...@aurel32.net http://www.aurel32.net -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: > This is not used and even I do not remember why this was added > in first place. > > Signed-off-by: Bharat Bhushan > --- > arch/powerpc/kvm/booke.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > index ab62109..a5ee42c 100644 > --- a/arch/powerpc/kvm/booke.c > +++ b/arch/powerpc/kvm/booke.c > @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu > *vcpu, > kvm_guest_protect_msr(vcpu, MSR_DE, true); > vcpu->guest_debug = dbg->control; > vcpu->arch.shadow_dbg_reg.dbcr0 = 0; > - /* Set DBCR0_EDM in guest visible DBCR0 register. */ > - vcpu->arch.dbg_reg.dbcr0 = DBCR0_EDM; > > if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) > vcpu->arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; This was intended to let the guest know that the host owns the debug resources, by analogy to what a JTAG debugger would do. The Power ISA has this "Virtualized Implementation Note": It is the responsibility of the hypervisor to ensure that DBCR0[EDM] is consistent with usage of DEP. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug
On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: > When userspace (QEMU) is using the debug resource to debug guest > then we want MSR_DE to be always set. This patch adds missing > MSR_DE setting in "rfci" instruction. > > Signed-off-by: Bharat Bhushan > --- > arch/powerpc/kvm/booke_emulate.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kvm/booke_emulate.c > b/arch/powerpc/kvm/booke_emulate.c > index 27a4b28..80c51a2 100644 > --- a/arch/powerpc/kvm/booke_emulate.c > +++ b/arch/powerpc/kvm/booke_emulate.c > @@ -40,7 +40,11 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu) > static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu) > { > vcpu->arch.pc = vcpu->arch.csrr0; > - kvmppc_set_msr(vcpu, vcpu->arch.csrr1); > + /* Force MSR_DE when guest does not own debug facilities */ > + if (vcpu->guest_debug) > + kvmppc_set_msr(vcpu, vcpu->arch.csrr1 | MSR_DE); > + else > + kvmppc_set_msr(vcpu, vcpu->arch.csrr1); > } > > int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, It looks like this is already handled by kvmppc_vcpu_sync_debug(), which is called by kvmppc_set_msr(). Plus, it should only be done for HV mode. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: > When userspace is debugging guest then MSR_DE is always set and > MSRP_DEP is set so that guest cannot change MSR_DE. > Guest debug resources are not yet emulated, So there seems no reason > we should stop guest controlling MSR_DE. > Also a followup patch will enable debug emulation and that requires > guest to control MSR_DE. Why does it matter whether we emulate debug resources? We still don't want the guest to be able to clear MSR[DE] and thus break host debug. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: > This patch emulates debug registers and debug exception > to support guest using debug resource. This enables running > gdb/kgdb etc in guest. > > On BOOKE architecture we cannot share debug resources between QEMU and > guest because: > When QEMU is using debug resources then debug exception must > be always enabled. To achieve this we set MSR_DE and also set > MSRP_DEP so guest cannot change MSR_DE. > > When emulating debug resource for guest we want guest > to control MSR_DE (enable/disable debug interrupt on need). > > So above mentioned two configuration cannot be supported > at the same time. So the result is that we cannot share > debug resources between QEMU and Guest on BOOKE architecture. > > In the current design QEMU gets priority over guest, this means that if > QEMU is using debug resources then guest cannot use them and if guest is > using debug resource then QEMU can overwrite them. > > Signed-off-by: Bharat Bhushan > --- > Hi Alex, > > I thought of having some print in register emulation if QEMU > is using debug resource, Also when QEMU overwrites guest written > values but that looks excessive. If I uses some variable which > get set when guest starts using debug registers and check in > debug set ioctl then that look ugly. Looking for suggestions > > arch/powerpc/include/asm/kvm_ppc.h | 3 + > arch/powerpc/kvm/booke.c | 27 +++ > arch/powerpc/kvm/booke_emulate.c | 157 > + > 3 files changed, 187 insertions(+) > > diff --git a/arch/powerpc/include/asm/kvm_ppc.h > b/arch/powerpc/include/asm/kvm_ppc.h > index e2fd5a1..f3f7611 100644 > --- a/arch/powerpc/include/asm/kvm_ppc.h > +++ b/arch/powerpc/include/asm/kvm_ppc.h > @@ -173,6 +173,9 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, > u32 *server, > extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq); > extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq); > > +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu); > +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu); > + > union kvmppc_one_reg { > u32 wval; > u64 dval; > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > index fadfe76..c2471ed 100644 > --- a/arch/powerpc/kvm/booke.c > +++ b/arch/powerpc/kvm/booke.c > @@ -264,6 +264,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu > *vcpu) > clear_bit(BOOKE_IRQPRIO_WATCHDOG, &vcpu->arch.pending_exceptions); > } > > +void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu) > +{ > + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG); > +} > + > +void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu) > +{ > + clear_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions); > +} > + > static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 > srr1) > { > #ifdef CONFIG_KVM_BOOKE_HV > @@ -783,6 +793,23 @@ static int kvmppc_handle_debug(struct kvm_run *run, > struct kvm_vcpu *vcpu) > struct debug_reg *dbg_reg = &(vcpu->arch.shadow_dbg_reg); > u32 dbsr = vcpu->arch.dbsr; > > + if (vcpu->guest_debug == 0) { > + /* Debug resources belong to Guest */ > + if (dbsr && (vcpu->arch.shared->msr & MSR_DE)) > + kvmppc_core_queue_debug(vcpu); Should also check for DBCR0[IDM]. > + > + /* Inject a program interrupt if trap debug is not allowed */ > + if ((dbsr & DBSR_TIE) && !(vcpu->arch.shared->msr & MSR_DE)) > + kvmppc_core_queue_program(vcpu, ESR_PTR); > + > + return RESUME_GUEST; > + } > + > + /* > + * Prepare for userspace exit as debug resources > + * are owned by userspace. > + */ > + vcpu->arch.dbsr = 0; > run->debug.arch.status = 0; > run->debug.arch.address = vcpu->arch.pc; Why are you clearing vcpu->arch.dbsr? Userspace might be interested in the raw value, plus it's a change from the current API semantics. Where is this run->debug.arch. stuff and KVM_EXIT_DEBUG documented? > diff --git a/arch/powerpc/kvm/booke_emulate.c > b/arch/powerpc/kvm/booke_emulate.c > index 2a20194..3d143fe 100644 > --- a/arch/powerpc/kvm/booke_emulate.c > +++ b/arch/powerpc/kvm/booke_emulate.c > @@ -139,6 +139,7 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct > kvm_vcpu *vcpu, > int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong > spr_val) > { > int emulated = EMULATE_DONE; > + bool debug_inst = false; > > switch (sprn) { > case SPRN_DEAR: > @@ -153,14 +154,137 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, > int sprn, ulong spr_val) > case SPRN_CSRR1: > vcpu->arch.csrr1 = spr_val; > break; > + case SPRN_DSRR0: > + vcpu->arch.dsrr0 = spr_val; > + break; > + case SPRN_DSRR1: > + vcpu->arch.dsrr1 = spr_val
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: > On 11.07.14 10:39, Bharat Bhushan wrote: > > This patch emulates debug registers and debug exception > > to support guest using debug resource. This enables running > > gdb/kgdb etc in guest. > > > > On BOOKE architecture we cannot share debug resources between QEMU and > > guest because: > > When QEMU is using debug resources then debug exception must > > be always enabled. To achieve this we set MSR_DE and also set > > MSRP_DEP so guest cannot change MSR_DE. > > > > When emulating debug resource for guest we want guest > > to control MSR_DE (enable/disable debug interrupt on need). > > > > So above mentioned two configuration cannot be supported > > at the same time. So the result is that we cannot share > > debug resources between QEMU and Guest on BOOKE architecture. > > > > In the current design QEMU gets priority over guest, this means that if > > QEMU is using debug resources then guest cannot use them and if guest is > > using debug resource then QEMU can overwrite them. > > > > Signed-off-by: Bharat Bhushan > > --- > > Hi Alex, > > > > I thought of having some print in register emulation if QEMU > > is using debug resource, Also when QEMU overwrites guest written > > values but that looks excessive. If I uses some variable which > > get set when guest starts using debug registers and check in > > debug set ioctl then that look ugly. Looking for suggestions > > Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
On Mon, 2014-07-28 at 03:54 -0500, Caraman Mihai Claudiu-B02008 wrote: > > -Original Message- > > From: Wood Scott-B07421 > > Sent: Saturday, July 26, 2014 3:11 AM > > To: Caraman Mihai Claudiu-B02008 > > Cc: Alexander Graf; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > > linuxppc-...@lists.ozlabs.org > > Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for > > SPE/FP/AltiVec int numbers > > > > On Thu, 2014-07-24 at 04:16 -0500, Caraman Mihai Claudiu-B02008 wrote: > > > Scott, Alex's request to define SPE handlers only for e500v2 implies > > changes > > > in 32-bit FSL kernel to have exclusive configurations for e200/e500v2 > > and > > > e500mc/e5500. We would probably need something like this, what's your > > take on it? > > > > That is already a compile-time decision. > > Yes, but is not fully implemented. We might be missing some kconfig language to prohibit e500v2 boards from being enabled in an e500mc kernel, but if you try to do so it won't work (except for obsolete hacks like running QEMU's mpc8544ds platform with "-cpu e500mc"). > > > diff --git a/arch/powerpc/kernel/head_fsl_booke.S > > b/arch/powerpc/kernel/head_fsl_booke.S > > > index b497188..9d41015 100644 > > > --- a/arch/powerpc/kernel/head_fsl_booke.S > > > +++ b/arch/powerpc/kernel/head_fsl_booke.S > > > @@ -613,6 +613,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) > > > mfspr r10, SPRN_SPRG_RSCRATCH0 > > > b InstructionStorage > > > > > > +/* Define SPE handlers only for e500v2 and e200 */ > > > +#ifndef CONFIG_PPC_E500MC > > > #ifdef CONFIG_SPE > > > /* SPE Unavailable */ > > > START_EXCEPTION(SPEUnavailable) > > > @@ -626,7 +628,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) > > > EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \ > > > unknown_exception, EXC_XFER_EE) > > > #endif /* CONFIG_SPE */ > > > +#endif > > > > This is redundant: > > > > config SPE > > bool "SPE Support" > > depends on E200 || (E500 && !PPC_E500MC) > > default y > > I see what you mean, but it's not redundant. OK, I didn't realize there was an #else that wasn't included in the context. It would have been nice if the comment at the end were "!CONFIG_SPE" rather than "CONFIG_SPE". > Alex was looking to remove SPE > handlers for e500mc+ and the proposal handled !SPE case. In the new > light I find this variant more readable: > > +/* Define SPE handlers for e200 and e500v2 */ > #ifdef CONFIG_SPE > /* SPE Unavailable */ > START_EXCEPTION(SPEUnavailable) > @@ -622,11 +623,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) > b fast_exception_return > 1: addir3,r1,STACK_FRAME_OVERHEAD > EXC_XFER_EE_LITE(0x2010, KernelSPE) > -#else > +#elif defined(CONFIG_E200) || \ > + (defined(CONFIG_E500) && !defined(CONFIG_PPC_E500MC)) > EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \ > unknown_exception, EXC_XFER_EE) > #endif /* CONFIG_SPE */ Yes, or better define a CONFIG_SPE_POSSIBLE so that the list only has to exist in one place, and the intent is clearer. > > > diff --git a/arch/powerpc/kernel/cputable.c > > b/arch/powerpc/kernel/cputable.c > > > index c1faade..3ab65c2 100644 > > > --- a/arch/powerpc/kernel/cputable.c > > > +++ b/arch/powerpc/kernel/cputable.c > > > @@ -2030,6 +2030,7 @@ static struct cpu_spec __initdata cpu_specs[] = { > > > #endif /* CONFIG_PPC32 */ > > > #ifdef CONFIG_E500 > > > #ifdef CONFIG_PPC32 > > > +#ifndef CONFIG_PPC_E500MC > > > { /* e500 */ > > > .pvr_mask = 0x, > > > .pvr_value = 0x8020, > > > @@ -2069,6 +2070,7 @@ static struct cpu_spec __initdata cpu_specs[] = { > > > .machine_check = machine_check_e500, > > > .platform = "ppc8548", > > > }, > > > +#endif /* !CONFIG_PPC_E500MC */ > > > { /* e500mc */ > > > .pvr_mask = 0x, > > > .pvr_value = 0x8023, > > > > > > > This looks a bit strange -- e500v2 gets excluded if PPC_E500MC, but > > e500mc gets included in !PPC_E500MC? > > Right, my main purpose was to get rid of __setup_e500_ivors on PPC_E500MC > which refers SPEUnavailable. I will add an #else to exclude e500mc. > > The "generic E500 PPC" default cpu advertises PPC_FEATURE_HAS_SPE_COMP. > Do we want to excluded it if PPC_E500MC? Is this cpu useful without > cpu_setup() > and if so why don't we also have a default for 64-bit? I don't think that default will do anything useful. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
Alexander Graf writes: > --- a/arch/powerpc/include/asm/reg.h > +++ b/arch/powerpc/include/asm/reg.h > @@ -25,7 +25,6 @@ > #ifdef CONFIG_8xx > #include > #endif /* CONFIG_8xx */ > -#include > > #define MSR_SF_LG 63 /* Enable 64 bit mode */ > #define MSR_ISF_LG 61 /* Interrupt 64b mode valid on > 630 */ > @@ -1210,12 +1209,6 @@ static inline unsigned long mfvtb (void) > if (cpu_has_feature(CPU_FTR_ARCH_207S)) > return mfspr(SPRN_VTB); > #endif > - /* > -* The above mfspr will be a no-op on anything before Power8 > -* That can result in random values returned. We need to > -* capture that. > -*/ > - BUG(); > return 0; > } (i missed CCing aneesh on this mail in reply to the build robot, so inserting the same reply here) the only place that calls it also does the cpu_has_feature() check and returns 0 ifndef CONFIG_PPC_BOOK3S_64 or !cpu_has_feature(). Looking get_vtb (the only caller) and the places it's called (as well as PowerISA 2.07) I think in the emulation code we're missing invoking the "system privileged instruction error handler" as the VTB SPR has spr bit 0 set to 1 (page 107 of PowerISA 2.07, mfspr docs). That being said... any guest sholud do the cpu_has_feature check themselves, so this probably isn't an issue in the real world. Certainly the host really shouldn't BUG() for what is really a guest issue (actually.. this would be a good DoS attack on < Power8 host). Reviewed-by: Stewart Smith -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: Add polling mode
Maybe tie a knot between "vhost-net scalability tuning: threading for many VMs" and "vhost: Add polling mode" is a good marriage, because it's more possibility to get work to do with less polling time, so less cpu cycles waste. Thanks, Zhang Haoyu >> > >>> Hello All, >> > >>> >> > >>> When vhost is waiting for buffers from the guest driver (e.g., more >> > >>> packets >> > >>> to send in vhost-net's transmit queue), it normally goes to sleep >> > >>> and >> > >>> waits >> > >>> for the guest to "kick" it. This kick involves a PIO in the guest, >> > >>> and >> > >>> therefore an exit (and possibly userspace involvement in >> > >>> translating > > this >> > >>> PIO >> > >>> exit into a file descriptor event), all of which hurts performance. >> > >>> >> > >>> If the system is under-utilized (has cpu time to spare), vhost can >> > >>> continuously poll the virtqueues for new buffers, and avoid asking >> > >>> the guest to kick us. >> > >>> This patch adds an optional polling mode to vhost, that can be >> > >>> enabled >> > >>> via a kernel module parameter, "poll_start_rate". >> > >>> >> > >>> When polling is active for a virtqueue, the guest is asked to >> > >>> disable notification (kicks), and the worker thread continuously > > checks >> > >>> for >> > >>> new buffers. When it does discover new buffers, it simulates a >> > >>> "kick" > > by >> > >>> invoking the underlying backend driver (such as vhost-net), which > > thinks >> > >>> it >> > >>> got a real kick from the guest, and acts accordingly. If the > > underlying >> > >>> driver asks not to be kicked, we disable polling on this virtqueue. >> > >>> >> > >>> We start polling on a virtqueue when we notice it has >> > >>> work to do. Polling on this virtqueue is later disabled after 3 > > seconds of >> > >>> polling turning up no new work, as in this case we are better off >> > >>> returning >> > >>> to the exit-based notification mechanism. The default timeout of 3 > > seconds >> > >>> can be changed with the "poll_stop_idle" kernel module parameter. >> > >>> >> > >>> This polling approach makes lot of sense for new HW with > > posted-interrupts >> > >>> for which we have exitless host-to-guest notifications. But even >> > >>> with >> > >>> support >> > >>> for posted interrupts, guest-to-host communication still causes >> > >>> exits. >> > >>> Polling adds the missing part. >> > >>> >> > >>> When systems are overloaded, there won?t be enough cpu time for the >> > >>> various >> > >>> vhost threads to poll their guests' devices. For these scenarios, >> > >>> we > > plan >> > >>> to add support for vhost threads that can be shared by multiple > > devices, >> > >>> even of multiple vms. >> > >>> Our ultimate goal is to implement the I/O acceleration features > > described >> > >>> in: >> > >>> KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) >> > >>> https://www.youtube.com/watch?v=9EyweibHfEs >> > >>> and >> > >>> https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html >> > >>> >> > >>> >> > >>> Comments are welcome, >> > >>> Thank you, >> > >>> Razya > > >> Thanks for the work. Do you have perf numbers for this? > > >> > > Hi Jason, > > Thanks for reviewing. I ran some experiments with TCP stream netperf > > and > > filebench (having 2 threads performing random reads) benchmarks on an > > IBM > > System x3650 M4. > > All runs loaded the guests in a way that they were (cpu) saturated. > > The system had two cores per guest, as to allow for both the vcpu and > > the > > vhost thread to > > run concurrently for maximum throughput (but I didn't pin the threads > > to > > specific cores) > > I get: > > > > Netperf, 1 vm: > > The polling patch improved throughput by ~33%. Number of exits/sec > > decreased 6x. > > The same improvement was shown when I tested with 3 vms running > > netperf. > > > > filebench, 1 vm: > > ops/sec improved by 13% with the polling patch. Number of exits was > > reduced by 31%. > > The same experiment with 3 vms running filebench showed similar > > numbers. >>> > >>> > Looks good, may worth to add the result in the commit log. > > > > >> And looks like the patch only poll for virtqueue. In the future, may > > >> worth to add callbacks for vhost_net to poll socket. Then it could be > > >> used with rx busy polling in host which may speedup the rx also. > > Did you mean polling the network device to avoid interrupts? >>> > >>> > Yes, recent linux host support rx busy polling which can reduce the >>> > interrupts. If vhost can utilize this,
Re: [PATCH 1/4] PCI: introduce helper functions for device flag operation
On 2014/7/29 5:00, Alex Williamson wrote: On Wed, 2014-07-23 at 00:19 +0800, Ethan Zhao wrote: This patch introduced three helper functions to hide direct device flag operation. void pci_set_dev_assigned(struct pci_dev *pdev); void pci_set_dev_deassigned(struct pci_dev *pdev); bool pci_is_dev_assigned(struct pci_dev *pdev); Signed-off-by: Ethan Zhao --- include/linux/pci.h | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index aab57b4..5f6f8fa 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1129,6 +1129,19 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus, int pci_set_vga_state(struct pci_dev *pdev, bool decode, unsigned int command_bits, u32 flags); +/* helper functions for operation of device flag */ +static inline void pci_set_dev_assigned(struct pci_dev *pdev) +{ + pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; +} +static inline void pci_set_dev_deassigned(struct pci_dev *pdev) +{ + pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; +} I think pci_clear_dev_assigned() would make more sense, we're not setting a flag named DEASSIGNED. Though it is a flag operation now, may not later, we define it because we want to hide the internal operation. 'set' to 'deassigned' status is enough. So I would like keep it. +static inline bool pci_is_dev_assigned(struct pci_dev *pdev) +{ + return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED ? true : false; +} The ternary operation isn't necessary. Thanks, Yep, return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED is enough. Thanks, Ethan Alex /* kmem_cache style wrapper around pci_alloc_consistent() */ #include -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4 v2] PCI: introduce helper functions for device flag operation
This patch introduced three helper functions to hide direct device flag operation. void pci_set_dev_assigned(struct pci_dev *pdev); void pci_set_dev_deassigned(struct pci_dev *pdev); bool pci_is_dev_assigned(struct pci_dev *pdev); Signed-off-by: Ethan Zhao --- v2: simplify unnecessory ternary operation in function pci_is_dev_assigned(); include/linux/pci.h | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index aab57b4..5f6f8fa 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1129,6 +1129,19 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus, int pci_set_vga_state(struct pci_dev *pdev, bool decode, unsigned int command_bits, u32 flags); +/* helper functions for operation of device flag */ +static inline void pci_set_dev_assigned(struct pci_dev *pdev) +{ + pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; +} +static inline void pci_set_dev_deassigned(struct pci_dev *pdev) +{ + pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; +} +static inline bool pci_is_dev_assigned(struct pci_dev *pdev) +{ + return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED; +} /* kmem_cache style wrapper around pci_alloc_consistent() */ #include -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] PCI: introduce helper functions for device flag operation
On Tue, 2014-07-29 at 09:53 +0800, ethan zhao wrote: > On 2014/7/29 5:00, Alex Williamson wrote: > > On Wed, 2014-07-23 at 00:19 +0800, Ethan Zhao wrote: > >> This patch introduced three helper functions to hide direct > >> device flag operation. > >> > >> void pci_set_dev_assigned(struct pci_dev *pdev); > >> void pci_set_dev_deassigned(struct pci_dev *pdev); > >> bool pci_is_dev_assigned(struct pci_dev *pdev); > >> > >> Signed-off-by: Ethan Zhao > >> --- > >> include/linux/pci.h | 13 + > >> 1 files changed, 13 insertions(+), 0 deletions(-) > >> > >> diff --git a/include/linux/pci.h b/include/linux/pci.h > >> index aab57b4..5f6f8fa 100644 > >> --- a/include/linux/pci.h > >> +++ b/include/linux/pci.h > >> @@ -1129,6 +1129,19 @@ resource_size_t pcibios_window_alignment(struct > >> pci_bus *bus, > >> > >> int pci_set_vga_state(struct pci_dev *pdev, bool decode, > >> unsigned int command_bits, u32 flags); > >> +/* helper functions for operation of device flag */ > >> +static inline void pci_set_dev_assigned(struct pci_dev *pdev) > >> +{ > >> + pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; > >> +} > >> +static inline void pci_set_dev_deassigned(struct pci_dev *pdev) > >> +{ > >> + pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; > >> +} > > I think pci_clear_dev_assigned() would make more sense, we're not > > setting a flag named DEASSIGNED. >Though it is a flag operation now, may not later, we define it > because we want to hide the internal operation. >'set' to 'deassigned' status is enough. So I would like keep it. I disagree, the opposite of a 'set' is a 'clear', or at least an 'unset'. Using bit-ops-like terminology doesn't lock us into an implementation. As written, this could just as easily be setting two different variables. > > > >> +static inline bool pci_is_dev_assigned(struct pci_dev *pdev) > >> +{ > >> + return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED ? true : false; > >> +} > > The ternary operation isn't necessary. Thanks, >Yep, > >return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED > >is enough. > >Thanks, >Ethan > > > > Alex > > > >> /* kmem_cache style wrapper around pci_alloc_consistent() */ > >> > >> #include > > > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] PCI: introduce helper functions for device flag operation
On 2014/7/29 10:31, Alex Williamson wrote: On Tue, 2014-07-29 at 09:53 +0800, ethan zhao wrote: On 2014/7/29 5:00, Alex Williamson wrote: On Wed, 2014-07-23 at 00:19 +0800, Ethan Zhao wrote: This patch introduced three helper functions to hide direct device flag operation. void pci_set_dev_assigned(struct pci_dev *pdev); void pci_set_dev_deassigned(struct pci_dev *pdev); bool pci_is_dev_assigned(struct pci_dev *pdev); Signed-off-by: Ethan Zhao --- include/linux/pci.h | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index aab57b4..5f6f8fa 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1129,6 +1129,19 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus, int pci_set_vga_state(struct pci_dev *pdev, bool decode, unsigned int command_bits, u32 flags); +/* helper functions for operation of device flag */ +static inline void pci_set_dev_assigned(struct pci_dev *pdev) +{ + pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; +} +static inline void pci_set_dev_deassigned(struct pci_dev *pdev) +{ + pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; +} I think pci_clear_dev_assigned() would make more sense, we're not setting a flag named DEASSIGNED. Though it is a flag operation now, may not later, we define it because we want to hide the internal operation. 'set' to 'deassigned' status is enough. So I would like keep it. I disagree, the opposite of a 'set' is a 'clear', or at least an 'unset'. Using bit-ops-like terminology doesn't lock us into an implementation. As written, this could just as easily be setting two different variables. So there are two pairs of opposite: set assigned ---> unset assigned set assigned ---> set deassigned Here you prefer the 'verb' set /unset, and I prefer the 'adj.' assigned / deassigned. Do they really have different meaning or make confusion ? I don't think so. Thanks, Ethan +static inline bool pci_is_dev_assigned(struct pci_dev *pdev) +{ + return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED ? true : false; +} The ternary operation isn't necessary. Thanks, Yep, return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED is enough. Thanks, Ethan Alex /* kmem_cache style wrapper around pci_alloc_consistent() */ #include -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] PCI: introduce helper functions for device flag operation
On 07/28/2014 07:43 PM, ethan zhao wrote: > > On 2014/7/29 10:31, Alex Williamson wrote: >> On Tue, 2014-07-29 at 09:53 +0800, ethan zhao wrote: >>> On 2014/7/29 5:00, Alex Williamson wrote: On Wed, 2014-07-23 at 00:19 +0800, Ethan Zhao wrote: > This patch introduced three helper functions to hide direct > device flag operation. > > void pci_set_dev_assigned(struct pci_dev *pdev); > void pci_set_dev_deassigned(struct pci_dev *pdev); > bool pci_is_dev_assigned(struct pci_dev *pdev); > > Signed-off-by: Ethan Zhao > --- >include/linux/pci.h | 13 + >1 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index aab57b4..5f6f8fa 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1129,6 +1129,19 @@ resource_size_t > pcibios_window_alignment(struct pci_bus *bus, > int pci_set_vga_state(struct pci_dev *pdev, bool decode, > unsigned int command_bits, u32 flags); > +/* helper functions for operation of device flag */ > +static inline void pci_set_dev_assigned(struct pci_dev *pdev) > +{ > +pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; > +} > +static inline void pci_set_dev_deassigned(struct pci_dev *pdev) > +{ > +pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; > +} I think pci_clear_dev_assigned() would make more sense, we're not setting a flag named DEASSIGNED. >>> Though it is a flag operation now, may not later, we define it >>> because we want to hide the internal operation. >>> 'set' to 'deassigned' status is enough. So I would like keep it. >> I disagree, the opposite of a 'set' is a 'clear', or at least an >> 'unset'. Using bit-ops-like terminology doesn't lock us into an >> implementation. As written, this could just as easily be setting two >> different variables. > So there are two pairs of opposite: > > set assigned ---> unset assigned > set assigned ---> set deassigned > > Here you prefer the 'verb' set /unset, and I prefer the 'adj.' assigned > / deassigned. > > Do they really have different meaning or make confusion ? I don't think > so. > > Thanks, > Ethan > I agree with Alex W. If you are going to use the "set" name you should probably use "clear" for the operation that undoes it. Using the term set implies that it is setting a bit and we currently don't have a deassigned/unassigned bit. If that doesn't work perhaps you could use something like get/put_assigned_device or acquire/release_assigned_device. You just need to describe what it is you are doing. You could even consider adding some tests to return an error if you attempt to assign a device that is already assigned. Thanks, Alex D -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] PCI: introduce helper functions for device flag operation
Both of you and Alex W prefer the 'Verb' , Ok, I accept the suggestion. Thanks, Ethan On Tue, Jul 29, 2014 at 11:37 AM, Alexander Duyck wrote: > On 07/28/2014 07:43 PM, ethan zhao wrote: >> >> On 2014/7/29 10:31, Alex Williamson wrote: >>> On Tue, 2014-07-29 at 09:53 +0800, ethan zhao wrote: On 2014/7/29 5:00, Alex Williamson wrote: > On Wed, 2014-07-23 at 00:19 +0800, Ethan Zhao wrote: >> This patch introduced three helper functions to hide direct >> device flag operation. >> >> void pci_set_dev_assigned(struct pci_dev *pdev); >> void pci_set_dev_deassigned(struct pci_dev *pdev); >> bool pci_is_dev_assigned(struct pci_dev *pdev); >> >> Signed-off-by: Ethan Zhao >> --- >>include/linux/pci.h | 13 + >>1 files changed, 13 insertions(+), 0 deletions(-) >> >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index aab57b4..5f6f8fa 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -1129,6 +1129,19 @@ resource_size_t >> pcibios_window_alignment(struct pci_bus *bus, >> int pci_set_vga_state(struct pci_dev *pdev, bool decode, >> unsigned int command_bits, u32 flags); >> +/* helper functions for operation of device flag */ >> +static inline void pci_set_dev_assigned(struct pci_dev *pdev) >> +{ >> +pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; >> +} >> +static inline void pci_set_dev_deassigned(struct pci_dev *pdev) >> +{ >> +pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; >> +} > I think pci_clear_dev_assigned() would make more sense, we're not > setting a flag named DEASSIGNED. Though it is a flag operation now, may not later, we define it because we want to hide the internal operation. 'set' to 'deassigned' status is enough. So I would like keep it. >>> I disagree, the opposite of a 'set' is a 'clear', or at least an >>> 'unset'. Using bit-ops-like terminology doesn't lock us into an >>> implementation. As written, this could just as easily be setting two >>> different variables. >> So there are two pairs of opposite: >> >> set assigned ---> unset assigned >> set assigned ---> set deassigned >> >> Here you prefer the 'verb' set /unset, and I prefer the 'adj.' assigned >> / deassigned. >> >> Do they really have different meaning or make confusion ? I don't think >> so. >> >> Thanks, >> Ethan >> > > I agree with Alex W. If you are going to use the "set" name you should > probably use "clear" for the operation that undoes it. Using the term > set implies that it is setting a bit and we currently don't have a > deassigned/unassigned bit. > > If that doesn't work perhaps you could use something like > get/put_assigned_device or acquire/release_assigned_device. You just > need to describe what it is you are doing. You could even consider > adding some tests to return an error if you attempt to assign a device > that is already assigned. > > Thanks, > > Alex D > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4 v3] xen-pciback: use pci device flag operation helper function
Use pci device flag operation helper functions when set device to assigned or deassigned state. Signed-off-by: Ethan Zhao --- v3: amend helper functions naming. drivers/xen/xen-pciback/pci_stub.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 62fcd48..71f69f1 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref) xen_pcibk_config_free_dyn_fields(dev); xen_pcibk_config_free_dev(dev); - dev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(dev); pci_dev_put(dev); kfree(psdev); @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev) dev_dbg(&dev->dev, "reset device\n"); xen_pcibk_reset_device(dev); - dev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_set_dev_assigned(dev); return 0; config_release: -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4 v3] PCI: use device flag operation helper function in iov.c
Use device flag operation helper functions when check device assignment status. Signed-off-by: Ethan Zhao --- v3: amend helper functions naming. drivers/pci/iov.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index de7a747..61fad36 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -633,7 +633,7 @@ int pci_vfs_assigned(struct pci_dev *dev) * our dev as the physical function and the assigned bit is set */ if (vfdev->is_virtfn && (vfdev->physfn == dev) && - (vfdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED)) + pci_is_dev_assigned(vfdev)) vfs_assigned++; vfdev = pci_get_device(dev->vendor, dev_id, vfdev); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4 v3] PCI: introduce helper functions for device flag operation
This patch introduced three helper functions to hide direct device flag operation. void pci_set_dev_assigned(struct pci_dev *pdev); void pci_clear_dev_assigned(struct pci_dev *pdev); bool pci_is_dev_assigned(struct pci_dev *pdev); Signed-off-by: Ethan Zhao --- v2: simplify unnecessory ternary operation in function pci_is_dev_assigned(); v3: amend helper functions naming. include/linux/pci.h | 13 + 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index aab57b4..5f6f8fa 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1129,6 +1129,19 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus, int pci_set_vga_state(struct pci_dev *pdev, bool decode, unsigned int command_bits, u32 flags); +/* helper functions for operation of device flag */ +static inline void pci_set_dev_assigned(struct pci_dev *pdev) +{ + pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; +} +static inline void pci_clear_dev_assigned(struct pci_dev *pdev) +{ + pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; +} +static inline bool pci_is_dev_assigned(struct pci_dev *pdev) +{ + return pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED; +} /* kmem_cache style wrapper around pci_alloc_consistent() */ #include -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4 v3] Introduce device assignment flag operation helper function
This patch set introduces three PCI device flag operation helper functions when set pci device PF/VF to assigned or deassigned status also check it. and patch 2,3,4 apply these helper functions to KVM,XEN and PCI. v2: simplify unnecessory ternary operation in function pci_is_dev_assigned(). v3: amend helper function naming. Appreciate suggestion from alex.william...@redhat.com, david.vra...@citrix.com, alexander.h.du...@intel.com BR, Ethan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4 v3] KVM: use pci device flag operation helper functions
Use helper function instead of direct operation to pci device flag when set device to assigned or deassigned. Signed-off-by: Ethan Zhao --- v3: amend helper functions naming. virt/kvm/assigned-dev.c |2 +- virt/kvm/iommu.c|4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index bf06577..d122bda 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -302,7 +302,7 @@ static void kvm_free_assigned_device(struct kvm *kvm, else pci_restore_state(assigned_dev->dev); - assigned_dev->dev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(assigned_dev->dev); pci_release_regions(assigned_dev->dev); pci_disable_device(assigned_dev->dev); diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 0df7d4b..8cfe021 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -194,7 +194,7 @@ int kvm_assign_device(struct kvm *kvm, goto out_unmap; } - pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_set_dev_assigned(pdev); dev_info(&pdev->dev, "kvm assign device\n"); @@ -220,7 +220,7 @@ int kvm_deassign_device(struct kvm *kvm, iommu_detach_device(domain, &pdev->dev); - pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(pdev); dev_info(&pdev->dev, "kvm deassign device\n"); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Verifying Execution Integrity in Untrusted hypervisors
On 2014-07-28 23:17, Nakajima, Jun wrote: > On Mon, Jul 28, 2014 at 1:27 PM, Paolo Bonzini wrote: >> Il 28/07/2014 20:31, Jan Kiszka ha scritto: >>> The hypervisor has full control of and insight into the guest vCPU >>> state. Only protecting some portions of guest memory seems insufficient. >>> >>> We rather need encryption of every data that leaves the CPU or moves >>> from guest to host mode (and decryption the other way around). I guess >>> that would have quite some performance impact and is far from being easy >>> to integrate into modern processors. But, who knows... >> >> Intel SGX sounds somewhat like what you describe, but I'm not sure how >> it's going to be virtualized. >> > > Right. It's possible to virtualize (or pass-through) SGX without > losing the security feature. Interesting thing. Somehow missed this so far. Fairly complicated one, though. Still trying to wrap my head around how attestation practically works. > With SGX, you can create secure (encrypted) islands on processes in > VMs as well. But I'm not sure if it's useful for solving the problem > described. Huh? I thought remote attestation is a key feature of SGX? That is, to my understanding, what Shiva is looking for (though on current hardware, which remains infeasible unfortunately). Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html