Re: [patch 0/3] fixes for kvm on s390
On 06/23/2009 06:24 PM, Christian Borntraeger wrote: Hello Avi, here are three patches against kvm.git that fix several issues in our kvm port. Please review and consider all patches for 2.6.31. Applied all, and queued the stfle patch for 2.6.31. The commits referenced in patch 1 and 3 doesn't exist in 2.6.31. Please correct me if I misread things. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 0/3] fixes for kvm on s390
Am Mittwoch 24 Juni 2009 10:09:18 schrieben Sie: > On 06/23/2009 06:24 PM, Christian Borntraeger wrote: > > Hello Avi, > > > > here are three patches against kvm.git that fix several issues in our > > kvm port. Please review and consider all patches for 2.6.31. > > Applied all, and queued the stfle patch for 2.6.31. The commits > referenced in patch 1 and 3 doesn't exist in 2.6.31. Please correct me > if I misread things. Yes, the stfle issue is present on linus git and should go to Linus. Fixes 1 and 3 are only for your kvm.git-tree. They should go to Linus in as soon as you push the referenced commits to Linus. Christian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio_blk fails for rhel5.3 guest with LVM: kernel panic
On 06/23/2009 08:21 PM, sudhir kumar wrote: I see that Rhel5.3 32 and 64 bit guests fail to boot with virtio for block device. The guest can not find the root filesystem. The guest is using LVM. As per the linux-kvm wiki instructions I did change device.map and booted with if=virtio. The wiki link is http://www.linux-kvm.org/page/Boot_from_virtio_block_device Does your initrd include virtio-pci.ko and virtio-blk.ko? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/8 v3] KVM support for 1GB pages
On 06/19/2009 04:16 PM, Joerg Roedel wrote: Hi, this is the third version of the patches for KVM to support 1GB pages. Changes to the last version include: - changed reporting of 1GB page support to userspace to use SUPPORTED_CPUID ioctl - added support for 1GB pages to the software page walker code too This code still only can make use of 1GB pages with nested paging enabled. I will give the shadow paging code another debug round soon. Please comment or consider to apply these patches. Thanks, Joerg Shortlog: Joerg Roedel (8): hugetlbfs: export vma_kernel_pagsize to modules kvm: change memslot data structures for multiple hugepage sizes At Marcelo's suggestion I've applied the first two patches, to reduce your rebase load. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: remove in_range from kvm_io_device
On Tue, Jun 23, 2009 at 09:43:31PM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > Remove in_range from kvm_io_device and ask read/write callbacks, if > > supplied, to perform range checks internally. This allows aliasing > > (mostly for in-kernel virtio), as well as better error handling by > > making it possible to pass errors up to userspace. And it's enough to > > look at the diffstat to see that it's a better API anyway. > > > > While we are at it, document locking rules for kvm_io_device. > > > > Hi Michael, > > I just tried to apply this to kvm.git/master, and it blew up really > badly. What tree should I be using? Ugh, this is against 2.6.30. I'll post kvm.git version soon. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths
On 06/23/2009 09:05 PM, Marcelo Tosatti wrote: Correct missing locking in a few places in x86's vm_ioctl handling path. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 0/3] fixes for kvm on s390
On 06/24/2009 11:18 AM, Christian Bornträger wrote: Yes, the stfle issue is present on linus git and should go to Linus. Fixes 1 and 3 are only for your kvm.git-tree. They should go to Linus in as soon as you push the referenced commits to Linus. That'll be 2.6.32. I generally fold commits that fix other commits in the same pull. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 0/3] fixes for kvm on s390
Am Mittwoch 24 Juni 2009 11:18:32 schrieb Avi Kivity: > On 06/24/2009 11:18 AM, Christian Bornträger wrote: > > Yes, the stfle issue is present on linus git and should go to Linus. > > Fixes 1 and 3 are only for your kvm.git-tree. They should go to Linus in > > as soon as you push the referenced commits to Linus. > > That'll be 2.6.32. I generally fold commits that fix other commits in > the same pull. That would be fine with me. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio_blk fails for rhel5.3 guest with LVM: kernel panic
On 06/24/2009 12:41 PM, sudhir kumar wrote: initrd does not contain these modules. My kernel contains the above modules. Is it mandatory for the initrd to contain these modules? It does not contain virtio-net and the guest boots fine with virtio. virtio-net isn't used for locating the disks. If you want your root filesystem to use virtio, you must include these modules. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio_blk fails for rhel5.3 guest with LVM: kernel panic
initrd does not contain these modules. My kernel contains the above modules. Is it mandatory for the initrd to contain these modules? It does not contain virtio-net and the guest boots fine with virtio. On Wed, Jun 24, 2009 at 2:04 PM, Avi Kivity wrote: > On 06/23/2009 08:21 PM, sudhir kumar wrote: >> >> I see that Rhel5.3 32 and 64 bit guests fail to boot with virtio for >> block device. The >> guest can not find the root filesystem. The guest is using LVM. As per >> the linux-kvm wiki instructions I did change device.map and booted >> with if=virtio. >> The wiki link is >> http://www.linux-kvm.org/page/Boot_from_virtio_block_device >> >> > > Does your initrd include virtio-pci.ko and virtio-blk.ko? > > -- > error compiling committee.c: too many arguments to function > > -- Sudhir Kumar -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] introduce -cpu host target
On 06/23/2009 12:47 AM, Andre Przywara wrote: Should we ignore unhandled MSRs like QEMU or Xen do? Ignoring unhandled msrs is dangerous. If a write has some effect the guest depends on, and we're not emulating that effect, the guest will fail. Similarly if you don't know what a register mean, who knows what returning zero for a read will do. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] ignore AMDs HWCR register access to set the FFDIS bit
On 06/23/2009 01:20 AM, Andre Przywara wrote: Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM does not handle the needed MSR, the injected #GP will panic the Linux kernel. Ignore setting of the HWCR.FFDIS bit in this MSR to let Linux boot with an AMD K8 family guest CPU. Signed-off-by: Andre Przywara --- arch/x86/kvm/x86.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5a66bb9..4c19c24 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -796,6 +796,13 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) case MSR_EFER: set_efer(vcpu, data); break; + case MSR_K7_HWCR: + if (data != 0x40) { + pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n", + data); + return 1; + } + break; Won't that printk() if writing a zero? Just mask out that bit. I also see some HWCR handling in svm.c, can probably be removed. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM: x86: verify MTRR/PAT validity
On 06/22/2009 09:27 PM, Marcelo Tosatti wrote: This in the BIOS is writing 1's into the reserved address bits into variable MTRR: wrmsr_smp(MTRRphysMask_MSR(0), ~(0x2000ull - 1) | 0x800); So i'll leave just memory type validity checking and MSR_MTRRdefType valid bit checks in for now: Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Cleanup cpu loop
On 06/22/2009 05:27 PM, Gleb Natapov wrote: Rearrange cpu loop to be (hopefully) more readable. Put difference between kernel/userspace irqchip in one place. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] ignore reads from AMDs C1E enabled MSR
If the Linux kernel detects an C1E capable AMD processor (K8 RevF and higher), it will access a certain MSR on every attempt to go to halt. Explicitly handle this read and return 0 to let KVM run a Linux guest with the native AMD host CPU propagated to the guest. Signed-off-by: Andre Przywara --- arch/x86/kvm/x86.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c5a0d37..a148f4c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1001,6 +1001,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_P6_EVNTSEL0: case MSR_P6_EVNTSEL1: case MSR_K7_EVNTSEL0: + case MSR_K8_INT_PENDING_MSG: data = 0; break; case MSR_MTRRcap: -- 1.6.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] ignore PCI ECS I/O enablement
Linux guests will try to enable access to the extended PCI config space via the I/O ports 0xCF8/0xCFC on AMD Fam10h CPU. Since we (currently?) don't use ECS, simply ignore this write attempt. Signed-off-by: Andre Przywara --- arch/x86/kvm/x86.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a148f4c..e6e61ee 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -804,6 +804,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) return 1; } break; + case MSR_AMD64_NB_CFG: + break; case MSR_IA32_DEBUGCTLMSR: if (!data) { /* We support the non-activated case already */ -- 1.6.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] dont trim the guest's hypervisor CPUID bit in KVM if the guest requests it
Signed-off-by: Andre Przywara --- arch/x86/kvm/x86.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e6e61ee..6ad0f93 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1415,7 +1415,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, 0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ | 0 /* Reserved, DCA */ | F(XMM4_1) | F(XMM4_2) | 0 /* x2APIC */ | F(MOVBE) | F(POPCNT) | - 0 /* Reserved, XSAVE, OSXSAVE */; + 0 /* Reserved, XSAVE, OSXSAVE */ | F(HYPERVISOR); /* cpuid 0x8001.ecx */ const u32 kvm_supported_word6_x86_features = F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ | -- 1.6.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] ignore AMDs HWCR register access to set the FFDIS bit
Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM does not handle the needed MSR, the injected #GP will panic the Linux kernel. Ignore setting of the HWCR.FFDIS bit in this MSR to let Linux boot with an AMD K8 family guest CPU. Signed-off-by: Andre Przywara --- arch/x86/kvm/svm.c |1 - arch/x86/kvm/x86.c |8 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 9b37043..610bcae 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2139,7 +2139,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) break; case MSR_VM_CR: case MSR_VM_IGNNE: - case MSR_K7_HWCR: pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data); break; default: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5a66bb9..c5a0d37 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -796,6 +796,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) case MSR_EFER: set_efer(vcpu, data); break; + case MSR_K7_HWCR: + data &= ~0x40U; /* ignore flush filter disable */ + if (data != 0) { + pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n", + data); + return 1; + } + break; case MSR_IA32_DEBUGCTLMSR: if (!data) { /* We support the non-activated case already */ -- 1.6.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] introduce -cpu host target
Avi Kivity wrote: On 06/23/2009 12:47 AM, Andre Przywara wrote: Should we ignore unhandled MSRs like QEMU or Xen do? Ignoring unhandled msrs is dangerous. If a write has some effect the guest depends on, and we're not emulating that effect, the guest will fail. Similarly if you don't know what a register mean, who knows what returning zero for a read will do. I agree - from an academic POV. But if the pragmatic approach simply enables many guests to run, then it's at least worth considering it. And with the current approach the guest fails, too (due to the injected #GP). If I only look at AMD's list of MSRs (not to speak of the internal list ;-), there will be a lot of work to emulate them. Even worse, most of them cannot be properly emulated (like disable Lock prefix). But nevertheless I would like to continue the "patch-on-demand" path by catching those MSRs that in-the-wild OSes really touch and handle them appropriately. Hopefully that will cover most of the MSRs. Maybe we could consider an (module? QEMU cmdline?) option to ignore unknown MSRs. Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448 3567 12 to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Thomas M. McCoy; Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] virtio_test: A module for testing virtio via userspace
Am Mittwoch 24 Juni 2009 05:40:34 schrieb Rusty Russell: > > o the general idea of a virtio_test module > > o the user interface ioctls > > o further ideas and comments > > Not mugging real drivers would be a requirement, I think. Ok, I try to find a proper way to avoid that virtio_test binds to devices that have real drivers available. That patch to virtio_dev_match should be relatively easy. The open question I have: Should virtio_test bind to a device if no other driver is (yet) available? > > +config VIRTIO_TEST > > + tristate "Virtio test driver (EXPERIMENTAL)" > > + select VIRTIO > > + select VIRTIO_RING > > Perhaps these should be depends? Plus, depends on EXPERIMENTAL. > > > +If unsure, say M. > > That's "N" I think. Yes. > > > + case VIOTEST_IOCGETBUF: > > + ret = do_get_buf(vtest, (struct viotest_getbuf __user *) arg); > > + break; > > + case VIOTEST_IOCGETCBS: > > + ret = get_callbacks(vtest, (struct viotest_cbinfo __user *) > > arg); > > + break; > > Generally the point of callbacks is to tell you you have new buffers; in > fact you're insulated from callbacks which don't show new buffers. So I'm > not sure these two need to be separate? > In which case, a read/write interface starts to make sense (write for > addbuf and kick, read for get_buf). That fits nicely with O_NONBLOCK and > poll(). Hmm - makes sense. I will try to propose a 2nd version of the interface. The interface must handle multiple virtqueues per device, should allow non-blocking mode etc. Lets see what ideas come to my mind. Thanks for the comments Christian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] introduce -cpu host target
On 06/24/2009 02:04 PM, Andre Przywara wrote: Avi Kivity wrote: On 06/23/2009 12:47 AM, Andre Przywara wrote: Should we ignore unhandled MSRs like QEMU or Xen do? Ignoring unhandled msrs is dangerous. If a write has some effect the guest depends on, and we're not emulating that effect, the guest will fail. Similarly if you don't know what a register mean, who knows what returning zero for a read will do. I agree - from an academic POV. But if the pragmatic approach simply enables many guests to run, then it's at least worth considering it. And with the current approach the guest fails, too (due to the injected #GP). If I only look at AMD's list of MSRs (not to speak of the internal list ;-), there will be a lot of work to emulate them. Even worse, most of them cannot be properly emulated (like disable Lock prefix). We don't need to emulate all values. We can allow default values or bits that don't matter for virtualization, and warn on bits that don't emulate properly. I'm not concerned about failure -- just silent failure. But nevertheless I would like to continue the "patch-on-demand" path by catching those MSRs that in-the-wild OSes really touch and handle them appropriately. Hopefully that will cover most of the MSRs. Maybe we could consider an (module? QEMU cmdline?) option to ignore unknown MSRs. Good idea. We can start with a module option, if it proves popular we can graduate it to a per-guest ioctl. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] dont trim the guest's hypervisor CPUID bit in KVM if the guest requests it
On 06/24/2009 01:44 PM, Andre Przywara wrote: Signed-off-by: Andre Przywara --- arch/x86/kvm/x86.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e6e61ee..6ad0f93 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1415,7 +1415,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, 0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ | 0 /* Reserved, DCA */ | F(XMM4_1) | F(XMM4_2) | 0 /* x2APIC */ | F(MOVBE) | F(POPCNT) | - 0 /* Reserved, XSAVE, OSXSAVE */; + 0 /* Reserved, XSAVE, OSXSAVE */ | F(HYPERVISOR); I think this should be handled in qemu, since it isn't really a cpu bit. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] ignore PCI ECS I/O enablement
On 06/24/2009 01:44 PM, Andre Przywara wrote: Linux guests will try to enable access to the extended PCI config space via the I/O ports 0xCF8/0xCFC on AMD Fam10h CPU. Since we (currently?) don't use ECS, simply ignore this write attempt. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a148f4c..e6e61ee 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -804,6 +804,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) return 1; } break; + case MSR_AMD64_NB_CFG: + break; case MSR_IA32_DEBUGCTLMSR: if (!data) { /* We support the non-activated case already */ I see Linux does both rdmsr and wrmsr, don't we need to support both? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] ignore AMDs HWCR register access to set the FFDIS bit
On 06/24/2009 01:44 PM, Andre Przywara wrote: Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM does not handle the needed MSR, the injected #GP will panic the Linux kernel. Ignore setting of the HWCR.FFDIS bit in this MSR to let Linux boot with an AMD K8 family guest CPU. Applied first two patched, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2811454 ] CPU ID Emulation bug: AMD Athlon
Bugs item #2811454, was opened at 2009-06-24 14:40 Message generated for change (Tracker Item Submitted) made by technologov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2811454&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Technologov (technologov) Assigned to: Nobody/Anonymous (nobody) Summary: CPU ID Emulation bug: AMD Athlon Initial Comment: Qemu incorrectly emulates CPU ID of AMD Athlon CPU. Namely: 1. It uses CPU Model of "Qemu CPU" 2. It shows SSE/SSE2 instructions as CPU flags, while AMD Athlons don't have it. 3. When running KVM on Intel CPU, it shows "CPU vendor" as Intel instead of AMD. In my opinion KVM should refuse to emulate AMD Athlon CPUID on Intel CPUs, due to lacking 3Dnow! (unless those instructions can be emulated) -Alexey, 24.6.2009. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2811454&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2811454 ] CPU ID Emulation bug: AMD Athlon
Bugs item #2811454, was opened at 2009-06-24 14:40 Message generated for change (Comment added) made by technologov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2811454&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Technologov (technologov) Assigned to: Nobody/Anonymous (nobody) Summary: CPU ID Emulation bug: AMD Athlon Initial Comment: Qemu incorrectly emulates CPU ID of AMD Athlon CPU. Namely: 1. It uses CPU Model of "Qemu CPU" 2. It shows SSE/SSE2 instructions as CPU flags, while AMD Athlons don't have it. 3. When running KVM on Intel CPU, it shows "CPU vendor" as Intel instead of AMD. In my opinion KVM should refuse to emulate AMD Athlon CPUID on Intel CPUs, due to lacking 3Dnow! (unless those instructions can be emulated) -Alexey, 24.6.2009. -- >Comment By: Technologov (technologov) Date: 2009-06-24 14:41 Message: Qemu command: ./qemu-kvm -hda /isos/disks-vm/alexeye/fc9-32.qcow2 -snapshot -m 512 -cpu athlon -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2811454&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2811454 ] CPU ID Emulation bug: AMD Athlon
Bugs item #2811454, was opened at 2009-06-24 14:40 Message generated for change (Comment added) made by technologov You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2811454&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Technologov (technologov) Assigned to: Nobody/Anonymous (nobody) Summary: CPU ID Emulation bug: AMD Athlon Initial Comment: Qemu incorrectly emulates CPU ID of AMD Athlon CPU. Namely: 1. It uses CPU Model of "Qemu CPU" 2. It shows SSE/SSE2 instructions as CPU flags, while AMD Athlons don't have it. 3. When running KVM on Intel CPU, it shows "CPU vendor" as Intel instead of AMD. In my opinion KVM should refuse to emulate AMD Athlon CPUID on Intel CPUs, due to lacking 3Dnow! (unless those instructions can be emulated) -Alexey, 24.6.2009. -- >Comment By: Technologov (technologov) Date: 2009-06-24 14:57 Message: Tesed on KVM-87rc7 on Intel, AMD and -no-kvm (Qemu) -- Comment By: Technologov (technologov) Date: 2009-06-24 14:41 Message: Qemu command: ./qemu-kvm -hda /isos/disks-vm/alexeye/fc9-32.qcow2 -snapshot -m 512 -cpu athlon -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2811454&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] dont trim the guest's hypervisor CPUID bit in KVM if the guest requests it
Avi Kivity wrote: On 06/24/2009 01:44 PM, Andre Przywara wrote: Signed-off-by: Andre Przywara --- arch/x86/kvm/x86.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e6e61ee..6ad0f93 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1415,7 +1415,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, 0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ | 0 /* Reserved, DCA */ | F(XMM4_1) | F(XMM4_2) | 0 /* x2APIC */ | F(MOVBE) | F(POPCNT) | -0 /* Reserved, XSAVE, OSXSAVE */; +0 /* Reserved, XSAVE, OSXSAVE */ | F(HYPERVISOR); I think this should be handled in qemu, since it isn't really a cpu bit. But this would require to make an exception for turning this bit on again after it has been trimmed. I just made (QEMU) patches for make this trimming really work (and removing all the hacked bits), so I would like to not spoil this again by introducing another excecption. After all this is more a list of what KVM does _not_ support (should we make this a negative one? gets ugly with the reserved bits), so I would like to leave it here. I will send out the patches to qemu-devel when I found the last bug. Regards, Andre. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-serial: A guest <-> host interface for simple communication
On (Wed) Jun 24 2009 [13:45:01], Rusty Russell wrote: > On Tue, 23 Jun 2009 10:12:31 pm Amit Shah wrote: > > Hello, > > > > Here are two patches. One implements a virtio-serial device in qemu > > and the other is the driver for a guest kernel. > > > > While working on a vmchannel interface that is needed for communication > > between guest userspace and host userspace, I saw that most of the > > interface can be abstracted out as a "serial" device with "ports". > > OK, I don't think the "naming" idea works though. A userspace user would > have > to open each one in turn to get its name. I'd stick with numbers. What if an ioctl were added to get the port number from the port name? Userspace would do ioctl(fd, VIRTIO_SERIAL_GET_PORT_FROM_NAME, &nr); sprintf(port, "/dev/vmch%s", nr); fd2 = open(port, ...); ? > You also don't have dynamic creation and removal, except by hotpluging the > entire device (which was on your requirements page). Actually we're more interested in hotplugging ports than the device itself ("Dynamic channel creation"). > what ports exist. Register on the change interrupt to get updates. Drop the > control vq entirely. If the ioctl mentioned above were added, it would justify the control vq, right? Thanks for the comments, Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] dont trim the guest's hypervisor CPUID bit in KVM if the guest requests it
Andre Przywara wrote: Avi Kivity wrote: --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1415,7 +1415,7 @@ static void do_cpuid_ent(struct -0 /* Reserved, XSAVE, OSXSAVE */; +0 /* Reserved, XSAVE, OSXSAVE */ | F(HYPERVISOR); I think this should be handled in qemu, since it isn't really a cpu bit. But this would require to make an exception for turning this bit on again after it has been trimmed. I just made (QEMU) patches for make this trimming really work (and removing all the hacked bits), so I would like to not spoil this again by introducing another exception. I kind of tricked myself here. Since the hypervisor bit is (well, mostly) not set in the host, it will always be removed from the guest. So we need an exception for this anyway. So please drop this patch, I will come up with some QEMU solution for this. Regards, Andre. After all this is more a list of what KVM does _not_ support (should we make this a negative one? gets ugly with the reserved bits), so I would like to leave it here. I will send out the patches to qemu-devel when I found the last bug. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] ignore PCI ECS I/O enablement
Linux guests will try to enable access to the extended PCI config space via the I/O ports 0xCF8/0xCFC on AMD Fam10h CPU. Since we (currently?) don't use ECS, simply ignore write and read attempts. Signed-off-by: Andre Przywara --- arch/x86/kvm/x86.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a148f4c..c717037 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -804,6 +804,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) return 1; } break; + case MSR_AMD64_NB_CFG: + break; case MSR_IA32_DEBUGCTLMSR: if (!data) { /* We support the non-activated case already */ @@ -1002,6 +1004,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_P6_EVNTSEL1: case MSR_K7_EVNTSEL0: case MSR_K8_INT_PENDING_MSG: + case MSR_AMD64_NB_CFG: data = 0; break; case MSR_MTRRcap: -- 1.6.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
BUG: using smp_processor_id() in preemptible
Hi, I'm trying to run a test environment in kvm (because uml doesn't have lockdep), and am running into the following problems: 1) I get the $subject warning a lot, when starting kvm: [85763.262707] BUG: using smp_processor_id() in preemptible [] code: kvm/13877 [85763.262719] caller is kvm_write_guest_time+0x40/0x220 [kvm] [85763.262722] Pid: 13877, comm: kvm Not tainted 2.6.30-wl-26837-g0ee651a-dirty #54 [85763.262725] Call Trace: [85763.262729] [] debug_smp_processor_id+0xf2/0x100 [85763.262741] [] kvm_write_guest_time+0x40/0x220 [kvm] [85763.262753] [] vcpu_enter_guest+0x320/0x580 [kvm] [85763.262780] [] __vcpu_run+0x74/0x2f0 [kvm] [85763.262792] [] kvm_arch_vcpu_ioctl_run+0x8f/0x200 [kvm] [85763.262804] [] kvm_vcpu_ioctl+0x4b8/0x900 [kvm] [85763.262816] [] vfs_ioctl+0x36/0xb0 [85763.262819] [] do_vfs_ioctl+0x89/0x320 [85763.262826] [] sys_ioctl+0x4f/0x80 [85763.262830] [] system_call_fastpath+0x16/0x1b That kernel version is wireless-testing, which is currently based on v2.6.30, and the -dirty is for some wireless patches I did. 2) The second problem is that it doesn't actually work. I use this command line: kvm -kernel arch/x86_64/boot/bzImage \ -hda ../uml/Ubuntu-IntrepidIbex-amd64-root_fs \ -append "root=/dev/hda console=ttyS0" -curses and the system hangs after Plex86/Bochs VGABios (PCI) current-cvs 12 Jun 2009 This VGA/VBE Bios is released under the GNU LGPL Please visit : . http://bochs.sourceforge.net . http://www.nongnu.org/vgabios cirrus-compatible VGA is detected QEMU BIOS - build: 06/12/09 $Revision: 1.182 $ $Date: 2007/08/01 17:09:51 $ Options: apmbios pcibios eltorito rombios32 ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (1024 MBytes) ata1 master: QEMU DVD-ROM ATAPI-4 CD-Rom/DVD-Rom Press F12 for boot menu. Decompressing Linux... Parsing ELF... done. Booting the kernel. The guest kernel is the same as the host, but with somewhat different config options. The strange thing here is that the exact same command line, with qemu-system-x86_64 instead of kvm works perfectly. johannes signature.asc Description: This is a digitally signed message part
ia64: why is mmio bus access lockless?
Hi, arch/ia64/kvm/kvm-ia64.c does this: mmio_dev = vcpu_find_mmio_dev(vcpu, p->addr, p->size, !p->dir); if (mmio_dev) { if (!p->dir) kvm_iodevice_write(mmio_dev, p->addr, p->size, &p->data); else kvm_iodevice_read(mmio_dev, p->addr, p->size, &p->data); without holding kvm lock. I know that find/read/write on the bus require locking on x86 - are they safe to do without locking on ia64? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
Amit Shah wrote: > A few sample uses for a vmchannel are to share the host and guest > clipboards (to allow copy/paste between a host and a guest), to > lock the screen of the guest session when the vnc viewer is closed, > to find out which applications are installed on a guest OS even when > the guest is powered down (using virt-inspector) and so on. Those all look like useful features. Can you run an application to provide those features on a guest which _doesn't_ have a vmchannel/virtio-serial support in the kernel? Or will it be restricted only to guests which have QEMU-specific support in their kernel? -- Jamie -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 2/2] introduce -cpu host target
Andre Przywara wrote: > Even worse, most of them cannot be properly emulated (like disable > Lock prefix). Disable Lock prefix should be easy to emulate by ignoring it, shouldn't it? :-) -- Jamie -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] support colon in filenames
Problem: It is impossible to feed filenames with the character colon because qemu interprets such names as a protocol. For example a filename scsi:0, is interpreted as a protocol by name "scsi". This patch allows users to espace colon characters. For example the above filename can now be expressed as 'scsi\:0' Here are couple of examples: ndb:\:: is treated as a ndb protocol with a hostname ':' on port scsi\:0\:abc is a local file scsi:0:abc http\://myweb is a local file by name http://myweb nbd\::localhost:2558 is a protocol by name nbd: Signed-off-by: Ram Pai --- block.c | 26 +- block/nbd.c | 19 ++- block/raw-posix.c | 24 +--- cutils.c | 22 ++ qemu-common.h |1 + vl.c |3 +-- 6 files changed, 72 insertions(+), 23 deletions(-) diff --git a/block.c b/block.c index aca5a6d..80bded9 100644 --- a/block.c +++ b/block.c @@ -225,22 +225,30 @@ static BlockDriver *find_protocol(const char *filename) { BlockDriver *drv1; char protocol[128]; -int len; -const char *p; +char *p = protocol; +const char *f=filename; +int len = strnlen(filename, 128); #ifdef _WIN32 if (is_windows_drive(filename) || is_windows_drive_prefix(filename)) return bdrv_find_format("raw"); #endif -p = strchr(filename, ':'); -if (!p) +while ( f < filename+len ) { + if ( *f == ':' ) + break; + if ( *f == '\\') { + f++; + if ( *f == '\0') + break; + } + *p++ = *f++; +} +*p='\0'; + +if (*f != ':') return bdrv_find_format("raw"); -len = p - filename; -if (len > sizeof(protocol) - 1) -len = sizeof(protocol) - 1; -memcpy(protocol, filename, len); -protocol[len] = '\0'; + for(drv1 = first_drv; drv1 != NULL; drv1 = drv1->next) { if (drv1->protocol_name && !strcmp(drv1->protocol_name, protocol)) diff --git a/block/nbd.c b/block/nbd.c index 47d4778..a011cc7 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -64,18 +64,27 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags) } else { uint16_t port; -char *p, *r; +char *q, *p, *r; char hostname[128]; pstrcpy(hostname, 128, host); -p = strchr(hostname, ':'); -if (p == NULL) + q=p=hostname; + while ( p < hostname+128 ) { + if (*p == ':') + break; + if ( *p == '\\' ) { + p++; + if (*p =='\0') + break; + } + *q++=*p++; + } +if (*p != ':') return -EINVAL; -*p = '\0'; + *q='\0'; p++; - port = strtol(p, &r, 0); if (r == p) return -EINVAL; diff --git a/block/raw-posix.c b/block/raw-posix.c index 41bfa37..98ede17 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -124,6 +124,16 @@ static int fd_open(BlockDriverState *bs); static int cdrom_reopen(BlockDriverState *bs); #endif +static int _open(const char *filename, int flags, ...) +{ + char myfile[PATH_MAX]; + va_list ap; + va_start(ap, flags); + return open(esc_string(myfile, PATH_MAX, filename), + flags, ap); +} + + static int raw_open_common(BlockDriverState *bs, const char *filename, int flags) { @@ -151,7 +161,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, s->open_flags |= O_DSYNC; s->fd = -1; -fd = open(filename, s->open_flags, 0644); +fd = _open(filename, s->open_flags, 0644); if (fd < 0) { ret = -errno; if (ret == -EROFS) @@ -844,7 +854,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) options++; } -fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, +fd = _open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); if (fd < 0) return -EIO; @@ -985,7 +995,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags) if ( bsdPath[ 0 ] != '\0' ) { strcat(bsdPath,"s0"); /* some CDs don't have a partition 0 */ -fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE); +fd = _open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE); if (fd < 0) { bsdPath[strlen(bsdPath)-1] = '1'; } else { @@ -1037,7 +1047,7 @@ static int fd_open(BlockDriverState *bs) #endif return -EIO; } -s->fd = open(bs->filename, s->open_flags & ~O_NONBLOCK); +s->fd = _open(bs->filename, s->open_flags & ~O_NONBLOCK);
Re: [PATCH] support colon in filenames
* Ram Pai [2009-06-24 09:58:59]: > Problem: It is impossible to feed filenames with the character colon because > qemu interprets such names as a protocol. For example a filename scsi:0, > is interpreted as a protocol by name "scsi". > > This patch allows users to espace colon characters. For example the above > filename > can now be expressed as 'scsi\:0' > > Here are couple of examples: > > ndb:\:: is treated as a ndb protocol with a hostname ':' on port > scsi\:0\:abc is a local file scsi:0:abc > http\://myweb is a local file by name http://myweb > nbd\::localhost:2558 is a protocol by name nbd: > > Signed-off-by: Ram Pai Are colons useful for filenames? Is there a common use case for this? -- Balbir -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote: > Amit Shah wrote: > > A few sample uses for a vmchannel are to share the host and guest > > clipboards (to allow copy/paste between a host and a guest), to > > lock the screen of the guest session when the vnc viewer is closed, > > to find out which applications are installed on a guest OS even when > > the guest is powered down (using virt-inspector) and so on. > > Those all look like useful features. > > Can you run an application to provide those features on a guest which > _doesn't_ have a vmchannel/virtio-serial support in the kernel? > > Or will it be restricted only to guests which have QEMU-specific > support in their kernel? libguestfs currently uses the -net user based vmchannel interface that exists in current qemu. That doesn't need a kernel that doesn't have support for virtio-serial. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] support colon in filenames
On (Wed) Jun 24 2009 [09:58:59], Ram Pai wrote: > Problem: It is impossible to feed filenames with the character colon because > qemu interprets such names as a protocol. For example a filename scsi:0, > is interpreted as a protocol by name "scsi". > > This patch allows users to espace colon characters. For example the above > filename > can now be expressed as 'scsi\:0' Eduardo (CC'ed) had a patch as well: http://thread.gmane.org/gmane.comp.emulators.qemu/39229/focus=39229 Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] support colon in filenames
(Actually add Eduardo to CC) On (Wed) Jun 24 2009 [22:56:59], Amit Shah wrote: > On (Wed) Jun 24 2009 [09:58:59], Ram Pai wrote: > > Problem: It is impossible to feed filenames with the character colon > > because > > qemu interprets such names as a protocol. For example a filename scsi:0, > > is interpreted as a protocol by name "scsi". > > > > This patch allows users to espace colon characters. For example the above > > filename > > can now be expressed as 'scsi\:0' > > Eduardo (CC'ed) had a patch as well: > > http://thread.gmane.org/gmane.comp.emulators.qemu/39229/focus=39229 Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] support colon in filenames
On Wed, 2009-06-24 at 22:38 +0530, Balbir Singh wrote: > * Ram Pai [2009-06-24 09:58:59]: > > > Problem: It is impossible to feed filenames with the character colon > > because > > qemu interprets such names as a protocol. For example a filename scsi:0, > > is interpreted as a protocol by name "scsi". > > > > This patch allows users to espace colon characters. For example the above > > filename > > can now be expressed as 'scsi\:0' > > > > Here are couple of examples: > > > > ndb:\:: is treated as a ndb protocol with a hostname ':' on port > > scsi\:0\:abc is a local file scsi:0:abc > > http\://myweb is a local file by name http://myweb > > nbd\::localhost:2558 is a protocol by name nbd: > > > > Signed-off-by: Ram Pai > > Are colons useful for filenames? Is there a common use case for this? Yes. files like /dev/disk/by-path/pci-:0b:00.0-sas-phy0:1-0x5000c5000c14b41d:0-lun0-part3 RP -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 2/2] introduce -cpu host target
On Wed, Jun 24, 2009 at 11:54 AM, Avi Kivity wrote: > On 06/23/2009 12:47 AM, Andre Przywara wrote: >> >> Should we ignore unhandled MSRs like QEMU or Xen do? >> > > Ignoring unhandled msrs is dangerous. If a write has some effect the guest > depends on, and we're not emulating that effect, the guest will fail. > Similarly if you don't know what a register mean, who knows what returning > zero for a read will do. It is definitely a bad idea to ignore unknown MSRs. Kernel patch protection scheme used by certain operating system depend on them to work properly and it's pretty hard to debug when you don't know what failed (the MSR read in this case). http://www.uninformed.org/?v=3&a=3 http://www.uninformed.org/?v=6&a=1 http://www.uninformed.org/?v=8&a=5 http://en.wikipedia.org/wiki/Kernel_Patch_Protection Best regards, Filip Navara -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 2/2] introduce -cpu host target
On 06/24/2009 08:37 PM, Filip Navara wrote: On Wed, Jun 24, 2009 at 11:54 AM, Avi Kivity wrote: On 06/23/2009 12:47 AM, Andre Przywara wrote: Should we ignore unhandled MSRs like QEMU or Xen do? Ignoring unhandled msrs is dangerous. If a write has some effect the guest depends on, and we're not emulating that effect, the guest will fail. Similarly if you don't know what a register mean, who knows what returning zero for a read will do. It is definitely a bad idea to ignore unknown MSRs. Kernel patch protection scheme used by certain operating system depend on them to work properly and it's pretty hard to debug when you don't know what failed (the MSR read in this case). http://www.uninformed.org/?v=3&a=3 http://www.uninformed.org/?v=6&a=1 http://www.uninformed.org/?v=8&a=5 http://en.wikipedia.org/wiki/Kernel_Patch_Protection Which unknown msrs are used by kernel patch protection? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
Amit Shah wrote: > On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote: > > Amit Shah wrote: > > > A few sample uses for a vmchannel are to share the host and guest > > > clipboards (to allow copy/paste between a host and a guest), to > > > lock the screen of the guest session when the vnc viewer is closed, > > > to find out which applications are installed on a guest OS even when > > > the guest is powered down (using virt-inspector) and so on. > > > > Those all look like useful features. > > > > Can you run an application to provide those features on a guest which > > _doesn't_ have a vmchannel/virtio-serial support in the kernel? > > > > Or will it be restricted only to guests which have QEMU-specific > > support in their kernel? > > libguestfs currently uses the -net user based vmchannel interface that > exists in current qemu. That doesn't need a kernel that doesn't have > support for virtio-serial. That's great! If that works fine, and guest apps/libraries are using that as a fallback anyway, what benefit do they get from switching to virtio-serial when they detect that instead, given they still have code for the -net method? Is the plan to remove -net user based support from libguestfs? Is virtio-serial significantly simpler to use? -- Jamie -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] support colon in filenames
Copying the qemu-devel mailing list too. On Wed, 2009-06-24 at 09:58 -0700, Ram Pai wrote: > Problem: It is impossible to feed filenames with the character colon because > qemu interprets such names as a protocol. For example a filename scsi:0, > is interpreted as a protocol by name "scsi". > > This patch allows users to espace colon characters. For example the above > filename > can now be expressed as 'scsi\:0' > > Here are couple of examples: > > ndb:\:: is treated as a ndb protocol with a hostname ':' on port > scsi\:0\:abc is a local file scsi:0:abc > http\://myweb is a local file by name http://myweb > nbd\::localhost:2558 is a protocol by name nbd: > > Signed-off-by: Ram Pai > --- > > > block.c | 26 +- > block/nbd.c | 19 ++- > block/raw-posix.c | 24 +--- > cutils.c | 22 ++ > qemu-common.h |1 + > vl.c |3 +-- > 6 files changed, 72 insertions(+), 23 deletions(-) > > diff --git a/block.c b/block.c > index aca5a6d..80bded9 100644 > --- a/block.c > +++ b/block.c > @@ -225,22 +225,30 @@ static BlockDriver *find_protocol(const char *filename) > { > BlockDriver *drv1; > char protocol[128]; > -int len; > -const char *p; > +char *p = protocol; > +const char *f=filename; > +int len = strnlen(filename, 128); > > #ifdef _WIN32 > if (is_windows_drive(filename) || > is_windows_drive_prefix(filename)) > return bdrv_find_format("raw"); > #endif > -p = strchr(filename, ':'); > -if (!p) > +while ( f < filename+len ) { > + if ( *f == ':' ) > + break; > + if ( *f == '\\') { > + f++; > + if ( *f == '\0') > + break; > + } > + *p++ = *f++; > +} > +*p='\0'; > + > +if (*f != ':') > return bdrv_find_format("raw"); > -len = p - filename; > -if (len > sizeof(protocol) - 1) > -len = sizeof(protocol) - 1; > -memcpy(protocol, filename, len); > -protocol[len] = '\0'; > + > for(drv1 = first_drv; drv1 != NULL; drv1 = drv1->next) { > if (drv1->protocol_name && > !strcmp(drv1->protocol_name, protocol)) > diff --git a/block/nbd.c b/block/nbd.c > index 47d4778..a011cc7 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -64,18 +64,27 @@ static int nbd_open(BlockDriverState *bs, const char* > filename, int flags) > > } else { > uint16_t port; > -char *p, *r; > +char *q, *p, *r; > char hostname[128]; > > pstrcpy(hostname, 128, host); > > -p = strchr(hostname, ':'); > -if (p == NULL) > + q=p=hostname; > + while ( p < hostname+128 ) { > + if (*p == ':') > + break; > + if ( *p == '\\' ) { > + p++; > + if (*p =='\0') > + break; > + } > + *q++=*p++; > + } > +if (*p != ':') > return -EINVAL; > > -*p = '\0'; > + *q='\0'; > p++; > - > port = strtol(p, &r, 0); > if (r == p) > return -EINVAL; > diff --git a/block/raw-posix.c b/block/raw-posix.c > index 41bfa37..98ede17 100644 > --- a/block/raw-posix.c > +++ b/block/raw-posix.c > @@ -124,6 +124,16 @@ static int fd_open(BlockDriverState *bs); > static int cdrom_reopen(BlockDriverState *bs); > #endif > > +static int _open(const char *filename, int flags, ...) > +{ > + char myfile[PATH_MAX]; > + va_list ap; > + va_start(ap, flags); > + return open(esc_string(myfile, PATH_MAX, filename), > + flags, ap); > +} > + > + > static int raw_open_common(BlockDriverState *bs, const char *filename, > int flags) > { > @@ -151,7 +161,7 @@ static int raw_open_common(BlockDriverState *bs, const > char *filename, > s->open_flags |= O_DSYNC; > > s->fd = -1; > -fd = open(filename, s->open_flags, 0644); > +fd = _open(filename, s->open_flags, 0644); > if (fd < 0) { > ret = -errno; > if (ret == -EROFS) > @@ -844,7 +854,7 @@ static int raw_create(const char *filename, > QEMUOptionParameter *options) > options++; > } > > -fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, > +fd = _open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, >0644); > if (fd < 0) > return -EIO; > @@ -985,7 +995,7 @@ static int hdev_open(BlockDriverState *bs, const char > *filename, int flags) > if ( bsdPath[ 0 ] != '\0' ) { > strcat(bsdPath,"s0"); > /* some CDs don't have a partition 0 */ > -fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE); > +fd = _open(bsdPath, O_RDONLY | O_BINARY | O_LARG
Re: [Qemu-devel] Re: [PATCH 2/2] introduce -cpu host target
On Wed, Jun 24, 2009 at 7:46 PM, Avi Kivity wrote: > On 06/24/2009 08:37 PM, Filip Navara wrote: >> >> On Wed, Jun 24, 2009 at 11:54 AM, Avi Kivity wrote: >> >>> >>> On 06/23/2009 12:47 AM, Andre Przywara wrote: >>> Should we ignore unhandled MSRs like QEMU or Xen do? >>> >>> Ignoring unhandled msrs is dangerous. If a write has some effect the >>> guest >>> depends on, and we're not emulating that effect, the guest will fail. >>> Similarly if you don't know what a register mean, who knows what >>> returning >>> zero for a read will do. >>> >> >> It is definitely a bad idea to ignore unknown MSRs. Kernel patch >> protection scheme used by certain operating system depend on them to >> work properly and it's pretty hard to debug when you don't know what >> failed (the MSR read in this case). >> >> http://www.uninformed.org/?v=3&a=3 >> http://www.uninformed.org/?v=6&a=1 >> http://www.uninformed.org/?v=8&a=5 >> http://en.wikipedia.org/wiki/Kernel_Patch_Protection >> >> > > Which unknown msrs are used by kernel patch protection? It's a moving target. At the time I first got Win64 running on QEMU it was the one for getting number of implemented virtual address bits (0x8008 iirc) and some other for getting cache sizes (0x8005/0x8006 iirc). Both of them were documented in AMD manuals and not implemented by QEMU. Also the higher bits of virtual addresses must be treated as sign-extended (as per the information in the 0x8008 MSR) even though there are actually bits stored in the address. Me and Alex Ionescu have spent considerable time by reversing the PatchGuard v1 and that information is described in more detail in the first link above. I haven't looked at PatchGuard v2/v3 yet. Best regards, Filip Navara -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On (Wed) Jun 24 2009 [18:50:02], Jamie Lokier wrote: > Amit Shah wrote: > > On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote: > > > Amit Shah wrote: > > > > A few sample uses for a vmchannel are to share the host and guest > > > > clipboards (to allow copy/paste between a host and a guest), to > > > > lock the screen of the guest session when the vnc viewer is closed, > > > > to find out which applications are installed on a guest OS even when > > > > the guest is powered down (using virt-inspector) and so on. > > > > > > Those all look like useful features. > > > > > > Can you run an application to provide those features on a guest which > > > _doesn't_ have a vmchannel/virtio-serial support in the kernel? > > > > > > Or will it be restricted only to guests which have QEMU-specific > > > support in their kernel? > > > > libguestfs currently uses the -net user based vmchannel interface that > > exists in current qemu. That doesn't need a kernel that doesn't have > > support for virtio-serial. > > That's great! > > If that works fine, and guest apps/libraries are using that as a > fallback anyway, what benefit do they get from switching to > virtio-serial when they detect that instead, given they still have > code for the -net method? Speed is the biggest benefit. > Is the plan to remove -net user based support from libguestfs? I don't know what Richard's plan is, but if the kernel that libguestfs deploys in the appliance gains support for virtio-serial, there's no reason it shouldn't switch. > Is virtio-serial significantly simpler to use? I think the interface from the guest POV stays the same: reads / writes to char devices. With virtio-serial, though, we can add a few other interesting things like names to ports, ability to hot-add ports on demand, request notifications when either end goes down, etc. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix non-KVM build
This introduces some #ifdefs in pcspk to fix the build when KVM isn't enabled. Signed-off-by: Anthony Liguori --- hw/pcspk.c | 15 +-- 1 files changed, 9 insertions(+), 6 deletions(-) diff --git a/hw/pcspk.c b/hw/pcspk.c index 9e1b59a..236995a 100644 --- a/hw/pcspk.c +++ b/hw/pcspk.c @@ -80,11 +80,6 @@ static void kvm_set_pit_ch2(PITState *pit, kvm_set_pit(kvm_context, inkernel_state); } } -#else -static inline void kvm_get_pit_ch2(PITState *pit, - kvm_pit_state *inkernel_state) { } -static inline void kvm_set_pit_ch2(PITState *pit, - kvm_pit_state *inkernel_state) { } #endif static inline void generate_samples(PCSpkState *s) @@ -111,7 +106,9 @@ static void pcspk_callback(void *opaque, int free) PCSpkState *s = opaque; unsigned int n; +#ifdef USE_KVM_PIT kvm_get_pit_ch2(s->pit, NULL); +#endif if (pit_get_mode(s->pit, 2) != 3) return; @@ -158,7 +155,9 @@ static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr) PCSpkState *s = opaque; int out; +#ifdef USE_KVM_PIT kvm_get_pit_ch2(s->pit, NULL); +#endif s->dummy_refresh_clock ^= (1 << 4); out = pit_get_out(s->pit, 2, qemu_get_clock(vm_clock)) << 5; @@ -168,11 +167,13 @@ static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr) static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val) { -struct kvm_pit_state inkernel_state; PCSpkState *s = opaque; const int gate = val & 1; +#ifdef USE_KVM_PIT +struct kvm_pit_state inkernel_state; kvm_get_pit_ch2(s->pit, &inkernel_state); +#endif s->data_on = (val >> 1) & 1; pit_set_gate(s->pit, 2, gate); @@ -182,7 +183,9 @@ static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val) AUD_set_active_out(s->voice, gate & s->data_on); } +#ifdef USE_KVM_PIT kvm_set_pit_ch2(s->pit, &inkernel_state); +#endif } void pcspk_init(PITState *pit) -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] support colon in filenames
* Ram Pai [2009-06-24 10:30:00]: > On Wed, 2009-06-24 at 22:38 +0530, Balbir Singh wrote: > > * Ram Pai [2009-06-24 09:58:59]: > > > > > Problem: It is impossible to feed filenames with the character colon > > > because > > > qemu interprets such names as a protocol. For example a filename scsi:0, > > > is interpreted as a protocol by name "scsi". > > > > > > This patch allows users to espace colon characters. For example the above > > > filename > > > can now be expressed as 'scsi\:0' > > > > > > Here are couple of examples: > > > > > > ndb:\:: is treated as a ndb protocol with a hostname ':' on port > > > scsi\:0\:abc is a local file scsi:0:abc > > > http\://myweb is a local file by name http://myweb > > > nbd\::localhost:2558 is a protocol by name nbd: > > > > > > Signed-off-by: Ram Pai > > > > Are colons useful for filenames? Is there a common use case for this? > > Yes. files like > > /dev/disk/by-path/pci-:0b:00.0-sas-phy0:1-0x5000c5000c14b41d:0-lun0-part3 > Yes, that does make sense. Thanks! -- Balbir -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix non-KVM build
On Wed, Jun 24, 2009 at 1:13 PM, Anthony Liguori wrote: > This introduces some #ifdefs in pcspk to fix the build when KVM isn't enabled. > > Signed-off-by: Anthony Liguori > --- > hw/pcspk.c | 15 +-- > 1 files changed, 9 insertions(+), 6 deletions(-) ... Thanks, Anthony. This patch does solve my pcspk build breakage, and gets me a bit further in the build, before: ... CCppc-softmmu/disas.o CCppc-softmmu/i386-dis.o CCppc-softmmu/ppc-dis.o ARppc-softmmu/libqemu.a LINK ppc-softmmu/qemu-system-ppc machine.o: In function `cpu_load': /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/target-ppc/machine.c:180: undefined reference to `cpu_synchronize_state' machine.o: In function `cpu_save': /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/target-ppc/machine.c:10: undefined reference to `cpu_synchronize_state' ppc440.o: In function `ppc440ep_init': /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/hw/ppc440.c:48: undefined reference to `kvm_enabled' ppc440_bamboo.o: In function `bamboo_init': /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/hw/ppc440_bamboo.c:183: undefined reference to `kvm_enabled' /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/hw/ppc440_bamboo.c:184: undefined reference to `kvmppc_init' ppce500_mpc8544ds.o: In function `mpc8544ds_init': /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/hw/ppce500_mpc8544ds.c:277: undefined reference to `kvm_enabled' /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/hw/ppce500_mpc8544ds.c:278: undefined reference to `kvmppc_init' libqemu.a(helper.o): In function `ppc_tlb_invalidate_all': /tmp/tmp.wibDyxlHPK/qemu-kvm-0.10.50.20090624132948/target-ppc/helper.c:1861: undefined reference to `kvm_enabled' collect2: ld returned 1 exit status make[2]: *** [qemu-system-ppc] Error 1 make[1]: *** [subdir-ppc-softmmu] Error 2 :-Dustin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
Amit Shah wrote: > On (Wed) Jun 24 2009 [18:50:02], Jamie Lokier wrote: > > Amit Shah wrote: > > > On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote: > > > > Amit Shah wrote: > > > > > A few sample uses for a vmchannel are to share the host and guest > > > > > clipboards (to allow copy/paste between a host and a guest), to > > > > > lock the screen of the guest session when the vnc viewer is closed, > > > > > to find out which applications are installed on a guest OS even when > > > > > the guest is powered down (using virt-inspector) and so on. > > > > > > > > Those all look like useful features. > > > > > > > > Can you run an application to provide those features on a guest which > > > > _doesn't_ have a vmchannel/virtio-serial support in the kernel? > > > > > > > > Or will it be restricted only to guests which have QEMU-specific > > > > support in their kernel? > > > > > > libguestfs currently uses the -net user based vmchannel interface that > > > exists in current qemu. That doesn't need a kernel that doesn't have > > > support for virtio-serial. > > > > That's great! > > > > If that works fine, and guest apps/libraries are using that as a > > fallback anyway, what benefit do they get from switching to > > virtio-serial when they detect that instead, given they still have > > code for the -net method? > > Speed is the biggest benefit. Fair enough, sounds good, and I can see how it's more usable than a network interface in many respects. > > Is the plan to remove -net user based support from libguestfs? > > I don't know what Richard's plan is, but if the kernel that libguestfs > deploys in the appliance gains support for virtio-serial, there's no > reason it shouldn't switch. > > > Is virtio-serial significantly simpler to use? > > I think the interface from the guest POV stays the same: reads / writes > to char devices. With virtio-serial, though, we can add a few other > interesting things like names to ports, ability to hot-add ports on > demand, request notifications when either end goes down, etc. Good features, useful for a lot of handy things. I think it would be handy if the same features were available to the guest application generally, not just on guest kernels with a specific driver though. As we talked before, about things like boot loaders and kernel debuggers, and installing the support applications on old guests. Is it possible to support access to the same capabilities through a well known IO/MMIO address, in the same way that VGA and IDE are both ordinary PCI devices, but also can be reached easily through well known IO/MMIO addresses in simple code? -- Jamie -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix ppc-softmmu build
This gets ppc-softmmu building when KVM is not enabled. It may be enough to get it working with KVM enabled but I haven't checked. Signed-off-by: Anthony Liguori --- hw/ppc440.c|1 + hw/ppc440_bamboo.c |1 + hw/ppce500_mpc8544ds.c |1 + qemu-kvm.h |1 + target-ppc/helper.c|1 + target-ppc/machine.c |1 + 6 files changed, 6 insertions(+), 0 deletions(-) diff --git a/hw/ppc440.c b/hw/ppc440.c index 00d82e4..c2c9e65 100644 --- a/hw/ppc440.c +++ b/hw/ppc440.c @@ -19,6 +19,7 @@ #include "ppc405.h" #include "sysemu.h" #include "kvm.h" +#include "qemu-kvm.h" #define PPC440EP_PCI_CONFIG 0xeec0 #define PPC440EP_PCI_INTACK 0xeed0 diff --git a/hw/ppc440_bamboo.c b/hw/ppc440_bamboo.c index 00aa2c7..e4aad39 100644 --- a/hw/ppc440_bamboo.c +++ b/hw/ppc440_bamboo.c @@ -22,6 +22,7 @@ #include "kvm.h" #include "kvm_ppc.h" #include "device_tree.h" +#include "qemu-kvm.h" #define BINARY_DEVICE_TREE_FILE "bamboo.dtb" diff --git a/hw/ppce500_mpc8544ds.c b/hw/ppce500_mpc8544ds.c index d9ed36c..1099b99 100644 --- a/hw/ppce500_mpc8544ds.c +++ b/hw/ppce500_mpc8544ds.c @@ -29,6 +29,7 @@ #include "device_tree.h" #include "openpic.h" #include "ppce500.h" +#include "qemu-kvm.h" #define BINARY_DEVICE_TREE_FILE"mpc8544ds.dtb" #define UIMAGE_LOAD_BASE 0 diff --git a/qemu-kvm.h b/qemu-kvm.h index 68a5b40..9341d0c 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -169,6 +169,7 @@ int kvm_has_sync_mmu(void); void kvm_init_vcpu(CPUState *env); void kvm_load_tsc(CPUState *env); #else +#define kvm_has_sync_mmu() (0) #define kvm_enabled() (0) #define kvm_nested 0 #define qemu_kvm_irqchip_in_kernel() (0) diff --git a/target-ppc/helper.c b/target-ppc/helper.c index 5a7a935..3629c99 100644 --- a/target-ppc/helper.c +++ b/target-ppc/helper.c @@ -29,6 +29,7 @@ #include "helper_regs.h" #include "qemu-common.h" #include "kvm.h" +#include "qemu-kvm.h" //#define DEBUG_MMU //#define DEBUG_BATS diff --git a/target-ppc/machine.c b/target-ppc/machine.c index 99ba3eb..ec8e197 100644 --- a/target-ppc/machine.c +++ b/target-ppc/machine.c @@ -1,6 +1,7 @@ #include "hw/hw.h" #include "hw/boards.h" #include "kvm.h" +#include "qemu-kvm.h" void cpu_save(QEMUFile *f, void *opaque) { -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM PATCH v4 3/4] slow-work: add (module*)work->owner to fix races with module clients
The slow_work facility was designed to use reference counting instead of barriers for synchronization. The reference counting mechanism is implemented as a vtable op (->get_ref, ->put_ref) callback. This is problematic for module use of the slow_work facility because it is impossible to synchronize against the .text installed in the callbacks: There is no way to ensure that the slow-work threads have completely exited the text in question and rmmod may yank it out from under the slow_work thread. This patch attempts to address this issue by transparently mapping "struct module* owner" to the slow_work item, and maintaining a module reference count coincident with the more externally visible reference count. Since the slow_work facility is resident in kernel, it should be a race-free location to issue a module_put() call. This will ensure that modules can properly cleanup before exiting. A module_get()/module_put() pair on slow_work_enqueue() and the subsequent dequeue technically adds the overhead of the atomic operations for every work item scheduled. However, slow_work is designed for deferring relatively long-running and/or sleepy tasks to begin with, so this overhead will hopefully be negligible. Signed-off-by: Gregory Haskins CC: David Howells --- include/linux/slow-work.h |4 kernel/slow-work.c|5 + 2 files changed, 9 insertions(+), 0 deletions(-) diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h index b65c888..9f48dab 100644 --- a/include/linux/slow-work.h +++ b/include/linux/slow-work.h @@ -17,6 +17,7 @@ #ifdef CONFIG_SLOW_WORK #include +#include struct slow_work; @@ -42,6 +43,7 @@ struct slow_work_ops { * queued */ struct slow_work { + struct module *owner; unsigned long flags; #define SLOW_WORK_PENDING 0 /* item pending (further) execution */ #define SLOW_WORK_EXECUTING1 /* item currently executing */ @@ -61,6 +63,7 @@ struct slow_work { static inline void slow_work_init(struct slow_work *work, const struct slow_work_ops *ops) { + work->owner = THIS_MODULE; work->flags = 0; work->ops = ops; INIT_LIST_HEAD(&work->link); @@ -78,6 +81,7 @@ static inline void slow_work_init(struct slow_work *work, static inline void vslow_work_init(struct slow_work *work, const struct slow_work_ops *ops) { + work->owner = THIS_MODULE; work->flags = 1 << SLOW_WORK_VERY_SLOW; work->ops = ops; INIT_LIST_HEAD(&work->link); diff --git a/kernel/slow-work.c b/kernel/slow-work.c index 521ed20..fa27500 100644 --- a/kernel/slow-work.c +++ b/kernel/slow-work.c @@ -220,6 +220,7 @@ static bool slow_work_execute(void) } work->ops->put_ref(work); + module_put(work->owner); return true; auto_requeue: @@ -299,6 +300,8 @@ int slow_work_enqueue(struct slow_work *work) if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) { set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags); } else { + if (!try_module_get(work->owner)) + goto cant_get_mod; if (work->ops->get_ref(work) < 0) goto cant_get_ref; if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags)) @@ -313,6 +316,8 @@ int slow_work_enqueue(struct slow_work *work) return 0; cant_get_ref: + module_put(work->owner); +cant_get_mod: spin_unlock_irqrestore(&slow_work_queue_lock, flags); return -EAGAIN; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM PATCH v4 0/4] irqfd fixes
(Applies to kvm.git/master:4631e094) The following is the latest attempt to fix the remaining races in irqfd/eventfd. For more details, please read the patch headers. This series has been tested against the kvm-eventfd unit test, and appears to be functioning properly. You can download this test here: ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2 I've included version 2 of Davide's eventfd patch (ported to kvm.git) so that its a complete reviewable series. Note, however, that there may be later versions of his patch to consider for merging, so we should coordinate with him. Note also that I potentially found an issue in slow_work which I subsequently fixed, so I have cc'd David Howells. It is needed to close some of the remaining issues. -Greg --- Davide Libenzi (1): eventfd - revised interface and cleanups (2nd rev) Gregory Haskins (3): KVM: Fix races in irqfd using new eventfd_kref_get interface slow-work: add (module*)work->owner to fix races with module clients kvm: prepare irqfd for having interrupts disabled during eventfd->release drivers/lguest/lg.h |2 drivers/lguest/lguest_user.c |4 - fs/aio.c | 24 +--- fs/eventfd.c | 119 ++--- include/linux/aio.h |4 - include/linux/eventfd.h | 18 +-- include/linux/kvm_host.h |7 + include/linux/slow-work.h|4 + init/Kconfig |1 kernel/slow-work.c |5 + virt/kvm/Kconfig |1 virt/kvm/eventfd.c | 234 -- 12 files changed, 312 insertions(+), 111 deletions(-) -- Signature -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM PATCH v4 1/4] kvm: prepare irqfd for having interrupts disabled during eventfd->release
We need to plug some race conditions on eventfd shutdown. In order to do this, we need to change the context in which the release notification is delivered so that the wqh lock is now held. However, there is currently code in the release callback that assumes it can sleep. We have a slight chicken and egg problem where we cant fix the race without adding the lock, and we can't add the lock without breaking the sleepy code. Normally we could deal with this by making both changes in an atomic changeset. However, we want to keep the eventfd and kvm specific changes isolated to ease the reviewer burden on upstream eventfd (at the specific request of upstream). Therefore, we have this intermediate patch. This intermediate patch allows the release() method to work in an atomic context, at the expense of correctness w.r.t. memory-leaks. Today we have a race condition. With this patch applied we leak. Both issues will be resolved later in the series. It is the author's opinion that a leak is better for bisectability than the hang would be should we leave the sleepy code in place after the locking changeover. Signed-off-by: Gregory Haskins --- virt/kvm/eventfd.c | 89 1 files changed, 27 insertions(+), 62 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index a9e7de7..9656027 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -28,7 +28,6 @@ #include #include #include -#include /* * @@ -38,8 +37,6 @@ * */ struct _irqfd { - struct mutex lock; - struct srcu_structsrcu; struct kvm *kvm; int gsi; struct list_head list; @@ -53,48 +50,12 @@ static void irqfd_inject(struct work_struct *work) { struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); - struct kvm *kvm; - int idx; + struct kvm *kvm = irqfd->kvm; - idx = srcu_read_lock(&irqfd->srcu); - - kvm = rcu_dereference(irqfd->kvm); - if (kvm) { - mutex_lock(&kvm->irq_lock); - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); - mutex_unlock(&kvm->irq_lock); - } - - srcu_read_unlock(&irqfd->srcu, idx); -} - -static void -irqfd_disconnect(struct _irqfd *irqfd) -{ - struct kvm *kvm; - - mutex_lock(&irqfd->lock); - - kvm = rcu_dereference(irqfd->kvm); - rcu_assign_pointer(irqfd->kvm, NULL); - - mutex_unlock(&irqfd->lock); - - if (!kvm) - return; - - mutex_lock(&kvm->lock); - list_del(&irqfd->list); - mutex_unlock(&kvm->lock); - - /* -* It is important to not drop the kvm reference until the next grace -* period because there might be lockless references in flight up -* until then -*/ - synchronize_srcu(&irqfd->srcu); - kvm_put_kvm(kvm); + mutex_lock(&kvm->irq_lock); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); + mutex_unlock(&kvm->irq_lock); } static int @@ -103,26 +64,24 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); unsigned long flags = (unsigned long)key; + /* +* Assume we will be called with interrupts disabled +*/ if (flags & POLLIN) /* -* The POLLIN wake_up is called with interrupts disabled. -* Therefore we need to defer the IRQ injection until later -* since we need to acquire the kvm->lock to do so. +* Defer the IRQ injection until later since we need to +* acquire the kvm->lock to do so. */ schedule_work(&irqfd->inject); if (flags & POLLHUP) { /* -* The POLLHUP is called unlocked, so it theoretically should -* be safe to remove ourselves from the wqh using the locked -* variant of remove_wait_queue() +* for now, just remove ourselves from the list and let +* the rest dangle. We will fix this up later once +* the races in eventfd are fixed */ - remove_wait_queue(irqfd->wqh, &irqfd->wait); - flush_work(&irqfd->inject); - irqfd_disconnect(irqfd); - - cleanup_srcu_struct(&irqfd->srcu); - kfree(irqfd); + __remove_wait_queue(irqfd->wqh, &irqfd->wait); + irqfd->wqh = NULL; } return 0; @@
[KVM PATCH v4 4/4] KVM: Fix races in irqfd using new eventfd_kref_get interface
This patch fixes all known races in irqfd, and paves the way to restore DEASSIGN support. For details of the eventfd races, please see the header for patch 2/4, or the thread on which it was based on: http://www.mail-archive.com/kvm@vger.kernel.org/msg17767.html In a nutshell, we use eventfd_ctx_get/put() to properly manage the lifetime of the underlying eventfd. We also use careful coordination with our workqueue to ensure that all irqfd objects have terminated before we allow kvm to shutdown. The logic used for shutdown walks all open irqfds and releases them. This logic can be generalized in the future to allow a subset of irqfds to be released, thus allowing DEASSIGN support. Signed-off-by: Gregory Haskins --- include/linux/kvm_host.h |7 +- virt/kvm/Kconfig |1 virt/kvm/eventfd.c | 199 +- 3 files changed, 183 insertions(+), 24 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 2451f48..d94ee72 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -141,7 +141,12 @@ struct kvm { struct kvm_io_bus mmio_bus; struct kvm_io_bus pio_bus; #ifdef CONFIG_HAVE_KVM_EVENTFD - struct list_head irqfds; + struct { + spinlock_tlock; + struct list_head items; + atomic_t outstanding; + wait_queue_head_t wqh; + } irqfds; #endif struct kvm_vm_stat stat; struct kvm_arch arch; diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index daece36..ab7848a 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -9,6 +9,7 @@ config HAVE_KVM_IRQCHIP config HAVE_KVM_EVENTFD bool select EVENTFD + select SLOW_WORK config KVM_APIC_ARCHITECTURE bool diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 9656027..e9e74f4 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -28,6 +28,7 @@ #include #include #include +#include /* * @@ -36,17 +37,27 @@ * Credit goes to Avi Kivity for the original idea. * */ + struct _irqfd { struct kvm *kvm; + struct eventfd_ctx *eventfd; int gsi; struct list_head list; poll_tablept; wait_queue_head_t*wqh; wait_queue_t wait; struct work_structinject; + struct slow_work shutdown; }; static void +irqfd_release(struct _irqfd *irqfd) +{ + eventfd_ctx_put(irqfd->eventfd); + kfree(irqfd); +} + +static void irqfd_inject(struct work_struct *work) { struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); @@ -58,6 +69,55 @@ irqfd_inject(struct work_struct *work) mutex_unlock(&kvm->irq_lock); } +static struct _irqfd * +work_to_irqfd(struct slow_work *work) +{ + return container_of(work, struct _irqfd, shutdown); +} + +static int +irqfd_shutdown_get_ref(struct slow_work *work) +{ + struct _irqfd *irqfd = work_to_irqfd(work); + struct kvm *kvm = irqfd->kvm; + + atomic_inc(&kvm->irqfds.outstanding); + + return 0; +} + +static void +irqfd_shutdown_put_ref(struct slow_work *work) +{ + struct _irqfd *irqfd = work_to_irqfd(work); + struct kvm *kvm = irqfd->kvm; + + irqfd_release(irqfd); + + if (atomic_dec_and_test(&kvm->irqfds.outstanding)) + wake_up(&kvm->irqfds.wqh); +} + +static void +irqfd_shutdown_execute(struct slow_work *work) +{ + struct _irqfd *irqfd = work_to_irqfd(work); + + /* +* Ensure there are no outstanding "inject" work-items before we blow +* away our state. Once this job completes, the slow_work +* infrastructure will drop the irqfd object completely via put_ref +*/ + flush_work(&irqfd->inject); +} + +const static struct slow_work_ops irqfd_shutdown_work_ops = { + .get_ref = irqfd_shutdown_get_ref, + .put_ref = irqfd_shutdown_put_ref, + .execute = irqfd_shutdown_execute, +}; + + static int irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) { @@ -65,25 +125,47 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) unsigned long flags = (unsigned long)key; /* -* Assume we will be called with interrupts disabled +* Called with interrupts disabled */ if (flags & POLLIN) - /* -* Defer the IRQ injection until later since we need to -* acquire the kvm->lock to do so. -*/ + /* An event has been signaled, inject an interrupt */ schedule_work(&irqfd->inject); if (flags & POLLHUP) { - /* -* for now, jus
[KVM PATCH v4 2/4] eventfd - revised interface and cleanups (2nd rev)
From: Davide Libenzi The following patch changes the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. Another cleanup this patch does, is making AIO select EVENTFD, instead of adding a bunch of empty function stubs inside eventfd.h in order to handle the (AIO && !EVENTFD) case. [gmh: resolved trivial merge conflict with kvm's POLLHUP patch] Signed-off-by: Davide Libenzi Signed-off-by: Gregory Haskins --- drivers/lguest/lg.h |2 - drivers/lguest/lguest_user.c |4 + fs/aio.c | 24 ++-- fs/eventfd.c | 119 -- include/linux/aio.h |4 + include/linux/eventfd.h | 18 ++ init/Kconfig |1 7 files changed, 120 insertions(+), 52 deletions(-) diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h index d4e8979..9c31382 100644 --- a/drivers/lguest/lg.h +++ b/drivers/lguest/lg.h @@ -82,7 +82,7 @@ struct lg_cpu { struct lg_eventfd { unsigned long addr; - struct file *event; + struct eventfd_ctx *event; }; struct lg_eventfd_map { diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c index 32e2971..9f9a295 100644 --- a/drivers/lguest/lguest_user.c +++ b/drivers/lguest/lguest_user.c @@ -50,7 +50,7 @@ static int add_eventfd(struct lguest *lg, unsigned long addr, int fd) /* Now append new entry. */ new->map[new->num].addr = addr; - new->map[new->num].event = eventfd_fget(fd); + new->map[new->num].event = eventfd_ctx_fdget(fd); if (IS_ERR(new->map[new->num].event)) { kfree(new); return PTR_ERR(new->map[new->num].event); @@ -357,7 +357,7 @@ static int close(struct inode *inode, struct file *file) /* Release any eventfds they registered. */ for (i = 0; i < lg->eventfds->num; i++) - fput(lg->eventfds->map[i].event); + eventfd_ctx_put(lg->eventfds->map[i].event); kfree(lg->eventfds); /* If lg->dead doesn't contain an error code it will be NULL or a diff --git a/fs/aio.c b/fs/aio.c index 76da125..d065b2c 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -485,6 +485,8 @@ static inline void really_put_req(struct kioctx *ctx, struct kiocb *req) { assert_spin_locked(&ctx->ctx_lock); + if (req->ki_eventfd != NULL) + eventfd_ctx_put(req->ki_eventfd); if (req->ki_dtor) req->ki_dtor(req); if (req->ki_iovec != &req->ki_inline_vec) @@ -509,8 +511,6 @@ static void aio_fput_routine(struct work_struct *data) /* Complete the fput(s) */ if (req->ki_filp != NULL) __fput(req->ki_filp); - if (req->ki_eventfd != NULL) - __fput(req->ki_eventfd); /* Link the iocb into the context's free list */ spin_lock_irq(&ctx->ctx_lock); @@ -528,8 +528,6 @@ static void aio_fput_routine(struct work_struct *data) */ static int __aio_put_req(struct kioctx *ctx, struct kiocb *req) { - int schedule_putreq = 0; - dprintk(KERN_DEBUG "aio_put(%p): f_count=%ld\n", req, atomic_long_read(&req->ki_filp->f_count)); @@ -549,24 +547,16 @@ static int __aio_put_req(struct kioctx *ctx, struct kiocb *req) * we would not be holding the last reference to the file*, so * this function will be executed w/out any aio kthread wakeup. */ - if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count))) - schedule_putreq++; - else - req->ki_filp = NULL; - if (req->ki_eventfd != NULL) { - if (unlikely(atomic_long_dec_and_test(&req->ki_eventfd->f_count))) - schedule_putreq++; - else - req->ki_eventfd = NULL; - } - if (unlikely(schedule_putreq)) { + if (unlikely(atomic_long_dec_and_test(&req->ki_filp->f_count))) { get_ioctx(ctx); spin_lock(&fput_lock); list_add(&req->ki_list, &fput_head); spin_unlock(&fput_lock); queue_work(aio_wq, &fput_work); - } else + } else { + req->ki_filp = NULL; really_put_req(ctx, req); + } return 1; } @@ -1622,7 +1612,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, * an eventfd() fd, and will be signaled for each completed * event using the eventfd_signal() function. */ - req->ki_eventfd = eventfd_fget((int) iocb->aio_resfd); + req->ki_eventfd = eventfd_ctx_
[PATCH] Use upstream guard for kvm support in ppcemb-softmmu
ppcemb-softmmu uses upstream KVM support so put an appropriate guard around it. This fixes the non-KVM ppcemb-softmmu build. Signed-off-by: Anthony Liguori --- configure |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/configure b/configure index 420e101..4bad3c3 100755 --- a/configure +++ b/configure @@ -2261,11 +2261,13 @@ case "$target_cpu" in echo "#define TARGET_ARCH \"ppcemb\"" >> $config_h echo "#define TARGET_PPC 1" >> $config_h echo "#define TARGET_PPCEMB 1" >> $config_h +if test use_upstream_kvm = yes ; then if test "$target_kvm" = "yes" ; then echo "USE_KVM=yes" >> $config_mak echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak echo "#define USE_KVM 1" >> $config_h fi +fi gdb_xml_files="power-core.xml power-fpu.xml power-altivec.xml power-spe.xml" target_phys_bits=64 ;; -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden
Avi Kivity wrote: > On 06/23/2009 02:31 PM, Paul Brook wrote: > >On Tuesday 23 June 2009, Avi Kivity wrote: > > > >>On 06/23/2009 12:47 AM, Andre Przywara wrote: > >> > >>>KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU > >>>clears it. On some occasions one want to set or clear it the other way > >>>round (for instance to get HyperV running inside a guest). > >>>Allow the default to be overridden on the command line and fix some > >>>whitespace damage on the way. > >>> > >>It makes sense for qemu to set the hypervisor bit unconditionally. A > >>guest running under qemu is not bare metal. > >> > > > >I see no reason why a guest has to be told that it's running inside a VM. > >In principle an appropriately configured qemu should be indistinguishable > >from > >real hardware. In practice it's technically infeasible to cover absolutely > >everything, but if we set this bit we're not even trying. > > > >I have no objection to the bit being set by default for the QEMU CPU types. > > > > I agree it's pointless, but it is a Microsoft requirement for passing > their SVVP tests. Enabling it by default makes life a little easier for > users who wish to validate their hypervisor and has no drawbacks. Hold on. Do the SVVP tests fail on a real (non-virtal) machine then? Or is QEMU's machine emulation insufficiently accurate? I see a drawback in setting the bit by default. Something I expect from an emulator is that it behaves like a real machine to the extent possible. In particular, guest code which attempts to check if it's running on a real machine should get the answer "yes". Unfriendly guest code which pops up a message like "Sorry I refuse to work for you after 100 hours/ because you are attempting to run me in a virtual machine, and don't even think of trying to hide this from me now you know I look for it" should never do so. -- Jamie -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Only compile KVM support for matching CPU types
I don't know why this is disabled for qemu-kvm.git. Signed-off-by: Anthony Liguori --- configure |5 + 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/configure b/configure index 4bad3c3..1b73eaf 100755 --- a/configure +++ b/configure @@ -2117,13 +2117,12 @@ configure_kvm() { fi } -if [ use_upstream_kvm = yes ]; then - # Make sure the target and host cpus are compatible if test ! \( "$target_cpu" = "$cpu" -o \ \( "$target_cpu" = "ppcemb" -a "$cpu" = "ppc" \) -o \ \( "$target_cpu" = "x86_64" -a "$cpu" = "i386" \) -o \ \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then target_kvm="no" fi # Disable KVM for linux-user @@ -2131,8 +2130,6 @@ if test "$target_softmmu" = "no" ; then target_kvm="no" fi -fi - case "$target_cpu" in i386) echo "TARGET_ARCH=i386" >> $config_mak -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
On Wed, 24 Jun 2009, Rusty Russell wrote: > On Tue, 23 Jun 2009 03:33:22 am Davide Libenzi wrote: > > What you're doing there, is setting up a kernel-to-kernel (since > > userspace only role is to create the eventfd) communication, using a file* > > as accessory. That IMO is plain wrong. > > The most sensible is that userspace can use these fds; an in-kernel variant > is > possible too, but not primary IMHO. > > It's nice that userspace create the fds; it can then use the same fd for > multiple event sources. > > But I didn't see anything wrong with the way eventfd used to work: you have a > kvm ioctl to say "attach this eventfd to this guest notification" and that > does > the eventfd_fget. A detach ioctl does the fput (as does release of the kvm > fd). > > If they close the eventfd and don't do the detach ioctl, it's their problem. Some components would like to know if userspace dropped the fd, and take proper action accordingly (release resources, drop module instances, etc...). The POLLHUP helps with that, but w/out decoupling file* memory from eventfd_ctx memory, it becomes pretty tricky (if feasible at all) to handle the event in a race-free way. Once the file* is decoupled from the eventfd_ctx, it becomes saner to expose the internal kernel API via the eventfd_ctx. Another thing that comes in my mind (that for some components might not matter) is considering the effect of userspace doing things like: for (;;) { fd = eventfd(...); ioctl(xfd, XXX_ADD, fd); close(fd); } That might lead to unprivileged users drawing kernel memory w/out any userspace accountability, if not properly handled. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > Maybe there is such a reason, and it hasn't yet been beaten into my > skull. But I still think it should be done in a separate patch. One > which comes with a full description of the reasons for the change, btw. Since your skull seems pretty hard to beat, would you like me to split it for ya? :) - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Only compile KVM support for matching CPU types
On Wed, 2009-06-24 at 17:40 -0500, Anthony Liguori wrote: > I don't know why this is disabled for qemu-kvm.git. > > Signed-off-by: Anthony Liguori > --- > configure |5 + > 1 files changed, 1 insertions(+), 4 deletions(-) > > diff --git a/configure b/configure > index 4bad3c3..1b73eaf 100755 > --- a/configure > +++ b/configure > @@ -2117,13 +2117,12 @@ configure_kvm() { >fi > } > > -if [ use_upstream_kvm = yes ]; then > - > # Make sure the target and host cpus are compatible > if test ! \( "$target_cpu" = "$cpu" -o \ >\( "$target_cpu" = "ppcemb" -a "$cpu" = "ppc" \) -o \ >\( "$target_cpu" = "x86_64" -a "$cpu" = "i386" \) -o \ >\( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then > + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then Almost... - \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then + \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) -o \ + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then :-Dustin signature.asc Description: This is a digitally signed message part
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Wed, 24 Jun 2009 15:47:47 -0700 (PDT) Davide Libenzi wrote: > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > Maybe there is such a reason, and it hasn't yet been beaten into my > > skull. But I still think it should be done in a separate patch. One > > which comes with a full description of the reasons for the change, btw. > > Since your skull seems pretty hard to beat, would you like me to split it > for ya? :) Split what? My skull? umm, yes please, I believe the patches should be split. And I'm still not seeing the justification for forcing CONFIG_EVENTFD onto all CONFIG_AIO users! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Only compile KVM support for matching CPU types
On Wed, 2009-06-24 at 18:09 -0500, Dustin Kirkland wrote: > On Wed, 2009-06-24 at 17:40 -0500, Anthony Liguori wrote: > > # Make sure the target and host cpus are compatible > > if test ! \( "$target_cpu" = "$cpu" -o \ > >\( "$target_cpu" = "ppcemb" -a "$cpu" = "ppc" \) -o \ > >\( "$target_cpu" = "x86_64" -a "$cpu" = "i386" \) -o \ > >\( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then > > + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then > > Almost... > > - \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then > + \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) -o \ > + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then Doh. Still not quite right... Clean patch below. Signed-off-by: Dustin Kirkland === modified file 'configure' --- configure 2009-06-17 11:16:33 + +++ configure 2009-06-24 23:13:00 + @@ -2117,13 +2117,12 @@ fi } -if [ use_upstream_kvm = yes ]; then - # Make sure the target and host cpus are compatible if test ! \( "$target_cpu" = "$cpu" -o \ \( "$target_cpu" = "ppcemb" -a "$cpu" = "ppc" \) -o \ \( "$target_cpu" = "x86_64" -a "$cpu" = "i386" \) -o \ - \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then + \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) -o \ + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then target_kvm="no" fi # Disable KVM for linux-user @@ -2131,8 +2130,6 @@ target_kvm="no" fi -fi - case "$target_cpu" in i386) echo "TARGET_ARCH=i386" >> $config_mak signature.asc Description: This is a digitally signed message part
Re: [PATCH] Only compile KVM support for matching CPU types
Dustin Kirkland wrote: On Wed, 2009-06-24 at 17:40 -0500, Anthony Liguori wrote: I don't know why this is disabled for qemu-kvm.git. Signed-off-by: Anthony Liguori --- configure |5 + 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/configure b/configure index 4bad3c3..1b73eaf 100755 --- a/configure +++ b/configure @@ -2117,13 +2117,12 @@ configure_kvm() { fi } -if [ use_upstream_kvm = yes ]; then - # Make sure the target and host cpus are compatible if test ! \( "$target_cpu" = "$cpu" -o \ \( "$target_cpu" = "ppcemb" -a "$cpu" = "ppc" \) -o \ \( "$target_cpu" = "x86_64" -a "$cpu" = "i386" \) -o \ \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then Almost... - \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then + \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) -o \ + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then Yeah, I have that locally but I forgot to commit --amend. doh! Regards, Anthony Liguori :-Dustin -- Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Only compile KVM support for matching CPU types (v2)
I don't know why this is disabled for qemu-kvm.git. Signed-off-by: Anthony Liguori -- v1 -> v2 Fix typo --- configure |7 ++- 1 files changed, 2 insertions(+), 5 deletions(-) diff --git a/configure b/configure index 4bad3c3..0e65edb 100755 --- a/configure +++ b/configure @@ -2117,13 +2117,12 @@ configure_kvm() { fi } -if [ use_upstream_kvm = yes ]; then - # Make sure the target and host cpus are compatible if test ! \( "$target_cpu" = "$cpu" -o \ \( "$target_cpu" = "ppcemb" -a "$cpu" = "ppc" \) -o \ \( "$target_cpu" = "x86_64" -a "$cpu" = "i386" \) -o \ - \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) \) ; then + \( "$target_cpu" = "i386" -a "$cpu" = "x86_64" \) -o \ + \( "$target_cpu" = "ia64" -a "$cpu" = "ia64" \) \) ; then target_kvm="no" fi # Disable KVM for linux-user @@ -2131,8 +2130,6 @@ if test "$target_softmmu" = "no" ; then target_kvm="no" fi -fi - case "$target_cpu" in i386) echo "TARGET_ARCH=i386" >> $config_mak -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Wed, 24 Jun 2009, Andrew Morton wrote: > Split what? My skull? Heh :) > umm, yes please, I believe the patches should be split. And I'm still > not seeing the justification for forcing CONFIG_EVENTFD onto all > CONFIG_AIO users! Eventfd notifications became part of the AIO API (it's not even delivered through a new syscall, from the AIO side - same existing aiocb struct and io_submit syscall) once we merged it, so IMHO (AIO && !EVENTFD) would be similar to split AIO in AIO_READ and AIO_WRITE and have (AIO && !AIO_WRITE). Considering that the kernel config, once you unleash the CONFIG_EMBEDDED pandora box, allows you to select (AIO && !EVENTFD) w/out even a warning about possible userspace breakages, this makes it rather a confusing configuration if you ask me. It's not a biggie from the kernel side, just a few ugly errors wrappers around functions. For me AIO (or whatever userspace visible kernel subsystem) should select all the components that are part of the userspace API, but my argument ends here. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] eventfd - revised interface and cleanups (4th rev)
The following patch changes the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. Gregory, none of the APIs changed, so your patches do not need to be rebased over this one. The last three revisions just updated comments. Andrew, I'm not posting the "AIO select EVENTFD" patch at this time. Will eventually try another push in your skull once I come back from vacation ;) Signed-off-by: Davide Libenzi - Davide --- drivers/lguest/lg.h |2 drivers/lguest/lguest_user.c |4 - fs/aio.c | 24 ++-- fs/eventfd.c | 122 ++- include/linux/aio.h |4 - include/linux/eventfd.h | 35 +--- 6 files changed, 149 insertions(+), 42 deletions(-) Index: linux-2.6.mod/fs/eventfd.c === --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-23 19:04:38.0 -0700 +++ linux-2.6.mod/fs/eventfd.c 2009-06-24 16:13:18.0 -0700 @@ -14,35 +14,44 @@ #include #include #include -#include #include #include +#include +#include struct eventfd_ctx { + struct kref kref; wait_queue_head_t wqh; /* * Every time that a write(2) is performed on an eventfd, the * value of the __u64 being written is added to "count" and a * wakeup is performed on "wqh". A read(2) will return the "count" * value to userspace, and will reset "count" to zero. The kernel -* size eventfd_signal() also, adds to the "count" counter and +* side eventfd_signal() also, adds to the "count" counter and * issue a wakeup. */ __u64 count; unsigned int flags; }; -/* - * Adds "n" to the eventfd counter "count". Returns "n" in case of - * success, or a value lower then "n" in case of coutner overflow. - * This function is supposed to be called by the kernel in paths - * that do not allow sleeping. In this function we allow the counter - * to reach the ULLONG_MAX value, and we signal this as overflow - * condition by returining a POLLERR to poll(2). +/** + * eventfd_signal - Adds @n to the eventfd counter. + * @ctx: [in] Pointer to the eventfd context. + * @n: [in] Value of the counter to be added to the eventfd internal counter. + * The value cannot be negative. + * + * This function is supposed to be called by the kernel in paths that do not + * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX + * value, and we signal this as overflow condition by returining a POLLERR + * to poll(2). + * + * Returns @n in case of success, a non-negative number lower than @n in case + * of overflow, or the following error codes: + * + * -EINVAL: The value of @n is negative. */ -int eventfd_signal(struct file *file, int n) +int eventfd_signal(struct eventfd_ctx *ctx, int n) { - struct eventfd_ctx *ctx = file->private_data; unsigned long flags; if (n < 0) @@ -59,9 +68,45 @@ int eventfd_signal(struct file *file, in } EXPORT_SYMBOL_GPL(eventfd_signal); +static void eventfd_free(struct kref *kref) +{ + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref); + + kfree(ctx); +} + +/** + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. + * @ctx: [in] Pointer to the eventfd context. + * + * Returns: In case of success, returns a pointer to the eventfd context. + */ +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) +{ + kref_get(&ctx->kref); + return ctx; +} +EXPORT_SYMBOL_GPL(eventfd_ctx_get); + +/** + * eventfd_ctx_put - Releases a reference to the internal eventfd context. + * @ctx: [in] Pointer to eventfd context. + * + * The eventfd context reference must have been previously acquired either + * with eventfd_ctx_get() or eventfd_ctx_fdget()). + */ +void eventfd_ctx_put(struct eventfd_ctx *ctx) +{ + kref_put(&ctx->kref, eventfd_free); +} +EXPORT_SYMBOL_GPL(eventfd_ctx_put); + static int eventfd_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct eventfd_ctx *ctx = file->private_data; + + wake_up_poll(&ctx->wqh, POLLHUP); + eventfd_ctx_put(ctx); return 0; } @@ -185,6 +230,16 @@ static const struct file_operations even .write = eventfd_write, }; +/** + * eventfd_fget - Acquire a reference of an eventfd file descriptor. + * @fd: [in] Eventfd file descriptor. + * + * Returns a pointer to the eventfd file structure in case of success, or the + * following error pointer: + * + * -EBADF: Invalid @fd file descriptor. + * -EINVAL : The @fd file descriptor is not an eventfd f
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
On Tue, 23 Jun 2009, Davide Libenzi wrote: > On Tue, 23 Jun 2009, Rusty Russell wrote: > > > The first 'struct eventfd_ctx;' line is not required. > > Will repost dropping that. Almost forgot. While fixing lg.h to drop the fwd declaration, I noticed there's another one ;) - Davide --- a/drivers/lguest/lg.h 2009-06-15 15:35:24.0 -0700 +++ b/drivers/lguest/lg.h 2009-06-24 17:08:26.0 -0700 @@ -38,8 +38,6 @@ #define CHANGED_GDT_TLS4 /* Actually a subset of CHANGED_GDT */ #define CHANGED_ALL3 -struct lguest; - struct lg_cpu { unsigned int id; struct lguest *lg; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Wed, 24 Jun 2009 16:52:06 -0700 (PDT) Davide Libenzi wrote: > > umm, yes please, I believe the patches should be split. And I'm still > > not seeing the justification for forcing CONFIG_EVENTFD onto all > > CONFIG_AIO users! > > Eventfd notifications became part of the AIO API (it's not even delivered > through a new syscall, from the AIO side - same existing aiocb struct and > io_submit syscall) once we merged it, That was a regression for existing embedded AIO users ;) > so IMHO (AIO && !EVENTFD) would be > similar to split AIO in AIO_READ and AIO_WRITE and have (AIO && !AIO_WRITE). > Considering that the kernel config, once you unleash the CONFIG_EMBEDDED > pandora box, allows you to select (AIO && !EVENTFD) w/out even a warning > about possible userspace breakages, this makes it rather a confusing > configuration if you ask me. Sure. But we do assume that one someone sets CONFIG_EMBEDDED, they know what they're doing. We prefer to give them maximum flexibility and foot-shooting ability. As long as the maintenance cost of doing so in each case is reasonable, of course. > It's not a biggie from the kernel side, just a few ugly errors wrappers > around functions. For me AIO (or whatever userspace visible kernel > subsystem) should select all the components that are part of the userspace > API, but my argument ends here. Sure, it's not a biggie either way. Doubtful if many tiny systems are using AIO anyway. Heck, lots of them are running 2.4.18... But from the general this-is-the-way-we-do-things POV, it's preferred that the two features be separable under CONFIG_EMBEDDED if poss. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ESX on KVM requirements
I applied the patch I found here: http://thread.gmane.org/gmane.comp.emulators.qemu/35419 and here are my new debug results: http://pastebin.com/m60d53e9d It looks like it is still crashing in the same place, so I double checked that the patch was applied correctly. I also didn't have the "npt=1" originally on in the module, so I added that as well. Any ideas? Thanks, Ben On Fri, Jun 19, 2009 at 1:30 AM, Alexander Graf wrote: > > Looks like you're missing my VMware interface extension patches that > implement TSC fake value passing to the guest? The series was called "VMware > ESX guest bringup (partial)". > > Also, try to keep the ML in CC and simply pastebin instead of attaching the > log file. There are probably more people out there who have the same > question ;-). > > Alex > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On (Wed) Jun 24 2009 [20:20:27], Jamie Lokier wrote: > Amit Shah wrote: > > > > I think the interface from the guest POV stays the same: reads / writes > > to char devices. With virtio-serial, though, we can add a few other > > interesting things like names to ports, ability to hot-add ports on > > demand, request notifications when either end goes down, etc. > > Good features, useful for a lot of handy things. I think it would be > handy if the same features were available to the guest application > generally, not just on guest kernels with a specific driver though. > > As we talked before, about things like boot loaders and kernel > debuggers, and installing the support applications on old guests. The -net vmchannel offers that option. Can't see how to do it other way. > Is it possible to support access to the same capabilities through a > well known IO/MMIO address, in the same way that VGA and IDE are both > ordinary PCI devices, but also can be reached easily through well > known IO/MMIO addresses in simple code? The exits caused by accessing IO/MMIO regions are costlier than using PV. So using the current vmchannel option would be almost similar in those respects. Also, virtio is the almost standard way these days to get performance. We already have block, net drivers over virtio and they have been ported to older kernels as well. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden
On Wed, Jun 24, 2009 at 10:20:41PM +0100, Jamie Lokier wrote: > Avi Kivity wrote: > > On 06/23/2009 02:31 PM, Paul Brook wrote: > > >On Tuesday 23 June 2009, Avi Kivity wrote: > > > > > >>On 06/23/2009 12:47 AM, Andre Przywara wrote: > > >> > > >>>KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU > > >>>clears it. On some occasions one want to set or clear it the other way > > >>>round (for instance to get HyperV running inside a guest). > > >>>Allow the default to be overridden on the command line and fix some > > >>>whitespace damage on the way. > > >>> > > >>It makes sense for qemu to set the hypervisor bit unconditionally. A > > >>guest running under qemu is not bare metal. > > >> > > > > > >I see no reason why a guest has to be told that it's running inside a VM. > > >In principle an appropriately configured qemu should be indistinguishable > > >from > > >real hardware. In practice it's technically infeasible to cover absolutely > > >everything, but if we set this bit we're not even trying. > > > > > >I have no objection to the bit being set by default for the QEMU CPU types. > > > > > > > I agree it's pointless, but it is a Microsoft requirement for passing > > their SVVP tests. Enabling it by default makes life a little easier for > > users who wish to validate their hypervisor and has no drawbacks. > > Hold on. > > Do the SVVP tests fail on a real (non-virtal) machine then? > SVVP -> Server Virtualization Validation Program Definitely fails on a real machine :) -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html