[PATCH] KVM: x86: Ignore reads to K7 EVNTSEL MSRs
In commit 7fe29e0faacb650d31b9e9f538203a157bec821d we ignored the reads to the P6 EVNTSEL MSRs. That fixed crashes on Intel machines. Ignore the reads to K7 EVNTSEL MSRs as well to fix this on AMD hosts. This fixes Kaspersky antivirus crashing Windows guests on AMD hosts. Signed-off-by: Amit Shah --- arch/x86/kvm/x86.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1b91ea7..c5b44c9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -957,6 +957,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_VM_HSAVE_PA: case MSR_P6_EVNTSEL0: case MSR_P6_EVNTSEL1: + case MSR_K7_EVNTSEL0: data = 0; break; case MSR_MTRRcap: -- 1.6.2.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Sun, 2009-06-14 at 12:34 +0300, Michael S. Tsirkin wrote: > On Fri, Jun 12, 2009 at 04:53:27PM +0100, Mark McLoughlin wrote: > > The other obvious piece to add to it would be PCI addresses, so that > > even if you remove a device, the addresses assigned to existing devices > > don't change. > > Could you clarify this requirement please? Avi clarified, but I've written it up here too: https://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses Cheers, Mark. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Sun, 2009-06-14 at 12:50 +0300, Michael S. Tsirkin wrote: > On Fri, Jun 12, 2009 at 05:48:23PM +0100, Mark McLoughlin wrote: > > However, in order to retain compat for that SCSI device (e.g. ensuring > > the PCI address doesn't change as other devices are added an removed), > > we're back to the same problem ... either: > > > > 1) Use '-drive file=foo.img,if=scsi,pci_addr=foo'; in order to figure > > out what address to use, libvirt would need to query qemu for what > > address was originally allocated to device or it would do all the > > PCI address allocation itself ... > > This last option makes sense to me: in a real world the user has > control over where he places the device on the bus, so why > not with qemu? Yep, most people seem to agree that it makes sense to allow this, but some believe it should only be via a machine description file, not the command line. However, the first problem is that it isn't a solution to the guest ABI problem more generally. And the second problem is that for e.g. libvirt to use it, it would have to be possible to query qemu for what PCI slots were assigned to the devices - libvirt would need to be able to parse 'info pci' and match the devices listed with the devices specified on the command line. Again, details written up here: https://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses Cheers, Mark. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 12:08 PM, Mark McLoughlin wrote: This last option makes sense to me: in a real world the user has control over where he places the device on the bus, so why not with qemu? Yep, most people seem to agree that it makes sense to allow this, but some believe it should only be via a machine description file, not the command line. I don't understand this opposition. It's clear a machine config file is a long way in our future. It's also clear lack of stable PCI addresses hurts us now. However, the first problem is that it isn't a solution to the guest ABI problem more generally. pci_addr was never meant to bring world peace, just stable PCI addresses. The other issues should be addressed separately. And the second problem is that for e.g. libvirt to use it, it would have to be possible to query qemu for what PCI slots were assigned to the devices - libvirt would need to be able to parse 'info pci' and match the devices listed with the devices specified on the command line. If all devices (including vga, ide) are set up with pci_addr, then this is unneeded. You do need to export available slot numbers from qemu. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/14/2009 12:47 PM, Michael S. Tsirkin wrote: Michael S. Tsirkin wrote: If we want to remove a device from under a running guest, you need hotplug. So we can't just remove several lines from the config and hope that it'll work simply because the PCI address is stable. Why not? E.g. configuration cycles address a specific bus/slot. You need cooperation from guest if you want to move a device. By "remove several lines from the config" I understood the guest needs to be restarted. Of course if you don't restart the guest you need true hotplug. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/14/2009 12:50 PM, Michael S. Tsirkin wrote: On Fri, Jun 12, 2009 at 05:48:23PM +0100, Mark McLoughlin wrote: However, in order to retain compat for that SCSI device (e.g. ensuring the PCI address doesn't change as other devices are added an removed), we're back to the same problem ... either: 1) Use '-drive file=foo.img,if=scsi,pci_addr=foo'; in order to figure out what address to use, libvirt would need to query qemu for what address was originally allocated to device or it would do all the PCI address allocation itself ... This last option makes sense to me: in a real world the user has control over where he places the device on the bus, so why not with qemu? Yes, the user build the machine using the command line and monitor (or, in 2017, the machine configuration file), then turns on the power. Command line options are the parts lying around when we start. btw, -drive needs to be separated: -controller type=lsi1234,pci_addr=foobar,name=blah -drive file=foo.img,controller=blah,index=0 -drive file=bar.img,controller=blah,index=1 Drives to not have pci addresses. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl
On Sun, Jun 14, 2009 at 11:39:21PM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > On Sun, Jun 14, 2009 at 08:53:11AM -0400, Gregory Haskins wrote: > > > >> Michael S. Tsirkin wrote: > >> > >>> On Thu, Jun 04, 2009 at 08:48:12AM -0400, Gregory Haskins wrote: > >>> > >>> > +static void > +irqfd_disconnect(struct _irqfd *irqfd) > +{ > +struct kvm *kvm; > + > +mutex_lock(&irqfd->lock); > + > +kvm = rcu_dereference(irqfd->kvm); > +rcu_assign_pointer(irqfd->kvm, NULL); > + > +mutex_unlock(&irqfd->lock); > + > +if (!kvm) > +return; > > mutex_lock(&kvm->lock); > -kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > -kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > +list_del(&irqfd->list); > mutex_unlock(&kvm->lock); > + > +/* > + * It is important to not drop the kvm reference until the next > grace > + * period because there might be lockless references in flight > up > + * until then > + */ > +synchronize_srcu(&irqfd->srcu); > +kvm_put_kvm(kvm); > } > > > >>> So irqfd object will persist after kvm goes away, until eventfd is closed? > >>> > >>> > >> Yep, by design. It becomes part of the eventfd and is thus associated > >> with its lifetime. Consider it as if we made our own anon-fd > >> implementation for irqfd and the lifetime looks similar. The difference > >> is that we are reusing eventfd and its interface semantics. > >> > >>> > >>> > > static int > irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) > { > struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); > +unsigned long flags = (unsigned long)key; > > -/* > - * The wake_up is called with interrupts disabled. Therefore > we need > - * to defer the IRQ injection until later since we need to > acquire the > - * kvm->lock to do so. > - */ > -schedule_work(&irqfd->work); > +if (flags & POLLIN) > +/* > + * The POLLIN wake_up is called with interrupts > disabled. > + * Therefore we need to defer the IRQ injection until > later > + * since we need to acquire the kvm->lock to do so. > + */ > +schedule_work(&irqfd->inject); > + > +if (flags & POLLHUP) { > +/* > + * The POLLHUP is called unlocked, so it theoretically > should > + * be safe to remove ourselves from the wqh using the > locked > + * variant of remove_wait_queue() > + */ > +remove_wait_queue(irqfd->wqh, &irqfd->wait); > +flush_work(&irqfd->inject); > +irqfd_disconnect(irqfd); > + > +cleanup_srcu_struct(&irqfd->srcu); > +kfree(irqfd); > +} > > return 0; > } > > > >>> And it is removed by this function when eventfd is closed. > >>> But what prevents the kvm module from going away, meanwhile? > >>> > >>> > >> Well, we hold a reference to struct kvm until we call > >> irqfd_disconnect(). If kvm closes first, we disconnect and disassociate > >> all references to kvm leaving irqfd->kvm = NULL. Likewise, if irqfd > >> closes first, we disassociate with kvm with the above quoted logic. In > >> either case, we are holding a kvm reference up until that "disconnect" > >> point. Therefore kvm should not be able to disappear before that > >> disconnect, and after that point we do not care. > >> > > > > Yes, we do care. > > > > Here's the scenario in more detail: > > > > - kvm is closed > > - irq disconnect is called > > - kvm is put > > - kvm module is removed: all irqs are disconnected > > - eventfd closes and triggers callback into removed kvm module > > - crash > > > > [ lightbulb turns on] > > Ah, now I see the point you were making. I thought you were talking > about the .text in kvm_set_irq() (which would be protected by my > kvm_get_kvm() reference afaict). But you are actually talking about the > irqfd .text itself. Indeed, you are correct that is this currently a > race. Good catch! > > > > >> If that is not sufficient to prevent kvm.ko from going away in the > >> middle, then IMO kvm_get_kvm() has a bug, not irqfd. ;) However, I > >> believe everything is actually ok here. > >> > >> -Greg > >> > >> > > > > > > BTW,
Re: [PATCH 1/6] env->kvm_cpu_state.init is always zero here.
On 06/14/2009 01:52 PM, Gleb Natapov wrote: Signed-off-by: Gleb Natapov --- qemu-kvm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 2aeb17c..ec911ef 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -412,7 +412,7 @@ static int kvm_main_loop_cpu(CPUState *env) if (env->kvm_cpu_state.sipi_needed) update_regs_for_sipi(env); } - if (!env->halted&& !env->kvm_cpu_state.init) + if (!env->halted) kvm_cpu_exec(env); env->exit_request = 0; env->exception_index = EXCP_INTERRUPT; Are you sure? Can't a reset reenable it? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] env->kvm_cpu_state.init is always zero here.
On Mon, Jun 15, 2009 at 12:55:27PM +0300, Avi Kivity wrote: > On 06/14/2009 01:52 PM, Gleb Natapov wrote: >> Signed-off-by: Gleb Natapov >> --- >> qemu-kvm.c |2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/qemu-kvm.c b/qemu-kvm.c >> index 2aeb17c..ec911ef 100644 >> --- a/qemu-kvm.c >> +++ b/qemu-kvm.c >> @@ -412,7 +412,7 @@ static int kvm_main_loop_cpu(CPUState *env) >> if (env->kvm_cpu_state.sipi_needed) >> update_regs_for_sipi(env); >> } >> -if (!env->halted&& !env->kvm_cpu_state.init) >> +if (!env->halted) >> kvm_cpu_exec(env); >> env->exit_request = 0; >> env->exception_index = EXCP_INTERRUPT; >> > > Are you sure? Can't a reset reenable it? > The thing is used only with userspace irq chip. If env->kvm_cpu_state.init == 1 update_regs_for_init() is called three line above and it will set kvm_cpu_state.init to zero. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] Handle vcpu init/sipi by calling a function on vcpu
On 06/14/2009 01:52 PM, Gleb Natapov wrote: Instead of having special case in vcpu event loop. I'm a little worried about two vcpus INITing each other simultaneously and deadlocking. INIT/SIPI are async events, the initiator should not wait for them. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] env->kvm_cpu_state.init is always zero here.
On 06/15/2009 12:58 PM, Gleb Natapov wrote: On Mon, Jun 15, 2009 at 12:55:27PM +0300, Avi Kivity wrote: On 06/14/2009 01:52 PM, Gleb Natapov wrote: Signed-off-by: Gleb Natapov --- qemu-kvm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 2aeb17c..ec911ef 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -412,7 +412,7 @@ static int kvm_main_loop_cpu(CPUState *env) if (env->kvm_cpu_state.sipi_needed) update_regs_for_sipi(env); } - if (!env->halted&& !env->kvm_cpu_state.init) + if (!env->halted) kvm_cpu_exec(env); env->exit_request = 0; env->exception_index = EXCP_INTERRUPT; Are you sure? Can't a reset reenable it? The thing is used only with userspace irq chip. If env->kvm_cpu_state.init == 1 update_regs_for_init() is called three line above and it will set kvm_cpu_state.init to zero. Right, and nothing can sleep in between. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Sun, 2009-06-14 at 10:58 +0300, Avi Kivity wrote: > Mark McLoughlin wrote: > > > > >> I think the point is that you don't need version numbers if you have a > >> proper device tree. > >> > > > > How do you add a new attribute to the device tree and, when a supplied > > device tree lacking said attribute, distinguish between a device tree > > from an old version of qemu (i.e. use the old default) and a partial > > device tree from the VM manager (i.e. use the new default) ? > > > > -baseline 0.10 That's a version number :-) (I was responding to Anthony's "you don't need a version number") Cheers, Mark. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] Handle vcpu init/sipi by calling a function on vcpu
On Mon, Jun 15, 2009 at 01:03:39PM +0300, Avi Kivity wrote: > On 06/14/2009 01:52 PM, Gleb Natapov wrote: >> Instead of having special case in vcpu event loop. >> >> > > I'm a little worried about two vcpus INITing each other simultaneously > and deadlocking. INIT/SIPI are async events, the initiator should not > wait for them. > I thought to add on_vcpu_async() for that (if this case is worth warring about). -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] Handle vcpu init/sipi by calling a function on vcpu
On 06/15/2009 01:11 PM, Gleb Natapov wrote: On Mon, Jun 15, 2009 at 01:03:39PM +0300, Avi Kivity wrote: On 06/14/2009 01:52 PM, Gleb Natapov wrote: Instead of having special case in vcpu event loop. I'm a little worried about two vcpus INITing each other simultaneously and deadlocking. INIT/SIPI are async events, the initiator should not wait for them. I thought to add on_vcpu_async() for that (if this case is worth warring about). A generic on_vcpu_async() would need to allocate, that might be expoitable. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] Handle vcpu init/sipi by calling a function on vcpu
On Mon, Jun 15, 2009 at 01:14:21PM +0300, Avi Kivity wrote: > On 06/15/2009 01:11 PM, Gleb Natapov wrote: >> On Mon, Jun 15, 2009 at 01:03:39PM +0300, Avi Kivity wrote: >> >>> On 06/14/2009 01:52 PM, Gleb Natapov wrote: >>> Instead of having special case in vcpu event loop. >>> I'm a little worried about two vcpus INITing each other simultaneously >>> and deadlocking. INIT/SIPI are async events, the initiator should not >>> wait for them. >>> >>> >> I thought to add on_vcpu_async() for that (if this case is worth warring >> about). >> > > A generic on_vcpu_async() would need to allocate, that might be expoitable. > Then what about processing events while waiting in on_vcpu()? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] Handle vcpu init/sipi by calling a function on vcpu
On 06/15/2009 01:16 PM, Gleb Natapov wrote: A generic on_vcpu_async() would need to allocate, that might be expoitable. Then what about processing events while waiting in on_vcpu()? Could work, but prefer a simpler solution. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 12:43:48PM +0300, Avi Kivity wrote: > On 06/14/2009 12:50 PM, Michael S. Tsirkin wrote: >> On Fri, Jun 12, 2009 at 05:48:23PM +0100, Mark McLoughlin wrote: >> >>> However, in order to retain compat for that SCSI device (e.g. ensuring >>> the PCI address doesn't change as other devices are added an removed), >>> we're back to the same problem ... either: >>> >>>1) Use '-drive file=foo.img,if=scsi,pci_addr=foo'; in order to figure >>> out what address to use, libvirt would need to query qemu for what >>> address was originally allocated to device or it would do all the >>> PCI address allocation itself ... >>> >> >> This last option makes sense to me: in a real world the user has >> control over where he places the device on the bus, so why >> not with qemu? >> > > Yes, the user build the machine using the command line and monitor (or, > in 2017, the machine configuration file), then turns on the power. > Command line options are the parts lying around when we start. > > btw, -drive needs to be separated: > > -controller type=lsi1234,pci_addr=foobar,name=blah > -drive file=foo.img,controller=blah,index=0 > -drive file=bar.img,controller=blah,index=1 > > Drives to not have pci addresses. Maybe we need a generic 'bus options' flag. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 12:27:08PM +0300, Avi Kivity wrote: > On 06/15/2009 12:08 PM, Mark McLoughlin wrote: >>> This last option makes sense to me: in a real world the user has >>> control over where he places the device on the bus, so why >>> not with qemu? >>> >> >> Yep, most people seem to agree that it makes sense to allow this, but >> some believe it should only be via a machine description file, not the >> command line. >> > > I don't understand this opposition. It's clear a machine config file is > a long way in our future. It's also clear lack of stable PCI addresses > hurts us now. > >> However, the first problem is that it isn't a solution to the guest ABI >> problem more generally. >> > > pci_addr was never meant to bring world peace, just stable PCI > addresses. The other issues should be addressed separately. > >> And the second problem is that for e.g. libvirt to use it, it would have >> to be possible to query qemu for what PCI slots were assigned to the >> devices - libvirt would need to be able to parse 'info pci' and match >> the devices listed with the devices specified on the command line. >> > > If all devices (including vga, ide) are set up with pci_addr, then this > is unneeded. Right. I think it could be an all or nothing at all approach. > You do need to export available slot numbers from qemu. Why would a slot be unavailable? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 01:32:49PM +0300, Michael S. Tsirkin wrote: > > You do need to export available slot numbers from qemu. > > Why would a slot be unavailable? > Because it does not exist? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 01:44:56PM +0300, Gleb Natapov wrote: > On Mon, Jun 15, 2009 at 01:32:49PM +0300, Michael S. Tsirkin wrote: > > > You do need to export available slot numbers from qemu. > > > > Why would a slot be unavailable? > > > Because it does not exist? We can create a slot with any number, can't we? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 01:46:53PM +0300, Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 01:44:56PM +0300, Gleb Natapov wrote: > > On Mon, Jun 15, 2009 at 01:32:49PM +0300, Michael S. Tsirkin wrote: > > > > You do need to export available slot numbers from qemu. > > > > > > Why would a slot be unavailable? > > > > > Because it does not exist? > > We can create a slot with any number, can't we? What do you mean? If the mobo has 4 slots you can't create fifth. KVM describes 32 slots in the BIOS. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 01:52:13PM +0300, Gleb Natapov wrote: > On Mon, Jun 15, 2009 at 01:46:53PM +0300, Michael S. Tsirkin wrote: > > On Mon, Jun 15, 2009 at 01:44:56PM +0300, Gleb Natapov wrote: > > > On Mon, Jun 15, 2009 at 01:32:49PM +0300, Michael S. Tsirkin wrote: > > > > > You do need to export available slot numbers from qemu. > > > > > > > > Why would a slot be unavailable? > > > > > > > Because it does not exist? > > > > We can create a slot with any number, can't we? > What do you mean? If the mobo has 4 slots you can't create fifth. > KVM describes 32 slots in the BIOS. Do you mean the KVM kernel module here? I don't know much about the BIOS. Can't qemu control the number of slots declared? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 02:07:53PM +0300, Michael S. Tsirkin wrote: > On Mon, Jun 15, 2009 at 01:52:13PM +0300, Gleb Natapov wrote: > > On Mon, Jun 15, 2009 at 01:46:53PM +0300, Michael S. Tsirkin wrote: > > > On Mon, Jun 15, 2009 at 01:44:56PM +0300, Gleb Natapov wrote: > > > > On Mon, Jun 15, 2009 at 01:32:49PM +0300, Michael S. Tsirkin wrote: > > > > > > You do need to export available slot numbers from qemu. > > > > > > > > > > Why would a slot be unavailable? > > > > > > > > > Because it does not exist? > > > > > > We can create a slot with any number, can't we? > > What do you mean? If the mobo has 4 slots you can't create fifth. > > KVM describes 32 slots in the BIOS. > > Do you mean the KVM kernel module here? I don't know much about the No I don't mean KVM kernel module here. > BIOS. Can't qemu control the number of slots declared? > Qemu represents HW, BIOS drives this HW. They should be in sync on such important issues like pci slot configuration. Even if QEMU can control the number of slots declared (which it can't easily do), it will be able to do it only on startup (before BIOS runs). The way to have dynamic number of slots may be pci bridge emulation. Not sure what is needed from BIOS for that. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 01:32 PM, Michael S. Tsirkin wrote: You do need to export available slot numbers from qemu. Why would a slot be unavailable? A slot needs to be configured in ACPI, and not be taken by onboard chips (piix takes slot 0, for example). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Activate Virtualization On Demand v2
X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on demand activation of virtualization. This means, that instead virtualization is enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. --- v2 uses kvm_lock and traces failures atomically Signed-off-by: Alexander Graf --- arch/ia64/kvm/kvm-ia64.c|8 ++- arch/powerpc/kvm/powerpc.c |2 +- arch/s390/kvm/kvm-s390.c|2 +- arch/x86/include/asm/kvm_host.h |2 +- arch/x86/kvm/svm.c | 13 -- arch/x86/kvm/vmx.c |7 +++- arch/x86/kvm/x86.c |4 +- include/linux/kvm_host.h|2 +- virt/kvm/kvm_main.c | 82 +-- 9 files changed, 96 insertions(+), 26 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 906d597..3141a92 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -124,7 +124,7 @@ long ia64_pal_vp_create(u64 *vpd, u64 *host_iva, u64 *opt_handler) static DEFINE_SPINLOCK(vp_lock); -void kvm_arch_hardware_enable(void *garbage) +int kvm_arch_hardware_enable(void *garbage) { long status; long tmp_base; @@ -137,7 +137,7 @@ void kvm_arch_hardware_enable(void *garbage) slot = ia64_itr_entry(0x3, KVM_VMM_BASE, pte, KVM_VMM_SHIFT); local_irq_restore(saved_psr); if (slot < 0) - return; + return -EINVAL; spin_lock(&vp_lock); status = ia64_pal_vp_init_env(kvm_vsa_base ? @@ -145,7 +145,7 @@ void kvm_arch_hardware_enable(void *garbage) __pa(kvm_vm_buffer), KVM_VM_BUFFER_BASE, &tmp_base); if (status != 0) { printk(KERN_WARNING"kvm: Failed to Enable VT Support\n"); - return ; + return -EINVAL; } if (!kvm_vsa_base) { @@ -154,6 +154,8 @@ void kvm_arch_hardware_enable(void *garbage) } spin_unlock(&vp_lock); ia64_ptr_entry(0x3, slot); + + return 0; } void kvm_arch_hardware_disable(void *garbage) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 9057335..6558ab7 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -80,7 +80,8 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu) return r; } -void kvm_arch_hardware_enable(void *garbage) +int kvm_arch_hardware_enable(void *garbage) { + return 0; } diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index cbfe91e..a14e676 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -70,7 +70,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { /* Section: not file related */ -void kvm_arch_hardware_enable(void *garbage) +int kvm_arch_hardware_enable(void *garbage) { /* every s390 is virtualization enabled ;-) */ + return 0; } diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4627627..72d5075 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -463,7 +463,7 @@ struct descriptor_table { struct kvm_x86_ops { int (*cpu_has_kvm_support)(void); /* __init */ int (*disabled_by_bios)(void); /* __init */ - void (*hardware_enable)(void *dummy); /* __init */ + int (*hardware_enable)(void *dummy); void (*hardware_disable)(void *dummy); void (*check_processor_compatibility)(void *rtn); int (*hardware_setup)(void); /* __init */ diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 04ee964..47a8b94 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -245,7 +245,7 @@ static void svm_hardware_disable(void *garbage) cpu_svm_disable(); } -static void svm_hardware_enable(void *garbage) +static int svm_hardware_enable(void *garbage) { struct svm_cpu_data *svm_data; @@ -254,16 +254,20 @@ static void svm_hardware_enable(void *garbage) struct desc_struct *gdt; int me = raw_smp_processor_id(); + rdmsrl(MSR_EFER, efer); + if (efer & EFER_SVME) + return -EBUSY; + if (!has_svm()) { printk(KERN_ERR "svm_cpu_init: err EOPNOTSUPP on %d\n", me); - return; + return -EINVAL; } svm_data = per_cpu(svm_data, me); if (!svm_data) { printk(KERN_ERR "svm_cpu_init: svm_data is
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 12:09 PM, Mark McLoughlin wrote: I think the point is that you don't need version numbers if you have a proper device tree. How do you add a new attribute to the device tree and, when a supplied device tree lacking said attribute, distinguish between a device tree from an old version of qemu (i.e. use the old default) and a partial device tree from the VM manager (i.e. use the new default) ? -baseline 0.10 That's a version number :-) (I was responding to Anthony's "you don't need a version number") If you want to prevent incompatibilities, you need to make everything new (potentially including bugfixes) non-default. Eventually the default configuration becomes increasingly unusable and you need a new baseline. You must still be able to fall back to the old baseline for older guests. I don't think games with configuration files can hide that. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 02:14:15PM +0300, Gleb Natapov wrote: > On Mon, Jun 15, 2009 at 02:07:53PM +0300, Michael S. Tsirkin wrote: > > On Mon, Jun 15, 2009 at 01:52:13PM +0300, Gleb Natapov wrote: > > > On Mon, Jun 15, 2009 at 01:46:53PM +0300, Michael S. Tsirkin wrote: > > > > On Mon, Jun 15, 2009 at 01:44:56PM +0300, Gleb Natapov wrote: > > > > > On Mon, Jun 15, 2009 at 01:32:49PM +0300, Michael S. Tsirkin wrote: > > > > > > > You do need to export available slot numbers from qemu. > > > > > > > > > > > > Why would a slot be unavailable? > > > > > > > > > > > Because it does not exist? > > > > > > > > We can create a slot with any number, can't we? > > > What do you mean? If the mobo has 4 slots you can't create fifth. > > > KVM describes 32 slots in the BIOS. > > > > Do you mean the KVM kernel module here? I don't know much about the > No I don't mean KVM kernel module here. > > > BIOS. Can't qemu control the number of slots declared? > > > Qemu represents HW, BIOS drives this HW. They should be in sync on such > important issues like pci slot configuration. As a simple solution, let's stick to 32 slots per bus. That's the maximum that the PCI spec allows, anyway. > Even if QEMU can control the number of slots declared (which it can't > easily do), it will be able to do it only on startup (before BIOS > runs). That's OK - this is when the machine description is read. > The way to have dynamic > number of slots may be pci bridge emulation. Not sure what is needed > from BIOS for that. Since bridge can be hot-plugged, probably nothing? But we don't necessarily need dynamic number of slots IMO. > > -- > Gleb. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints
Avi Kivity writes: > On 06/15/2009 12:08 PM, Mark McLoughlin wrote: >>> This last option makes sense to me: in a real world the user has >>> control over where he places the device on the bus, so why >>> not with qemu? >>> >> >> Yep, most people seem to agree that it makes sense to allow this, but >> some believe it should only be via a machine description file, not the >> command line. >> > > I don't understand this opposition. It's clear a machine config file > is a long way in our future. It's also clear lack of stable PCI > addresses hurts us now. Correct. >> However, the first problem is that it isn't a solution to the guest ABI >> problem more generally. >> > > pci_addr was never meant to bring world peace, just stable PCI > addresses. The other issues should be addressed separately. > >> And the second problem is that for e.g. libvirt to use it, it would have >> to be possible to query qemu for what PCI slots were assigned to the >> devices - libvirt would need to be able to parse 'info pci' and match >> the devices listed with the devices specified on the command line. >> > > If all devices (including vga, ide) are set up with pci_addr, then > this is unneeded. You do need to export available slot numbers from > qemu. Not really. QEMU gives just the host bridge a fixed slot[*]. All the other slots are available. The real problem is devices that get implicitly added, like the SCSI controller. Those devices get their slots auto-assigned, which can interfere with slot numbers chosen by the user. We need a way to avoid that, as you suggested elsewhere in this thread. [*] There's an exception or two for oddball targets. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] Implement Hyper-V MSRs v2
On 19.05.2009, at 14:52, Avi Kivity wrote: Alexander Graf wrote: Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage. But let's be nice today and have it its way, because otherwise it fails terribly. v2 changes: - remove the 0x4081 MSR definition - add pr_unimpl() on unimplemented writes Signed-off-by: Alexander Graf --- arch/x86/kvm/svm.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ef43a18..4b4eadd 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2034,6 +2034,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) case MSR_VM_HSAVE_PA: svm->hsave_msr = data; break; + case MSR_VM_CR: + case MSR_VM_IGNNE: + case MSR_K8_HWCR: + pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data); + break; We can be nicer, if the write doesn't set bits which we don't implement, we can let it proceed silently. See for example MSR_IA32_DEBUGCTLMSR. Most likely the values written are already correctly implemented (by doing nothing). Actually we implement very little of the bits. See http://support.amd.com/us/Processor_TechDocs/31116-Public-GH-BKDG_3-28_5-28-09.pdf for what we're missing ;-). So it might make sense to always warn for now, see what OSs use and only filter out those bits they actually try to access (and maybe implement them). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints
(adding cc) On 06/15/2009 02:35 PM, Markus Armbruster wrote: Not really. QEMU gives just the host bridge a fixed slot[*]. All the other slots are available. qemu needs to export these two bits of information: the first free slot and the number of slots. More generally, which slots are open. We can assume 1:31, but that's unlovely. The real problem is devices that get implicitly added, like the SCSI controller. Those devices get their slots auto-assigned, which can interfere with slot numbers chosen by the user. We need a way to avoid that, as you suggested elsewhere in this thread. Paul/Anthony, can we have -vga pci_addr=, -usb-controller pci_addr=, and -drive pci_addr= (and later, -disk-controller)? Stalling while waiting for the ultimate config file is only generating pain and out-of-tree patches. (I'd be quite happy constructing the entire machine config on the command line, but I realize it's just me) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] Nested SVM: Improve interrupt injection v2
On 19.05.2009, at 15:22, Gleb Natapov wrote: On Tue, May 19, 2009 at 12:54:03PM +0200, Alexander Graf wrote: While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct. This patch makes nested SVM's interrupt injection behave more like on a real machine. v2 calls BUG_ON when svm_set_irq is called with GIF=0 Signed-off-by: Alexander Graf --- arch/x86/kvm/svm.c | 39 --- 1 files changed, 24 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index fa2a710..5b14c9d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1, /* Kill any pending exceptions */ if (svm->vcpu.arch.exception.pending == true) nsvm_printk("WARNING: Pending Exception\n"); - svm->vcpu.arch.exception.pending = false; + kvm_clear_exception_queue(&svm->vcpu); + kvm_clear_interrupt_queue(&svm->vcpu); What about pending NMI here? NMI injected to the guest? That should have triggered by now and caused an #NMI exit, no? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 02:27:14PM +0300, Avi Kivity wrote: > On 06/15/2009 01:32 PM, Michael S. Tsirkin wrote: >>> You do need to export available slot numbers from qemu. >>> >> >> Why would a slot be unavailable? >> > > A slot needs to be configured in ACPI, Can we configure all possible 32 slots? > and not be taken by onboard chips > (piix takes slot 0, for example). piix is the root complex, isn't it? Are there other examples? If not, we could teach management about the root complex being special ... > -- > error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] Nested SVM: Improve interrupt injection v2
On Mon, Jun 15, 2009 at 01:47:08PM +0200, Alexander Graf wrote: > > On 19.05.2009, at 15:22, Gleb Natapov wrote: > >> On Tue, May 19, 2009 at 12:54:03PM +0200, Alexander Graf wrote: >>> While trying to get Hyper-V running, I realized that the interrupt >>> injection >>> mechanisms that are in place right now are not 100% correct. >>> >>> This patch makes nested SVM's interrupt injection behave more like >>> on a >>> real machine. >>> >>> v2 calls BUG_ON when svm_set_irq is called with GIF=0 >>> >>> Signed-off-by: Alexander Graf >>> --- >>> arch/x86/kvm/svm.c | 39 --- >>> 1 files changed, 24 insertions(+), 15 deletions(-) >>> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >>> index fa2a710..5b14c9d 100644 >>> --- a/arch/x86/kvm/svm.c >>> +++ b/arch/x86/kvm/svm.c >>> @@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct >>> vcpu_svm *svm, void *arg1, >>> /* Kill any pending exceptions */ >>> if (svm->vcpu.arch.exception.pending == true) >>> nsvm_printk("WARNING: Pending Exception\n"); >>> - svm->vcpu.arch.exception.pending = false; >>> + kvm_clear_exception_queue(&svm->vcpu); >>> + kvm_clear_interrupt_queue(&svm->vcpu); >>> >> What about pending NMI here? > > NMI injected to the guest? That should have triggered by now and caused > an #NMI exit, no? > I don't really understand what this code is doing, but there are three types of events exception/interrupt/nmi you clear only two of them. If you are sure this is correct then OK. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 02:48 PM, Michael S. Tsirkin wrote: A slot needs to be configured in ACPI, Can we configure all possible 32 slots? That's what we do. But one is always taken. In the future, perhaps more. and not be taken by onboard chips (piix takes slot 0, for example). piix is the root complex, isn't it? Are there other examples? If not, we could teach management about the root complex being special ... We should just tell the user which slots are open. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: Configuration vs. compat hints
Avi Kivity wrote: > (I'd be quite happy constructing the entire machine config on the > command line, but I realize it's just me) > It is not just you. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl
Michael S. Tsirkin wrote: > On Sun, Jun 14, 2009 at 11:39:21PM -0400, Gregory Haskins wrote: > >> Michael S. Tsirkin wrote: >> >>> On Sun, Jun 14, 2009 at 08:53:11AM -0400, Gregory Haskins wrote: >>> >>> Michael S. Tsirkin wrote: > On Thu, Jun 04, 2009 at 08:48:12AM -0400, Gregory Haskins wrote: > > > >> +static void >> +irqfd_disconnect(struct _irqfd *irqfd) >> +{ >> +struct kvm *kvm; >> + >> +mutex_lock(&irqfd->lock); >> + >> +kvm = rcu_dereference(irqfd->kvm); >> +rcu_assign_pointer(irqfd->kvm, NULL); >> + >> +mutex_unlock(&irqfd->lock); >> + >> +if (!kvm) >> +return; >> >> mutex_lock(&kvm->lock); >> -kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); >> -kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); >> +list_del(&irqfd->list); >> mutex_unlock(&kvm->lock); >> + >> +/* >> + * It is important to not drop the kvm reference until the next >> grace >> + * period because there might be lockless references in flight >> up >> + * until then >> + */ >> +synchronize_srcu(&irqfd->srcu); >> +kvm_put_kvm(kvm); >> } >> >> >> > So irqfd object will persist after kvm goes away, until eventfd is closed? > > > Yep, by design. It becomes part of the eventfd and is thus associated with its lifetime. Consider it as if we made our own anon-fd implementation for irqfd and the lifetime looks similar. The difference is that we are reusing eventfd and its interface semantics. > > > >> >> static int >> irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) >> { >> struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); >> +unsigned long flags = (unsigned long)key; >> >> -/* >> - * The wake_up is called with interrupts disabled. Therefore >> we need >> - * to defer the IRQ injection until later since we need to >> acquire the >> - * kvm->lock to do so. >> - */ >> -schedule_work(&irqfd->work); >> +if (flags & POLLIN) >> +/* >> + * The POLLIN wake_up is called with interrupts >> disabled. >> + * Therefore we need to defer the IRQ injection until >> later >> + * since we need to acquire the kvm->lock to do so. >> + */ >> +schedule_work(&irqfd->inject); >> + >> +if (flags & POLLHUP) { >> +/* >> + * The POLLHUP is called unlocked, so it theoretically >> should >> + * be safe to remove ourselves from the wqh using the >> locked >> + * variant of remove_wait_queue() >> + */ >> +remove_wait_queue(irqfd->wqh, &irqfd->wait); >> +flush_work(&irqfd->inject); >> +irqfd_disconnect(irqfd); >> + >> +cleanup_srcu_struct(&irqfd->srcu); >> +kfree(irqfd); >> +} >> >> return 0; >> } >> >> >> > And it is removed by this function when eventfd is closed. > But what prevents the kvm module from going away, meanwhile? > > > Well, we hold a reference to struct kvm until we call irqfd_disconnect(). If kvm closes first, we disconnect and disassociate all references to kvm leaving irqfd->kvm = NULL. Likewise, if irqfd closes first, we disassociate with kvm with the above quoted logic. In either case, we are holding a kvm reference up until that "disconnect" point. Therefore kvm should not be able to disappear before that disconnect, and after that point we do not care. >>> Yes, we do care. >>> >>> Here's the scenario in more detail: >>> >>> - kvm is closed >>> - irq disconnect is called >>> - kvm is put >>> - kvm module is removed: all irqs are disconnected >>> - eventfd closes and triggers callback into removed kvm module >>> - crash >>> >>> >> [ lightbulb turns on] >> >> Ah, now I see the point you were making. I thought you were talking >> about the .text in kvm_set_irq() (which would be protected by my >> kvm_get_kvm() reference afaict). But you are actually talking about the >> irqfd .text itself. Indeed, you are correct that is this currently a >> race. Good catch! >> >> >>> >>>
[KVM-AUTOTEST PATCH 0/5] Introducing kvm_subprocess
The following patches replace the run_bg() function and the kvm_spawn class with kvm_subprocess. The new module is intended to solve a problem with run_bg() (which loses track of the child process when the parent process exits) and allows for more flexibility in handling SSH/Telnet sessions (allows reusing sessions started in previous tests). kvm_subprocess defines a class 'kvm_spawn' and two functions: run_bg() and run_fg(). Its main job is to run a process in the background and allow the user to control it interactively and/or monitor its output on the fly. Its most important feature in the context of KVM tests is that it allows to control child processes left behind by processes that no longer exist. This means that if QEMU is started by the 'boot' test, its output will be logged in the following 'reboot' test as well. See kvm_spawn's docstring for more details. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST PATCH 1/5] Add new module kvm_subprocess
This module is intended to be used for controlling all child processes in KVM tests: both QEMU processes and SSH/SCP/Telnet processes. Processes started with this module keep running and can be interacted with even after the parent process exits. The current run_bg() utility tracks a child process as long as the parent process is running. When the parent process exits, the tracking thread terminates and cannot resume when needed. Currently SSH/SCP/Telnet communication is handled by kvm_utils.kvm_spawn, which does not allow the child process to run after the parent process exits. Thus, open SSH/SCP/Telnet sessions cannot be reused by tests following the one in which they are opened. The new module provides a solution to these two problems, and also saves some code by reusing common code required both for QEMU processes and SSH/SCP/Telnet processes. Signed-off-by: Michael Goldish --- client/tests/kvm/kvm_subprocess.py | 870 1 files changed, 870 insertions(+), 0 deletions(-) create mode 100644 client/tests/kvm/kvm_subprocess.py diff --git a/client/tests/kvm/kvm_subprocess.py b/client/tests/kvm/kvm_subprocess.py new file mode 100644 index 000..a6090ad --- /dev/null +++ b/client/tests/kvm/kvm_subprocess.py @@ -0,0 +1,870 @@ +#!/usr/bin/python +import sys, subprocess, pty, select, os, time, signal, re, termios, fcntl +import threading, logging +import common, kvm_utils + +""" +A class and functions used for running and controlling child processes. + +...@copyright: 2008-2009 Red Hat Inc. +""" + + +def _lock(filename): +if not os.path.exists(filename): +open(filename, "w").close() +fd = os.open(filename, os.O_RDWR) +fcntl.lockf(fd, fcntl.LOCK_EX) +return fd + + +def _unlock(fd): +fcntl.lockf(fd, fcntl.LOCK_UN) +os.close(fd) + + +def _locked(filename): +try: +fd = os.open(filename, os.O_RDWR) +except: +return False +try: +fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB) +except: +os.close(fd) +return True +fcntl.lockf(fd, fcntl.LOCK_UN) +os.close(fd) +return False + + +def _wait(filename): +fd = _lock(filename) +_unlock(fd) + + +def _get_filenames(base_dir, id): +return map(lambda s: os.path.join(base_dir, s + "-" + id), + ("pid", "status", "output", "outpipe1", "outpipe2", "inpipe", +"lock-server-running", "lock-client-starting")) + + +class kvm_spawn: +""" +This class is used for spawning and controlling a child process. + +A new instance of this class can either run a new server (a small Python +program that reads output from the child process and reports it to the +client and to a text file) or attach to an already running server. +When a server is started it runs the child process. +The server reports output from the child's STDOUT and STDERR via 3 +channels: two named pipes and a text file. +The text file can be accessed at any time using get_output(). +The first named pipe is used by _tail(), a function that runs in the +background and reports new output from the child as it is produced. +The second named pipe is used by a set of functions that read and parse +output as requested by the user in an interactive manner, similar to +pexpect. +The server also receives input from the client and sends it to the child +process. +An instance of this class can be pickled. When unpickled it automatically +resumes _tail() if needed. +""" + +def __init__(self, command=None, id=None, termination_func=None, + output_func=None, output_prefix="", linesep="\n", + prompt=r"[\#\$]\s*$", status_test_command="echo $?"): +""" +Initialize the class and run command as a child process. + +@param command: Command that will be run, or None if accessing an +already running process. +@param id: id of an already running instance, if accessing a running +instance, or None if starting a new one. +@param termination_func: Function to call when the process exits. The +function must accept a single exit status parameter. +@param output_func: Function to call whenever a line of output is +available from the STDOUT or STDERR streams of the process. +The function must accept a single string parameter. The string +does not include the final newline. +@param output_prefix: String to prepend to lines sent to output_func. +@param linesep: Line separator to be appended to strings sent to the +child process using sendline(). +@param prompt: Regular expression to be used with read_up_to_prompt(). +@param status_test_command: Command to be used for getting the last +exit status of commands run inside a shell (used by +get_command_statu
[KVM-AUTOTEST PATCH 2/5] Modify kvm_vm and kvm_preprocessing to use the new kvm_subprocess module
Signed-off-by: Michael Goldish --- client/tests/kvm/kvm_preprocessing.py | 17 +- client/tests/kvm/kvm_vm.py| 103 + 2 files changed, 44 insertions(+), 76 deletions(-) diff --git a/client/tests/kvm/kvm_preprocessing.py b/client/tests/kvm/kvm_preprocessing.py index f60cfe8..7fac46a 100644 --- a/client/tests/kvm/kvm_preprocessing.py +++ b/client/tests/kvm/kvm_preprocessing.py @@ -1,7 +1,7 @@ import sys, os, time, commands, re, logging from autotest_lib.client.bin import test from autotest_lib.client.common_lib import error -import kvm_vm, kvm_utils +import kvm_vm, kvm_utils, kvm_subprocess def preprocess_image(test, params): @@ -167,17 +167,6 @@ def preprocess(test, params, env): @param params: A dict containing all VM and image parameters. @param env: The environment (a dict-like object). """ -# Verify the identities of all living VMs -for vm in env.values(): -if not kvm_utils.is_vm(vm): -continue -if vm.is_dead(): -continue -if not vm.verify_process_identity(): -logging.debug("VM '%s' seems to have been replaced by another" - " process" % vm.name) -vm.pid = None - # Destroy and remove VMs that are no longer needed in the environment requested_vms = kvm_utils.get_sub_dict_names(params, "vms") for key in env.keys(): @@ -239,8 +228,8 @@ def postprocess(test, params, env): # Remove them all logging.debug("'keep_ppm_files' not specified; removing all PPM files" " from results dir...") -kvm_utils.run_bg("rm -vf %s" % os.path.join(test.debugdir, "*.ppm"), - None, logging.debug, "(rm) ", timeout=5.0) +rm_cmd = "rm -vf %s" % os.path.join(test.debugdir, "*.ppm") +kvm_subprocess.run_fg(rm_cmd, logging.debug, "(rm) ", timeout=5.0) def postprocess_on_error(test, params, env): diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py index 5028161..9aea3fb 100644 --- a/client/tests/kvm/kvm_vm.py +++ b/client/tests/kvm/kvm_vm.py @@ -1,6 +1,6 @@ #!/usr/bin/python import time, socket, os, logging, fcntl -import kvm_utils +import kvm_utils, kvm_subprocess """ Utility classes and functions to handle Virtual Machine creation using qemu. @@ -54,17 +54,22 @@ def create_image(params, qemu_img_path, image_dir): qemu_img_cmd += " %s" % size logging.debug("Running qemu-img command:\n%s" % qemu_img_cmd) -(status, pid, output) = kvm_utils.run_bg(qemu_img_cmd, None, - logging.debug, "(qemu-img) ", - timeout=30) +(status, output) = kvm_subprocess.run_fg(qemu_img_cmd, logging.debug, + "(qemu-img) ", timeout=30) -if status: -logging.debug("qemu-img exited with status %d" % status) -logging.error("Could not create image %s" % image_filename) +if status is None: +logging.error("Timeout elapsed while waiting for qemu-img command " + "to complete:\n%s" % qemu_img_cmd) +return None +elif status != 0: +logging.error("Could not create image; " + "qemu-img command failed:\n%s" % qemu_img_cmd) +logging.error("Status: %s" % status) +logging.error("Output:" + kvm_utils.format_str_for_message(output)) return None if not os.path.exists(image_filename): -logging.debug("Image file does not exist for some reason") -logging.error("Could not create image %s" % image_filename) +logging.error("Image could not be created for some reason; " + "qemu-img command:\n%s" % qemu_img_cmd) return None logging.info("Image created in %s" % image_filename) @@ -106,7 +111,7 @@ class VM: @param image_dir: The directory where images reside @param iso_dir: The directory where ISOs reside """ -self.pid = None +self.process = None self.name = name self.params = params @@ -153,28 +158,6 @@ class VM: return VM(name, params, qemu_path, image_dir, iso_dir) -def verify_process_identity(self): -""" -Make sure .pid really points to the original qemu process. If .pid -points to the same process that was created with the create method, -or to a dead process, return True. Otherwise return False. -""" -if self.is_dead(): -return True -filename = "/proc/%d/cmdline" % self.pid -if not os.path.exists(filename): -logging.debug("Filename %s does not exist" % filename) -return False -file = open(filename) -cmdline = file.read() -file.close() -if not self.qemu_path in cmdline: -return False -if not self.monitor_file_name in cmdli
[KVM-AUTOTEST PATCH 3/5] Modify remote_login and remote_scp in kvm_utils to use kvm_subprocess
Also remove a reference to kvm_log that was left behind. Signed-off-by: Michael Goldish --- client/tests/kvm/kvm_utils.py | 13 - 1 files changed, 8 insertions(+), 5 deletions(-) diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py index 4bc8dc7..09af62d 100644 --- a/client/tests/kvm/kvm_utils.py +++ b/client/tests/kvm/kvm_utils.py @@ -2,6 +2,7 @@ import md5, thread, subprocess, time, string, random, socket, os, signal, pty import select, re, logging from autotest_lib.client.bin import utils from autotest_lib.client.common_lib import error +import kvm_subprocess """ KVM test utility functions. @@ -635,7 +636,8 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10): @return Return the kvm_spawn object on success and None on failure. """ -sub = kvm_spawn(command, linesep) +sub = kvm_subprocess.kvm_spawn(command) +sub.set_linesep(linesep) sub.set_prompt(prompt) password_prompt_count = 0 @@ -670,7 +672,7 @@ def remote_login(command, password, prompt, linesep="\n", timeout=10): sub.close() return None elif match == 4: # "Connection refused" -kvm_log.debug("Got 'Connection refused'") +logging.debug("Got 'Connection refused'") sub.close() return None elif match == 5: # prompt @@ -702,7 +704,7 @@ def remote_scp(command, password, timeout=300, login_timeout=10): @return: True if the transfer succeeds and False on failure. """ -sub = kvm_spawn(command) +sub = kvm_subprocess.kvm_spawn(command) password_prompt_count = 0 _timeout = login_timeout @@ -733,9 +735,10 @@ def remote_scp(command, password, timeout=300, login_timeout=10): sub.close() return False else: # match == None -logging.debug("Timeout or process terminated") +logging.debug("Timeout elapsed or process terminated") +status = sub.get_status() sub.close() -return sub.poll() == 0 +return status == 0 def scp_to_remote(host, port, username, password, local_path, remote_path, -- 1.5.4.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST PATCH 4/5] Modify run_autotest() in kvm_tests.py to use the new kvm_subprocess module.
Signed-off-by: Michael Goldish --- client/tests/kvm/kvm_tests.py |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/client/tests/kvm/kvm_tests.py b/client/tests/kvm/kvm_tests.py index 54d2a7a..ffe9116 100644 --- a/client/tests/kvm/kvm_tests.py +++ b/client/tests/kvm/kvm_tests.py @@ -1,6 +1,6 @@ import time, os, logging from autotest_lib.client.common_lib import utils, error -import kvm_utils, ppm_utils, scan_results +import kvm_utils, kvm_subprocess, ppm_utils, scan_results """ KVM test definitions. @@ -237,15 +237,16 @@ def run_autotest(test, params, env): cmd += " --exclude=*.pyc" cmd += " --exclude=*.svn" cmd += " --exclude=*.git" -kvm_utils.run_bg(cmd % (test.bindir, tarred_autotest_path), timeout=30) +kvm_subprocess.run_fg(cmd % (test.bindir, tarred_autotest_path), timeout=30) # tar the contents of bindir/autotest/tests/ cmd = "cd %s; tar cvjf %s %s/*" cmd += " --exclude=*.pyc" cmd += " --exclude=*.svn" cmd += " --exclude=*.git" -kvm_utils.run_bg(cmd % (os.path.join(test.bindir, "autotest", "tests"), -tarred_test_path, test_name), timeout=30) +kvm_subprocess.run_fg(cmd % + (os.path.join(test.bindir, "autotest", "tests"), + tarred_test_path, test_name), timeout=30) # Check if we need to copy autotest.tar.bz2 copy = False -- 1.5.4.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-AUTOTEST PATCH 5/5] Remove kvm_spawn and run_bg() from kvm_utils.py.
These are now provided by kvm_subprocess.py. Signed-off-by: Michael Goldish --- client/tests/kvm/kvm_utils.py | 477 + 1 files changed, 2 insertions(+), 475 deletions(-) diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py index 09af62d..0ddb7fc 100644 --- a/client/tests/kvm/kvm_utils.py +++ b/client/tests/kvm/kvm_utils.py @@ -231,390 +231,8 @@ def check_kvm_source_dir(source_dir): raise error.TestError("Unknown source dir layout, cannot proceed.") -# The following are a class and functions used for SSH, SCP and Telnet -# communication with guests. - -class kvm_spawn: -""" -This class is used for spawning and controlling a child process. -""" - -def __init__(self, command, linesep="\n"): -""" -Initialize the class and run command as a child process. - -@param command: Command that will be run. -@param linesep: Line separator for the given platform. -""" -self.exitstatus = None -self.linesep = linesep -(pid, fd) = pty.fork() -if pid == 0: -os.execv("/bin/sh", ["/bin/sh", "-c", command]) -else: -self.pid = pid -self.fd = fd - - -def set_linesep(self, linesep): -""" -Sets the line separator string (usually "\\n"). - -@param linesep: Line separator character. -""" -self.linesep = linesep - - -def is_responsive(self, timeout=5.0): -""" -Return True if the session is responsive. - -Send a newline to the child process (e.g. SSH or Telnet) and read some -output using read_nonblocking. -If all is OK, some output should be available (e.g. the shell prompt). -In that case return True. Otherwise return False. - -@param timeout: Timeout that will happen before we consider the -process unresponsive -""" -self.read_nonblocking(timeout=0.1) -self.sendline() -output = self.read_nonblocking(timeout=timeout) -if output.strip(): -return True -return False - - -def poll(self): -""" -If the process exited, return its exit status. Otherwise return None. -The exit status is stored for use in subsequent calls. -""" -if self.exitstatus != None: -return self.exitstatus -pid, status = os.waitpid(self.pid, os.WNOHANG) -if pid: -self.exitstatus = os.WEXITSTATUS(status) -return self.exitstatus -else: -return None - - -def close(self): -""" -Close the session (close the process filedescriptors and kills the -process ID), and return the exit status. -""" -try: -os.close(self.fd) -os.kill(self.pid, signal.SIGTERM) -except OSError: -pass -return self.poll() - - -def sendline(self, str=""): -""" -Sends a string followed by a line separator to the child process. - -@param str: String that will be sent to the child process. -""" -try: -os.write(self.fd, str + self.linesep) -except OSError: -pass - - -def read_nonblocking(self, timeout=1.0): -""" -Read from child until there is nothing to read for timeout seconds. - -@param timeout: Time (seconds) of wait before we give up reading from -the child process. -""" -data = "" -while True: -r, w, x = select.select([self.fd], [], [], timeout) -if self.fd in r: -try: -data += os.read(self.fd, 1024) -except OSError: -return data -else: -return data - - -def match_patterns(self, str, patterns): -""" -Match str against a list of patterns. - -Return the index of the first pattern that matches a substring of str. -None and empty strings in patterns are ignored. -If no match is found, return None. - -@param patterns: List of strings (regular expression patterns). -""" -for i in range(len(patterns)): -if not patterns[i]: -continue -exp = re.compile(patterns[i]) -if exp.search(str): -return i - - -def read_until_output_matches(self, patterns, filter=lambda(x):x, - timeout=30.0, internal_timeout=1.0, - print_func=None): -""" -Read using read_nonblocking until a match is found using match_patterns, -or until timeout expires. Before attempting to search for a match, the -data is filtered using the filter function provided. - -@brief: Read from child using read_nonblocking until a pattern -matches. -
Re: [PATCH] Activate Virtualization On Demand v2
On Mon, Jun 15, 2009 at 01:30:05PM +0200, Alexander Graf wrote: > X86 CPUs need to have some magic happening to enable the virtualization > extensions on them. This magic can result in unpleasant results for > users, like blocking other VMMs from working (vmx) or using invalid TLB > entries (svm). > > Currently KVM activates virtualization when the respective kernel module > is loaded. This blocks us from autoloading KVM modules without breaking > other VMMs. That will only become interesting if we every have such a thing in mainline. So NACK, lots of complication for no good reason. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Activate Virtualization On Demand v2
On 15.06.2009, at 14:17, Christoph Hellwig wrote: On Mon, Jun 15, 2009 at 01:30:05PM +0200, Alexander Graf wrote: X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. That will only become interesting if we every have such a thing in mainline. So NACK, lots of complication for no good reason. I don't want to fight political battles here. Seriously - we're out of kindergarden. There are users out there who want to have VBox/VMware and kvm installed in parallel and can't have both kernel modules loaded at the same time. We're only hurting _our_ users, not the others if we keep people from having kvm*.ko loaded. Sigh. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Activate Virtualization On Demand v2
On Mon, Jun 15, 2009 at 02:25:01PM +0200, Alexander Graf wrote: > I don't want to fight political battles here. So stop that crap. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Ignore reads to K7 EVNTSEL MSRs
On 06/15/2009 10:55 AM, Amit Shah wrote: In commit 7fe29e0faacb650d31b9e9f538203a157bec821d we ignored the reads to the P6 EVNTSEL MSRs. That fixed crashes on Intel machines. Ignore the reads to K7 EVNTSEL MSRs as well to fix this on AMD hosts. This fixes Kaspersky antivirus crashing Windows guests on AMD hosts. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/15/2009 12:08 PM, Mark McLoughlin wrote: This last option makes sense to me: in a real world the user has control over where he places the device on the bus, so why not with qemu? Yep, most people seem to agree that it makes sense to allow this, but some believe it should only be via a machine description file, not the command line. I don't understand this opposition. It's clear a machine config file is a long way in our future. It's also clear lack of stable PCI addresses hurts us now. Is there opposition? I don't ever recall seeing a patch... I think it's a perfectly fine idea. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2803638 ] kvm-86: winxp guest + kvmnet.sys virtio unstable
Bugs item #2803638, was opened at 2009-06-09 19:45 Message generated for change (Comment added) made by thekozmo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2803638&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Geoffrey Brimhall (vinyvat) Assigned to: Nobody/Anonymous (nobody) Summary: kvm-86: winxp guest + kvmnet.sys virtio unstable Initial Comment: * cpu model: Core i7 920 * kvm version: kvm 86 * host kernel version: 2.6.29.4 debian lenny * host kernel arch: x86_64 * guest: winxp 32bit sp3 * qemu command line: qemu-system-x86_64 -boot c -hda ./winxp.qcow2 -cdrom winxp-sp3.iso -smp 2 -m 1024 -vga std -net nic,model=virtio,vlan=0,macaddr=00:15:60:51:09:BD -net tap,vlan=0,ifname=tap0,script=/etc/kvm/kvm-ifup -soundhw es1370 -localtime -k en-us -name -usb When using kvmnet.sys in guest winxp32, guest crashes ( blue-screen with IRQ_NOT_LESS_OR_EQUAL most common ) on complex web2.0 browser pages ( lots of ajax and form fields ). Switching back to default rtl8139 driver ( ie remove model=virtio from command line options ) gets system back to being very stable. Reproduced crash using all permutations of kvm-85 and 2.6.26 kernel also. -- Comment By: Dor Laor (thekozmo) Date: 2009-06-15 15:43 Message: Once we'll get authorization to release new windows drivers it will get solved. -- Comment By: Geoffrey Brimhall (vinyvat) Date: 2009-06-09 20:32 Message: Also have 6GB ram on system. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2803638&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: Configuration vs. compat hints
Avi Kivity writes: > (adding cc) > > On 06/15/2009 02:35 PM, Markus Armbruster wrote: >> Not really. QEMU gives just the host bridge a fixed slot[*]. All the >> other slots are available. >> > > qemu needs to export these two bits of information: the first free > slot and the number of slots. > > More generally, which slots are open. We can assume 1:31, but that's > unlovely. Point. >> The real problem is devices that get implicitly added, like the SCSI >> controller. Those devices get their slots auto-assigned, which can >> interfere with slot numbers chosen by the user. We need a way to avoid >> that, as you suggested elsewhere in this thread. >> > > Paul/Anthony, can we have -vga pci_addr=, -usb-controller pci_addr=, > and -drive pci_addr= (and later, -disk-controller)? Stalling while > waiting for the ultimate config file is only generating pain and > out-of-tree patches. Yup. I got bit-rotten patches for pci_addr=, and I can unrot them if they're wanted. > (I'd be quite happy constructing the entire machine config on the > command line, but I realize it's just me) Ha, .bash_history as config file... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 02:56:42PM +0300, Avi Kivity wrote: > On 06/15/2009 02:48 PM, Michael S. Tsirkin wrote: >>> A slot needs to be configured in ACPI, >>> >> >> Can we configure all possible 32 slots? >> > > That's what we do. But one is always taken. In the future, perhaps more. > >>> and not be taken by onboard chips >>> (piix takes slot 0, for example). >>> >> >> piix is the root complex, isn't it? Are there other examples? If not, >> we could teach management about the root complex being special ... >> > > We should just tell the user which slots are open. This might be tricky if the config is passed in with the command line flags. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/14/2009 12:50 PM, Michael S. Tsirkin wrote: On Fri, Jun 12, 2009 at 05:48:23PM +0100, Mark McLoughlin wrote: However, in order to retain compat for that SCSI device (e.g. ensuring the PCI address doesn't change as other devices are added an removed), we're back to the same problem ... either: 1) Use '-drive file=foo.img,if=scsi,pci_addr=foo'; in order to figure out what address to use, libvirt would need to query qemu for what address was originally allocated to device or it would do all the PCI address allocation itself ... This last option makes sense to me: in a real world the user has control over where he places the device on the bus, so why not with qemu? Yes, the user build the machine using the command line and monitor (or, in 2017, the machine configuration file), Considering pbrook just posted a machine config for arm, I think it would be rather sad if pc wasn't converted to it by 2017... then turns on the power. Command line options are the parts lying around when we start. btw, -drive needs to be separated: -controller type=lsi1234,pci_addr=foobar,name=blah -drive file=foo.img,controller=blah,index=0 -drive file=bar.img,controller=blah,index=1 Drives to not have pci addresses. Drivers don't have indexes and buses but we specify it on the -drive line. -drive is convenient syntax. It stops being convenient when you force it to be two options. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/15/2009 12:09 PM, Mark McLoughlin wrote: I think the point is that you don't need version numbers if you have a proper device tree. How do you add a new attribute to the device tree and, when a supplied device tree lacking said attribute, distinguish between a device tree from an old version of qemu (i.e. use the old default) and a partial device tree from the VM manager (i.e. use the new default) ? -baseline 0.10 That's a version number :-) (I was responding to Anthony's "you don't need a version number") If you want to prevent incompatibilities, you need to make everything new (potentially including bugfixes) non-default. Eventually the default configuration becomes increasingly unusable and you need a new baseline. You must still be able to fall back to the old baseline for older guests. I don't think games with configuration files can hide that. -M pc1 -M pc2 etc. This is pretty easy to maintain with config files. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: Configuration vs. compat hints
Markus Armbruster wrote: Avi Kivity writes: Paul/Anthony, can we have -vga pci_addr=, -usb-controller pci_addr=, and -drive pci_addr= (and later, -disk-controller)? Stalling while waiting for the ultimate config file is only generating pain and out-of-tree patches. Yup. I got bit-rotten patches for pci_addr=, and I can unrot them if they're wanted. Yes, would be good to have patches on the list to discuss. In principle, I have no objection to this. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/15/2009 03:41 PM, Michael S. Tsirkin wrote: We should just tell the user which slots are open. This might be tricky if the config is passed in with the command line flags. qemu -show-available-pci-slots Why does the user care? Let QEMU allocate the PCI slot, then query it to see what slot it assigned and remember that. It's not a good idea to have management applications attempt to do PCI slot allocation. For instance, one day we may decide to make virtio devices multi-function. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 03:41 PM, Michael S. Tsirkin wrote: We should just tell the user which slots are open. This might be tricky if the config is passed in with the command line flags. qemu -show-available-pci-slots (the qemu equivalent to KVM_CHECK_EXTENSION) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl
On Mon, Jun 15, 2009 at 08:08:18AM -0400, Gregory Haskins wrote: > >> @@ -123,6 +124,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int > >> sync, void > >> *key) > >> > >> cleanup_srcu_struct(&irqfd->srcu); > >> kfree(irqfd); > >> + module_put(THIS_MODULE); > >> } > >> > >> return 0; > >> > > > > module_put(THIS_MODULE) is always a bug unless you know that someone has > > a reference to the current module: the module could go away between this > > call and returning from function. > > > > Hmm. I understand what you are saying conceptually (i.e. the .text > could get yanked before we hit the next line of code, in this case the > "return 0"). However, holding a reference when you _know_ someone else > holds a reference to me says that one of the references is redundant. > In addition, there is certainly plenty of precedence for > module_put(THIS_MODULE) all throughout the kernel (including > module_put_and_exit()). Are those broken as well? Maybe not, but I don't know why. It works fine as long as you don't unload any modules though :) Rusty, could you enlighten us please? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 03:41 PM, Anthony Liguori wrote: Yep, most people seem to agree that it makes sense to allow this, but some believe it should only be via a machine description file, not the command line. I don't understand this opposition. It's clear a machine config file is a long way in our future. It's also clear lack of stable PCI addresses hurts us now. Is there opposition? I don't ever recall seeing a patch... Izik Eidus posted a patch (using a different syntax) in November 2007. I think it's a perfectly fine idea. Good. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 03:45 PM, Anthony Liguori wrote: This last option makes sense to me: in a real world the user has control over where he places the device on the bus, so why not with qemu? Yes, the user build the machine using the command line and monitor (or, in 2017, the machine configuration file), Considering pbrook just posted a machine config for arm, I think it would be rather sad if pc wasn't converted to it by 2017... I'd be sad too, but not surprised. then turns on the power. Command line options are the parts lying around when we start. btw, -drive needs to be separated: -controller type=lsi1234,pci_addr=foobar,name=blah -drive file=foo.img,controller=blah,index=0 -drive file=bar.img,controller=blah,index=1 Drives to not have pci addresses. Drivers don't have indexes and buses but we specify it on the -drive line. Drives do have indexes. On old parallel scsi drives you set the index by clicking a button on the back of the drive to cycle through scsi addresses 0-7. An IDE drive's index is determined by the cable (master/slave). A SATA drive's index is determined by which header on the motherboard the drive connects to. If by bus you mean the if= parameter, then drives certainly do have buses. Just try connecting the scsi drive from the previous paragraph to a USB port. -drive is convenient syntax. It stops being convenient when you force it to be two options. controller= defaults to some builtin thing which autoinstantiates when necessary, so the old -drive syntax works. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints
Anthony Liguori writes: > Avi Kivity wrote: >> On 06/15/2009 12:08 PM, Mark McLoughlin wrote: This last option makes sense to me: in a real world the user has control over where he places the device on the bus, so why not with qemu? >>> >>> Yep, most people seem to agree that it makes sense to allow this, but >>> some believe it should only be via a machine description file, not the >>> command line. >>> >> >> I don't understand this opposition. It's clear a machine config >> file is a long way in our future. It's also clear lack of stable >> PCI addresses hurts us now. > > Is there opposition? I don't ever recall seeing a patch... http://www.archivum.info/qemu-de...@nongnu.org/2009-01/msg01458.html > I think it's a perfectly fine idea. Off to dust off my patch series. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 03:52 PM, Anthony Liguori wrote: Avi Kivity wrote: On 06/15/2009 03:41 PM, Michael S. Tsirkin wrote: We should just tell the user which slots are open. This might be tricky if the config is passed in with the command line flags. qemu -show-available-pci-slots Why does the user care? Let QEMU allocate the PCI slot, then query it to see what slot it assigned and remember that. It's a roundabout way of doing things. As an example, if you try to fit too many devices into the machine, you have to try to add all devices and watch for a qemu error. If you know in advance how many slots you have, you never enter into that situation (and you need to show the limit to the user anyway). It's not a good idea to have management applications attempt to do PCI slot allocation. For instance, one day we may decide to make virtio devices multi-function. Non-virtio, as well. But we can't make that the default, so the user will have to specify this anyway. Given that you can't hotunplug individual functions, the user will have to specify exactly how functions are aggregated into devices. My recommendation would be for a GUI to allow the user to select a 'quad port virtio NIC' or 'dual port virtio scsi controller' rather than trying to do anything automatic. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 03:48 PM, Anthony Liguori wrote: device tree lacking said attribute, distinguish between a device tree from an old version of qemu (i.e. use the old default) and a partial device tree from the VM manager (i.e. use the new default) ? -baseline 0.10 That's a version number :-) (I was responding to Anthony's "you don't need a version number") If you want to prevent incompatibilities, you need to make everything new (potentially including bugfixes) non-default. Eventually the default configuration becomes increasingly unusable and you need a new baseline. You must still be able to fall back to the old baseline for older guests. I don't think games with configuration files can hide that. How do you add a new attribute to the device tree and, when a supplied -M pc1 -M pc2 Certainly preferable to -baseline. This is pretty easy to maintain with config files. Let's not tie the two together. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Hi, Yes, the user build the machine using the command line and monitor (or, in 2017, the machine configuration file), Considering pbrook just posted a machine config for arm, I think it would be rather sad if pc wasn't converted to it by 2017... It shouldn't last until 2017, but the process isn't that trivial. Some qemu code / control flow needs serious restruction until it is in a state that creating the devices from a fdt can actually work. Drivers don't have indexes and buses but we specify it on the -drive line. -drive is convenient syntax. It stops being convenient when you force it to be two options. One more issue: -drive also mixes host and guest configuration. These must be separated too. cheers, Gerd -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] Add rudimentary Hyper-V guest support v3
Now that we have nested SVM in place, let's make use of it and virtualize something non-kvm. The first interesting target that came to my mind here was Hyper-V. This patchset makes Windows Server 2008 boot with Hyper-V, which runs the "dom0" in virtualized mode already. It hangs somewhere in IDE code when booted, so I haven't been able to run a second VM within for now yet. Please keep in mind that Hyper-V won't work unless you apply the userspace patches too and the PAT bit patch --- v2 changes: - remove reserved PAT check patch (Avi will do this) - remove #PF inject on emulated_read - take comments from v1 into account (listed individually) v3 changes: - forward-port to current git Alexander Graf (4): Add definition for IGNNE MSR Implement Hyper-V MSRs v2 Nested SVM: Implement INVLPGA v2 Nested SVM: Improve interrupt injection v2 arch/x86/include/asm/msr-index.h |1 + arch/x86/kvm/svm.c | 59 +++-- 2 files changed, 44 insertions(+), 16 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] Add definition for IGNNE MSR
Hyper-V tried to access MSR_IGNNE, so let's at least have a definition for it in our headers. Signed-off-by: Alexander Graf --- arch/x86/include/asm/msr-index.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index ec41fc1..e273549 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -372,6 +372,7 @@ /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 +#define MSR_VM_IGNNE0xc0010115 #define MSR_VM_HSAVE_PA 0xc0010117 #endif /* _ASM_X86_MSR_INDEX_H */ -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/15/2009 03:45 PM, Anthony Liguori wrote: This last option makes sense to me: in a real world the user has control over where he places the device on the bus, so why not with qemu? Yes, the user build the machine using the command line and monitor (or, in 2017, the machine configuration file), Considering pbrook just posted a machine config for arm, I think it would be rather sad if pc wasn't converted to it by 2017... I'd be sad too, but not surprised. then turns on the power. Command line options are the parts lying around when we start. btw, -drive needs to be separated: -controller type=lsi1234,pci_addr=foobar,name=blah -drive file=foo.img,controller=blah,index=0 -drive file=bar.img,controller=blah,index=1 Drives to not have pci addresses. Drivers don't have indexes and buses but we specify it on the -drive line. Drives do have indexes. On old parallel scsi drives you set the index by clicking a button on the back of the drive to cycle through scsi addresses 0-7. An IDE drive's index is determined by the cable (master/slave). A SATA drive's index is determined by which header on the motherboard the drive connects to. It's not at all that simple. SCSI has a hierarchical address mechanism with 0-7 targets but then potentially multiple LUNs per target. Today, we always emulate a single LUN per target but if we ever wanted to support more than 7 disks on a SCSI controller, we would have to add multiple LUN support too. So the current linear unit= parameter is actually pretty broken for SCSI. For IDE, it's a combination of bus, slot, and master/slave. For virtio, it's just a PCI address. What we really need is something that is more opaque and controller specific. For instance, if we were going to do controllers... -controller type=lsi1234,pci_addr=foobar,name=blah -controller-disk controller=blah,target=0,lun=1,name=sda -controller type=ide,pci_addr=barfoo,name=ide -controller-disk controller=ide,slot=secondary,cable=slave,name=hdd -drive file=foo.img,controller-disk=sda -drive file=bar.img,controller-disk=hdd And having "-hdd file=foo.img" be short-hand for "-drive file=%s,controller-disk=%s". If by bus you mean the if= parameter, then drives certainly do have buses. Just try connecting the scsi drive from the previous paragraph to a USB port. No, I meant drive file=foo.img,bus=3. If that doesn't seem obvious what it should do to you that's because it isn't at all obvious :-) It ends up skipping a predefined number of locations in the drive table. This is pretty broken fundamentally because it assumes controllers always support a fixed number of devices. Nothing really respects bus_id though so in practice, I assume it's almost universally broken. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] Implement Hyper-V MSRs
Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage. But let's be nice today and have it its way, because otherwise it fails terribly. v2 changes: - remove the 0x4081 MSR definition - add pr_unimpl() on unimplemented writes Signed-off-by: Alexander Graf --- arch/x86/kvm/svm.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ef43a18..4b4eadd 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2034,6 +2034,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) case MSR_VM_HSAVE_PA: svm->hsave_msr = data; break; + case MSR_VM_CR: + case MSR_VM_IGNNE: + case MSR_K8_HWCR: + pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data); + break; default: return kvm_set_msr_common(vcpu, ecx, data); } -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] Nested SVM: Implement INVLPGA
SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it! For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though. v2 makes invlpga do the same as invlpg, not flush the whole mmu Signed-off-by: Alexander Graf --- arch/x86/kvm/svm.c | 15 ++- 1 files changed, 14 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 4b4eadd..fa2a710 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1785,6 +1785,19 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) return 1; } +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) +{ + struct kvm_vcpu *vcpu = &svm->vcpu; + nsvm_printk("INVLPGA\n"); + + /* Let's treat INVLPGA the same as INVLPG (can be optimized!) */ + kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]); + + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; + skip_emulated_instruction(&svm->vcpu); + return 1; +} + static int invalid_op_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) { @@ -2130,7 +2143,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, [SVM_EXIT_INVD] = emulate_on_interception, [SVM_EXIT_HLT] = halt_interception, [SVM_EXIT_INVLPG] = invlpg_interception, - [SVM_EXIT_INVLPGA] = invalid_op_interception, + [SVM_EXIT_INVLPGA] = invlpga_interception, [SVM_EXIT_IOIO] = io_interception, [SVM_EXIT_MSR] = msr_interception, [SVM_EXIT_TASK_SWITCH] = task_switch_interception, -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] Nested SVM: Improve interrupt injection
While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct. This patch makes nested SVM's interrupt injection behave more like on a real machine. v2 calls BUG_ON when svm_set_irq is called with GIF=0 Signed-off-by: Alexander Graf --- arch/x86/kvm/svm.c | 39 --- 1 files changed, 24 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index fa2a710..5b14c9d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1, /* Kill any pending exceptions */ if (svm->vcpu.arch.exception.pending == true) nsvm_printk("WARNING: Pending Exception\n"); - svm->vcpu.arch.exception.pending = false; + kvm_clear_exception_queue(&svm->vcpu); + kvm_clear_interrupt_queue(&svm->vcpu); /* Restore selected save entries */ svm->vmcb->save.es = hsave->save.es; @@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1, svm->nested_vmcb = svm->vmcb->save.rax; /* Clear internal status */ - svm->vcpu.arch.exception.pending = false; + kvm_clear_exception_queue(&svm->vcpu); + kvm_clear_interrupt_queue(&svm->vcpu); /* Save the old vmcb, so we don't need to pick what we save, but can restore everything when a VMEXIT occurs */ @@ -2277,21 +2279,14 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq) ((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT); } -static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr) -{ - struct vcpu_svm *svm = to_svm(vcpu); - - svm->vmcb->control.event_inj = nr | - SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR; -} - static void svm_set_irq(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); - nested_svm_intr(svm); + BUG_ON(!(svm->vcpu.arch.hflags & HF_GIF_MASK)); - svm_queue_irq(vcpu, vcpu->arch.interrupt.nr); + svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr | + SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR; } static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) @@ -2319,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu) struct vmcb *vmcb = svm->vmcb; return (vmcb->save.rflags & X86_EFLAGS_IF) && !(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) && - (svm->vcpu.arch.hflags & HF_GIF_MASK); + (svm->vcpu.arch.hflags & HF_GIF_MASK) && + !is_nested(svm); } static void enable_irq_window(struct kvm_vcpu *vcpu) { - svm_set_vintr(to_svm(vcpu)); - svm_inject_irq(to_svm(vcpu), 0x0); + struct vcpu_svm *svm = to_svm(vcpu); + nsvm_printk("Trying to open IRQ window\n"); + + nested_svm_intr(svm); + + /* In case GIF=0 we can't rely on the CPU to tell us when +* GIF becomes 1, because that's a separate STGI/VMRUN intercept. +* The next time we get that intercept, this function will be +* called again though and we'll get the vintr intercept. */ + if (svm->vcpu.arch.hflags & HF_GIF_MASK) { + svm_set_vintr(svm); + svm_inject_irq(svm, 0x0); + } } static void enable_nmi_window(struct kvm_vcpu *vcpu) @@ -2393,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm) case SVM_EXITINTINFO_TYPE_EXEPT: /* In case of software exception do not reinject an exception vector, but re-execute and instruction instead */ + if (is_nested(svm)) + break; if (kvm_exception_is_soft(vector)) break; if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) { -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/8] Do not use env->halted to decide where halted state should be handled.
Use kvm_irqchip_in_kernel() for that. If irq chip is not handled by userspace kernel should be entered even when CPU is halted. Signed-off-by: Gleb Natapov --- hw/apic.c |3 +-- qemu-kvm.c |6 +++--- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index c5d97b2..f186202 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -467,8 +467,7 @@ static void apic_init_ipi(APICState *s) cpu_reset(s->cpu_env); -if (!(s->apicbase & MSR_IA32_APICBASE_BSP) && -(!kvm_enabled() || !qemu_kvm_irqchip_in_kernel())) +if (!(s->apicbase & MSR_IA32_APICBASE_BSP)) s->cpu_env->halted = 1; if (kvm_enabled() && !qemu_kvm_irqchip_in_kernel()) diff --git a/qemu-kvm.c b/qemu-kvm.c index ec911ef..7676e02 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -217,6 +217,8 @@ static int has_work(CPUState *env) { if (!vm_running || (env && env->kvm_cpu_state.stopped)) return 0; +if (kvm_irqchip_in_kernel(kvm_context)) +return 1; if (!env->halted) return 1; return kvm_arch_has_work(env); @@ -390,8 +392,6 @@ static int kvm_main_loop_cpu(CPUState *env) setup_kernel_sigmask(env); pthread_mutex_lock(&qemu_mutex); -if (kvm_irqchip_in_kernel(kvm_context)) - env->halted = 0; kvm_qemu_init_env(env); #ifdef TARGET_I386 @@ -412,7 +412,7 @@ static int kvm_main_loop_cpu(CPUState *env) if (env->kvm_cpu_state.sipi_needed) update_regs_for_sipi(env); } - if (!env->halted) + if (!env->halted || kvm_irqchip_in_kernel(kvm_context)) kvm_cpu_exec(env); env->exit_request = 0; env->exception_index = EXCP_INTERRUPT; -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/8] Small cpu loop cleanups
Also fix "info cpus" to show correct halt status with in kernel irq chip. I removed patch that uses on_vcpu() for init/sipi handling for now. Gleb Natapov (8): env->kvm_cpu_state.init is always zero here. Do not use env->halted to decide where halted state should be handled. Call kvm_arch_load_regs() instead of kvm_load_registers() Rename kvm_(load|save)_mpstate to kvm_arch_(load|save)_mpstate Retrieve mp state info in cpu_synchronize_state() env->exception_index is not used by kvm code. s->cpu_env cannot be zero here. env->exit_request is not used by kvm. hw/apic.c |6 ++ qemu-kvm-ia64.c |6 ++ qemu-kvm-x86.c|9 +++-- qemu-kvm.c| 39 +++ qemu-kvm.h|4 target-i386/machine.c |6 +++--- target-ia64/machine.c |4 ++-- vl.c |1 - 8 files changed, 47 insertions(+), 28 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/8] Rename kvm_(load|save)_mpstate to kvm_arch_(load|save)_mpstate
To be consistent with other function naming. Signed-off-by: Gleb Natapov --- qemu-kvm-ia64.c |4 ++-- qemu-kvm-x86.c|4 ++-- qemu-kvm.h|2 ++ target-i386/machine.c |6 +++--- target-ia64/machine.c |4 ++-- 5 files changed, 11 insertions(+), 9 deletions(-) diff --git a/qemu-kvm-ia64.c b/qemu-kvm-ia64.c index d33c1c3..234602c 100644 --- a/qemu-kvm-ia64.c +++ b/qemu-kvm-ia64.c @@ -98,7 +98,7 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg) { } -void kvm_save_mpstate(CPUState *env) +void kvm_arch_save_mpstate(CPUState *env) { #ifdef KVM_CAP_MP_STATE int r; @@ -112,7 +112,7 @@ void kvm_save_mpstate(CPUState *env) #endif } -void kvm_load_mpstate(CPUState *env) +void kvm_arch_load_mpstate(CPUState *env) { #ifdef KVM_CAP_MP_STATE struct kvm_mp_state mp_state = { .mp_state = env->mp_state }; diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 729d600..8e6fb75 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -295,7 +295,7 @@ void kvm_load_tsc(CPUState *env) perror("kvm_set_tsc FAILED.\n"); } -void kvm_save_mpstate(CPUState *env) +void kvm_arch_save_mpstate(CPUState *env) { #ifdef KVM_CAP_MP_STATE int r; @@ -309,7 +309,7 @@ void kvm_save_mpstate(CPUState *env) #endif } -void kvm_load_mpstate(CPUState *env) +void kvm_arch_load_mpstate(CPUState *env) { #ifdef KVM_CAP_MP_STATE struct kvm_mp_state mp_state = { .mp_state = env->mp_state }; diff --git a/qemu-kvm.h b/qemu-kvm.h index fa40542..3b73fe9 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -69,6 +69,8 @@ int kvm_arch_qemu_create_context(void); void kvm_arch_save_regs(CPUState *env); void kvm_arch_load_regs(CPUState *env); +void kvm_arch_load_mpstate(CPUState *env); +void kvm_arch_save_mpstate(CPUState *env); int kvm_arch_qemu_init_env(CPUState *cenv); void kvm_arch_pre_kvm_run(void *opaque, CPUState *env); void kvm_arch_post_kvm_run(void *opaque, CPUState *env); diff --git a/target-i386/machine.c b/target-i386/machine.c index 07df1e1..14942c0 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -34,7 +34,7 @@ void cpu_save(QEMUFile *f, void *opaque) if (kvm_enabled()) { kvm_save_registers(env); -kvm_save_mpstate(env); +kvm_arch_save_mpstate(env); } for(i = 0; i < CPU_NB_REGS; i++) @@ -369,12 +369,12 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id) kvm_load_tsc(env); if (version_id >= 5) { qemu_get_be32s(f, &env->mp_state); -kvm_load_mpstate(env); +kvm_arch_load_mpstate(env); } } else { kvm_load_registers(env); kvm_load_tsc(env); -kvm_load_mpstate(env); +kvm_arch_load_mpstate(env); } } return 0; diff --git a/target-ia64/machine.c b/target-ia64/machine.c index dd205c5..70ef379 100644 --- a/target-ia64/machine.c +++ b/target-ia64/machine.c @@ -10,7 +10,7 @@ void cpu_save(QEMUFile *f, void *opaque) if (kvm_enabled()) { kvm_save_registers(env); -kvm_save_mpstate(env); +kvm_arch_save_mpstate(env); } } @@ -20,7 +20,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id) if (kvm_enabled()) { kvm_load_registers(env); -kvm_load_mpstate(env); +kvm_arch_load_mpstate(env); } return 0; } -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/8] Call kvm_arch_load_regs() instead of kvm_load_registers()
The call is done from vcpu thread. Signed-off-by: Gleb Natapov --- qemu-kvm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 7676e02..5fa7154 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -399,7 +399,7 @@ static int kvm_main_loop_cpu(CPUState *env) #endif cpu_single_env = env; -kvm_load_registers(env); +kvm_arch_load_regs(env); while (1) { while (!has_work(env)) -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/8] env->kvm_cpu_state.init is always zero here.
Signed-off-by: Gleb Natapov --- qemu-kvm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 2aeb17c..ec911ef 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -412,7 +412,7 @@ static int kvm_main_loop_cpu(CPUState *env) if (env->kvm_cpu_state.sipi_needed) update_regs_for_sipi(env); } - if (!env->halted && !env->kvm_cpu_state.init) + if (!env->halted) kvm_cpu_exec(env); env->exit_request = 0; env->exception_index = EXCP_INTERRUPT; -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 8/8] env->exit_request is not used by kvm.
Remove its use from kvm code. Signed-off-by: Gleb Natapov --- qemu-kvm-x86.c |3 +-- qemu-kvm.c |3 --- vl.c |1 - 3 files changed, 1 insertions(+), 6 deletions(-) diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 6865385..5460136 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -634,8 +634,7 @@ void kvm_arch_post_kvm_run(void *opaque, CPUState *env) int kvm_arch_has_work(CPUState *env) { -if (env->exit_request || -((env->interrupt_request & CPU_INTERRUPT_HARD) && +if (((env->interrupt_request & CPU_INTERRUPT_HARD) && (env->eflags & IF_MASK)) || (env->interrupt_request & CPU_INTERRUPT_NMI)) return 1; diff --git a/qemu-kvm.c b/qemu-kvm.c index 2930a1d..bbdb03a 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -168,8 +168,6 @@ static int kvm_pre_run(void *opaque, void *data) kvm_arch_pre_kvm_run(opaque, env); -if (env->exit_request) - return 1; pthread_mutex_unlock(&qemu_mutex); return 0; } @@ -441,7 +439,6 @@ static int kvm_main_loop_cpu(CPUState *env) } if (!env->halted || kvm_irqchip_in_kernel(kvm_context)) kvm_cpu_exec(env); - env->exit_request = 0; kvm_main_loop_wait(env, 0); } pthread_mutex_unlock(&qemu_mutex); diff --git a/vl.c b/vl.c index 845ed54..c08299c 100644 --- a/vl.c +++ b/vl.c @@ -3724,7 +3724,6 @@ void qemu_system_reset_request(void) } if (cpu_single_env) { qemu_kvm_cpu_stop(cpu_single_env); -cpu_exit(cpu_single_env); } qemu_notify_event(); } -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/8] Retrieve mp state info in cpu_synchronize_state()
And set env->halted based on the value to show accurate vcpu state in QEMU monitor. Signed-off-by: Gleb Natapov --- qemu-kvm.c | 27 +++ qemu-kvm.h |2 ++ 2 files changed, 29 insertions(+), 0 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 5fa7154..3ae4b45 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -200,6 +200,33 @@ void kvm_save_registers(CPUState *env) on_vcpu(env, kvm_do_save_registers, env); } +static void kvm_do_load_mpstate(void *_env) +{ +CPUState *env = _env; + +kvm_arch_load_mpstate(env); +} + +void kvm_load_mpstate(CPUState *env) +{ +if (kvm_enabled() && qemu_system_ready) +on_vcpu(env, kvm_do_load_mpstate, env); +} + +static void kvm_do_save_mpstate(void *_env) +{ +CPUState *env = _env; + +kvm_arch_save_mpstate(env); +env->halted = (env->mp_state == KVM_MP_STATE_HALTED); +} + +void kvm_save_mpstate(CPUState *env) +{ +if (kvm_enabled()) +on_vcpu(env, kvm_do_save_mpstate, env); +} + int kvm_cpu_exec(CPUState *env) { int r; diff --git a/qemu-kvm.h b/qemu-kvm.h index 3b73fe9..22452e9 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -221,11 +221,13 @@ static inline int kvm_sync_vcpus(void) { return 0; } static inline void kvm_arch_get_registers(CPUState *env) { kvm_save_registers(env); +kvm_save_mpstate(env); } static inline void kvm_arch_put_registers(CPUState *env) { kvm_load_registers(env); +kvm_load_mpstate(env); } static inline void cpu_synchronize_state(CPUState *env, int modified) -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/8] env->exception_index is not used by kvm code.
Signed-off-by: Gleb Natapov --- qemu-kvm-ia64.c |2 -- qemu-kvm-x86.c |2 -- qemu-kvm.c |1 - 3 files changed, 0 insertions(+), 5 deletions(-) diff --git a/qemu-kvm-ia64.c b/qemu-kvm-ia64.c index 234602c..477d24c 100644 --- a/qemu-kvm-ia64.c +++ b/qemu-kvm-ia64.c @@ -35,7 +35,6 @@ int kvm_arch_halt(void *opaque, kvm_vcpu_context_t vcpu) { CPUState *env = cpu_single_env; env->hflags |= HF_HALTED_MASK; -env->exception_index = EXCP_HLT; return 1; } @@ -135,7 +134,6 @@ void kvm_arch_cpu_reset(CPUState *env) } else { env->interrupt_request &= ~CPU_INTERRUPT_HARD; env->halted = 1; - env->exception_index = EXCP_HLT; } } diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 8e6fb75..6865385 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -611,7 +611,6 @@ int kvm_arch_halt(void *opaque, kvm_vcpu_context_t vcpu) (env->eflags & IF_MASK)) && !(env->interrupt_request & CPU_INTERRUPT_NMI)) { env->halted = 1; - env->exception_index = EXCP_HLT; } return 1; } @@ -707,7 +706,6 @@ void kvm_arch_cpu_reset(CPUState *env) } else { env->interrupt_request &= ~CPU_INTERRUPT_HARD; env->halted = 1; - env->exception_index = EXCP_HLT; } } } diff --git a/qemu-kvm.c b/qemu-kvm.c index 3ae4b45..2930a1d 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -442,7 +442,6 @@ static int kvm_main_loop_cpu(CPUState *env) if (!env->halted || kvm_irqchip_in_kernel(kvm_context)) kvm_cpu_exec(env); env->exit_request = 0; -env->exception_index = EXCP_INTERRUPT; kvm_main_loop_wait(env, 0); } pthread_mutex_unlock(&qemu_mutex); -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/8] s->cpu_env cannot be zero here.
Remove redundant check. Signed-off-by: Gleb Natapov --- hw/apic.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index f186202..eac54fd 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -471,8 +471,7 @@ static void apic_init_ipi(APICState *s) s->cpu_env->halted = 1; if (kvm_enabled() && !qemu_kvm_irqchip_in_kernel()) - if (s->cpu_env) - kvm_apic_init(s->cpu_env); +kvm_apic_init(s->cpu_env); } /* send a SIPI message to the CPU to start it */ -- 1.6.2.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/15/2009 03:52 PM, Anthony Liguori wrote: Avi Kivity wrote: On 06/15/2009 03:41 PM, Michael S. Tsirkin wrote: We should just tell the user which slots are open. This might be tricky if the config is passed in with the command line flags. qemu -show-available-pci-slots Why does the user care? Let QEMU allocate the PCI slot, then query it to see what slot it assigned and remember that. It's a roundabout way of doing things. Having libvirt do PCI slot allocation scares me. It assumes we can return a whitelist of available slots, and then let libvirt just randomly assign things. There's knowledge though in slot assignment that's board-specific. For instance, depending on how many LNK lines you have, you may want to put things in slots in such a way to optimize interrupt balancing or something like that. Some platforms have quirks about expecting a particular slot to have a particular device. It's still an optimal device but it has to be in that slot. You can't really express that via an available slot list. Non-virtio, as well. But we can't make that the default, so the user will have to specify this anyway. Given that you can't hotunplug individual functions, the user will have to specify exactly how functions are aggregated into devices. My recommendation would be for a GUI to allow the user to select a 'quad port virtio NIC' or 'dual port virtio scsi controller' rather than trying to do anything automatic. Yeah, I haven't thought much about that. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: Certainly preferable to -baseline. This is pretty easy to maintain with config files. Let's not tie the two together. I mentioned it because it suggests a good transition. We at least have to think through how things map to the post-config file world regardless of whether that's a few months from now or a decade :-) Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 1/4] Make all programs on kvm test use /usr/bin/python
- "Martin Bligh" wrote: > On Wed, Jun 10, 2009 at 4:01 AM, Alexey Eromenko > wrote: > > > > Even better would be to use "/usr/bin/python2". > > That doesn't seem to exist, on Ubuntu at least. > Red Hat systems have it. "/usr/bin/python2" is a symlink to "/usr/bin/python" (which is python2 executable) Is there any Ubuntu-compatible way of achieving this? -Alexey -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 04:20 PM, Anthony Liguori wrote: then turns on the power. Command line options are the parts lying around when we start. btw, -drive needs to be separated: -controller type=lsi1234,pci_addr=foobar,name=blah -drive file=foo.img,controller=blah,index=0 -drive file=bar.img,controller=blah,index=1 Drives to not have pci addresses. Drivers don't have indexes and buses but we specify it on the -drive line. Drives do have indexes. On old parallel scsi drives you set the index by clicking a button on the back of the drive to cycle through scsi addresses 0-7. An IDE drive's index is determined by the cable (master/slave). A SATA drive's index is determined by which header on the motherboard the drive connects to. It's not at all that simple. SCSI has a hierarchical address mechanism with 0-7 targets but then potentially multiple LUNs per target. Today, we always emulate a single LUN per target but if we ever wanted to support more than 7 disks on a SCSI controller, we would have to add multiple LUN support too. So the current linear unit= parameter is actually pretty broken for SCSI. Well, another level in the hierarchy, but I don't think it materially changes things. For IDE, it's a combination of bus, slot, and master/slave. For virtio, it's just a PCI address. What we really need is something that is more opaque and controller specific. virtio also has a bus (did you mean the pci bus for IDE?), master/slave is the index. virtio doesn't have index, but IMO that was a mistake and we should have designed it as a disk controller in the first place. For instance, if we were going to do controllers... -controller type=lsi1234,pci_addr=foobar,name=blah -controller-disk controller=blah,target=0,lun=1,name=sda -controller type=ide,pci_addr=barfoo,name=ide -controller-disk controller=ide,slot=secondary,cable=slave,name=hdd -drive file=foo.img,controller-disk=sda -drive file=bar.img,controller-disk=hdd And having "-hdd file=foo.img" be short-hand for "-drive file=%s,controller-disk=%s". Yeah. If by bus you mean the if= parameter, then drives certainly do have buses. Just try connecting the scsi drive from the previous paragraph to a USB port. No, I meant drive file=foo.img,bus=3. If that doesn't seem obvious what it should do to you that's because it isn't at all obvious :-) It ends up skipping a predefined number of locations in the drive table. This is pretty broken fundamentally because it assumes controllers always support a fixed number of devices. Nothing really respects bus_id though so in practice, I assume it's almost universally broken. Isn't the drive table something totally internal? And how does bus= relate to it? Confused. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/15/2009 04:20 PM, Anthony Liguori wrote: It's not at all that simple. SCSI has a hierarchical address mechanism with 0-7 targets but then potentially multiple LUNs per target. Today, we always emulate a single LUN per target but if we ever wanted to support more than 7 disks on a SCSI controller, we would have to add multiple LUN support too. So the current linear unit= parameter is actually pretty broken for SCSI. Well, another level in the hierarchy, but I don't think it materially changes things. Depends on whether you expect to say index=0,lun=3 or index=3. If you mean the later, then it's quite conceivable that each target supports less than the maximum number of LUNs. This makes things pretty confusing to the user because they have to know that in the current implementation, index=0 is valid, index=1 isn't, but index=8 is. No, I meant drive file=foo.img,bus=3. If that doesn't seem obvious what it should do to you that's because it isn't at all obvious :-) It ends up skipping a predefined number of locations in the drive table. This is pretty broken fundamentally because it assumes controllers always support a fixed number of devices. Nothing really respects bus_id though so in practice, I assume it's almost universally broken. Isn't the drive table something totally internal? And how does bus= relate to it? The reality of unit=X,bus=Y,if=Z is that they expand to: drive_table_index=Y*max_devs[Z] + X Whereas max_devs = {"ide":4, "scsi": 7, *:0} How drive_table_index is interpreted is "if" specific. For if=scsi, each lsi device gets a base drive table index that starts at bus_index * 7. For virtio, the first empty spot in drive_table results in no more drives being created. It's broken by design. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 04:23 PM, Anthony Liguori wrote: Avi Kivity wrote: On 06/15/2009 03:52 PM, Anthony Liguori wrote: Avi Kivity wrote: On 06/15/2009 03:41 PM, Michael S. Tsirkin wrote: We should just tell the user which slots are open. This might be tricky if the config is passed in with the command line flags. qemu -show-available-pci-slots Why does the user care? Let QEMU allocate the PCI slot, then query it to see what slot it assigned and remember that. It's a roundabout way of doing things. Having libvirt do PCI slot allocation scares me. It assumes we can return a whitelist of available slots, and then let libvirt just randomly assign things. There's knowledge though in slot assignment that's board-specific. For instance, depending on how many LNK lines you have, you may want to put things in slots in such a way to optimize interrupt balancing or something like that. How would qemu know which slots to optimize for? In practice, I don't see that as a real problem. We should (a) add an ioapic and four more pci links (b) recommend that slots be assigned in ascending order, and everything works. I don't see your concern about libvirt allocating slots. If a human can plug a card into a slot, so can libvirt. Doing an interactive back-and-forth (equivalent to plugging a card while blindfolded, then looking to see which slot we hit) is certainly more difficult. Some platforms have quirks about expecting a particular slot to have a particular device. It's still an optimal device but it has to be in that slot. You can't really express that via an available slot list. I'll be surprised if we ever measure different dma speeds on different slots in the qemu virtual pci bus. If we do, we'll find a way to express them: $ qemu -print-pci slot 0:01: available 33MHz slot 0:02: available 33MHz slot 0:03: available 66MHz I feel a little silly typing this. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 04:24 PM, Anthony Liguori wrote: Avi Kivity wrote: Certainly preferable to -baseline. This is pretty easy to maintain with config files. Let's not tie the two together. I mentioned it because it suggests a good transition. We at least have to think through how things map to the post-config file world regardless of whether that's a few months from now or a decade :-) Sure, it's good both from the transitional point of view and in its own right. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] kvm-s390: streamline memslot handling - rebased
Avi Kivity wrote: Marcelo Tosatti wrote: (continued below) Anyway, yeah, the set request / wait mechanism you implement here is quite similar to the idea mentioned earlier that could be used for x86. Just get rid of this explicit KVM_REQ_MMU_RELOAD knowledge in arch-independent code please (if you want to see this merged). I agree to lift the wait part to other archs later if needed, but as mentioned above I could move this to arch code to the cost of one arch hook more. But as also mentioned it doesn't really hurt. I agree that it does not need to be KVM_REQ_MMU_RELOAD specific, we could just walk/clear/wake all bits on that vcpu->requests variable. Would that be generic enough in your opinion ? Don't know. Avi? I think I lost the thread here, but I'll try. Isn't the wake part make_all_vcpus_request() in kvm_main.c? The wait part could be moved to a similar generic function. I'll try to summarize my current thoughts a bit: The rebased patch series brings several fixes and the wait/wakeup mechanism which is in discussion here. As explained before this keeps the new wait implementation in s390 arch code which allows us to experiment with it. Later if we are happy with it we might (or not) continue the merge and bring this mechanism to make_all_vcpus_request (as on x86 you don't have the issues I try to fix here we don't need to hurry bringing that into generic code). Well now to the wait/wakeup which is here in discussion in detail: The s390 arch code can kick a guest, but we don't know implicitly (as x86 does) that the kick succeeded, it might happen somewhen sooner or later. Therefore the code uses wait_on_bit to wait until the vcpu->request bit is consumed. To ensure cleanup of these waiting threads in some special cases the clear&wake up is also needed at other places than the real bit consumption. One of them is the vcpu release code where we should clear&wakeup all waiters (Marcelo correctly pointed out that we should not be bit specific there, so I just just wake up all in the updated code). That was the discussion here: "if it would be ok to clear & wake up all". As wake_up_bit doesn't hurt if there is no waiter it looks like the best solution to to do that in the generic part of vcpu_release. If ever someone else waits for this or another bit in vcpu->requests, the code ensures all of them are awaken on vcpu release. I send an updated version of the rebased series in a few minutes, containing updates related to what marcelo pointed out. P.S. in case you think we need much more discussions we might try to catch up on irc to save this thread a few cycles :-) -- GrĂ¼sse / regards, Christian Ehrhardt IBM Linux Technology Center, Open Virtualization -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] kvm-s390: streamline memslot handling - rebased v2
From: Christian Ehrhardt As requested this is a rebased patch on top of the already applied v3 of the patch series. *updates to applied version* - remove dependency to KVM_REQ_MMU_RELOAD in generic code - remove explicit barrier after test_and_clear_bit as it is implied - ensure the wait_on_bit waiter is notified - ensure dropping vcpu all requests while freeing a vcpu - kickout only scheduled vcpus (its superfluous and wait might hang forever on not running vcpus) - kvm_arch_set_memory_region waits until the bit is consumed by the vcpu This patch relocates the variables kvm-s390 uses to track guest mem addr/size. As discussed dropping the variables at struct kvm_arch level allows to use the common vcpu->request based mechanism to reload guest memory if e.g. changes via set_memory_region. The kick mechanism introduced in this series is used to ensure running vcpus leave guest state to catch the update. Signed-off-by: Christian Ehrhardt --- [diffstat] arch/s390/kvm/kvm-s390.c | 27 --- arch/s390/kvm/kvm-s390.h |6 ++ virt/kvm/kvm_main.c |6 ++ 3 files changed, 32 insertions(+), 7 deletions(-) Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -674,6 +674,12 @@ long kvm_arch_vcpu_ioctl(struct file *fi return -EINVAL; } +static int wait_bit_schedule(void *word) +{ + schedule(); + return 0; +} + /* Section: memory related */ int kvm_arch_set_memory_region(struct kvm *kvm, struct kvm_userspace_memory_region *mem, @@ -681,6 +687,7 @@ int kvm_arch_set_memory_region(struct kv int user_alloc) { int i; + struct kvm_vcpu *vcpu; /* A few sanity checks. We can have exactly one memory slot which has to start at guest virtual zero and which has to be located at a @@ -706,13 +713,19 @@ int kvm_arch_set_memory_region(struct kv /* request update of sie control block for all available vcpus */ for (i = 0; i < KVM_MAX_VCPUS; ++i) { - if (kvm->vcpus[i]) { - if (test_and_set_bit(KVM_REQ_MMU_RELOAD, - &kvm->vcpus[i]->requests)) - continue; - kvm_s390_inject_sigp_stop(kvm->vcpus[i], - ACTION_VCPUREQUEST_ON_STOP); - } + vcpu = kvm->vcpus[i]; + if (!vcpu) + continue; + + if (!test_and_set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) + continue; + + if (vcpu->cpu == -1) + continue; + + kvm_s390_inject_sigp_stop(vcpu, ACTION_VCPUREQUEST_ON_STOP); + wait_on_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD, + wait_bit_schedule, TASK_UNINTERRUPTIBLE); } return 0; Index: kvm/arch/s390/kvm/kvm-s390.h === --- kvm.orig/arch/s390/kvm/kvm-s390.h +++ kvm/arch/s390/kvm/kvm-s390.h @@ -92,6 +92,12 @@ static inline unsigned long kvm_s390_han if (!vcpu->requests) return 0; + /* requests that can be handled at all levels */ + if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) { + wake_up_bit(&vcpu->requests, KVM_REQ_MMU_RELOAD); + kvm_s390_vcpu_set_mem(vcpu); + } + return vcpu->requests; } Index: kvm/virt/kvm/kvm_main.c === --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -1681,6 +1681,12 @@ static int kvm_vcpu_mmap(struct file *fi static int kvm_vcpu_release(struct inode *inode, struct file *filp) { struct kvm_vcpu *vcpu = filp->private_data; + int i; + + vcpu->requests = 0; + smp_mb(); + for (i = 0; i < sizeof(vcpu->requests); i++) + wake_up_bit(&vcpu->requests, i); kvm_put_kvm(vcpu->kvm); return 0; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] kvm-s390: update vcpu->cpu - rebased
From: Christian Ehrhardt As requested this is a rebased patch on top of the already applied v3 of the patch series. kvm on s390 formerly ignored vcpu->cpu. This patch adds set/unset vcpu->cpu in kvm_arch_vcpu_load/put to allow further architecture unification e.g. let generic code not find -1 on currently scheduled vcpus. Signed-off-by: Christian Ehrhardt --- [diffstat] kvm-s390.c |2 ++ 1 file changed, 2 insertions(+) [diff] Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -244,6 +244,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcp void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { + vcpu->cpu = cpu; save_fp_regs(&vcpu->arch.host_fpregs); save_access_regs(vcpu->arch.host_acrs); vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK; @@ -253,6 +254,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) { + vcpu->cpu = -1; save_fp_regs(&vcpu->arch.guest_fpregs); save_access_regs(vcpu->arch.guest_acrs); restore_fp_regs(&vcpu->arch.host_fpregs); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state - rebased
From: Christian Ehrhardt As requested this is a rebased patch on top of the already applied v3 of the patch series. *updates to already applied version* - ensure allocations (might_sleep) are out of atomic context - centralize consumption of vcpu->request bits To ensure vcpu's come out of guest context in certain cases this patch adds a s390 specific way to kick them out of guest context. Currently it kicks them out to rerun the vcpu_run path in the s390 code, but the mechanism itself is expandable and with a new flag we could also add e.g. kicks to userspace etc. Signed-off-by: Christian Ehrhardt --- [diffstat] include/asm/kvm_host.h |2 +- kvm/intercept.c| 10 ++ kvm/kvm-s390.c |7 +++ kvm/kvm-s390.h | 16 +++- kvm/sigp.c | 31 +-- 5 files changed, 46 insertions(+), 20 deletions(-) Index: kvm/arch/s390/kvm/intercept.c === --- kvm.orig/arch/s390/kvm/intercept.c +++ kvm/arch/s390/kvm/intercept.c @@ -141,10 +141,12 @@ static int handle_stop(struct kvm_vcpu * rc = -ENOTSUPP; } - if (vcpu->arch.local_int.action_bits & ACTION_RELOADVCPU_ON_STOP) { - vcpu->arch.local_int.action_bits &= ~ACTION_RELOADVCPU_ON_STOP; - rc = SIE_INTERCEPT_RERUNVCPU; - vcpu->run->exit_reason = KVM_EXIT_INTR; + if (vcpu->arch.local_int.action_bits & ACTION_VCPUREQUEST_ON_STOP) { + vcpu->arch.local_int.action_bits &= ~ACTION_VCPUREQUEST_ON_STOP; + if (kvm_s390_handle_vcpu_requests(vcpu, VCPUREQUESTLVL_SIGP)) { + rc = SIE_INTERCEPT_CHECKREQUESTS; + vcpu->run->exit_reason = KVM_EXIT_INTR; + } } if (vcpu->arch.local_int.action_bits & ACTION_STOP_ON_STOP) { Index: kvm/arch/s390/include/asm/kvm_host.h === --- kvm.orig/arch/s390/include/asm/kvm_host.h +++ kvm/arch/s390/include/asm/kvm_host.h @@ -182,7 +182,7 @@ struct kvm_s390_interrupt_info { /* for local_interrupt.action_flags */ #define ACTION_STORE_ON_STOP (1<<0) #define ACTION_STOP_ON_STOP(1<<1) -#define ACTION_RELOADVCPU_ON_STOP (1<<2) +#define ACTION_VCPUREQUEST_ON_STOP (1<<2) struct kvm_s390_local_interrupt { spinlock_t lock; Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -484,8 +484,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v rerun_vcpu: if (vcpu->requests) - if (test_and_clear_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests)) - kvm_s390_vcpu_set_mem(vcpu); + kvm_s390_handle_vcpu_requests(vcpu, VCPUREQUESTLVL_VCPURUN); /* verify, that memory has been registered */ if (!vcpu->arch.sie_block->gmslm) { @@ -521,7 +520,7 @@ rerun_vcpu: rc = kvm_handle_sie_intercept(vcpu); } while (!signal_pending(current) && !rc); - if (rc == SIE_INTERCEPT_RERUNVCPU) + if (rc == SIE_INTERCEPT_CHECKREQUESTS) goto rerun_vcpu; if (signal_pending(current) && !rc) { @@ -710,7 +709,7 @@ int kvm_arch_set_memory_region(struct kv &kvm->vcpus[i]->requests)) continue; kvm_s390_inject_sigp_stop(kvm->vcpus[i], - ACTION_RELOADVCPU_ON_STOP); + ACTION_VCPUREQUEST_ON_STOP); } } Index: kvm/arch/s390/kvm/kvm-s390.h === --- kvm.orig/arch/s390/kvm/kvm-s390.h +++ kvm/arch/s390/kvm/kvm-s390.h @@ -25,7 +25,7 @@ typedef int (*intercept_handler_t)(struct kvm_vcpu *vcpu); /* negativ values are error codes, positive values for internal conditions */ -#define SIE_INTERCEPT_RERUNVCPU(1<<0) +#define SIE_INTERCEPT_CHECKREQUESTS(1<<0) int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu); #define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\ @@ -81,6 +81,20 @@ static inline void kvm_s390_vcpu_set_mem up_read(&vcpu->kvm->slots_lock); } +/* interception levels from which handle vcpu requests can be called */ +#define VCPUREQUESTLVL_SIGP1 +#define VCPUREQUESTLVL_VCPURUN 2 +static inline unsigned long kvm_s390_handle_vcpu_requests(struct kvm_vcpu *vcpu, + int level) +{ + BUG_ON(!level); + + if (!vcpu->requests) + return 0; + + return vcpu->requests; +} + /* implemented in priv.c */ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu); Index: kvm/arch/s390/kvm/sigp.c ===
[PATCH 0/3] kvm-s390: revised version of kvm-s390 guest memory handling - rebased v2
From: Christian Ehrhardt As requested this is a rebased patch on top of the already applied v3 of the patch series. *updates to already applied version* - remove dependency to KVM_REQ_MMU_RELOAD in generic code - remove explicit barrier after test_and_clear_bit as it is implied - ensure the wait_on_bit waiter is notified - move the reset of requests to kvm_vcpu_release to drop them early - ensure dropping all vcpu requests while freeing a vcpu - ensure kick allocations (might_sleep) are out of atomic context - update vcpu->cpu in kvm-s390 arch handler for load/put - centralize consumption of vcpu->request bits - updates on running vcpus can now be handled without need to rerun the vcpu - kvm_arch_set_memory_region waits until the bit is consumed by the vcpu - kickout only scheduled vcpus (wait might hang forever on non-scheduled vcpus) Note: further unification of make_all_cpu_request and the kick mechanism is planned, but it might be good to split it from this step towards commonality. Patches included: Subject: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state - rebased Subject: [PATCH 2/3] kvm-s390: update vcpu->cpu - rebased Subject: [PATCH 3/3] kvm-s390: streamline memslot handling - rebased v2 Overall-Diffstat: arch/s390/include/asm/kvm_host.h |2 +- arch/s390/kvm/intercept.c| 10 ++ arch/s390/kvm/kvm-s390.c | 36 +--- arch/s390/kvm/kvm-s390.h | 22 +- arch/s390/kvm/sigp.c | 31 +-- virt/kvm/kvm_main.c |6 ++ 6 files changed, 80 insertions(+), 27 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/8] Small cpu loop cleanups
On 06/15/2009 04:23 PM, Gleb Natapov wrote: Also fix "info cpus" to show correct halt status with in kernel irq chip. Applied all, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/15/2009 04:23 PM, Anthony Liguori wrote: How would qemu know which slots to optimize for? In practice, I don't see that as a real problem. We should (a) add an ioapic and four more pci links (b) recommend that slots be assigned in ascending order, and everything works. I don't see your concern about libvirt allocating slots. If a human can plug a card into a slot, so can libvirt. Doing an interactive back-and-forth (equivalent to plugging a card while blindfolded, then looking to see which slot we hit) is certainly more difficult. Let's take a concrete example because I think you missed my point. For the r2d board, if you have 1 on-board NIC, it has to go in slot 2. Additional NICs can go in any slot, but the primary on-board NIC is expected to live in slot 2. It's possible to not have that on-board NIC. If you let QEMU allocate which PCI slot a device goes in, we can hide this detail from libvirt. If you have libvirt do PCI slot allocation by default, it has to know about this restriction in the r2d board unless you have a clever way to express this sort of information. Once QEMU has allocated a device to a slot, libvirt can do a good job maintaining that relationship. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, 2009-06-15 at 07:48 -0500, Anthony Liguori wrote: > Avi Kivity wrote: > > On 06/15/2009 12:09 PM, Mark McLoughlin wrote: > > I think the point is that you don't need version numbers if you > > have a > > proper device tree. > > > > > How do you add a new attribute to the device tree and, when a supplied > device tree lacking said attribute, distinguish between a device tree > from an old version of qemu (i.e. use the old default) and a partial > device tree from the VM manager (i.e. use the new default) ? > > > >>> -baseline 0.10 > >>> > >> > >> That's a version number :-) > >> > >> (I was responding to Anthony's "you don't need a version number") > >> > > > > If you want to prevent incompatibilities, you need to make everything > > new (potentially including bugfixes) non-default. No need to punish new guests in order to maintain compatibility for old guests. > > Eventually the > > default configuration becomes increasingly unusable and you need a new > > baseline. You must still be able to fall back to the old baseline for > > older guests. I don't think games with configuration files can hide > > that. > > -M pc1 > -M pc2 > > etc. > > This is pretty easy to maintain with config files. I think this would be reasonable, but it is essentially just a version number which you objected to on the basis that it would make cherry-picking harder for distros. One thing that would be nice with this '-M pc1' thing would be to retain '-M pc' as a symlink to the latest version. We'd also need a way to read the symlink too, so that you can query what the current latest version is and use that in future. How would this machine type version relate to e.g. changing the default PCI class of virtio-blk? Would we bump the version number of all machine types can use virtio-blk? A per-device version number is workable alternative, but only with a saveabi type file IMHO. I've tried to summarise the options here: https://fedoraproject.org/wiki/Features/KVM_Stable_Guest_ABI Cheers, Mark. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Anthony Liguori wrote: Avi Kivity wrote: On 06/15/2009 04:23 PM, Anthony Liguori wrote: How would qemu know which slots to optimize for? In practice, I don't see that as a real problem. We should (a) add an ioapic and four more pci links (b) recommend that slots be assigned in ascending order, and everything works. I don't see your concern about libvirt allocating slots. If a human can plug a card into a slot, so can libvirt. Doing an interactive back-and-forth (equivalent to plugging a card while blindfolded, then looking to see which slot we hit) is certainly more difficult. Let's take a concrete example because I think you missed my point. For the r2d board, if you have 1 on-board NIC, it has to go in slot 2. Additional NICs can go in any slot, but the primary on-board NIC is expected to live in slot 2. It's possible to not have that on-board NIC. Libvirt does not support r2d. I hope it won't start to support it. We can have default values for these types of devices or something like pci_addr=auto. If you let QEMU allocate which PCI slot a device goes in, we can hide this detail from libvirt. If you have libvirt do PCI slot allocation by default, it has to know about this restriction in the r2d board unless you have a clever way to express this sort of information. Once QEMU has allocated a device to a slot, libvirt can do a good job maintaining that relationship. The end user should have a mechanism to control device slot positioning. For example, if you have several pci devices, some get high rate of interrupts and some not, if you want to optimize you guest you should isolate the high rate 'interesting' devices. This is something the user will need to do. I agree that the default behavior might be 'auto' Also, while moving from one qemu version to another, you'll need to represent the older behavior. -qemu-0.10 is not good enough since there will be multiple versions in the future with multiple distributions setting their defaults. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On 06/15/2009 04:45 PM, Anthony Liguori wrote: Avi Kivity wrote: On 06/15/2009 04:20 PM, Anthony Liguori wrote: It's not at all that simple. SCSI has a hierarchical address mechanism with 0-7 targets but then potentially multiple LUNs per target. Today, we always emulate a single LUN per target but if we ever wanted to support more than 7 disks on a SCSI controller, we would have to add multiple LUN support too. So the current linear unit= parameter is actually pretty broken for SCSI. Well, another level in the hierarchy, but I don't think it materially changes things. Depends on whether you expect to say index=0,lun=3 or index=3. If you mean the later, then it's quite conceivable that each target supports less than the maximum number of LUNs. This makes things pretty confusing to the user because they have to know that in the current implementation, index=0 is valid, index=1 isn't, but index=8 is. I'd object to any implicit addressing rules. If we have to say target=2,lun=7,street=8,city=9,state=99,zip=12345 instead of index=8345345235 so be it. No, I meant drive file=foo.img,bus=3. If that doesn't seem obvious what it should do to you that's because it isn't at all obvious :-) It ends up skipping a predefined number of locations in the drive table. This is pretty broken fundamentally because it assumes controllers always support a fixed number of devices. Nothing really respects bus_id though so in practice, I assume it's almost universally broken. Isn't the drive table something totally internal? And how does bus= relate to it? The reality of unit=X,bus=Y,if=Z is that they expand to: drive_table_index=Y*max_devs[Z] + X Whereas max_devs = {"ide":4, "scsi": 7, *:0} How drive_table_index is interpreted is "if" specific. For if=scsi, each lsi device gets a base drive table index that starts at bus_index * 7. For virtio, the first empty spot in drive_table results in no more drives being created. It's broken by design. Agreed. Pity that it's exposed to the poor users. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: slow guest performance with build load, looking for ideas
> What is the host cpu type? On pre-Nehalem/Barcelona processors kvm has > poor scalability in mmu intensive workloads like kernel builds. Thanks for getting back to me. Today is pretty booked but I'm going to go find a Nehalem system and try to run similar tests to compare. I'll post my results to this thread. So if I understand what you're saying: best not to use kvm guests for build servers with pre-Nehalem processors. Both systems I used were pre-Nehalem. Here is a cpuinfo snip from both systems I tested on: processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz stepping: 7 cpu MHz : 2327.500 cache size : 4096 KB physical id : 1 siblings: 4 core id : 3 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr dca lahf_lm bogomips: 4655.14 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: and processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU X5460 @ 3.16GHz stepping: 6 cpu MHz : 3158.307 cache size : 6144 KB physical id : 1 siblings: 4 core id : 3 cpu cores : 4 apicid : 7 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm tpr_shadow vnmi flexpriority bogomips: 6317.51 clflush size: 64 cache_alignment : 64 address sizes : 38 bits physical, 48 bits virtual power management: -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Mark McLoughlin wrote: On Mon, 2009-06-15 at 07:48 -0500, Anthony Liguori wrote: Eventually the default configuration becomes increasingly unusable and you need a new baseline. You must still be able to fall back to the old baseline for older guests. I don't think games with configuration files can hide that. -M pc1 -M pc2 etc. This is pretty easy to maintain with config files. I think this would be reasonable, but it is essentially just a version number which you objected to on the basis that it would make cherry-picking harder for distros. It doesn't have to be pc1, pc2. It could be pc-with-usb or pc-with-balloon. If a distro cherry picks in such a way that their pc is not a standard QEMU pc, they would add a new PC type that's specific to their distro. One thing that would be nice with this '-M pc1' thing would be to retain '-M pc' as a symlink to the latest version. We'd also need a way to read the symlink too, so that you can query what the current latest version is and use that in future. Another option is an explicit -M default which always uses the default machine for the architecture. Likewise, we would need a way to query what the default machine was for an architecture. How would this machine type version relate to e.g. changing the default PCI class of virtio-blk? Would we bump the version number of all machine types can use virtio-blk? You would introduce a new machine type. For instance, pc-virtio-class-other. The names don't have to look like that, I'm just doing that to make a point. This may mean that you end up with dozens of machine types but you preserve compatibility, which is a good thing. Of course, the flip side is that you make preserving the machine config the duty of the user and we don't maintain compatible machine types. This won't work without a proper config file though so for now, we're stuck maintaining machine types. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints
On Mon, Jun 15, 2009 at 6:43 AM, Avi Kivity wrote: > (I'd be quite happy constructing the entire machine config on the command > line, but I realize it's just me) as a user-only (well, i'm a developer, but don't meddle in kernel affairs since 0.99pl9); I also like that kvm is totally CLI-managed. but migration-wise, i think it could be nicer if the 'origin' process could send the config to the 'target' one. IOW: the -incoming flag shouldn't need any other parameter, and the 'migrate' command should send the whole hardware description before the CPU state, and fail with a 'can't comply' message if the target complains. of course, that's a simplification. for example, the 'target' process should be able to respect some parameters, mostly the 'external' descriptions, like storage pathnames, or '-net tap' ones. -- Javier -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: slow guest performance with build load, looking for ideas
On 06/15/2009 05:15 PM, Erik Jacobson wrote: What is the host cpu type? On pre-Nehalem/Barcelona processors kvm has poor scalability in mmu intensive workloads like kernel builds. Thanks for getting back to me. Today is pretty booked but I'm going to go find a Nehalem system and try to run similar tests to compare. I'll post my results to this thread. So if I understand what you're saying: best not to use kvm guests for build servers with pre-Nehalem processors. pre-Nehalem / pre-Barcelona, > 4 vcpus, yes. Both systems I used were pre-Nehalem. Here is a cpuinfo snip from both systems I tested on: Yes, so I expect you're seeing contention on kvm->mmu_lock. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Dor Laor wrote: Libvirt does not support r2d. I hope it won't start to support it. It supports mips, sparc, and ppc machines now. I don't see why it wouldn't support r2d. For ppcemb, I expect this same problem to occur. This sort of restriction is going to be common with embedded boards. We can have default values for these types of devices or something like pci_addr=auto. Why wouldn't libvirt always use pci_addr=auto? If the only argument for having libvirt do pci slot allocation is error messages, can't we find a nice way to allow libvirt to create friendly error messages when QEMU fails? If you let QEMU allocate which PCI slot a device goes in, we can hide this detail from libvirt. If you have libvirt do PCI slot allocation by default, it has to know about this restriction in the r2d board unless you have a clever way to express this sort of information. Once QEMU has allocated a device to a slot, libvirt can do a good job maintaining that relationship. The end user should have a mechanism to control device slot positioning. For example, if you have several pci devices, some get high rate of interrupts and some not, if you want to optimize you guest you should isolate the high rate 'interesting' devices. This is something the user will need to do. I agree that the default behavior might be 'auto' I'm not at all arguing against pci_addr. I'm arguing about how libvirt should use it with respect to the "genesis" use-case where libvirt has no specific reason to choose one PCI slot over another. In that case, I'm merely advocating that we want to let QEMU make the decision. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
On Mon, Jun 15, 2009 at 09:20:00AM -0500, Anthony Liguori wrote: > Mark McLoughlin wrote: >> On Mon, 2009-06-15 at 07:48 -0500, Anthony Liguori wrote: >> Eventually the default configuration becomes increasingly unusable and you need a new baseline. You must still be able to fall back to the old baseline for older guests. I don't think games with configuration files can hide that. >>> -M pc1 >>> -M pc2 >>> >>> etc. >>> >>> This is pretty easy to maintain with config files. >>> >> >> I think this would be reasonable, but it is essentially just a version >> number which you objected to on the basis that it would make >> cherry-picking harder for distros. >> > > It doesn't have to be pc1, pc2. It could be pc-with-usb or > pc-with-balloon. If a distro cherry picks in such a way that their pc > is not a standard QEMU pc, they would add a new PC type that's specific > to their distro. > >> One thing that would be nice with this '-M pc1' thing would be to retain >> '-M pc' as a symlink to the latest version. We'd also need a way to read >> the symlink too, so that you can query what the current latest version >> is and use that in future. >> > > Another option is an explicit -M default which always uses the default > machine for the architecture. Likewise, we would need a way to query > what the default machine was for an architecture. > >> How would this machine type version relate to e.g. changing the default >> PCI class of virtio-blk? Would we bump the version number of all machine >> types can use virtio-blk? >> > You would introduce a new machine type. For instance, > pc-virtio-class-other. The names don't have to look like that, I'm just > doing that to make a point. This may mean that you end up with dozens > of machine types but you preserve compatibility, which is a good thing. And then pc-virtio-class-other-with-balloon-without-usb? Wouldn't it be more straightforward to have capability bits which can be switched on and off independently rather than trying to fit unrelated features into a machine type? IMO it only seems more work at first, and QA gets a bit nervious that they can't exhaustively test all options. But in the long run it simplifies things as you don't have to set policy and invent silly names. > Of course, the flip side is that you make preserving the machine config > the duty of the user and we don't maintain compatible machine types. > This won't work without a proper config file though so for now, we're > stuck maintaining machine types. > > Regards, > > Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html