Steal time in KVM
Hi,

I am trying to measure steal time with 2 VMs (each with 1 vCPU) pinned to the same core. While looking for documentation on this, I came across your patches and posts related to the implementation of this feature, so I thought it would be best to ask you.

I run the same application on these 2 VMs simultaneously and see a performance difference. I am trying to read the steal time from inside the guest using top, vmstat, etc. Both top and vmstat -s report the steal time (st) as 0. I also checked that procps is at the latest version. I am using virtio-net.

I suspect that the steal time is not being updated properly. Is there something I need to configure for this to work?

The Linux version of my guest image is:

Linux server-147 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26 UTC 2010 x86_64 GNU/Linux

And /proc/cpuinfo shows:

processor        : 0
vendor_id        : GenuineIntel
cpu family       : 6
model            : 2
model name       : QEMU Virtual CPU version 0.14.0
stepping         : 3
cpu MHz          : 2992.498
cache size       : 4096 KB
fpu              : yes
fpu_exception    : yes
cpuid level      : 4
wp               : yes
flags            : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good pni cx16 hypervisor lahf_lm
bogomips         : 5984.99
clflush size     : 64
cache_alignment  : 64
address sizes    : 40 bits physical, 48 bits virtual
power management :

Another question: is there a way to get the value of steal cycles programmatically (e.g. from a C program)?

Thanks,
Abhishek

On Monday, June 13, 2011 6:40:02 PM UTC-5, Glauber Costa wrote:
> Hi,
>
> This series is a repost of the last series I posted about this.
> It tries to address most concerns that were raised at the time,
> plus makes uses of the static_branch interface to disable the
> steal code when not in use.
>
>
> Glauber Costa (7):
>   KVM-HDR Add constant to represent KVM MSRs enabled bit
>   KVM-HDR: KVM Steal time implementation
>   KVM-HV: KVM Steal time implementation
>   KVM-GST: Add a pv_ops stub for steal time
>   KVM-GST: KVM Steal time accounting
>   KVM-GST: adjust scheduler cpu power
>   KVM-GST: KVM Steal time registration
>
>  Documentation/kernel-parameters.txt   |    4 ++
>  Documentation/virtual/kvm/msr.txt     |   33 +
>  arch/x86/Kconfig                      |   12 +
>  arch/x86/include/asm/kvm_host.h       |    8 +++
>  arch/x86/include/asm/kvm_para.h       |   15 ++
>  arch/x86/include/asm/paravirt.h       |    9
>  arch/x86/include/asm/paravirt_types.h |    1 +
>  arch/x86/kernel/kvm.c                 |   72 +
>  arch/x86/kernel/kvmclock.c            |    2 +
>  arch/x86/kernel/paravirt.c            |    9
>  arch/x86/kvm/x86.c                    |   60 +++-
>  kernel/sched.c                        |   81 +
>  kernel/sched_features.h               |    4 +-
>  13 files changed, 296 insertions(+), 14 deletions(-)
>
> --
> 1.7.3.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
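On the question of reading steal time programmatically, here is a minimal sketch rather than a definitive answer. It assumes the guest kernel exports steal time as the eighth value on the aggregate "cpu" line of /proc/stat (counted in USER_HZ ticks), which is the counter that top and vmstat turn into "st". The function names are illustrative only, and the same parsing can be done from C with fopen() and sscanf(). If the guest kernel has no steal-time accounting the field simply stays at 0; the 2.6.35 guest kernel above was built in December 2010, before the quoted June 2011 series, so that may be what is happening here.

```python
def parse_steal_ticks(stat_text):
    """Return the steal counter from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled exactly "cpu"; per-CPU lines are "cpu0", "cpu1", ...
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal [guest guest_nice]
            return values[7] if len(values) > 7 else 0
    raise ValueError("no aggregate 'cpu' line found")


def read_steal_ticks(path="/proc/stat"):
    """Read the current steal counter from the running system (Linux only)."""
    with open(path) as stat_file:
        return parse_steal_ticks(stat_file.read())
```

Sampling read_steal_ticks() twice and subtracting gives the steal time accumulated over the interval; dividing by the USER_HZ value (commonly 100) converts ticks to seconds.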
Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
On Fri, Oct 05, 2012 at 10:00:25AM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
> > A good starting point would be load/store emulation as this seems to be a
> > common theme, and we would need a credible deployment for any new
> > framework so that we know it's fit for purpose.
>
> Probably not actually, that code is written to be fast, because things
> like IP stack throughput depend on it - particularly when your network
> card can only DMA packets to 32-bit aligned addresses (resulting in
> virtually all network data being misaligned.)

A fair point, but surely it would still be worth a try?

We might decide that a few particular cases of instruction decode should not use the generic framework for performance reasons, but in most cases being critically dependent on fault-driven software emulation for performance would be a serious mistake in the first place (discussions about the network code notwithstanding).

This is not an argument for being slower just for the sake of it, but it can make sense to factor code on paths where performance is not an issue.

Cheers
---Dave
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
On 05.07.2012, at 13:14, Caraman Mihai Claudiu-B02008 wrote:

>> -----Original Message-----
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Wednesday, July 04, 2012 4:50 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>> d...@lists.ozlabs.org; qemu-...@nongnu.org
>> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
>> EPN mask for 64-bit
>>
>>
>> On 25.06.2012, at 14:26, Mihai Caraman wrote:
>>
>>> Extend MAS2 EPN mask for 64-bit hosts, to retain most significant bits.
>>> Change get tlb eaddr to use this mask.
>>
>> Please see section 6.11.4.8 in the PowerISA 2.06b:
>>
>> MMU behavior is largely unaffected by whether the thread is in 32-bit
>> computation mode (MSRCM=0) or 64-bit computation mode (MSRCM=1). The
>> only differences occur in the EPN field of the TLB entry and the EPN
>> field of MAS2. The differences are summarized here.
>>
>>   * Executing a tlbwe instruction in 32-bit mode will set bits 0:31
>>     of the TLB EPN field to zero unless MAS0ATSEL is set, in which case
>>     those bits are not written to zero.
>>   * In 32-bit implementations, MAS2U can be used to read or write
>>     EPN0:31 of MAS2.
>>
>> So if MSR.CM is not set tlbwe should mask the upper 32 bits out - which
>> can happen regardless of CONFIG_64BIT.
>
> MAS2_EPN reflects EPN field of MAS2 aka bits 0:51 (for MAV = 1.0) according
> to section 6.10.3.10 in the PowerISA 2.06b.
>
> MAS2_EPN is not used in tlbwe execution emulation, we have MAS2_VAL define
> for this case.

So tlbe->mas2 is guaranteed to have the upper bits be 0 when MSR.CM=0?

>> Also, we need to implement MAS2U, to potentially make the upper 32bits of
>> MAS2 available, right? But that one isn't as important as the first bit.
>
> MAS2U is guest privileged why does it need special care?

Maybe it's mapped to the upper bits of GMAS2 automatically?

> Freescale core Manuals and EREF does not mention MAS2U so I think in our case
> it is not implemented.

Please check with a simple mfspr() test on real hw to see if it really isn't implemented.

Alex
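The masking rule quoted in this message from PowerISA 2.06b can be sketched as follows. This is an illustration under stated assumptions, not the KVM code: MAS2 is treated as a 64-bit value, "bits 0:31" and "bits 0:51" are read in IBM MSB-first numbering (so they are the upper 32 and upper 52 bits), the MAS0ATSEL exception is ignored, and all names are hypothetical.

```python
MASK_64 = (1 << 64) - 1
# Hypothetical constant: EPN = bits 0:51 in MSB-first numbering, i.e. everything
# except the low 12 bits of the 64-bit register.
MAS2_EPN_MASK = MASK_64 & ~0xFFF
LOW_32_MASK = 0xFFFFFFFF


def emulate_tlbwe_mas2(mas2, msr_cm):
    """Value written to the TLB entry: upper 32 bits are dropped when MSR.CM=0."""
    mas2 &= MASK_64
    return mas2 if msr_cm else mas2 & LOW_32_MASK


def tlb_eaddr(mas2):
    """Effective page address: keep the EPN field, drop the low attribute bits."""
    return mas2 & MAS2_EPN_MASK
```

With a 32-bit-only EPN mask the second function would lose the upper half of a 64-bit address, which is the situation the patch under discussion sets out to avoid.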
Re: [PATCH 1/1] kvmclock: fix guest stop notification
On (Sun) 30 Sep 2012 [20:05:16], Marcelo Tosatti wrote:
> On Thu, Sep 20, 2012 at 09:46:41AM -0300, Marcelo Tosatti wrote:
> > On Thu, Sep 20, 2012 at 01:55:20PM +0530, Amit Shah wrote:
> > > Commit f349c12c0434e29c79ecde89029320c4002f7253 added the guest stop
> > > notification, but it did it in a way that the stop notification would
> > > never reach the kernel. The kvm_vm_state_changed() function gets a
> > > value of 0 for the 'running' parameter when the VM is stopped, making
> > > all the code added previously dead code.
> > >
> > > This patch reworks the code so that it's called when 'running' is 0,
> > > which indicates the VM was stopped.

...

> NACK, guest should be notified when the VM is starting, not
> when stopping.

Ah, right.

Amit
Re: [PATCH 1/1] kvmclock: fix guest stop notification
On (Sun) 30 Sep 2012 [21:50:07], Amos Kong wrote:
> ----- Original Message -----
> > On Thu, Sep 20, 2012 at 09:46:41AM -0300, Marcelo Tosatti wrote:
> > > On Thu, Sep 20, 2012 at 01:55:20PM +0530, Amit Shah wrote:
> > > > Commit f349c12c0434e29c79ecde89029320c4002f7253 added the guest
> > > > stop
>
> In commitlog of f349c12c0434e29c79ecde89029320c4002f7253:
>
> ## This patch uses the qemu Notifier system to tell the guest it _is about to
> be_ stopped
>
> > > > notification, but it did it in a way that the stop notification
> > > > would never reach the kernel. The kvm_vm_state_changed() function
> > > > gets a value of 0 for the 'running' parameter when the VM is
> > > > stopped, making all the code added previously dead code.
> > > >
> > > > This patch reworks the code so that it's called when 'running' is
> > > > 0, which indicates the VM was stopped.
>
> Amit, did you touch any real issue? guest gets call trace with current code?
> which kind of context?

I guess you're asking for a testcase to trigger softlockups?

Run a VM, make it do some work (like kernel compile). Then, 'stop' from the monitor for a few minutes. Later, on 'cont', the softlockup detector in the guest wakes up and shows a warning message mentioning the cpus were stuck for seconds.

For this particular patch, though, I didn't really test things; just 'found' this by examining code. But as Marcelo points out, this patch is wrong.

> Someone told me he got call trace when shutdown guest by 'init 0', I didn't
> verify this issue.

That sounds like a completely different thing, unless the trace is invoked by the softlockup detector.

Amit
RE: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, October 08, 2012 1:11 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; qemu-...@nongnu.org
> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
> EPN mask for 64-bit
>
>
> On 05.07.2012, at 13:14, Caraman Mihai Claudiu-B02008 wrote:
>
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Wednesday, July 04, 2012 4:50 PM
> >> To: Caraman Mihai Claudiu-B02008
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> >> d...@lists.ozlabs.org; qemu-...@nongnu.org
> >> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
> >> EPN mask for 64-bit
> >>
> >>
> >> On 25.06.2012, at 14:26, Mihai Caraman wrote:
> >>
> >>> Extend MAS2 EPN mask for 64-bit hosts, to retain most significant bits.
> >>> Change get tlb eaddr to use this mask.
> >>
> >> Please see section 6.11.4.8 in the PowerISA 2.06b:
> >>
> >> MMU behavior is largely unaffected by whether the thread is in 32-bit
> >> computation mode (MSRCM=0) or 64-bit computation mode (MSRCM=1). The
> >> only differences occur in the EPN field of the TLB entry and the EPN
> >> field of MAS2. The differences are summarized here.
> >>
> >>   * Executing a tlbwe instruction in 32-bit mode will set bits 0:31
> >>     of the TLB EPN field to zero unless MAS0ATSEL is set, in which case
> >>     those bits are not written to zero.
> >>   * In 32-bit implementations, MAS2U can be used to read or write
> >>     EPN0:31 of MAS2.
> >>
> >> So if MSR.CM is not set tlbwe should mask the upper 32 bits out - which
> >> can happen regardless of CONFIG_64BIT.
> >
> > MAS2_EPN reflects EPN field of MAS2 aka bits 0:51 (for MAV = 1.0) according
> > to section 6.10.3.10 in the PowerISA 2.06b.
> >
> > MAS2_EPN is not used in tlbwe execution emulation, we have MAS2_VAL define
> > for this case.
>
> So tlbe->mas2 is guaranteed to have the upper bits be 0 when MSR.CM=0?

We chose to mask out mas2 upper bits on tlbwe emulation so gtlbe->mas2 will respect this but vcpu->arch.shared->mas2 will not. tlb entry selection does not require this treatment since EPN upper bits are not taken into consideration anyway.

> >> Also, we need to implement MAS2U, to potentially make the upper 32bits of
> >> MAS2 available, right? But that one isn't as important as the first bit.
> >
> > MAS2U is guest privileged why does it need special care?
>
> Maybe it's mapped to the upper bits of GMAS2 automatically?

GMAS2?

> > Freescale core Manuals and EREF does not mention MAS2U so I think in our case
> > it is not implemented.
>
> Please check with a simple mfspr() test on real hw to see if it really
> isn't implemented.

I will try this with SPR number 0x277.

-Mike
Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
On 08.10.2012, at 15:06, Caraman Mihai Claudiu-B02008 wrote:

>> -----Original Message-----
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Monday, October 08, 2012 1:11 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>> d...@lists.ozlabs.org; qemu-...@nongnu.org
>> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
>> EPN mask for 64-bit
>>
>>
>> On 05.07.2012, at 13:14, Caraman Mihai Claudiu-B02008 wrote:
>>
>>>> -----Original Message-----
>>>> From: Alexander Graf [mailto:ag...@suse.de]
>>>> Sent: Wednesday, July 04, 2012 4:50 PM
>>>> To: Caraman Mihai Claudiu-B02008
>>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>>>> d...@lists.ozlabs.org; qemu-...@nongnu.org
>>>> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
>>>> EPN mask for 64-bit
>>>>
>>>> On 25.06.2012, at 14:26, Mihai Caraman wrote:
>>>>
>>>>> Extend MAS2 EPN mask for 64-bit hosts, to retain most significant bits.
>>>>> Change get tlb eaddr to use this mask.
>>>>
>>>> Please see section 6.11.4.8 in the PowerISA 2.06b:
>>>>
>>>> MMU behavior is largely unaffected by whether the thread is in 32-bit
>>>> computation mode (MSRCM=0) or 64-bit computation mode (MSRCM=1). The
>>>> only differences occur in the EPN field of the TLB entry and the EPN
>>>> field of MAS2. The differences are summarized here.
>>>>
>>>>   * Executing a tlbwe instruction in 32-bit mode will set bits 0:31
>>>>     of the TLB EPN field to zero unless MAS0ATSEL is set, in which case
>>>>     those bits are not written to zero.
>>>>   * In 32-bit implementations, MAS2U can be used to read or write
>>>>     EPN0:31 of MAS2.
>>>>
>>>> So if MSR.CM is not set tlbwe should mask the upper 32 bits out - which
>>>> can happen regardless of CONFIG_64BIT.
>>>
>>> MAS2_EPN reflects EPN field of MAS2 aka bits 0:51 (for MAV = 1.0) according
>>> to section 6.10.3.10 in the PowerISA 2.06b.
>>>
>>> MAS2_EPN is not used in tlbwe execution emulation, we have MAS2_VAL define
>>> for this case.
>>
>> So tlbe->mas2 is guaranteed to have the upper bits be 0 when MSR.CM=0?
>
> We chose to mask out mas2 upper bits on tlbwe emulation so gtlbe->mas2 will
> respect this but vcpu->arch.shared->mas2 will not. tlb entry selection does
> not require this treatment since EPN upper bits are not taken into
> consideration anyway.

That's fine. We don't control the contents of shared->mas2 anyway.

>>>> Also, we need to implement MAS2U, to potentially make the upper 32bits of
>>>> MAS2 available, right? But that one isn't as important as the first bit.
>>>
>>> MAS2U is guest privileged why does it need special care?
>>
>> Maybe it's mapped to the upper bits of GMAS2 automatically?
>
> GMAS2?

Ah. The guest has direct control over the real MAS2. Oh well.

>>> Freescale core Manuals and EREF does not mention MAS2U so I think in our case
>>> it is not implemented.
>>
>> Please check with a simple mfspr() test on real hw to see if it really
>> isn't implemented.
>
> I will try this with SPR number 0x277.

Thanks :)

Alex
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell writes:

> (Topic updated, cc's trimmed).
>
> Anthony Liguori writes:
>> Rusty Russell writes:
>>> 4) The only significant change to the spec is that we use PCI
>>>    capabilities, so we can have infinite feature bits.
>>>    (see
>>> http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)
>>
>> We discussed this on IRC last night. I don't think PCI capabilities are
>> a good mechanism to use...
>>
>> PCI capabilities are there to organize how the PCI config space is
>> allocated to allow vendor extensions to co-exist with future PCI
>> extensions.
>>
>> But we've never used the PCI config space within virtio-pci. We do
>> everything in BAR0. I don't think there's any real advantage of using
>> the config space vs. a BAR for virtio-pci.
>
> Note before anyone gets confused; we were talking about using the PCI
> config space to indicate what BAR(s) the virtio stuff is in. An
> alternative would be to simply specify a new layout format in BAR1.
>
> The arguments for a more flexible format that I know of:
>
> 1) virtio-pci has already extended the pci-specific part of the
>    configuration once (for MSI-X), so I don't want to assume it won't
>    happen again.

"configuration" is the wrong word here.

The virtio-pci BAR0 layout is:

   0..19   virtio-pci registers
   20+     virtio configuration space

MSI-X needed to add additional virtio-pci registers, so now we have:

   0..19   virtio-pci registers

if MSI-X:
   20..23  virtio-pci MSI-X registers
   24+     virtio configuration space
else:
   20+     virtio configuration space

I agree, this stinks.

But I think we could solve this in a different way. I think we could just move the virtio configuration space to BAR1 by using a transport feature bit.

That then frees up the entire BAR0 for use as virtio-pci registers. We can then always include the virtio-pci MSI-X register space and introduce all new virtio-pci registers as simply being appended.

This new feature bit then becomes essentially a virtio configuration latch. When unacked, virtio configuration hides new registers, when acked, those new registers are exposed.

Another option is to simply put new registers after the virtio configuration blob.

> 2) ISTR an argument about mapping the ISR register separately, for
>    performance, but I can't find a reference to it.

I think the rationale is that ISR really needs to be PIO but everything else doesn't. PIO is much faster on x86 because it doesn't require walking page tables or instruction emulation to handle the exit.

The argument to move the remaining registers to MMIO is to allow 64-bit accesses to registers which isn't possible with PIO.

>> This maps really nicely to non-PCI transports too.
>
> This isn't right. No one else can use the PCI layout. While parts are
> common, other parts are pci-specific (MSI-X and ISR for example), and
> yet other parts are specified by PCI elsewhere (eg interrupt numbers).
>
>> But extending the
>> PCI config space (especially dealing with capability allocation) is
>> pretty gnarly and there isn't an obvious equivalent outside of PCI.
>
> That's OK, because general changes should be done with feature bits, and
> the others all have an infinite number. Being the first, virtio-pci has
> some unique limitations we'd like to fix.
>
>> There are very devices that we emulate today that make use of extended
>> PCI device registers outside the platform devices (that have no BARs).
>
> This sentence confused me?

There is a missing "few". "There are very few devices..."

Extending the PCI configuration space is unusual for PCI devices. That was the point.

Regards,

Anthony Liguori

>
> Thanks,
> Rusty.
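The BAR0 layout described in Anthony's message above makes the start of the virtio configuration space depend on whether MSI-X is enabled. A minimal sketch of that offset calculation, using only the byte ranges given in the message (the constant and function names are made up for illustration):

```python
VIRTIO_PCI_REGS_SIZE = 20  # bytes 0..19: virtio-pci registers
VIRTIO_MSIX_REGS_SIZE = 4  # bytes 20..23: present only when MSI-X is enabled


def virtio_config_offset(msix_enabled):
    """Offset within BAR0 at which the virtio configuration space starts."""
    offset = VIRTIO_PCI_REGS_SIZE
    if msix_enabled:
        offset += VIRTIO_MSIX_REGS_SIZE
    return offset
```

Every further conditional register block would add another branch like this one, which is the problem both proposals in the thread (moving the configuration space to BAR1, or describing register locations in a capability) try to get rid of.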
Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it
On 05.10.2012 04:24, Alexander Graf wrote:
>
> On 05.10.2012, at 04:17, Anthony Liguori wrote:
>
>> Alexander Graf writes:
>>
>>> On 03.10.2012, at 22:26, Peter Maydell wrote:
>>>
>>>> On 3 October 2012 21:01, Blue Swirl wrote:
>>>>> On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori wrote:
>>>>>> Jan Kiszka writes:
>>>>>>> +    /* The default accelerator depends on the availability of KVM. */
>>>>>>> +    p = kvm_configured ? "kvm" : "tcg";
>>>>>>>      }
>>>>>> Blue/Aurelien, any objections?
>>>>>
>>>>> No, maybe a message could be printed that says that the default has
>>>>> changed, for a few releases.
>>>>
>>>> I've lost track of the conversation, are we currently proposing
>>>> the accelerator default to be "kvm" (as per the original patch
>>>> you quote here) or "kvm:tcg" ?
>>>>
>>>> I'm not entirely sure which I prefer from an ARM perspective
>>>>
>>>> For some time to come and for a lot of targets (ie any target
>>>> CPU except A15), having a default of "kvm" is going to cause
>>>> existing working commandlines to stop working. [I expect that
>>>> ARM-host qemu binaries will be built with CONFIG_KVM once ARM
>>>> KVM support lands, but the same binary will be run on hosts
>>>> without virtualization extensions.] On the other hand, perhaps
>>>> there just aren't really very many people who run QEMU on
>>>> ARM hosts, and so we can ignore them :-)
>>>
>>> We get similar problems on PPC. Take the following example:
>>>
>>> $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic
>>
>> But do you really expect people to do this? I have to believe that
>> people running on PPC hardware and running qemu-system-ppc most likely
>> want to do KVM...
>
> Sure. But we wouldn't be able to even tell them what went wrong, as we don't
> have a negotiation mechanism right now that could tell user space "hey, the
> CPU you selected is unknown to me".

Would it help to split out the cpu_model -> CPUClass lookup from cpu_ppc_init() to invoke a hook or inquire a field indicating KVM support?

Andreas

> However, if during cpu init we could add such a check and then fall back to
> tcg mode if accel=kvm:tcg with a warning, that'd be nice user experience.
>
> We could do the same for ARM. If you do -M beagle on an A15 KVM enabled
> machine, you would still be able to do so, but KVM tells you it can't emulate
> an A8 right now. And if in the future KVM learns how to expose an A8 on A15,
> we could just not bail out and things would magically work.
>
> Apart from that, I like the idea of kvm:tcg with a warning as the default for
> qemu. We should still have a qemu-kvm binary in the distro that does
> accel=kvm so people don't accidentally fall back to tcg mode.
>
>
> Alex
>

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
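The "kvm:tcg with a warning" behaviour discussed in this thread can be sketched as below. This is an illustration of the idea only, not QEMU code: select_accelerator and the can_init probe are hypothetical names, and the probe stands in for whatever check would decide that KVM cannot run the selected machine or CPU.

```python
import sys


def select_accelerator(accel_spec, can_init):
    """Pick the first usable accelerator from a colon-separated list such as "kvm:tcg".

    can_init is a hypothetical probe: it takes an accelerator name and returns
    True if that accelerator can run the selected machine and CPU.
    """
    names = accel_spec.split(":")
    for index, name in enumerate(names):
        if can_init(name):
            return name
        if index + 1 < len(names):
            # Warn so that a drop from KVM to TCG does not go unnoticed.
            print("warning: accelerator %s unavailable, falling back to %s"
                  % (name, names[index + 1]), file=sys.stderr)
    raise RuntimeError("no usable accelerator in %r" % accel_spec)
```

With a spec of "kvm" alone the function fails instead of falling back, which matches the suggestion that a qemu-kvm binary using accel=kvm should never silently end up in tcg mode.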
Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it
On 08.10.2012, at 16:03, Andreas Färber wrote:

> On 05.10.2012 04:24, Alexander Graf wrote:
>>
>> On 05.10.2012, at 04:17, Anthony Liguori wrote:
>>
>>> Alexander Graf writes:
>>>
>>>> On 03.10.2012, at 22:26, Peter Maydell wrote:
>>>>
>>>>> On 3 October 2012 21:01, Blue Swirl wrote:
>>>>>> On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori wrote:
>>>>>>> Jan Kiszka writes:
>>>>>>>> +    /* The default accelerator depends on the availability of KVM. */
>>>>>>>> +    p = kvm_configured ? "kvm" : "tcg";
>>>>>>>>      }
>>>>>>> Blue/Aurelien, any objections?
>>>>>>
>>>>>> No, maybe a message could be printed that says that the default has
>>>>>> changed, for a few releases.
>>>>>
>>>>> I've lost track of the conversation, are we currently proposing
>>>>> the accelerator default to be "kvm" (as per the original patch
>>>>> you quote here) or "kvm:tcg" ?
>>>>>
>>>>> I'm not entirely sure which I prefer from an ARM perspective
>>>>>
>>>>> For some time to come and for a lot of targets (ie any target
>>>>> CPU except A15), having a default of "kvm" is going to cause
>>>>> existing working commandlines to stop working. [I expect that
>>>>> ARM-host qemu binaries will be built with CONFIG_KVM once ARM
>>>>> KVM support lands, but the same binary will be run on hosts
>>>>> without virtualization extensions.] On the other hand, perhaps
>>>>> there just aren't really very many people who run QEMU on
>>>>> ARM hosts, and so we can ignore them :-)
>>>>
>>>> We get similar problems on PPC. Take the following example:
>>>>
>>>> $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic
>>>
>>> But do you really expect people to do this? I have to believe that
>>> people running on PPC hardware and running qemu-system-ppc most likely
>>> want to do KVM...
>>
>> Sure. But we wouldn't be able to even tell them what went wrong, as we don't
>> have a negotiation mechanism right now that could tell user space "hey, the
>> CPU you selected is unknown to me".
>
> Would it help to split out the cpu_model -> CPUClass lookup from
> cpu_ppc_init() to invoke a hook or inquire a field indicating KVM support?

Well, we need to basically determine whether KVM is enabled only after cpu creation of the machine file.

Alex
macvlan/macvtap guest to host communication
Hi!

I have connected my kvm guest using a macvtap interface and configured a macvlan interface in bridge mode on the host to allow host to guest communication.

If using virtio_net in the guest, host to guest transfer works fine. But in the guest to host direction, large tcp segments do not arrive at the host and are retransmitted using smaller segment sizes. (See iperf-virtio-guest2host_guest.pcap and iperf-virtio-guest2host_host.pcap. [1])

Disabling tcp segmentation offload at the guest nic fixes that problem.

If I switch to different nics (e1000, rtl8139, etc), checksum offloading also seems to interfere. The checksum errors go away after disabling TX checksumming on the underlying host interface, but that's a performance killer. Checksum errors also occur on other connections, not only on host to guest traffic.

Does macvlan/macvtap networking only work with virtio-net, or are there any tweaks for other guest NICs?

Thanks,
--leo

[1] http://leo.kloburg.at/tmp/kvm-macvtap/

--
e-mail   ::: Leo.Bergolth (at) wu.ac.at
fax      ::: +43-1-31336-906050
location ::: IT-Services | Vienna University of Economics | Austria
Re: [Qemu-devel] Using PCI config space to indicate config location
  Hi,

> But I think we could solve this in a different way. I think we could
> just move the virtio configuration space to BAR1 by using a transport
> feature bit.

Why hard-code stuff?

I think it makes a lot of sense to have a capability similar to msi-x which simply specifies bar and offset of the register sets:

[root@fedora ~]# lspci -vvs4
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
        [ ... ]
        Region 0: I/O ports at c000 [size=64]
        Region 1: Memory at fc029000 (32-bit) [size=4K]
        Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
                Vector table: BAR=1 offset=
                PBA: BAR=1 offset=0800

So we could have for virtio something like this:

        Capabilities: [??] virtio-regs:
                legacy: BAR=0 offset=0
                virtio-pci: BAR=1 offset=1000
                virtio-cfg: BAR=1 offset=1800

> That then frees up the entire BAR0 for use as virtio-pci registers. We
> can then always include the virtio-pci MSI-X register space and
> introduce all new virtio-pci registers as simply being appended.

BAR0 needs to stay as-is for compatibility reasons. New devices which don't have to care about old guests don't need to provide a 'legacy' register region.

Most devices have mmio at BAR1 for msi-x support anyway, we can place the virtio-pci and virtio configuration registers there too by default. I wouldn't hardcode that though.

> This new feature bit then becomes essentially a virtio configuration
> latch. When unacked, virtio configuration hides new registers, when
> acked, those new registers are exposed.

I'd just expose them all all the time.

>> 2) ISTR an argument about mapping the ISR register separately, for
>>    performance, but I can't find a reference to it.
>
> I think the rationale is that ISR really needs to be PIO but everything
> else doesn't. PIO is much faster on x86 because it doesn't require
> walking page tables or instruction emulation to handle the exit.

Is this still a pressing issue? With MSI-X enabled ISR isn't needed, correct? Which would imply that pretty much only old guests without MSI-X support need this, and we don't need to worry that much when designing something new ...

cheers,
  Gerd
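Gerd's proposed capability amounts to a small table that maps each register set to a (BAR, offset) pair. A sketch of how a driver might resolve such a table to addresses is given below. It rests on several assumptions: the offsets in the example are read as hexadecimal (as lspci prints them), the BAR base addresses passed in are invented for the illustration, and the type and function names are hypothetical.

```python
from collections import namedtuple

# One entry of the proposed capability: which BAR a register set lives in, and where.
RegisterSet = namedtuple("RegisterSet", ["bar", "offset"])

# Values taken from the example in the message, offsets assumed to be hexadecimal.
VIRTIO_REGS_CAP = {
    "legacy": RegisterSet(bar=0, offset=0x0),
    "virtio-pci": RegisterSet(bar=1, offset=0x1000),
    "virtio-cfg": RegisterSet(bar=1, offset=0x1800),
}


def locate(cap, name, bar_bases):
    """Address of a register set, given a mapping from BAR number to base address."""
    register_set = cap[name]
    return bar_bases[register_set.bar] + register_set.offset
```

Because the driver looks the location up instead of assuming it, a device can omit the "legacy" entry or move the other two without any change to the lookup code, which is the flexibility argued for in the message.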
Re: [Qemu-devel] Using PCI config space to indicate config location
Gerd Hoffmann writes:

> Hi,
>
>> But I think we could solve this in a different way. I think we could
>> just move the virtio configuration space to BAR1 by using a transport
>> feature bit.
>
> Why hard-code stuff?
>
> I think it makes alot of sense to have a capability simliar to msi-x
> which simply specifies bar and offset of the register sets:
>
> [root@fedora ~]# lspci -vvs4
> 00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
>         [ ... ]
>         Region 0: I/O ports at c000 [size=64]
>         Region 1: Memory at fc029000 (32-bit) [size=4K]
>         Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>                 Vector table: BAR=1 offset=
>                 PBA: BAR=1 offset=0800

MSI-X capability is a standard PCI capability which is why lspci can parse it.

>
> So we could have for virtio something like this:
>
>         Capabilities: [??] virtio-regs:
>                 legacy: BAR=0 offset=0
>                 virtio-pci: BAR=1 offset=1000
>                 virtio-cfg: BAR=1 offset=1800

This would be a vendor specific PCI capability so lspci wouldn't automatically know how to parse it.

You could just as well teach lspci to parse BAR0 to figure out what features are supported.

>> That then frees up the entire BAR0 for use as virtio-pci registers. We
>> can then always include the virtio-pci MSI-X register space and
>> introduce all new virtio-pci registers as simply being appended.
>
> BAR0 needs to stay as-is for compatibility reasons. New devices which
> don't have to care about old guests don't need to provide a 'legacy'
> register region.

A latch feature bit would allow the format to change without impacting compatibility at all.

>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>    performance, but I can't find a reference to it.
>>
>> I think the rationale is that ISR really needs to be PIO but everything
>> else doesn't. PIO is much faster on x86 because it doesn't require
>> walking page tables or instruction emulation to handle the exit.
>
> Is this still a pressing issue? With MSI-X enabled ISR isn't needed,
> correct? Which would imply that pretty much only old guests without
> MSI-X support need this, and we don't need to worry that much when
> designing something new ...

It wasn't that long ago that MSI-X wasn't supported. I think we should continue to keep ISR as PIO as it is a fast path.

Regards,

Anthony Liguori

>
> cheers,
>   Gerd
Re: [PATCH] KVM: x86: Make emulator_fix_hypercall static
On Thu, Sep 20, 2012 at 07:43:17AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka
>
> No users outside of kvm/x86.c.
>
> Signed-off-by: Jan Kiszka

Applied, thanks.
Re: [PATCH] KVM: x86: Convert kvm_arch_vcpu_reset into private kvm_vcpu_reset
On Thu, Sep 20, 2012 at 07:43:08AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka
>
> There are no external callers of this function as there is no concept of
> resetting a vcpu from generic code.
>
> Signed-off-by: Jan Kiszka

Applied, thanks.
Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
On Wed, Sep 19, 2012 at 05:44:46PM +, Auld, Will wrote:
> From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 2001
> From: Will Auld
> Date: Wed, 12 Sep 2012 18:10:56 -0700
> Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
>
> CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
>
> Basic design is to emulate the MSR by allowing reads and writes to a guest
> vcpu specific location to store the value of the emulated MSR while adding
> the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
> be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
> is of course as long as the "use TSC counter offsetting" VM-execution control
> is enabled as well as the IA32_TSC_ADJUST control.
>
> However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmcs
> tsc_offset for a guest process when it does an rdtsc (with the correct
> settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
> of these three locations. The argument against storing it in the actual MSR
> is performance. This is likely to be seldom used while the save/restore is
> required on every transition. IA32_TSC_ADJUST was created as a way to solve
> some issues with writing TSC itself so that is not an option either. The
> remaining option, defined above as our solution, has the problem of returning
> incorrect vmcs tsc_offset values (unless we intercept and fix, not done here)
> as mentioned above. However, more problematic is that storing the data in
> vmcs tsc_offset will have a different semantic effect on the system than does
> using the actual MSR. This is illustrated in the following example: The
> hypervisor sets the IA32_TSC_ADJUST, then the guest sets it and a guest
> process performs a rdtsc. In this case the guest process will get TSC +
> IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including IA32_TSC_ADJUST_guest.
> While the total system semantics changed the semantics as seen by the guest
> do not and hence this will not cause a problem.
> ---
>  arch/x86/include/asm/cpufeature.h |  1 +
>  arch/x86/include/asm/kvm_host.h   |  2 ++
>  arch/x86/include/asm/msr-index.h  |  1 +
>  arch/x86/kvm/cpuid.c              |  4 ++--
>  arch/x86/kvm/vmx.c                | 12
>  arch/x86/kvm/x86.c                |  1 +
>  6 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 6b7ee5f..e574d81 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -199,6 +199,7 @@
>
>  /* Intel-defined CPU features, CPUID level 0x0007:0 (ebx), word 9 */
>  #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> +#define X86_FEATURE_TSC_ADJUST (9*32+ 1) /* TSC adjustment MSR 0x3b */
>  #define X86_FEATURE_BMI1 (9*32+ 3) /* 1st group bit manipulation extensions */
>  #define X86_FEATURE_HLE (9*32+ 4) /* Hardware Lock Elision */
>  #define X86_FEATURE_AVX2 (9*32+ 5) /* AVX2 instructions */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09155d6..8a001a4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
>      u32 virtual_tsc_mult;
>      u32 virtual_tsc_khz;
>
> +    s64 tsc_adjust;
> +
>      atomic_t nmi_queued; /* unprocessed asynchronous NMIs */
>      unsigned nmi_pending; /* NMI queued after currently running handler */
>      bool nmi_injected; /* Trying to inject an NMI this entry */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 957ec87..8e82e29 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -231,6 +231,7 @@
>  #define MSR_IA32_EBL_CR_POWERON 0x002a
>  #define MSR_EBC_FREQUENCY_ID 0x002c
>  #define MSR_IA32_FEATURE_CONTROL 0x003a
> +#define MSR_TSC_ADJUST 0x003b
>
>  #define FEATURE_CONTROL_LOCKED (1<<0)
>  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0595f13..8f5943e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>
>      /* cpuid 7.0.ebx */
>      const u32 kvm_supported_word9_x86_features =
> -        F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> -        F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> +        F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> +        F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
>
>      /* all calls to cpuid_count() should be made on the same cpu */
>      get_cpu();
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86
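The design described in the patch text (keep the guest-written IA32_TSC_ADJUST value in a per-vcpu field and fold it into the vmcs tsc_offset) can be modelled in a few lines of plain C. This is only a sketch of the arithmetic, not the code from the patch: the struct and the model_* function names are made up for illustration, and tsc_offset here is an ordinary variable standing in for the VMCS field.

```c
#include <stdint.h>

/* Hypothetical per-vcpu state; names are illustrative, not from the patch. */
struct vcpu_tsc_model {
    int64_t  tsc_adjust;  /* value the guest last wrote to MSR 0x3b */
    uint64_t tsc_offset;  /* stands in for the vmcs tsc_offset field */
};

/* Guest wrmsr to IA32_TSC_ADJUST: fold the change into tsc_offset. */
void model_write_tsc_adjust(struct vcpu_tsc_model *v, int64_t val)
{
    /* Unsigned arithmetic so that wrap-around is well defined. */
    uint64_t delta = (uint64_t)val - (uint64_t)v->tsc_adjust;

    v->tsc_offset += delta;
    v->tsc_adjust = val;
}

/* Guest rdmsr of IA32_TSC_ADJUST: return the stored value. */
int64_t model_read_tsc_adjust(const struct vcpu_tsc_model *v)
{
    return v->tsc_adjust;
}

/* What a guest rdtsc observes when TSC offsetting is enabled. */
uint64_t model_guest_rdtsc(const struct vcpu_tsc_model *v, uint64_t host_tsc)
{
    return host_tsc + v->tsc_offset;
}
```

Only the difference between the new and the old adjust value is applied, so repeated writes of the same value leave the offset unchanged in this model.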
Re: [Question] Intercept CR3 access in EPT
On Mon, Oct 08, 2012 at 04:15:57PM +0800, R wrote:
> Hi,
>
> I am a student. And my teacher told me to monitor every process in guest.
> So, I try to intercept every Cr3 access. However, if kvm is loaded
> with EPT enable, Accesses to Cr3 would not cause VM-exit.

Disable EPT by loading kvm-intel.ko module with enable_ept=0 parameter.
Then, CR3 accesses will trap.

> I modified the code to change vmcs configuration.
> To be specific, these functions are rewritten.
>
> static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>                                        unsigned long cr0,
>                                        struct kvm_vcpu *vcpu)
> {
>     
>     } else if (!is_paging(vcpu)) {
>         /* From nonpaging to paging */
>         vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
>                      vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
> -                    ~(CPU_BASED_CR3_LOAD_EXITING |
> +                    ~(// CPU_BASED_CR3_LOAD_EXITING|
>                        CPU_BASED_CR3_STORE_EXITING));
>     
> }
>
> static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> {
>     ...
>     if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
>         /* CR3 accesses and invlpg don't need to cause VM Exits when EPT
>            enabled */
> -       _cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
> +       _cpu_based_exec_control &= ~( // CPU_BASED_CR3_LOAD_EXITING |
>                                      CPU_BASED_CR3_STORE_EXITING |
>                                      CPU_BASED_INVLPG_EXITING);
>     
> }
>
> I thought it can force every Cr3 access to be trapped with EPT enable.
> However, VM seems to fail to boot when it changes from nonpaging to
> paging.
> Do U guys have any idea? Or Can someone tell me how can I intercept
> Cr3 access and why can not it work?
>
> Thank U for answering.
>
> --
> Thanks
> Rui Wu
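To confirm from the host that EPT really is off after reloading the module, the current setting can be read back from sysfs. A minimal sketch, under one assumption worth stating loudly: as far as I know the variable enable_ept in vmx.c is exposed as the module parameter "ept", so the file read below is /sys/module/kvm_intel/parameters/ept. If your kernel names it differently, adjust the path. The helper names are made up for this example.

```c
#include <stdio.h>

/* Interpret the text of a boolean module parameter: 'Y', 'y' or '1' means set. */
int param_is_enabled(const char *text)
{
    return text[0] == 'Y' || text[0] == 'y' || text[0] == '1';
}

/*
 * Returns 1 if kvm_intel reports EPT enabled, 0 if disabled, and -1 if
 * the parameter file cannot be read (module not loaded, different
 * parameter name, and so on).  The path is an assumption, see above.
 */
int kvm_intel_ept_enabled(void)
{
    char buf[8] = "";
    FILE *f = fopen("/sys/module/kvm_intel/parameters/ept", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return param_is_enabled(buf);
}
```

A return value of 0 means the shadow paging path is in use, which is the configuration in which CR3 accesses are expected to trap.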
Re: Failed to get host power management capabilities
On Fri, Oct 05, 2012 at 06:22:58AM -0600, David Torres wrote:
> Hi all,
>
> My name is David Torres, I am from Costa Rica. See this is the problem I have
> with the KVM installation:
>
> 2012-10-03 20:28:17.395+: 25793: warning : qemuCapsInit:856 : Failed to get host power management capabilities
> 2012-10-03 20:28:17.661+: 25793: error : virExecWithHook:328 : Cannot find 'pm-is-supported' in path: No such file or directory
>
> And this error from the kern.log:
>
> Oct 4 21:50:53 kvm kernel: [22727.849902] device vnet0 entered promiscuous mode
> Oct 4 21:50:53 kvm kernel: [22727.883686] br0: port 2(vnet0) entering forwarding state
> Oct 4 21:50:53 kvm kernel: [22727.883692] br0: port 2(vnet0) entering forwarding state
> Oct 4 21:50:53 kvm kernel: [22728.130242] br0: port 2(vnet0) entering forwarding state
> Oct 4 21:50:53 kvm kernel: [22728.134443] br0: port 2(vnet0) entering disabled state
> Oct 4 21:50:53 kvm kernel: [22728.135238] device vnet0 left promiscuous mode
> Oct 4 21:50:53 kvm kernel: [22728.135242] br0: port 2(vnet0) entering disabled state
> Oct 4 21:50:54 kvm kernel: [22728.673620] type=1400 audit(1349409054.320:42): apparmor="STATUS" operation="profile_remove" name="libvirt-9b75f498-7959-7321-9461-d729d9c60668" pid=6349 comm="apparmor_parser"
>
> And when I try to create a Virtual Machine from the Virtual Machine Monitor,
> I got this error message:
>
> 2012-10-04 22:59:32.154+: 1333: error : qemuProcessReadLogOutput:1006 : internal error Process exited while reading console log output: char device redirected to /dev/pts/2
> Could not access KVM kernel module: Is a directory
> failed to initialize KVM: Is a directory
> No accelerator found!
>
> I already enabled the virtualization feature on the BIOS and run the modprobe
> kvm-intel command also.
> No error message was received during the installation process.
>
> So I am stuck on this problem :( I will appreciate sooo much any help.
>
> Thank you so much in advance
>
> Regards

David,

Please send this message to the libvirt list, at
libvir-l...@redhat.com.
Re: Steal time in KVM
On Mon, Oct 08, 2012 at 02:55:25AM -0500, Abhishek Gupta wrote:
> Hi,
>
> I am trying to get the steal time with 2 VMs (each with 1 Vcpu) pinned
> to same core.
>
> While finding documentation on this, I came across your patches and
> posts related to the implementation of this feature, so I thought it
> would be best to ask you.
>
> I run the same application on these 2 VMs simultaneously and see the
> performance difference. I am trying to read the steal time from inside
> the guest using top, vmstat etc.
>
> Both, top and vmstat -s report the steal time (st) as 0. I also
> checked that procps is in latest version. I am using virtio-net. I
> suspect that the steal time is not being updated well. Is there
> something which I need to configure for this to work? My Linux version
> for guest image is:
>
> Linux server-147 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26
> UTC 2010 x86_64 GNU/Linux
>
> And /proc/cpuinfo shows i:
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 2
> model name : QEMU Virtual CPU version 0.14.0
> stepping : 3
> cpu MHz : 2992.498
> cache size : 4096 KB
> fpu : yes
> fpu_exception : yes
> cpuid level : 4
> wp : yes
> flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good pni cx16
> hypervisor lahf_lm
> bogomips : 5984.99
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management:
>
> Another question I had was that is there a way to programmatically get
> the value of steal cycles (e.g. from a C program) ?
> Thanks,
>
> Abhishek

Make sure CONFIG_SCHEDSTATS is enabled in your host kernel.

> On Monday, June 13, 2011 6:40:02 PM UTC-5, Glauber Costa wrote:
> > Hi,
> >
> > This series is a repost of the last series I posted about this.
> > It tries to address most concerns that were raised at the time,
> > plus makes uses of the static_branch interface to disable the
> > steal code when not in use.
> >
> > Glauber Costa (7):
> >   KVM-HDR Add constant to represent KVM MSRs enabled bit
> >   KVM-HDR: KVM Steal time implementation
> >   KVM-HV: KVM Steal time implementation
> >   KVM-GST: Add a pv_ops stub for steal time
> >   KVM-GST: KVM Steal time accounting
> >   KVM-GST: adjust scheduler cpu power
> >   KVM-GST: KVM Steal time registration
> >
> >  Documentation/kernel-parameters.txt   |  4 ++
> >  Documentation/virtual/kvm/msr.txt     | 33 +
> >  arch/x86/Kconfig                      | 12 +
> >  arch/x86/include/asm/kvm_host.h       |  8 +++
> >  arch/x86/include/asm/kvm_para.h       | 15 ++
> >  arch/x86/include/asm/paravirt.h       |  9
> >  arch/x86/include/asm/paravirt_types.h |  1 +
> >  arch/x86/kernel/kvm.c                 | 72 +
> >  arch/x86/kernel/kvmclock.c            |  2 +
> >  arch/x86/kernel/paravirt.c            |  9
> >  arch/x86/kvm/x86.c                    | 60 +++-
> >  kernel/sched.c                        | 81 +
> >  kernel/sched_features.h               |  4 +-
> >  13 files changed, 296 insertions(+), 14 deletions(-)
> >
> > --
> > 1.7.3.4
Re: [PATCH 0/3] virtio-net: inline header support
On Wed, Oct 03, 2012 at 04:14:17PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" writes:
>
> > Thinking about Sasha's patches, we can reduce ring usage
> > for virtio net small packets dramatically if we put
> > virtio net header inline with the data.
> > This can be done for free in case guest net stack allocated
> > extra head room for the packet, and I don't see
> > why would this have any downsides.
>
> I've been wanting to do this for the longest time... but...
>
> > Even though with my recent patches qemu
> > no longer requires header to be the first s/g element,
> > we need a new feature bit to detect this.
> > A trivial qemu patch will be sent separately.
>
> There's a reason I haven't done this. I really, really dislike "my
> implementation isn't broken" feature bits. We could have an infinite
> number of them, for each bug in each device.
>
> So my plan was to tie this assumption to the new PCI layout.

I don't object but old qemu has this limitation for s390 as well,
and that's not using PCI, right? So how do we detect
new hypervisor there?

--
MST
Re: Steal time in KVM
Thanks for the answer. I am more of an application user. Is there a
quick way to check this flag and enable it if it's disabled?

Abhishek

On Mon, Oct 8, 2012 at 2:39 PM, Marcelo Tosatti wrote:
> Make sure CONFIG_SCHEDSTATS is enabled in your host kernel.
Re: Steal time in KVM
I think this flag is enabled since I see that there is some information

cat /proc/schedstat
version 15
timestamp 4332448322
cpu0 12 0 65142011 32031105 32750248 22717261 2447620065997 58787405876 32964108
domain0 ,,,,,,,0005 3254935 3249367 2404 4821929 3229 0 96 3249269 6990 6986 0 2577 4 0 0 6986 23305437 22926698 242127 354902327 136655 0 22841 22903857 0 0 0 0 0 0 0 0 0 724375 51388 0
domain1 ,,,,,,,000f 3184279 3170371 13759 16042900 159 9 691 3169674 6836 6827 9 10875 0 0 2 410 23168825 22679032 475057 534512726 14817 23 66639 22612393 6 0 6 0 0 0 0 0 0 1884449 6647 0
cpu1 0 0 2140210892 1069012296 1070207005 1067616970 4593016088954 1172134807032 1071029875
domain0 ,,,,,,,000a 4637501 4632524 2203 6211781 2780 1 2 4632522 6282 6281 1 856 0 0 0 6281 14462483 14303529 126376 156661372 32595 0 9195 14294334 0 0 0 0 0 0 0 0 0 1234313 30153 0
domain1 ,,,,,,,000f 5766088 5765346 577 841712 179 2 110 5765235 8803 8803 0 0 0 0 0 1514 14429905 14181972 228676 264974439 19576 0 35872 14146100 1 0 1 0 0 0 0 0 0 7622572 16252 0
cpu2 0 0 52577101 25565275 26801121 24132631 812994237346 43152983079 26937729
domain0 ,,,,,,,0005 1125081 1123749 733 2378290 602 0 8 1123742 13138 13137 1 2018 0 0 0 13137 23954299 23750670 88041 190709270 115590 2 10688 23739982 0 0 0 0 0 0 0 0 0 979865 57090 0
domain1 ,,,,,,,000f 1052559 1052538 15 114576 7 0 0 25607 13113 13113 0 0 0 0 0 55 23838711 23200171 613151 756023415 25505 0 52756 23147415 0 0 0 0 0 0 0 0 0 597157 46792 0
cpu3 112320 0 28758268 13561833 14268286 12145196 2012983455891 88768447010 14923733
domain0 ,,,,,,,000a 1705787 1704495 727 5715943 571 2 6 1704489 22311 22311 0 1476 0 0 0 22311 12332932 12230181 67994 135432986 34760 4 7662 1519 1 0 1 0 0 0 0 0 0 997198 25844 0
domain1 ,,,,,,,000f 1873701 1873686 8 15157 5 0 0 126733 30014 30014 0 0 0 0 0 48 12298175 11963494 304127 386917447 32071 0 44945 11918549 0 0 0 0 0 0 0 0 0 3374661 31194 0

Abhishek

On Mon, Oct 8, 2012 at 2:42 PM, Abhishek Gupta wrote:
> Thanks for the answer. I am more of an application user. Is there a
> quick way to check this flag and enable it if its disabled?
> Abhishek
Re: Steal time in KVM
On Mon, Oct 08, 2012 at 02:47:59PM -0500, Abhishek Gupta wrote:
> I think this flag is enabled since I see that there is some information
>
> cat /proc/schedstat

This is in the host? Then, yes, the host has schedstat enabled.

Definition of steal time: the amount of time in which this vCPU did not
run.

So with some CPU load on the host system, you should see "steal time"
!= 0 in the guest system.
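On the earlier question of reading the value from a C program: inside the guest, the counter that top and vmstat display comes from /proc/stat, where steal is the eighth number on the aggregate "cpu" line. A minimal sketch of reading it follows; the function names are made up for this example, and the result is in clock ticks (USER_HZ, see sysconf(_SC_CLK_TCK)), not CPU cycles. On a guest kernel that lacks the steal time patches quoted in this thread the field is expected to stay at 0 no matter how it is read.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the aggregate "cpu" line of /proc/stat and store the steal
 * counter (8th numeric field, in USER_HZ ticks) in *steal.
 * Returns 0 on success, -1 if the line is not in the expected form.
 */
int parse_steal_ticks(const char *line, unsigned long long *steal)
{
    unsigned long long user, nice, sys, idle, iowait, irq, softirq;

    /* "cpu " with a trailing space excludes the per-cpu lines (cpu0, ...) */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    if (sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice, &sys, &idle, &iowait, &irq, &softirq,
               steal) != 8)
        return -1;
    return 0;
}

/* Read the current steal counter from /proc/stat (Linux only). */
int read_steal_ticks(unsigned long long *steal)
{
    char line[512];
    FILE *f = fopen("/proc/stat", "r");
    int ret = -1;

    if (!f)
        return -1;
    /* The aggregate line is the first line of the file. */
    if (fgets(line, sizeof(line), f))
        ret = parse_steal_ticks(line, steal);
    fclose(f);
    return ret;
}
```

Calling read_steal_ticks() twice and subtracting the results gives the steal time accumulated over that interval, which is usually more useful than the absolute counter.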
Re: Steal time in KVM
Yes, that is in the host, but in the guest I always see "steal time" = 0.
I checked through /proc/stat, mpstat, top, vmstat. Not sure why the
steal time information is not getting propagated to the guest.

Thanks,
Abhishek

On Mon, Oct 8, 2012 at 3:02 PM, Marcelo Tosatti wrote:
> On Mon, Oct 08, 2012 at 02:47:59PM -0500, Abhishek Gupta wrote:
>> I think this flag is enabled since I see that there is some information
>>
>> cat /proc/schedstat
>
> This is in the host? Then, yes, the host has schedstat enabled.
>
> Definition of steal time: the amount of time in which this vCPU did not
> run.
>
> So with some CPU load on the host system, you should see "steal time"
> != 0 in the guest system.
Re: [Qemu-devel] Using PCI config space to indicate config location
  Hi,

>> So we could have for virtio something like this:
>>
>>     Capabilities: [??] virtio-regs:
>>         legacy: BAR=0 offset=0
>>         virtio-pci: BAR=1 offset=1000
>>         virtio-cfg: BAR=1 offset=1800
>
> This would be a vendor specific PCI capability so lspci wouldn't
> automatically know how to parse it.

Sure, would need a patch to actually parse+print the cap,
/me was just trying to make my point clear in a simple way.

>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>    performance, but I can't find a reference to it.
>>>
>>> I think the rationale is that ISR really needs to be PIO but everything
>>> else doesn't. PIO is much faster on x86 because it doesn't require
>>> walking page tables or instruction emulation to handle the exit.
>>
>> Is this still a pressing issue? With MSI-X enabled ISR isn't needed,
>> correct? Which would imply that pretty much only old guests without
>> MSI-X support need this, and we don't need to worry that much when
>> designing something new ...
>
> It wasn't that long ago that MSI-X wasn't supported.. I think we should
> continue to keep ISR as PIO as it is a fast path.

No problem if we allow to have both legacy layout and new layout at the
same time. Guests can continue to use ISR @ BAR0 in PIO space for
existing virtio devices, even in case they want use mmio for other
registers -> all fine.

New virtio devices can support MSI-X from day one and decide to not
expose a legacy layout PIO bar.

cheers,
  Gerd
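For readers following the capability idea: a driver or a patched lspci would find such an entry by walking the standard PCI capability list and then decoding the vendor-specific payload. The walk below uses the standard config-space offsets (status register at 0x06, capability pointer at 0x34, vendor-specific capability ID 0x09). The payload layout is purely hypothetical and invented for this sketch (one byte for the BAR at +3, a little-endian 32-bit offset at +4); nothing in this thread defines it.

```c
#include <stdint.h>

#define PCI_STATUS           0x06  /* low byte of the status register */
#define PCI_STATUS_CAP_LIST  0x10  /* device implements a capability list */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability */
#define PCI_CAP_ID_VNDR      0x09  /* vendor-specific capability */
#define PCI_CAP_ID_MSIX      0x11  /* MSI-X capability */

/* Hypothetical "virtio-regs" entry; this layout is NOT from any spec. */
struct virtio_regs_entry {
    uint8_t  bar;
    uint32_t offset;
};

/* Return the config-space offset of the first capability with the
 * given ID, or 0 if the device has none. */
uint8_t pci_find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
    uint8_t pos;
    int ttl = 48;  /* bound the walk in case of a looping list */

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;
    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xfc;  /* next pointer */
    }
    return 0;
}

/* Decode the hypothetical payload: BAR at +3, LE offset at +4..+7. */
int virtio_regs_decode(const uint8_t cfg[256], uint8_t pos,
                       struct virtio_regs_entry *e)
{
    if (pos < 0x40 || pos > 256 - 8)
        return -1;
    e->bar = cfg[pos + 3];
    e->offset = (uint32_t)cfg[pos + 4] |
                ((uint32_t)cfg[pos + 5] << 8) |
                ((uint32_t)cfg[pos + 6] << 16) |
                ((uint32_t)cfg[pos + 7] << 24);
    return 0;
}
```

The same walk with PCI_CAP_ID_MSIX is how the MSI-X entry in the lspci output above would be located; only the payload decoding differs.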
Re: [PATCH 0/3] virtio-net: inline header support
On Thu, Oct 04, 2012 at 01:04:56PM +0930, Rusty Russell wrote:
> Anthony Liguori writes:
> > Rusty Russell writes:
> >
> >> "Michael S. Tsirkin" writes:
> >>
> >>> Thinking about Sasha's patches, we can reduce ring usage
> >>> for virtio net small packets dramatically if we put
> >>> virtio net header inline with the data.
> >>> This can be done for free in case guest net stack allocated
> >>> extra head room for the packet, and I don't see
> >>> why would this have any downsides.
> >>
> >> I've been wanting to do this for the longest time... but...
> >>
> >>> Even though with my recent patches qemu
> >>> no longer requires header to be the first s/g element,
> >>> we need a new feature bit to detect this.
> >>> A trivial qemu patch will be sent separately.
> >>
> >> There's a reason I haven't done this. I really, really dislike "my
> >> implementation isn't broken" feature bits. We could have an infinite
> >> number of them, for each bug in each device.
> >
> > This is a bug in the specification.
> >
> > The QEMU implementation pre-dates the specification. All of the actual
> > implementations of virtio relied on the semantics of s/g elements and
> > still do.
>
> lguest fix is pending in my queue. lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is? But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

If you look at my patch you'll notice there's also
a comment in virtio_net.h that seems to be broken in this respect:

/* This is the first element of the scatter-gather list. If you don't
 * specify GSO or CSUM features, you can simply ignore the header. */

There is a similar comment in virtio-blk.
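For anyone unfamiliar with the idea under discussion, "inline header" means writing the virtio-net header into the headroom directly in front of the packet data, so that a single scatter-gather element can cover header and data together. The sketch below only illustrates that placement; it is not taken from the patches in this series, and place_hdr_inline is a made-up helper. The struct is the 10-byte legacy virtio-net header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Legacy virtio-net header: 10 bytes, no padding. */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/*
 * buf holds a packet whose data starts at data_off; the bytes before
 * data_off are free headroom.  If the headroom can hold the header,
 * copy it immediately in front of the data and return the offset at
 * which the combined header+data region starts.  Return -1 when the
 * headroom is too small, in which case the caller would fall back to
 * a separate s/g element for the header.
 */
long place_hdr_inline(uint8_t *buf, size_t data_off,
                      const struct virtio_net_hdr *hdr)
{
    if (data_off < sizeof(*hdr))
        return -1;
    memcpy(buf + data_off - sizeof(*hdr), hdr, sizeof(*hdr));
    return (long)(data_off - sizeof(*hdr));
}
```

Whether the host accepts a header that shares an element with the data is exactly the compatibility question debated above, which is why a feature bit (or a tie to the new PCI layout) comes up.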
Re: [PATCH 2/3] KVM: PPC: Add SPR emulation exits
On 10/07/2012 08:30:06 AM, Alexander Graf wrote:
> On 07.10.2012, at 15:26, Avi Kivity wrote:
>
>> The downside of this generic approach is that it prepares surprises down
>> the road. The alternative approach, of adding a new KVM_EXIT_RESET,
>> avoids this minefield, but requires ABI changes every time we want to
>> emulate something in userspace. Can you provide a critique of this
>> alternate approach?
>
> Yeah, it doesn't scale as well. The SPR read/write give us all
> information we need to emulate other registers too, like the magical
> "read this SPR and automatically get the interrupt vector from the MPIC
> and ack the interrupt along the way" register we have on e500.

That's not actually how the register works in hardware (though it may
be a reasonable way to emulate it with a userspace mpic). The
interrupt is acknowledged when the core branches to the interrupt
vector. The register itself is just storage that gets filled when that
happens.

-Scott
Re: [RFC PATCH] KVM: PPC: Remove 44x target
On 10/07/2012 08:58:02 AM, Alexander Graf wrote:
> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
> index 099fe82..4421293 100644
> --- a/arch/powerpc/kvm/bookehv_interrupts.S
> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
> @@ -23,7 +23,6 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>  #include
>  #include

This could come out of here regardless of whether 44x stays.

> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 7d120dc..f522110 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -158,10 +158,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
>  	}
>  	case KVM_HCALL_TOKEN(KVM_HC_FEATURES):
>  		r = EV_SUCCESS;
> -#if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500V2)
> -		/* XXX Missing magic page on 44x */
>  		r2 |= (1 << KVM_FEATURE_MAGIC_PAGE);
> -#endif

We also don't support this on e500mc -- or at least it hasn't been
tested, and would be pretty pointless, since nothing in the magic page
traps on e500mc.

-Scott
Re: [RFC PATCH] KVM: PPC: Remove 44x target
On 08.10.2012, at 22:50, Scott Wood wrote: > On 10/07/2012 08:58:02 AM, Alexander Graf wrote: >> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S >> b/arch/powerpc/kvm/bookehv_interrupts.S >> index 099fe82..4421293 100644 >> --- a/arch/powerpc/kvm/bookehv_interrupts.S >> +++ b/arch/powerpc/kvm/bookehv_interrupts.S >> @@ -23,7 +23,6 @@ >> #include >> #include >> #include >> -#include >> #include >> #include >> #include > > This could come out of here regardless of whether 44x stays. :) > >> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c >> index 7d120dc..f522110 100644 >> --- a/arch/powerpc/kvm/powerpc.c >> +++ b/arch/powerpc/kvm/powerpc.c >> @@ -158,10 +158,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) >> } >> case KVM_HCALL_TOKEN(KVM_HC_FEATURES): >> r = EV_SUCCESS; >> -#if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500V2) >> -/* XXX Missing magic page on 44x */ >> r2 |= (1 << KVM_FEATURE_MAGIC_PAGE); >> -#endif > > We also don't support this on e500mc -- or at least it hasn't been tested, > and would be pretty pointless nothing in the magic page traps on e500mc. Ah, good point! Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Using PCI config space to indicate config location
Gerd Hoffmann writes: > Hi, > >>> So we could have for virtio something like this: >>> >>> Capabilities: [??] virtio-regs: >>> legacy: BAR=0 offset=0 >>> virtio-pci: BAR=1 offset=1000 >>> virtio-cfg: BAR=1 offset=1800 >> >> This would be a vendor specific PCI capability so lspci wouldn't >> automatically know how to parse it. > > Sure, would need a patch to actually parse+print the cap, > /me was just trying to make my point clear in a simple way. > > 2) ISTR an argument about mapping the ISR register separately, for >performance, but I can't find a reference to it. I think the rationale is that ISR really needs to be PIO but everything else doesn't. PIO is much faster on x86 because it doesn't require walking page tables or instruction emulation to handle the exit. >>> >>> Is this still a pressing issue? With MSI-X enabled ISR isn't needed, >>> correct? Which would imply that pretty much only old guests without >>> MSI-X support need this, and we don't need to worry that much when >>> designing something new ... >> >> It wasn't that long ago that MSI-X wasn't supported.. I think we should >> continue to keep ISR as PIO as it is a fast path. > > No problem if we allow to have both legacy layout and new layout at the > same time. Guests can continue to use ISR @ BAR0 in PIO space for > existing virtio devices, even in case they want use mmio for other > registers -> all fine. > > New virtio devices can support MSI-X from day one and decide to not > expose a legacy layout PIO bar. I think having BAR1 be an MMIO mirror of the registers + a BAR2 for virtio configuration space is probably not that bad of a solution. 
Regards,
Anthony Liguori

> cheers,
> Gerd
Re: [PATCH 2/3] KVM: PPC: Add SPR emulation exits
On 08.10.2012, at 22:45, Scott Wood wrote: > On 10/07/2012 08:30:06 AM, Alexander Graf wrote: >> On 07.10.2012, at 15:26, Avi Kivity wrote: >> > The downside of this generic approach is that it prepares suprises down >> > the road. The alternative approach, of adding a new KVM_EXIT_RESET, >> > avoids this minefield, but requires ABI changes every time we want to >> > emulate something in userspace. Can you provide a critique of this >> > alternate approach? >> Yeah, it doesn't scale as well. The SPR read/write give us all information >> we need to emulate other registers too, like the magical "read this SPR and >> automatically get the interrupt vector from the MPIC and ack the interrupt >> along the way" register we have on e500. > > That's not actually how the register works in hardware (though it may be a > reasonable way to emulate it with a userspace mpic). The interrupt is > acknowledged when the core branches to the interrupt vector. The register > itself is just storage that gets filled when that happens. Mind to enlighten me again on how exactly this mode gets enabled so that an OS that does not make use of the SPR can still ask the MPIC by hand :)? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] KVM: PPC: Add SPR emulation exits
On 10/08/2012 04:01:11 PM, Alexander Graf wrote: On 08.10.2012, at 22:45, Scott Wood wrote: > On 10/07/2012 08:30:06 AM, Alexander Graf wrote: >> On 07.10.2012, at 15:26, Avi Kivity wrote: >> > The downside of this generic approach is that it prepares suprises down >> > the road. The alternative approach, of adding a new KVM_EXIT_RESET, >> > avoids this minefield, but requires ABI changes every time we want to >> > emulate something in userspace. Can you provide a critique of this >> > alternate approach? >> Yeah, it doesn't scale as well. The SPR read/write give us all information we need to emulate other registers too, like the magical "read this SPR and automatically get the interrupt vector from the MPIC and ack the interrupt along the way" register we have on e500. > > That's not actually how the register works in hardware (though it may be a reasonable way to emulate it with a userspace mpic). The interrupt is acknowledged when the core branches to the interrupt vector. The register itself is just storage that gets filled when that happens. Mind to enlighten me again on how exactly this mode gets enabled so that an OS that does not make use of the SPR can still ask the MPIC by hand :)? GCR[M] is set to 3 for external proxy mode, versus 1 for traditional operation. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
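The mode switch Scott describes can be sketched as a small helper on the GCR value. The field values 1 (traditional operation) and 3 (external proxy mode) come from the message above; the position of the M field inside the register (`GCR_M_SHIFT`) and all names are assumptions made for illustration and should be checked against the MPIC documentation for the part in question.

```c
#include <stdint.h>

/* Values of the GCR[M] mode field, as given in the message above. */
enum mpic_mode {
    MPIC_MODE_TRADITIONAL = 1, /* OS asks the MPIC by hand */
    MPIC_MODE_EXT_PROXY   = 3  /* external proxy mode */
};

/* ASSUMPTION: bit position of the 2-bit M field in the 32-bit GCR. */
#define GCR_M_SHIFT 29u
#define GCR_M_MASK  (UINT32_C(3) << GCR_M_SHIFT)

/* Return gcr with the M field replaced by `mode`; other bits are kept. */
uint32_t gcr_set_mode(uint32_t gcr, enum mpic_mode mode)
{
    return (gcr & ~GCR_M_MASK) | ((uint32_t)mode << GCR_M_SHIFT);
}

/* Return 1 if the M field selects external proxy mode, 0 otherwise. */
int gcr_is_ext_proxy(uint32_t gcr)
{
    return ((gcr & GCR_M_MASK) >> GCR_M_SHIFT) == (uint32_t)MPIC_MODE_EXT_PROXY;
}
```

An emulated MPIC in userspace could use a check like `gcr_is_ext_proxy()` to decide whether an interrupt should be acknowledged on delivery or left for the guest to acknowledge explicitly.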
Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts
On Mon, Oct 8, 2012 at 6:04 AM, Dave Martin wrote: > On Fri, Oct 05, 2012 at 10:00:25AM +0100, Russell King - ARM Linux wrote: >> On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote: >> > A good starting point would be load/store emulation as this seems to be a >> > common theme, and we would need a credible deployment for any new >> > framework so that we know it's fit for purpose. >> >> Probably not actually, that code is written to be fast, because things >> like IP stack throughput depend on it - particularly when your network >> card can only DMA packets to 32-bit aligned addresses (resulting in >> virtually all network data being misaligned.) > > A fair point, but surely it would still be worth a try? > > We might decide that a few particular cases of instruction decode > should not use the generic framework for performance reaons, but in > most cases being critically dependent on fault-driven software > emulation for performance would be a serious mistake in the first place > (discussions about the network code notwithstanding). > > This is not an argument for being slower just for the sake of it, but > it can make sense to factor code on paths where performance is not an > issue. > I'm all for unifying this stuff, but I still think it doesn't qualify for holding back on merging KVM patches. The ARM mode instruction decoding can definitely be cleaned up though to look more like the Thumb2 mode decoding which will be a good step before refactoring to use a more common framework. Currently we decode too many types of instructions (not just the ones with cleared HSR.IV) in ARM mode, so the whole complexity of that code can be reduced. I'll give that a go before re-sending the KVM patch series. -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=10002 jiffies)
On 09/30/2012 04:59 AM, Fengguang Wu wrote:
On Sun, Sep 30, 2012 at 01:32:46PM +0200, Avi Kivity wrote:
On 09/30/2012 01:23 PM, Fengguang Wu wrote:
On Sun, Sep 30, 2012 at 01:10:55PM +0200, Avi Kivity wrote:
On 09/28/2012 05:35 AM, Paul E. McKenney wrote:
On Thu, Sep 27, 2012 at 12:40:44PM +0800, Fengguang Wu wrote:
On Wed, Sep 26, 2012 at 09:28:50PM -0700, Paul E. McKenney wrote:
On Thu, Sep 27, 2012 at 10:54:00AM +0800, Fengguang Wu wrote:
On Wed, Sep 26, 2012 at 09:45:43AM -0700, Paul E. McKenney wrote:
On Wed, Sep 26, 2012 at 04:15:01PM +0800, Fengguang Wu wrote:

[ . . . ]

But could you also please send your .config file and a description of

.config attached.

the workload you are running?

It's basically the below commands. The exact initrd is not relevant in this case because it's a boot time warning before user space is started. The stalls roughly happen 1 time on every 10 boots.

Yow!!! You have severe cross-CPU time-synchronization problems. See for example the first dmesg, with the relevant part extracted right here. One CPU believes that it is about 37 seconds past boot, and the other CPU believes that it is about 137 seconds past boot. Given that large of a time difference, an RCU CPU stall warning is expected behavior.

Good spot! Yeah I noticed that huge timestamp gap, however didn't take it seriously enough..

Get your two CPUs in agreement about what time it is, and I bet that the CPU stall warnings will go away.

Possibly KVM related? Because the warnings show up in many test boxes running KVM and so is not likely some hardware specific issue.

I vaguely recall seeing something recently. But let's ask the KVM and timekeeping guys.

From the logs it looks like hpet (why not kvmclock?) is used for the clock, it should not generate such drifts since it is a global clock.

Can you verify current_clocksource on a boot that actually failed (in case the clocksource is switched during runtime)?

I've checked out the dmesg that's cited by Paul, attached. Yes it contains lines

[4.970051] Switching to clocksource hpet

and then

[7.250353] Switching to clocksource tsc

And there is no kvm-clock lines. Oh well for this particular kernel:

Ah, tsc will certainly break on kvm if the hardware doesn't provide a constant tsc source. I'm surprised the guest kernel didn't detect it and switch back to hpet though.

Thanks, it's good to know the root cause. All the dmesgs show the same hpet+tsc switching pattern (and never switch back):

$ grep Switching dmesg-kvm_bisect2-inn-*21
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[ 4.111415] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[ 6.550098] Switching to clocksource tsc

Is this still an open issue? Fengguang's mail sounds like it's resolved, but I'm not sure it is.

The switching from HPET -> TSC I believe is expected, as the refined calibration will delay the TSC from being registered for a few seconds. However, it's unclear why the TSC, if it is faulty, isn't being caught and demoted by the clocksource watchdog.

I'm also curious why this originally bisected down to 06ae115a1d551cd952d8 (when using the kvm clock) if it was more of a hardware issue. And in those logs, I don't see the printk time-stamp inconsistencies that were alluded to in this thread.

Fengguang: Is this still reproducible? Do you have any details (dmesg) about host system as well?

thanks
-john
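For the request earlier in this thread to verify current_clocksource on a failing boot, a guest-side check could look like the sketch below. It reads the standard sysfs attribute; the function name `read_clocksource` is a hypothetical helper, not something from the thread, and the sketch only reports the clocksource selected at the moment it runs, so it would have to be called after the hpet -> tsc switch seen in the dmesg.

```c
#include <stdio.h>
#include <string.h>

/* Standard sysfs location of the active clocksource on Linux. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of `path` into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_clocksource(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling `read_clocksource(CLOCKSOURCE_PATH, buf, sizeof buf)` inside the guest would be expected to yield a name such as hpet, tsc or kvm-clock, depending on what the kernel selected.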
Re: [Qemu-devel] Using PCI config space to indicate config location
Anthony Liguori writes: > Gerd Hoffmann writes: > >> Hi, >> So we could have for virtio something like this: Capabilities: [??] virtio-regs: legacy: BAR=0 offset=0 virtio-pci: BAR=1 offset=1000 virtio-cfg: BAR=1 offset=1800 >>> >>> This would be a vendor specific PCI capability so lspci wouldn't >>> automatically know how to parse it. >> >> Sure, would need a patch to actually parse+print the cap, >> /me was just trying to make my point clear in a simple way. >> >> 2) ISTR an argument about mapping the ISR register separately, for >>performance, but I can't find a reference to it. > > I think the rationale is that ISR really needs to be PIO but everything > else doesn't. PIO is much faster on x86 because it doesn't require > walking page tables or instruction emulation to handle the exit. Is this still a pressing issue? With MSI-X enabled ISR isn't needed, correct? Which would imply that pretty much only old guests without MSI-X support need this, and we don't need to worry that much when designing something new ... >>> >>> It wasn't that long ago that MSI-X wasn't supported.. I think we should >>> continue to keep ISR as PIO as it is a fast path. >> >> No problem if we allow to have both legacy layout and new layout at the >> same time. Guests can continue to use ISR @ BAR0 in PIO space for >> existing virtio devices, even in case they want use mmio for other >> registers -> all fine. >> >> New virtio devices can support MSI-X from day one and decide to not >> expose a legacy layout PIO bar. > > I think having BAR1 be an MMIO mirror of the registers + a BAR2 for > virtio configuration space is probably not that bad of a solution. Well, we also want to clean up the registers, so how about: BAR0: legacy, as is. If you access this, don't use the others. BAR1: new format virtio-pci layout. If you use this, don't use BAR0. BAR2: virtio-cfg. If you use this, don't use BAR0. BAR3: ISR. If you use this, don't use BAR0. I prefer the cases exclusive (ie. 
use one or the other) as a clear path to remove the legacy layout; and leaving the ISR in BAR0 leaves us with an ugly corner case in future (ISR is BAR0 + 19? WTF?). As to MMIO vs PIO, the BARs are self-describing, so we should explicitly endorse that and leave it to the devices. The detection is simple: if BAR1 has non-zero length, it's new-style, otherwise legacy. Thoughts? Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
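The detection rule proposed above ("if BAR1 has non-zero length, it's new-style, otherwise legacy") is small enough to write down directly. This is only a sketch of the proposal as stated in this message; the enum and function names are made up for illustration, and the BAR sizes are assumed to have been probed already by the caller.

```c
#include <stdint.h>

enum virtio_pci_layout {
    VIRTIO_PCI_LAYOUT_LEGACY,    /* only BAR0 is used */
    VIRTIO_PCI_LAYOUT_NEW_STYLE  /* BAR1 (and BAR2/BAR3) are used */
};

/*
 * bar_len[i] is the size in bytes of BAR i; 0 means the BAR is absent.
 * Implements the rule from the message: a non-zero BAR1 means new-style.
 */
enum virtio_pci_layout virtio_pci_detect_layout(const uint64_t bar_len[4])
{
    return bar_len[1] != 0 ? VIRTIO_PCI_LAYOUT_NEW_STYLE
                           : VIRTIO_PCI_LAYOUT_LEGACY;
}
```

Note that the rule does not care whether a BAR is PIO or MMIO, which matches the point that the BARs are self-describing and the choice is left to the device.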
[PATCH] virt/kvm: change kvm_assign_device() to print return value when iommu_attach_device() fails
Change existing kernel error message to include return value from iommu_attach_device() when it fails. This will help debug device assignment failures more effectively.

Signed-off-by: Shuah Khan
---
 virt/kvm/iommu.c |    6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 037cb67..18e1e30 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -168,11 +168,7 @@ int kvm_assign_device(struct kvm *kvm,
         r = iommu_attach_device(domain, &pdev->dev);
         if (r) {
-                printk(KERN_ERR "assign device %x:%x:%x.%x failed",
-                        pci_domain_nr(pdev->bus),
-                        pdev->bus->number,
-                        PCI_SLOT(pdev->devfn),
-                        PCI_FUNC(pdev->devfn));
+                dev_err(&pdev->dev, "kvm assign device failed ret %d", r);
                 return r;
         }
--
1.7.9.5
Re: [Question] Intercept CR3 access in EPT
Hi,

Actually, I know that disabling EPT would work, but thank you anyway. What I am interested in is why it fails when EPT is enabled.

Thank you for answering.

2012/10/9 Marcelo Tosatti :
> On Mon, Oct 08, 2012 at 04:15:57PM +0800, R wrote:
>> Hi,
>>
>> I am a student, and my teacher told me to monitor every process in the guest.
>> So, I am trying to intercept every CR3 access. However, if kvm is loaded
>> with EPT enabled, accesses to CR3 do not cause a VM exit.
>
> Disable EPT by loading kvm-intel.ko module with enable_ept=0 parameter.
> Then, CR3 accesses will trap.
>
>> I modified the code to change the vmcs configuration.
>> To be specific, these functions were rewritten.
>> static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>>                                        unsigned long cr0,
>>                                        struct kvm_vcpu *vcpu)
>> {
>>
>>         } else if (!is_paging(vcpu)) {
>>                 /* From nonpaging to paging */
>>                 vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
>>                              vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
>> -                            ~(CPU_BASED_CR3_LOAD_EXITING |
>> +                            ~(// CPU_BASED_CR3_LOAD_EXITING|
>>                                CPU_BASED_CR3_STORE_EXITING));
>>
>> }
>>
>> static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>> {
>>         ...
>>         if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT)
>>         {
>>                 /* CR3 accesses and invlpg don't need to cause VM Exits when EPT
>>                    enabled */
>> -               _cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
>> +               _cpu_based_exec_control &= ~( // CPU_BASED_CR3_LOAD_EXITING |
>>                                              CPU_BASED_CR3_STORE_EXITING |
>>                                              CPU_BASED_INVLPG_EXITING);
>>
>> }
>>
>> I thought this would force every CR3 access to be trapped with EPT enabled.
>> However, the VM seems to fail to boot when it changes from nonpaging to
>> paging.
>> Do you guys have any idea? Or can someone tell me how I can intercept
>> CR3 accesses and why this does not work?
>>
>> Thank you for answering.
>>
>> --
>> Thanks
>> Rui Wu

--
Thanks
Rui Wu
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell writes: > Anthony Liguori writes: >> Gerd Hoffmann writes: >> >>> Hi, >>> > So we could have for virtio something like this: > > Capabilities: [??] virtio-regs: > legacy: BAR=0 offset=0 > virtio-pci: BAR=1 offset=1000 > virtio-cfg: BAR=1 offset=1800 This would be a vendor specific PCI capability so lspci wouldn't automatically know how to parse it. >>> >>> Sure, would need a patch to actually parse+print the cap, >>> /me was just trying to make my point clear in a simple way. >>> >>> 2) ISTR an argument about mapping the ISR register separately, for >>>performance, but I can't find a reference to it. >> >> I think the rationale is that ISR really needs to be PIO but everything >> else doesn't. PIO is much faster on x86 because it doesn't require >> walking page tables or instruction emulation to handle the exit. > > Is this still a pressing issue? With MSI-X enabled ISR isn't needed, > correct? Which would imply that pretty much only old guests without > MSI-X support need this, and we don't need to worry that much when > designing something new ... It wasn't that long ago that MSI-X wasn't supported.. I think we should continue to keep ISR as PIO as it is a fast path. >>> >>> No problem if we allow to have both legacy layout and new layout at the >>> same time. Guests can continue to use ISR @ BAR0 in PIO space for >>> existing virtio devices, even in case they want use mmio for other >>> registers -> all fine. >>> >>> New virtio devices can support MSI-X from day one and decide to not >>> expose a legacy layout PIO bar. >> >> I think having BAR1 be an MMIO mirror of the registers + a BAR2 for >> virtio configuration space is probably not that bad of a solution. > > Well, we also want to clean up the registers, so how about: > > BAR0: legacy, as is. If you access this, don't use the others. > BAR1: new format virtio-pci layout. If you use this, don't use BAR0. > BAR2: virtio-cfg. If you use this, don't use BAR0. > BAR3: ISR. 
If you use this, don't use BAR0. > > I prefer the cases exclusive (ie. use one or the other) as a clear path > to remove the legacy layout; and leaving the ISR in BAR0 leaves us with > an ugly corner case in future (ISR is BAR0 + 19? WTF?). We'll never remove legacy so we shouldn't plan on it. There are literally hundreds of thousands of VMs out there with the current virtio drivers installed in them. We'll be supporting them for a very, very long time :-) I don't think we gain a lot by moving the ISR into a separate BAR. Splitting up registers like that seems weird to me too. It's very normal to have a mirrored set of registers that are PIO in one bar and MMIO in a different BAR. If we added an additional constraints that BAR1 was mirrored except for the config space and the MSI section was always there, I think the end result would be nice. IOW: BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config BAR1[mmio]: virtio-pci registers + MSI section + future extensions BAR2[mmio]: virtio-config We can continue to do ISR access via BAR0 for performance reasons. > As to MMIO vs PIO, the BARs are self-describing, so we should explicitly > endorse that and leave it to the devices. > > The detection is simple: if BAR1 has non-zero length, it's new-style, > otherwise legacy. I agree that this is the best way to extend, but I think we should still use a transport feature bit. We want to be able to detect within QEMU whether a guest is using these new features because we need to adjust migration state accordingly. Otherwise we would have to detect reads/writes to the new BARs to maintain whether the extended register state needs to be saved. This gets nasty dealing with things like reset. A feature bit simplifies this all pretty well. Regards, Anthony Liguori > > Thoughts? > Rusty. 
Re: [Qemu-devel] Using PCI config space to indicate config location
Anthony Liguori writes:
> We'll never remove legacy so we shouldn't plan on it. There are
> literally hundreds of thousands of VMs out there with the current virtio
> drivers installed in them. We'll be supporting them for a very, very
> long time :-)

You will be supporting this for qemu on x86, sure. As I think we're still in the growth phase for virtio, I prioritize future spec cleanliness pretty high.

But I think you'll be surprised how fast this is deprecated:
1) Bigger queues for block devices (guest-specified ringsize)
2) Smaller rings for openbios (guest-specified alignment)
3) All-mmio mode (powerpc)
4) Whatever network features get numbers > 31.

> I don't think we gain a lot by moving the ISR into a separate BAR.
> Splitting up registers like that seems weird to me too.

Confused. I proposed the same split as you have, just ISR by itself.

> It's very normal to have a mirrored set of registers that are PIO in one
> bar and MMIO in a different BAR.
>
> If we added an additional constraints that BAR1 was mirrored except for
> the config space and the MSI section was always there, I think the end
> result would be nice. IOW:

But it won't be the same, because we want all that extra stuff, like more feature bits and queue size alignment. (Admittedly queues past 16TB aren't a killer feature).

To make it concrete:

Current:
struct {
        __le32 host_features;           /* read-only */
        __le32 guest_features;          /* read/write */
        __le32 queue_pfn;               /* read/write */
        __le16 queue_size;              /* read-only */
        __le16 queue_sel;               /* read/write */
        __le16 queue_notify;            /* read/write */
        u8 status;                      /* read/write */
        u8 isr;                         /* read-only, clear on read */
        /* Optional */
        __le16 msi_config_vector;       /* read/write */
        __le16 msi_queue_vector;        /* read/write */
        /* ... device features */
};

Proposed:
struct virtio_pci_cfg {
        /* About the whole device. */
        __le32 device_feature_select;   /* read-write */
        __le32 device_feature;          /* read-only */
        __le32 guest_feature_select;    /* read-write */
        __le32 guest_feature;           /* read-only */
        __le16 msix_config;             /* read-write */
        __u8 device_status;             /* read-write */
        __u8 unused;

        /* About a specific virtqueue. */
        __le16 queue_select;            /* read-write */
        __le16 queue_align;             /* read-write, power of 2. */
        __le16 queue_size;              /* read-write, power of 2. */
        __le16 queue_msix_vector;       /* read-write */
        __le64 queue_address;           /* read-write: 0x == DNE. */
};

struct virtio_pci_isr {
        __u8 isr;                       /* read-only, clear on read */
};

We could also enforce LE in the per-device config space in this case, another nice cleanup for PCI people.

> BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config
> BAR1[mmio]: virtio-pci registers + MSI section + future extensions
> BAR2[mmio]: virtio-config
>
> We can continue to do ISR access via BAR0 for performance reasons.

But powerpc explicitly *doesn't* want a pio bar. So let it be its own bar, which can be either.

>> As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
>> endorse that and leave it to the devices.
>>
>> The detection is simple: if BAR1 has non-zero length, it's new-style,
>> otherwise legacy.
>
> I agree that this is the best way to extend, but I think we should still
> use a transport feature bit. We want to be able to detect within QEMU
> whether a guest is using these new features because we need to adjust
> migration state accordingly.
>
> Otherwise we would have to detect reads/writes to the new BARs to
> maintain whether the extended register state needs to be saved. This
> gets nasty dealing with things like reset.

I don't think it'll be that bad; reset clears the device to unknown, bar0 moves it from unknown->legacy mode, bar1/2/3 changes it from unknown->modern mode, and anything else is bad (I prefer being strict so we catch bad implementations from the beginning).

But I'm happy to implement it and see what it's like.

> A feature bit simplifies this all pretty well.

I suspect it will be quite ugly, actually. The guest has to use BAR0 to get the features to see if they can use BAR1. Do they ack the feature (via BAR0) before accessing BAR1? If not, qemu can't rely on the feature bit.

Cheers,
Rusty.
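The mode tracking described in this message (reset goes to unknown, a BAR0 access selects legacy, a BAR1/2/3 access selects modern, anything else is bad) can be sketched as a tiny state machine on the device side. The type and function names are hypothetical, and treating the bad state as sticky until the next reset is an assumption of this sketch, chosen to match the stated preference for being strict.

```c
enum virtio_pci_mode {
    VIRTIO_MODE_UNKNOWN, /* after reset, before any BAR access */
    VIRTIO_MODE_LEGACY,  /* guest touched BAR0 first */
    VIRTIO_MODE_MODERN,  /* guest touched BAR1, BAR2 or BAR3 first */
    VIRTIO_MODE_BAD      /* guest mixed layouts or used an unknown BAR */
};

/* Reset clears the device back to "unknown". */
void virtio_mode_reset(enum virtio_pci_mode *mode)
{
    *mode = VIRTIO_MODE_UNKNOWN;
}

/* Record a guest access to BAR `bar` and return the resulting mode. */
enum virtio_pci_mode virtio_mode_access(enum virtio_pci_mode *mode, int bar)
{
    enum virtio_pci_mode wanted;

    if (bar == 0)
        wanted = VIRTIO_MODE_LEGACY;
    else if (bar >= 1 && bar <= 3)
        wanted = VIRTIO_MODE_MODERN;
    else
        wanted = VIRTIO_MODE_BAD;

    if (*mode == VIRTIO_MODE_UNKNOWN)
        *mode = wanted;               /* first access picks the layout */
    else if (*mode != wanted)
        *mode = VIRTIO_MODE_BAD;      /* ASSUMPTION: sticky until reset */

    return *mode;
}
```

With this shape, migration code would only need to look at the current mode to decide whether the extended register state has to be saved, rather than relying on a feature bit the guest may not have acknowledged yet.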
live attached disk cannot be found in the guest VM sometimes
Hi all,

I have an issue when attaching a disk to a qemu-kvm guest (kernel version: Linux debian 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux).

The steps are:
1. Start a guest using the libvirt API.
2. 'ping' the IP address of the guest, once each time.
3. When 'ping' returns OK, attach an iSCSI LVM disk to the guest via the libvirt API; the disk appears in the guest.

This works most of the time, BUT sometimes I cannot see the disk in the guest (with 'fdisk -l') even though the libvirt API returns OK and the disk is present in the XML configuration dumped by 'virsh dumpxml'.

I checked the syslog of the guest, and my guess is that the attach operation happens before the kernel modules (pci_hotplug/acpiphp) are loaded, so the attached disk doesn't appear in the guest. Is this right?

Log messages:

qemu log:
2012-10-08 09:47:31.753+: starting up (guest start time; this is UTC, add 8 hours for CST)

libvirt API call log:
2012-10-08 17:47:36,931 INFO Attach volume 1449 into virtual machine device 11606f9e-98ee-4857-99ea-14a576037bfc begin...
2012-10-08 17:47:37,757 INFO Successfully attach volume 1449 into 11606f9e-98ee-4857-99ea-14a576037bfc, virtual machine device: /dev/ebs/xdey

syslog:
Oct 8 17:47:42 debian kernel: imklog 5.8.11, log source = /proc/kmsg started

Any suggestion is welcome, thanks in advance.

Wangpan
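One way to test the guess above from inside the guest is to check whether the hotplug modules are listed before the attach is attempted. The sketch below scans text in the format of /proc/modules (module name as the first space-separated field of each line); `module_listed` is a hypothetical helper. A negative result is not conclusive, because drivers built into the kernel never appear in /proc/modules.

```c
#include <string.h>

/*
 * Return 1 if `name` is the first field of some line in `modules_text`
 * (text formatted like /proc/modules), 0 otherwise.
 */
int module_listed(const char *modules_text, const char *name)
{
    size_t n = strlen(name);
    const char *line = modules_text;

    while (line && *line) {
        /* Match the whole first field, not just a prefix of it. */
        if (strncmp(line, name, n) == 0 && line[n] == ' ')
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next entry */
    }
    return 0;
}
```

A caller would read /proc/modules into a buffer and wait until both `module_listed(buf, "acpiphp")` and `module_listed(buf, "pci_hotplug")` succeed (or a timeout expires) before issuing the attach.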
buildbot failure in kvm on ia64
The Buildbot has detected a new failure on builder ia64 while building kvm. Full details are available at: http://buildbot.b1-systems.de/kvm/builders/ia64/builds/688 Buildbot URL: http://buildbot.b1-systems.de/kvm/ Buildslave for this Build: b1_kvm_1 Build Reason: The Nightly scheduler named 'nightly_master' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
Re: live attached disk cannot be found in the guest VM sometimes
Another thing worth noting: when I attach a second disk after a first disk was attached but could not be found in the guest, BOTH disks appear in the guest!

Oct 9 10:45:24 debian kernel: [61068.049509] pci :00:11.0: [1af4:1001] type 0 class 0x000100
Oct 9 10:45:24 debian kernel: [61068.049715] pci :00:11.0: reg 10: [io 0x-0x003f]
Oct 9 10:45:24 debian kernel: [61068.049813] pci :00:11.0: reg 14: [mem 0x-0x0fff]
Oct 9 10:45:24 debian kernel: [61068.051544] pci :00:11.0: BAR 1: assigned [mem 0xe000-0xefff]
Oct 9 10:45:24 debian kernel: [61068.051584] pci :00:11.0: BAR 1: set to [mem 0xe000-0xefff] (PCI address [0xe000-0xefff])
Oct 9 10:45:24 debian kernel: [61068.051591] pci :00:11.0: BAR 0: assigned [io 0x1000-0x103f]
Oct 9 10:45:24 debian kernel: [61068.051623] pci :00:11.0: BAR 0: set to [io 0x1000-0x103f] (PCI address [0x1000-0x103f])
Oct 9 10:45:24 debian kernel: [61068.051633] pci :00:00.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.051636] pci :00:00.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.051695] pci :00:01.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.051698] pci :00:01.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.051756] ata_piix :00:01.1: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.051759] ata_piix :00:01.1: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.051818] uhci_hcd :00:01.2: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.051820] uhci_hcd :00:01.2: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.051878] piix4_smbus :00:01.3: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.051880] piix4_smbus :00:01.3: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.051938] pci :00:02.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.051941] pci :00:02.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.052021] virtio-pci :00:03.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.052021] virtio-pci :00:03.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.052146] virtio-pci :00:04.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.052146] virtio-pci :00:04.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.052173] virtio-pci :00:05.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.052175] virtio-pci :00:05.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.052311] virtio-pci :00:06.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.052316] virtio-pci :00:06.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.052337] pci :00:11.0: no hotplug settings from platform
Oct 9 10:45:24 debian kernel: [61068.052337] pci :00:11.0: using default PCI settings
Oct 9 10:45:24 debian kernel: [61068.052939] virtio-pci :00:11.0: enabling device ( -> 0003) /the newly attached disk/
Oct 9 10:45:24 debian kernel: [61068.053982] virtio-pci :00:11.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
Oct 9 10:45:24 debian kernel: [61068.054199] virtio-pci :00:11.0: setting latency timer to 64
Oct 9 10:45:24 debian kernel: [61068.054930] virtio-pci :00:11.0: irq 47 for MSI/MSI-X
Oct 9 10:45:24 debian kernel: [61068.054964] virtio-pci :00:11.0: irq 48 for MSI/MSI-X
Oct 9 10:45:25 debian kernel: [61068.219537] vdc: unknown partition table /the newly attached disk/
Oct 9 10:45:25 debian kernel: [61068.221828] pci :00:10.0: [1af4:1001] type 0 class 0x000100
Oct 9 10:45:25 debian kernel: [61068.222047] pci :00:10.0: reg 10: [io 0x-0x003f]
Oct 9 10:45:25 debian kernel: [61068.222145] pci :00:10.0: reg 14: [mem 0x-0x0fff]
Oct 9 10:45:25 debian kernel: [61068.223162] pci :00:10.0: BAR 1: assigned [mem 0xe0001000-0xe0001fff]
Oct 9 10:45:25 debian kernel: [61068.223223] pci :00:10.0: BAR 1: set to [mem 0xe0001000-0xe0001fff] (PCI address [0xe0001000-0xe0001fff])
Oct 9 10:45:25 debian kernel: [61068.223223] pci :00:10.0: BAR 0: assigned [io 0x1040-0x107f]
Oct 9 10:45:25 debian kernel: [61068.223260] pci :00:10.0: BAR 0: set to [io 0x1040-0x107f] (PCI address [0x1040-0x107f])
Oct 9 10:45:25 debian kernel: [61068.223427] pci :00:00.0: no hotplug settings from platform
Oct 9 10:45:25 debian kernel: [61068.223430] pci :00:00.0: using default PCI settings
Oct 9 10:45:25 debian kernel: [61068.223495] pci :00:01.0: no hotplug settings from platform
Oct 9 10:45:25 debian kernel: [61068.223497] pci :00:01.0: using default PCI settings
Re: [PATCH 0/3] virtio-net: inline header support
Paolo Bonzini writes:
> On 05/10/2012 07:43, Rusty Russell wrote:
>> That's good. But virtio_blk's scsi command is insoluble AFAICT. As I
>> said to Anthony, the best rules are "always" and "never", so I'd really
>> rather not have to grandfather that in.
>
> It is, but we can add a rule that if the (transport) flag
> VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
> virtio-blk.

Could we do that? It's the cmd length I'm concerned about; is it always
32 in practice for some reason?

Currently qemu does:

    struct sg_io_hdr hdr;
    memset(&hdr, 0, sizeof(struct sg_io_hdr));
    hdr.interface_id = 'S';
    hdr.cmd_len = req->elem.out_sg[1].iov_len;
    hdr.cmdp = req->elem.out_sg[1].iov_base;
    hdr.dxfer_len = 0;

If it's a command which expects more output data, there's no way to
guess where the boundary is between that command and the data.

Cheers,
Rusty.
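
A minimal sketch of why a fixed-size cdb field would remove the ambiguity described above: if the driver may lay out the request in any scatter-gather shape, the device can only find the command/data boundary when the cdb length is a known constant. The names `split_scsi_out`, `scsi_out_view`, `BLK_OUTHDR_LEN` and `BLK_CDB_LEN` are hypothetical and are not taken from qemu or the kernel. The 16-byte header size assumes the usual virtio-blk request header (32-bit type, 32-bit ioprio, 64-bit sector), and the request is assumed to have been linearized into one flat buffer first.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: virtio-blk out header is type (u32) + ioprio (u32) + sector (u64). */
#define BLK_OUTHDR_LEN 16u
/* Fixed cdb size proposed in the thread for the ANY_HEADER_SG case. */
#define BLK_CDB_LEN    32u

/* Hypothetical view of a linearized SCSI request (driver-to-device part). */
struct scsi_out_view {
    const uint8_t *cdb;      /* always BLK_CDB_LEN bytes */
    const uint8_t *data;     /* data-out payload, may be empty */
    size_t         data_len;
};

/*
 * Split a flat out-buffer into cdb and data-out without relying on
 * scatter-gather element boundaries. Returns 0 on success, -1 if the
 * buffer is too short to hold the header and a full cdb field.
 */
static int split_scsi_out(const uint8_t *buf, size_t len,
                          struct scsi_out_view *view)
{
    if (len < BLK_OUTHDR_LEN + BLK_CDB_LEN)
        return -1;

    view->cdb      = buf + BLK_OUTHDR_LEN;
    view->data     = view->cdb + BLK_CDB_LEN;
    view->data_len = len - BLK_OUTHDR_LEN - BLK_CDB_LEN;
    return 0;
}
```

With a 60-byte buffer this yields a cdb at offset 16 and 12 bytes of data-out starting at offset 48. By contrast, the qemu snippet quoted above takes cmd_len from out_sg[1].iov_len, which only works while the driver keeps the cdb in its own sg element.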
buildbot failure in kvm on next-ia64
The Buildbot has detected a new failure on builder next-ia64 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-ia64/builds/673

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist:

BUILD FAILED: failed compile

sincerely,
 -The Buildbot
Re: [Qemu-devel] Using PCI config space to indicate config location
  Hi,

>> Well, we also want to clean up the registers, so how about:
>>
>> BAR0: legacy, as is. If you access this, don't use the others.

Ok.

>> BAR1: new format virtio-pci layout. If you use this, don't use BAR0.
>> BAR2: virtio-cfg. If you use this, don't use BAR0.

Why use two bars for this? You can put them into one mmio bar, together
with the msi-x vector table and PBA. Of course a pci capability
describing the location is helpful for that ;)

>> BAR3: ISR. If you use this, don't use BAR0.

Again, I wouldn't hardcode that but use a capability.

>> I prefer the cases exclusive (ie. use one or the other) as a clear path
>> to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
>> an ugly corner case in future (ISR is BAR0 + 19? WTF?).

Ok, so we have four register sets:

  (1) legacy layout
  (2) new virtio-pci
  (3) new virtio-config
  (4) new virtio-isr

We can have a vendor pci capability, with a dword for each register set:

  bit  31    -- present bit
  bits 26-24 -- bar
  bits 23-0  -- offset

So current drivers which must support legacy can use this:

  legacy layout     -- present, bar 0, offset 0
  new virtio-pci    -- present, bar 1, offset 0
  new virtio-config -- present, bar 1, offset 256
  new virtio-isr    -- present, bar 0, offset 19

[ For completeness: msi-x capability could add this: ]

  msi-x vector table   bar 1, offset 512
  msi-x pba            bar 1, offset 768

> We'll never remove legacy so we shouldn't plan on it. There are
> literally hundreds of thousands of VMs out there with the current virtio
> drivers installed in them. We'll be supporting them for a very, very
> long time :-)

But new devices (virtio-qxl being a candidate) don't have old guests and
don't need to worry. They could use this if they care about fast isr:

  legacy layout     -- not present
  new virtio-pci    -- present, bar 1, offset 0
  new virtio-config -- present, bar 1, offset 256
  new virtio-isr    -- present, bar 0, offset 0

Or this if they don't worry about isr performance:

  legacy layout     -- not present
  new virtio-pci    -- present, bar 0, offset 0
  new virtio-config -- present, bar 0, offset 256
  new virtio-isr    -- not present

> I don't think we gain a lot by moving the ISR into a separate BAR.
> Splitting up registers like that seems weird to me too.

Main advantage of defining a register set with just isr is that it
reduces pio address space consumption for new virtio devices which don't
have to worry about the legacy layout (8 bytes which is minimum size for
io bars instead of 64 bytes).

> If we added an additional constraints that BAR1 was mirrored except for

Why add constraints? We want something future-proof, don't we?

>> The detection is simple: if BAR1 has non-zero length, it's new-style,
>> otherwise legacy.

Doesn't fly. BAR1 is in use today for MSI-X support.

> I agree that this is the best way to extend, but I think we should still
> use a transport feature bit. We want to be able to detect within QEMU
> whether a guest is using these new features because we need to adjust
> migration state accordingly.

Why does migration need adjustments?

[ Not that I want to veto a feature bit, but I don't see the need yet ]

cheers,
  Gerd
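
The per-register-set dword proposed above (bit 31 present, bits 26-24 bar, bits 23-0 offset) could be packed and unpacked roughly as follows. This is only a sketch: the `vcap_*` names and `struct vcap_loc` are made up for illustration, and treating bits 30-27 as reserved (written as zero, ignored on read) is an assumption, since the message does not say what they hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the proposal. */
#define VCAP_PRESENT     (UINT32_C(1) << 31)   /* bit 31: present */
#define VCAP_BAR_SHIFT   24                    /* bits 26-24: bar */
#define VCAP_BAR_MASK    UINT32_C(0x7)
#define VCAP_OFFSET_MASK UINT32_C(0x00ffffff)  /* bits 23-0: offset */

/* Location of one register set (legacy, virtio-pci, virtio-config, isr). */
struct vcap_loc {
    bool     present;
    uint8_t  bar;
    uint32_t offset;
};

/* Pack a location into one capability dword; "not present" encodes as 0. */
static uint32_t vcap_encode(struct vcap_loc loc)
{
    if (!loc.present)
        return 0;
    assert(loc.bar <= VCAP_BAR_MASK);        /* must fit in 3 bits */
    assert(loc.offset <= VCAP_OFFSET_MASK);  /* must fit in 24 bits */
    return VCAP_PRESENT | ((uint32_t)loc.bar << VCAP_BAR_SHIFT) | loc.offset;
}

/* Unpack a capability dword; bar/offset are zeroed when not present. */
static struct vcap_loc vcap_decode(uint32_t dword)
{
    struct vcap_loc loc = { false, 0, 0 };

    if (dword & VCAP_PRESENT) {
        loc.present = true;
        loc.bar     = (uint8_t)((dword >> VCAP_BAR_SHIFT) & VCAP_BAR_MASK);
        loc.offset  = dword & VCAP_OFFSET_MASK;
    }
    return loc;
}
```

Under this encoding, "new virtio-config -- present, bar 1, offset 256" becomes 0x81000100 and "new virtio-isr -- present, bar 0, offset 19" becomes 0x80000013. A driver would read the four dwords from the vendor capability and skip any set whose present bit is clear.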