Steal time in KVM

2012-10-08 Thread Abhishek Gupta
Hi,

I am trying to get the steal time with 2 VMs (each with 1 Vcpu) pinned
to same core.

While finding documentation on this, I came across your patches and
posts related to the implementation of this feature, so I thought it
would be best to ask you.

I run the same application on these 2 VMs simultaneously and see the
performance difference. I am trying to read the steal time from inside
the guest using top, vmstat etc.

Both top and vmstat -s report the steal time (st) as 0. I also
checked that procps is at the latest version. I am using virtio-net. I
suspect that the steal time is not being updated properly. Is there
something I need to configure for this to work? The Linux version
of my guest image is:

Linux server-147 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26
UTC 2010 x86_64 GNU/Linux

And /proc/cpuinfo shows:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 0.14.0
stepping        : 3
cpu MHz         : 2992.498
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good pni cx16 hypervisor lahf_lm
bogomips        : 5984.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:


Another question: is there a way to programmatically get
the value of steal cycles (e.g. from a C program)?
Thanks,

Abhishek
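
On the second question, one option that needs no KVM-specific interface is
to read the steal column that the kernel exports in /proc/stat, which is
also where top and vmstat get their "st" figure. The following is a minimal
sketch, assuming the proc(5) field order (steal is the eighth number on a
"cpu" line, counted in USER_HZ ticks). The function names are made up for
illustration. It will keep returning 0 for as long as the guest kernel
itself does not account steal time.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the steal field out of one "cpu" line of /proc/stat.
 * Field order per proc(5): user nice system idle iowait irq softirq steal ...
 * Returns 0 on success, -1 if this is not a cpu line or it has no steal field.
 */
int parse_steal_ticks(const char *line, unsigned long long *steal)
{
    unsigned long long v[8];

    if (strncmp(line, "cpu", 3) != 0)
        return -1;
    /* %*s skips the "cpu" / "cpuN" label; the eighth number is steal. */
    if (sscanf(line, "%*s %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]) != 8)
        return -1;
    *steal = v[7];
    return 0;
}

/* Read the aggregate steal time, in USER_HZ ticks, from /proc/stat. */
int read_steal_ticks(unsigned long long *steal)
{
    char line[512];
    FILE *f = fopen("/proc/stat", "r");
    int rc = -1;

    if (f == NULL)
        return -1;
    /* The first line of /proc/stat is the "cpu" total over all CPUs. */
    if (fgets(line, sizeof(line), f) != NULL)
        rc = parse_steal_ticks(line, steal);
    fclose(f);
    return rc;
}
```

Sampling read_steal_ticks() twice and dividing the difference by
sysconf(_SC_CLK_TCK) gives the steal time in seconds over that interval.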

On Monday, June 13, 2011 6:40:02 PM UTC-5, Glauber Costa wrote:
> Hi,
>
> This series is a repost of the last series I posted about this.
> It tries to address most concerns that were raised at the time,
> plus makes use of the static_branch interface to disable the
> steal code when not in use.
>
>
> Glauber Costa (7):
>   KVM-HDR Add constant to represent KVM MSRs enabled bit
>   KVM-HDR: KVM Steal time implementation
>   KVM-HV: KVM Steal time implementation
>   KVM-GST: Add a pv_ops stub for steal time
>   KVM-GST: KVM Steal time accounting
>   KVM-GST: adjust scheduler cpu power
>   KVM-GST: KVM Steal time registration
>
>  Documentation/kernel-parameters.txt   |    4 ++
>  Documentation/virtual/kvm/msr.txt     |   33 +
>  arch/x86/Kconfig                      |   12 +
>  arch/x86/include/asm/kvm_host.h       |    8 +++
>  arch/x86/include/asm/kvm_para.h       |   15 ++
>  arch/x86/include/asm/paravirt.h       |    9
>  arch/x86/include/asm/paravirt_types.h |    1 +
>  arch/x86/kernel/kvm.c                 |   72 +
>  arch/x86/kernel/kvmclock.c            |    2 +
>  arch/x86/kernel/paravirt.c            |    9
>  arch/x86/kvm/x86.c                    |   60 +++-
>  kernel/sched.c                        |   81 +
>  kernel/sched_features.h               |    4 +-
>  13 files changed, 296 insertions(+), 14 deletions(-)
>
> --
> 1.7.3.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/




Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts

2012-10-08 Thread Dave Martin
On Fri, Oct 05, 2012 at 10:00:25AM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
> > A good starting point would be load/store emulation as this seems to be a
> > common theme, and we would need a credible deployment for any new
> > framework so that we know it's fit for purpose.
> 
> Probably not actually, that code is written to be fast, because things
> like IP stack throughput depend on it - particularly when your network
> card can only DMA packets to 32-bit aligned addresses (resulting in
> virtually all network data being misaligned.)

A fair point, but surely it would still be worth a try?
 
We might decide that a few particular cases of instruction decode
should not use the generic framework for performance reasons, but in
most cases being critically dependent on fault-driven software
emulation for performance would be a serious mistake in the first place
(discussions about the network code notwithstanding).

This is not an argument for being slower just for the sake of it, but
it can make sense to factor code on paths where performance is not an
issue.

Cheers
---Dave
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit

2012-10-08 Thread Alexander Graf

On 05.07.2012, at 13:14, Caraman Mihai Claudiu-B02008 wrote:

> 
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Wednesday, July 04, 2012 4:50 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>> d...@lists.ozlabs.org; qemu-...@nongnu.org
>> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
>> EPN mask for 64-bit
>> 
>> 
>> On 25.06.2012, at 14:26, Mihai Caraman wrote:
>> 
>>> Extend MAS2 EPN mask for 64-bit hosts, to retain most significant bits.
>>> Change get tlb eaddr to use this mask.
>> 
>> Please see section 6.11.4.8 in the PowerISA 2.06b:
>> 
>> MMU behavior is largely unaffected by whether the thread is in 32-bit
>> computation mode (MSRCM=0) or 64-bit computation mode (MSRCM=1). The
>> only differences occur in the EPN field of the TLB entry and the EPN
>> field of MAS2. The differences are summarized here.
>> 
>>  *  Executing a tlbwe instruction in 32-bit mode will set bits 0:31
>> of the TLB EPN field to zero unless MAS0ATSEL is set, in which case those
>> bits are not written to zero.
>>  *  In 32-bit implementations, MAS2U can be used to read or write
>> EPN0:31 of MAS2.
>> 
>> So if MSR.CM is not set tlbwe should mask the upper 32 bits out - which
>> can happen regardless of CONFIG_64BIT.
> 
> MAS2_EPN reflects EPN field of MAS2 aka bits 0:51 (for MAV = 1.0) according
> to section 6.10.3.10 in the PowerISA 2.06b.
> 
> MAS2_EPN is not used in tlbwe execution emulation, we have MAS2_VAL define
> for this case.

So tlbe->mas2 is guaranteed to have the upper bits be 0 when MSR.CM=0?

> 
>> Also, we need to implement MAS2U, to potentially make the upper 32bits of
>> MAS2 available, right? But that one isn't as important as the first bit.
> 
> MAS2U is guest privileged why does it need special care?

Maybe it's mapped to the upper bits of GMAS2 automatically?

> Freescale core manuals and EREF do not mention MAS2U so I think in our case
> it is not implemented.

Please check with a simple mfspr() test on real hw to see if it really isn't 
implemented.


Alex



Re: [PATCH 1/1] kvmclock: fix guest stop notification

2012-10-08 Thread Amit Shah
On (Sun) 30 Sep 2012 [20:05:16], Marcelo Tosatti wrote:
> On Thu, Sep 20, 2012 at 09:46:41AM -0300, Marcelo Tosatti wrote:
> > On Thu, Sep 20, 2012 at 01:55:20PM +0530, Amit Shah wrote:
> > > Commit f349c12c0434e29c79ecde89029320c4002f7253 added the guest stop
> > > notification, but it did it in a way that the stop notification would
> > > never reach the kernel.  The kvm_vm_state_changed() function gets a
> > > value of 0 for the 'running' parameter when the VM is stopped, making
> > > all the code added previously dead code.
> > > 
> > > This patch reworks the code so that it's called when 'running' is 0,
> > > which indicates the VM was stopped.

...

> NACK, guest should be notified when the VM is starting, not
> when stopping.

Ah, right.

Amit


Re: [PATCH 1/1] kvmclock: fix guest stop notification

2012-10-08 Thread Amit Shah
On (Sun) 30 Sep 2012 [21:50:07], Amos Kong wrote:
> - Original Message -
> > On Thu, Sep 20, 2012 at 09:46:41AM -0300, Marcelo Tosatti wrote:
> > > On Thu, Sep 20, 2012 at 01:55:20PM +0530, Amit Shah wrote:
> > > > Commit f349c12c0434e29c79ecde89029320c4002f7253 added the guest stop
> 
> In commitlog of f349c12c0434e29c79ecde89029320c4002f7253: 
> 
> ## This patch uses the qemu Notifier system to tell the guest it _is about to be_ stopped
> 
> 
> > > > notification, but it did it in a way that the stop notification would
> > > > never reach the kernel.  The kvm_vm_state_changed() function gets a
> > > > value of 0 for the 'running' parameter when the VM is stopped, making
> > > > all the code added previously dead code.
> > > > 
> > > > This patch reworks the code so that it's called when 'running' is 0,
> > > > which indicates the VM was stopped.
> 
> Amit, did you touch any real issue? guest gets call trace with current code?
> which kind of context?

I guess you're asking for a testcase to trigger softlockups?

Run a VM, make it do some work (like kernel compile).  Then, 'stop'
from the monitor for a few minutes.  Later, on 'cont', the softlockup
detector in the guest wakes up and shows a warning message mentioning
the cpus were stuck for N seconds.

For this particular patch, though, I didn't really test things; just
'found' this by examining code.  But as Marcelo points out, this patch
is wrong.

> Someone told me he got call trace when shutdown guest by 'init 0', I didn't
> verify this issue.

That sounds like a completely different thing, unless the trace is
invoked by the softlockup detector.

Amit


RE: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit

2012-10-08 Thread Caraman Mihai Claudiu-B02008
> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, October 08, 2012 1:11 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; qemu-...@nongnu.org
> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
> EPN mask for 64-bit
> 
> 
> On 05.07.2012, at 13:14, Caraman Mihai Claudiu-B02008 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Wednesday, July 04, 2012 4:50 PM
> >> To: Caraman Mihai Claudiu-B02008
> >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
> >> d...@lists.ozlabs.org; qemu-...@nongnu.org
> >> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
> >> EPN mask for 64-bit
> >>
> >>
> >> On 25.06.2012, at 14:26, Mihai Caraman wrote:
> >>
> >>> Extend MAS2 EPN mask for 64-bit hosts, to retain most significant bits.
> >>> Change get tlb eaddr to use this mask.
> >>
> >> Please see section 6.11.4.8 in the PowerISA 2.06b:
> >>
> >> MMU behavior is largely unaffected by whether the thread is in 32-bit
> >> computation mode (MSRCM=0) or 64-bit computation mode (MSRCM=1). The
> >> only differences occur in the EPN field of the TLB entry and the EPN
> >> field of MAS2. The differences are summarized here.
> >>
> >>  *  Executing a tlbwe instruction in 32-bit mode will set bits 0:31
> >> of the TLB EPN field to zero unless MAS0ATSEL is set, in which case those
> >> bits are not written to zero.
> >>  *  In 32-bit implementations, MAS2U can be used to read or write
> >> EPN0:31 of MAS2.
> >>
> >> So if MSR.CM is not set tlbwe should mask the upper 32 bits out - which
> >> can happen regardless of CONFIG_64BIT.
> >
> > MAS2_EPN reflects EPN field of MAS2 aka bits 0:51 (for MAV = 1.0) according
> > to section 6.10.3.10 in the PowerISA 2.06b.
> >
> > MAS2_EPN is not used in tlbwe execution emulation, we have MAS2_VAL define
> > for this case.
> 
> So tlbe->mas2 is guaranteed to have the upper bits be 0 when MSR.CM=0?

We chose to mask out mas2 upper bits on tlbwe emulation so gtlbe->mas2 will
respect this but vcpu->arch.shared->mas2 will not. tlb entry selection does not
require this treatment since EPN upper bits are not taken into consideration 
anyway.
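
The masking described here can be sketched as below. This is only an
illustration, not the actual KVM code: mas2_for_tlbwe() is a hypothetical
helper name, and the MAS0ATSEL exception from the ISA text quoted above is
deliberately left out. Note that in Power ISA numbering bits 0:31 are the
most significant half of the 64-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the MAS2 value to store in the guest TLB entry when
 * emulating tlbwe. In 32-bit computation mode (MSR.CM = 0) the upper 32 bits
 * (Power ISA bits 0:31) are cleared; in 64-bit mode the value is kept as is.
 */
uint64_t mas2_for_tlbwe(uint64_t mas2, bool msr_cm)
{
    if (msr_cm)
        return mas2;                    /* 64-bit mode: keep all bits */
    return mas2 & UINT64_C(0xffffffff); /* 32-bit mode: clear bits 0:31 */
}
```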

> 
> >
> >> Also, we need to implement MAS2U, to potentially make the upper 32bits of
> >> MAS2 available, right? But that one isn't as important as the first bit.
> >
> > MAS2U is guest privileged why does it need special care?
> 
> Maybe it's mapped to the upper bits of GMAS2 automatically?

GMAS2?

> 
> > Freescale core manuals and EREF do not mention MAS2U so I think in our case
> > it is not implemented.
> 
> Please check with a simple mfspr() test on real hw to see if it really
> isn't implemented.

I will try this with SPR number 0x277.

-Mike






Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit

2012-10-08 Thread Alexander Graf

On 08.10.2012, at 15:06, Caraman Mihai Claudiu-B02008 wrote:

>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Monday, October 08, 2012 1:11 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>> d...@lists.ozlabs.org; qemu-...@nongnu.org
>> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
>> EPN mask for 64-bit
>> 
>> 
>> On 05.07.2012, at 13:14, Caraman Mihai Claudiu-B02008 wrote:
>> 
>>> 
>>> 
>>>> -Original Message-
>>>> From: Alexander Graf [mailto:ag...@suse.de]
>>>> Sent: Wednesday, July 04, 2012 4:50 PM
>>>> To: Caraman Mihai Claudiu-B02008
>>>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
>>>> d...@lists.ozlabs.org; qemu-...@nongnu.org
>>>> Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2
>>>> EPN mask for 64-bit
>>>> 
>>>> 
>>>> On 25.06.2012, at 14:26, Mihai Caraman wrote:
>>>> 
>>>>> Extend MAS2 EPN mask for 64-bit hosts, to retain most significant bits.
>>>>> Change get tlb eaddr to use this mask.
>>>> 
>>>> Please see section 6.11.4.8 in the PowerISA 2.06b:
>>>> 
>>>> MMU behavior is largely unaffected by whether the thread is in 32-bit
>>>> computation mode (MSRCM=0) or 64-bit computation mode (MSRCM=1). The
>>>> only differences occur in the EPN field of the TLB entry and the EPN
>>>> field of MAS2. The differences are summarized here.
>>>> 
>>>>  *  Executing a tlbwe instruction in 32-bit mode will set bits 0:31
>>>> of the TLB EPN field to zero unless MAS0ATSEL is set, in which case those
>>>> bits are not written to zero.
>>>>  *  In 32-bit implementations, MAS2U can be used to read or write
>>>> EPN0:31 of MAS2.
>>>> 
>>>> So if MSR.CM is not set tlbwe should mask the upper 32 bits out - which
>>>> can happen regardless of CONFIG_64BIT.
>>> 
>>> MAS2_EPN reflects EPN field of MAS2 aka bits 0:51 (for MAV = 1.0)
>> according
>>> to section 6.10.3.10 in the PowerISA 2.06b.
>>> 
>>> MAS2_EPN is not used in tlbwe execution emulation, we have MAS2_VAL
>> define
>>> for this case.
>> 
>> So tlbe->mas2 is guaranteed to have the upper bits be 0 when MSR.CM=0?
> 
> We chose to mask out mas2 upper bits on tlbwe emulation so gtlbe->mas2 will
> respect this but vcpu->arch.shared->mas2 will not. tlb entry selection does not
> require this treatment since EPN upper bits are not taken into consideration
> anyway.

That's fine. We don't control the contents of shared->mas2 anyway.

> 
>> 
>>> 
>>>> Also, we need to implement MAS2U, to potentially make the upper 32bits of
>>>> MAS2 available, right? But that one isn't as important as the first bit.
>>> 
>>> MAS2U is guest privileged why does it need special care?
>> 
>> Maybe it's mapped to the upper bits of GMAS2 automatically?
> 
> GMAS2?

Ah. The guest has direct control over the real MAS2. Oh well.

> 
>> 
>>> Freescale core manuals and EREF do not mention MAS2U so I think in our case
>>> it is not implemented.
>> 
>> Please check with a simple mfspr() test on real hw to see if it really
>> isn't implemented.
> 
> I will try this with SPR number 0x277.

Thanks :)


Alex



Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Rusty Russell  writes:

> (Topic updated, cc's trimmed).
>
> Anthony Liguori  writes:
>> Rusty Russell  writes:
>>> 4) The only significant change to the spec is that we use PCI
>>>capabilities, so we can have infinite feature bits.
>>>(see 
>>> http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)
>>
>> We discussed this on IRC last night.  I don't think PCI capabilities are
>> a good mechanism to use...
>>
>> PCI capabilities are there to organize how the PCI config space is
>> allocated to allow vendor extensions to co-exist with future PCI
>> extensions.
>>
>> But we've never used the PCI config space within virtio-pci.  We do
>> everything in BAR0.  I don't think there's any real advantage of using
>> the config space vs. a BAR for virtio-pci.
>
> Note before anyone gets confused; we were talking about using the PCI
> config space to indicate what BAR(s) the virtio stuff is in.  An
> alternative would be to simply specify a new layout format in BAR1.
>
> The arguments for a more flexible format that I know of:
>
> 1) virtio-pci has already extended the pci-specific part of the
>configuration once (for MSI-X), so I don't want to assume it won't
>happen again.

"configuration" is the wrong word here.

The virtio-pci BAR0 layout is:

   0..19   virtio-pci registers
   20+     virtio configuration space

MSI-X needed to add additional virtio-pci registers, so now we have:

   0..19   virtio-pci registers

if MSI-X:
   20..23  virtio-pci MSI-X registers
   24+     virtio configuration space
else:
   20+     virtio configuration space

I agree, this stinks.
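
To make the consequence concrete: every access to the device-specific
configuration has to compute its base offset from the MSI-X state first.
A minimal sketch of that calculation, using only the byte offsets listed
above (the enum and function names are invented for this example):

```c
#include <stdbool.h>
#include <stddef.h>

enum {
    VPCI_COMMON_REGS_LEN = 20, /* bytes 0..19: virtio-pci registers */
    VPCI_MSIX_REGS_LEN   = 4,  /* bytes 20..23: present only with MSI-X */
};

/* Offset within BAR0 at which the virtio configuration space starts. */
size_t vpci_config_offset(bool msix_enabled)
{
    return VPCI_COMMON_REGS_LEN + (msix_enabled ? VPCI_MSIX_REGS_LEN : 0);
}
```

Any further register appended to the virtio-pci block would add another
conditional term here, which is the scaling problem under discussion.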

But I think we could solve this in a different way.  I think we could
just move the virtio configuration space to BAR1 by using a transport
feature bit.

That then frees up the entire BAR0 for use as virtio-pci registers.  We
can then always include the virtio-pci MSI-X register space and
introduce all new virtio-pci registers as simply being appended.

This new feature bit then becomes essentially a virtio configuration
latch.  When unacked, virtio configuration hides new registers, when
acked, those new registers are exposed.

Another option is to simply put new registers after the virtio
configuration blob.

> 2) ISTR an argument about mapping the ISR register separately, for
>performance, but I can't find a reference to it.

I think the rationale is that ISR really needs to be PIO but everything
else doesn't.  PIO is much faster on x86 because it doesn't require
walking page tables or instruction emulation to handle the exit.

The argument to move the remaining registers to MMIO is to allow 64-bit
accesses to registers which isn't possible with PIO.

>> This maps really nicely to non-PCI transports too.
>
> This isn't right.  No one else can use the PCI layout.  While parts are
> common, other parts are pci-specific (MSI-X and ISR for example), and
> yet other parts are specified by PCI elsewhere (eg interrupt numbers).
>
>> But extending the
>> PCI config space (especially dealing with capability allocation) is
>> pretty gnarly and there isn't an obvious equivalent outside of PCI.
>
> That's OK, because general changes should be done with feature bits, and
> the others all have an infinite number.  Being the first, virtio-pci has
> some unique limitations we'd like to fix.
>
>> There are very devices that we emulate today that make use of extended
>> PCI device registers outside the platform devices (that have no BARs).
>
> This sentence confused me?

There is a missing "few".  "There are very few devices..."

Extending the PCI configuration space is unusual for PCI devices.  That
was the point.

Regards,

Anthony Liguori

>
> Thanks,
> Rusty.



Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it

2012-10-08 Thread Andreas Färber
Am 05.10.2012 04:24, schrieb Alexander Graf:
> 
> On 05.10.2012, at 04:17, Anthony Liguori wrote:
> 
>> Alexander Graf  writes:
>>
>>> On 03.10.2012, at 22:26, Peter Maydell wrote:
>>>
>>>> On 3 October 2012 21:01, Blue Swirl wrote:
>>>>> On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori wrote:
>>>>>> Jan Kiszka writes:
>>>>>>> +/* The default accelerator depends on the availability of KVM. */
>>>>>>> +p = kvm_configured ? "kvm" : "tcg";
>>>>>>> }
>>>>
>>>>>> Blue/Aurelien, any objections?
>>>>>
>>>>> No, maybe a message could be printed that says that the default has
>>>>> changed, for a few releases.
>>>>
>>>> I've lost track of the conversation, are we currently proposing
>>>> the accelerator default to be "kvm" (as per the original patch
>>>> you quote here) or "kvm:tcg" ?
>>>>
>>>> I'm not entirely sure which I prefer from an ARM perspective
>>>> For some time to come and for a lot of targets (ie any target
>>>> CPU except A15), having a default of "kvm" is going to cause
>>>> existing working commandlines to stop working. [I expect that
>>>> ARM-host qemu binaries will be built with CONFIG_KVM once ARM
>>>> KVM support lands, but the same binary will be run on hosts
>>>> without virtualization extensions.] On the other hand, perhaps
>>>> there just aren't really very many people who run QEMU on
>>>> ARM hosts, and so we can ignore them :-)
>>>
>>> We get similar problems on PPC. Take the following example:
>>>
>>>  $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic
>>
>> But do you really expect people to do this?  I have to believe that
>> people running on PPC hardware and running qemu-system-ppc most likely
>> want to do KVM...
> 
> Sure. But we wouldn't be able to even tell them what went wrong, as we don't 
> have a negotiation mechanism right now that could tell user space "hey, the 
> CPU you selected is unknown to me".

Would it help to split out the cpu_model -> CPUClass lookup from
cpu_ppc_init() to invoke a hook or inquire a field indicating KVM support?

Andreas

> 
> However, if during cpu init we could add such a check and then fall back to 
> tcg mode if accel=kvm:tcg with a warning, that'd be nice user experience.
> 
> We could do the same for ARM. If you do -M beagle on an A15 KVM enabled 
> machine, you would still be able to do so, but KVM tells you it can't emulate 
> an A8 right now. And if in the future KVM learns how to expose an A8 on A15, 
> we could just not bail out and things would magically work.
> 
> Apart from that, I like the idea of kvm:tcg with a warning as the default for 
> qemu. We should still have a qemu-kvm binary in the distro that does 
> accel=kvm so people don't accidentally fall back to tcg mode.
> 
> 
> Alex
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it

2012-10-08 Thread Alexander Graf

On 08.10.2012, at 16:03, Andreas Färber wrote:

> Am 05.10.2012 04:24, schrieb Alexander Graf:
>> 
>> On 05.10.2012, at 04:17, Anthony Liguori wrote:
>> 
>>> Alexander Graf  writes:
>>> 
>>>> On 03.10.2012, at 22:26, Peter Maydell wrote:
>>>>
>>>>> On 3 October 2012 21:01, Blue Swirl wrote:
>>>>>> On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori wrote:
>>>>>>> Jan Kiszka writes:
>>>>>>>> +/* The default accelerator depends on the availability of KVM. */
>>>>>>>> +p = kvm_configured ? "kvm" : "tcg";
>>>>>>>> }
>>>>>
>>>>>>> Blue/Aurelien, any objections?
>>>>>>
>>>>>> No, maybe a message could be printed that says that the default has
>>>>>> changed, for a few releases.
>>>>>
>>>>> I've lost track of the conversation, are we currently proposing
>>>>> the accelerator default to be "kvm" (as per the original patch
>>>>> you quote here) or "kvm:tcg" ?
>>>>>
>>>>> I'm not entirely sure which I prefer from an ARM perspective
>>>>> For some time to come and for a lot of targets (ie any target
>>>>> CPU except A15), having a default of "kvm" is going to cause
>>>>> existing working commandlines to stop working. [I expect that
>>>>> ARM-host qemu binaries will be built with CONFIG_KVM once ARM
>>>>> KVM support lands, but the same binary will be run on hosts
>>>>> without virtualization extensions.] On the other hand, perhaps
>>>>> there just aren't really very many people who run QEMU on
>>>>> ARM hosts, and so we can ignore them :-)
>>>>
>>>> We get similar problems on PPC. Take the following example:
>>>>
>>>>  $ qemu-system-ppc -M mpc8544ds -kernel uImage -nographic
>>> 
>>> But do you really expect people to do this?  I have to believe that
>>> people running on PPC hardware and running qemu-system-ppc most likely
>>> want to do KVM...
>> 
>> Sure. But we wouldn't be able to even tell them what went wrong, as we don't 
>> have a negotiation mechanism right now that could tell user space "hey, the 
>> CPU you selected is unknown to me".
> 
> Would it help to split out the cpu_model -> CPUClass lookup from
> cpu_ppc_init() to invoke a hook or inquire a field indicating KVM support?

Well, we need to basically determine whether KVM is enabled only after cpu 
creation of the machine file.
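
For reference, the "kvm:tcg with a warning" behaviour discussed in this
thread could look roughly like the sketch below. It is not QEMU's actual
option parsing: pick_accel() and its parameters are hypothetical, "tcg" is
assumed to be always usable, and the kvm_usable flag stands for whatever
check ends up being possible once the CPU has been created.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: walk a colon-separated accelerator list such as
 * "kvm:tcg" and copy the first usable entry into `out`.
 * Returns 0 on success, -1 if no entry in the list can be used.
 */
int pick_accel(const char *list, bool kvm_usable, char *out, size_t outlen)
{
    bool skipped = false;
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ":");   /* length of the current entry */
        bool is_kvm = (len == 3 && strncmp(p, "kvm", 3) == 0);
        bool is_tcg = (len == 3 && strncmp(p, "tcg", 3) == 0);

        if ((is_tcg || (is_kvm && kvm_usable)) && len < outlen) {
            memcpy(out, p, len);
            out[len] = '\0';
            if (skipped)                /* an earlier choice was rejected */
                fprintf(stderr, "warning: falling back to %s\n", out);
            return 0;
        }
        skipped = true;
        p += len;
        if (*p == ':')
            p++;
    }
    return -1;
}
```

With a plain "kvm" list and an unusable KVM the function fails instead of
silently switching to TCG, which matches the qemu-kvm behaviour asked for
earlier in the thread.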


Alex



macvlan/macvtap guest to host communication

2012-10-08 Thread Alexander 'Leo' Bergolth
Hi!

I have connected my kvm guest using a macvtap interface and configured a
macvlan interface in bridge mode on the host to allow host to guest
communication.

If using virtio_net in the guest, host to guest transfer works fine. But
in guest to host direction large tcp segments do not arrive at the host
and are retransmitted using smaller segment sizes. (See
iperf-virtio-guest2host_guest.pcap and
iperf-virtio-guest2host_host.pcap. [1])
Disabling tcp segmentation offload at the guest nic fixes that problem.

If I switch to different nics (e1000, rtl8139, etc), checksum offloading
also seems to interfere.
The checksum errors go away after disabling TX checksumming on the
underlying host interface but that's a performance killer. Checksum
errors also occur on other connections, not only on host to guest traffic.

Does macvlan/macvtap networking only work with virtio-net, or are there
any tweaks for other guest NICs?

Thanks,
--leo

[1] http://leo.kloburg.at/tmp/kvm-macvtap/

-- 
e-mail   ::: Leo.Bergolth (at) wu.ac.at
fax  ::: +43-1-31336-906050
location ::: IT-Services | Vienna University of Economics | Austria



Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Gerd Hoffmann
  Hi,

> But I think we could solve this in a different way.  I think we could
> just move the virtio configuration space to BAR1 by using a transport
> feature bit.

Why hard-code stuff?

I think it makes a lot of sense to have a capability similar to msi-x
which simply specifies bar and offset of the register sets:

[root@fedora ~]# lspci -vvs4
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
[ ... ]
        Region 0: I/O ports at c000 [size=64]
        Region 1: Memory at fc029000 (32-bit) [size=4K]
        Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
                Vector table: BAR=1 offset=
                PBA: BAR=1 offset=0800

So we could have for virtio something like this:

        Capabilities: [??] virtio-regs:
                legacy: BAR=0 offset=0
                virtio-pci: BAR=1 offset=1000
                virtio-cfg: BAR=1 offset=1800
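
On the driver side, the information such a capability carries boils down to
a small (BAR, offset) table. A rough sketch, assuming the offsets in the
mock output are hexadecimal as in real lspci output; the struct, table and
function names are invented here and are not part of any spec:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one register set: which BAR, and where in it. */
struct vregs_location {
    const char *name;
    uint8_t     bar;
    uint32_t    offset;
};

/* The example values from the mock lspci output. */
static const struct vregs_location example_regs[] = {
    { "legacy",     0, 0x0000 },
    { "virtio-pci", 1, 0x1000 },
    { "virtio-cfg", 1, 0x1800 },
};

/* Return the entry called `name`, or NULL if the device does not provide it. */
const struct vregs_location *vregs_find(const struct vregs_location *tbl,
                                        size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].name, name) == 0)
            return &tbl[i];
    return NULL;
}
```

A device that does not care about old guests would simply leave out the
"legacy" entry, and the lookup would return NULL for it.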

> That then frees up the entire BAR0 for use as virtio-pci registers.  We
> can then always include the virtio-pci MSI-X register space and
> introduce all new virtio-pci registers as simply being appended.

BAR0 needs to stay as-is for compatibility reasons.  New devices which
don't have to care about old guests don't need to provide a 'legacy'
register region.

Most devices have mmio at BAR1 for msi-x support anyway, we can place
the virtio-pci and virtio configuration registers there too by default.
 I wouldn't hardcode that though.

> This new feature bit then becomes essentially a virtio configuration
> latch.  When unacked, virtio configuration hides new registers, when
> acked, those new registers are exposed.

I'd just expose them all all the time.

>> 2) ISTR an argument about mapping the ISR register separately, for
>>performance, but I can't find a reference to it.
> 
> I think the rationale is that ISR really needs to be PIO but everything
> else doesn't.  PIO is much faster on x86 because it doesn't require
> walking page tables or instruction emulation to handle the exit.

Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
correct?  Which would imply that pretty much only old guests without
MSI-X support need this, and we don't need to worry that much when
designing something new ...

cheers,
  Gerd


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
>> But I think we could solve this in a different way.  I think we could
>> just move the virtio configuration space to BAR1 by using a transport
>> feature bit.
>
> Why hard-code stuff?
>
> I think it makes a lot of sense to have a capability similar to msi-x
> which simply specifies bar and offset of the register sets:
>
> [root@fedora ~]# lspci -vvs4
> 00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
> [ ... ]
>   Region 0: I/O ports at c000 [size=64]
>   Region 1: Memory at fc029000 (32-bit) [size=4K]
>   Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>   Vector table: BAR=1 offset=
>   PBA: BAR=1 offset=0800

MSI-X capability is a standard PCI capability which is why lspci can
parse it.

>
> So we could have for virtio something like this:
>
> Capabilities: [??] virtio-regs:
> legacy: BAR=0 offset=0
> virtio-pci: BAR=1 offset=1000
> virtio-cfg: BAR=1 offset=1800

This would be a vendor specific PCI capability so lspci wouldn't
automatically know how to parse it.

You could just as well teach lspci to parse BAR0 to figure out what
features are supported.
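
For reference, locating a vendor-specific capability is a plain walk of the
standard PCI capability list. The sketch below is a minimal illustration that
works on a 256-byte snapshot of config space; the function name is made up
here, and it stops at finding the capability because the layout of a
"virtio-regs" capability is only a proposal in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS           0x06  /* 16-bit status register */
#define PCI_STATUS_CAP_LIST  0x10  /* set when a capability list exists */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability pointer */
#define PCI_CAP_ID_VNDR      0x09  /* vendor-specific capability ID */
#define PCI_CAP_ID_MSIX      0x11  /* MSI-X capability ID */

/*
 * Walk the capability list in a config space snapshot and return the offset
 * of the first capability with the given ID, or 0 if there is none.
 * (Hypothetical helper, not taken from lspci or the kernel.)
 */
static uint8_t pci_find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
    assert(cfg != NULL);

    uint16_t status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
    if (!(status & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;  /* low two bits are reserved */
    int ttl = 48;                                   /* guard against looping lists */

    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xfc;                  /* follow the next pointer */
    }
    return 0;
}
```

A tool would call pci_find_cap(cfg, PCI_CAP_ID_VNDR) and then interpret the
bytes after the header according to whatever layout is eventually agreed on.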

>> That then frees up the entire BAR0 for use as virtio-pci registers.  We
>> can then always include the virtio-pci MSI-X register space and
>> introduce all new virtio-pci registers as simply being appended.
>
> BAR0 needs to stay as-is for compatibility reasons.  New devices which
> don't have to care about old guests don't need to provide a 'legacy'
> register region.

A latch feature bit would allow the format to change without impacting
compatibility at all.

>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>performance, but I can't find a reference to it.
>> 
>> I think the rationale is that ISR really needs to be PIO but everything
>> else doesn't.  PIO is much faster on x86 because it doesn't require
>> walking page tables or instruction emulation to handle the exit.
>
> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
> correct?  Which would imply that pretty much only old guests without
> MSI-X support need this, and we don't need to worry that much when
> designing something new ...

It wasn't that long ago that MSI-X wasn't supported.  I think we should
continue to keep ISR as PIO as it is a fast path.

Regards,

Anthony Liguori

>
> cheers,
>   Gerd


Re: [PATCH] KVM: x86: Make emulator_fix_hypercall static

2012-10-08 Thread Marcelo Tosatti
On Thu, Sep 20, 2012 at 07:43:17AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> No users outside of kvm/x86.c.
> 
> Signed-off-by: Jan Kiszka 

Applied, thanks.



Re: [PATCH] KVM: x86: Convert kvm_arch_vcpu_reset into private kvm_vcpu_reset

2012-10-08 Thread Marcelo Tosatti
On Thu, Sep 20, 2012 at 07:43:08AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> There are no external callers of this function as there is no concept of
> resetting a vcpu from generic code.
> 
> Signed-off-by: Jan Kiszka 

Applied, thanks.



Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

2012-10-08 Thread Marcelo Tosatti

On Wed, Sep 19, 2012 at 05:44:46PM +, Auld, Will wrote:
> From 9982bb73460b05c1328068aae047b14b2294e2da Mon Sep 17 00:00:00 2001
> From: Will Auld 
> Date: Wed, 12 Sep 2012 18:10:56 -0700
> Subject: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
> 
> CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
> 
> Basic design is to emulate the MSR by allowing reads and writes to a guest 
> vcpu specific location to store the value of the emulated MSR while adding 
> the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will 
> be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This 
> is of course as long as the "use TSC counter offsetting" VM-execution control 
> is enabled as well as the IA32_TSC_ADJUST control.
> 
> However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmcs
> tsc_offset for a guest process when it does a rdtsc (with the correct
> settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one 
> of these three locations. The argument against storing it in the actual MSR 
> is performance. This is likely to be seldom used while the save/restore is 
> required on every transition. IA32_TSC_ADJUST was created as a way to solve 
> some issues with writing TSC itself so that is not an option either. The 
> remaining option, defined above as our solution has the problem of returning 
> incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) 
> as mentioned above. However, more problematic is that storing the data in 
> vmcs tsc_offset will have a different semantic effect on the system than does 
> using the actual MSR. This is illustrated in the following example: The 
> hypervisor sets the IA32_TSC_ADJUST, then the guest sets it and a guest
> process performs a rdtsc. In this case the guest process will get TSC +
> IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset including IA32_TSC_ADJUST_guest.
> While the total system semantics changed, the semantics as seen by the guest
> do not and hence this will not cause a problem.
> ---
>  arch/x86/include/asm/cpufeature.h |1 +
>  arch/x86/include/asm/kvm_host.h   |2 ++
>  arch/x86/include/asm/msr-index.h  |1 +
>  arch/x86/kvm/cpuid.c  |4 ++--
>  arch/x86/kvm/vmx.c|   12 
>  arch/x86/kvm/x86.c|1 +
>  6 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 6b7ee5f..e574d81 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -199,6 +199,7 @@
>  
>  /* Intel-defined CPU features, CPUID level 0x0007:0 (ebx), word 9 */
>  #define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
> +#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b */
>  #define X86_FEATURE_BMI1 (9*32+ 3) /* 1st group bit manipulation extensions */
>  #define X86_FEATURE_HLE  (9*32+ 4) /* Hardware Lock Elision */
>  #define X86_FEATURE_AVX2 (9*32+ 5) /* AVX2 instructions */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09155d6..8a001a4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -442,6 +442,8 @@ struct kvm_vcpu_arch {
>   u32 virtual_tsc_mult;
>   u32 virtual_tsc_khz;
>  
> + s64 tsc_adjust;
> +
>   atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
>   unsigned nmi_pending; /* NMI queued after currently running handler */
>   bool nmi_injected;/* Trying to inject an NMI this entry */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 957ec87..8e82e29 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -231,6 +231,7 @@
>  #define MSR_IA32_EBL_CR_POWERON  0x002a
>  #define MSR_EBC_FREQUENCY_ID 0x002c
>  #define MSR_IA32_FEATURE_CONTROL0x003a
> +#define MSR_TSC_ADJUST   0x003b
>  
>  #define FEATURE_CONTROL_LOCKED   (1<<0)
>  #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0595f13..8f5943e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -248,8 +248,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  
>   /* cpuid 7.0.ebx */
>   const u32 kvm_supported_word9_x86_features =
> - F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
> - F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
> + F(FSGSBASE) | F(TSC_ADJUST) | F(BMI1) | F(HLE) |
> + F(AVX2) | F(SMEP) | F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
>  
>   /* all calls to cpuid_count() should be made on the same cpu */
>   get_cpu();
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86
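
The emulation described in the commit message above (keep the guest's
IA32_TSC_ADJUST value in a per-vcpu field and fold it into the vmcs
tsc_offset) can be modelled in a few lines. This is only a sketch of that
description, not the code from the patch: the struct and function names are
invented, and applying the difference between the old and new value on each
write is an assumption about how the folding would be done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vcpu state: only the two fields the description mentions. */
struct vcpu_tsc {
    uint64_t tsc_offset;  /* what the hardware adds to the host TSC for the guest */
    int64_t  tsc_adjust;  /* value the guest last wrote to the emulated MSR */
};

/* Emulated wrmsr: remember the value and move tsc_offset by the change. */
static void write_tsc_adjust(struct vcpu_tsc *v, int64_t value)
{
    assert(v != NULL);
    /* Unsigned arithmetic wraps modulo 2^64, like the TSC itself. */
    v->tsc_offset += (uint64_t)value - (uint64_t)v->tsc_adjust;
    v->tsc_adjust = value;
}

/* Emulated rdmsr: return the stored value, not the combined offset. */
static int64_t read_tsc_adjust(const struct vcpu_tsc *v)
{
    assert(v != NULL);
    return v->tsc_adjust;
}

/* What a guest rdtsc observes with TSC offsetting enabled. */
static uint64_t guest_rdtsc(const struct vcpu_tsc *v, uint64_t host_tsc)
{
    assert(v != NULL);
    return host_tsc + v->tsc_offset;
}
```

In this model the guest sees its own adjustment in every TSC read, while a
read of the raw tsc_offset returns the combined value, which is the caveat
the commit message points out.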

Re: [Question] Intercept CR3 access in EPT

2012-10-08 Thread Marcelo Tosatti
On Mon, Oct 08, 2012 at 04:15:57PM +0800, R wrote:
> Hi,
> 
> I am a student. And my teacher told me to monitor every process in guest.
> So, I try to intercept every CR3 access. However, if kvm is loaded
> with EPT enabled, accesses to CR3 do not cause a VM exit.

Disable EPT by loading the kvm-intel.ko module with the ept=0 parameter
(it sets enable_ept in vmx.c). Then, CR3 accesses will trap.
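
To confirm which mode the module is actually running in, the parameter can be
read back from sysfs. The sketch below assumes the usual location
/sys/module/kvm_intel/parameters/ept and the usual "Y"/"N" text of a boolean
module parameter; the helper names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Interpret the text of a boolean module parameter: 1 = on, 0 = off, -1 = unknown. */
static int parse_bool_param(const char *text)
{
    assert(text != NULL);
    switch (text[0]) {
    case 'Y': case 'y': case '1':
        return 1;
    case 'N': case 'n': case '0':
        return 0;
    default:
        return -1;
    }
}

/* Read a parameter file such as /sys/module/kvm_intel/parameters/ept. */
static int read_bool_param(const char *path)
{
    char buf[8] = "";
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;                 /* module not loaded or path differs */
    if (fgets(buf, sizeof buf, f) == NULL)
        buf[0] = '\0';
    fclose(f);
    return parse_bool_param(buf);
}
```

A result of 0 from read_bool_param("/sys/module/kvm_intel/parameters/ept")
would mean shadow paging is in use, so CR3 accesses are expected to exit.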

> I modified the code to change vmcs configuration.
> To be specific, these functions are rewritten.
> static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>   unsigned long cr0,
>   struct kvm_vcpu *vcpu)
> {
> 
> } else if (!is_paging(vcpu)) {
>   /* From nonpaging to paging */
>   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
>vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
> -  ~(CPU_BASED_CR3_LOAD_EXITING |
> +   ~(//   CPU_BASED_CR3_LOAD_EXITING|
>  CPU_BASED_CR3_STORE_EXITING));
>   
> }
> 
> static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
> {
>  ...
> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
>   /* CR3 accesses and invlpg don't need to cause VM Exits when EPT
>  enabled */
> - _cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
> +  _cpu_based_exec_control &= ~( //
> CPU_BASED_CR3_LOAD_EXITING |
>CPU_BASED_CR3_STORE_EXITING |
>CPU_BASED_INVLPG_EXITING);
> 
> }
> 
> I thought this would force every CR3 access to be trapped with EPT enabled.
> However, the VM seems to fail to boot when it changes from nonpaging to
> paging.
> Do you have any idea? Can someone tell me how I can intercept
> CR3 accesses and why this does not work?
> 
> Thank U for answering.

> 
> -- 
> Thanks
> Rui Wu


Re: Failed to get host power management capabilities

2012-10-08 Thread Marcelo Tosatti
On Fri, Oct 05, 2012 at 06:22:58AM -0600, David Torres wrote:
> Hi all,
> 
> My name is David Torres, I am from Costa Rica. This is the problem I have
> with the KVM installation:
> 
> 2012-10-03 20:28:17.395+: 25793: warning : qemuCapsInit:856 : Failed to 
> get host power management capabilities
> 2012-10-03 20:28:17.661+: 25793: error : virExecWithHook:328 : Cannot 
> find 'pm-is-supported' in path: No such file or directory
> 
> And this error from the kern.log:
> 
> Oct  4 21:50:53 kvm kernel: [22727.849902] device vnet0 entered promiscuous 
> mode
> Oct  4 21:50:53 kvm kernel: [22727.883686] br0: port 2(vnet0) entering 
> forwarding state
> Oct  4 21:50:53 kvm kernel: [22727.883692] br0: port 2(vnet0) entering 
> forwarding state
> Oct  4 21:50:53 kvm kernel: [22728.130242] br0: port 2(vnet0) entering 
> forwarding state
> Oct  4 21:50:53 kvm kernel: [22728.134443] br0: port 2(vnet0) entering 
> disabled state
> Oct  4 21:50:53 kvm kernel: [22728.135238] device vnet0 left promiscuous mode
> Oct  4 21:50:53 kvm kernel: [22728.135242] br0: port 2(vnet0) entering 
> disabled state
> Oct  4 21:50:54 kvm kernel: [22728.673620] type=1400 
> audit(1349409054.320:42): apparmor="STATUS" operation="profile_remove" 
> name="libvirt-9b75f498-7959-7321-9461-d729d9c60668" pid=6349 
> comm="apparmor_parser"
> 
> 
> And when I try to create a Virtual Machine from the Virtual Machine Monitor, 
> I got this error message:
> 
> 2012-10-04 22:59:32.154+: 1333: error : qemuProcessReadLogOutput:1006 : 
> internal error Process exited while reading console log output: char device 
> redirected to /dev/pts/2
> Could not access KVM kernel module: Is a directory
> failed to initialize KVM: Is a directory
> No accelerator found!
> 
> 
> I already enabled the virtualization feature in the BIOS and ran the modprobe
> kvm-intel command also.
> No error message was received during the installation process.
> 
> So I am stuck on this problem :( I will appreciate sooo much any help.
> 
> 
> Thank you so much in advance
> 
> 
> Regards

David,

Please send this message to the libvirt list, at libvir-l...@redhat.com.


Re: Steal time in KVM

2012-10-08 Thread Marcelo Tosatti
On Mon, Oct 08, 2012 at 02:55:25AM -0500, Abhishek Gupta wrote:
> Hi,
> 
> I am trying to get the steal time with 2 VMs (each with 1 Vcpu) pinned
> to same core.
> 
> While finding documentation on this, I came across your patches and
> posts related to the implementation of this feature, so I thought it
> would be best to ask you.
> 
> I run the same application on these 2 VMs simultaneously and see the
> performance difference. I am trying to read the steal time from inside
> the guest using top, vmstat etc.
> 
> Both, top and vmstat -s report the steal time (st) as 0. I also
> checked that procps is in latest version. I am using virtio-net. I
> suspect that the steal time is not being updated well. Is there
> something which I need to configure for this to work? My Linux version
> for guest image is:
> 
> Linux server-147 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26
> UTC 2010 x86_64 GNU/Linux
> 
> And /proc/cpuinfo shows i:
> 
> processor : 0
> vendor_id : GenuineIntel
> cpu family: 6
> model : 2
> model name: QEMU Virtual CPU version 0.14.0
> stepping  : 3
> cpu MHz   : 2992.498
> cache size: 4096 KB
> fpu   : yes
> fpu_exception : yes
> cpuid level   : 4
> wp: yes
> flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good pni cx16
> hypervisor lahf_lm
> bogomips  : 5984.99
> clflush size  : 64
> cache_alignment   : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management:
> 
> 
> Another question I had was that is there a way to programmatically get
> the value of steal cycles (e.g. from a  C program) ?
> Thanks,
> 
> Abhishek

Make sure CONFIG_SCHEDSTATS is enabled in your host kernel.

> On Monday, June 13, 2011 6:40:02 PM UTC-5, Glauber Costa wrote:
> > Hi,
> >
> > This series is a repost of the last series I posted about this.
> > It tries to address most concerns that were raised at the time,
> > plus makes uses of the static_branch interface to disable the
> > steal code when not in use.
> >
> >
> > Glauber Costa (7):
> >   KVM-HDR Add constant to represent KVM MSRs enabled bit
> >   KVM-HDR: KVM Steal time implementation
> >   KVM-HV: KVM Steal time implementation
> >   KVM-GST: Add a pv_ops stub for steal time
> >   KVM-GST: KVM Steal time accounting
> >   KVM-GST: adjust scheduler cpu power
> >   KVM-GST: KVM Steal time registration
> >
> >  Documentation/kernel-parameters.txt   |4 ++
> >  Documentation/virtual/kvm/msr.txt |   33 +
> >  arch/x86/Kconfig  |   12 +
> >  arch/x86/include/asm/kvm_host.h   |8 +++
> >  arch/x86/include/asm/kvm_para.h   |   15 ++
> >  arch/x86/include/asm/paravirt.h   |9 
> >  arch/x86/include/asm/paravirt_types.h |1 +
> >  arch/x86/kernel/kvm.c |   72 +
> >  arch/x86/kernel/kvmclock.c|2 +
> >  arch/x86/kernel/paravirt.c|9 
> >  arch/x86/kvm/x86.c|   60 +++-
> >  kernel/sched.c|   81 
> > +
> >  kernel/sched_features.h   |4 +-
> >  13 files changed, 296 insertions(+), 14 deletions(-)
> >
> > --
> > 1.7.3.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Michael S. Tsirkin
On Wed, Oct 03, 2012 at 04:14:17PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> 
> > Thinking about Sasha's patches, we can reduce ring usage
> > for virtio net small packets dramatically if we put
> > virtio net header inline with the data.
> > This can be done for free in case guest net stack allocated
> > extra head room for the packet, and I don't see
> > why would this have any downsides.
> 
> I've been wanting to do this for the longest time... but...
> 
> > Even though with my recent patches qemu
> > no longer requires header to be the first s/g element,
> > we need a new feature bit to detect this.
> > A trivial qemu patch will be sent separately.
> 
> There's a reason I haven't done this.  I really, really dislike "my
> implementation isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.
> 
> So my plan was to tie this assumption to the new PCI layout.

I don't object but old qemu has this limitation for s390 as well,
and that's not using PCI, right? So how do we detect
new hypervisor there?

-- 
MST


Re: Steal time in KVM

2012-10-08 Thread Abhishek Gupta
Thanks for the answer. I am more of an application user. Is there a
quick way to check this flag and enable it if it's disabled?
Abhishek
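
One way to check the flag is to look for CONFIG_SCHEDSTATS=y in the host
kernel's configuration file. The sketch below is a minimal example under the
assumption that the distribution installs the config as a text file (many
ship it as /boot/config-<release>; that path is not guaranteed). The helper
names are invented here. Enabling the option, if it is missing, would mean
rebuilding the host kernel with it set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* True if a single config line sets `option` to y, e.g. "CONFIG_SCHEDSTATS=y". */
static int line_enables_option(const char *line, const char *option)
{
    assert(line != NULL && option != NULL);
    size_t n = strlen(option);

    /* Lines like "# CONFIG_SCHEDSTATS is not set" do not start with the name. */
    return strncmp(line, option, n) == 0 && strncmp(line + n, "=y", 2) == 0;
}

/* Scan a kernel config file: 1 = enabled, 0 = not enabled, -1 = file unreadable. */
static int config_enables_option(const char *path, const char *option)
{
    char line[512];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = line_enables_option(line, option);
    fclose(f);
    return found;
}
```

For example, config_enables_option("/boot/config-<release>",
"CONFIG_SCHEDSTATS") with <release> replaced by the output of uname -r on
the host.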


On Mon, Oct 8, 2012 at 2:39 PM, Marcelo Tosatti  wrote:
> On Mon, Oct 08, 2012 at 02:55:25AM -0500, Abhishek Gupta wrote:
>> Hi,
>>
>> I am trying to get the steal time with 2 VMs (each with 1 Vcpu) pinned
>> to same core.
>>
>> While finding documentation on this, I came across your patches and
>> posts related to the implementation of this feature, so I thought it
>> would be best to ask you.
>>
>> I run the same application on these 2 VMs simultaneously and see the
>> performance difference. I am trying to read the steal time from inside
>> the guest using top, vmstat etc.
>>
>> Both, top and vmstat -s report the steal time (st) as 0. I also
>> checked that procps is in latest version. I am using virtio-net. I
>> suspect that the steal time is not being updated well. Is there
>> something which I need to configure for this to work? My Linux version
>> for guest image is:
>>
>> Linux server-147 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26
>> UTC 2010 x86_64 GNU/Linux
>>
>> And /proc/cpuinfo shows i:
>>
>> processor : 0
>> vendor_id : GenuineIntel
>> cpu family: 6
>> model : 2
>> model name: QEMU Virtual CPU version 0.14.0
>> stepping  : 3
>> cpu MHz   : 2992.498
>> cache size: 4096 KB
>> fpu   : yes
>> fpu_exception : yes
>> cpuid level   : 4
>> wp: yes
>> flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good pni cx16
>> hypervisor lahf_lm
>> bogomips  : 5984.99
>> clflush size  : 64
>> cache_alignment   : 64
>> address sizes : 40 bits physical, 48 bits virtual
>> power management:
>>
>>
>> Another question I had was that is there a way to programmatically get
>> the value of steal cycles (e.g. from a  C program) ?
>> Thanks,
>>
>> Abhishek
>
> Make sure CONFIG_SCHEDSTATS is enabled in your host kernel.
>
>> On Monday, June 13, 2011 6:40:02 PM UTC-5, Glauber Costa wrote:
>> > Hi,
>> >
>> > This series is a repost of the last series I posted about this.
>> > It tries to address most concerns that were raised at the time,
>> > plus makes uses of the static_branch interface to disable the
>> > steal code when not in use.
>> >
>> >
>> > Glauber Costa (7):
>> >   KVM-HDR Add constant to represent KVM MSRs enabled bit
>> >   KVM-HDR: KVM Steal time implementation
>> >   KVM-HV: KVM Steal time implementation
>> >   KVM-GST: Add a pv_ops stub for steal time
>> >   KVM-GST: KVM Steal time accounting
>> >   KVM-GST: adjust scheduler cpu power
>> >   KVM-GST: KVM Steal time registration
>> >
>> >  Documentation/kernel-parameters.txt   |4 ++
>> >  Documentation/virtual/kvm/msr.txt |   33 +
>> >  arch/x86/Kconfig  |   12 +
>> >  arch/x86/include/asm/kvm_host.h   |8 +++
>> >  arch/x86/include/asm/kvm_para.h   |   15 ++
>> >  arch/x86/include/asm/paravirt.h   |9 
>> >  arch/x86/include/asm/paravirt_types.h |1 +
>> >  arch/x86/kernel/kvm.c |   72 +
>> >  arch/x86/kernel/kvmclock.c|2 +
>> >  arch/x86/kernel/paravirt.c|9 
>> >  arch/x86/kvm/x86.c|   60 +++-
>> >  kernel/sched.c|   81 
>> > +
>> >  kernel/sched_features.h   |4 +-
>> >  13 files changed, 296 insertions(+), 14 deletions(-)
>> >
>> > --
>> > 1.7.3.4
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> > the body of a message to majord...@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > Please read the FAQ at  http://www.tux.org/lkml/

Re: Steal time in KVM

2012-10-08 Thread Abhishek Gupta
I think this flag is enabled since I see that there is some information

cat /proc/schedstat

version 15
timestamp 4332448322
cpu0 12 0 65142011 32031105 32750248 22717261 2447620065997 58787405876 32964108
domain0 ,,,,,,,0005
3254935 3249367 2404 4821929 3229 0 96 3249269 6990 6986 0 2577 4 0 0
6986 23305437 22926698 242127 354902327 136655 0 22841 22903857 0 0 0
0 0 0 0 0 0 724375 51388 0
domain1 ,,,,,,,000f
3184279 3170371 13759 16042900 159 9 691 3169674 6836 6827 9 10875 0 0
2 410 23168825 22679032 475057 534512726 14817 23 66639 22612393 6 0 6
0 0 0 0 0 0 1884449 6647 0
cpu1 0 0 2140210892 1069012296 1070207005 1067616970 4593016088954
1172134807032 1071029875
domain0 ,,,,,,,000a
4637501 4632524 2203 6211781 2780 1 2 4632522 6282 6281 1 856 0 0 0
6281 14462483 14303529 126376 156661372 32595 0 9195 14294334 0 0 0 0
0 0 0 0 0 1234313 30153 0
domain1 ,,,,,,,000f
5766088 5765346 577 841712 179 2 110 5765235 8803 8803 0 0 0 0 0 1514
14429905 14181972 228676 264974439 19576 0 35872 14146100 1 0 1 0 0 0
0 0 0 7622572 16252 0
cpu2 0 0 52577101 25565275 26801121 24132631 812994237346 43152983079 26937729
domain0 ,,,,,,,0005
1125081 1123749 733 2378290 602 0 8 1123742 13138 13137 1 2018 0 0 0
13137 23954299 23750670 88041 190709270 115590 2 10688 23739982 0 0 0
0 0 0 0 0 0 979865 57090 0
domain1 ,,,,,,,000f
1052559 1052538 15 114576 7 0 0 25607 13113 13113 0 0 0 0 0 55
23838711 23200171 613151 756023415 25505 0 52756 23147415 0 0 0 0 0 0
0 0 0 597157 46792 0
cpu3 112320 0 28758268 13561833 14268286 12145196 2012983455891
88768447010 14923733
domain0 ,,,,,,,000a
1705787 1704495 727 5715943 571 2 6 1704489 22311 22311 0 1476 0 0 0
22311 12332932 12230181 67994 135432986 34760 4 7662 1519 1 0 1 0
0 0 0 0 0 997198 25844 0
domain1 ,,,,,,,000f
1873701 1873686 8 15157 5 0 0 126733 30014 30014 0 0 0 0 0 48 12298175
11963494 304127 386917447 32071 0 44945 11918549 0 0 0 0 0 0 0 0 0
3374661 31194 0

Abhishek


On Mon, Oct 8, 2012 at 2:42 PM, Abhishek Gupta  wrote:
> Thanks for the answer. I am more of an application user. Is there a
> quick way to check this flag and enable it if its disabled?
> Abhishek
>
>
> On Mon, Oct 8, 2012 at 2:39 PM, Marcelo Tosatti  wrote:
>> On Mon, Oct 08, 2012 at 02:55:25AM -0500, Abhishek Gupta wrote:
>>> Hi,
>>>
>>> I am trying to get the steal time with 2 VMs (each with 1 Vcpu) pinned
>>> to same core.
>>>
>>> While finding documentation on this, I came across your patches and
>>> posts related to the implementation of this feature, so I thought it
>>> would be best to ask you.
>>>
>>> I run the same application on these 2 VMs simultaneously and see the
>>> performance difference. I am trying to read the steal time from inside
>>> the guest using top, vmstat etc.
>>>
>>> Both, top and vmstat -s report the steal time (st) as 0. I also
>>> checked that procps is in latest version. I am using virtio-net. I
>>> suspect that the steal time is not being updated well. Is there
>>> something which I need to configure for this to work? My Linux version
>>> for guest image is:
>>>
>>> Linux server-147 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26
>>> UTC 2010 x86_64 GNU/Linux
>>>
>>> And /proc/cpuinfo shows i:
>>>
>>> processor : 0
>>> vendor_id : GenuineIntel
>>> cpu family: 6
>>> model : 2
>>> model name: QEMU Virtual CPU version 0.14.0
>>> stepping  : 3
>>> cpu MHz   : 2992.498
>>> cache size: 4096 KB
>>> fpu   : yes
>>> fpu_exception : yes
>>> cpuid level   : 4
>>> wp: yes
>>> flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>>> pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good pni cx16
>>> hypervisor lahf_lm
>>> bogomips  : 5984.99
>>> clflush size  : 64
>>> cache_alignment   : 64
>>> address sizes : 40 bits physical, 48 bits virtual
>>> power management:
>>>
>>>
>>> Another question I had was that is there a way to programmatically get
>>> the value of steal cycles (e.g. from a  C program) ?
>>> Thanks,
>>>
>>> Abhishek
>>
>> Make sure CONFIG_SCHEDSTATS is enabled in your host kernel.
>>
>>> On Monday, June 13, 2011 6:40:02 PM UTC-5, Glauber Costa wrote:
>>> > Hi,
>>> >
>>> > This series is a repost of the last series I posted about this.
>>> > It tries to address most concerns that were raised at the time,
>>> > plus makes uses of the static_branch interface to disable the
>>> > steal code when not in use.
>>> >
>>> >
>>> > Glauber Costa (7):
>>> >   KVM-HDR Add constant to represent KVM MSRs enabled bit
>

Re: Steal time in KVM

2012-10-08 Thread Marcelo Tosatti
On Mon, Oct 08, 2012 at 02:47:59PM -0500, Abhishek Gupta wrote:
> I think this flag is enabled since I see that there is some information
> 
> cat /proc/schedstat

This is in the host? Then, yes, the host has schedstat enabled.

Definition of steal time: the amount of time in which this vCPU was
ready to run but did not run.

So with some CPU load on the host system, you should see "steal time"
!= 0 in the guest system.
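
On the earlier question about reading the value from a C program: inside the
guest, steal time is the eighth number on the "cpu" lines of /proc/stat,
counted in clock ticks (sysconf(_SC_CLK_TCK) ticks per second). The sketch
below parses that field; the function names are made up for this example,
and a guest kernel without steal time accounting will simply report 0 there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the steal field from one "cpu" or "cpuN" line of /proc/stat.
 * Field order: user nice system idle iowait irq softirq steal ...
 * Returns 0 on success, -1 if the line is not a complete cpu line.
 */
static int parse_cpu_steal(const char *line, unsigned long long *steal)
{
    assert(line != NULL && steal != NULL);

    if (strncmp(line, "cpu", 3) != 0)
        return -1;
    line += 3;
    while (*line != '\0' && *line != ' ')
        line++;                            /* skip the optional CPU number */

    /* Skip the first seven counters, keep the eighth. */
    if (sscanf(line, "%*llu %*llu %*llu %*llu %*llu %*llu %*llu %llu", steal) != 1)
        return -1;
    return 0;
}

/* Read the aggregate steal counter (first line of /proc/stat). */
static int read_total_steal(unsigned long long *steal)
{
    char line[512];
    int rc = -1;
    FILE *f = fopen("/proc/stat", "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) != NULL)
        rc = parse_cpu_steal(line, steal);
    fclose(f);
    return rc;
}
```

Sampling read_total_steal() twice and dividing the difference by the tick
rate gives the stolen seconds over that interval, which is the same number
top and vmstat derive their "st" column from.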



Re: Steal time in KVM

2012-10-08 Thread Abhishek Gupta
Yes, that is in the host, but in the guest I always see "steal time"
= 0. I checked through /proc/stat, mpstat, top, and vmstat.

Not sure why the steal time information is not getting propagated to
the guest.
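
A host-side cross-check may help narrow this down. With schedstats enabled,
each task has a /proc/<pid>/schedstat file (threads appear under
/proc/<pid>/task/<tid>/schedstat) holding three numbers: time spent on the
CPU, time spent waiting on a run queue, and the number of timeslices run.
The sketch below parses that line; the names are invented for this example,
and treating the wait time of the vCPU thread as the source of the guest's
steal value is my reading of the series, not something confirmed here.

```c
#include <assert.h>
#include <stdio.h>

/* The three per-task schedstat counters (names chosen for this sketch). */
struct task_schedstat {
    unsigned long long run_ns;      /* time spent running on a CPU */
    unsigned long long wait_ns;     /* time spent runnable but waiting */
    unsigned long long timeslices;  /* number of timeslices run */
};

/* Parse the contents of a per-task schedstat file. Returns 0 on success. */
static int parse_task_schedstat(const char *text, struct task_schedstat *out)
{
    assert(text != NULL && out != NULL);
    if (sscanf(text, "%llu %llu %llu",
               &out->run_ns, &out->wait_ns, &out->timeslices) != 3)
        return -1;
    return 0;
}

/* Read and parse a file such as /proc/<pid>/task/<tid>/schedstat. */
static int read_task_schedstat(const char *path, struct task_schedstat *out)
{
    char buf[128];
    int rc = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) != NULL)
        rc = parse_task_schedstat(buf, out);
    fclose(f);
    return rc;
}
```

If the wait counter of the vCPU thread grows on the host while both VMs are
busy, yet the guest still reports 0, the contention is real and the missing
piece is more likely on the reporting path (guest kernel or QEMU support)
than in the host scheduler statistics.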

Thanks,
Abhishek


On Mon, Oct 8, 2012 at 3:02 PM, Marcelo Tosatti  wrote:
> On Mon, Oct 08, 2012 at 02:47:59PM -0500, Abhishek Gupta wrote:
>> I think this flag is enabled since I see that there is some information
>>
>> cat /proc/schedstat
>
> This is in the host? Then, yes, the host has schedstat enabled.
>
> Definition of steal time: the amount of time in which this vCPU did not
> run.
>
> So with some CPU load on the host system, you should see "steal time"
> != 0 in the guest system.
>


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Gerd Hoffmann
  Hi,

>> So we could have for virtio something like this:
>>
>> Capabilities: [??] virtio-regs:
>> legacy: BAR=0 offset=0
>> virtio-pci: BAR=1 offset=1000
>> virtio-cfg: BAR=1 offset=1800
> 
> This would be a vendor specific PCI capability so lspci wouldn't
> automatically know how to parse it.

Sure, would need a patch to actually parse+print the cap,
/me was just trying to make my point clear in a simple way.

>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>    performance, but I can't find a reference to it.
>>>
>>> I think the rationale is that ISR really needs to be PIO but everything
>>> else doesn't.  PIO is much faster on x86 because it doesn't require
>>> walking page tables or instruction emulation to handle the exit.
>>
>> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
>> correct?  Which would imply that pretty much only old guests without
>> MSI-X support need this, and we don't need to worry that much when
>> designing something new ...
> 
> It wasn't that long ago that MSI-X wasn't supported..  I think we should
> continue to keep ISR as PIO as it is a fast path.

No problem if we allow both the legacy layout and the new layout at the
same time.  Guests can continue to use ISR @ BAR0 in PIO space for
existing virtio devices, even in case they want to use mmio for other
registers -> all fine.

New virtio devices can support MSI-X from day one and decide to not
expose a legacy layout PIO bar.

cheers,
  Gerd



Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Michael S. Tsirkin
On Thu, Oct 04, 2012 at 01:04:56PM +0930, Rusty Russell wrote:
> Anthony Liguori  writes:
> > Rusty Russell  writes:
> >
> >> "Michael S. Tsirkin"  writes:
> >>
> >>> Thinking about Sasha's patches, we can reduce ring usage
> >>> for virtio net small packets dramatically if we put
> >>> virtio net header inline with the data.
> >>> This can be done for free in case guest net stack allocated
> >>> extra head room for the packet, and I don't see
> >>> why would this have any downsides.
> >>
> >> I've been wanting to do this for the longest time... but...
> >>
> >>> Even though with my recent patches qemu
> >>> no longer requires header to be the first s/g element,
> >>> we need a new feature bit to detect this.
> >>> A trivial qemu patch will be sent separately.
> >>
> >> There's a reason I haven't done this.  I really, really dislike "my
> >> implementation isn't broken" feature bits.  We could have an infinite
> >> number of them, for each bug in each device.
> >
> > This is a bug in the specification.
> >
> > The QEMU implementation pre-dates the specification.  All of the actual
> > implementations of virtio relied on the semantics of s/g elements and
> > still do.
> 
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

If you look at my patch you'll notice there's also a
comment in virtio_net.h that seems to be broken in this respect:

/* This is the first element of the scatter-gather list.  If you don't
 * specify GSO or CSUM features, you can simply ignore the header. */

There is a similar comment in virtio-blk.
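
To make the ring-usage argument quoted above concrete, here is a minimal
sketch (not the actual virtio_net code) of the choice a guest driver would
face. struct pkt_buf, sg_entries_for_tx() and the inline_hdr_ok flag are
hypothetical names used only for illustration; the 10-byte length is the size
of struct virtio_net_hdr without the mergeable-rx-buffer extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of struct virtio_net_hdr without the mergeable-rxbuf extension. */
#define VNET_HDR_LEN 10u

/* Hypothetical stand-in for a packet buffer: `headroom` is the number of
 * unused bytes the network stack left in front of the packet data. */
struct pkt_buf {
    uint8_t *data;      /* first byte of packet data */
    size_t   headroom;  /* free bytes immediately before `data` */
    size_t   len;       /* packet length in bytes */
};

/* Returns how many scatter-gather entries a linear packet needs.
 * `inline_hdr_ok` models "the new feature bit was negotiated".
 * With enough headroom the header sits directly in front of the data and
 * both travel in one entry; otherwise the header needs its own entry. */
static unsigned sg_entries_for_tx(const struct pkt_buf *p, int inline_hdr_ok)
{
    assert(p != NULL);
    if (inline_hdr_ok && p->headroom >= VNET_HDR_LEN)
        return 1;   /* [hdr|data] in a single entry */
    return 2;       /* [hdr] + [data] */
}
```

With the feature negotiated and at least 10 bytes of headroom, a small packet
takes one ring entry instead of two, which is where the saving would come from.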


Re: [PATCH 2/3] KVM: PPC: Add SPR emulation exits

2012-10-08 Thread Scott Wood

On 10/07/2012 08:30:06 AM, Alexander Graf wrote:
> On 07.10.2012, at 15:26, Avi Kivity wrote:
>
> > The downside of this generic approach is that it prepares surprises down
> > the road.  The alternative approach, of adding a new KVM_EXIT_RESET,
> > avoids this minefield, but requires ABI changes every time we want to
> > emulate something in userspace.  Can you provide a critique of this
> > alternate approach?
>
> Yeah, it doesn't scale as well. The SPR read/write give us all
> information we need to emulate other registers too, like the magical
> "read this SPR and automatically get the interrupt vector from the
> MPIC and ack the interrupt along the way" register we have on e500.

That's not actually how the register works in hardware (though it may
be a reasonable way to emulate it with a userspace mpic).  The
interrupt is acknowledged when the core branches to the interrupt
vector.  The register itself is just storage that gets filled when that
happens.


-Scott


Re: [RFC PATCH] KVM: PPC: Remove 44x target

2012-10-08 Thread Scott Wood

On 10/07/2012 08:58:02 AM, Alexander Graf wrote:
> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
> index 099fe82..4421293 100644
> --- a/arch/powerpc/kvm/bookehv_interrupts.S
> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
> @@ -23,7 +23,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 

This could come out of here regardless of whether 44x stays.

> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 7d120dc..f522110 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -158,10 +158,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
>  }
>  case KVM_HCALL_TOKEN(KVM_HC_FEATURES):
>  r = EV_SUCCESS;
> -#if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500V2)
> -/* XXX Missing magic page on 44x */
>  r2 |= (1 << KVM_FEATURE_MAGIC_PAGE);
> -#endif

We also don't support this on e500mc -- or at least it hasn't been
tested, and it would be pretty pointless, since nothing in the magic page
traps on e500mc.


-Scott


Re: [RFC PATCH] KVM: PPC: Remove 44x target

2012-10-08 Thread Alexander Graf

On 08.10.2012, at 22:50, Scott Wood wrote:

> On 10/07/2012 08:58:02 AM, Alexander Graf wrote:
>> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
>> b/arch/powerpc/kvm/bookehv_interrupts.S
>> index 099fe82..4421293 100644
>> --- a/arch/powerpc/kvm/bookehv_interrupts.S
>> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
>> @@ -23,7 +23,6 @@
>> #include 
>> #include 
>> #include 
>> -#include 
>> #include 
>> #include 
>> #include 
> 
> This could come out of here regardless of whether 44x stays.

:)

> 
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 7d120dc..f522110 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -158,10 +158,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
>>  }
>>  case KVM_HCALL_TOKEN(KVM_HC_FEATURES):
>>  r = EV_SUCCESS;
>> -#if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500V2)
>> -/* XXX Missing magic page on 44x */
>>  r2 |= (1 << KVM_FEATURE_MAGIC_PAGE);
>> -#endif
> 
> We also don't support this on e500mc -- or at least it hasn't been tested, 
> and would be pretty pointless nothing in the magic page traps on e500mc.

Ah, good point!


Alex



Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
>>> So we could have for virtio something like this:
>>>
>>> Capabilities: [??] virtio-regs:
>>> legacy: BAR=0 offset=0
>>> virtio-pci: BAR=1 offset=1000
>>> virtio-cfg: BAR=1 offset=1800
>> 
>> This would be a vendor specific PCI capability so lspci wouldn't
>> automatically know how to parse it.
>
> Sure, would need a patch to actually parse+print the cap,
> /me was just trying to make my point clear in a simple way.
>
>>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>>    performance, but I can't find a reference to it.
>>>>
>>>> I think the rationale is that ISR really needs to be PIO but everything
>>>> else doesn't.  PIO is much faster on x86 because it doesn't require
>>>> walking page tables or instruction emulation to handle the exit.
>>>
>>> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
>>> correct?  Which would imply that pretty much only old guests without
>>> MSI-X support need this, and we don't need to worry that much when
>>> designing something new ...
>> 
>> It wasn't that long ago that MSI-X wasn't supported..  I think we should
>> continue to keep ISR as PIO as it is a fast path.
>
> No problem if we allow to have both legacy layout and new layout at the
> same time.  Guests can continue to use ISR @ BAR0 in PIO space for
> existing virtio devices, even in case they want use mmio for other
> registers -> all fine.
>
> New virtio devices can support MSI-X from day one and decide to not
> expose a legacy layout PIO bar.

I think having BAR1 be an MMIO mirror of the registers + a BAR2 for
virtio configuration space is probably not that bad of a solution.

Regards,

Anthony Liguori

>
> cheers,
>   Gerd
>


Re: [PATCH 2/3] KVM: PPC: Add SPR emulation exits

2012-10-08 Thread Alexander Graf

On 08.10.2012, at 22:45, Scott Wood wrote:

> On 10/07/2012 08:30:06 AM, Alexander Graf wrote:
>> On 07.10.2012, at 15:26, Avi Kivity wrote:
>> > The downside of this generic approach is that it prepares surprises down
>> > the road.  The alternative approach, of adding a new KVM_EXIT_RESET,
>> > avoids this minefield, but requires ABI changes every time we want to
>> > emulate something in userspace.  Can you provide a critique of this
>> > alternate approach?
>> Yeah, it doesn't scale as well. The SPR read/write give us all information 
>> we need to emulate other registers too, like the magical "read this SPR and 
>> automatically get the interrupt vector from the MPIC and ack the interrupt 
>> along the way" register we have on e500.
> 
> That's not actually how the register works in hardware (though it may be a 
> reasonable way to emulate it with a userspace mpic).  The interrupt is 
> acknowledged when the core branches to the interrupt vector.  The register 
> itself is just storage that gets filled when that happens.

Mind to enlighten me again on how exactly this mode gets enabled so that an OS 
that does not make use of the SPR can still ask the MPIC by hand :)?


Alex



Re: [PATCH 2/3] KVM: PPC: Add SPR emulation exits

2012-10-08 Thread Scott Wood

On 10/08/2012 04:01:11 PM, Alexander Graf wrote:
> On 08.10.2012, at 22:45, Scott Wood wrote:
>
> > On 10/07/2012 08:30:06 AM, Alexander Graf wrote:
> >> On 07.10.2012, at 15:26, Avi Kivity wrote:
> >> > The downside of this generic approach is that it prepares surprises down
> >> > the road.  The alternative approach, of adding a new KVM_EXIT_RESET,
> >> > avoids this minefield, but requires ABI changes every time we want to
> >> > emulate something in userspace.  Can you provide a critique of this
> >> > alternate approach?
> >> Yeah, it doesn't scale as well. The SPR read/write give us all
> >> information we need to emulate other registers too, like the magical
> >> "read this SPR and automatically get the interrupt vector from the
> >> MPIC and ack the interrupt along the way" register we have on e500.
> >
> > That's not actually how the register works in hardware (though it
> > may be a reasonable way to emulate it with a userspace mpic).  The
> > interrupt is acknowledged when the core branches to the interrupt
> > vector.  The register itself is just storage that gets filled when
> > that happens.
>
> Mind to enlighten me again on how exactly this mode gets enabled so
> that an OS that does not make use of the SPR can still ask the MPIC
> by hand :)?

GCR[M] is set to 3 for external proxy mode, versus 1 for traditional
operation.
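
As a rough illustration of the two values mentioned above, the following
sketch extracts the mode field from a GCR value. The field position (bits
30:29 of the 32-bit register) is an assumption that should be checked against
the MPIC reference manual, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the mode field M occupies bits 30:29 of the 32-bit MPIC
 * Global Configuration Register. Verify against the hardware manual. */
#define GCR_M_SHIFT 29
#define GCR_M_MASK  (0x3u << GCR_M_SHIFT)

/* Only the two values named in the thread are given names here. */
#define GCR_M_TRADITIONAL 1u  /* traditional operation */
#define GCR_M_EXT_PROXY   3u  /* external proxy mode */

/* Extracts the mode field from a raw GCR value. */
static unsigned gcr_mode(uint32_t gcr)
{
    return (gcr & GCR_M_MASK) >> GCR_M_SHIFT;
}

/* Returns nonzero when the raw GCR value selects external proxy mode. */
static int gcr_is_external_proxy(uint32_t gcr)
{
    return gcr_mode(gcr) == GCR_M_EXT_PROXY;
}
```

An emulated MPIC could use a check like gcr_is_external_proxy() to decide
whether the SPR-based path is active or whether the guest acknowledges
interrupts through the MPIC registers by hand.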


-Scott


Re: [PATCH 14/15] KVM: ARM: Handle I/O aborts

2012-10-08 Thread Christoffer Dall
On Mon, Oct 8, 2012 at 6:04 AM, Dave Martin  wrote:
> On Fri, Oct 05, 2012 at 10:00:25AM +0100, Russell King - ARM Linux wrote:
>> On Mon, Oct 01, 2012 at 01:53:26PM +0100, Dave Martin wrote:
>> > A good starting point would be load/store emulation as this seems to be a
>> > common theme, and we would need a credible deployment for any new
>> > framework so that we know it's fit for purpose.
>>
>> Probably not actually, that code is written to be fast, because things
>> like IP stack throughput depend on it - particularly when your network
>> card can only DMA packets to 32-bit aligned addresses (resulting in
>> virtually all network data being misaligned.)
>
> A fair point, but surely it would still be worth a try?
>
> We might decide that a few particular cases of instruction decode
> should not use the generic framework for performance reasons, but in
> most cases being critically dependent on fault-driven software
> emulation for performance would be a serious mistake in the first place
> (discussions about the network code notwithstanding).
>
> This is not an argument for being slower just for the sake of it, but
> it can make sense to factor code on paths where performance is not an
> issue.
>

I'm all for unifying this stuff, but I still think it doesn't qualify
for holding back on merging KVM patches. The ARM mode instruction
decoding can definitely be cleaned up though to look more like the
Thumb2 mode decoding which will be a good step before refactoring to
use a more common framework. Currently we decode too many types of
instructions (not just the ones with cleared HSR.IV) in ARM mode, so
the whole complexity of that code can be reduced.

I'll give that a go before re-sending the KVM patch series.
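
For readers unfamiliar with the HSR.IV distinction mentioned above, here is a
minimal sketch of the split between "hardware described the access" and
"software has to decode the instruction". The bit positions (valid bit 24,
access size bits 23:22, register bits 19:16, write bit 6) are my reading of
the ARMv7 data-abort syndrome layout and should be treated as assumptions;
struct mmio_access and decode_from_hsr() are hypothetical names, not the ones
used in the KVM/ARM patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED HSR data-abort syndrome layout (check the ARM ARM). */
#define HSR_ISV        (1u << 24)  /* instruction syndrome valid */
#define HSR_SAS_SHIFT  22          /* access size: 0=byte, 1=half, 2=word */
#define HSR_SRT_SHIFT  16          /* transfer register number */
#define HSR_WNR        (1u << 6)   /* write, not read */

/* Hypothetical description of a trapped MMIO access. */
struct mmio_access {
    unsigned len;       /* access length in bytes */
    unsigned rt;        /* register being loaded or stored */
    int      is_write;  /* nonzero for a store */
};

/* Returns 1 and fills *out when the syndrome is valid.
 * Returns 0 when the valid bit is clear, i.e. the instruction itself
 * would have to be fetched and decoded in software. */
static int decode_from_hsr(uint32_t hsr, struct mmio_access *out)
{
    assert(out != NULL);
    if (!(hsr & HSR_ISV))
        return 0;
    out->len = 1u << ((hsr >> HSR_SAS_SHIFT) & 0x3);
    out->rt = (hsr >> HSR_SRT_SHIFT) & 0xf;
    out->is_write = (hsr & HSR_WNR) != 0;
    return 1;
}
```

Only the path where this returns 0 needs the ARM/Thumb2 instruction decoder,
which is why narrowing the set of decoded instructions reduces complexity.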

-Christoffer


Re: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=10002 jiffies)

2012-10-08 Thread John Stultz

On 09/30/2012 04:59 AM, Fengguang Wu wrote:
> On Sun, Sep 30, 2012 at 01:32:46PM +0200, Avi Kivity wrote:
>> On 09/30/2012 01:23 PM, Fengguang Wu wrote:
>>> On Sun, Sep 30, 2012 at 01:10:55PM +0200, Avi Kivity wrote:
>>>> On 09/28/2012 05:35 AM, Paul E. McKenney wrote:
>>>>> On Thu, Sep 27, 2012 at 12:40:44PM +0800, Fengguang Wu wrote:
>>>>>> On Wed, Sep 26, 2012 at 09:28:50PM -0700, Paul E. McKenney wrote:
>>>>>>> On Thu, Sep 27, 2012 at 10:54:00AM +0800, Fengguang Wu wrote:
>>>>>>>> On Wed, Sep 26, 2012 at 09:45:43AM -0700, Paul E. McKenney wrote:
>>>>>>>>> On Wed, Sep 26, 2012 at 04:15:01PM +0800, Fengguang Wu wrote:
>>>>>>>>>
>>>>>>>>> [ . . . ]
>>>>>>>>>
>>>>>>>>> But could you also please send your .config file and a description of
>>>>>>>>
>>>>>>>> .config attached.
>>>>>>>>
>>>>>>>>> the workload you are running?
>>>>>>>>
>>>>>>>> It's basically the below commands. The exact initrd is not relevant in
>>>>>>>> this case because it's a boot time warning before user space is
>>>>>>>> started. The stalls roughly happen 1 time on every 10 boots.
>>>>>>>
>>>>>>> Yow!!!
>>>>>>>
>>>>>>> You have severe cross-CPU time-synchronization problems.  See for
>>>>>>> example the first dmesg, with the relevant part extracted right here.
>>>>>>> One CPU believes that it is about 37 seconds past boot, and the other
>>>>>>> CPU believes that it is about 137 seconds past boot.  Given that large
>>>>>>> of a time difference, an RCU CPU stall warning is expected behavior.
>>>>>>
>>>>>> Good spot! Yeah I noticed that huge timestamp gap, however didn't take
>>>>>> it seriously enough..
>>>>>>
>>>>>>> Get your two CPUs in agreement about what time it is, and I bet that
>>>>>>> the CPU stall warnings will go away.
>>>>>>
>>>>>> Possibly KVM related? Because the warnings show up in many test boxes
>>>>>> running KVM and so is not likely some hardware specific issue.
>>>>>
>>>>> I vaguely recall seeing something recently.  But let's ask the KVM and
>>>>> timekeeping guys.
>>>>
>>>> From the logs it looks like hpet (why not kvmclock?) is used for the
>>>> clock, it should not generate such drifts since it is a global clock.
>>>> Can you verify current_clocksource on a boot that actually failed (in
>>>> case the clocksource is switched during runtime)?
>>>
>>> I've checked out the dmesg that's cited by Paul, attached. Yes it
>>> contains lines
>>>
>>> [4.970051] Switching to clocksource hpet
>>>
>>> and then
>>>
>>> [7.250353] Switching to clocksource tsc
>>>
>>> And there is no kvm-clock lines. Oh well for this particular kernel:
>>
>> Ah, tsc will certainly break on kvm if the hardware doesn't provide a
>> constant tsc source.  I'm surprised the guest kernel didn't detect it
>> and switch back to hpet though.
>
> Thanks, it's good to know the root cause. All the dmesgs show the same hpet+tsc
> switching pattern (and never switch back):
>
> $ grep Switching dmesg-kvm_bisect2-inn-*21
> dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[4.111415] Switching to clocksource hpet
> dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[6.550098] Switching to clocksource tsc


Is this still an open issue? Fengguang's mail sounds like it's resolved,
but I'm not sure it is.

The switching from HPET -> TSC is, I believe, expected, as the refined
calibration delays the TSC from being registered for a few seconds.
However, it's unclear why the TSC, if it is faulty, isn't being caught
and demoted by the clocksource watchdog.

I'm also curious why this originally bisected down to
06ae115a1d551cd952d8 (when using the kvm clock) if it was more of a
hardware issue. And in those logs, I don't see the printk time-stamp
inconsistencies that were alluded to in this thread.


Fengguang: Is this still reproducible? Do you have any details (dmesg)
about the host system as well?
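
For checking the active clocksource on a running guest (rather than digging
through dmesg), a small sketch along these lines may help. It reads the
standard sysfs file current_clocksource; the helper name read_first_line()
is just something made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* sysfs file holding the clocksource currently in use. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Reads the first line of `path` into `buf`, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
static int read_first_line(const char *path, char *buf, size_t buflen)
{
    FILE *f;

    assert(path != NULL && buf != NULL && buflen > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)buflen, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the newline, if any */
    return 0;
}
```

Calling read_first_line(CLOCKSOURCE_PATH, buf, sizeof(buf)) after a boot that
showed the stall would tell whether the guest is still on tsc or has been
switched back to hpet by the watchdog.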


thanks
-john



Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Rusty Russell
Anthony Liguori  writes:
> Gerd Hoffmann  writes:
>
>>   Hi,
>>
>>>> So we could have for virtio something like this:
>>>>
>>>> Capabilities: [??] virtio-regs:
>>>> legacy: BAR=0 offset=0
>>>> virtio-pci: BAR=1 offset=1000
>>>> virtio-cfg: BAR=1 offset=1800
>>> 
>>> This would be a vendor specific PCI capability so lspci wouldn't
>>> automatically know how to parse it.
>>
>> Sure, would need a patch to actually parse+print the cap,
>> /me was just trying to make my point clear in a simple way.
>>
>>>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>>>    performance, but I can't find a reference to it.
>>>>>
>>>>> I think the rationale is that ISR really needs to be PIO but everything
>>>>> else doesn't.  PIO is much faster on x86 because it doesn't require
>>>>> walking page tables or instruction emulation to handle the exit.
>>>>
>>>> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
>>>> correct?  Which would imply that pretty much only old guests without
>>>> MSI-X support need this, and we don't need to worry that much when
>>>> designing something new ...
>>> 
>>> It wasn't that long ago that MSI-X wasn't supported..  I think we should
>>> continue to keep ISR as PIO as it is a fast path.
>>
>> No problem if we allow to have both legacy layout and new layout at the
>> same time.  Guests can continue to use ISR @ BAR0 in PIO space for
>> existing virtio devices, even in case they want use mmio for other
>> registers -> all fine.
>>
>> New virtio devices can support MSI-X from day one and decide to not
>> expose a legacy layout PIO bar.
>
> I think having BAR1 be an MMIO mirror of the registers + a BAR2 for
> virtio configuration space is probably not that bad of a solution.

Well, we also want to clean up the registers, so how about:

BAR0: legacy, as is.  If you access this, don't use the others.
BAR1: new format virtio-pci layout.  If you use this, don't use BAR0.
BAR2: virtio-cfg.  If you use this, don't use BAR0.
BAR3: ISR. If you use this, don't use BAR0.

I prefer the cases exclusive (ie. use one or the other) as a clear path
to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
an ugly corner case in future (ISR is BAR0 + 19?  WTF?).

As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
endorse that and leave it to the devices.

The detection is simple: if BAR1 has non-zero length, it's new-style,
otherwise legacy.
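
The detection rule can be sketched in a few lines. This is only an
illustration of the proposal as stated above, not driver code: the enum, the
function name and the idea of passing the probed BAR sizes as an array are
all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum virtio_pci_layout {
    VIRTIO_PCI_LAYOUT_LEGACY,  /* only BAR0, old register layout */
    VIRTIO_PCI_LAYOUT_NEW,     /* BAR1 (and BAR2/BAR3) in use */
};

/* bar_len[i] is the size in bytes of BAR i as probed by the guest;
 * 0 means the BAR is not implemented. Only BAR1 decides the layout. */
static enum virtio_pci_layout detect_layout(const uint64_t bar_len[4])
{
    assert(bar_len != NULL);
    return bar_len[1] != 0 ? VIRTIO_PCI_LAYOUT_NEW
                           : VIRTIO_PCI_LAYOUT_LEGACY;
}
```

Note that the rule deliberately ignores BAR0, so a device may expose both
layouts while a driver still picks exactly one of them.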

Thoughts?
Rusty.


[PATCH] virt/kvm: change kvm_assign_device() to print return value when iommu_attach_device() fails

2012-10-08 Thread Shuah Khan
Change existing kernel error message to include return value from
iommu_attach_device() when it fails. This will help debug device
assignment failures more effectively.

Signed-off-by: Shuah Khan 
---
 virt/kvm/iommu.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 037cb67..18e1e30 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -168,11 +168,7 @@ int kvm_assign_device(struct kvm *kvm,
 
r = iommu_attach_device(domain, &pdev->dev);
if (r) {
-   printk(KERN_ERR "assign device %x:%x:%x.%x failed",
-   pci_domain_nr(pdev->bus),
-   pdev->bus->number,
-   PCI_SLOT(pdev->devfn),
-   PCI_FUNC(pdev->devfn));
+   dev_err(&pdev->dev, "kvm assign device failed ret %d", r);
return r;
}
 
-- 
1.7.9.5
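
For context on why the hand-built identifier can go away: dev_err() prefixes
the message with the device name, which for PCI devices has the form
domain:bus:slot.function. The userspace sketch below shows how that name is
derived from the same four values the old printk() printed. The two macros
mirror the kernel's PCI_SLOT()/PCI_FUNC() definitions; format_pci_name() is a
hypothetical helper for this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same bit layout as the kernel macros: devfn = slot[7:3] | func[2:0]. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Formats "dddd:bb:ss.f" into buf. Returns the snprintf() result. */
static int format_pci_name(char *buf, size_t len, unsigned domain,
                           unsigned bus, unsigned devfn)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%04x:%02x:%02x.%u",
                    domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
}
```

For example, domain 0, bus 0 and devfn 0x88 (slot 0x11, function 0) give the
name 0000:00:11.0.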





Re: [Question] Intercept CR3 access in EPT

2012-10-08 Thread R
Hi,

Actually, I know that disabling EPT would work, but thank you anyway.
What I am interested in is why it fails when EPT is enabled.

Thank you for answering.
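
For reference, the modification quoted below boils down to which bits stay set
in the primary processor-based execution controls. Here is a minimal
standalone sketch, assuming bit positions 9, 15 and 16 as documented in the
Intel SDM. The helper names are made up for illustration, and the sketch only
shows the masking; it does not explain the boot failure.

```c
#include <assert.h>
#include <stdint.h>

/* Primary processor-based VM-execution control bits (Intel SDM). */
#define CPU_BASED_INVLPG_EXITING     (1u << 9)
#define CPU_BASED_CR3_LOAD_EXITING   (1u << 15)
#define CPU_BASED_CR3_STORE_EXITING  (1u << 16)

/* Stock behaviour with EPT: stop exiting on CR3 loads/stores and INVLPG. */
static uint32_t stock_ept_controls(uint32_t ctl)
{
    return ctl & ~(CPU_BASED_CR3_LOAD_EXITING |
                   CPU_BASED_CR3_STORE_EXITING |
                   CPU_BASED_INVLPG_EXITING);
}

/* The modification: CR3_LOAD_EXITING is no longer part of the cleared mask,
 * so writes to CR3 keep causing VM exits. */
static uint32_t modified_ept_controls(uint32_t ctl)
{
    return ctl & ~(CPU_BASED_CR3_STORE_EXITING |
                   CPU_BASED_INVLPG_EXITING);
}

/* Nonzero if a guest "mov to CR3" would exit with these controls. */
static int traps_cr3_load(uint32_t ctl)
{
    return (ctl & CPU_BASED_CR3_LOAD_EXITING) != 0;
}
```

Note that in both variants CR3 stores (reads of CR3 by the guest) do not exit,
so only CR3 writes would be observed with the modified mask.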

2012/10/9 Marcelo Tosatti :
> On Mon, Oct 08, 2012 at 04:15:57PM +0800, R wrote:
>> Hi,
>>
>> I am a student, and my teacher told me to monitor every process in the guest.
>> So I am trying to intercept every CR3 access. However, if kvm is loaded
>> with EPT enabled, accesses to CR3 do not cause a VM exit.
>
> Disable EPT by loading kvm-intel.ko module with enable_ept=0 parameter.
> Then, CR3 accesses will trap.
>
>> I modified the code to change the VMCS configuration.
>> To be specific, these functions were rewritten:
>> static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
>>   unsigned long cr0,
>>   struct kvm_vcpu *vcpu)
>> {
>> 
>> } else if (!is_paging(vcpu)) {
>>   /* From nonpaging to paging */
>>   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
>>vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
>> -  ~(CPU_BASED_CR3_LOAD_EXITING |
>> +   ~(//   CPU_BASED_CR3_LOAD_EXITING|
>>  CPU_BASED_CR3_STORE_EXITING));
>>   
>> }
>>
>> static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>> {
>>  ...
>> if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) 
>> {
>>   /* CR3 accesses and invlpg don't need to cause VM Exits when 
>> EPT
>>  enabled */
>> - _cpu_based_exec_control &= ~(CPU_BASED_CR3_LOAD_EXITING |
>> +  _cpu_based_exec_control &= ~( //
>> CPU_BASED_CR3_LOAD_EXITING |
>>CPU_BASED_CR3_STORE_EXITING |
>>CPU_BASED_INVLPG_EXITING);
>> 
>> }
>>
>> I thought this would force every CR3 access to be trapped with EPT enabled.
>> However, the VM seems to fail to boot when it changes from nonpaging to
>> paging.
>> Do you have any idea? Can someone tell me how I can intercept
>> CR3 accesses, and why this does not work?
>>
>> Thank you for answering.
>
>>
>> --
>> Thanks
>> Rui Wu



-- 
Thanks
Rui Wu


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> Gerd Hoffmann  writes:
>>
>>>   Hi,
>>>
>>>>> So we could have for virtio something like this:
>>>>>
>>>>> Capabilities: [??] virtio-regs:
>>>>> legacy: BAR=0 offset=0
>>>>> virtio-pci: BAR=1 offset=1000
>>>>> virtio-cfg: BAR=1 offset=1800
>>>>
>>>> This would be a vendor specific PCI capability so lspci wouldn't
>>>> automatically know how to parse it.
>>>
>>> Sure, would need a patch to actually parse+print the cap,
>>> /me was just trying to make my point clear in a simple way.
>>>
>>>>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>>>>    performance, but I can't find a reference to it.
>>>>>>
>>>>>> I think the rationale is that ISR really needs to be PIO but everything
>>>>>> else doesn't.  PIO is much faster on x86 because it doesn't require
>>>>>> walking page tables or instruction emulation to handle the exit.
>>>>>
>>>>> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
>>>>> correct?  Which would imply that pretty much only old guests without
>>>>> MSI-X support need this, and we don't need to worry that much when
>>>>> designing something new ...
>>>>
>>>> It wasn't that long ago that MSI-X wasn't supported..  I think we should
>>>> continue to keep ISR as PIO as it is a fast path.
>>>
>>> No problem if we allow to have both legacy layout and new layout at the
>>> same time.  Guests can continue to use ISR @ BAR0 in PIO space for
>>> existing virtio devices, even in case they want use mmio for other
>>> registers -> all fine.
>>>
>>> New virtio devices can support MSI-X from day one and decide to not
>>> expose a legacy layout PIO bar.
>>
>> I think having BAR1 be an MMIO mirror of the registers + a BAR2 for
>> virtio configuration space is probably not that bad of a solution.
>
> Well, we also want to clean up the registers, so how about:
>
> BAR0: legacy, as is.  If you access this, don't use the others.
> BAR1: new format virtio-pci layout.  If you use this, don't use BAR0.
> BAR2: virtio-cfg.  If you use this, don't use BAR0.
> BAR3: ISR. If you use this, don't use BAR0.
>
> I prefer the cases exclusive (ie. use one or the other) as a clear path
> to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
> an ugly corner case in future (ISR is BAR0 + 19?  WTF?).

We'll never remove legacy so we shouldn't plan on it.  There are
literally hundreds of thousands of VMs out there with the current virtio
drivers installed in them.  We'll be supporting them for a very, very
long time :-)

I don't think we gain a lot by moving the ISR into a separate BAR.
Splitting up registers like that seems weird to me too.

It's very normal to have a mirrored set of registers that are PIO in one
bar and MMIO in a different BAR.

If we added an additional constraint that BAR1 was mirrored except for
the config space and the MSI section was always there, I think the end
result would be nice.  IOW:

BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config
BAR1[mmio]: virtio-pci registers + MSI section + future extensions
BAR2[mmio]: virtio-config

We can continue to do ISR access via BAR0 for performance reasons.

> As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
> endorse that and leave it to the devices.
>
> The detection is simple: if BAR1 has non-zero length, it's new-style,
> otherwise legacy.

I agree that this is the best way to extend, but I think we should still
use a transport feature bit.  We want to be able to detect within QEMU
whether a guest is using these new features because we need to adjust
migration state accordingly.

Otherwise we would have to detect reads/writes to the new BARs to
maintain whether the extended register state needs to be saved.  This
gets nasty dealing with things like reset.

A feature bit simplifies this all pretty well.
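
A rough sketch of why an acked feature bit makes the migration decision easy:
the acked features are already part of the device state and are cleared on
reset, so no extra tracking of BAR accesses is needed. The bit number and all
names below are hypothetical; no bit was proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL transport feature bit for "guest uses the new layout". */
#define HYPOTHETICAL_F_NEW_LAYOUT 31

/* Minimal stand-in for per-device state kept by the host. */
struct dev_state {
    uint32_t guest_features;  /* feature bits the guest has acked */
};

/* Extended register state only needs saving if the guest acked the bit. */
static int must_save_extended_state(const struct dev_state *s)
{
    assert(s != NULL);
    return (s->guest_features >> HYPOTHETICAL_F_NEW_LAYOUT) & 1u;
}

/* Reset drops all acked features, and with them the "new layout" marker. */
static void device_reset(struct dev_state *s)
{
    assert(s != NULL);
    s->guest_features = 0;
}
```

The alternative, inferring the mode from reads and writes to the new BARs,
would need its own flag that has to be kept consistent across reset.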

Regards,

Anthony Liguori

>
> Thoughts?
> Rusty.


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Rusty Russell
Anthony Liguori  writes:
> We'll never remove legacy so we shouldn't plan on it.  There are
> literally hundreds of thousands of VMs out there with the current virtio
> drivers installed in them.  We'll be supporting them for a very, very
> long time :-)

You will be supporting this for qemu on x86, sure.  As I think we're
still in the growth phase for virtio, I prioritize future spec
cleanliness pretty high.

But I think you'll be surprised how fast this is deprecated:
1) Bigger queues for block devices (guest-specified ringsize)
2) Smaller rings for openbios (guest-specified alignment)
3) All-mmio mode (powerpc)
4) Whatever network features get numbers > 31.

> I don't think we gain a lot by moving the ISR into a separate BAR.
> Splitting up registers like that seems weird to me too.

Confused.  I proposed the same split as you have, just ISR by itself.

> It's very normal to have a mirrored set of registers that are PIO in one
> bar and MMIO in a different BAR.
>
> If we added an additional constraint that BAR1 was mirrored except for
> the config space and the MSI section was always there, I think the end
> result would be nice.  IOW:

But it won't be the same, because we want all that extra stuff, like
more feature bits and queue size alignment.  (Admittedly queues past
16TB aren't a killer feature).

To make it concrete:

Current:
struct {
        __le32 host_features;           /* read-only */
        __le32 guest_features;          /* read/write */
        __le32 queue_pfn;               /* read/write */
        __le16 queue_size;              /* read-only */
        __le16 queue_sel;               /* read/write */
        __le16 queue_notify;            /* read/write */
        u8 status;                      /* read/write */
        u8 isr;                         /* read-only, clear on read */
        /* Optional */
        __le16 msi_config_vector;       /* read/write */
        __le16 msi_queue_vector;        /* read/write */
        /* ... device features */
};

Proposed:
struct virtio_pci_cfg {
        /* About the whole device. */
        __le32 device_feature_select;   /* read-write */
        __le32 device_feature;          /* read-only */
        __le32 guest_feature_select;    /* read-write */
        __le32 guest_feature;           /* read-only */
        __le16 msix_config;             /* read-write */
        __u8 device_status;             /* read-write */
        __u8 unused;

        /* About a specific virtqueue. */
        __le16 queue_select;            /* read-write */
        __le16 queue_align;             /* read-write, power of 2. */
        __le16 queue_size;              /* read-write, power of 2. */
        __le16 queue_msix_vector;       /* read-write */
        __le64 queue_address;           /* read-write: 0x == DNE. */
};

struct virtio_pci_isr {
        __u8 isr;                       /* read-only, clear on read */
};

We could also enforce LE in the per-device config space in this case,
another nice cleanup for PCI people.
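
To illustrate how a select register can expose more than 32 feature bits, here
is a small model of the device side. It assumes (my reading, not spelled out
above) that device_feature_select picks a 32-bit window and device_feature
shows that window; reading zero for out-of-range windows is a further
assumption. struct model_dev and the helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_WORDS 4  /* arbitrary size for this sketch: 128 bits */

/* Toy model of the device-wide part of the proposed register block. */
struct model_dev {
    uint32_t features[FEATURE_WORDS];  /* device feature bits, 32 per word */
    uint32_t device_feature_select;    /* which 32-bit window is visible */
};

/* Models a read of device_feature for the currently selected window. */
static uint32_t read_device_feature(const struct model_dev *d)
{
    assert(d != NULL);
    if (d->device_feature_select >= FEATURE_WORDS)
        return 0;  /* ASSUMPTION: unimplemented windows read as zero */
    return d->features[d->device_feature_select];
}

/* Driver-side helper: select the window, then test one bit in it. */
static int device_has_feature(struct model_dev *d, unsigned bit)
{
    assert(d != NULL);
    d->device_feature_select = bit / 32;
    return (read_device_feature(d) >> (bit % 32)) & 1u;
}
```

With this scheme a feature numbered above 31 costs the driver one extra
register write instead of requiring a wider register.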

> BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config
> BAR1[mmio]: virtio-pci registers + MSI section + future extensions
> BAR2[mmio]: virtio-config
>
> We can continue to do ISR access via BAR0 for performance reasons.

But powerpc explicitly *doesn't* want a pio bar.  So let it be its own
bar, which can be either.

>> As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
>> endorse that and leave it to the devices.
>>
>> The detection is simple: if BAR1 has non-zero length, it's new-style,
>> otherwise legacy.
>
> I agree that this is the best way to extend, but I think we should still
> use a transport feature bit.  We want to be able to detect within QEMU
> whether a guest is using these new features because we need to adjust
> migration state accordingly.
>
> Otherwise we would have to detect reads/writes to the new BARs to
> maintain whether the extended register state needs to be saved.  This
> gets nasty dealing with things like reset.

I don't think it'll be that bad; reset clears the device to unknown,
bar0 moves it from unknown->legacy mode, bar1/2/3 changes it from
unknown->modern mode, and anything else is bad (I prefer being strict so
we catch bad implementations from the beginning).
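
The transitions just described are small enough to write down. A minimal
sketch, with made-up names, where "anything else is bad" is modelled as a
sticky broken state that only a reset leaves:

```c
#include <assert.h>

enum dev_mode {
    MODE_UNKNOWN,  /* after reset, nothing accessed yet */
    MODE_LEGACY,   /* guest touched BAR0 first */
    MODE_MODERN,   /* guest touched BAR1, BAR2 or BAR3 first */
    MODE_BROKEN,   /* guest mixed the two layouts */
};

/* Returns the mode after an access to BAR `bar` (0..3) in mode `cur`. */
static enum dev_mode on_bar_access(enum dev_mode cur, unsigned bar)
{
    enum dev_mode want = (bar == 0) ? MODE_LEGACY : MODE_MODERN;

    assert(bar <= 3);
    if (cur == MODE_UNKNOWN)
        return want;         /* first access decides the mode */
    if (cur == want)
        return cur;          /* consistent use of one layout */
    return MODE_BROKEN;      /* mixing layouts, or already broken */
}

/* Reset always returns the device to the unknown state. */
static enum dev_mode on_reset(void)
{
    return MODE_UNKNOWN;
}
```

Being strict here means a guest that reads features through BAR0 and then
switches to BAR1 would land in MODE_BROKEN, which is exactly the interaction
with a feature bit questioned further down.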

But I'm happy to implement it and see what it's like.

> A feature bit simplifies this all pretty well.

I suspect it will be quite ugly, actually.  The guest has to use BAR0 to
get the features to see if they can use BAR1.  Do they ack the feature
(via BAR0) before accessing BAR1?  If not, qemu can't rely on the
feature bit.

Cheers,
Rusty.


live attached disk cannot be found in the guest VM sometimes

2012-10-08 Thread Wangpan
Hi all,
I ran into an issue when attaching a disk to a qemu-kvm guest (kernel version:
Linux debian 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux).
The steps are:
1. Start a guest using the libvirt API.
2. 'ping' the IP address of the guest, one ping at a time.
3. When 'ping' returns OK, attach an iSCSI LVM disk to the guest through the
libvirt API; the disk then appears in the guest.
This works most of the time, BUT sometimes I cannot see the disk in the guest
(with 'fdisk -l') even though the libvirt API returns OK and the disk is
present in the XML configuration dumped by 'virsh dumpxml'.

I checked the syslog of the guest, and my guess is that the attach operation
happened before the kernel modules (pci_hotplug/acpiphp) were loaded, so the
attached disk doesn't appear in the guest. Is this right?
Log messages:
qemu log: 2012-10-08 09:47:31.753+0000: starting up (guest start time; this is
UTC, add 8 hours for CST)
libvirt API call log: 2012-10-08 17:47:36,931 INFO Attach volume 1449 into 
virtual machine device 11606f9e-98ee-4857-99ea-14a576037bfc begin...
2012-10-08 17:47:37,757 INFO Successfully attach volume 1449 into 
11606f9e-98ee-4857-99ea-14a576037bfc, virtual machine device: /dev/ebs/xdey
syslog: Oct 8 17:47:42 debian kernel: imklog 5.8.11, log source = /proc/kmsg 
started
Any suggestion is welcome, thanks in advance.
Wangpan



buildbot failure in kvm on ia64

2012-10-08 Thread kvm
The Buildbot has detected a new failure on builder ia64 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/ia64/builds/688

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_master' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: live attached disk cannot be found in the guest VM sometimes

2012-10-08 Thread Wangpan
One more observation:
when I attach a second disk after the first one was attached but could not be
found in the guest, BOTH disks appear in the guest!

Oct  9 10:45:24 debian kernel: [61068.049509] pci 0000:00:11.0: [1af4:1001] type 0 class 0x000100
Oct  9 10:45:24 debian kernel: [61068.049715] pci 0000:00:11.0: reg 10: [io  0x0000-0x003f]
Oct  9 10:45:24 debian kernel: [61068.049813] pci 0000:00:11.0: reg 14: [mem 0x00000000-0x00000fff]
Oct  9 10:45:24 debian kernel: [61068.051544] pci 0000:00:11.0: BAR 1: assigned [mem 0xe0000000-0xe0000fff]
Oct  9 10:45:24 debian kernel: [61068.051584] pci 0000:00:11.0: BAR 1: set to [mem 0xe0000000-0xe0000fff] (PCI address [0xe0000000-0xe0000fff])
Oct  9 10:45:24 debian kernel: [61068.051591] pci 0000:00:11.0: BAR 0: assigned [io  0x1000-0x103f]
Oct  9 10:45:24 debian kernel: [61068.051623] pci 0000:00:11.0: BAR 0: set to [io  0x1000-0x103f] (PCI address [0x1000-0x103f])
Oct  9 10:45:24 debian kernel: [61068.051633] pci 0000:00:00.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.051636] pci 0000:00:00.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.051695] pci 0000:00:01.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.051698] pci 0000:00:01.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.051756] ata_piix 0000:00:01.1: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.051759] ata_piix 0000:00:01.1: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.051818] uhci_hcd 0000:00:01.2: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.051820] uhci_hcd 0000:00:01.2: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.051878] piix4_smbus 0000:00:01.3: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.051880] piix4_smbus 0000:00:01.3: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.051938] pci 0000:00:02.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.051941] pci 0000:00:02.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.052021] virtio-pci 0000:00:03.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.052021] virtio-pci 0000:00:03.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.052146] virtio-pci 0000:00:04.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.052146] virtio-pci 0000:00:04.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.052173] virtio-pci 0000:00:05.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.052175] virtio-pci 0000:00:05.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.052311] virtio-pci 0000:00:06.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.052316] virtio-pci 0000:00:06.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.052337] pci 0000:00:11.0: no hotplug settings from platform
Oct  9 10:45:24 debian kernel: [61068.052337] pci 0000:00:11.0: using default PCI settings
Oct  9 10:45:24 debian kernel: [61068.052939] virtio-pci 0000:00:11.0: enabling device (0000 -> 0003)   <-- the newly attached disk
Oct  9 10:45:24 debian kernel: [61068.053982] virtio-pci 0000:00:11.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
Oct  9 10:45:24 debian kernel: [61068.054199] virtio-pci 0000:00:11.0: setting latency timer to 64
Oct  9 10:45:24 debian kernel: [61068.054930] virtio-pci 0000:00:11.0: irq 47 for MSI/MSI-X
Oct  9 10:45:24 debian kernel: [61068.054964] virtio-pci 0000:00:11.0: irq 48 for MSI/MSI-X
Oct  9 10:45:25 debian kernel: [61068.219537]  vdc: unknown partition table   <-- the newly attached disk
Oct  9 10:45:25 debian kernel: [61068.221828] pci 0000:00:10.0: [1af4:1001] type 0 class 0x000100
Oct  9 10:45:25 debian kernel: [61068.222047] pci 0000:00:10.0: reg 10: [io  0x0000-0x003f]
Oct  9 10:45:25 debian kernel: [61068.222145] pci 0000:00:10.0: reg 14: [mem 0x00000000-0x00000fff]
Oct  9 10:45:25 debian kernel: [61068.223162] pci 0000:00:10.0: BAR 1: assigned [mem 0xe0001000-0xe0001fff]
Oct  9 10:45:25 debian kernel: [61068.223223] pci 0000:00:10.0: BAR 1: set to [mem 0xe0001000-0xe0001fff] (PCI address [0xe0001000-0xe0001fff])
Oct  9 10:45:25 debian kernel: [61068.223223] pci 0000:00:10.0: BAR 0: assigned [io  0x1040-0x107f]
Oct  9 10:45:25 debian kernel: [61068.223260] pci 0000:00:10.0: BAR 0: set to [io  0x1040-0x107f] (PCI address [0x1040-0x107f])
Oct  9 10:45:25 debian kernel: [61068.223427] pci 0000:00:00.0: no hotplug settings from platform
Oct  9 10:45:25 debian kernel: [61068.223430] pci 0000:00:00.0: using default PCI settings
Oct  9 10:45:25 debian kernel: [61068.223495] pci 0000:00:01.0: no hotplug settings from platform
Oct  9 10:45:25 debian kernel: [61068.223497] pci 0000:00:01.0: using default PCI settings

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Rusty Russell
Paolo Bonzini  writes:
> Il 05/10/2012 07:43, Rusty Russell ha scritto:
>> That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
>> said to Anthony, the best rules are "always" and "never", so I'd really
>> rather not have to grandfather that in.
>
> It is, but we can add a rule that if the (transport) flag
> VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
> virtio-blk.

Could we do that?  It's the cmd length I'm concerned about; is it always
32 in practice for some reason?

Currently qemu does:

    struct sg_io_hdr hdr;
    memset(&hdr, 0, sizeof(struct sg_io_hdr));
    hdr.interface_id = 'S';
    /* cdb pointer and length come straight from the second out sg element */
    hdr.cmd_len = req->elem.out_sg[1].iov_len;
    hdr.cmdp = req->elem.out_sg[1].iov_base;
    hdr.dxfer_len = 0;

If it's a command which expects more output data, there's no way to
guess where the boundary is between that command and the data.
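
To illustrate why a fixed cdb size would help: if the cdb were always 32 bytes, as suggested above, the boundary could be computed instead of guessed. A minimal sketch, assuming the command and any following output data arrive as one contiguous buffer; the struct and function names are hypothetical and this is not qemu's code:

```c
#include <stddef.h>
#include <stdint.h>

#define FIXED_CDB_LEN 32   /* the fixed cdb size proposed in this thread */

/* Hypothetical view of a buffer that holds a cdb followed by output data. */
struct scsi_req_view {
    const uint8_t *cdb;
    size_t         cdb_len;
    const uint8_t *data_out;
    size_t         data_out_len;
};

/* Splits `buf` at the fixed cdb boundary.
 * Returns 0 on success, -1 if the buffer cannot hold a full cdb. */
static int split_scsi_payload(const uint8_t *buf, size_t len,
                              struct scsi_req_view *view)
{
    if (buf == NULL || view == NULL || len < FIXED_CDB_LEN)
        return -1;

    view->cdb          = buf;
    view->cdb_len      = FIXED_CDB_LEN;
    view->data_out     = buf + FIXED_CDB_LEN;
    view->data_out_len = len - FIXED_CDB_LEN;
    return 0;
}
```

Without a fixed length, nothing in the buffer itself says where the cdb ends, which is the problem described above.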

Cheers,
Rusty.


buildbot failure in kvm on next-ia64

2012-10-08 Thread kvm
The Buildbot has detected a new failure on builder next-ia64 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-ia64/builds/673

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Gerd Hoffmann
  Hi,

>> Well, we also want to clean up the registers, so how about:
>>
>> BAR0: legacy, as is.  If you access this, don't use the others.

Ok.

>> BAR1: new format virtio-pci layout.  If you use this, don't use BAR0.
>> BAR2: virtio-cfg.  If you use this, don't use BAR0.

Why use two bars for this?  You can put them into one mmio bar, together
with the msi-x vector table and PBA.  Of course a pci capability
describing the location is helpful for that ;)

>> BAR3: ISR. If you use this, don't use BAR0.

Again, I wouldn't hardcode that but use a capability.

>> I prefer the cases exclusive (ie. use one or the other) as a clear path
>> to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
>> an ugly corner case in future (ISR is BAR0 + 19?  WTF?).

Ok, so we have four register sets:

  (1) legacy layout
  (2) new virtio-pci
  (3) new virtio-config
  (4) new virtio-isr

We can have a vendor pci capability, with a dword for each register set:

  bit  31    -- present bit
  bits 26-24 -- bar
  bits 23-0  -- offset
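
A minimal sketch of packing and unpacking one such dword, assuming exactly the bit layout listed above. The names (vcap_loc, vcap_encode, vcap_decode) are hypothetical and do not come from any existing header:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout assumed from the list above:
 * bit 31 = present, bits 26-24 = bar, bits 23-0 = offset. */
#define VCAP_PRESENT      (UINT32_C(1) << 31)
#define VCAP_BAR_SHIFT    24
#define VCAP_BAR_MASK     UINT32_C(0x7)
#define VCAP_OFFSET_MASK  UINT32_C(0x00ffffff)

/* Location of one register set. */
struct vcap_loc {
    bool     present;
    uint8_t  bar;
    uint32_t offset;
};

/* Packs a location into a capability dword; "not present" encodes as 0. */
static uint32_t vcap_encode(struct vcap_loc loc)
{
    if (!loc.present)
        return 0;
    assert(loc.bar <= VCAP_BAR_MASK);
    assert(loc.offset <= VCAP_OFFSET_MASK);
    return VCAP_PRESENT | ((uint32_t)loc.bar << VCAP_BAR_SHIFT) | loc.offset;
}

/* Unpacks a capability dword into its three fields. */
static struct vcap_loc vcap_decode(uint32_t dword)
{
    struct vcap_loc loc;

    loc.present = (dword & VCAP_PRESENT) != 0;
    loc.bar     = (uint8_t)((dword >> VCAP_BAR_SHIFT) & VCAP_BAR_MASK);
    loc.offset  = dword & VCAP_OFFSET_MASK;
    return loc;
}
```

With this layout, "present, bar 0, offset 19" encodes to 0x80000013 and "present, bar 1, offset 256" to 0x81000100.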

So current drivers which must support legacy can use this:

  legacy layout     -- present, bar 0, offset 0
  new virtio-pci    -- present, bar 1, offset 0
  new virtio-config -- present, bar 1, offset 256
  new virtio-isr    -- present, bar 0, offset 19

[ For completeness: msi-x capability could add this: ]

  msi-x vector table   -- bar 1, offset 512
  msi-x pba            -- bar 1, offset 768

> We'll never remove legacy so we shouldn't plan on it.  There are
> literally hundreds of thousands of VMs out there with the current virtio
> drivers installed in them.  We'll be supporting them for a very, very
> long time :-)

But new devices (virtio-qxl being a candidate) don't have old guests and
don't need to worry.

They could use this if they care about fast isr:

  legacy layout     -- not present
  new virtio-pci    -- present, bar 1, offset 0
  new virtio-config -- present, bar 1, offset 256
  new virtio-isr    -- present, bar 0, offset 0

Or this if they don't worry about isr performance:

  legacy layout     -- not present
  new virtio-pci    -- present, bar 0, offset 0
  new virtio-config -- present, bar 0, offset 256
  new virtio-isr    -- not present

> I don't think we gain a lot by moving the ISR into a separate BAR.
> Splitting up registers like that seems weird to me too.

Main advantage of defining a register set with just isr is that it
reduces pio address space consumption for new virtio devices which don't
have to worry about the legacy layout (8 bytes which is minimum size for
io bars instead of 64 bytes).

> If we added an additional constraints that BAR1 was mirrored except for

Why add constraints?  We want something future-proof, don't we?

>> The detection is simple: if BAR1 has non-zero length, it's new-style,
>> otherwise legacy.

Doesn't fly.  BAR1 is in use today for MSI-X support.

> I agree that this is the best way to extend, but I think we should still
> use a transport feature bit.  We want to be able to detect within QEMU
> whether a guest is using these new features because we need to adjust
> migration state accordingly.

Why does migration need adjustments?

[ Not that I want to veto a feature bit, but I don't see the need yet ]

cheers,
  Gerd