kvm + vmwgfx

2012-04-19 Thread Alberich de megres
Hello!

Has anyone succeeded in getting the vmwgfx driver working in a KVM Linux
guest, say Fedora 16?
I want to make it work over the DRI interface, but probably without
X.org; it will be a testbed for Wayland builds.

My host is Fedora 16, with kernel 3.3.0-8 and a working DRI card.

I run KVM with:
qemu-kvm -vga vmware -hda f16.kernel-3.3.1.qcow

Once in the guest OS, I modprobe kvm, drm, ttm and finally vmwgfx.

I get the following output in dmesg:
 [drm:vmw_driver_load] *ERROR* Hardware has no pitchloc

The probe fails with error -38.

Any suggestions?
thanks!


Re: [PATCH] KVM: Fix page-crossing MMIO

2012-04-19 Thread Gleb Natapov
On Wed, Apr 18, 2012 at 07:22:47PM +0300, Avi Kivity wrote:
> MMIOs that are split across a page boundary are currently broken - the
> code does not expect to be aborted by the exit to userspace for the
> first MMIO fragment.
> 
> This patch fixes the problem by generalizing the current code for handling
> 16-byte MMIOs to handle a number of "fragments", and changes the MMIO
> code to create those fragments.
> 
For 16-bit I/O, userspace will see two 8-bit writes. Is this OK? Is there
real code that does this kind of I/O, or is the patch for correctness
only?

> Signed-off-by: Avi Kivity 
> ---
>  arch/ia64/include/asm/kvm_host.h |2 +
>  arch/ia64/kvm/kvm-ia64.c |   10 ++--
>  arch/x86/kvm/x86.c   |  114 +++---
>  include/linux/kvm_host.h |   31 +--
>  4 files changed, 115 insertions(+), 42 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/kvm_host.h 
> b/arch/ia64/include/asm/kvm_host.h
> index c4b4bac..6d6a5ac 100644
> --- a/arch/ia64/include/asm/kvm_host.h
> +++ b/arch/ia64/include/asm/kvm_host.h
> @@ -449,6 +449,8 @@ struct kvm_vcpu_arch {
>   char log_buf[VMM_LOG_LEN];
>   union context host;
>   union context guest;
> +
> + char mmio_data[8];
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 9d80ff8..882ab21 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -232,12 +232,12 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct 
> kvm_run *kvm_run)
>   if ((p->addr & PAGE_MASK) == IOAPIC_DEFAULT_BASE_ADDRESS)
>   goto mmio;
>   vcpu->mmio_needed = 1;
> - vcpu->mmio_phys_addr = kvm_run->mmio.phys_addr = p->addr;
> - vcpu->mmio_size = kvm_run->mmio.len = p->size;
> + vcpu->mmio_fragments[0].gpa = kvm_run->mmio.phys_addr = p->addr;
> + vcpu->mmio_fragments[0].len = kvm_run->mmio.len = p->size;
>   vcpu->mmio_is_write = kvm_run->mmio.is_write = !p->dir;
>  
>   if (vcpu->mmio_is_write)
> - memcpy(vcpu->mmio_data, &p->data, p->size);
> + memcpy(vcpu->arch.mmio_data, &p->data, p->size);
>   memcpy(kvm_run->mmio.data, &p->data, p->size);
>   kvm_run->exit_reason = KVM_EXIT_MMIO;
>   return 0;
> @@ -719,7 +719,7 @@ static void kvm_set_mmio_data(struct kvm_vcpu *vcpu)
>   struct kvm_mmio_req *p = kvm_get_vcpu_ioreq(vcpu);
>  
>   if (!vcpu->mmio_is_write)
> - memcpy(&p->data, vcpu->mmio_data, 8);
> + memcpy(&p->data, vcpu->arch.mmio_data, 8);
>   p->state = STATE_IORESP_READY;
>  }
>  
> @@ -739,7 +739,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
> kvm_run *kvm_run)
>   }
>  
>   if (vcpu->mmio_needed) {
> - memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
> + memcpy(vcpu->arch.mmio_data, kvm_run->mmio.data, 8);
>   kvm_set_mmio_data(vcpu);
>   vcpu->mmio_read_completed = 1;
>   vcpu->mmio_needed = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0d9a578..4de705c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3718,9 +3718,8 @@ struct read_write_emulator_ops {
>  static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
>  {
>   if (vcpu->mmio_read_completed) {
> - memcpy(val, vcpu->mmio_data, bytes);
>   trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes,
> -vcpu->mmio_phys_addr, *(u64 *)val);
> +vcpu->mmio_fragments[0].gpa, *(u64 *)val);
>   vcpu->mmio_read_completed = 0;
>   return 1;
>   }
> @@ -3756,8 +3755,9 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t 
> gpa,
>  static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
>  void *val, int bytes)
>  {
> - memcpy(vcpu->mmio_data, val, bytes);
> - memcpy(vcpu->run->mmio.data, vcpu->mmio_data, 8);
> + struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
> +
> + memcpy(vcpu->run->mmio.data, frag->data, frag->len);
>   return X86EMUL_CONTINUE;
>  }
>  
> @@ -3784,10 +3784,7 @@ static int emulator_read_write_onepage(unsigned long 
> addr, void *val,
>   gpa_t gpa;
>   int handled, ret;
>   bool write = ops->write;
> -
> - if (ops->read_write_prepare &&
> -   ops->read_write_prepare(vcpu, val, bytes))
> - return X86EMUL_CONTINUE;
> + struct kvm_mmio_fragment *frag;
>  
>   ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, write);
>  
> @@ -3813,15 +3810,19 @@ static int emulator_read_write_onepage(unsigned long 
> addr, void *val,
>   bytes -= handled;
>   val += handled;
>  
> - vcpu->mmio_needed = 1;
> - vcpu->run->exit_reason = KVM_EXIT_MMIO;
> - vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
> - vcpu->mmio_size = bytes;
> - vcpu->run->mmio.len = min(vcpu->mmio_size, 8);
> - vcpu->run

Re: [PATCH] KVM: Fix page-crossing MMIO

2012-04-19 Thread Avi Kivity
On 04/19/2012 11:59 AM, Gleb Natapov wrote:
> On Wed, Apr 18, 2012 at 07:22:47PM +0300, Avi Kivity wrote:
> > MMIOs that are split across a page boundary are currently broken - the
> > code does not expect to be aborted by the exit to userspace for the
> > first MMIO fragment.
> > 
> > This patch fixes the problem by generalizing the current code for handling
> > 16-byte MMIOs to handle a number of "fragments", and changes the MMIO
> > code to create those fragments.
> > 
> For 16-bit I/O, userspace will see two 8-bit writes. Is this OK? 

I believe so.  The Pentium bus was 64 bits wide with no address lines
A0-A2; instead it had byte enables BE0-BE7.  So a write that crosses a
quadword boundary would be split.  Similarly, the 32-bit PCI bus has 4
byte enables, so anything that crosses a dword boundary is split.
LOCKed ops are implemented by asserting a signal during the two
operations.  No idea how it's implemented with modern busless
processors, but the same semantics have probably been kept.

Even more interesting is that a 32-bit write is seen as an 8-bit write
followed by a 24-bit write.

Note also that the two accesses can target different devices.
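
(To make the splitting concrete: a minimal sketch, in kernel-style C, of how
an access that crosses a page boundary could be broken into per-page
fragments.  The struct and helper names are illustrative only - the patch
itself uses vcpu->mmio_fragments[] filled in from
emulator_read_write_onepage() - but the arithmetic is the same.)

struct mmio_frag_example {
	gpa_t gpa;
	unsigned len;
};

/* Split [gpa, gpa+len) into pieces that each stay within one page. */
static int split_mmio_example(gpa_t gpa, unsigned len,
			      struct mmio_frag_example *frags)
{
	int n = 0;

	while (len) {
		/* bytes remaining until the end of the current page */
		unsigned chunk = min_t(unsigned, len,
				       PAGE_SIZE - offset_in_page(gpa));

		frags[n].gpa = gpa;
		frags[n].len = chunk;
		n++;
		gpa += chunk;
		len -= chunk;
	}
	return n;	/* 2 for a page-crossing access, 1 otherwise */
}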

> Is there
> real code that does this kind of I/O, or is the patch for correctness
> only?

I should have mentioned it in the changelog - it's Windows 95.  There is
also a Red Hat Linux version that does something like this; I remember
adding some code in qemu-kvm to handle the 3-byte write.  So I guess
this fixes a regression.

(Not sure - in the Windows 95 case we have a write to VGA that crosses a
page boundary, but the pages are physically contiguous, so nothing strange
is going on; the RHL case was writing partially to RAM and partially to
unallocated memory, so it might work with the current code.)

-- 
error compiling committee.c: too many arguments to function



Re: kvm + vmwgfx

2012-04-19 Thread Avi Kivity
On 04/19/2012 10:34 AM, Alberich de megres wrote:
> Hello!
>
> Has anyone succeeded in getting the vmwgfx driver working in a KVM Linux
> guest, say Fedora 16?
> I want to make it work over the DRI interface, but probably without
> X.org; it will be a testbed for Wayland builds.
>
> My host is Fedora 16, with kernel 3.3.0-8 and a working DRI card.
>
> I run KVM with:
> qemu-kvm -vga vmware -hda f16.kernel-3.3.1.qcow
>
> Once in the guest OS, I modprobe kvm, drm, ttm and finally vmwgfx.
>
> I get the following output in dmesg:
>  [drm:vmw_driver_load] *ERROR* Hardware has no pitchloc
>
> The probe fails with error -38.
>

IIUC the vmware device emulation was based on reverse-engineering the
linux xorg driver, not on a spec.  As such it may be incomplete.

You can either reverse-engineer the linux driver for the missing bits,
or ask vmware for documentation.

-- 
error compiling committee.c: too many arguments to function



[PATCH] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Gleb Natapov
The patch introduces a bitmap that will hold the reasons the apic should be
checked during vmexit. This is in preparation for the PV EOI patch,
which will add one more check on vmexit. With the bitmap we can do
if (apic_attention) to check everything simultaneously, which adds
zero overhead on the fast path.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |3 +++
 arch/x86/kvm/lapic.c|   12 +++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f624ca7..fe4e85b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -172,6 +172,8 @@ enum {
 #define DR7_FIXED_10x0400
 #define DR7_VOLATILE   0x23ff
 
+#define KVM_APIC_CHECK_VAPIC   0
+
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
  * enough memory for a single page fault in a cache.
@@ -337,6 +339,7 @@ struct kvm_vcpu_arch {
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
+   unsigned long apic_attention;
int32_t apic_arb_prio;
int mp_state;
int sipi_vector;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 992b4ea..93c1574 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1088,6 +1088,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
apic_update_ppr(apic);
 
vcpu->arch.apic_arb_prio = 0;
+   vcpu->arch.apic_attention = 0;
 
apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
   "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
@@ -1287,7 +1288,7 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
u32 data;
void *vapic;
 
-   if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
+   if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
return;
 
vapic = kmap_atomic(vcpu->arch.apic->vapic_page);
@@ -1304,7 +1305,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
struct kvm_lapic *apic;
void *vapic;
 
-   if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
+   if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
return;
 
apic = vcpu->arch.apic;
@@ -1324,10 +1325,11 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
 {
-   if (!irqchip_in_kernel(vcpu->kvm))
-   return;
-
vcpu->arch.apic->vapic_addr = vapic_addr;
+   if (vapic_addr)
+   __set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
+   else
+   __clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
 }
 
 int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-- 
1.7.7.3



Re: [Qemu-devel] [PATCH 0/3] switch to seavgabios

2012-04-19 Thread Avi Kivity
On 04/18/2012 08:07 AM, Gerhard Wiesinger wrote:
> Negative also here:
> Don't see anything on screen on startup...
>
> From log, latest qemu-kvm git version:
> Running option rom at c180:3d4e
> Running option rom at c180:3da2
> Running option rom at c180:3df6
> Running option rom at c580:0003
> qemu-system-x86_64: /root/download/qemu/git/qemu-kvm/exec.c:2641:
> register_subpage: Assertion `existing.mr->subpage || existing.mr ==
> &io_mem_unassigned' failed.
>


> Backtrace isn't valid.

Can you build with ./configure --disable-pie --enable-debug and retry?

-- 
error compiling committee.c: too many arguments to function



Re: kvm + vmwgfx

2012-04-19 Thread Alberich de megres
Thanks!

Is there any other way of getting at least 2D acceleration with qemu?
If possible, using KMS/DRM.

Thanks!!!

On Thu, Apr 19, 2012 at 11:17 AM, Avi Kivity  wrote:
> On 04/19/2012 10:34 AM, Alberich de megres wrote:
>> Hello!
>>
>> Has anyone succeeded in getting the vmwgfx driver working in a KVM Linux
>> guest, say Fedora 16?
>> I want to make it work over the DRI interface, but probably without
>> X.org; it will be a testbed for Wayland builds.
>>
>> My host is Fedora 16, with kernel 3.3.0-8 and a working DRI card.
>>
>> I run KVM with:
>> qemu-kvm -vga vmware -hda f16.kernel-3.3.1.qcow
>>
>> Once in the guest OS, I modprobe kvm, drm, ttm and finally vmwgfx.
>>
>> I get the following output in dmesg:
>>  [drm:vmw_driver_load] *ERROR* Hardware has no pitchloc
>>
>> The probe fails with error -38.
>>
>
> IIUC the vmware device emulation was based on reverse-engineering the
> linux xorg driver, not on a spec.  As such it may be incomplete.
>
> You can either reverse-engineer the linux driver for the missing bits,
> or ask vmware for documentation.
>
> --
> error compiling committee.c: too many arguments to function
>


Re: [PATCH] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Avi Kivity
On 04/19/2012 12:33 PM, Gleb Natapov wrote:
> The patch introduces a bitmap that will hold the reasons the apic should be
> checked during vmexit. This is in preparation for the PV EOI patch,
> which will add one more check on vmexit. With the bitmap we can do
> if (apic_attention) to check everything simultaneously, which adds
> zero overhead on the fast path.
>

Good idea.
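
(For context, a minimal sketch of the fast path this enables - a single test
on one word, with the vapic sync done only when some attention bit is set.
The call site shown here is illustrative, not necessarily where the final
code puts it.)

static void vcpu_post_vmexit_checks_example(struct kvm_vcpu *vcpu)
{
	/* everything interesting is collapsed into one word */
	if (unlikely(vcpu->arch.apic_attention))
		kvm_lapic_sync_from_vapic(vcpu);
}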

>  
> +#define KVM_APIC_CHECK_VAPIC 0

Comment above relating this to apic_attention.

>   vcpu->arch.apic->vapic_addr = vapic_addr;
> + if (vapic_addr)
> + __set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
> + else
> + __clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
>  }
>

Unrelated: this pattern is probably common.  Would be nice to have a
__deposit_bit() function.

-- 
error compiling committee.c: too many arguments to function



Re: kvm + vmwgfx

2012-04-19 Thread Avi Kivity
On 04/19/2012 12:53 PM, Alberich de megres wrote:
> Thanks!
>
> Is there any other way of getting at least 2D acceleration with qemu?
> If possible, using KMS/DRM.
>

-vga qxl.

-- 
error compiling committee.c: too many arguments to function



Re: kvm + vmwgfx

2012-04-19 Thread Alberich de megres
I tried this one, but I have no /dev/dri/card0,
nor is any DRM-using module loaded in the kernel.




On Thu, Apr 19, 2012 at 11:53 AM, Avi Kivity  wrote:
> On 04/19/2012 12:53 PM, Alberich de megres wrote:
>> Thanks!
>>
>> Is there any other way of getting at least 2D acceleration with qemu?
>> If possible, using KMS/DRM.
>>
>
> -vga qxl.
>
> --
> error compiling committee.c: too many arguments to function
>


Biweekly upstream qemu-kvm test report (using autotest) - Week 16

2012-04-19 Thread Prem Karat
Folks,
 
This is a test report on upstream qemu-kvm testing (24853eece248d4a58d705c).
Tests were executed using latest autotest git (6a15572d5307fa0b).

This time we have tested 3 guests.

We are analysing the test results further to find out whether the failures are
in autotest or in qemu-kvm. We will post the analysis as soon as possible and
report bugs to the appropriate community.

Host Kernel level: 3.4.0-rc2+ (git: 923e9a1399b620d063cd8853).
Guest OS: Windows 7 64 SP1, Fedora 16 x86-64, RHEL 6.2 x86-64

Please find the detailed report below for all the 3 guests.


Host Kernel: Kernel: 3.4.0-rc2+
KVM Version:  1.0.50 (qemu-kvm-devel)
Guest OS: Windows 7 64 SP1
Date: Mon Apr 16 22:14:02 2012
Stat: From 13 tests executed, 5 have passed (61% failures)

Tests Failed:


 Test Name                                                      Result   Run time

 kvm.qed.smp4.Win7.64.sp1.nic_hotplug.default.nic_virtio        FAIL     108
 kvm.qed.smp4.Win7.64.sp1.nic_hotplug.additional.nic_8139       FAIL     75
 kvm.qed.smp4.Win7.64.sp1.nic_hotplug.additional.nic_virtio     FAIL     17
 kvm.qed.smp4.Win7.64.sp1.nic_hotplug.additional.nic_e1000      FAIL     214
 kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_qcow2.block_virtio  FAIL     115
 kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_qcow2.block_scsi    FAIL     161
 kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_raw.block_virtio    FAIL     160
 kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_raw.block_scsi      FAIL     160


Tests Passed:


 Test Name                                                      Result   Run time

 kvm.qed.smp4.Win7.64.sp1.boot                                  PASS     70
 kvm.qed.smp4.Win7.64.sp1.reboot                                PASS     62
 kvm.qed.smp4.Win7.64.sp1.nic_hotplug.default.nic_8139          PASS     119
 kvm.qed.smp4.Win7.64.sp1.nic_hotplug.default.nic_e1000         PASS     184
 kvm.qed.smp4.Win7.64.sp1.shutdown                              PASS     122

*

Host Kernel: Kernel: 3.4.0-rc2+
KVM Version:  1.0.50 (qemu-kvm-devel)
Guest OS: Fedora 16 x86-64
Date: Sat Apr 14 02:02:50 2012
Stat: From 28 tests executed, 20 have passed (28% failures)

Tests Failed:


 Test Name                                                      Result   Run time

 kvm.raw.smp4.Fedora.16.64.autotest.bonnie                      FAIL     179
 kvm.raw.smp4.Fedora.16.64.balloon_check                        FAIL     91
 kvm.raw.smp4.Fedora.16.64.balloon_check.balloon-migrate        FAIL     151
 kvm.raw.smp4.Fedora.16.64.cgroup.blkio_bandwidth               FAIL     189
 kvm.raw.smp4.Fedora.16.64.cgroup.blkio_throttle_multi          FAIL     348
 kvm.raw.smp4.Fedora.16.64.cgroup.cpuset_cpus                   FAIL     8
 kvm.raw.smp4.Fedora.16.64.cgroup.memory_move                   FAIL     50
 kvm.raw.smp4.Fedora.16.64.cpu_hotplug_test                     FAIL     96


Tests Passed:


 Test Name                                                      Result   Run time

 kvm.raw.smp4.Fedora.16.64.boot                                 PASS     38
 kvm.raw.smp4.Fedora.16.64.autotest.dbench                      PASS     109
 kvm.raw.smp4.Fedora.16.64.autotest.ebizzy                      PASS     24
 kvm.raw.smp4.Fedora.16.64.autotest.stress                      PASS     92
 kvm.raw.smp4.Fedora.16.64.autotest.disktest                    PASS     209
 kvm.raw.smp4.Fedora.16.64.autotest.hackbench                   PASS     29
 kvm.raw.smp4.Fedora.16.64.autotest.iozone

Re: [PATCH] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Michael S. Tsirkin
On Thu, Apr 19, 2012 at 12:33:22PM +0300, Gleb Natapov wrote:
> The patch introduces a bitmap that will hold the reasons the apic should be
> checked during vmexit. This is in preparation for the PV EOI patch,
> which will add one more check on vmexit. With the bitmap we can do
> if (apic_attention) to check everything simultaneously, which adds
> zero overhead on the fast path.
> 
> Signed-off-by: Gleb Natapov 

Looks very clean, thanks!
I'll integrate this in the eoi patchset.

> ---
>  arch/x86/include/asm/kvm_host.h |3 +++
>  arch/x86/kvm/lapic.c|   12 +++-
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f624ca7..fe4e85b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -172,6 +172,8 @@ enum {
>  #define DR7_FIXED_1  0x0400
>  #define DR7_VOLATILE 0x23ff
>  
> +#define KVM_APIC_CHECK_VAPIC 0
> +
>  /*
>   * We don't want allocation failures within the mmu code, so we preallocate
>   * enough memory for a single page fault in a cache.
> @@ -337,6 +339,7 @@ struct kvm_vcpu_arch {
>   u64 efer;
>   u64 apic_base;
>   struct kvm_lapic *apic;/* kernel irqchip context */
> + unsigned long apic_attention;
>   int32_t apic_arb_prio;
>   int mp_state;
>   int sipi_vector;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 992b4ea..93c1574 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1088,6 +1088,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
>   apic_update_ppr(apic);
>  
>   vcpu->arch.apic_arb_prio = 0;
> + vcpu->arch.apic_attention = 0;
>  
>   apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
>  "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
> @@ -1287,7 +1288,7 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
>   u32 data;
>   void *vapic;
>  
> - if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
> + if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>   return;
>  
>   vapic = kmap_atomic(vcpu->arch.apic->vapic_page);
> @@ -1304,7 +1305,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
>   struct kvm_lapic *apic;
>   void *vapic;
>  
> - if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
> + if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>   return;
>  
>   apic = vcpu->arch.apic;
> @@ -1324,10 +1325,11 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
>  
>  void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
>  {
> - if (!irqchip_in_kernel(vcpu->kvm))
> - return;
> -
>   vcpu->arch.apic->vapic_addr = vapic_addr;
> + if (vapic_addr)
> + __set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
> + else
> + __clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
>  }
>  
>  int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> -- 
> 1.7.7.3


Re: kvm + vmwgfx

2012-04-19 Thread Alon Levy
On Thu, Apr 19, 2012 at 11:57:50AM +0200, Alberich de megres wrote:
> I tried this one, but I have no /dev/dri/card0,
> nor is any DRM-using module loaded in the kernel.
> 

It doesn't use DRM/DRI yet. There is a userspace X driver,
xf86-video-qxl, that directly accesses the pci.

> 
> 
> 
> On Thu, Apr 19, 2012 at 11:53 AM, Avi Kivity  wrote:
> > On 04/19/2012 12:53 PM, Alberich de megres wrote:
> >> Thanks!
> >>
> >> Is there any other way of getting at least 2D acceleration with qemu?
> >> If possible, using KMS/DRM.
> >>
> >
> > -vga qxl.
> >
> > --
> > error compiling committee.c: too many arguments to function
> >


Re: [PATCH] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Avi Kivity
On 04/19/2012 01:26 PM, Gleb Natapov wrote:
> > 
> > Unrelated: this pattern is probably common.  Would be nice to have a
> > __deposit_bit() function.
> > 
> What semantics should it have? Set bit A in bitmap B if value C is
> non-zero?
>

void __deposit_bit(bool bit, unsigned index, unsigned long *word)
{
    if (bit)
        __set_bit(index, word);
    else
        __clear_bit(index, word);
}

I think some processors have an instruction for it (not an s390 reference).

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/4] Export offsets of VMCS fields as note information for kdump

2012-04-19 Thread HATAYAMA Daisuke
From: Avi Kivity 
Subject: Re: [PATCH 0/4] Export offsets of VMCS fields as note information for 
kdump
Date: Wed, 18 Apr 2012 14:56:39 +0300

> On 04/18/2012 12:49 PM, zhangyanfei wrote:
>> >>
>> > 
>> > What type of resource?  Can you give an example?
>> > 
>> Sorry. No concrete example for now.
>>
>> We are developing this on a conservative policy and I have put the vmcs 
>> processing
>> in a new module in patch set v2 as you required. The new module is 
>> auto-loaded when
>> the vmx cpufeature is detected and it depends on module kvm-intel. Loading 
>> and unloading
>> this module will have no side effect on the running guests.
>>
>> And one thing I have to stress is that, we can see guest image as crash dump 
>> from
>> guest machine's view if we have the vmcsinfo, this itself is useful.
> 
> Why is it useful?  Without a concrete example, it's hard to justify the
> code bloat.
> 

The reason why we want to retrieve guest machine's memory image as
crash dump is that then we can debug guest machine's status using
symbolic debugger such as gdb and crash utility.

This is very useful. Please consider the situation where engineers are
forced to look into guest machine's memory image through qemu-kvm's
process core dump using gdb without any symbolic information. It's
very inefficient.

Thanks.
HATAYAMA, Daisuke



Re: [PATCH 0/4] Export offsets of VMCS fields as note information for kdump

2012-04-19 Thread Avi Kivity
On 04/19/2012 01:36 PM, HATAYAMA Daisuke wrote:
> From: Avi Kivity 
> Subject: Re: [PATCH 0/4] Export offsets of VMCS fields as note information 
> for kdump
> Date: Wed, 18 Apr 2012 14:56:39 +0300
>
> > On 04/18/2012 12:49 PM, zhangyanfei wrote:
> >> >>
> >> > 
> >> > What type of resource?  Can you give an example?
> >> > 
> >> Sorry. No concrete example for now.
> >>
> >> We are developing this on a conservative policy and I have put the vmcs 
> >> processing
> >> in a new module in patch set v2 as you required. The new module is 
> >> auto-loaded when
> >> the vmx cpufeature is detected and it depends on module kvm-intel. Loading 
> >> and unloading
> >> this module will have no side effect on the running guests.
> >>
> >> And one thing I have to stress is that, we can see guest image as crash 
> >> dump from
> >> guest machine's view if we have the vmcsinfo, this itself is useful.
> > 
> > Why is it useful?  Without a concrete example, it's hard to justify the
> > code bloat.
> > 
>
> The reason why we want to retrieve guest machine's memory image as
> crash dump is that then we can debug guest machine's status using
> symbolic debugger such as gdb and crash utility.
>
> This is very useful. Please consider the situation where engineers are
> forced to look into guest machine's memory image through qemu-kvm's
> process core dump using gdb without any symbolic information. It's
> very inefficient.

I still don't follow.  If qemu crashed, the values in guest registers
are irrelevant.  In what way can they help debug the qemu crash?

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Avi Kivity
On 04/19/2012 01:35 PM, Avi Kivity wrote:
> On 04/19/2012 01:26 PM, Gleb Natapov wrote:
> > > 
> > > Unrelated: this pattern is probably common.  Would be nice to have a
> > > __deposit_bit() function.
> > > 
> > What semantics should it have? Set bit A in bitmap B if value C is
> > non-zero?
> >
>
> void __deposit_bit(bool bit, unsigned index, unsigned long *word)
> {
>     if (bit)
>         __set_bit(index, word);
>     else
>         __clear_bit(index, word);
> }
>
> I think some processors have an instruction for it (not an s390 reference).
>

Looks like x86 will get one too: PDEP.

-- 
error compiling committee.c: too many arguments to function



[PATCHv2] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Gleb Natapov
The patch introduces a bitmap that will hold the reasons the apic should be
checked during vmexit. This is in preparation for the PV EOI patch,
which will add one more check on vmexit. With the bitmap we can do
if (apic_attention) to check everything simultaneously, which adds
zero overhead on the fast path.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |4 
 arch/x86/kvm/lapic.c|   12 +++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f624ca7..69e39bc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -172,6 +172,9 @@ enum {
 #define DR7_FIXED_10x0400
 #define DR7_VOLATILE   0x23ff
 
+/* apic attention bits */
+#define KVM_APIC_CHECK_VAPIC   0
+
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
  * enough memory for a single page fault in a cache.
@@ -337,6 +340,7 @@ struct kvm_vcpu_arch {
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
+   unsigned long apic_attention;
int32_t apic_arb_prio;
int mp_state;
int sipi_vector;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 992b4ea..93c1574 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1088,6 +1088,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
apic_update_ppr(apic);
 
vcpu->arch.apic_arb_prio = 0;
+   vcpu->arch.apic_attention = 0;
 
apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
   "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
@@ -1287,7 +1288,7 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
u32 data;
void *vapic;
 
-   if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
+   if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
return;
 
vapic = kmap_atomic(vcpu->arch.apic->vapic_page);
@@ -1304,7 +1305,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
struct kvm_lapic *apic;
void *vapic;
 
-   if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
+   if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
return;
 
apic = vcpu->arch.apic;
@@ -1324,10 +1325,11 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
 
 void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
 {
-   if (!irqchip_in_kernel(vcpu->kvm))
-   return;
-
vcpu->arch.apic->vapic_addr = vapic_addr;
+   if (vapic_addr)
+   __set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
+   else
+   __clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
 }
 
 int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-- 
1.7.7.3



Re: [PATCHv2] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Gleb Natapov
Forget changelog:
v1->v2:
 - Add a comment before define of attention bits.

On Thu, Apr 19, 2012 at 02:06:29PM +0300, Gleb Natapov wrote:
> The patch introduces a bitmap that will hold the reasons the apic should be
> checked during vmexit. This is in preparation for the PV EOI patch,
> which will add one more check on vmexit. With the bitmap we can do
> if (apic_attention) to check everything simultaneously, which adds
> zero overhead on the fast path.
> 
> Signed-off-by: Gleb Natapov 
> ---
>  arch/x86/include/asm/kvm_host.h |4 
>  arch/x86/kvm/lapic.c|   12 +++-
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f624ca7..69e39bc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -172,6 +172,9 @@ enum {
>  #define DR7_FIXED_1  0x0400
>  #define DR7_VOLATILE 0x23ff
>  
> +/* apic attention bits */
> +#define KVM_APIC_CHECK_VAPIC 0
> +
>  /*
>   * We don't want allocation failures within the mmu code, so we preallocate
>   * enough memory for a single page fault in a cache.
> @@ -337,6 +340,7 @@ struct kvm_vcpu_arch {
>   u64 efer;
>   u64 apic_base;
>   struct kvm_lapic *apic;/* kernel irqchip context */
> + unsigned long apic_attention;
>   int32_t apic_arb_prio;
>   int mp_state;
>   int sipi_vector;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 992b4ea..93c1574 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1088,6 +1088,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
>   apic_update_ppr(apic);
>  
>   vcpu->arch.apic_arb_prio = 0;
> + vcpu->arch.apic_attention = 0;
>  
>   apic_debug(KERN_INFO "%s: vcpu=%p, id=%d, base_msr="
>  "0x%016" PRIx64 ", base_address=0x%0lx.\n", __func__,
> @@ -1287,7 +1288,7 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
>   u32 data;
>   void *vapic;
>  
> - if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
> + if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>   return;
>  
>   vapic = kmap_atomic(vcpu->arch.apic->vapic_page);
> @@ -1304,7 +1305,7 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
>   struct kvm_lapic *apic;
>   void *vapic;
>  
> - if (!irqchip_in_kernel(vcpu->kvm) || !vcpu->arch.apic->vapic_addr)
> + if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
>   return;
>  
>   apic = vcpu->arch.apic;
> @@ -1324,10 +1325,11 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
>  
>  void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
>  {
> - if (!irqchip_in_kernel(vcpu->kvm))
> - return;
> -
>   vcpu->arch.apic->vapic_addr = vapic_addr;
> + if (vapic_addr)
> + __set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
> + else
> + __clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
>  }
>  
>  int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> -- 
> 1.7.7.3
> 

--
Gleb.


Re: [PATCH] Introduce bitmask for apic attention reasons.

2012-04-19 Thread Gleb Natapov
On Thu, Apr 19, 2012 at 12:53:12PM +0300, Avi Kivity wrote:
> On 04/19/2012 12:33 PM, Gleb Natapov wrote:
> > The patch introduces a bitmap that will hold the reasons the apic should be
> > checked during vmexit. This is in preparation for the PV EOI patch,
> > which will add one more check on vmexit. With the bitmap we can do
> > if (apic_attention) to check everything simultaneously, which adds
> > zero overhead on the fast path.
> >
> 
> Good idea.
> 
> >  
> > +#define KVM_APIC_CHECK_VAPIC   0
> 
> Comment above relating this to apic_attention.
> 
OK.

> > vcpu->arch.apic->vapic_addr = vapic_addr;
> > +   if (vapic_addr)
> > +   __set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
> > +   else
> > +   __clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
> >  }
> >
> 
> Unrelated: this pattern is probably common.  Would be nice to have a
> __deposit_bit() function.
> 
What semantics should it have? Set bit A in bitmap B if value C is
non-zero?

--
Gleb.


Re: [PATCH 0/4] Export offsets of VMCS fields as note information for kdump

2012-04-19 Thread HATAYAMA Daisuke
From: Avi Kivity 
Subject: Re: [PATCH 0/4] Export offsets of VMCS fields as note information for 
kdump
Date: Thu, 19 Apr 2012 13:42:42 +0300

> On 04/19/2012 01:36 PM, HATAYAMA Daisuke wrote:
>> From: Avi Kivity 
>> Subject: Re: [PATCH 0/4] Export offsets of VMCS fields as note information 
>> for kdump
>> Date: Wed, 18 Apr 2012 14:56:39 +0300
>>
>> > On 04/18/2012 12:49 PM, zhangyanfei wrote:
>> >> >>
>> >> > 
>> >> > What type of resource?  Can you give an example?
>> >> > 
>> >> Sorry. No concrete example for now.
>> >>
>> >> We are developing this on a conservative policy and I have put the vmcs 
>> >> processing
>> >> in a new module in patch set v2 as you required. The new module is 
>> >> auto-loaded when
>> >> the vmx cpufeature is detected and it depends on module kvm-intel. 
>> >> Loading and unloading
>> >> this module will have no side effect on the running guests.
>> >>
>> >> And one thing I have to stress is that, we can see guest image as crash 
>> >> dump from
>> >> guest machine's view if we have the vmcsinfo, this itself is useful.
>> > 
>> > Why is it useful?  Without a concrete example, it's hard to justify the
>> > code bloat.
>> > 
>>
>> The reason why we want to retrieve guest machine's memory image as
>> crash dump is that then we can debug guest machine's status using
>> symbolic debugger such as gdb and crash utility.
>>
>> This is very useful. Please consider the situation where engineers are
>> forced to look into guest machine's memory image through qemu-kvm's
>> process core dump using gdb without any symbolic information. It's
>> very inefficient.
> 
> I still don't follow.  If qemu crashed, the values in guest registers
> are irrelevant.  In what way can they help debug the qemu crash?
> 

It would not be helpful in the qemu crash case you are concerned
about. We want to use the guest state data to look into the guest
machine's image in the crashed qemu.

Thanks.
HATAYAMA, Daisuke



Re: [PATCH 0/4] Export offsets of VMCS fields as note information for kdump

2012-04-19 Thread Avi Kivity
On 04/19/2012 02:27 PM, HATAYAMA Daisuke wrote:
> >> The reason why we want to retrieve guest machine's memory image as
> >> crash dump is that then we can debug guest machine's status using
> >> symbolic debugger such as gdb and crash utility.
> >>
> >> This is very useful. Please consider the situation where engineers are
> >> forced to look into guest machine's memory image through qemu-kvm's
> >> process core dump using gdb without any symbolic information. It's
> >> very inefficient.
> > 
> > I still don't follow.  If qemu crashed, the values in guest registers
> > are irrelevant.  In what way can they help debug the qemu crash?
> > 
>
> It would not be helpful in the qemu crash case you are concerned
> about. We want to use the guest state data to look into the guest
> machine's image in the crashed qemu.

Why?

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm: lock slots_lock around device assignment

2012-04-19 Thread Marcelo Tosatti
On Tue, Apr 17, 2012 at 09:46:44PM -0600, Alex Williamson wrote:
> As pointed out by Jason Baron, when assigning a device to a guest
> we first set the iommu domain pointer, which enables mapping
> and unmapping of memory slots to the iommu.  This leaves a window
> where this path is enabled, but we haven't synchronized the iommu
> mappings to the existing memory slots.  Thus a slot being removed
> at that point could send us down unexpected code paths removing
> non-existent pinnings and iommu mappings.  Take the slots_lock
> around creating the iommu domain and initial mappings as well as
> around iommu teardown to avoid this race.
> 
> Signed-off-by: Alex Williamson 

Applied, thanks.
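
(For readers following along, a rough outline of the pattern described above -
device-assignment setup done under slots_lock so it cannot race with slot
removal.  This is a sketch following the naming in virt/kvm/iommu.c, not the
actual diff.)

static int assign_device_locked_example(struct kvm *kvm)
{
	int r;

	mutex_lock(&kvm->slots_lock);
	/* create the iommu domain and map all existing memory slots */
	r = kvm_iommu_map_guest(kvm);
	mutex_unlock(&kvm->slots_lock);
	return r;
}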



[GIT PULL] KVM updates for 3.4.0-rc3

2012-04-19 Thread Marcelo Tosatti


Linus, please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git master

To receive the following updates

Alex Williamson (2):
  KVM: unmap pages from the iommu when slots are removed
  KVM: lock slots_lock around device assignment

Avi Kivity (1):
  KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context

Gleb Natapov (1):
  KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on reset


 arch/x86/kvm/pmu.c   |   18 +-
 arch/x86/kvm/vmx.c   |5 -
 include/linux/kvm_host.h |6 ++
 virt/kvm/iommu.c |   30 +-
 virt/kvm/kvm_main.c  |5 +++--
 5 files changed, 43 insertions(+), 21 deletions(-)




Re: [PATCH 0/4] Export offsets of VMCS fields as note information for kdump

2012-04-19 Thread HATAYAMA Daisuke
From: Avi Kivity 
Subject: Re: [PATCH 0/4] Export offsets of VMCS fields as note information for 
kdump
Date: Thu, 19 Apr 2012 14:31:56 +0300

> On 04/19/2012 02:27 PM, HATAYAMA Daisuke wrote:
>> >> The reason why we want to retrieve guest machine's memory image as
>> >> crash dump is that then we can debug guest machine's status using
>> >> symbolic debugger such as gdb and crash utility.
>> >>
>> >> This is very useful. Please consider the situation where engineers are
>> >> forced to look into guest machine's memory image through qemu-kvm's
>> >> process core dump using gdb without any symbolic information. It's
>> >> very inefficient.
>> > 
>> > I still don't follow.  If qemu crashed, the values in guest registers
>> > are irrelevant.  In what way can they help debug the qemu crash?
>> > 
>>
>> It would not be helpful in the qemu crash case you are concerned
>> about. We want to use the guest state data to look into the guest
>> machine's image in the crashed qemu.
> 
> Why?
> 

It seems natural to check the situation from the guest machine's side when
qemu crashes. Suppose a service is running on the guest machine, and
then qemu crashes. Then we may need to know the details of the
progress of the service if it's important: what has been successfully
done, and what has not yet.

Thanks.
HATAYAMA, Daisuke



Re: [PATCH 0/4] Export offsets of VMCS fields as note information for kdump

2012-04-19 Thread Avi Kivity
On 04/19/2012 03:01 PM, HATAYAMA Daisuke wrote:
> >> It would be not helpful for the qemu crash case you are concerned
> >> about. We want to use the guest state data to look into guest
> >> machine's image in the crasshed qemu.
> > 
> > Why?
> > 
>
> It seems natural to check the situation from the guest machine's side when
> qemu crashes. Suppose a service is running on the guest machine, and
> then qemu crashes. Then we may need to know the details of the
> progress of the service if it's important: what has been successfully
> done, and what has not yet.

How can a service on the guest be related to a qemu crash?  And how
would guest registers help?

You can extract the list of running processes from a qemu crash dump
without looking at guest registers.  And most vcpus are running
asynchronously to qemu, so their register contents is irrelevant (if a
vcpu is running synchronously with qemu - it just exited to qemu and is
waiting for a response - then you'd see the details in qemu's call stack).

-- 
error compiling committee.c: too many arguments to function



Call for Proposals: Linux Plumbers Conference (Aug 2012)

2012-04-19 Thread Amit Shah
Hello,

The Call for Proposals for the Linux Plumbers Conf 2012 is out.  We're
looking for speakers to talk at the Virtualization microconference as
well as the main conference.  The deadline for proposal submissions is
1st May.  This year's edition of LPC is co-located with LinuxCon NA
and will be held at San Diego from the 29th to the 31st of August.

More details are at:

http://www.linuxplumbersconf.org/2012/2012-lpc-call-for-proposals/

Please note: to submit a proposal for the virt microconf, use the
'lpc2012-virt-' prefix in the Name field.

LPC is oriented towards solving problems, and good proposal topics are
those which are unresolved problems or proposals that need interaction
with multiple subsystems.

Please see the CFP page linked above for more details.

Thanks,

Amit


Re: [SOLVED] 2.6.32 stuck in flush_tlb_others_ipi()

2012-04-19 Thread Philipp Hahn
Hello,

good news:

On Friday 30 March 2012 19:44:50 you wrote:
> On Monday 09 January 2012 12:41:41 Philipp Hahn wrote:
> > one of our VMs regularly get stuck: the VM is completely unresponsive (no
> > ssh, no serial console, no VNC). Using "gdbserver" and a remote system to
> > debug the running VM, I see 3 CPUs (1,3,4) stuck in
> >  pgd_alloc() → spin_lock_irqsave(pgd_lock)
> > while the 4th CPU (2) is waiting in
> >  pgd_alloc() → pgd_prepopulate_pmb() →... →  flush_tlb_others_ipi()
> >
> > 195    while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
> > 196        cpu_relax();
> > (gdb) print f->flush_cpumask
> > $5 = {1}
> >
> > CPU 1 is duing a do_exec() syscall, will CPU 2-4 are doing a do_fork()
> > syscall according to "thread apply all backtrace".

It's a guest kernel bug already fixed in v2.6.38 [1], but not (yet) back-ported
to 2.6.32-longterm. [2] fixed a bug with TLB flushing when using PAE, which
made the hidden bug trigger a lot more often. It only happens when using a
PAE-enabled guest kernel with >=2 CPUs.
Full details are in our German Bugzilla [3].

[1] 

[2] 

[3] 

Sincerely
Philipp
-- 
Philipp Hahn   Open Source Software Engineer  h...@univention.de
Univention GmbHbe open.   fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen fax: +49 421 22 232-99
   http://www.univention.de/




[RFC PATCH 0/9] ACPI memory hotplug

2012-04-19 Thread Vasilis Liaskovitis
This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
earlier work and comments from Gleb.

Memslot devices are modeled with a new qemu command line 

"-memslot id=name,start=start_addr,size=sz,node=pxm"

The user is responsible for defining memslots with meaningful start/size values,
e.g. not defining a memory slot over a PCI hole. Alternatively, the start/size
could also be handled/assigned automatically from the specific emulated hardware
(for hw/pc.c the PCI hole is currently [below_4g_mem_size, 4G), so hotplugged
memory should start from max(4G, above_4g_mem_size)).

Node defines the NUMA proximity for this memslot. When not defined, it defaults
to zero.

e.g. "-memslot id=hot1,start=4294967296,size=536870912,node=0"
will define a 512M memory slot starting at physical address 4G, belonging to 
numa node 0.

Memory slots are added or removed with a new hmp command "memslot":
Hot-add syntax: "memslot id add"
Hot-remove syntax: "memslot id delete"

- All memslots are initially unpopulated. Memslots are currently modeling only
hotplug-able memory slots i.e. initial system memory is not modeled with
memslots. The concept could be generalized to include all memory though, or it
could more closely follow kvm-memory slots.

- Memslots are abstracted as qdevices attached to the main system bus. However,
memory hotplugging has its own side channel ignoring main_system_bus's hotplug
incapability. A cleaner integration would be needed. What's  the preferred
way of modeling memory devices in qom? Would it be better to attach memory
devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
instead of the system bus?

- Refcounting memory slots has been discussed (but is not included in this 
series yet). Depopulating a memory region happens on a guestOS _EJ callback,
which means the guestOS will not be using the region anymore. However, guest
addresses from the depopulated region need to also be unmapped from the qemu
address space using cpu_physical_memory_unmap(). Does 
memory_region_del_subregion()
or some other memory API call guarantee that a memoryregion has been unmapped
from qemu's address space?

- What is the expected behaviour of hotplugged memory after a reboot? Is it
supposed to be persistent after reboots? The last 2 patches in the series try to
make hotplugged memslots persistent after reboot by creating and consulting e820
map entries.  A better solution is needed for hot-remove after a reboot, because
e820 entries can be merged.

series is based on uq/master for qemu-kvm, and master for seabios. Can be found
also at:


Vasilis Liaskovitis (9):
  Seabios: Add SSDT memory device support
  Seabios, acpi: Implement acpi-dsdt functions for memory hotplug.
  Seabios, acpi: generate hotplug memory devices.
  Implement memslot device abstraction
  acpi_piix4: Implement memory device hotplug registers and handlers. 
  pc: pass paravirt info for hotplug memory slots to BIOS
  Implement memslot command-line option and memslot hmp monitor command
  pc: adjust e820 map on hot-add and hot-remove
  Seabios, acpi: enable memory devices if e820 entry is present

 Makefile.objs   |2 +-
 hmp-commands.hx |   15 
 hw/acpi_piix4.c |  103 +++-
 hw/memslot.c|  201 +++
 hw/memslot.h|   44 
 hw/pc.c |   87 ++--
 hw/pc.h |1 +
 monitor.c   |8 ++
 monitor.h   |1 +
 qemu-config.c   |   25 +++
 qemu-options.hx |8 ++
 sysemu.h|1 +
 vl.c|   44 -
 13 files changed, 528 insertions(+), 12 deletions(-)
 create mode 100644 hw/memslot.c
 create mode 100644 hw/memslot.h

 create mode 100644 src/ssdt-mem.dsl
 src/acpi-dsdt.dsl |   68 ++-
 src/acpi.c|  155 +++--
 src/memmap.c  |   15 +
 src/ssdt-mem.dsl  |   66 ++
 4 files changed, 298 insertions(+), 6 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

-- 
1.7.9



[RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices.

2012-04-19 Thread Vasilis Liaskovitis
 The memory device generation is guided by qemu paravirt info. Seabios
 first uses the info to setup SRAT entries for the hotplug-able memory slots.
 Afterwards, build_memssdt uses the created SRAT entries to generate
 appropriate memory device objects. One memory device (and corresponding SRAT
 entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
 memory device is created for initial system memory (the method can be
 generalized to all memory though).

 Signed-off-by: Vasilis Liaskovitis 
---
 src/acpi.c |  151 ++--
 1 files changed, 147 insertions(+), 4 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 30888b9..5580099 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -484,6 +484,131 @@ build_ssdt(void)
 return ssdt;
 }
 
+static unsigned char ssdt_mem[] = {
+0x5b,0x82,0x47,0x07,0x4d,0x50,0x41,0x41,
+0x08,0x49,0x44,0x5f,0x5f,0x0a,0xaa,0x08,
+0x5f,0x48,0x49,0x44,0x0c,0x41,0xd0,0x0c,
+0x80,0x08,0x5f,0x50,0x58,0x4d,0x0a,0xaa,
+0x08,0x5f,0x43,0x52,0x53,0x11,0x33,0x0a,
+0x30,0x8a,0x2b,0x00,0x00,0x0d,0x03,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xef,
+0xbe,0xad,0xde,0x00,0x00,0x00,0x00,0xee,
+0xbe,0xad,0xe6,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x08,0x00,0x00,0x00,0x00,0x79,
+0x00,0x14,0x0f,0x5f,0x53,0x54,0x41,0x00,
+0xa4,0x43,0x4d,0x53,0x54,0x49,0x44,0x5f,
+0x5f,0x14,0x0f,0x5f,0x45,0x4a,0x30,0x01,
+0x4d,0x50,0x45,0x4a,0x49,0x44,0x5f,0x5f,
+0x68
+};
+
+#define SD_OFFSET_MEMHEX 6
+#define SD_OFFSET_MEMID 14
+#define SD_OFFSET_PXMID 31
+#define SD_OFFSET_MEMSTART 55
+#define SD_OFFSET_MEMEND   63
+#define SD_OFFSET_MEMSIZE  79
+
+u64 nb_hp_memslots = 0;
+struct srat_memory_affinity *mem;
+
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 
node)
+{
+memcpy(ssdt_ptr, ssdt_mem, sizeof(ssdt_mem));
+ssdt_ptr[SD_OFFSET_MEMHEX] = getHex(i >> 4);
+ssdt_ptr[SD_OFFSET_MEMHEX+1] = getHex(i);
+ssdt_ptr[SD_OFFSET_MEMID] = i;
+ssdt_ptr[SD_OFFSET_PXMID] = node;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMSTART) = mem_base;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMEND) = mem_base + mem_len;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMSIZE) = mem_len;
+}
+
+static void*
+build_memssdt(void)
+{
+u64 mem_base;
+u64 mem_len;
+u8  node;
+int i;
+struct srat_memory_affinity *entry = mem;
+u64 nb_memdevs = nb_hp_memslots;
+
+int length = ((1+3+4)
+  + (nb_memdevs * sizeof(ssdt_mem))
+  + (1+2+5+(12*nb_memdevs))
+  + (6+2+1+(1*nb_memdevs)));
+u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
+if (! ssdt) {
+warn_noalloc();
+return NULL;
+}
+u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
+
+// build Scope(_SB_) header
+*(ssdt_ptr++) = 0x10; // ScopeOp
+ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
+*(ssdt_ptr++) = '_';
+*(ssdt_ptr++) = 'S';
+*(ssdt_ptr++) = 'B';
+*(ssdt_ptr++) = '_';
+
+for (i = 0; i < nb_memdevs; i++) {
+mem_base = (((u64)(entry->base_addr_high) << 32 )| 
entry->base_addr_low);
+mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+node = entry->proximity[0];
+build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+ssdt_ptr += sizeof(ssdt_mem);
+entry++;
+}
+
+// build "Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} 
...}"
+*(ssdt_ptr++) = 0x14; // MethodOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'T';
+*(ssdt_ptr++) = 'F';
+*(ssdt_ptr++) = 'Y';
+*(ssdt_ptr++) = 0x02;
+for (i=0; i> 4);
+*(ssdt_ptr++) = getHex(i);
+*(ssdt_ptr++) = 0x69; // Arg1Op
+}
+
+// build "Name(MEON, Package() { One, One, ..., Zero, Zero, ... })"
+*(ssdt_ptr++) = 0x08; // NameOp
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'E';
+*(ssdt_ptr++) = 'O';
+*(ssdt_ptr++) = 'N';
+*(ssdt_ptr++) = 0x12; // PackageOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+*(ssdt_ptr++) = nb_memdevs;
+
+entry = mem;
+
+for (i = 0; i < nb_memdevs; i++) {
+mem_base = (((u64)(entry->base_addr_high) << 32 )| 
entry->base_addr_low);
+mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+*(ssdt_ptr++) = 0x00;
+entry++;
+}
+build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
+
+return ssdt;
+}
+
 #include "ssdt-pcihp.hex"
 
 #define PCI_RMV_BASE 0xae0c
@@ -580,18 +705,21 @@ build_srat(void)
 if (nb_numa_nodes == 0)
 return NULL;
 
-u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes));
+qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
+
+u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes +
+3 * nb_hp_memslots));
 if (!n

[RFC PATCH 1/9][SeaBIOS] Add SSDT memory device support

2012-04-19 Thread Vasilis Liaskovitis
Define SSDT hotplug-able memory devices in _SB namespace. The dynamically
generated SSDT includes per memory device hotplug methods. These methods
just call methods defined in the DSDT. Also dynamically generate a MTFY
method and a MEON array of the online/available memory devices.  Add file
src/ssdt-mem.dsl with directions for generating the per-memory device
processor object AML code.
The design is taken from SSDT cpu generation.

Signed-off-by: Vasilis Liaskovitis 
---
 src/ssdt-mem.dsl |   66 ++
 1 files changed, 66 insertions(+), 0 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
new file mode 100644
index 000..9586643
--- /dev/null
+++ b/src/ssdt-mem.dsl
@@ -0,0 +1,66 @@
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It is similar in design to the ssdt_proc variable.  
+ * It defines the contents of the per-memory-device object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The
+ * objects will be placed in the \_SB_ namespace.
+ *
+ * To generate a new ssdt_mem[], run the commands:
+ *   cpp -P src/ssdt-mem.dsl > out/ssdt-mem.dsl.i
+ *   iasl -ta -p out/ssdt-mem out/ssdt-mem.dsl.i
+ *   tail -c +37 < out/ssdt-mem.aml | hexdump -e '"" 8/1 "0x%02x," "\n"'
+ * and then cut-and-paste the output into the src/acpi.c ssdt_mem[]
+ * array.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MTFY method with an entry for each memdevice:
+ * Method(MTFY, 2) {
+ * If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ * If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ * ...
+ * }
+ * and a MEON array with the list of active and inactive memory devices:
+ * Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+ */
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v-- DO NOT EDIT --v */
+{
+Device(MPAA) {
+Name(ID, 0xAA)
+/*  ^-- DO NOT EDIT --^
+ *
+ * The src/acpi.c code requires the above layout so that it can update
+ * MPAA and 0xAA with the appropriate MEMDEVICE id (see
+ * SD_OFFSET_MEMHEX/MEMID1/MEMID2).  Don't change the above without
+ * also updating the C code.
+ */
+Name(_HID, EISAID("PNP0C80"))
+Name(_PXM, 0xAA)
+
+External(CMST, MethodObj)
+External(MPEJ, MethodObj)
+
+Name(_CRS, ResourceTemplate() {
+QwordMemory(
+   ResourceConsumer,
+   ,
+   MinFixed,
+   MaxFixed,
+   Cacheable,
+   ReadWrite, 
+   0x0, 
+   0xDEADBEEF, 
+   0xE6ADBEEE, 
+   0x,
+   0x0800, 
+   )
+})
+Method (_STA, 0) {
+Return(CMST(ID))
+}
+Method (_EJ0, 1, NotSerialized) {
+MPEJ(ID, Arg0)
+}
+}
+}
+
-- 
1.7.9

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 4/9] Implement memslot device abstraction

2012-04-19 Thread Vasilis Liaskovitis
 Each hotplug-able memory slot is a SysBusDevice. All memslots are initially
 unpopulated. A hot-add operation for a particular memory slot creates a new
 MemoryRegion of the given physical address offset, size and node proximity,
 and attaches it to main system memory as a sub_region. A hot-remove operation
 detaches and frees the MemoryRegion from system memory.

 This is an early prototype and lacks proper qdev integration: a separate
 hotplug mechanism/side-channel is used and main system bus hotplug
 capability is ignored.

Signed-off-by: Vasilis Liaskovitis 
---
 hw/memslot.c |  195 ++
 hw/memslot.h |   44 +
 2 files changed, 239 insertions(+), 0 deletions(-)
 create mode 100644 hw/memslot.c
 create mode 100644 hw/memslot.h

diff --git a/hw/memslot.c b/hw/memslot.c
new file mode 100644
index 000..b100824
--- /dev/null
+++ b/hw/memslot.c
@@ -0,0 +1,195 @@
+/*
+ * MemorySlot device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "trace.h"
+#include "qdev.h"
+#include "memslot.h"
+#include "../exec-memory.h"
+
+static DeviceState *memslot_hotplug_qdev;
+static memslot_hotplug_fn memslot_hotplug;
+
+static Property memslot_properties[] = {
+DEFINE_PROP_END_OF_LIST()
+};
+
+void memslot_populate(MemSlotState *s)
+{
+char buf[32];
+MemoryRegion *new = NULL;
+
+sprintf(buf, "memslot%u", s->idx);
+new = g_malloc(sizeof(MemoryRegion));
+memory_region_init_ram(new, buf, s->size);
+vmstate_register_ram_global(new);
+memory_region_add_subregion(get_system_memory(), s->start, new);
+s->mr = new;
+s->populated = 1;
+}
+
+void memslot_depopulate(MemSlotState *s)
+{
+assert(s);
+if (s->populated) {
+vmstate_unregister_ram(s->mr, NULL);
+memory_region_del_subregion(get_system_memory(), s->mr);
+memory_region_destroy(s->mr);
+s->populated = 0;
+s->mr = NULL;
+}
+}
+
+MemSlotState *memslot_create(char *id, target_phys_addr_t start, uint64_t size,
+uint64_t node, uint32_t memslot_idx)
+{
+DeviceState *dev;
+MemSlotState *mdev;
+
+dev = sysbus_create_simple("memslot", -1, NULL);
+dev->id = id;
+
+mdev = MEMSLOT(dev);
+mdev->idx = memslot_idx;
+mdev->start = start;
+mdev->size = size;
+mdev->node = node;
+
+return mdev;
+}
+
+void memslot_register_hotplug(memslot_hotplug_fn hotplug, DeviceState *qdev)
+{
+memslot_hotplug_qdev = qdev;
+memslot_hotplug = hotplug;
+}
+
+static MemSlotState *memslot_find(char *id)
+{
+DeviceState *qdev;
+qdev = qdev_find_recursive(sysbus_get_default(), id);
+if (qdev)
+return MEMSLOT(qdev);
+return NULL;
+}
+
+int memslot_do(Monitor *mon, const QDict *qdict)
+{
+MemSlotState *slot = NULL;
+
+char *id = (char*) qdict_get_try_str(qdict, "id");
+if (!id) {
+fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
+return 1;
+}
+
+slot = memslot_find(id);
+
+if (!slot) {
+fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
+return 1;
+}
+
+char *action = (char*) qdict_get_try_str(qdict, "action");
+if (!action || (strcmp(action, "add") && strcmp(action, "delete"))) {
+fprintf(stderr, "ERROR %s invalid action\n", __FUNCTION__);
+return 1;
+}
+
+if (!strcmp(action, "add")) {
+if (slot->populated) {
+fprintf(stderr, "ERROR %s slot %s already populated\n",
+__FUNCTION__, id);
+return 1;
+}
+memslot_populate(slot);
+if (memslot_hotplug)
+memslot_hotplug(memslot_hotplug_qdev, (SysBusDevice*)slot, 1);
+}
+else {
+if (!slot->populated) {
+fprintf(stderr, "ERROR %s slot %s is not populated\n",
+__FUNCTION__, id);
+return 1;
+}
+if (memslot_hotplug)
+memslot_hotplug(memslot_hotplug_qdev, (SysBusDevice*)slot, 0);
+}
+return 0;
+}
+
+MemSlotState *memslot_find_from_idx(uint32_t idx)
+{
+Error *err = NULL;
+DeviceState *dev;
+MemSlotState *slot;
+char *type;
+BusState *bus = sysbus_get_default();
+QTAILQ_FOREACH(dev, &bus->children, sibling) {
+type = object_property_ge

[RFC PATCH 5/9] acpi_piix4: Implement memory device hotplug registers

2012-04-19 Thread Vasilis Liaskovitis
 A 32-byte register block is used to present up to 256 hotplug-able memory devices
 to BIOS and OSPM. Hot-add and hot-remove functions trigger an ACPI hotplug
 event through these. Only reads are allowed from these registers (from
 BIOS/OSPM perspective). "memslot id add" will immediately populate the new
 memslot (a new MemoryRegion is created and attached to system memory), and
 then trigger the ACPI hot-add event. "memslot id delete" triggers the ACPI
 hot-remove event but needs to wait for OSPM to eject the device.  We use a
 second set of eject registers to know when OSPM has called the _EJ function
 for a particular memslot. A write to these will depopulate the corresponding
 memslot i.e. detach and free the MemoryRegion. Only writes to the eject
 registers are allowed.

 A new property mem_acpi_hotplug should enable these memory hotplug registers
 for future machine types (not yet implemented in this version).
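
 As a rough guest/firmware-side illustration of the register interface
 described above (a sketch, not part of the patch: the port numbers are the
 MEM_BASE/MEM_EJ_BASE values used below, the function names are made up):

    #include <sys/io.h>   /* inb/outb; assumes I/O port access has been granted */

    /* Byte i of the status bank at 0xaf20 is a read-only bitmap of the
     * populated memslots 8*i .. 8*i+7. */
    static int memslot_is_populated(unsigned int slot)
    {
        unsigned char bits = inb(0xaf20 + slot / 8);
        return (bits >> (slot % 8)) & 1;
    }

    /* Writing a set bit to the write-only eject bank at 0xaf40 asks QEMU to
     * depopulate the corresponding memslot. */
    static void memslot_eject(unsigned int slot)
    {
        outb(1 << (slot % 8), 0xaf40 + slot / 8);
    }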

Signed-off-by: Vasilis Liaskovitis 
---
 hw/acpi_piix4.c |   93 --
 1 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 797ed24..a14dd3c 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -27,6 +27,8 @@
 #include "sysemu.h"
 #include "range.h"
 #include "ioport.h"
+#include "sysbus.h"
+#include "memslot.h"
 
 //#define DEBUG
 
@@ -43,9 +45,16 @@
 #define PCI_BASE 0xae00
 #define PCI_EJ_BASE 0xae08
 #define PCI_RMV_BASE 0xae0c
+#define MEM_BASE 0xaf20
+#define MEM_EJ_BASE 0xaf40
 
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 
+struct gpe_regs {
+uint8_t mems_sts[32];
+};
+
 struct pci_status {
 uint32_t up;
 uint32_t down;
@@ -66,6 +75,7 @@ typedef struct PIIX4PMState {
 int kvm_enabled;
 Notifier machine_ready;
 
+struct gpe_regs gpe;
 /* for pci hotplug */
 struct pci_status pci0_status;
 uint32_t pci0_hotplug_enable;
@@ -86,8 +96,8 @@ static void pm_update_sci(PIIX4PMState *s)
ACPI_BITMASK_POWER_BUTTON_ENABLE |
ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
-(((s->ar.gpe.sts[0] & s->ar.gpe.en[0])
-  & PIIX4_PCI_HOTPLUG_STATUS) != 0);
+(((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
+  (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
 qemu_set_irq(s->irq, sci_level);
 /* schedule a timer interruption if needed */
@@ -432,17 +442,34 @@ type_init(piix4_pm_register_types)
 static uint32_t gpe_readb(void *opaque, uint32_t addr)
 {
 PIIX4PMState *s = opaque;
-uint32_t val = acpi_gpe_ioport_readb(&s->ar, addr);
+uint32_t val = 0;
+struct gpe_regs *g = &s->gpe;
+
+switch (addr) {
+case MEM_BASE ... MEM_BASE+31:
+val = g->mems_sts[addr - MEM_BASE];
+break;
+default:
+val = acpi_gpe_ioport_readb(&s->ar, addr);
+}
 
 PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
 return val;
 }
 
+static void piix4_memslot_eject(uint32_t addr, uint32_t val);
+
 static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
 PIIX4PMState *s = opaque;
 
-acpi_gpe_ioport_writeb(&s->ar, addr, val);
+switch (addr) {
+case MEM_EJ_BASE ... MEM_EJ_BASE+31:
+piix4_memslot_eject(addr, val);
+break;
+default:
+acpi_gpe_ioport_writeb(&s->ar, addr, val);
+}
 pm_update_sci(s);
 
 PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
@@ -521,9 +548,12 @@ static void pcirmv_write(void *opaque, uint32_t addr, 
uint32_t val)
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 PCIHotplugState state);
 
+static int piix4_memslot_hotplug(DeviceState *qdev, SysBusDevice *dev, int add);
+
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
 struct pci_status *pci0_status = &s->pci0_status;
+int i = 0;
 
 register_ioport_write(GPE_BASE, GPE_LEN, 1, gpe_writeb, s);
 register_ioport_read(GPE_BASE, GPE_LEN, 1,  gpe_readb, s);
@@ -538,6 +568,13 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, 
PIIX4PMState *s)
 register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
 register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
 
+register_ioport_read(MEM_BASE, 32, 1,  gpe_readb, s);
+register_ioport_write(MEM_EJ_BASE, 32, 1,  gpe_writeb, s);
+for(i = 0; i < 32; i++) {
+s->gpe.mems_sts[i] = 0;
+}
+memslot_register_hotplug(piix4_memslot_hotplug, &s->dev.qdev);
+
 pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
 }
 
@@ -553,6 +590,54 @@ static void disable_device(PIIX4PMState *s, int slot)
 s->pci0_status.down |= (1 << slot);
 }
 
+static void enable_mem_device(PIIX4PMState *s, int memdevice)
+{
+struct gpe_regs *g = &s->gpe;
+s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+g->mems_sts[memdevice/8] |= (1 << (memdevice%8));
+}

[RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove

2012-04-19 Thread Vasilis Liaskovitis
 Hotplugged memory is not persistent in the e820 memory maps. After hotplugging
 a memslot and rebooting the VM, the hotplugged device is not present.

 A possible solution is to add an e820 for the new memslot in the acpi_piix4
 hot-add handler. On a reset, Seabios (see next patch in series) will enable all
 memory devices for which it finds an e820 entry that covers the device's
 address range.

 On hot-remove, the acpi_piix4 handler will try to remove the e820 entry
 corresponding to the device. This will work when no VM reboots happen
 between hot-add and hot-remove, but it is not a sufficient solution in
 general: Seabios and GuestOS merge adjacent e820 entries on machine reboot,
 so the sequence hot-add/ rebootVM / hot-remove will fail to remove a
 corresponding e820 entry at the hot-remove phase.
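
 For example, if a hot-added slot ends up adjacent to an existing E820_RAM
 entry, a reboot can leave a single merged entry covering both ranges;
 e820_del_entry() then finds no entry whose start and length match the slot
 exactly, and the stale entry cannot be removed.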

 Signed-off-by: Vasilis Liaskovitis 
---
 hw/acpi_piix4.c |6 ++
 hw/pc.c |   28 
 hw/pc.h |1 +
 3 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 2921d18..2b5fd04 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -619,6 +619,9 @@ static void piix4_memslot_eject(uint32_t addr, uint32_t val)
 s = memslot_find_from_idx(start + idx);
 assert(s != NULL);
 memslot_depopulate(s);
+if (e820_del_entry(s->start, s->size, E820_RAM) == -EBUSY)
+PIIX4_DPRINTF("failed to remove e820 entry for memslot %u\n",
+   s->idx);
 }
 val = val >> 1;
 idx++;
@@ -634,6 +637,9 @@ static int piix4_memslot_hotplug(DeviceState *qdev, 
SysBusDevice *dev, int
 
 if (add) {
 enable_mem_device(s, slot->idx);
+if (e820_add_entry(slot->start, slot->size, E820_RAM) == -EBUSY)
+PIIX4_DPRINTF("failed to add e820 entry for memslot %u\n",
+slot->idx);
 }
 else {
 disable_mem_device(s, slot->idx);
diff --git a/hw/pc.c b/hw/pc.c
index f1f550a..04d243f 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -593,6 +593,34 @@ int e820_add_entry(uint64_t address, uint64_t length, 
uint32_t type)
 return index;
 }
 
+int e820_del_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+int index = le32_to_cpu(e820_table.count);
+int search;
+struct e820_entry *entry;
+
+if (index == 0)
+return -EBUSY;
+search = index - 1;
+entry = &e820_table.entry[search];
+while (search >= 0) {
+if ((entry->address == cpu_to_le64(address)) &&
+(entry->length == cpu_to_le64(length)) &&
+(entry->type == cpu_to_le32(type))){
+if (search != index - 1) {
+memcpy(&e820_table.entry[search], &e820_table.entry[search + 1],
+sizeof(struct e820_entry) * (index - search));
+}
+index--;
+e820_table.count = cpu_to_le32(index);
+return 1;
+}
+search--;
+entry = &e820_table.entry[search];
+}
+return -EBUSY;
+}
+
 static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
 
 static void *bochs_bios_init(void)
diff --git a/hw/pc.h b/hw/pc.h
index 74d3369..4925e8c 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -226,5 +226,6 @@ void pc_system_firmware_init(MemoryRegion *rom_memory);
 #define E820_UNUSABLE   5
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
+int e820_del_entry(uint64_t, uint64_t, uint32_t);
 
 #endif
-- 
1.7.9

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 9/9][SeaBIOS] enable memory devices if e820 entry is present

2012-04-19 Thread Vasilis Liaskovitis
 On a reboot, seabios regenerates srat/ssdt objects. If a valid e820 entry is
 found spanning the whole address range of a hotplug memory device, the device
 will be enabled. This ensures persistence of hotplugged memory slots across VM
 reboots.

 Signed-off-by: Vasilis Liaskovitis 
---
 src/acpi.c   |6 +-
 src/memmap.c |   15 +++
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 5580099..2ebed2e 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -601,7 +601,11 @@ build_memssdt(void)
 for (i = 0; i < nb_memdevs; i++) {
 mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
 mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
-*(ssdt_ptr++) = 0x00;
+if (find_e820(mem_base, mem_len, E820_RAM)) {
+*(ssdt_ptr++) = 0x01;
+}
+else
+*(ssdt_ptr++) = 0x00;
 entry++;
 }
 build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
diff --git a/src/memmap.c b/src/memmap.c
index 56865b4..9790da1 100644
--- a/src/memmap.c
+++ b/src/memmap.c
@@ -131,6 +131,21 @@ add_e820(u64 start, u64 size, u32 type)
 //dump_map();
 }
 
+// Check if an e820 entry exists that covers the memory range
+// [start, start+size) with same type as type.
+int
+find_e820(u64 start, u64 size, u32 type)
+{
+int i;
+for (i = 0; i < e820_count; i++) {
+struct e820entry *e = &e820_list[i];
+if ((e->start <= start) && (e->size >= (size + start - e->start)) &&
+(e->type == type))
+return 1;
+}
+return 0;
+}
+
 // Report on final memory locations.
 void
 memmap_finalize(void)
-- 
1.7.9

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command

2012-04-19 Thread Vasilis Liaskovitis
 Implement -memslot qemu-kvm command line option to define hotplug-able memory
 slots.
 Syntax: "-memslot id=name,start=addr,size=sz,node=nodeid"

 e.g. "-memslot id=hot1,start=4294967296,size=1073741824,node=0"
 will define a 1G memory slot starting at physical address 4G, belonging to numa
 node 0. Defining no node will automatically add a memslot to node 0.

 Also implement a new hmp monitor command for hot-add and hot-remove of memory 
slots
 Syntax: "memslot slotname action"
 where action is add/delete and slotname is the qdev-id of the memory slot.
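
 For example (hypothetical slot name), a slot defined on the command line with
 id=hot1 can later be populated or removed from the monitor with

   (qemu) memslot hot1 add
   (qemu) memslot hot1 delete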

 Signed-off-by: Vasilis Liaskovitis 
---
 Makefile.objs   |2 +-
 hmp-commands.hx |   15 +++
 monitor.c   |8 
 monitor.h   |1 +
 qemu-config.c   |   25 +
 qemu-options.hx |8 
 sysemu.h|1 +
 vl.c|   40 
 8 files changed, 99 insertions(+), 1 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 5c3bcda..98ce865 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -240,7 +240,7 @@ hw-obj-$(CONFIG_USB_OHCI) += usb/hcd-ohci.o
 hw-obj-$(CONFIG_USB_EHCI) += usb/hcd-ehci.o
 hw-obj-$(CONFIG_USB_XHCI) += usb/hcd-xhci.o
 hw-obj-$(CONFIG_FDC) += fdc.o
-hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
+hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o memslot.o
 hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_I82374) += i82374.o
diff --git a/hmp-commands.hx b/hmp-commands.hx
index a6f5a84..cadf4ca 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -618,6 +618,21 @@ Add device.
 ETEXI
 
 {
+.name   = "memslot",
+.args_type  = "id:s,action:s",
+.params = "id,action",
+.help   = "add memslot device",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_memslot_add,
+},
+
+STEXI
+@item memslot_add @var{config}
+@findex memslot_add
+
+Add memslot.
+ETEXI
+{
 .name   = "device_del",
 .args_type  = "id:s",
 .params = "device",
diff --git a/monitor.c b/monitor.c
index 8946a10..f672186 100644
--- a/monitor.c
+++ b/monitor.c
@@ -30,6 +30,7 @@
 #include "hw/pci.h"
 #include "hw/watchdog.h"
 #include "hw/loader.h"
+#include "hw/memslot.h"
 #include "gdbstub.h"
 #include "net.h"
 #include "net/slirp.h"
@@ -4675,3 +4676,10 @@ int monitor_read_block_device_key(Monitor *mon, const 
char *device,
 
 return monitor_read_bdrv_key_start(mon, bs, completion_cb, opaque);
 }
+
+int do_memslot_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+return memslot_do(mon, qdict);
+#endif
+}
diff --git a/monitor.h b/monitor.h
index 0d49800..1e14a63 100644
--- a/monitor.h
+++ b/monitor.h
@@ -80,5 +80,6 @@ int monitor_read_password(Monitor *mon, ReadLineFunc 
*readline_func,
 int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
+int do_memslot_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif /* !MONITOR_H */
diff --git a/qemu-config.c b/qemu-config.c
index be84a03..1f26187 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -613,6 +613,30 @@ QemuOptsList qemu_boot_opts = {
 },
 };
 
+static QemuOptsList qemu_memslot_opts = {
+.name = "memslot",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_memslot_opts.head),
+.desc = {
+{
+.name = "id",
+.type = QEMU_OPT_STRING,
+},{
+.name = "start",
+.type = QEMU_OPT_SIZE,
+.help = "physical address start for this memslot",
+},{
+.name = "size",
+.type = QEMU_OPT_SIZE,
+.help = "memory size for this memslot",
+},{
+.name = "node",
+.type = QEMU_OPT_NUMBER,
+.help = "NUMA node number (i.e. proximity) for this memslot",
+},
+{ /* end of list */ }
+},
+};
+
 static QemuOptsList *vm_config_groups[32] = {
 &qemu_drive_opts,
 &qemu_chardev_opts,
@@ -628,6 +652,7 @@ static QemuOptsList *vm_config_groups[32] = {
 &qemu_machine_opts,
 &qemu_boot_opts,
 &qemu_iscsi_opts,
+&qemu_memslot_opts,
 NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index a169792..aff0546 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2728,3 +2728,11 @@ HXCOMM This is the last statement. Insert new options 
before this line!
 STEXI
 @end table
 ETEXI
+
+DEF("memslot", HAS_ARG, QEMU_OPTION_memslot,
+"-memslot start=num,size=num,id=name\n"
+"specify unpopulated memory slot",
+QEMU_ARCH_ALL)
+
+
+
diff --git a/sysemu.h b/sysemu.h
index bc2c788..7247099 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -136,6 +136,7 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern uint64_t node_cpumask[MAX_NODES];
+extern int nb_hp_memslots;
 
 #define MAX_OPTION_ROM

[RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS

2012-04-19 Thread Vasilis Liaskovitis
 The numa_fw_cfg paravirt interface is extended to include SRAT information for
 all hotplug-able memslots. There are 3 words for each hotplug-able memory slot,
 denoting start address, size and node proximity. nb_numa_nodes is set to 1 by
 default (not 0), so that we always pass srat info to SeaBIOS.

 This information is used by Seabios to build hotplug memory device objects at 
runtime.
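
 As a sketch of the resulting blob layout (field names are illustrative only,
 not part of the patch; every field is a little-endian 64-bit word, matching
 the existing numa_fw_cfg encoding):

    #include <stdint.h>

    /* Conceptual layout of the extended FW_CFG_NUMA blob:
     *
     *   uint64_t nb_numa_nodes;
     *   uint64_t nb_hp_memslots;
     *   uint64_t cpu_to_node[max_cpus];      -- node for each possible VCPU
     *   uint64_t node_mem[nb_numa_nodes];    -- memory amount per node
     *   struct hp_memslot slots[nb_hp_memslots];
     */
    struct hp_memslot {
        uint64_t start;   /* guest-physical start address of the slot */
        uint64_t size;    /* slot size in bytes */
        uint64_t node;    /* NUMA proximity domain */
    };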

 Signed-off-by: Vasilis Liaskovitis 
---
 hw/pc.c |   59 +--
 vl.c|4 +++-
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 67f0479..f1f550a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -46,6 +46,7 @@
 #include "ui/qemu-spice.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "memslot.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -592,12 +593,15 @@ int e820_add_entry(uint64_t address, uint64_t length, 
uint32_t type)
 return index;
 }
 
+static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
+
 static void *bochs_bios_init(void)
 {
 void *fw_cfg;
 uint8_t *smbios_table;
 size_t smbios_len;
 uint64_t *numa_fw_cfg;
+uint64_t *hp_memslots_fw_cfg;
 int i, j;
 
 register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
@@ -630,28 +634,71 @@ static void *bochs_bios_init(void)
 fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, (uint8_t *)&hpet_cfg,
  sizeof(struct hpet_fw_config));
 /* allocate memory for the NUMA channel: one (64bit) word for the number
- * of nodes, one word for each VCPU->node and one word for each node to
- * hold the amount of memory.
+ * of nodes, one word for the number of hotplug memory slots, one word
+ * for each VCPU->node, one word for each node to hold the amount of memory.
+ * Finally three words for each hotplug memory slot, denoting start address,
+ * size and node proximity.
  */
-numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
+numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
 numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
+numa_fw_cfg[1] = cpu_to_le64(nb_hp_memslots);
+
 for (i = 0; i < max_cpus; i++) {
 for (j = 0; j < nb_numa_nodes; j++) {
 if (node_cpumask[j] & (1 << i)) {
-numa_fw_cfg[i + 1] = cpu_to_le64(j);
+numa_fw_cfg[i + 2] = cpu_to_le64(j);
 break;
 }
 }
 }
 for (i = 0; i < nb_numa_nodes; i++) {
-numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
+numa_fw_cfg[max_cpus + 2 + i] = cpu_to_le64(node_mem[i]);
 }
+
+hp_memslots_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
+if (nb_hp_memslots)
+bochs_bios_setup_hp_memslots(hp_memslots_fw_cfg);
+
 fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
- (1 + max_cpus + nb_numa_nodes) * 8);
+ (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
 
 return fw_cfg;
 }
 
+static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots)
+{
+int i = 0;
+Error *err = NULL;
+DeviceState *dev;
+MemSlotState *slot;
+char *type;
+BusState *bus = sysbus_get_default();
+
+QTAILQ_FOREACH(dev, &bus->children, sibling) {
+type = object_property_get_str(OBJECT(dev), "type", &err);
+if (err) {
+error_free(err);
+fprintf(stderr, "error getting device type\n");
+exit(1);
+}
+
+if (!strcmp(type, "memslot")) {
+if (!dev->id) {
+error_free(err);
+fprintf(stderr, "error getting memslot device id\n");
+exit(1);
+}
+if (!strcmp(dev->id, "initialslot")) continue;
+slot = MEMSLOT(dev);
+fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
+fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
+fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
+i++;
+}
+}
+assert(i == nb_hp_memslots);
+}
+
 static long get_file_size(FILE *f)
 {
 long where, size;
diff --git a/vl.c b/vl.c
index ae91a8a..50df453 100644
--- a/vl.c
+++ b/vl.c
@@ -3428,8 +3428,10 @@ int main(int argc, char **argv, char **envp)
 
 register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
  ram_load, NULL);
+if (!nb_numa_nodes)
+nb_numa_nodes = 1;
 
-if (nb_numa_nodes > 0) {
+{
 int i;
 
 if (nb_numa_nodes > MAX_NODES) {
-- 
1.7.9

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug.

2012-04-19 Thread Vasilis Liaskovitis
Extend the DSDT to include methods for handling memory hot-add and hot-remove
notifications and memory device status requests. These functions are called
from the memory device SSDT methods.

Eject has only been tested with level gpe event, but will be changed to edge gpe
event soon, according to recent master patch for other ACPI hotplug events.

Signed-off-by: Vasilis Liaskovitis 
---
 src/acpi-dsdt.dsl |   68 +++-
 1 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 4bdc268..184daf0 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -709,9 +709,72 @@ DefinitionBlock (
 }
 Return(One)
 }
-}
 
+/* Objects filled in by run-time generated SSDT */
+External(MTFY, MethodObj)
+External(MEON, PkgObj)
+
+Method (CMST, 1, NotSerialized) {
+// _STA method - return ON status of memdevice
+// Local0 = MEON flag for this cpu
+Store(DerefOf(Index(MEON, Arg0)), Local0)
+If (Local0) { Return(0xF) } Else { Return(0x0) }
+}
+/* Memory eject notify method */
+OperationRegion(MEMJ, SystemIO, 0xaf40, 32)
+Field (MEMJ, ByteAcc, NoLock, Preserve)
+{
+MPE, 256
+}
+
+Method (MPEJ, 2, NotSerialized) {
+// _EJ0 method - eject callback
+Store(ShiftLeft(1,Arg0), MPE)
+Sleep(200)
+}
+
+/* Memory hotplug notify method */
+OperationRegion(MEST, SystemIO, 0xaf20, 32)
+Field (MEST, ByteAcc, NoLock, Preserve)
+{
+MES, 256
+}
+
+Method(MESC, 0) {
+// Local5 = active memdevice bitmap
+Store (MES, Local5)
+// Local2 = last read byte from bitmap
+Store (Zero, Local2)
+// Local0 = memory device iterator
+Store (Zero, Local0)
+While (LLess(Local0, SizeOf(MEON))) {
+// Local1 = MEON flag for this memory device
+Store(DerefOf(Index(MEON, Local0)), Local1)
+If (And(Local0, 0x07)) {
+// Shift down previously read bitmap byte
+ShiftRight(Local2, 1, Local2)
+} Else {
+// Read next byte from memdevice bitmap
+Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+}
+// Local3 = active state for this memory device
+Store(And(Local2, 1), Local3)
 
+If (LNotEqual(Local1, Local3)) {
+// State change - update MEON with new state
+Store(Local3, Index(MEON, Local0))
+// Do MEM notify
+If (LEqual(Local3, 1)) {
+MTFY(Local0, 1)
+} Else {
+MTFY(Local0, 3)
+}
+}
+Increment(Local0)
+}
+Return(One)
+}
+}
 /
  * General purpose events
  /
@@ -732,7 +795,8 @@ DefinitionBlock (
 Return(\_SB.PRSC())
 }
 Method(_L03) {
-Return(0x01)
+// Memory hotplug event
+Return(\_SB.MESC())
 }
 Method(_L04) {
 Return(0x01)
-- 
1.7.9

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly upstream qemu-kvm test report (using autotest) - Week 16

2012-04-19 Thread Lucas Meneghel Rodrigues
On Thu, 2012-04-19 at 14:59 +0530, Prem Karat wrote:
> Folks,
>  
> This is a test report on upstream qemu-kvm testing (24853eece248d4a58d705c).
> Tests were executed using latest autotest git (6a15572d5307fa0b).
> 
> This time we have tested 3 guests.
> 
> We are analysing the test results further to find out if the failure is in
> autotest or in qemu-kvm. Will post the analysis asap and report bugs to
> appropriate community.

Ok, great initiative! I have a similar job here, but we use the block
virtio drivers only, and qcow 2. I need to go over your results and see
if they match mine.

Thanks, will keep in touch,

Lucas

> Host Kernel level: 3.4.0-rc2+ (git: 923e9a1399b620d063cd8853).
> Guest OS: Windows 7 64 SP1, Fedora 16 x86-64, RHEL 6.2 x86-64
> 
> Please find the detailed report below for all the 3 guests.
> 
> 
> Host Kernel: Kernel: 3.4.0-rc2+
> KVM Version:  1.0.50 (qemu-kvm-devel)
> Guest OS: Windows 7 64 SP1
> Date: Mon Apr 16 22:14:02 2012
> Stat: From 13 tests executed, 5 have passed (61% failures)
> 
> Tests Failed:
> 
> 
>  Test Name                                                       Result  Run time
>  kvm.qed.smp4.Win7.64.sp1.nic_hotplug.default.nic_virtio         FAIL    108
>  kvm.qed.smp4.Win7.64.sp1.nic_hotplug.additional.nic_8139        FAIL     75
>  kvm.qed.smp4.Win7.64.sp1.nic_hotplug.additional.nic_virtio      FAIL     17
>  kvm.qed.smp4.Win7.64.sp1.nic_hotplug.additional.nic_e1000       FAIL    214
>  kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_qcow2.block_virtio   FAIL    115
>  kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_qcow2.block_scsi     FAIL    161
>  kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_raw.block_virtio     FAIL    160
>  kvm.qed.smp4.Win7.64.sp1.block_hotplug.fmt_raw.block_scsi       FAIL    160
> 
> 
> Tests Passed:
> 
> 
>  Test Name                                                       Result  Run time
>  kvm.qed.smp4.Win7.64.sp1.boot                                   PASS     70
>  kvm.qed.smp4.Win7.64.sp1.reboot                                 PASS     62
>  kvm.qed.smp4.Win7.64.sp1.nic_hotplug.default.nic_8139           PASS    119
>  kvm.qed.smp4.Win7.64.sp1.nic_hotplug.default.nic_e1000          PASS    184
>  kvm.qed.smp4.Win7.64.sp1.shutdown                               PASS    122
> 
> *
> 
> Host Kernel: Kernel: 3.4.0-rc2+
> KVM Version:  1.0.50 (qemu-kvm-devel)
> Guest OS: Fedora 16 x86-64
> Date: Sat Apr 14 02:02:50 2012
> Stat: From 28 tests executed, 20 have passed (28% failures)
> 
> Tests Failed:
> 
> 
>  Test Name                                                       Result  Run time
>  kvm.raw.smp4.Fedora.16.64.autotest.bonnie                       FAIL    179
>  kvm.raw.smp4.Fedora.16.64.balloon_check                         FAIL     91
>  kvm.raw.smp4.Fedora.16.64.balloon_check.balloon-migrate         FAIL    151
>  kvm.raw.smp4.Fedora.16.64.cgroup.blkio_bandwidth                FAIL    189
>  kvm.raw.smp4.Fedora.16.64.cgroup.blkio_throttle_multi           FAIL    348
>  kvm.raw.smp4.Fedora.16.64.cgroup.cpuset_cpus                    FAIL      8
>  kvm.raw.smp4.Fedora.16.64.cgroup.memory_move                    FAIL     50
>  kvm.raw.smp4.Fedora.16.64.cpu_hotplug_test                      FAIL     96
> 
> 
> Tests Passed:
> 
> 
>  Test Name                                                       Result  Run time
>  kvm.raw.smp4.Fedora.16.64.boot                                  PASS     38
>  kvm.raw.smp4.Fedora.16.64.autotest.dbench

Re: [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS

2012-04-19 Thread Avi Kivity
On 04/19/2012 05:08 PM, Vasilis Liaskovitis wrote:
>  The numa_fw_cfg paravirt interface is extended to include SRAT information 
> for
>  all hotplug-able memslots. There are 3 words for each hotplug-able memory 
> slot,
>  denoting start address, size and node proximity. nb_numa_nodes is set to 1 by
>  default (not 0), so that we always pass srat info to SeaBIOS.
>
>  This information is used by Seabios to build hotplug memory device objects 
> at runtime.
>

Please document this ABI.  I don't see an existing place, suggest
docs/specs/fwcfg.txt (only your additions need to be documented).

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command

2012-04-19 Thread Avi Kivity
On 04/19/2012 05:08 PM, Vasilis Liaskovitis wrote:
>  Implement -memslot qemu-kvm command line option to define hotplug-able memory
>  slots.
>  Syntax: "-memslot id=name,start=addr,size=sz,node=nodeid"
>
>  e.g. "-memslot id=hot1,start=4294967296,size=1073741824,node=0"
>  will define a 1G memory slot starting at physical address 4G, belonging to 
> numa
>  node 0. Defining no node will automatically add a memslot to node 0.

start=4G,size=1G ought to work too, no?

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug

2012-04-19 Thread Anthony Liguori

On 04/19/2012 09:08 AM, Vasilis Liaskovitis wrote:

This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
earlier work and comments from Gleb.

Memslot devices are modeled with a new qemu command line

"-memslot id=name,start=start_addr,size=sz,node=pxm"


Hi,

For 1.2, I'd really like to focus on refactoring the PC machine as described in 
this series:


https://github.com/aliguori/qemu/commits/qom-rebase.12

I'd like to represent the guest memory as a "DIMM" device.

In terms of this proposal, I would then expect that the i440fx device would have 
a num_dimms property that controlled how many link<DIMM>'s it had.  Hotplug 
would consist of creating a DIMM at run time and connecting it to the 
appropriate link.


One thing that's not clear to me is how the start/size fits in.  On bare metal, 
is this something that's calculated by the firmware during start up and then 
populated in ACPI?   Does it do something like take the largest possible DIMM 
size that it supports and fill out the table?


At any rate, I think we should focus on modeling this in QOM versus adding a new 
option and hacking at the existing memory init code.


Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] Biweekly upstream qemu-kvm test report (using autotest) - Week 16

2012-04-19 Thread Prem Karat
On 04/19/12 10:50pm, Amos Kong wrote:
> On Thu, Apr 19, 2012 at 10:13 PM, Lucas Meneghel Rodrigues
> wrote:
> 
> > On Thu, 2012-04-19 at 14:59 +0530, Prem Karat wrote:
> > > Folks,
> > >
> > > This is a test report on upstream qemu-kvm testing
> > (24853eece248d4a58d705c).
> > > Tests were executed using latest autotest git (6a15572d5307fa0b).
> > >
> > > This time we have tested 3 guests.
> > >
> > > We are analysing the test results further to find out if the failure is
> > in
> > > autotest or in qemu-kvm. Will post the analysis asap and report bugs to
> > > appropriate community.
> >
> > Ok, great initiative! I have a similar job here, but we use the block
> > virtio drivers only, and qcow 2. I need to go over your results and see
> > if they match mine.
> >
> > Thanks, will keep in touch,
> >
> 
> 
> Hi Prem,
> 
> Where can we see the whole autotest log results? you can share it to a
> public HTTP server.
> It can help us to analyze the problem.
> 
> Thanks, Amos.

Amos,
As soon as I find out an appropriate public server to post the logs, will do so
and keep you posted. 

-- 
-prem

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[autotest][PATCH 1/3] virt multihost_mig: Repairs bug in starting sequence of tests.

2012-04-19 Thread Jiří Župka
Signed-off-by: Jiří Župka 
---
 client/tests/kvm/multi_host.srv |   22 ++
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/client/tests/kvm/multi_host.srv b/client/tests/kvm/multi_host.srv
index 5aafeda..e54325f 100644
--- a/client/tests/kvm/multi_host.srv
+++ b/client/tests/kvm/multi_host.srv
@@ -22,6 +22,15 @@ AUTOTEST_DIR = job.clientdir
 
 KVM_DIR = os.path.join(AUTOTEST_DIR, 'tests', 'kvm')
 
+CONTROL_MAIN_PART = """
+testname = "kvm"
+bindir = os.path.join(job.testdir, testname)
+job.install_pkg(testname, 'test', bindir)
+
+kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests', 'kvm')
+sys.path.append(kvm_test_dir)
+"""
+
 try:
 import autotest.common
 except ImportError:
@@ -67,17 +76,10 @@ def run(machines):
 ips = []
 for machine in machines:
 host = _hosts[machine]
-host.control = """
-testname = "kvm"
-bindir = os.path.join(job.testdir, testname)
-job.install_pkg(testname, 'test', bindir)
-
-kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests', 'kvm')
-sys.path.append(kvm_test_dir)
-"""
 ips.append(host.host.ip)
 
 for params in test_dicts:
+
 params['hosts'] = ips
 
 params['not_preprocess'] = "yes"
@@ -108,6 +110,10 @@ sys.path.append(kvm_test_dir)
 
 for machine in machines:
 host = _hosts[machine]
+host.control = CONTROL_MAIN_PART
+
+for machine in machines:
+host = _hosts[machine]
 host.control += ("job.run_test('kvm', tag='%s', params=%s)" %
  (host.params['shortname'], host.params))
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[autotest][PATCH 2/3] virt: Adds migration over fd for kvm.

2012-04-19 Thread Jiří Župka
Migration over fd:
  source:
    1) Make a new descriptor (pipe, socket) and send the descriptor
       to the qemu monitor over a unix socket.
    2) Register the descriptor in the qemu monitor with
           getfd DSC_NAME
       and close the descriptor in the main process.
    3) Migrate over the descriptor:
           migrate fd:DSC_NAME

  destination:
    1) Start the child process with the second side of the source descriptor open.
    2) Start the machine with the parameter -incoming fd:descriptor.
    3) Wait for the migration to finish.
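
A minimal C sketch of step 1) on the source side -- handing a descriptor to
QEMU over the monitor's unix socket with SCM_RIGHTS -- is shown below. This
is only an illustration of what the monitor helper does underneath; error
handling is omitted:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send one file descriptor over a connected unix-domain socket.
     * Afterwards "getfd NAME" and "migrate fd:NAME" can refer to it. */
    static ssize_t send_fd_over_socket(int sock, int fd)
    {
        struct msghdr msg = { 0 };
        struct iovec iov;
        char dummy = 'x';                       /* at least one byte of payload */
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;               /* ensures proper alignment */
        } ctrl;
        struct cmsghdr *cmsg;

        iov.iov_base = &dummy;
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0);
    }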

Signed-off-by: Jiří Župka 
---
 client/virt/kvm_vm.py   |   47 +++---
 client/virt/virt_env_process.py |3 +-
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py
index 10aafbb..9857998 100644
--- a/client/virt/kvm_vm.py
+++ b/client/virt/kvm_vm.py
@@ -15,7 +15,7 @@ class VM(virt_vm.BaseVM):
 This class handles all basic VM operations.
 """
 
-MIGRATION_PROTOS = ['tcp', 'unix', 'exec']
+MIGRATION_PROTOS = ['tcp', 'unix', 'exec', 'fd']
 
 #
 # By default we inherit all timeouts from the base VM class
@@ -971,7 +971,8 @@ class VM(virt_vm.BaseVM):
 @error.context_aware
 def create(self, name=None, params=None, root_dir=None,
timeout=CREATE_TIMEOUT, migration_mode=None,
-   migration_exec_cmd=None, mac_source=None):
+   migration_exec_cmd=None, migration_fd=None,
+   mac_source=None):
 """
 Start the VM by running a qemu command.
 All parameters are optional. If name, params or root_dir are not
@@ -985,6 +986,7 @@ class VM(virt_vm.BaseVM):
 @param migration_exec_cmd: Command to embed in '-incoming "exec: ..."'
 (e.g. 'gzip -c -d filename') if migration_mode is 'exec'
 default to listening on a random TCP port
+@param migration_fd: Open file descriptor from which the machine should migrate.
 @param mac_source: A VM object from which to copy MAC addresses. If not
 specified, new addresses will be generated.
 
@@ -1183,6 +1185,8 @@ class VM(virt_vm.BaseVM):
 else:
 qemu_command += (' -incoming "exec:%s"' %
  migration_exec_cmd)
+elif migration_mode == "fd":
+qemu_command += ' -incoming "fd:%d"' % (migration_fd)
 
 p9_fs_driver = params.get("9p_fs_driver")
 if p9_fs_driver == "proxy":
@@ -1728,10 +1732,25 @@ class VM(virt_vm.BaseVM):
 
 
 @error.context_aware
+def send_fd(self, fd, fd_name="migfd"):
+"""
+Send file descriptor over unix socket to VM.
+
+@param fd: File descriptor.
+@param fd_name: File descriptor identifier in the VM.
+"""
+error.context("Send fd %d like %s to VM %s" % (fd, fd_name, self.name))
+
+logging.debug("Send file descriptor %s to source VM." % fd_name)
+self.monitor.cmd("getfd %s" % (fd_name), fd=fd)
+error.context()
+
+
+@error.context_aware
 def migrate(self, timeout=MIGRATE_TIMEOUT, protocol="tcp",
 cancel_delay=None, offline=False, stable_check=False,
 clean=True, save_path="/tmp", dest_host="localhost",
-remote_port=None):
+remote_port=None, fd_src=None, fd_dst=None):
 """
 Migrate the VM.
 
@@ -1752,6 +1771,10 @@ class VM(virt_vm.BaseVM):
 @save_path: The path for state files.
 @param dest_host: Destination host (defaults to 'localhost').
 @param remote_port: Port to use for remote migration.
+@param fd_src: File descriptor for migration to which the source
+ VM writes data. The descriptor is closed during the migration.
+@param fd_dst: File descriptor for migration from which the destination
+ VM reads data.
 """
 if protocol not in self.MIGRATION_PROTOS:
 raise virt_vm.VMMigrateProtoUnsupportedError
@@ -1795,6 +1818,16 @@ class VM(virt_vm.BaseVM):
 "for migration to finish")
 
 local = dest_host == "localhost"
+mig_fd_name = None
+
+if protocol == "fd":
+#Check if descriptors aren't None for local migration.
+if local and (fd_dst is None or fd_src is None):
+(fd_dst, fd_src) = os.pipe()
+
+mig_fd_name = "migfd_%d_%d" % (fd_src, time.time())
+self.send_fd(fd_src, mig_fd_name)
+os.close(fd_src)
 
 clone = self.clone()
 if local:
@@ -1803,7 +1836,10 @@ class VM(virt_vm.BaseVM):
 # Pause the dest vm after creation
 extra_params = clone.params.get("extra_params", "") + " -S"
 clone.params["extra_params"] = extra_params
-clone.create(migration_mode=protocol, mac_source=self)
+clone.create(migration_mode=protocol, mac_sour

[autotest][PATCH 3/3] kvm-test: Add support for multihost migration over file descriptor.

2012-04-19 Thread Jiří Župka
The test creates a socket, gets a descriptor from the socket and migrates
through that descriptor.

This test allows migration of only one machine at a time.

Signed-off-by: Jiří Župka 
---
 client/tests/kvm/tests/migration_multi_host_fd.py |  124 +
 client/virt/virt_utils.py |   27 +++--
 2 files changed, 141 insertions(+), 10 deletions(-)
 create mode 100644 client/tests/kvm/tests/migration_multi_host_fd.py

diff --git a/client/tests/kvm/tests/migration_multi_host_fd.py 
b/client/tests/kvm/tests/migration_multi_host_fd.py
new file mode 100644
index 000..6f3c72b
--- /dev/null
+++ b/client/tests/kvm/tests/migration_multi_host_fd.py
@@ -0,0 +1,124 @@
+import logging, socket, time, errno, os, fcntl
+from autotest.client.virt import virt_utils
+from autotest.client.shared.syncdata import SyncData
+
+def run_migration_multi_host_fd(test, params, env):
+"""
+KVM multi-host migration over fd test:
+
+Migrate machine over socket's fd. Migration execution progress is
+described in documentation for migrate method in class MultihostMigration.
+This test allows migrating only one machine at a time.
+
+@param test: kvm test object.
+@param params: Dictionary with test parameters.
+@param env: Dictionary with the test environment.
+"""
+class TestMultihostMigrationFd(virt_utils.MultihostMigration):
+def __init__(self, test, params, env):
+super(TestMultihostMigrationFd, self).__init__(test, params, env)
+
+def migrate_vms_src(self, mig_data):
+"""
+Migrate vms source.
+
+@param mig_data: Data for migration.
+
+To change the way the machine migrates, it is necessary
+to reimplement this method.
+"""
+logging.info("Start migrating now...")
+vm = mig_data.vms[0]
+vm.migrate(dest_host=mig_data.dst,
+   protocol="fd",
+   fd_src=mig_data.params['migration_fd'])
+
+def _check_vms_source(self, mig_data):
+for vm in mig_data.vms:
+vm.wait_for_login(timeout=self.login_timeout)
+self._hosts_barrier(mig_data.hosts, mig_data.mig_id,
+'prepare_VMS', 60)
+
+def _check_vms_dest(self, mig_data):
+self._hosts_barrier(mig_data.hosts, mig_data.mig_id,
+ 'prepare_VMS', 120)
+os.close(mig_data.params['migration_fd'])
+
+def _connect_to_server(self, host, port, timeout=60):
+"""
+Connect to network server.
+"""
+endtime = time.time() + timeout
+sock = None
+while endtime > time.time():
+sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+try:
+sock.connect((host, port))
+break
+except socket.error, err:
+(code, _) = err
+if (code != errno.ECONNREFUSED):
+raise
+time.sleep(1)
+
+return sock
+
+def _create_server(self, port, timeout=60):
+"""
+Create network server.
+"""
+sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+sock.settimeout(timeout)
+sock.bind(('', port))
+sock.listen(1)
+return sock
+
+def migration_scenario(self):
+srchost = self.params.get("hosts")[0]
+dsthost = self.params.get("hosts")[1]
+mig_port = None
+
+if params.get("hostid") == self.master_id():
+mig_port = virt_utils.find_free_port(5200, 6000)
+
+sync = SyncData(self.master_id(), self.hostid,
+ self.params.get("hosts"),
+ {'src': srchost, 'dst': dsthost,
+  'port': "ports"}, self.sync_server)
+mig_port = sync.sync(mig_port, timeout=120)
+mig_port = mig_port[srchost]
+logging.debug("Migration port %d" % (mig_port))
+
+if params.get("hostid") != self.master_id():
+s = self._connect_to_server(srchost, mig_port)
+try:
+fd = s.fileno()
+logging.debug("File descrtiptor %d used for"
+  " migration." % (fd))
+
+self.migrate_wait(["vm1"], srchost, dsthost, mig_mode="fd",
+  params_append={"migration_fd": fd})
+finally:
+s.close()
+else:
+s = self._create_server(mig_port)
+try:
+conn, _ = s.accept()
+fd = conn.fileno()
+logging.debug("File descrtiptor %d use

[PATCH 0/2] Simplify RCU freeing of shadow pages

2012-04-19 Thread Avi Kivity
This patchset simplifies the freeing by RCU of mmu pages.

Xiao, I'm sure you thought of always freeing by RCU.  Why didn't you choose
this way?  It saves a couple of atomics in the fast path.

Avi Kivity (2):
  KVM: MMU: Always free shadow pages using RCU
  KVM: MMU: Recover space used by rcu_head in struct kvm_mmu_page

 arch/x86/include/asm/kvm_host.h |9 +++---
 arch/x86/kvm/mmu.c  |   58 ---
 2 files changed, 15 insertions(+), 52 deletions(-)

-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: MMU: Always free shadow pages using RCU

2012-04-19 Thread Avi Kivity
The non-RCU path costs us two atomic operations, as well as extra
code complexity.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |2 --
 arch/x86/kvm/mmu.c  |   46 ---
 2 files changed, 9 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f624ca7..b885445 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,8 +536,6 @@ struct kvm_arch {
u64 hv_guest_os_id;
u64 hv_hypercall;
 
-   atomic_t reader_counter;
-
#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
#endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 29ad6f9..c10f60b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -549,23 +549,6 @@ static u64 mmu_spte_get_lockless(u64 *sptep)
return __get_spte_lockless(sptep);
 }
 
-static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
-{
-   rcu_read_lock();
-   atomic_inc(&vcpu->kvm->arch.reader_counter);
-
-   /* Increase the counter before walking shadow page table */
-   smp_mb__after_atomic_inc();
-}
-
-static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
-{
-   /* Decrease the counter after walking shadow page table finished */
-   smp_mb__before_atomic_dec();
-   atomic_dec(&vcpu->kvm->arch.reader_counter);
-   rcu_read_unlock();
-}
-
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
  struct kmem_cache *base_cache, int min)
 {
@@ -2023,23 +2006,12 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 
kvm_flush_remote_tlbs(kvm);
 
-   if (atomic_read(&kvm->arch.reader_counter)) {
-   kvm_mmu_isolate_pages(invalid_list);
-   sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
-   list_del_init(invalid_list);
-
-   trace_kvm_mmu_delay_free_pages(sp);
-   call_rcu(&sp->rcu, free_pages_rcu);
-   return;
-   }
-
-   do {
-   sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
-   WARN_ON(!sp->role.invalid || sp->root_count);
-   kvm_mmu_isolate_page(sp);
-   kvm_mmu_free_page(sp);
-   } while (!list_empty(invalid_list));
+   kvm_mmu_isolate_pages(invalid_list);
+   sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+   list_del_init(invalid_list);
 
+   trace_kvm_mmu_delay_free_pages(sp);
+   call_rcu(&sp->rcu, free_pages_rcu);
 }
 
 /*
@@ -2976,11 +2948,11 @@ static u64 walk_shadow_page_get_mmio_spte(struct 
kvm_vcpu *vcpu, u64 addr)
struct kvm_shadow_walk_iterator iterator;
u64 spte = 0ull;
 
-   walk_shadow_page_lockless_begin(vcpu);
+   rcu_read_lock();
for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
if (!is_shadow_present_pte(spte))
break;
-   walk_shadow_page_lockless_end(vcpu);
+   rcu_read_unlock();
 
return spte;
 }
@@ -4060,14 +4032,14 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, 
u64 addr, u64 sptes[4])
u64 spte;
int nr_sptes = 0;
 
-   walk_shadow_page_lockless_begin(vcpu);
+   rcu_read_lock();
for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
sptes[iterator.level-1] = spte;
nr_sptes++;
if (!is_shadow_present_pte(spte))
break;
}
-   walk_shadow_page_lockless_end(vcpu);
+   rcu_read_unlock();
 
return nr_sptes;
 }
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: MMU: Recover space used by rcu_head in struct kvm_mmu_page

2012-04-19 Thread Avi Kivity
By overlaying the field with 'link', we reduce the structure size by
16 bytes.

Changing call_rcu() to be per-page is not strictly necessary, but it
can help RCU estimate the amount of work pending.
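
As a userspace illustration of the size argument (stub structure definitions,
not the kernel ones; on an LP64 target both list_head and rcu_head are 16
bytes, so overlaying them recovers 16 of the 32):

    #include <stdio.h>

    struct list_head { struct list_head *next, *prev; };
    struct rcu_head  { struct rcu_head *next; void (*func)(struct rcu_head *); };

    struct separate { struct list_head link; struct rcu_head rcu; };
    struct overlaid { union { struct list_head link; struct rcu_head rcu; }; };

    int main(void)
    {
        /* prints "32 16" on LP64 targets */
        printf("%zu %zu\n", sizeof(struct separate), sizeof(struct overlaid));
        return 0;
    }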

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |7 ---
 arch/x86/kvm/mmu.c  |   26 +-
 2 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b885445..ae02ff8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -208,7 +208,10 @@ union kvm_mmu_page_role {
 };
 
 struct kvm_mmu_page {
-   struct list_head link;
+   union {
+   struct list_head link;
+   struct rcu_head rcu;
+   };
struct hlist_node hash_link;
 
/*
@@ -237,8 +240,6 @@ struct kvm_mmu_page {
 #endif
 
int write_flooding_count;
-
-   struct rcu_head rcu;
 };
 
 struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c10f60b..26257d7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1338,7 +1338,6 @@ static void kvm_mmu_isolate_page(struct kvm_mmu_page *sp)
  */
 static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 {
-   list_del(&sp->link);
free_page((unsigned long)sp->spt);
kmem_cache_free(mmu_page_header_cache, sp);
 }
@@ -1980,20 +1979,12 @@ static void kvm_mmu_isolate_pages(struct list_head 
*invalid_list)
kvm_mmu_isolate_page(sp);
 }
 
-static void free_pages_rcu(struct rcu_head *head)
+static void free_page_rcu(struct rcu_head *head)
 {
-   struct kvm_mmu_page *next, *sp;
+   struct kvm_mmu_page *sp;
 
sp = container_of(head, struct kvm_mmu_page, rcu);
-   while (sp) {
-   if (!list_empty(&sp->link))
-   next = list_first_entry(&sp->link,
- struct kvm_mmu_page, link);
-   else
-   next = NULL;
-   kvm_mmu_free_page(sp);
-   sp = next;
-   }
+   kvm_mmu_free_page(sp);
 }
 
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
@@ -2007,11 +1998,12 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
kvm_flush_remote_tlbs(kvm);
 
kvm_mmu_isolate_pages(invalid_list);
-   sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
-   list_del_init(invalid_list);
-
-   trace_kvm_mmu_delay_free_pages(sp);
-   call_rcu(&sp->rcu, free_pages_rcu);
+   while (!list_empty(invalid_list)) {
+   sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+   list_del(&sp->link);
+   trace_kvm_mmu_delay_free_pages(sp);
+   call_rcu(&sp->rcu, free_page_rcu);
+   }
 }
 
 /*
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] virt multihost_mig: Repairs bug in starting sequence of tests.

2012-04-19 Thread Jiří Župka
Signed-off-by: Jiří Župka 
---
 client/tests/kvm/multi_host.srv |   22 ++
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/client/tests/kvm/multi_host.srv b/client/tests/kvm/multi_host.srv
index 5aafeda..e54325f 100644
--- a/client/tests/kvm/multi_host.srv
+++ b/client/tests/kvm/multi_host.srv
@@ -22,6 +22,15 @@ AUTOTEST_DIR = job.clientdir
 
 KVM_DIR = os.path.join(AUTOTEST_DIR, 'tests', 'kvm')
 
+CONTROL_MAIN_PART = """
+testname = "kvm"
+bindir = os.path.join(job.testdir, testname)
+job.install_pkg(testname, 'test', bindir)
+
+kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests', 'kvm')
+sys.path.append(kvm_test_dir)
+"""
+
 try:
 import autotest.common
 except ImportError:
@@ -67,17 +76,10 @@ def run(machines):
 ips = []
 for machine in machines:
 host = _hosts[machine]
-host.control = """
-testname = "kvm"
-bindir = os.path.join(job.testdir, testname)
-job.install_pkg(testname, 'test', bindir)
-
-kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests', 'kvm')
-sys.path.append(kvm_test_dir)
-"""
 ips.append(host.host.ip)
 
 for params in test_dicts:
+
 params['hosts'] = ips
 
 params['not_preprocess'] = "yes"
@@ -108,6 +110,10 @@ sys.path.append(kvm_test_dir)
 
 for machine in machines:
 host = _hosts[machine]
+host.control = CONTROL_MAIN_PART
+
+for machine in machines:
+host = _hosts[machine]
 host.control += ("job.run_test('kvm', tag='%s', params=%s)" %
  (host.params['shortname'], host.params))
 
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] kvm-test: Add support for multihost migration over file descriptor.

2012-04-19 Thread Jiří Župka
Test create socket get descriptor from socket and migrate through
the descriptor.

This test allow migration only of one machine at once.

Signed-off-by: Jiří Župka 
---
 client/tests/kvm/tests/migration_multi_host_fd.py |  124 +
 client/virt/virt_utils.py |   27 +++--
 2 files changed, 141 insertions(+), 10 deletions(-)
 create mode 100644 client/tests/kvm/tests/migration_multi_host_fd.py

diff --git a/client/tests/kvm/tests/migration_multi_host_fd.py 
b/client/tests/kvm/tests/migration_multi_host_fd.py
new file mode 100644
index 000..6f3c72b
--- /dev/null
+++ b/client/tests/kvm/tests/migration_multi_host_fd.py
@@ -0,0 +1,124 @@
+import logging, socket, time, errno, os, fcntl
+from autotest.client.virt import virt_utils
+from autotest.client.shared.syncdata import SyncData
+
+def run_migration_multi_host_fd(test, params, env):
+"""
+KVM multi-host migration over fd test:
+
+Migrate machine over socket's fd. Migration execution progress is
+described in documentation for migrate method in class MultihostMigration.
+This test allows migrate only one machine at once.
+
+@param test: kvm test object.
+@param params: Dictionary with test parameters.
+@param env: Dictionary with the test environment.
+"""
+class TestMultihostMigrationFd(virt_utils.MultihostMigration):
+def __init__(self, test, params, env):
+super(TestMultihostMigrationFd, self).__init__(test, params, env)
+
+def migrate_vms_src(self, mig_data):
+"""
+Migrate vms source.
+
+@param mig_Data: Data for migration.
+
+For change way how machine migrates is necessary
+re implement this method.
+"""
+logging.info("Start migrating now...")
+vm = mig_data.vms[0]
+vm.migrate(dest_host=mig_data.dst,
+   protocol="fd",
+   fd_src=mig_data.params['migration_fd'])
+
+def _check_vms_source(self, mig_data):
+for vm in mig_data.vms:
+vm.wait_for_login(timeout=self.login_timeout)
+self._hosts_barrier(mig_data.hosts, mig_data.mig_id,
+'prepare_VMS', 60)
+
+def _check_vms_dest(self, mig_data):
+self._hosts_barrier(mig_data.hosts, mig_data.mig_id,
+ 'prepare_VMS', 120)
+os.close(mig_data.params['migration_fd'])
+
+def _connect_to_server(self, host, port, timeout=60):
+"""
+Connect to network server.
+"""
+endtime = time.time() + timeout
+sock = None
+while endtime > time.time():
+sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+try:
+sock.connect((host, port))
+break
+except socket.error, err:
+(code, _) = err
+if (code != errno.ECONNREFUSED):
+raise
+time.sleep(1)
+
+return sock
+
+def _create_server(self, port, timeout=60):
+"""
+Create network server.
+"""
+sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+sock.settimeout(timeout)
+sock.bind(('', port))
+sock.listen(1)
+return sock
+
+def migration_scenario(self):
+srchost = self.params.get("hosts")[0]
+dsthost = self.params.get("hosts")[1]
+mig_port = None
+
+if params.get("hostid") == self.master_id():
+mig_port = virt_utils.find_free_port(5200, 6000)
+
+sync = SyncData(self.master_id(), self.hostid,
+ self.params.get("hosts"),
+ {'src': srchost, 'dst': dsthost,
+  'port': "ports"}, self.sync_server)
+mig_port = sync.sync(mig_port, timeout=120)
+mig_port = mig_port[srchost]
+logging.debug("Migration port %d" % (mig_port))
+
+if params.get("hostid") != self.master_id():
+s = self._connect_to_server(srchost, mig_port)
+try:
+fd = s.fileno()
+logging.debug("File descrtiptor %d used for"
+  " migration." % (fd))
+
+self.migrate_wait(["vm1"], srchost, dsthost, mig_mode="fd",
+  params_append={"migration_fd": fd})
+finally:
+s.close()
+else:
+s = self._create_server(mig_port)
+try:
+conn, _ = s.accept()
+fd = conn.fileno()
+logging.debug("File descrtiptor %d use

[PATCH 2/3] virt: Adds migration over fd for kvm.

2012-04-19 Thread Jiří Župka
Migration over fd:
  source:
1) Make a new descriptor (pipe, socket) and send the descriptor
   to the qemu monitor over its unix socket.
2) Register the descriptor in the qemu monitor with
  getfd DSC_NAME
   and close the descriptor in the main process.
3) Migrate over the descriptor:
  migrate fd:DSC_NAME

  destination:
1) Start the destination process with the other end of the source
   descriptor open.
2) Start the machine with -incoming fd:descriptor
3) Wait for the migration to finish. (A usage sketch follows below.)
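
As a rough usage sketch of the local case, based only on the interfaces this
patch adds (the new "fd" protocol and the fd_src/fd_dst arguments to
VM.migrate(), which internally uses VM.send_fd()); the env/vm handling is
illustrative and assumes the usual autotest test environment:

    import os

    vm = env.get_vm("vm1")

    # Local case: migrate() creates a pipe itself when no descriptors are
    # passed, registers the write end in the source monitor via "getfd",
    # and starts the destination with "-incoming fd:<read end>".
    vm.migrate(protocol="fd")

    # Explicit-descriptor case, e.g. when the fd really comes from a socket:
    fd_dst, fd_src = os.pipe()   # read end for destination, write end for source
    vm.migrate(protocol="fd", fd_src=fd_src, fd_dst=fd_dst)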

Signed-off-by: Jiří Župka 
---
 client/virt/kvm_vm.py   |   47 +++---
 client/virt/virt_env_process.py |3 +-
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py
index 10aafbb..9857998 100644
--- a/client/virt/kvm_vm.py
+++ b/client/virt/kvm_vm.py
@@ -15,7 +15,7 @@ class VM(virt_vm.BaseVM):
 This class handles all basic VM operations.
 """
 
-MIGRATION_PROTOS = ['tcp', 'unix', 'exec']
+MIGRATION_PROTOS = ['tcp', 'unix', 'exec', 'fd']
 
 #
 # By default we inherit all timeouts from the base VM class
@@ -971,7 +971,8 @@ class VM(virt_vm.BaseVM):
 @error.context_aware
 def create(self, name=None, params=None, root_dir=None,
timeout=CREATE_TIMEOUT, migration_mode=None,
-   migration_exec_cmd=None, mac_source=None):
+   migration_exec_cmd=None, migration_fd=None,
+   mac_source=None):
 """
 Start the VM by running a qemu command.
 All parameters are optional. If name, params or root_dir are not
@@ -985,6 +986,7 @@ class VM(virt_vm.BaseVM):
 @param migration_exec_cmd: Command to embed in '-incoming "exec: ..."'
 (e.g. 'gzip -c -d filename') if migration_mode is 'exec'
 default to listening on a random TCP port
+@param migration_fd: Open file descriptor from which the machine should migrate.
 @param mac_source: A VM object from which to copy MAC addresses. If not
 specified, new addresses will be generated.
 
@@ -1183,6 +1185,8 @@ class VM(virt_vm.BaseVM):
 else:
 qemu_command += (' -incoming "exec:%s"' %
  migration_exec_cmd)
+elif migration_mode == "fd":
+qemu_command += ' -incoming "fd:%d"' % (migration_fd)
 
 p9_fs_driver = params.get("9p_fs_driver")
 if p9_fs_driver == "proxy":
@@ -1728,10 +1732,25 @@ class VM(virt_vm.BaseVM):
 
 
 @error.context_aware
+def send_fd(self, fd, fd_name="migfd"):
+"""
+Send file descriptor over unix socket to VM.
+
+@param fd: File descriptor.
+@param fd_name: File descriptor identifier in the VM.
+"""
+error.context("Send fd %d like %s to VM %s" % (fd, fd_name, self.name))
+
+logging.debug("Send file descriptor %s to source VM." % fd_name)
+self.monitor.cmd("getfd %s" % (fd_name), fd=fd)
+error.context()
+
+
+@error.context_aware
 def migrate(self, timeout=MIGRATE_TIMEOUT, protocol="tcp",
 cancel_delay=None, offline=False, stable_check=False,
 clean=True, save_path="/tmp", dest_host="localhost",
-remote_port=None):
+remote_port=None, fd_src=None, fd_dst=None):
 """
 Migrate the VM.
 
@@ -1752,6 +1771,10 @@ class VM(virt_vm.BaseVM):
 @save_path: The path for state files.
 @param dest_host: Destination host (defaults to 'localhost').
 @param remote_port: Port to use for remote migration.
+@param fd_src: File descriptor to which the source VM writes
+ migration data. The descriptor is closed during the migration.
+@param fd_dst: File descriptor from which the destination VM
+ reads migration data.
 """
 if protocol not in self.MIGRATION_PROTOS:
 raise virt_vm.VMMigrateProtoUnsupportedError
@@ -1795,6 +1818,16 @@ class VM(virt_vm.BaseVM):
 "for migration to finish")
 
 local = dest_host == "localhost"
+mig_fd_name = None
+
+if protocol == "fd":
+# For a local migration, create a pipe if no descriptors were supplied.
+if local and (fd_dst is None or fd_src is None):
+(fd_dst, fd_src) = os.pipe()
+
+mig_fd_name = "migfd_%d_%d" % (fd_src, time.time())
+self.send_fd(fd_src, mig_fd_name)
+os.close(fd_src)
 
 clone = self.clone()
 if local:
@@ -1803,7 +1836,10 @@ class VM(virt_vm.BaseVM):
 # Pause the dest vm after creation
 extra_params = clone.params.get("extra_params", "") + " -S"
 clone.params["extra_params"] = extra_params
-clone.create(migration_mode=protocol, mac_source=self)
+clone.create(migration_mode=protocol, mac_sour

Re: Question about host CPU usage/allocation by KVM

2012-04-19 Thread Alexander Lyakas
Hi Stuart,
I have been doing some experiments, and I noticed that there are
additional QEMU threads besides the ones reported by the "info cpus"
command. In particular, the main QEMU thread (the one whose LWP is the
same as its PID) also consumes significant CPU time. Is this
expected?

Alex.


On Wed, Apr 18, 2012 at 8:24 PM, Stuart Yoder  wrote:
> On Tue, Apr 17, 2012 at 4:54 PM, Alexander Lyakas
>  wrote:
>> Greetings everybody,
>>
>> Can anybody please point me to code/documentation regarding the
>> following questions I have:
>>
>> - What does it actually mean using "-smp N" option, in terms of CPU
>> sharing between the host and the guest?
>> - How are guest CPUs mapped to host CPUs (if at all)?
>
> Each guest CPU (vcpu) corresponds to a QEMU thread.
> You can see the thread ids in QEMU with "info cpus" in the
> QEMU monitor.
>
> Since a vcpu is a thread you can apply standard Linux
> mechanisms to managing those threads-- CPU affinity, etc.
>
> Stuart
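
For example, a minimal sketch of pinning the vcpu threads to host CPUs,
using the thread ids shown by "info cpus" (the ids and the use of Python's
os.sched_setaffinity here are illustrative; "taskset -pc <cpu> <tid>" does the
same thing from a shell):

    import os

    # Thread ids (LWPs) as reported by "info cpus", e.g. vcpu0 -> 12345.
    vcpu_tids = {0: 12345, 1: 12346}

    # Pin each vcpu thread to its own host CPU.
    for vcpu, tid in vcpu_tids.items():
        os.sched_setaffinity(tid, {vcpu})

The main QEMU thread (TID == PID) typically runs the event loop and the device
emulation/IO work, so some CPU use there is expected; it can be pinned in the
same way if needed.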


Re: [Autotest] Migration through file descriptor

2012-04-19 Thread Lucas Meneghel Rodrigues
Applied, thanks!

On Thu, Apr 19, 2012 at 1:27 PM, Jiří Župka  wrote:
> This patch series adds
> 1) support for migration over fd.
> 2) Add multihost migration test over fd.
>
> pull-request: https://github.com/autotest/autotest/pull/312
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest



-- 
Lucas


Re: [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug

2012-04-19 Thread Vasilis Liaskovitis
Hi,

On Thu, Apr 19, 2012 at 09:49:31AM -0500, Anthony Liguori wrote:
> On 04/19/2012 09:08 AM, Vasilis Liaskovitis wrote:
> >This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
> >earlier work and comments from Gleb.
> >
> >Memslot devices are modeled with a new qemu command line
> >
> >"-memslot id=name,start=start_addr,size=sz,node=pxm"
> 
> Hi,
> 
> For 1.2, I'd really like to focus on refactoring the PC machine as
> described in this series:
> 
> https://github.com/aliguori/qemu/commits/qom-rebase.12
> 
> I'd like to represent the guest memory as a "DIMM" device.
> 
> In terms of this proposal, I would then expect that the i440fx
> device would have a num_dimms property that controlled how many
> link's it had.  Hotplug would consist of creating a DIMM at
> run time and connecting it to the appropriate link.
>
ok, makes sense.

> One thing that's not clear to me is how the start/size fits in.  On
> bare metal, is this something that's calculated by the firmware
> during start up and then populated in ACPI?   Does it do something
> like take the largest possible DIMM size that it supports and fill
> out the table?

The current series works as follows:
For each DIMM/memslot option, firmware constructs a QWordMemory ACPI object
(see ACPI spec, ASL 18.5.95). This object has AddressMinimum, AddressMaximum,
RangeLength fields. The first of these corresponds directly to the start
attribute, the third corresponds to size, and the second is derived from both.
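
As a concrete illustration of that derivation (assuming the usual fixed-range
convention where the descriptor covers [min, min + len - 1]; the helper name is
purely illustrative):

    def memslot_to_qword_memory(start, size):
        # Map a -memslot start/size pair onto the QWordMemory fields.
        address_minimum = start
        range_length = size
        address_maximum = start + size - 1   # inclusive end of the range
        return address_minimum, address_maximum, range_length

    # The example from this RFC: a 1G slot starting at 4G.
    print(memslot_to_qword_memory(4 << 30, 1 << 30))
    # -> (4294967296, 5368709119, 1073741824)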

On bare metal, I believe the firmware detects the actual DIMM devices and their
size and calculates the physical offset (AddressMinimum) for each, taking into
account a possible PCI hole. I doubt the largest possible DIMM size is used, since
a hotplug entity/event should correspond to a physical device. (Kevin or Gleb may
have a better idea of what real metal firmware usually does).

Perhaps you are suggesting having a predefined number of equally sized DIMMs as
being hotplug-able? This way no memslot/DIMM config would have to be passed by
the user at the command line for each DIMM.

In this series, the user-defined memslot options pass the desired DIMM
descriptions to SeaBIOS, which then builds the aforementioned objects. (I assume
it would be possible to pass this info with normal "-device" commands, after
proper qom-ification)

> 
> At any rate, I think we should focus on modeling this in QOM verses
> adding a new option and hacking at the existing memory init code.

agreed. I will take a look into qom-rebase.

thanks,

- Vasilis


Re: [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command

2012-04-19 Thread Vasilis Liaskovitis
Hi,

On Thu, Apr 19, 2012 at 05:22:52PM +0300, Avi Kivity wrote:
> On 04/19/2012 05:08 PM, Vasilis Liaskovitis wrote:
> >  Implement -memslot qemu-kvm command line option to define hotplug-able 
> > memory
> >  slots.
> >  Syntax: "-memslot id=name,start=addr,size=sz,node=nodeid"
> >
> >  e.g. "-memslot id=hot1,start=4294967296,size=1073741824,node=0"
> >  will define a 1G memory slot starting at physical address 4G, belonging to 
> > numa
> >  node 0. Defining no node will automatically add a memslot to node 0.
> 
> start=4G,size=1G ought to work too, no?

it should, but it didn't when I tried. Probably some silliness on my part, I
will retry.

thanks,

- Vasilis


Re: [Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest

2012-04-19 Thread Eduardo Habkost
Jan/Avi: ping?

I would like to get this ABI detail clarified so it can be implemented
the right way on Qemu and KVM.

My proposal is to simply add tsc-deadline to the data returned by
GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.


On Fri, Mar 23, 2012 at 02:17:52PM +, Liu, Jinsong wrote:
> Eduardo Habkost wrote:
> > On Fri, Mar 23, 2012 at 03:49:27AM +, Liu, Jinsong wrote:
> >> Eduardo Habkost wrote:
> >>> [1] From Documentation/virtual/kvm/api.txt:
> >>> 
> >>> "KVM_GET_SUPPORTED_CPUID
> >>> [...]
> >>> This ioctl returns x86 cpuid features which are supported by both
> >>> the hardware and kvm.  Userspace can use the information returned
> >>> by this ioctl to construct cpuid information (for KVM_SET_CPUID2)
> >>> that is consistent with hardware, kernel, and userspace
> >>>   capabilities, and with
> >>> ^^ 
> >>> user requirements (for example, the user may wish to constrain cpuid
> >>> to emulate older hardware, or for feature consistency across a
> >>> cluster)."
> >> 
> >> The fixbug patch is implemented by Jan and Avi, I reply per my
> >> understanding. 
> > 
> > No problem. I hope Jan or Avi can clarify this.
> > 
> >> 
> >> I think for tsc deadline timer feature, KVM_CAP_TSC_DEADLINE_TIMER is
> >> slightly better than KVM_GET_SUPPORTED_CPUID. If we use
> >> KVM_GET_SUPPORTED_CPUID, it means the tsc deadline feature is bound to the host
> >> cpuid, while in fact it could be purely software emulated by kvm
> >> (though currently it is implemented as bound to hardware). For the sake
> >> of
> >> extension, it chose KVM_CAP_TSC_DEADLINE_TIMER.
> > 
> > There's no requirement for GET_SUPPORTED_CPUID to be a subset of the
> > host CPU features. If KVM can completely emulate the feature by
> > software, then it can return the feature on GET_SUPPORTED_CPUID even
> > if the host CPU doesn't have the feature. That's the case for x2apic,
> > for example (see commit 0d1de2d901f4ba0972a3886496a44fb1d3300dbd).
> 
> 
> Jan/Avi,
> 
> Could you elaborate more on your thoughts?
> 
> Thanks,
> Jinsong

-- 
Eduardo


[PATCH RFC V7 0/12] Paravirtualized ticketlocks

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

This series replaces the existing paravirtualized spinlock mechanism
with a paravirtualized ticketlock mechanism. (targeted for 3.5 window)

Changes in V7:
 - Rebased patches to 3.4-rc3
 - Added jumplabel split patch (originally from Andrew Jones, rebased to
3.4-rc3)
 - jumplabel changes from Ingo and Jason taken and now using static_key_*
instead of static_branch.
 - using UNINLINE_SPIN_UNLOCK (which was split as per suggestion from Linus)
 - This patch series is rebased on the debugfs patch (that should be already in
Xen/linux-next https://lkml.org/lkml/2012/3/23/51)

Ticket locks have an inherent problem in a virtualized case, because
the vCPUs are scheduled rather than running concurrently (ignoring
gang scheduled vCPUs).  This can result in catastrophic performance
collapses when the vCPU scheduler doesn't schedule the correct "next"
vCPU, and ends up scheduling a vCPU which burns its entire timeslice
spinning.  (Note that this is not the same problem as lock-holder
preemption, which this series also addresses; that's also a problem,
but not catastrophic).

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

Currently we deal with this by having PV spinlocks, which adds a layer
of indirection in front of all the spinlock functions, and defining a
completely new implementation for Xen (and for other pvops users, but
there are none at present).

PV ticketlocks keep the existing ticketlock implementation
(fastpath) as-is, but add a couple of pvops for the slow paths:

- If a CPU has been waiting for a spinlock for SPIN_THRESHOLD
  iterations, then call out to the __ticket_lock_spinning() pvop,
  which allows a backend to block the vCPU rather than spinning.  This
  pvop can set the lock into "slowpath state".

- When releasing a lock, if it is in "slowpath state", the call
  __ticket_unlock_kick() to kick the next vCPU in line awake.  If the
  lock is no longer in contention, it also clears the slowpath flag.

The "slowpath state" is stored in the LSB of the within the lock tail
ticket.  This has the effect of reducing the max number of CPUs by
half (so, a "small ticket" can deal with 128 CPUs, and "large ticket"
32768).

This series provides a Xen implementation, KVM implementation will be
posted in next 2-3 days.

Overall, it results in a large reduction in code, it makes the native
and virtualized cases closer, and it removes a layer of indirection
around all the spinlock functions.

The fast path (taking an uncontended lock which isn't in "slowpath"
state) is optimal, identical to the non-paravirtualized case.

The inner part of ticket lock code becomes:
inc = xadd(&lock->tickets, inc);
inc.tail &= ~TICKET_SLOWPATH_FLAG;

if (likely(inc.head == inc.tail))
goto out;
for (;;) {
unsigned count = SPIN_THRESHOLD;
do {
if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
goto out;
cpu_relax();
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
out:barrier();
which results in:
push   %rbp
mov%rsp,%rbp

mov$0x200,%eax
lock xadd %ax,(%rdi)
movzbl %ah,%edx
cmp%al,%dl
jne1f   # Slowpath if lock in contention

pop%rbp
retq   

### SLOWPATH START
1:  and$-2,%edx
movzbl %dl,%esi

2:  mov$0x800,%eax
jmp4f

3:  pause  
sub$0x1,%eax
je 5f

4:  movzbl (%rdi),%ecx
cmp%cl,%dl
jne3b

pop%rbp
retq   

5:  callq  *__ticket_lock_spinning
jmp2b
### SLOWPATH END

with CONFIG_PARAVIRT_SPINLOCKS=n, the code has changed slightly, where
the fastpath case is straight through (taking the lock without
contention), and the spin loop is out of line:

push   %rbp
mov%rsp,%rbp

mov$0x100,%eax
lock xadd %ax,(%rdi)
movzbl %ah,%edx
cmp%al,%dl
jne1f

pop%rbp
retq   

### SLOWPATH START
1:  pause  
movzbl (%rdi),%eax
cmp%dl,%al
jne1b

pop%rbp
retq   
### SLOWPATH END

The unlock code is complicated by the need to both add to the lock's
"head" and fetch the slowpath flag from "tail".  This version of the
patch uses a locked add to do this, followed by a test to see if the
slowflag is set.  The lock prefix acts as a full memory barrier, so we
can be sure that other CPUs will have seen the unlock before we read
the flag (without the barrier the read could be fetched from the
store queue before it hits memory, which could result in a deadlock).

This is all unnecessary complication if you're not using PV ticket
locks, 

[PATCH RFC V7 1/12] x86/spinlock: replace pv spinlocks with pv ticketlocks

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.

Signed-off-by: Jeremy Fitzhardinge 
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao  
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/paravirt.h   |   32 
 arch/x86/include/asm/paravirt_types.h |   10 ++
 arch/x86/include/asm/spinlock.h   |   53 ++--
 arch/x86/include/asm/spinlock_types.h |4 --
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +
 arch/x86/xen/spinlock.c   |8 -
 6 files changed, 61 insertions(+), 61 deletions(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index aa0f913..4bcd146 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -751,36 +751,16 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 8e8b9a4..005e24d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -327,13 +327,11 @@ struct pv_mmu_ops {
 };
 
 struct arch_spinlock;
+#include 
+
 struct pv_lock_ops {
-   int (*spin_is_locked)(struct arch_spinlock *lock);
-   int (*spin_is_contended)(struct arch_spinlock *lock);
-   void (*spin_lock)(struct arch_spinlock *lock);
-   void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long 
flags);
-   int (*spin_trylock)(struct arch_spinlock *lock);
-   void (*spin_unlock)(struct arch_spinlock *lock);
+   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 76bfa2c..3e47608 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -37,6 +37,35 @@
 # define UNLOCK_LOCK_PREFIX
 #endif
 
+/* How long a lock should spin before we consider blocking */
+#define SPIN_THRESHOLD (1 << 11)
+
+#ifndef CONFIG

[PATCH RFC V7 2/12] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks

2012-04-19 Thread Raghavendra K T
From: Raghavendra K T  

The code size expands somewhat, and it's better to just call
a function rather than inline it.

Thanks to Jeremy for the original version of the ARCH_NOINLINE_SPIN_UNLOCK config patch,
which has been simplified here.

Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
 arch/x86/Kconfig |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d14cc6..35eb2e4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -597,6 +597,7 @@ config PARAVIRT
 config PARAVIRT_SPINLOCKS
bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT && SMP && EXPERIMENTAL
+   select UNINLINE_SPIN_UNLOCK
---help---
  Paravirtualized spinlocks allow a pvops backend to replace the
  spinlock implementation with something virtualization-friendly



[PATCH RFC V7 3/12] x86/ticketlock: collapse a layer of functions

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge 
Tested-by: Attilio Rao  
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 files changed, 5 insertions(+), 30 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3e47608..ee4bbd4 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -79,7 +79,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc = { .tail = 1 };
 
@@ -99,7 +99,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -113,7 +113,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
 
@@ -121,46 +121,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {



[PATCH RFC V7 4/12] xen: defer spinlock setup until boot CPU setup

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
---
 arch/x86/xen/smp.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 5fac691..9ac931b 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -207,6 +207,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
+   xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -536,7 +537,6 @@ void __init xen_smp_init(void)
 {
smp_ops = xen_smp_ops;
xen_fill_possible_map();
-   xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)



[PATCH RFC V7 5/12] xen/pvticketlock: Xen implementation for PV ticket locks

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.

Raghu: use function + enum instead of macro, cmpxchg for zero status reset

Signed-off-by: Jeremy Fitzhardinge 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
---
 arch/x86/xen/spinlock.c |  344 +++
 1 files changed, 77 insertions(+), 267 deletions(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index f1f4540..982e64b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -16,45 +16,46 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
+enum xen_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   TAKEN_SLOW_SPURIOUS,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
+
+
 #ifdef CONFIG_XEN_DEBUG_FS
 static struct xen_spinlock_stats
 {
-   u64 taken;
-   u32 taken_slow;
-   u32 taken_slow_nested;
-   u32 taken_slow_pickup;
-   u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
-
-   u64 released;
-   u32 released_slow;
-   u32 released_slow_kicked;
+   u32 contention_stats[NR_CONTENTION_STATS];
 
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
 
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-   if (unlikely(zero_stats)) {
-   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-   zero_stats = 0;
+   u8 ret;
+   u8 old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
 }
 
-#define ADD_STATS(elem, val)   \
-   do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -73,22 +74,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -98,19 +83,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT(1 << 10)
-#define ADD_STATS(elem, val)   do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -133,230 +114,83 @@ typedef u16 xen_spinners_t;
asm(LOCK_PREFIX " decw %0" : "+m" ((xl)->spinners) : : "memory");
 #endif
 
-struct xen_spinlock {
-   unsigned char lock; /* 0 -> free; 1 -> locked */
-   xen_spinners_t spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
+static cpumask_t waiting_cpus;
 
-#if 0
-static int xen_spin_is_locked(struct arch_spinlock *lock)
+static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
-   struct xen_spinlock *x

[PATCH RFC V7 6/12] xen/pvticketlocks: add xen_nopvspin parameter to disable xen pv ticketlocks

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

Signed-off-by: Jeremy Fitzhardinge 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
---
 arch/x86/xen/spinlock.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 982e64b..c9bf890 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -223,12 +223,26 @@ void xen_uninit_lock_cpu(int cpu)
unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
+   if (!xen_pvspin) {
+   printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+   return;
+   }
+
pv_lock_ops.lock_spinning = xen_lock_spinning;
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+   xen_pvspin = false;
+   return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;



[PATCH RFC V7 7/12] x86/pvticketlock: use callee-save for lock_spinning

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge 
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao  
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 4bcd146..9769096 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -754,7 +754,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 005e24d..5e0c138 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -330,7 +330,7 @@ struct arch_spinlock;
 #include 
 
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index c9bf890..b0cdde1 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -173,6 +173,7 @@ out:
local_irq_restore(flags);
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -232,7 +233,7 @@ void __init xen_init_spinlocks(void)
return;
}
 
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 



[PATCH RFC V7 8/12] x86/pvticketlock: when paravirtualizing ticket locks, increment by 2

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.

Signed-off-by: Jeremy Fitzhardinge 
Tested-by: Attilio Rao  
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h   |   10 +-
 arch/x86/include/asm/spinlock_types.h |   10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index ee4bbd4..60b7e83 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -81,7 +81,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
inc = xadd(&lock->tickets, inc);
 
@@ -107,7 +107,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
@@ -115,9 +115,9 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-   __add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+   __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
__ticket_unlock_kick(lock, next);
 }
 
@@ -132,7 +132,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (__ticket_t)(tmp.tail - tmp.head) > 1;
+   return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include 
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {



[PATCH RFC V7 10/12] x86/ticketlock: add slowpath logic

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

UnlockerLocker
test for lock pickup
-> fail
unlock
test slowpath
-> false
set slowpath flags
block

Whereas this works in any ordering:

UnlockerLocker
set slowpath flags
test for lock pickup
-> fail
block
unlock
test slowpath
-> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.
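
As a toy model of the bookkeeping described above (single-threaded Python,
no atomicity, purely to illustrate the state transitions; this is not the
kernel code):

    TICKET_LOCK_INC = 2          # tickets advance by 2 (see patch 8/12)
    TICKET_SLOWPATH_FLAG = 1     # stored in the LSB of the tail

    class TicketLockModel(object):
        def __init__(self):
            self.head = 0        # ticket currently being served
            self.tail = 0        # next ticket to hand out, plus the flag bit

        def take_ticket(self):
            # Fast-path xadd: grab a ticket and advance the tail.
            ticket = self.tail & ~TICKET_SLOWPATH_FLAG
            self.tail += TICKET_LOCK_INC
            return ticket

        def enter_slowpath(self):
            # The first waiter that gives up spinning marks the lock.
            self.tail |= TICKET_SLOWPATH_FLAG

        def unlock(self):
            # Locked add on head in the real code; the flag is read afterwards.
            self.head += TICKET_LOCK_INC
            if self.tail & TICKET_SLOWPATH_FLAG:
                if self.head == (self.tail & ~TICKET_SLOWPATH_FLAG):
                    # Nobody is waiting any more: clear the stale flag.
                    self.tail &= ~TICKET_SLOWPATH_FLAG
                else:
                    # Someone is still queued: kick the holder of ticket
                    # self.head (__ticket_unlock_kick in the real code).
                    pass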

Signed-off-by: Jeremy Fitzhardinge 
Signed-off-by: Srivatsa Vaddagiri 
Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Stephan Diestelhorst 
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/spinlock.h   |   86 +++-
 arch/x86/include/asm/spinlock_types.h |2 +
 arch/x86/kernel/paravirt-spinlocks.c  |3 +
 arch/x86/xen/spinlock.c   |6 ++
 5 files changed, 74 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 9769096..af49670 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -757,7 +757,7 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 60b7e83..e6881fd 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -40,32 +43,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1 << 11)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-   __ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
-__ticket_t ticket)
+#else  /* !CONFIG_PARAVIRT_SPINLOCKS */
+static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
+   __ticket_t ticket)
 {
 }
-
-#endif /* CONFIG_PARAVIRT_SPINLOCKS */
-
-
-/*
- * If a spinlock has someone waiting on it, then kick the appropriate
- * wait

[PATCH RFC V7 11/12] xen/pvticketlock: allow interrupts to be enabled while blocking

2012-04-19 Thread Raghavendra K T
From: Jeremy Fitzhardinge 

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
---
 arch/x86/xen/spinlock.c |   46 --
 1 files changed, 40 insertions(+), 6 deletions(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index e2f312f..d2ab57a 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the "lock" pointer
+* may only be set non-NULL if the "want" ticket is correct.
+* If we're updating "want", we must first clear "lock".
+*/
+   w->lock = NULL;
+   smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock->tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
for_each_cpu(cpu, &waiting_cpus) {
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-   if (w->lock == lock && w->want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;



[PATCH RFC V7 12/12] xen: enable PV ticketlocks on HVM Xen

2012-04-19 Thread Raghavendra K T
From: Stefano Stabellini 

Signed-off-by: Jeremy Fitzhardinge 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
---
 arch/x86/xen/smp.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 9ac931b..2192f76 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -574,4 +574,5 @@ void __init xen_hvm_smp_init(void)
smp_ops.cpu_die = xen_hvm_cpu_die;
smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
smp_ops.send_call_func_single_ipi = 
xen_smp_send_call_function_single_ipi;
+   xen_init_spinlocks();
 }



[PATCH RFC V7 9/12] split out rate limiting from jump_label.h

2012-04-19 Thread Raghavendra K T
From: Andrew Jones 

Commit b202952075f62603bea9bfb6ebc6b0420db11949 introduced rate limiting
for jump label disabling. The changes were made in the jump label code
in order to be more widely available and to keep things tidier. This is
all fine, except now jump_label.h includes linux/workqueue.h, which
makes it impossible to include jump_label.h from anything that
workqueue.h needs. For example, it's now impossible to include
jump_label.h from asm/spinlock.h, which is done in proposed
pv-ticketlock patches. This patch splits out the rate limiting related
changes from jump_label.h into a new file, jump_label_ratelimit.h, to
resolve the issue.

Signed-off-by: Andrew Jones 
Signed-off-by: Raghavendra K T 
---
 include/linux/jump_label.h   |   26 +-
 include/linux/jump_label_ratelimit.h |   34 ++
 include/linux/perf_event.h   |1 +
 kernel/jump_label.c  |1 +
 4 files changed, 37 insertions(+), 25 deletions(-)
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index c513a40..8195227 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -49,7 +49,6 @@
 
 #include 
 #include 
-#include 
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 
@@ -62,12 +61,6 @@ struct static_key {
 #endif
 };
 
-struct static_key_deferred {
-   struct static_key key;
-   unsigned long timeout;
-   struct delayed_work work;
-};
-
 # include 
 # define HAVE_JUMP_LABEL
 #endif /* CC_HAVE_ASM_GOTO && CONFIG_JUMP_LABEL */
@@ -126,10 +119,7 @@ extern void arch_jump_label_transform_static(struct 
jump_entry *entry,
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
 extern void jump_label_apply_nops(struct module *mod);
-extern void
-jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
 
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
@@ -148,10 +138,6 @@ static __always_inline void jump_label_init(void)
 {
 }
 
-struct static_key_deferred {
-   struct static_key  key;
-};
-
 static __always_inline bool static_key_false(struct static_key *key)
 {
if (unlikely(atomic_read(&key->enabled)) > 0)
@@ -184,11 +170,6 @@ static inline void static_key_slow_dec(struct static_key 
*key)
atomic_dec(&key->enabled);
 }
 
-static inline void static_key_slow_dec_deferred(struct static_key_deferred 
*key)
-{
-   static_key_slow_dec(&key->key);
-}
-
 static inline int jump_label_text_reserved(void *start, void *end)
 {
return 0;
@@ -202,12 +183,6 @@ static inline int jump_label_apply_nops(struct module *mod)
return 0;
 }
 
-static inline void
-jump_label_rate_limit(struct static_key_deferred *key,
-   unsigned long rl)
-{
-}
-
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1) })
 #define STATIC_KEY_INIT_FALSE ((struct static_key) \
@@ -218,6 +193,7 @@ jump_label_rate_limit(struct static_key_deferred *key,
 #define STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
 #define jump_label_enabled static_key_enabled
 
+static inline int atomic_read(const atomic_t *v);
 static inline bool static_key_enabled(struct static_key *key)
 {
return (atomic_read(&key->enabled) > 0);
diff --git a/include/linux/jump_label_ratelimit.h 
b/include/linux/jump_label_ratelimit.h
new file mode 100644
index 000..1137883
--- /dev/null
+++ b/include/linux/jump_label_ratelimit.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_JUMP_LABEL_RATELIMIT_H
+#define _LINUX_JUMP_LABEL_RATELIMIT_H
+
+#include 
+#include 
+
+#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+struct static_key_deferred {
+   struct static_key key;
+   unsigned long timeout;
+   struct delayed_work work;
+};
+#endif
+
+#ifdef HAVE_JUMP_LABEL
+extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
+extern void
+jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
+
+#else  /* !HAVE_JUMP_LABEL */
+struct static_key_deferred {
+   struct static_key  key;
+};
+static inline void static_key_slow_dec_deferred(struct static_key_deferred 
*key)
+{
+   static_key_slow_dec(&key->key);
+}
+static inline void
+jump_label_rate_limit(struct static_key_deferred *key,
+   unsigned long rl)
+{
+}
+#endif /* HAVE_JUMP_LABEL */
+#endif /* _LINUX_JUMP_LABEL_RATELIMIT_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ddbb6a9..a0e6118 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -605,6 +605,7 @@ struct perf_guest_info_callbacks {
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 4304919..e

Re: vhost-blk development

2012-04-19 Thread Michael Baysek
Hi Yuan, 

Can you point me to the latest revision of the code and provide some 
guidance on how to test it?  I really would love to see if it helps.

Best,
-Mike


- Original Message -
From: "Liu Yuan" 
To: "Michael Baysek" 
Cc: "Stefan Hajnoczi" , kvm@vger.kernel.org
Sent: Thursday, April 12, 2012 10:38:39 PM
Subject: Re: vhost-blk development

On 04/12/2012 12:52 AM, Michael Baysek wrote:

> In this particular case, I did intend to deploy these instances directly to 
> the ramdisk.  I want to squeeze every drop of performance out of these 
> instances for use cases with lots of concurrent accesses.   I thought it 
> would be possible to achieve improvements an order of magnitude or more 
> over SSD, but it seems not to be the case (so far).  


Last year I tried virtio-blk over vhost, which was originally planned to put
the virtio-blk driver into the kernel to reduce system call overhead and shorten
the code path.

I think that in your particular case (ramdisk), virtio-blk over vhost will see the
biggest performance improvement, because the biggest time hog, the actual disk IO,
is already out of the path; at least it should do much better than my last test
numbers (+15% for throughput and -10% for latency), which were run on a local disk.

But unfortunately, it was not considered useful enough at
that time; the QEMU folks thought it better to optimize the IO stack in
QEMU instead of setting up another code path for it.

I remember that I developed it on a Linux 3.0 base, so I think it
is not hard to rebase it on the latest kernel code or to port it back to
RHEL 6's modified 2.6.32 kernel.

Thanks,
Yuan


Re: [PATCH] KVM: Fix page-crossing MMIO

2012-04-19 Thread Marcelo Tosatti
On Wed, Apr 18, 2012 at 07:22:47PM +0300, Avi Kivity wrote:
> MMIO that are split across a page boundary are currently broken - the
> code does not expect to be aborted by the exit to userspace for the
> first MMIO fragment.
> 
> This patch fixes the problem by generalizing the current code for handling
> 16-byte MMIOs to handle a number of "fragments", and changes the MMIO
> code to create those fragments.
> 
> Signed-off-by: Avi Kivity 

Applied, thanks.



Re: [PATCH 3.4] KVM: ia64: fix build due to typo

2012-04-19 Thread Marcelo Tosatti
On Wed, Apr 18, 2012 at 07:23:50PM +0300, Avi Kivity wrote:
> s/kcm/kvm/.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/ia64/kvm/kvm-ia64.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.



Re: [PATCH 0/2] Simplify RCU freeing of shadow pages

2012-04-19 Thread Xiao Guangrong
On 04/20/2012 12:26 AM, Avi Kivity wrote:

> This patchset simplifies the freeing by RCU of mmu pages.
> 
> Xiao, I'm sure you thought of always freeing by RCU.  Why didn't you choose
> this way?  It saves a couple of atomics in the fast path.
> 


Avi, we have discussed it last year:

https://lkml.org/lkml/2011/6/29/177

I have optimized/simplified the "write flood" case a lot, but, unfortunately,
zapping sps still happens frequently.

Maybe we can cache the zapped sp in an invalid_sp_list to reduce the
frequency.

Or, we may use SLAB_DESTROY_BY_RCU to free the shadow page.



Re: vhost-blk development

2012-04-19 Thread Liu Yuan
On 04/20/2012 04:26 AM, Michael Baysek wrote:

> Can you point me to the latest revision of the code and provide some 
> guidance on how to test it?  I really would love to see if it helps.


There is no later revision; I didn't continue the development once I
saw the signs that it wouldn't be accepted.

Thanks,
Yuan