Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Orit Wasserman
On 03/07/2012 12:47 AM, Amos Kong wrote:
> Introduce tcp_server_start() by moving original code in
> tcp_start_incoming_migration().
> 
> Signed-off-by: Amos Kong 
> ---
>  net.c |   28 
>  qemu_socket.h |2 ++
>  2 files changed, 30 insertions(+), 0 deletions(-)
> 
> diff --git a/net.c b/net.c
> index c34474f..e90ff23 100644
> --- a/net.c
> +++ b/net.c
> @@ -99,6 +99,34 @@ static int get_str_sep(char *buf, int buf_size, const char 
> **pp, int sep)
>  return 0;
>  }
>  
> +int tcp_server_start(const char *str, int *fd)
> +{
> +int val, ret;
> +struct sockaddr_in saddr;
> +
> +if (parse_host_port(&saddr, str) < 0) {
> +error_report("invalid host/port combination: %s", str);
> +return -EINVAL;
> +}
> +
> +*fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> +if (fd < 0) {
> +perror("socket");

> +return -1;

return -socket_error();

> +}
> +socket_set_nonblock(*fd);

In incoming migration we don't use non-blocking socket.
How about a flag to the function for non-blocking ?

> +
> +/* allow fast reuse */
> +val = 1;
> +setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, 
> sizeof(val));
> +
> +ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> +if (ret < 0) {
> +closesocket(*fd);
> +}
> +return ret;

if (bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr)) < 0)
{   
closesocket(*fd);
return -socket_error();
}
return 0;

and than you will not need ret

> +}
> +
>  int parse_host_port(struct sockaddr_in *saddr, const char *str)
>  {
>  char buf[512];
> diff --git a/qemu_socket.h b/qemu_socket.h
> index fe4cf6c..d612793 100644
> --- a/qemu_socket.h
> +++ b/qemu_socket.h
> @@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen);
>  int unix_connect_opts(QemuOpts *opts);
>  int unix_connect(const char *path);
>  
> +int tcp_server_start(const char *str, int *fd);
> +
>  /* Old, ipv4 only bits.  Don't use for new code. */
>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
>  int socket_init(void);
> 
> 

Orit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Paolo Bonzini
Il 14/03/2012 08:14, Orit Wasserman ha scritto:
> if (bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr)) < 0)
> { 
> closesocket(*fd);
> return -socket_error();
> }
> return 0;
> 
> and than you will not need ret

But closesocket could clobber socket_error(), no?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 3/9] net: introduce tcp_client_start()

2012-03-14 Thread Orit Wasserman
On 03/07/2012 12:48 AM, Amos Kong wrote:
> Introduce tcp_client_start() by moving original code in
> tcp_start_outgoing_migration().
> 
> Signed-off-by: Amos Kong 
> ---
>  net.c |   41 +
>  qemu_socket.h |1 +
>  2 files changed, 42 insertions(+), 0 deletions(-)
> 
> diff --git a/net.c b/net.c
> index e90ff23..9afb0d1 100644
> --- a/net.c
> +++ b/net.c
> @@ -127,6 +127,47 @@ int tcp_server_start(const char *str, int *fd)
>  return ret;
>  }
>  
> +int tcp_client_start(const char *str, int *fd)
> +{
> +struct sockaddr_in saddr;
> +int ret;
> +
> +*fd = -1;
> +if (parse_host_port(&saddr, str) < 0) {
> +error_report("invalid host/port combination: %s", str);
> +return -EINVAL;
> +}
> +
> +*fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> +if (fd < 0) {
> +perror("socket");
> +return -1;

return -socket_error();

> +}
> +socket_set_nonblock(*fd);
> +
> +for (;;) {
> +ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> +if (ret < 0) {
> +ret = -socket_error();
> +if (ret == -EINPROGRESS) {
> +break;
> +#ifdef _WIN32
> +} else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
> +break;
> +#endif
> +} else if (ret != -EINTR && ret != -EWOULDBLOCK) {
> +perror("connect");
> +closesocket(*fd);
> +return ret;
> +}
> +} else {
> +break;
> +}
> +}
> +
> +return ret;
> +}
> +
>  int parse_host_port(struct sockaddr_in *saddr, const char *str)
>  {
>  char buf[512];
> diff --git a/qemu_socket.h b/qemu_socket.h
> index d612793..9246578 100644
> --- a/qemu_socket.h
> +++ b/qemu_socket.h
> @@ -55,6 +55,7 @@ int unix_connect_opts(QemuOpts *opts);
>  int unix_connect(const char *path);
>  
>  int tcp_server_start(const char *str, int *fd);
> +int tcp_client_start(const char *str, int *fd);
>  
>  /* Old, ipv4 only bits.  Don't use for new code. */
>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
> 
> 

Orit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Amos Kong

On 14/03/12 15:27, Paolo Bonzini wrote:




Hi Paolo,


Il 14/03/2012 08:14, Orit Wasserman ha scritto:

if (bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr))<  0)
{   
 closesocket(*fd);
 return -socket_error();
}
return 0;

and than you will not need ret


But closesocket could clobber socket_error(), no?


Yes, it will effect socket_error()

How about this fix ?

ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
if (ret < 0) {
ret = -socket_error();
closesocket(*fd);
}
return ret;
}

--
Amos.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] powerpc/kvm: Fix magic page vs. 32-bit RTAS on ppc64

2012-03-14 Thread Benjamin Herrenschmidt
When the kernel calls into RTAS, it switches to 32-bit mode. The
magic page was is longer accessible in that case, causing the
patched instructions in the RTAS call wrapper to crash.

This fixes it by making available a 32-bit mapping of the magic
page in that case. This mapping is flushed whenever we switch
the kernel back to 64-bit mode.

Signed-off-by: Benjamin Herrenschmidt 
---

Avi, please consider merging ASAP as this is a fairly annoying
bug and the fix is reasonably obvious.

 arch/powerpc/kvm/book3s.c|3 +++
 arch/powerpc/kvm/book3s_pr.c |   17 +
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index e41ac6f..34487d4 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -289,6 +289,9 @@ pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
ulong mp_pa = vcpu->arch.magic_page_pa;
 
+   if (!(vcpu->arch.shared->msr & MSR_SF))
+   mp_pa = (uint32_t)mp_pa;
+
/* Magic page override */
if (unlikely(mp_pa) &&
unlikely(((gfn << PAGE_SHIFT) & KVM_PAM) ==
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d6851a1..23919d4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -137,6 +137,20 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
}
}
 
+   /*
+* When switching from 32 to 64-bit, we may have a stale 32-bit
+* magic page around, we need to flush it. Typically 32-bit magic
+* page will be instanciated when calling into RTAS. Note: We
+* assume that such transition only happens while in kernel mode,
+* ie, we never transition from user 32-bit to kernel 64-bit with
+* a 32-bit magic page around.
+*/
+   if (!(old_msr & MSR_PR) && !(old_msr & MSR_SF) && (msr & MSR_SF)) {
+   /* going from RTAS to normal kernel code */
+   kvmppc_mmu_pte_flush(vcpu, (uint32_t)vcpu->arch.magic_page_pa,
+~0xFFFUL);
+   }
+
/* Preload FPU if it's enabled */
if (vcpu->arch.shared->msr & MSR_FP)
kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
@@ -242,6 +256,9 @@ static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t 
gfn)
 {
ulong mp_pa = vcpu->arch.magic_page_pa;
 
+   if (!(vcpu->arch.shared->msr & MSR_SF))
+   mp_pa = (uint32_t)mp_pa;
+
if (unlikely(mp_pa) &&
unlikely((mp_pa & KVM_PAM) >> PAGE_SHIFT == gfn)) {
return 1;
-- 
1.7.9



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/kvm: Fix magic page vs. 32-bit RTAS on ppc64

2012-03-14 Thread Benjamin Herrenschmidt
On Wed, 2012-03-14 at 18:52 +1100, Benjamin Herrenschmidt wrote:
> When the kernel calls into RTAS, it switches to 32-bit mode. The
> magic page was is longer accessible in that case, causing the
> patched instructions in the RTAS call wrapper to crash.
> 
> This fixes it by making available a 32-bit mapping of the magic
> page in that case. This mapping is flushed whenever we switch
> the kernel back to 64-bit mode.

I forgot to give credit to Alex for the original patch, which I tweaked
a little bit (among others it was missing the bit in kvmppc_gfn_to_pfn)

Cheers,
Ben.

> Signed-off-by: Benjamin Herrenschmidt 
> ---
> 
> Avi, please consider merging ASAP as this is a fairly annoying
> bug and the fix is reasonably obvious.
> 
>  arch/powerpc/kvm/book3s.c|3 +++
>  arch/powerpc/kvm/book3s_pr.c |   17 +
>  2 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index e41ac6f..34487d4 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -289,6 +289,9 @@ pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
>  {
>   ulong mp_pa = vcpu->arch.magic_page_pa;
>  
> + if (!(vcpu->arch.shared->msr & MSR_SF))
> + mp_pa = (uint32_t)mp_pa;
> +
>   /* Magic page override */
>   if (unlikely(mp_pa) &&
>   unlikely(((gfn << PAGE_SHIFT) & KVM_PAM) ==
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index d6851a1..23919d4 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -137,6 +137,20 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
>   }
>   }
>  
> + /*
> +  * When switching from 32 to 64-bit, we may have a stale 32-bit
> +  * magic page around, we need to flush it. Typically 32-bit magic
> +  * page will be instanciated when calling into RTAS. Note: We
> +  * assume that such transition only happens while in kernel mode,
> +  * ie, we never transition from user 32-bit to kernel 64-bit with
> +  * a 32-bit magic page around.
> +  */
> + if (!(old_msr & MSR_PR) && !(old_msr & MSR_SF) && (msr & MSR_SF)) {
> + /* going from RTAS to normal kernel code */
> + kvmppc_mmu_pte_flush(vcpu, (uint32_t)vcpu->arch.magic_page_pa,
> +  ~0xFFFUL);
> + }
> +
>   /* Preload FPU if it's enabled */
>   if (vcpu->arch.shared->msr & MSR_FP)
>   kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
> @@ -242,6 +256,9 @@ static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, 
> gfn_t gfn)
>  {
>   ulong mp_pa = vcpu->arch.magic_page_pa;
>  
> + if (!(vcpu->arch.shared->msr & MSR_SF))
> + mp_pa = (uint32_t)mp_pa;
> +
>   if (unlikely(mp_pa) &&
>   unlikely((mp_pa & KVM_PAM) >> PAGE_SHIFT == gfn)) {
>   return 1;


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
>>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
 Do you have any other comments about this patch?

>>>
>>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
>>> likely to only be used by Linux, which has kexec facilities, and you can
>>> put talk to management via virtio-serial and describe the crash in more
>>> details than a simple hypercall.
>>
>> As mentioned before, I don't think virtio-serial is a good fit for this.
>> We want something that is simple & guaranteed always available. Using
>> virtio-serial requires significant setup work on both the host and guest.
> 
> So what?  It needs to be done anyway for the guest agent.
> 
>> Many management application won't know to make a vioserial device available
>> to all guests they create. 
> 
> Then they won't know to deal with the panic event either.
> 
>> Most administrators won't even configure kexec,
>> let alone virtio serial on top of it. 
> 
> It should be done by the OS vendor, not the individual admin.
> 
>> The hypercall requires zero host
>> side config, and zero guest side config, which IMHO is what we need for
>> this feature.
> 
> If it was this one feature, yes.  But we keep getting more and more
> features like that and we bloat the hypervisor.  There's a reason we
> have a host-to-guest channel, we should use it.
> 

I donot know how to use virtio-serial.

I start vm like this:
qemu ...\
   -device virtio-serial \
  -chardev socket,path=/tmp/foo,server,nowait,id=foo \
  -device virtserialport,chardev=foo,name=port1 ...

You said that there are too many channels. Does it mean /tmp/foo is a channel?

Thanks
Wen Congyang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Paolo Bonzini
Il 14/03/2012 08:51, Amos Kong ha scritto:
> 
> 
> ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> if (ret < 0) {
> ret = -socket_error();
> closesocket(*fd);
> }
> return ret;
> }

Looks good.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Amos Kong

On 14/03/12 00:39, Michael Roth wrote:

On Wed, Mar 07, 2012 at 06:47:45AM +0800, Amos Kong wrote:

Introduce tcp_server_start() by moving original code in
tcp_start_incoming_migration().

Signed-off-by: Amos Kong
---
  net.c |   28 
  qemu_socket.h |2 ++
  2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index c34474f..e90ff23 100644
--- a/net.c
+++ b/net.c
@@ -99,6 +99,34 @@ static int get_str_sep(char *buf, int buf_size, const char 
**pp, int sep)
  return 0;
  }

+int tcp_server_start(const char *str, int *fd)
+{
+int val, ret;
+struct sockaddr_in saddr;
+
+if (parse_host_port(&saddr, str)<  0) {
+error_report("invalid host/port combination: %s", str);
+return -EINVAL;
+}
+
+*fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
+if (fd<  0) {
+perror("socket");
+return -1;
+}
+socket_set_nonblock(*fd);
+
+/* allow fast reuse */
+val = 1;
+setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, sizeof(val));
+
+ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+if (ret<  0) {
+closesocket(*fd);
+}
+return ret;
+}
+


I would combine this with patch 2, since it provides context for why
this function is being added. Would also do the same for 3 and 4.

I see client the client implementation you need to pass fd back by
reference since ret can be set to EINPROGRESS/EWOULDBLOCK on success,


ret restores 0 or -socket_error()
 success: 0, -EINPROGRESS
 fail   : ret < 0 && ret !=-EINTR && ret != -EWOULDBLOCK

, it should be -EINPROGRESS

+ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+if (ret < 0) {
+ret = -socket_error();
+if (ret == -EINPROGRESS) {
+break;



but here it looks like fd is only value if ret=0, so might as well just
pass back the fd as the function return value.


Passed *fd tells us if socket creation success
'ret' would return 0 or -socket_error()

-socket_error() is the error of socket creation/connection, it would be 
used by other functions.


eg: migration-tcp.c:
int tcp_start_incoming_migration(
...
ret = tcp_server_start(host_port, &s);
if (ret < 0) {
fprintf(stderr, "tcp_server_start: %s\n", strerror(-ret));

We can also add a error msg in tcp_start_outgoing_migration() when 
tcp_client_start() fails.



Also, is there any reason we can't re-use
qemu-sockets.c:inet_listen()/qemu-sockets.c:inet_connect()? AFAICT they
serve the same purpose, and already include some of the work from your
PATCH #6.


We could not directly use it, there are some difference,
such as tcp_start_incoming_migration() doesn't set socket no-blocked,
but net_socket_listen_init() sets socket no-blocked.

There are some repeated code, so I write a tcp_common_start(), and use 
it in net_socket_listen_init()/
net_socket_connect_init() and tcp_start_incoming_migration()/ 
tcp_start_outgoing_migration()




  int parse_host_port(struct sockaddr_in *saddr, const char *str)
  {
  char buf[512];
diff --git a/qemu_socket.h b/qemu_socket.h
index fe4cf6c..d612793 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen);
  int unix_connect_opts(QemuOpts *opts);
  int unix_connect(const char *path);

+int tcp_server_start(const char *str, int *fd);
+
  /* Old, ipv4 only bits.  Don't use for new code. */
  int parse_host_port(struct sockaddr_in *saddr, const char *str);
  int socket_init(void);






--
Amos.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/7] RTC: New logic to emulate RTC

2012-03-14 Thread Paolo Bonzini
Il 14/03/2012 01:00, Zhang, Yang Z ha scritto:
> Is there any comments with the version 3?

Can you explain why you dropped the logic to set the timer to the next
event?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PCI: Device specific reset function

2012-03-14 Thread Tadeusz Struk
On 13/03/12 02:42, Alex Williamson wrote:
> On Mon, 2012-03-12 at 16:55 +, Tadeusz Struk wrote:
>> I have a use case where I need to cleanup resource allocated for Virtual
>> Functions after a guest OS that used it crashed. This cleanup needs to
>> be done before the VF is being FLRed. The only possible way to do this
>> seems to be by using pci_dev_specific_reset() function.
>> This patch adds specific reset function to the pci_driver struct.
>>
>> Signed-off-by: Tadeusz Struk 
>> ---
>>  drivers/pci/quirks.c |4 
>>  include/linux/pci.h  |1 +
>>  2 files changed, 5 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 6476547..13d224b 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -3081,8 +3081,12 @@ static const struct pci_dev_reset_methods 
>> pci_dev_reset_methods[] = {
>>  
>>  int pci_dev_specific_reset(struct pci_dev *dev, int probe)
>>  {
>> +struct pci_driver *drv = dev->driver;
>>  const struct pci_dev_reset_methods *i;
>>  
>> +if (drv && drv->reset)
>> +return drv->reset(dev, probe);
>> +
> 
> Isn't this exactly what the below code is for (pci_dev_reset_methods)?
> In the case of KVM device assignment, the device should be reset via
> this path if the guest crashes, but it may or may not be attached to
> pci-stub at the time.  I assume you're not adding a reset handler to
> pci-stub, so we'll pass through here and use FLR, violating the premise
> of the patch.  Thanks,
> 
> Alex

Hi,
The reset handler supposed to be in the "real" driver not in the pci-stub.
You are right, at this stage the VF can still be assigned to the pci-stub, 
which makes it useless in my case.
I think the only option for me will be to go with the v2 of the patch.
I'll do a bit more testing to make sure this is the case and resend the v2.
Thanks,
Tadeusz


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 0/7] RTC: New logic to emulate RTC

2012-03-14 Thread Zhang, Yang Z
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Wednesday, March 14, 2012 4:35 PM
> To: Zhang, Yang Z
> Cc: qemu-de...@nongnu.org; Jan Kiszka; kvm@vger.kernel.org;
> aligu...@us.ibm.com; Marcelo Tosatti
> Subject: Re: [PATCH v3 0/7] RTC: New logic to emulate RTC
> 
> Il 14/03/2012 01:00, Zhang, Yang Z ha scritto:
> > Is there any comments with the version 3?
> 
> Can you explain why you dropped the logic to set the timer to the next
> event?
Do you mean why I change the rtc logic? The reason is that: When a guest is 
idle, the main activity inside qemu is the rtc update timer(2 per second). In 
our experience(running 64 rhel6u1 guests), it will decrease pkg C6 residency 
about 6%(6% means 2 watts in my box). And normally, the guest will not use the 
update-end interrupt and alarm. So there has no need to run a periodic timer 
when guest isn't using it. 

best regards
yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/7] RTC: New logic to emulate RTC

2012-03-14 Thread Paolo Bonzini
Il 14/03/2012 09:52, Zhang, Yang Z ha scritto:
> Is there any comments with the version 3?
>>> 
>>> Can you explain why you dropped the logic to set the timer to the
>>> next event?
> Do you mean why I change the rtc logic? The reason is that: When a
> guest is idle, the main activity inside qemu is the rtc update
> timer(2 per second). In our experience(running 64 rhel6u1 guests), it
> will decrease pkg C6 residency about 6%(6% means 2 watts in my box).
> And normally, the guest will not use the update-end interrupt and
> alarm. So there has no need to run a periodic timer when guest isn't
> using it.

No, why you're keeping roughly the same logic as current QEMU, instead
of the more radical changes that were in v2.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-14 Thread Stefan Hajnoczi
On Wed, Mar 14, 2012 at 12:30 AM, Amos Kong  wrote:
> - Original Message -
>> On Tue, Mar 13, 2012 at 2:47 PM, Amos Kong  wrote:
>
> ...
>
> Hi, Stefan
>
>> > diff --git a/kvm-all.c b/kvm-all.c
>> > index 77eadf6..7157e78 100644
>> > --- a/kvm-all.c
>> > +++ b/kvm-all.c
>> > @@ -771,6 +771,8 @@ static void
>> > kvm_io_ioeventfd_add(MemoryRegionSection *sectio
>> >
>> >     r = kvm_set_ioeventfd_pio_word(fd,
>> >     section->offset_within_address_space,
>> >                                    data, true);
>> > +    if (r == -ENOSPC)
>> > +        return;
>>
>> The caller needs a way to detect the failure.
>
> Yes, about memory API
>
>> >     if (r < 0) {
>> >         abort();
>> >     }
>> >
>> >
>> >
>> >> Basically we need a way for ioeventfd to fail if we hit rlimit or
>> >> the
>> >> in-kernel io bus device limit.  Suggestions?
>> >
>> >
>> >
>> >> (The reason I don't like using has_many_ioeventfds() is because
>> >> it's
>> >> ugly and inefficient to open and then close file descriptors -
>> >> especially in a multithreaded program where file descriptor
>> >> availability can change while we're executing.)
>> >
>> > I identified it's too ugly ;)
>> > but I want to reserve it for older kernel (iobus limit is 6)
>> >
>> > Can we remove the check for old kernel (iobus limit is 6)?
>> > user can disable ioeventfd through qemu cmdline
>> >  virtio-net-pci.ioeventfd=on/off
>> >  virtio-blk-pci.ioeventfd=on/off
>>
>> Why do you want to remove the iobus limit 6 check?  The point of that
>> check is to allow vhost-net to always work since it requires an
>> ioeventfd.
>
>
> ### -device virtio-blk-pci,ioeventfd=off,drive=drive-virtio0-0-0,id=id1 
> -drive ...
>     this blk dev will not use ioeventfd for io notification.
>
> ### -device virtio-net-pci,netdev=he,ioeventfd=off -netdev tap,id=he
>     this net dev will not use ioeventfd
>
> ### -device virtio-net-pci,netdev=he,vhost=on -netdev tap,id=he
>     this dev will take 2 ioeventfd (service notifications from rx/tx queues)
>
> ioeventfd paramenter is a way for user to limit ioeventfd usage.
>
>
> ### qemu-kvm -net none -device 
> virtio-blk-pci,ioeventfd=on,drive=drive-virtio0-0-0,id=id1 -drive ...
> For some odd users, they don't use network, and they need a better disk io 
> performance.
> but we could not satisfy them if we reserve the checking of 6 iobus devs.
>
> Strategy should be decided by users.

We're discussing the wrong thing here, the 6 ioeventfd workaround is
unrelated to the problem you are trying to solve.

Graceful fallback for ioeventfd failure previously existed but it
seems the new memory API broke it.  The thing that needs fixing is the
regression caused by the new memory API code.

An error code path needs to be added to the memory API ioeventfd code
so that virtio-pci.c can deal with failure and QEMU no longer aborts.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-14 Thread Avi Kivity
On 03/13/2012 12:42 PM, Amos Kong wrote:
> Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
> allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
> and check if available ioeventfd exists. If not, virtio-pci will
> fallback to userspace, and don't use ioeventfd for io notification.

How about an alternative way of solving this, within the memory core:
trap those writes in qemu and write to the ioeventfd yourself.  This way
ioeventfds work even without kvm:


  core: create eventfd
  core: install handler for memory address that writes to ioeventfd
  kvm (optional): install kernel handler for ioeventfd

even if the third step fails, the ioeventfd still works, it's just slower.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 10:29 AM, Wen Congyang wrote:
> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> > On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
> >> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> >>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>  Do you have any other comments about this patch?
> 
> >>>
> >>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> >>> likely to only be used by Linux, which has kexec facilities, and you can
> >>> put talk to management via virtio-serial and describe the crash in more
> >>> details than a simple hypercall.
> >>
> >> As mentioned before, I don't think virtio-serial is a good fit for this.
> >> We want something that is simple & guaranteed always available. Using
> >> virtio-serial requires significant setup work on both the host and guest.
> > 
> > So what?  It needs to be done anyway for the guest agent.
> > 
> >> Many management application won't know to make a vioserial device available
> >> to all guests they create. 
> > 
> > Then they won't know to deal with the panic event either.
> > 
> >> Most administrators won't even configure kexec,
> >> let alone virtio serial on top of it. 
> > 
> > It should be done by the OS vendor, not the individual admin.
> > 
> >> The hypercall requires zero host
> >> side config, and zero guest side config, which IMHO is what we need for
> >> this feature.
> > 
> > If it was this one feature, yes.  But we keep getting more and more
> > features like that and we bloat the hypervisor.  There's a reason we
> > have a host-to-guest channel, we should use it.
> > 
>
> I donot know how to use virtio-serial.

I don't either, copying Amit.

> I start vm like this:
> qemu ...\
>-device virtio-serial \
>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>   -device virtserialport,chardev=foo,name=port1 ...
>
> You said that there are too many channels. Does it mean /tmp/foo is a channel?

Probably.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-14 Thread Igor Mitsyanko

On 03/12/2012 01:43 PM, Igor Mitsyanko wrote:

On 02/21/2012 07:33 PM, Peter Maydell wrote:


Short summary:
* switch wp groups to bitfield rather than int array
* convert sd.c to use memory_region_init_ram() to allocate the wp groups
(being careful to use memory_region_set_dirty() when we touch them)
* we don't need variable-length fields for sd.c any more
* rest of the vmstate conversion is straightforward



After a conversion to bitfield wp_groups consumes 2048 bytes for 32 GB
SD image, do we really need live section for this?



I don't know how large is too large, but currently we have

VMSTATE_UINT32_ARRAY(tx_status_fifo, lan9118_state, 512),
VMSTATE_UINT32_ARRAY(rx_status_fifo, lan9118_state, 896),
VMSTATE_UINT32_ARRAY(rx_fifo, lan9118_state, 3360),
VMSTATE_INT32_ARRAY(rx_packet_size, lan9118_state, 1024),

in hw/lan9118.c
--
Mitsyanko Igor
ASWG, Moscow R&D center, Samsung Electronics
email: i.mitsya...@samsung.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Amit Shah
On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> > On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
> >> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> >>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>  Do you have any other comments about this patch?
> 
> >>>
> >>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> >>> likely to only be used by Linux, which has kexec facilities, and you can
> >>> put talk to management via virtio-serial and describe the crash in more
> >>> details than a simple hypercall.
> >>
> >> As mentioned before, I don't think virtio-serial is a good fit for this.
> >> We want something that is simple & guaranteed always available. Using
> >> virtio-serial requires significant setup work on both the host and guest.
> > 
> > So what?  It needs to be done anyway for the guest agent.
> > 
> >> Many management application won't know to make a vioserial device available
> >> to all guests they create. 
> > 
> > Then they won't know to deal with the panic event either.
> > 
> >> Most administrators won't even configure kexec,
> >> let alone virtio serial on top of it. 
> > 
> > It should be done by the OS vendor, not the individual admin.
> > 
> >> The hypercall requires zero host
> >> side config, and zero guest side config, which IMHO is what we need for
> >> this feature.
> > 
> > If it was this one feature, yes.  But we keep getting more and more
> > features like that and we bloat the hypervisor.  There's a reason we
> > have a host-to-guest channel, we should use it.
> > 
> 
> I donot know how to use virtio-serial.
> 
> I start vm like this:
> qemu ...\
>-device virtio-serial \
>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>   -device virtserialport,chardev=foo,name=port1 ...

This is sufficient.  On the host, you can open /tmp/foo using a custom
program or nc (nc -U /tmp/foo).  On the guest, you can just open
/dev/virtio-ports/port1 and read/write into it.

See the following links for more details.

https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test
http://www.linux-kvm.org/page/Virtio-serial_API

> You said that there are too many channels. Does it mean /tmp/foo is a channel?

You can have several such -device virtserialport.  The -device part
describes what the guest will see.  The -chardev part ties that to the
host-side part of the channel.

/tmp/foo is the host end-point for the channel, in the example above,
and /dev/virtio-ports/port1 is the guest-side end-point.

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Isolation groups

2012-03-14 Thread David Gibson
On Tue, Mar 13, 2012 at 10:49:47AM -0600, Alex Williamson wrote:
> On Wed, 2012-03-14 at 01:33 +1100, David Gibson wrote:
> > On Mon, Mar 12, 2012 at 04:32:54PM -0600, Alex Williamson wrote:
> > > Signed-off-by: Alex Williamson 
> > > ---
> > > 
> > >  drivers/base/Kconfig  |   10 +
> > >  drivers/base/Makefile |1 
> > >  drivers/base/base.h   |5 
> > >  drivers/base/isolation.c  |  798 
> > > +
> > >  include/linux/device.h|4 
> > >  include/linux/isolation.h |  138 
> > >  6 files changed, 956 insertions(+), 0 deletions(-)
> > >  create mode 100644 drivers/base/isolation.c
> > >  create mode 100644 include/linux/isolation.h
> > > 
> > > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > > index 7be9f79..e98a5f3 100644
> > > --- a/drivers/base/Kconfig
> > > +++ b/drivers/base/Kconfig
> > > @@ -189,4 +189,14 @@ config DMA_SHARED_BUFFER
> > > APIs extension; the file's descriptor can then be passed on to other
> > > driver.
> > >  
> > > +config ISOLATION_GROUPS
> > > + bool "Enable Isolation Group API"
> > > + default n
> > > + depends on EXPERIMENTAL && IOMMU_API
> > > + help
> > > +   This option enables grouping of devices into Isolation Groups
> > > +   which may be used by other subsystems to perform quirks across
> > > +   sets of devices as well as userspace drivers for guaranteeing
> > > +   devices are isolated from the rest of the system.
> > > +
> > >  endmenu
> > > diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> > > index 610f999..047b5f9 100644
> > > --- a/drivers/base/Makefile
> > > +++ b/drivers/base/Makefile
> > > @@ -19,6 +19,7 @@ obj-$(CONFIG_MODULES)   += module.o
> > >  endif
> > >  obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
> > >  obj-$(CONFIG_REGMAP) += regmap/
> > > +obj-$(CONFIG_ISOLATION_GROUPS)   += isolation.o
> > >  
> > >  ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
> > >  
> > > diff --git a/drivers/base/base.h b/drivers/base/base.h
> > > index b858dfd..376758a 100644
> > > --- a/drivers/base/base.h
> > > +++ b/drivers/base/base.h
> > > @@ -1,4 +1,5 @@
> > >  #include 
> > > +#include 
> > >  
> > >  /**
> > >   * struct subsys_private - structure to hold the private to the driver 
> > > core portions of the bus_type/class structure.
> > > @@ -108,6 +109,10 @@ extern int driver_probe_device(struct device_driver 
> > > *drv, struct device *dev);
> > >  static inline int driver_match_device(struct device_driver *drv,
> > > struct device *dev)
> > >  {
> > > + if (isolation_group_managed(to_isolation_group(dev)) &&
> > > + !isolation_group_manager_driver(to_isolation_group(dev), drv))
> > > + return 0;
> > > +
> > >   return drv->bus->match ? drv->bus->match(dev, drv) : 1;
> > >  }
> > >  
> > > diff --git a/drivers/base/isolation.c b/drivers/base/isolation.c
> > > new file mode 100644
> > > index 000..c01365c
> > > --- /dev/null
> > > +++ b/drivers/base/isolation.c
> > > @@ -0,0 +1,798 @@
> > > +/*
> > > + * Isolation group interface
> > > + *
> > > + * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
> > > + * Author: Alex Williamson 
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + *
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +static struct kset *isolation_kset;
> > > +/* XXX add more complete locking, maybe rcu */
> > > +static DEFINE_MUTEX(isolation_lock);
> > > +static LIST_HEAD(isolation_groups);
> > > +static LIST_HEAD(isolation_notifiers);
> > > +
> > > +/* Keep these private */
> > > +struct isolation_manager_driver {
> > > + struct device_driver *drv;
> > > + struct list_head list;
> > > +};
> > > +
> > > +struct isolation_manager {
> > > + struct list_head drivers;
> > > + /* Likely need managers to register some callbacks */
> > > +};
> > > +
> > > +struct isolation_group {
> > > + struct list_head list;
> > > + struct list_head devices;
> > > + struct kobject kobj;
> > > + struct kobject *devices_kobj;
> > > + struct iommu_domain *iommu_domain;
> > > + struct iommu_ops *iommu_ops;
> > > + struct isolation_manager *manager;
> > > + uuid_le uuid;
> > > +};
> > > +
> > > +struct isolation_device {
> > > + struct device *dev;
> > > + struct list_head list;
> > > +};
> > > +
> > > +struct isolation_notifier {
> > > + struct bus_type *bus;
> > > + struct notifier_block nb;
> > > + struct blocking_notifier_head notifier;
> > > + struct list_head list;
> > > + int refcnt;
> > > +};
> > > +
> > > +struct iso_attribute {
> > > + struct attribute attr;
> > > + ssize_t (*show)(struct isolation_group *group, char *buf);
> > > + ssize_t (*store)(struct isolation_group *group,
> > > +  const char *buf, 

Re: [Qemu-devel] [PATCH v3 9/9] net: support to include ipv6 address by brackets

2012-03-14 Thread Amos Kong

On 14/03/12 03:47, Michael Roth wrote:

On Wed, Mar 07, 2012 at 06:48:57AM +0800, Amos Kong wrote:

That method of representing an IPv6 address with a port is


I'm not sure what "that" is referencing.


2312::8274:5200  (5200 is a port)


I assumed the previous patch
but the representation seems to be the same?


2312::8274:5200

[2312::8274]:5200

The second one is better.


discouraged because of its ambiguity. Referencing to RFC5952,
the recommended format is:

  [2312::8274]:5200

For IPv6 brackets must be mandatory if you require a port.

test status: Successed
listen side: qemu-kvm  -incoming tcp:[2312::8274]:5200
client side: qemu-kvm ...
  (qemu) migrate -d tcp:[2312::8274]:5200

Signed-off-by: Amos Kong
---
  net.c |6 ++
  1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index d6ce1fa..499ed1d 100644
--- a/net.c
+++ b/net.c
@@ -88,6 +88,12 @@ static int get_str_sep(char *buf, int buf_size, const char 
**pp, int sep)
  if (!p1)
  return -1;
  len = p1 - p;
+/* remove brackets which includes hostname */
+if (*p == '['&&  *(p1-1) == ']') {
+p += 1;
+len -= 2;
+}


Sorry, looking again I guess net/slirp.c actually has it's own copy of
get_str_sep(), so modifying this doesn't look like it would break
anything currently.



It might cause some confusion though :). And I think
the special handling for brackets should be done in
parse_host_port_info() since get_str_sep() is pretty generically named.


agree.


+
  p1++;
  if (buf_size>  0) {
  if (len>  buf_size - 1)




--
Amos.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-14 Thread Stefan Hajnoczi
On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity  wrote:
> On 03/13/2012 12:42 PM, Amos Kong wrote:
>> Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
>> allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
>> and check if available ioeventfd exists. If not, virtio-pci will
>> fallback to userspace, and don't use ioeventfd for io notification.
>
> How about an alternative way of solving this, within the memory core:
> trap those writes in qemu and write to the ioeventfd yourself.  This way
> ioeventfds work even without kvm:
>
>
>  core: create eventfd
>  core: install handler for memory address that writes to ioeventfd
>  kvm (optional): install kernel handler for ioeventfd
>
> even if the third step fails, the ioeventfd still works, it's just slower.

That approach will penalize guests with large numbers of disks - they
see an extra switch to vcpu thread instead of kvm.ko -> iothread.  It
seems okay provided we can solve the limit in the kernel once and for
all by introducing a more dynamic data structure for in-kernel
devices.  That way future kernels will never hit an arbitrary limit
below their file descriptor rlimit.

Is there some reason why kvm.ko must use a fixed size array?  Would it
be possible to use a tree (maybe with a cache for recent lookups)?

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Convert intx_mask_lock to spin lock

2012-03-14 Thread Jan Kiszka
As kvm_notify_acked_irq calls kvm_assigned_dev_ack_irq under
rcu_read_lock, we cannot use a mutex in the latter function. Switch to a
spin lock to address this.

Signed-off-by: Jan Kiszka 
---
 include/linux/kvm_host.h |2 +-
 virt/kvm/assigned-dev.c  |   14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b18ae78..6cf158c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -568,7 +568,7 @@ struct kvm_assigned_dev_kernel {
struct pci_dev *dev;
struct kvm *kvm;
spinlock_t intx_lock;
-   struct mutex intx_mask_lock;
+   spinlock_t intx_mask_lock;
char irq_name[32];
struct pci_saved_state *pci_saved_state;
 };
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 08e0571..01f572c 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -77,11 +77,11 @@ kvm_assigned_dev_raise_guest_irq(struct 
kvm_assigned_dev_kernel *assigned_dev,
 {
if (unlikely(assigned_dev->irq_requested_type &
 KVM_DEV_IRQ_GUEST_INTX)) {
-   mutex_lock(&assigned_dev->intx_mask_lock);
+   spin_lock(&assigned_dev->intx_mask_lock);
if (!(assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX))
kvm_set_irq(assigned_dev->kvm,
assigned_dev->irq_source_id, vector, 1);
-   mutex_unlock(&assigned_dev->intx_mask_lock);
+   spin_unlock(&assigned_dev->intx_mask_lock);
} else
kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
vector, 1);
@@ -141,7 +141,7 @@ static void kvm_assigned_dev_ack_irq(struct 
kvm_irq_ack_notifier *kian)
 
kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 0);
 
-   mutex_lock(&dev->intx_mask_lock);
+   spin_lock(&dev->intx_mask_lock);
 
if (!(dev->flags & KVM_DEV_ASSIGN_MASK_INTX)) {
bool reassert = false;
@@ -165,7 +165,7 @@ static void kvm_assigned_dev_ack_irq(struct 
kvm_irq_ack_notifier *kian)
dev->guest_irq, 1);
}
 
-   mutex_unlock(&dev->intx_mask_lock);
+   spin_unlock(&dev->intx_mask_lock);
 }
 
 static void deassign_guest_irq(struct kvm *kvm,
@@ -707,7 +707,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
match->flags = assigned_dev->flags;
match->dev = dev;
spin_lock_init(&match->intx_lock);
-   mutex_init(&match->intx_mask_lock);
+   spin_lock_init(&match->intx_mask_lock);
match->irq_source_id = -1;
match->kvm = kvm;
match->ack_notifier.irq_acked = kvm_assigned_dev_ack_irq;
@@ -868,7 +868,7 @@ static int kvm_vm_ioctl_set_pci_irq_mask(struct kvm *kvm,
goto out;
}
 
-   mutex_lock(&match->intx_mask_lock);
+   spin_lock(&match->intx_mask_lock);
 
match->flags &= ~KVM_DEV_ASSIGN_MASK_INTX;
match->flags |= assigned_dev->flags & KVM_DEV_ASSIGN_MASK_INTX;
@@ -895,7 +895,7 @@ static int kvm_vm_ioctl_set_pci_irq_mask(struct kvm *kvm,
}
}
 
-   mutex_unlock(&match->intx_mask_lock);
+   spin_unlock(&match->intx_mask_lock);
 
 out:
mutex_unlock(&kvm->lock);
-- 
1.7.3.4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Orit Wasserman
On 03/14/2012 09:51 AM, Amos Kong wrote:
> On 14/03/12 15:27, Paolo Bonzini wrote:
>>
> 
> Hi Paolo,
> 
>> Il 14/03/2012 08:14, Orit Wasserman ha scritto:
>>> if (bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr))<  0)
>>> {   
>>>  closesocket(*fd);
>>>  return -socket_error();
>>> }
>>> return 0;
>>>
>>> and than you will not need ret
>>
>> But closesocket could clobber socket_error(), no?

Completely correct.

> 
> Yes, it will effect socket_error()
> 
> How about this fix ?
> 
> ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> if (ret < 0) {
> ret = -socket_error();
> closesocket(*fd);
> }
> return ret;
> }
> 

Looks good.

Orit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 05:24 PM, Avi Kivity Wrote:
> On 03/14/2012 10:29 AM, Wen Congyang wrote:
>> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
>>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
 On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>> Do you have any other comments about this patch?
>>
>
> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> likely to only be used by Linux, which has kexec facilities, and you can
> put talk to management via virtio-serial and describe the crash in more
> details than a simple hypercall.

 As mentioned before, I don't think virtio-serial is a good fit for this.
 We want something that is simple & guaranteed always available. Using
 virtio-serial requires significant setup work on both the host and guest.
>>>
>>> So what?  It needs to be done anyway for the guest agent.
>>>
 Many management application won't know to make a vioserial device available
 to all guests they create. 
>>>
>>> Then they won't know to deal with the panic event either.
>>>
 Most administrators won't even configure kexec,
 let alone virtio serial on top of it. 
>>>
>>> It should be done by the OS vendor, not the individual admin.
>>>
 The hypercall requires zero host
 side config, and zero guest side config, which IMHO is what we need for
 this feature.
>>>
>>> If it was this one feature, yes.  But we keep getting more and more
>>> features like that and we bloat the hypervisor.  There's a reason we
>>> have a host-to-guest channel, we should use it.
>>>
>>
>> I donot know how to use virtio-serial.
> 
> I don't either, copying Amit.
> 
>> I start vm like this:
>> qemu ...\
>>-device virtio-serial \
>>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>>   -device virtserialport,chardev=foo,name=port1 ...
>>
>> You said that there are too many channels. Does it mean /tmp/foo is a 
>> channel?
> 
> Probably.

Hmm, if we use virtio-serial, the guest kernel writes something into the 
channel when
the os is panicked. Is it right?

If so, is this channel visible to guest userspace? If the channle is visible to 
guest
userspace, the program running in userspace may write the same message to the 
channel.

Thanks
Wen Congyang

> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-14 Thread Avi Kivity
On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity  wrote:
> > On 03/13/2012 12:42 PM, Amos Kong wrote:
> >> Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
> >> allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
> >> and check if available ioeventfd exists. If not, virtio-pci will
> >> fallback to userspace, and don't use ioeventfd for io notification.
> >
> > How about an alternative way of solving this, within the memory core:
> > trap those writes in qemu and write to the ioeventfd yourself.  This way
> > ioeventfds work even without kvm:
> >
> >
> >  core: create eventfd
> >  core: install handler for memory address that writes to ioeventfd
> >  kvm (optional): install kernel handler for ioeventfd
> >
> > even if the third step fails, the ioeventfd still works, it's just slower.
>
> That approach will penalize guests with large numbers of disks - they
> see an extra switch to vcpu thread instead of kvm.ko -> iothread.

It's only a failure path.  The normal path is expected to have a kvm
ioeventfd installed.

>   It
> seems okay provided we can solve the limit in the kernel once and for
> all by introducing a more dynamic data structure for in-kernel
> devices.  That way future kernels will never hit an arbitrary limit
> below their file descriptor rlimit.
>
> Is there some reason why kvm.ko must use a fixed size array?  Would it
> be possible to use a tree (maybe with a cache for recent lookups)?

It does use bsearch today IIRC.  We'll expand the limit, but there must
be a limit, and qemu must be prepared to deal with it.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 05:51 PM, Amit Shah Wrote:
> On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
>> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
>>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
 On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>> Do you have any other comments about this patch?
>>
>
> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> likely to only be used by Linux, which has kexec facilities, and you can
> put talk to management via virtio-serial and describe the crash in more
> details than a simple hypercall.

 As mentioned before, I don't think virtio-serial is a good fit for this.
 We want something that is simple & guaranteed always available. Using
 virtio-serial requires significant setup work on both the host and guest.
>>>
>>> So what?  It needs to be done anyway for the guest agent.
>>>
 Many management application won't know to make a vioserial device available
 to all guests they create. 
>>>
>>> Then they won't know to deal with the panic event either.
>>>
 Most administrators won't even configure kexec,
 let alone virtio serial on top of it. 
>>>
>>> It should be done by the OS vendor, not the individual admin.
>>>
 The hypercall requires zero host
 side config, and zero guest side config, which IMHO is what we need for
 this feature.
>>>
>>> If it was this one feature, yes.  But we keep getting more and more
>>> features like that and we bloat the hypervisor.  There's a reason we
>>> have a host-to-guest channel, we should use it.
>>>
>>
>> I donot know how to use virtio-serial.
>>
>> I start vm like this:
>> qemu ...\
>>-device virtio-serial \
>>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>>   -device virtserialport,chardev=foo,name=port1 ...
> 
> This is sufficient.  On the host, you can open /tmp/foo using a custom
> program or nc (nc -U /tmp/foo).  On the guest, you can just open
> /dev/virtio-ports/port1 and read/write into it.

I have two questions:
1. does it OK to open this device when the guest is panicked?
2. how to prevent the userspace's program using this device?

Thanks
Wen Congyang

> 
> See the following links for more details.
> 
> https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test
> http://www.linux-kvm.org/page/Virtio-serial_API
> 
>> You said that there are too many channels. Does it mean /tmp/foo is a 
>> channel?
> 
> You can have several such -device virtserialport.  The -device part
> describes what the guest will see.  The -chardev part ties that to the
> host-side part of the channel.
> 
> /tmp/foo is the host end-point for the channel, in the example above,
> and /dev/virtio-ports/port1 is the guest-side end-point.
> 
>   Amit
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 11:53 AM, Wen Congyang wrote:
> At 03/14/2012 05:24 PM, Avi Kivity Wrote:
> > On 03/14/2012 10:29 AM, Wen Congyang wrote:
> >> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> >>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>  On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> > On 03/12/2012 11:04 AM, Wen Congyang wrote:
> >> Do you have any other comments about this patch?
> >>
> >
> > Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> > likely to only be used by Linux, which has kexec facilities, and you can
> > put talk to management via virtio-serial and describe the crash in more
> > details than a simple hypercall.
> 
>  As mentioned before, I don't think virtio-serial is a good fit for this.
>  We want something that is simple & guaranteed always available. Using
>  virtio-serial requires significant setup work on both the host and guest.
> >>>
> >>> So what?  It needs to be done anyway for the guest agent.
> >>>
>  Many management application won't know to make a vioserial device 
>  available
>  to all guests they create. 
> >>>
> >>> Then they won't know to deal with the panic event either.
> >>>
>  Most administrators won't even configure kexec,
>  let alone virtio serial on top of it. 
> >>>
> >>> It should be done by the OS vendor, not the individual admin.
> >>>
>  The hypercall requires zero host
>  side config, and zero guest side config, which IMHO is what we need for
>  this feature.
> >>>
> >>> If it was this one feature, yes.  But we keep getting more and more
> >>> features like that and we bloat the hypervisor.  There's a reason we
> >>> have a host-to-guest channel, we should use it.
> >>>
> >>
> >> I donot know how to use virtio-serial.
> > 
> > I don't either, copying Amit.
> > 
> >> I start vm like this:
> >> qemu ...\
> >>-device virtio-serial \
> >>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
> >>   -device virtserialport,chardev=foo,name=port1 ...
> >>
> >> You said that there are too many channels. Does it mean /tmp/foo is a 
> >> channel?
> > 
> > Probably.
>
> Hmm, if we use virtio-serial, the guest kernel writes something into the 
> channel when
> the os is panicked. Is it right?

Right.

> If so, is this channel visible to guest userspace? If the channle is visible 
> to guest
> userspace, the program running in userspace may write the same message to the 
> channel.
>

Surely there's some kind of access control on channels.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 12:04 PM, Wen Congyang wrote:
> > 
> > This is sufficient.  On the host, you can open /tmp/foo using a custom
> > program or nc (nc -U /tmp/foo).  On the guest, you can just open
> > /dev/virtio-ports/port1 and read/write into it.
>
> I have two questions:
> 1. does it OK to open this device when the guest is panicked?
>

On panic, I think it's best to reset the device and drive it directly
from the panic handler.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4 V16] Add functions to check if the host has stopped the vm

2012-03-14 Thread Jan Kiszka
On 2012-03-10 20:37, Eric B Munson wrote:
> When a host stops or suspends a VM it will set a flag to show this.  The
> watchdog will use these functions to determine if a softlockup is real, or the
> result of a suspended VM.
> 

...

> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index f8492da..4ba090c 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void)
>   preset_lpj = lpj;
>  }
>  
> +bool kvm_check_and_clear_guest_paused(void)
> +{
> + bool ret = false;
> + struct pvclock_vcpu_time_info *src;
> +
> + /*
> +  * per_cpu() is safe here because this function is only called from
> +  * timer functions where preemption is already disabled.
> +  */
> + WARN_ON(!in_atomic());
> + src = &__get_cpu_var(hv_clock);
> + if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
> + __this_cpu_and(hv_clock.flags, ~PVCLOCK_GUEST_STOPPED);
> + ret = true;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused);

Which module is going to use this? I failed to find it, but I may have
missed something.

The actual reason for this reply: this patches causes a compiler warning
here, likely due to lacking include for EXPORT_SYMBOL_GPL:

  CC  arch/x86/kernel/kvmclock.o
/data/linux-kvm/arch/x86/kernel/kvmclock.c:136:1: warning: data
definition has no type or storage class
/data/linux-kvm/arch/x86/kernel/kvmclock.c:136:1: warning: type defaults
to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’
/data/linux-kvm/arch/x86/kernel/kvmclock.c:136:1: warning: parameter
names (without types) in function declaration

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-14 Thread Avi Kivity
On 03/12/2012 11:43 AM, Igor Mitsyanko wrote:
> On 02/21/2012 07:33 PM, Peter Maydell wrote:
>>
>> Short summary:
>>   * switch wp groups to bitfield rather than int array
>>   * convert sd.c to use memory_region_init_ram() to allocate the wp
>> groups
>> (being careful to use memory_region_set_dirty() when we touch them)
>>   * we don't need variable-length fields for sd.c any more
>>   * rest of the vmstate conversion is straightforward
>>
>
> After a conversion to bitfield wp_groups consumes 2048 bytes for 32 GB
> SD image, do we really need live section for this?
>

2K is certainly okay, some processors hold more state.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 3/9] net: introduce tcp_client_start()

2012-03-14 Thread Amos Kong

On 14/03/12 02:35, Michael Roth wrote:

On Wed, Mar 07, 2012 at 06:48:03AM +0800, Amos Kong wrote:

Introduce tcp_client_start() by moving original code in
tcp_start_outgoing_migration().

Signed-off-by: Amos Kong
---
  net.c |   41 +
  qemu_socket.h |1 +
  2 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index e90ff23..9afb0d1 100644
--- a/net.c
+++ b/net.c
@@ -127,6 +127,47 @@ int tcp_server_start(const char *str, int *fd)
  return ret;
  }

+int tcp_client_start(const char *str, int *fd)
+{


...

Hi Michael,



+*fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
+if (fd<  0) {
+perror("socket");
+return -1;
+}
+socket_set_nonblock(*fd);
+
+for (;;) {
+ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
+if (ret<  0) {
+ret = -socket_error();
+if (ret == -EINPROGRESS) {
+break;


The previous implementation and your next patch seem to be expecting a break on
-EWOULDBLOCK/-EAGAIN as well. Was the behavior changed on purpose?


In original tcp_start_outgoing_migration():
  break:  -EINPROGRES
  cont :  -EINTR or -EWOULDBLOCK

In original net_socket_connect_init():
  break:  -EINPROGRES or -EWOULDBLOCK
  cont :  -EINTR


http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_15.html
EWOULDBLOCK
socket has nonblocking mode set, and there are no pending 
connections immediately available.


So continue to re-connect if EWOULDBLOCK or EINTR returned by 
socket_error() in tcp_client_start()




  I'm not
sure what the proper handling is for -EAGAIN: whether a non-blocking connect()
can eventually succeed or not. I suspect that it's not, but that previously we
treated it synonymously with -EINPROGRESS, then eventually got an error via
getsockopt() before failing the migration. If so, we're now changing the
behavior to retry until successful, but given the man page entry I don't
think that's a good idea since you might block indefinitely:

  EAGAIN No  more  free local ports or insufficient
   entries in the routing cache.  For AF_INET
   seethedescription   of
   /proc/sys/net/ipv4/ip_local_port_range
   ip(7)  for  information on how to increase
   the number of local ports.



We didn't process EAGAIN specially, you mean EINTR ?





+#ifdef _WIN32
+} else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
+break;
+#endif
+} else if (ret != -EINTR&&  ret != -EWOULDBLOCK) {
+perror("connect");
+closesocket(*fd);
+return ret;


-EAGAIN would go this path.



+}
+} else {
+break;
+}
+}
+
+return ret;
+}
+


--
Amos.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:07 PM, Avi Kivity Wrote:
> On 03/14/2012 11:53 AM, Wen Congyang wrote:
>> At 03/14/2012 05:24 PM, Avi Kivity Wrote:
>>> On 03/14/2012 10:29 AM, Wen Congyang wrote:
 At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
>>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
 Do you have any other comments about this patch?

>>>
>>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
>>> likely to only be used by Linux, which has kexec facilities, and you can
>>> put talk to management via virtio-serial and describe the crash in more
>>> details than a simple hypercall.
>>
>> As mentioned before, I don't think virtio-serial is a good fit for this.
>> We want something that is simple & guaranteed always available. Using
>> virtio-serial requires significant setup work on both the host and guest.
>
> So what?  It needs to be done anyway for the guest agent.
>
>> Many management application won't know to make a vioserial device 
>> available
>> to all guests they create. 
>
> Then they won't know to deal with the panic event either.
>
>> Most administrators won't even configure kexec,
>> let alone virtio serial on top of it. 
>
> It should be done by the OS vendor, not the individual admin.
>
>> The hypercall requires zero host
>> side config, and zero guest side config, which IMHO is what we need for
>> this feature.
>
> If it was this one feature, yes.  But we keep getting more and more
> features like that and we bloat the hypervisor.  There's a reason we
> have a host-to-guest channel, we should use it.
>

 I donot know how to use virtio-serial.
>>>
>>> I don't either, copying Amit.
>>>
 I start vm like this:
 qemu ...\
-device virtio-serial \
   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
   -device virtserialport,chardev=foo,name=port1 ...

 You said that there are too many channels. Does it mean /tmp/foo is a 
 channel?
>>>
>>> Probably.
>>
>> Hmm, if we use virtio-serial, the guest kernel writes something into the 
>> channel when
>> the os is panicked. Is it right?
> 
> Right.
> 
>> If so, is this channel visible to guest userspace? If the channle is visible 
>> to guest
>> userspace, the program running in userspace may write the same message to 
>> the channel.
>>
> 
> Surely there's some kind of access control on channels.

The virtio-serial depends on more things than touching the hypervisor. So I 
think touching
the hypervisor is more reliable than using virtio-serial device, and it is very 
simple and
easy to use.

If we pass something from guest userspace to host, we can use virtio-serial. 
But If we pass
something from guest kernelspace to host, I still prefer to touch the 
hypervisor.

Thanks
Wen Congyang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 12:26 PM, Wen Congyang wrote:
> >> If so, is this channel visible to guest userspace? If the channle is 
> >> visible to guest
> >> userspace, the program running in userspace may write the same message to 
> >> the channel.
> >>
> > 
> > Surely there's some kind of access control on channels.
>
> The virtio-serial depends on more things than touching the hypervisor. So I 
> think touching
> the hypervisor is more reliable than using virtio-serial device, and it is 
> very simple and
> easy to use.
>
> If we pass something from guest userspace to host, we can use virtio-serial. 
> But If we pass
> something from guest kernelspace to host, I still prefer to touch the 
> hypervisor.

There's no argument that it's easier.  My concern is different, we're
adding more and more stuff to the hypervisor because it's easier, which
bloats it.  Every time we do it we add to compatibility and security
problems.

The panic notification is *really* simple, so I don't expect it to cause
a lot of problems.  But still, if it's possible not to change the
hypervisor, we must make an effort in that direction.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PCI: Can continually add funcs after adding func0

2012-03-14 Thread Michael S. Tsirkin
On Fri, Jan 27, 2012 at 09:17:21AM -0800, Jesse Barnes wrote:
> On Mon, 5 Dec 2011 11:21:24 -0800
> Jesse Barnes  wrote:
> 
> > On Fri, 25 Nov 2011 15:03:07 +0800
> > Amos Kong  wrote:
> > 
> > > Boot up a KVM guest, and hotplug multifunction
> > > devices(func1,func2,func0,func3) to guest.
> > > 
> > > for i in 1 2 0 3;do
> > > qemu-img create /tmp/resize$i.qcow2 1G -f qcow2
> > > (qemu) drive_add 0x11.$i id=drv11$i,if=none,file=/tmp/resize$i.qcow2
> > > (qemu) device_add 
> > > virtio-blk-pci,id=dev11$i,drive=drv11$i,addr=0x11.$i,multifunction=on
> > > done
> > > 
> > > In linux kernel, when func0 of the slot is hot-added, the whole
> > > slot will be marked as 'enabled', then driver will ignore other new
> > > hotadded funcs.
> > > But in Win7 & WinXP, we can continaully add other funcs after adding
> > > func0, all funcs will be added in guest.
> > > 
> > > drivers/pci/hotplug/acpiphp_glue.c:
> > > static int acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > > {
> > > 
> > > for (slot = bridge->slots; slot; slot = slot->next) {
> > > if (slot->flags & SLOT_ENABLED) {
> > > acpiphp_disable_slot()
> > > else
> > > acpiphp_enable_slot()
> > >   |
> > > } v
> > > enable_device()
> > >   |
> > >   v
> > > //only don't enable slot if func0 is not added
> > >   list_for_each_entry(func, &slot->funcs, sibling) {
> > >...
> > > }
> > >slot->flags |= SLOT_ENABLED; //mark slot to 'enabled'
> > > 
> > > This patch just make pci driver can continaully add funcs after adding
> > > func 0. Only mark slot to 'enabled' when all funcs are added.
> > > 
> > > For pci multifunction hotplug, we can add functions one by one(func 0 is
> > > necessary), and all functions will be removed in one time.
> > > 
> > > Signed-off-by: Amos Kong 
> > > ---
> > 
> > Rafael, Prarit or Shaohua, any comments?  Would be good to get some
> > tested-bys too.
> 
> Missed the last merge window, but there have been no comments so I'm
> assuming people are ok with this and applying to -next.
> 
> Thanks,

I finally found this supporting statement in the acpi spec:

"For any device, the BIOS provides only information that is added to the
device in a non-hardware standard manner."
Since extra functions can be enumerated in a hardware-standard manner,
it seems clear that it's ok to skip their description in ACPI.

So, belatedly
Acked-by: Michael S. Tsirkin 

> -- 
> Jesse Barnes, Intel Open Source Technology Center


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Amit Shah
On (Wed) 14 Mar 2012 [17:53:00], Wen Congyang wrote:
> At 03/14/2012 05:24 PM, Avi Kivity Wrote:
> > On 03/14/2012 10:29 AM, Wen Congyang wrote:
> >> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> >>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>  On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> > On 03/12/2012 11:04 AM, Wen Congyang wrote:
> >> Do you have any other comments about this patch?
> >>
> >
> > Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> > likely to only be used by Linux, which has kexec facilities, and you can
> > put talk to management via virtio-serial and describe the crash in more
> > details than a simple hypercall.
> 
>  As mentioned before, I don't think virtio-serial is a good fit for this.
>  We want something that is simple & guaranteed always available. Using
>  virtio-serial requires significant setup work on both the host and guest.
> >>>
> >>> So what?  It needs to be done anyway for the guest agent.
> >>>
>  Many management application won't know to make a vioserial device 
>  available
>  to all guests they create. 
> >>>
> >>> Then they won't know to deal with the panic event either.
> >>>
>  Most administrators won't even configure kexec,
>  let alone virtio serial on top of it. 
> >>>
> >>> It should be done by the OS vendor, not the individual admin.
> >>>
>  The hypercall requires zero host
>  side config, and zero guest side config, which IMHO is what we need for
>  this feature.
> >>>
> >>> If it was this one feature, yes.  But we keep getting more and more
> >>> features like that and we bloat the hypervisor.  There's a reason we
> >>> have a host-to-guest channel, we should use it.
> >>>
> >>
> >> I donot know how to use virtio-serial.
> > 
> > I don't either, copying Amit.
> > 
> >> I start vm like this:
> >> qemu ...\
> >>-device virtio-serial \
> >>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
> >>   -device virtserialport,chardev=foo,name=port1 ...
> >>
> >> You said that there are too many channels. Does it mean /tmp/foo is a 
> >> channel?
> > 
> > Probably.
> 
> Hmm, if we use virtio-serial, the guest kernel writes something into the 
> channel when
> the os is panicked. Is it right?

Depends on how you want to use it.  It could be the kernel, or it
could be a userspace program which monitors syslogs for panic
information and passes on that info to the virtio-serial channel.

> If so, is this channel visible to guest userspace? If the channle is visible 
> to guest
> userspace, the program running in userspace may write the same message to the 
> channel.

Access control is via permissions.  You can have udev scripts assign
whatever uid and gid to the port of your interest.  By default, all
ports are only accessible to the root user.

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-14 Thread Stefan Hajnoczi
On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity  wrote:
> On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
>> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity  wrote:
>> > On 03/13/2012 12:42 PM, Amos Kong wrote:
>> >> Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
>> >> allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
>> >> and check if available ioeventfd exists. If not, virtio-pci will
>> >> fallback to userspace, and don't use ioeventfd for io notification.
>> >
>> > How about an alternative way of solving this, within the memory core:
>> > trap those writes in qemu and write to the ioeventfd yourself.  This way
>> > ioeventfds work even without kvm:
>> >
>> >
>> >  core: create eventfd
>> >  core: install handler for memory address that writes to ioeventfd
>> >  kvm (optional): install kernel handler for ioeventfd
>> >
>> > even if the third step fails, the ioeventfd still works, it's just slower.
>>
>> That approach will penalize guests with large numbers of disks - they
>> see an extra switch to vcpu thread instead of kvm.ko -> iothread.
>
> It's only a failure path.  The normal path is expected to have a kvm
> ioeventfd installed.

It's the normal path when you attach >232 virtio-blk devices to a
guest (or 300 in the future).

>>   It
>> seems okay provided we can solve the limit in the kernel once and for
>> all by introducing a more dynamic data structure for in-kernel
>> devices.  That way future kernels will never hit an arbitrary limit
>> below their file descriptor rlimit.
>>
>> Is there some reason why kvm.ko must use a fixed size array?  Would it
>> be possible to use a tree (maybe with a cache for recent lookups)?
>
> It does use bsearch today IIRC.  We'll expand the limit, but there must
> be a limit, and qemu must be prepared to deal with it.

Shouldn't the limit be the file descriptor rlimit?  If userspace
cannot create more eventfds then it cannot set up more ioeventfds.

I agree there always needs to be an error path because there is a
finite resource (either file descriptors or in-kernel device slots).

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Amit Shah
On (Wed) 14 Mar 2012 [18:04:40], Wen Congyang wrote:
> At 03/14/2012 05:51 PM, Amit Shah Wrote:
> > On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
> >> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> >>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>  On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> > On 03/12/2012 11:04 AM, Wen Congyang wrote:
> >> Do you have any other comments about this patch?
> >>
> >
> > Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> > likely to only be used by Linux, which has kexec facilities, and you can
> > put talk to management via virtio-serial and describe the crash in more
> > details than a simple hypercall.
> 
>  As mentioned before, I don't think virtio-serial is a good fit for this.
>  We want something that is simple & guaranteed always available. Using
>  virtio-serial requires significant setup work on both the host and guest.
> >>>
> >>> So what?  It needs to be done anyway for the guest agent.
> >>>
>  Many management application won't know to make a vioserial device 
>  available
>  to all guests they create. 
> >>>
> >>> Then they won't know to deal with the panic event either.
> >>>
>  Most administrators won't even configure kexec,
>  let alone virtio serial on top of it. 
> >>>
> >>> It should be done by the OS vendor, not the individual admin.
> >>>
>  The hypercall requires zero host
>  side config, and zero guest side config, which IMHO is what we need for
>  this feature.
> >>>
> >>> If it was this one feature, yes.  But we keep getting more and more
> >>> features like that and we bloat the hypervisor.  There's a reason we
> >>> have a host-to-guest channel, we should use it.
> >>>
> >>
> >> I donot know how to use virtio-serial.
> >>
> >> I start vm like this:
> >> qemu ...\
> >>-device virtio-serial \
> >>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
> >>   -device virtserialport,chardev=foo,name=port1 ...
> > 
> > This is sufficient.  On the host, you can open /tmp/foo using a custom
> > program or nc (nc -U /tmp/foo).  On the guest, you can just open
> > /dev/virtio-ports/port1 and read/write into it.
> 
> I have two questions:
> 1. does it OK to open this device when the guest is panicked?

Depends on what kind of panic it is.  If the guest can continue
operations inspite of the panic, it will be possible to write out the
data.

> 2. how to prevent the userspace's program using this device?

Mentioned in previous reply.

BTW: an in-kernel API for reading/writing to ports isn't implemented
yet, because there's no user for it as of now.  If you want to write
from the kernel to the host, there are trivial additions to the code
necessary.

(However, I think it's better to do the writing from userspace instead
from the kernel itself).

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 04:10:04PM +0530, Amit Shah wrote:
> On (Wed) 14 Mar 2012 [18:04:40], Wen Congyang wrote:
> > At 03/14/2012 05:51 PM, Amit Shah Wrote:
> > > On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
> > >> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> > >>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
> >  On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> > > On 03/12/2012 11:04 AM, Wen Congyang wrote:
> > >> Do you have any other comments about this patch?
> > >>
> > >
> > > Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> > > likely to only be used by Linux, which has kexec facilities, and you 
> > > can
> > > put talk to management via virtio-serial and describe the crash in 
> > > more
> > > details than a simple hypercall.
> > 
> >  As mentioned before, I don't think virtio-serial is a good fit for 
> >  this.
> >  We want something that is simple & guaranteed always available. Using
> >  virtio-serial requires significant setup work on both the host and 
> >  guest.
> > >>>
> > >>> So what?  It needs to be done anyway for the guest agent.
> > >>>
> >  Many management application won't know to make a vioserial device 
> >  available
> >  to all guests they create. 
> > >>>
> > >>> Then they won't know to deal with the panic event either.
> > >>>
> >  Most administrators won't even configure kexec,
> >  let alone virtio serial on top of it. 
> > >>>
> > >>> It should be done by the OS vendor, not the individual admin.
> > >>>
> >  The hypercall requires zero host
> >  side config, and zero guest side config, which IMHO is what we need for
> >  this feature.
> > >>>
> > >>> If it was this one feature, yes.  But we keep getting more and more
> > >>> features like that and we bloat the hypervisor.  There's a reason we
> > >>> have a host-to-guest channel, we should use it.
> > >>>
> > >>
> > >> I donot know how to use virtio-serial.
> > >>
> > >> I start vm like this:
> > >> qemu ...\
> > >>-device virtio-serial \
> > >>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
> > >>   -device virtserialport,chardev=foo,name=port1 ...
> > > 
> > > This is sufficient.  On the host, you can open /tmp/foo using a custom
> > > program or nc (nc -U /tmp/foo).  On the guest, you can just open
> > > /dev/virtio-ports/port1 and read/write into it.
> > 
> > I have two questions:
> > 1. does it OK to open this device when the guest is panicked?
> 
> Depends on what kind of panic it is.  If the guest can continue
> operations inspite of the panic, it will be possible to write out the
> data.
> 
> > 2. how to prevent the userspace's program using this device?
> 
> Mentioned in previous reply.
> 
> BTW: an in-kernel API for reading/writing to ports isn't implemented
> yet, because there's no user for it as of now.  If you want to write
> from the kernel to the host, there are trivial additions to the code
> necessary.
> 
> (However, I think it's better to do the writing from userspace instead
> from the kernel itself).
> 
In case of panic notifier this is out of question.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 12:29:57PM +0200, Avi Kivity wrote:
> On 03/14/2012 12:26 PM, Wen Congyang wrote:
> > >> If so, is this channel visible to guest userspace? If the channle is 
> > >> visible to guest
> > >> userspace, the program running in userspace may write the same message 
> > >> to the channel.
> > >>
> > > 
> > > Surely there's some kind of access control on channels.
> >
> > The virtio-serial depends on more things than touching the hypervisor. So I 
> > think touching
> > the hypervisor is more reliable than using virtio-serial device, and it is 
> > very simple and
> > easy to use.
> >
> > If we pass something from guest userspace to host, we can use 
> > virtio-serial. But If we pass
> > something from guest kernelspace to host, I still prefer to touch the 
> > hypervisor.
> 
> There's no argument that it's easier.  My concern is different, we're
> adding more and more stuff to the hypervisor because it's easier, which
> bloats it.  Every time we do it we add to compatibility and security
> problems.
> 
> The panic notification is *really* simple, so I don't expect it to cause
> a lot of problems.  But still, if it's possible not to change the
> hypervisor, we must make an effort in that direction.
> 
One more point against using virtio-serial is that it will be likely
compiled as a module which means panic during early boot will not be
reported.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-14 Thread Avi Kivity
On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote:
> On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity  wrote:
> > On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
> >> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity  wrote:
> >> > On 03/13/2012 12:42 PM, Amos Kong wrote:
> >> >> Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
> >> >> allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
> >> >> and check if available ioeventfd exists. If not, virtio-pci will
> >> >> fallback to userspace, and don't use ioeventfd for io notification.
> >> >
> >> > How about an alternative way of solving this, within the memory core:
> >> > trap those writes in qemu and write to the ioeventfd yourself.  This way
> >> > ioeventfds work even without kvm:
> >> >
> >> >
> >> >  core: create eventfd
> >> >  core: install handler for memory address that writes to ioeventfd
> >> >  kvm (optional): install kernel handler for ioeventfd
> >> >
> >> > even if the third step fails, the ioeventfd still works, it's just 
> >> > slower.
> >>
> >> That approach will penalize guests with large numbers of disks - they
> >> see an extra switch to vcpu thread instead of kvm.ko -> iothread.
> >
> > It's only a failure path.  The normal path is expected to have a kvm
> > ioeventfd installed.
>
> It's the normal path when you attach >232 virtio-blk devices to a
> guest (or 300 in the future).

Well, there's nothing we can do about it.

We'll increase the limit of course, but old kernels will remain out
there.  The right fix is virtio-scsi anyway.

> >>   It
> >> seems okay provided we can solve the limit in the kernel once and for
> >> all by introducing a more dynamic data structure for in-kernel
> >> devices.  That way future kernels will never hit an arbitrary limit
> >> below their file descriptor rlimit.
> >>
> >> Is there some reason why kvm.ko must use a fixed size array?  Would it
> >> be possible to use a tree (maybe with a cache for recent lookups)?
> >
> > It does use bsearch today IIRC.  We'll expand the limit, but there must
> > be a limit, and qemu must be prepared to deal with it.
>
> Shouldn't the limit be the file descriptor rlimit?  If userspace
> cannot create more eventfds then it cannot set up more ioeventfds.

You can use the same eventfd for multiple ioeventfds.  If you mean to
slave kvm's ioeventfd limit to the number of files the process can have,
that's a good idea.  Surely an ioeventfd occupies less resources than an
open file.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 12:46 PM, Gleb Natapov wrote:
> On Wed, Mar 14, 2012 at 12:29:57PM +0200, Avi Kivity wrote:
> > On 03/14/2012 12:26 PM, Wen Congyang wrote:
> > > >> If so, is this channel visible to guest userspace? If the channle is 
> > > >> visible to guest
> > > >> userspace, the program running in userspace may write the same message 
> > > >> to the channel.
> > > >>
> > > > 
> > > > Surely there's some kind of access control on channels.
> > >
> > > The virtio-serial depends on more things than touching the hypervisor. So 
> > > I think touching
> > > the hypervisor is more reliable than using virtio-serial device, and it 
> > > is very simple and
> > > easy to use.
> > >
> > > If we pass something from guest userspace to host, we can use 
> > > virtio-serial. But If we pass
> > > something from guest kernelspace to host, I still prefer to touch the 
> > > hypervisor.
> > 
> > There's no argument that it's easier.  My concern is different, we're
> > adding more and more stuff to the hypervisor because it's easier, which
> > bloats it.  Every time we do it we add to compatibility and security
> > problems.
> > 
> > The panic notification is *really* simple, so I don't expect it to cause
> > a lot of problems.  But still, if it's possible not to change the
> > hypervisor, we must make an effort in that direction.
> > 
> One more point against using virtio-serial is that it will be likely
> compiled as a module which means panic during early boot will not be
> reported.

I don't think we want to use the driver.  Instead, have a small piece of
code that resets the device and pushes out a string (the panic message?)
without any interrupts etc.

It's still going to be less reliable than a hypercall, I agree.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:37 PM, Amit Shah Wrote:
> On (Wed) 14 Mar 2012 [17:53:00], Wen Congyang wrote:
>> At 03/14/2012 05:24 PM, Avi Kivity Wrote:
>>> On 03/14/2012 10:29 AM, Wen Congyang wrote:
 At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
>>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
 Do you have any other comments about this patch?

>>>
>>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
>>> likely to only be used by Linux, which has kexec facilities, and you can
>>> put talk to management via virtio-serial and describe the crash in more
>>> details than a simple hypercall.
>>
>> As mentioned before, I don't think virtio-serial is a good fit for this.
>> We want something that is simple & guaranteed always available. Using
>> virtio-serial requires significant setup work on both the host and guest.
>
> So what?  It needs to be done anyway for the guest agent.
>
>> Many management application won't know to make a vioserial device 
>> available
>> to all guests they create. 
>
> Then they won't know to deal with the panic event either.
>
>> Most administrators won't even configure kexec,
>> let alone virtio serial on top of it. 
>
> It should be done by the OS vendor, not the individual admin.
>
>> The hypercall requires zero host
>> side config, and zero guest side config, which IMHO is what we need for
>> this feature.
>
> If it was this one feature, yes.  But we keep getting more and more
> features like that and we bloat the hypervisor.  There's a reason we
> have a host-to-guest channel, we should use it.
>

 I donot know how to use virtio-serial.
>>>
>>> I don't either, copying Amit.
>>>
 I start vm like this:
 qemu ...\
-device virtio-serial \
   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
   -device virtserialport,chardev=foo,name=port1 ...

 You said that there are too many channels. Does it mean /tmp/foo is a 
 channel?
>>>
>>> Probably.
>>
>> Hmm, if we use virtio-serial, the guest kernel writes something into the 
>> channel when
>> the os is panicked. Is it right?
> 
> Depends on how you want to use it.  It could be the kernel, or it
> could be a userspace program which monitors syslogs for panic
> information and passes on that info to the virtio-serial channel.

When the kernel is panicked, we cannot use userspace program.

> 
>> If so, is this channel visible to guest userspace? If the channle is visible 
>> to guest
>> userspace, the program running in userspace may write the same message to 
>> the channel.
> 
> Access control is via permissions.  You can have udev scripts assign
> whatever uid and gid to the port of your interest.  By default, all
> ports are only accessible to the root user.

We should also prevent root user writing message to this channel if it is
used for panicked notification.

Thanks
Wen Congyang

> 
>   Amit
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 06:52:07PM +0800, Wen Congyang wrote:
> At 03/14/2012 06:37 PM, Amit Shah Wrote:
> > On (Wed) 14 Mar 2012 [17:53:00], Wen Congyang wrote:
> >> At 03/14/2012 05:24 PM, Avi Kivity Wrote:
> >>> On 03/14/2012 10:29 AM, Wen Congyang wrote:
>  At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> > On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
> >> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> >>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>  Do you have any other comments about this patch?
> 
> >>>
> >>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> >>> likely to only be used by Linux, which has kexec facilities, and you 
> >>> can
> >>> put talk to management via virtio-serial and describe the crash in 
> >>> more
> >>> details than a simple hypercall.
> >>
> >> As mentioned before, I don't think virtio-serial is a good fit for 
> >> this.
> >> We want something that is simple & guaranteed always available. Using
> >> virtio-serial requires significant setup work on both the host and 
> >> guest.
> >
> > So what?  It needs to be done anyway for the guest agent.
> >
> >> Many management application won't know to make a vioserial device 
> >> available
> >> to all guests they create. 
> >
> > Then they won't know to deal with the panic event either.
> >
> >> Most administrators won't even configure kexec,
> >> let alone virtio serial on top of it. 
> >
> > It should be done by the OS vendor, not the individual admin.
> >
> >> The hypercall requires zero host
> >> side config, and zero guest side config, which IMHO is what we need for
> >> this feature.
> >
> > If it was this one feature, yes.  But we keep getting more and more
> > features like that and we bloat the hypervisor.  There's a reason we
> > have a host-to-guest channel, we should use it.
> >
> 
>  I donot know how to use virtio-serial.
> >>>
> >>> I don't either, copying Amit.
> >>>
>  I start vm like this:
>  qemu ...\
> -device virtio-serial \
>    -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>    -device virtserialport,chardev=foo,name=port1 ...
> 
>  You said that there are too many channels. Does it mean /tmp/foo is a 
>  channel?
> >>>
> >>> Probably.
> >>
> >> Hmm, if we use virtio-serial, the guest kernel writes something into the 
> >> channel when
> >> the os is panicked. Is it right?
> > 
> > Depends on how you want to use it.  It could be the kernel, or it
> > could be a userspace program which monitors syslogs for panic
> > information and passes on that info to the virtio-serial channel.
> 
> When the kernel is panicked, we cannot use userspace program.
> 
> > 
> >> If so, is this channel visible to guest userspace? If the channle is 
> >> visible to guest
> >> userspace, the program running in userspace may write the same message to 
> >> the channel.
> > 
> > Access control is via permissions.  You can have udev scripts assign
> > whatever uid and gid to the port of your interest.  By default, all
> > ports are only accessible to the root user.
> 
> We should also prevent root user writing message to this channel if it is
> used for panicked notification.
> 
Why? Root user can also call panic hypercall if he wishes so.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 12:52 PM, Wen Congyang wrote:
> > 
> >> If so, is this channel visible to guest userspace? If the channle is 
> >> visible to guest
> >> userspace, the program running in userspace may write the same message to 
> >> the channel.
> > 
> > Access control is via permissions.  You can have udev scripts assign
> > whatever uid and gid to the port of your interest.  By default, all
> > ports are only accessible to the root user.
>
> We should also prevent root user writing message to this channel if it is
> used for panicked notification.
>

Why?  root can easily cause a panic.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:52 PM, Gleb Natapov Wrote:
> On Wed, Mar 14, 2012 at 06:52:07PM +0800, Wen Congyang wrote:
>> At 03/14/2012 06:37 PM, Amit Shah Wrote:
>>> On (Wed) 14 Mar 2012 [17:53:00], Wen Congyang wrote:
 At 03/14/2012 05:24 PM, Avi Kivity Wrote:
> On 03/14/2012 10:29 AM, Wen Congyang wrote:
>> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
>>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
 On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>> Do you have any other comments about this patch?
>>
>
> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> likely to only be used by Linux, which has kexec facilities, and you 
> can
> put talk to management via virtio-serial and describe the crash in 
> more
> details than a simple hypercall.

 As mentioned before, I don't think virtio-serial is a good fit for 
 this.
 We want something that is simple & guaranteed always available. Using
 virtio-serial requires significant setup work on both the host and 
 guest.
>>>
>>> So what?  It needs to be done anyway for the guest agent.
>>>
 Many management application won't know to make a vioserial device 
 available
 to all guests they create. 
>>>
>>> Then they won't know to deal with the panic event either.
>>>
 Most administrators won't even configure kexec,
 let alone virtio serial on top of it. 
>>>
>>> It should be done by the OS vendor, not the individual admin.
>>>
 The hypercall requires zero host
 side config, and zero guest side config, which IMHO is what we need for
 this feature.
>>>
>>> If it was this one feature, yes.  But we keep getting more and more
>>> features like that and we bloat the hypervisor.  There's a reason we
>>> have a host-to-guest channel, we should use it.
>>>
>>
>> I donot know how to use virtio-serial.
>
> I don't either, copying Amit.
>
>> I start vm like this:
>> qemu ...\
>>-device virtio-serial \
>>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>>   -device virtserialport,chardev=foo,name=port1 ...
>>
>> You said that there are too many channels. Does it mean /tmp/foo is a 
>> channel?
>
> Probably.

 Hmm, if we use virtio-serial, the guest kernel writes something into the 
 channel when
 the os is panicked. Is it right?
>>>
>>> Depends on how you want to use it.  It could be the kernel, or it
>>> could be a userspace program which monitors syslogs for panic
>>> information and passes on that info to the virtio-serial channel.
>>
>> When the kernel is panicked, we cannot use userspace program.
>>
>>>
 If so, is this channel visible to guest userspace? If the channle is 
 visible to guest
 userspace, the program running in userspace may write the same message to 
 the channel.
>>>
>>> Access control is via permissions.  You can have udev scripts assign
>>> whatever uid and gid to the port of your interest.  By default, all
>>> ports are only accessible to the root user.
>>
>> We should also prevent root user writing message to this channel if it is
>> used for panicked notification.
>>
> Why? Root user can also call panic hypercall if he wishes so.

IIRC, the instruction vmcall needs to run on ring0. The root user is in ring3.

Thanks
Wen Congyang

> 
> --
>   Gleb.
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:52 PM, Avi Kivity Wrote:
> On 03/14/2012 12:52 PM, Wen Congyang wrote:
>>>
 If so, is this channel visible to guest userspace? If the channle is 
 visible to guest
 userspace, the program running in userspace may write the same message to 
 the channel.
>>>
>>> Access control is via permissions.  You can have udev scripts assign
>>> whatever uid and gid to the port of your interest.  By default, all
>>> ports are only accessible to the root user.
>>
>> We should also prevent root user writing message to this channel if it is
>> used for panicked notification.
>>
> 
> Why?  root can easily cause a panic.
> 

root user can write the same message to virtio-serial while the guest is 
running...

Thanks
Wen Congyang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 03:21:14PM +0530, Amit Shah wrote:
> On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
> > At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> > > On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
> > >> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> > >>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
> >  Do you have any other comments about this patch?
> > 
> > >>>
> > >>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> > >>> likely to only be used by Linux, which has kexec facilities, and you can
> > >>> put talk to management via virtio-serial and describe the crash in more
> > >>> details than a simple hypercall.
> > >>
> > >> As mentioned before, I don't think virtio-serial is a good fit for this.
> > >> We want something that is simple & guaranteed always available. Using
> > >> virtio-serial requires significant setup work on both the host and guest.
> > > 
> > > So what?  It needs to be done anyway for the guest agent.
> > > 
> > >> Many management application won't know to make a vioserial device 
> > >> available
> > >> to all guests they create. 
> > > 
> > > Then they won't know to deal with the panic event either.
> > > 
> > >> Most administrators won't even configure kexec,
> > >> let alone virtio serial on top of it. 
> > > 
> > > It should be done by the OS vendor, not the individual admin.
> > > 
> > >> The hypercall requires zero host
> > >> side config, and zero guest side config, which IMHO is what we need for
> > >> this feature.
> > > 
> > > If it was this one feature, yes.  But we keep getting more and more
> > > features like that and we bloat the hypervisor.  There's a reason we
> > > have a host-to-guest channel, we should use it.
> > > 
> > 
> > I donot know how to use virtio-serial.
> > 
> > I start vm like this:
> > qemu ...\
> >-device virtio-serial \
> >   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
> >   -device virtserialport,chardev=foo,name=port1 ...
> 
> This is sufficient.  On the host, you can open /tmp/foo using a custom
> program or nc (nc -U /tmp/foo).  On the guest, you can just open
> /dev/virtio-ports/port1 and read/write into it.
> 
> See the following links for more details.
> 
> https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test
> http://www.linux-kvm.org/page/Virtio-serial_API
> 
> > You said that there are too many channels. Does it mean /tmp/foo is a 
> > channel?
> 
> You can have several such -device virtserialport.  The -device part
> describes what the guest will see.  The -chardev part ties that to the
> host-side part of the channel.
> 
> /tmp/foo is the host end-point for the channel, in the example above,
> and /dev/virtio-ports/port1 is the guest-side end-point.

If we do choose to use virtio-serial for panics (which I don't think
we should), then we should not expose it in the host filesystem. The
host side should be a virtual chardev backend internal to QEMU, in
the same way that 'spicevmc' is handled.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Amit Shah
On (Wed) 14 Mar 2012 [18:52:07], Wen Congyang wrote:
> At 03/14/2012 06:37 PM, Amit Shah Wrote:
> > On (Wed) 14 Mar 2012 [17:53:00], Wen Congyang wrote:
> >> At 03/14/2012 05:24 PM, Avi Kivity Wrote:
> >>> On 03/14/2012 10:29 AM, Wen Congyang wrote:
>  At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> > On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
> >> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> >>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>  Do you have any other comments about this patch?
> 
> >>>
> >>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
> >>> likely to only be used by Linux, which has kexec facilities, and you 
> >>> can
> >>> put talk to management via virtio-serial and describe the crash in 
> >>> more
> >>> details than a simple hypercall.
> >>
> >> As mentioned before, I don't think virtio-serial is a good fit for 
> >> this.
> >> We want something that is simple & guaranteed always available. Using
> >> virtio-serial requires significant setup work on both the host and 
> >> guest.
> >
> > So what?  It needs to be done anyway for the guest agent.
> >
> >> Many management application won't know to make a vioserial device 
> >> available
> >> to all guests they create. 
> >
> > Then they won't know to deal with the panic event either.
> >
> >> Most administrators won't even configure kexec,
> >> let alone virtio serial on top of it. 
> >
> > It should be done by the OS vendor, not the individual admin.
> >
> >> The hypercall requires zero host
> >> side config, and zero guest side config, which IMHO is what we need for
> >> this feature.
> >
> > If it was this one feature, yes.  But we keep getting more and more
> > features like that and we bloat the hypervisor.  There's a reason we
> > have a host-to-guest channel, we should use it.
> >
> 
>  I donot know how to use virtio-serial.
> >>>
> >>> I don't either, copying Amit.
> >>>
>  I start vm like this:
>  qemu ...\
> -device virtio-serial \
>    -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>    -device virtserialport,chardev=foo,name=port1 ...
> 
>  You said that there are too many channels. Does it mean /tmp/foo is a 
>  channel?
> >>>
> >>> Probably.
> >>
> >> Hmm, if we use virtio-serial, the guest kernel writes something into the 
> >> channel when
> >> the os is panicked. Is it right?
> > 
> > Depends on how you want to use it.  It could be the kernel, or it
> > could be a userspace program which monitors syslogs for panic
> > information and passes on that info to the virtio-serial channel.
> 
> When the kernel is panicked, we cannot use userspace program.
> 
> > 
> >> If so, is this channel visible to guest userspace? If the channle is 
> >> visible to guest
> >> userspace, the program running in userspace may write the same message to 
> >> the channel.
> > 
> > Access control is via permissions.  You can have udev scripts assign
> > whatever uid and gid to the port of your interest.  By default, all
> > ports are only accessible to the root user.
> 
> We should also prevent root user writing message to this channel if it is
> used for panicked notification.

Well, if you want a particular channel to be write-only by the kernel,
that can be arranged as well: just don't expose it in /dev/ at all.

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 06:57:59PM +0800, Wen Congyang wrote:
> At 03/14/2012 06:52 PM, Gleb Natapov Wrote:
> > On Wed, Mar 14, 2012 at 06:52:07PM +0800, Wen Congyang wrote:
> >> At 03/14/2012 06:37 PM, Amit Shah Wrote:
> >>> On (Wed) 14 Mar 2012 [17:53:00], Wen Congyang wrote:
>  At 03/14/2012 05:24 PM, Avi Kivity Wrote:
> > On 03/14/2012 10:29 AM, Wen Congyang wrote:
> >> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> >>> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>  On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
> > On 03/12/2012 11:04 AM, Wen Congyang wrote:
> >> Do you have any other comments about this patch?
> >>
> >
> > Not really, but I'm not 100% convinced the patch is worthwhile.  
> > It's
> > likely to only be used by Linux, which has kexec facilities, and 
> > you can
> > put talk to management via virtio-serial and describe the crash in 
> > more
> > details than a simple hypercall.
> 
>  As mentioned before, I don't think virtio-serial is a good fit for 
>  this.
>  We want something that is simple & guaranteed always available. Using
>  virtio-serial requires significant setup work on both the host and 
>  guest.
> >>>
> >>> So what?  It needs to be done anyway for the guest agent.
> >>>
>  Many management application won't know to make a vioserial device 
>  available
>  to all guests they create. 
> >>>
> >>> Then they won't know to deal with the panic event either.
> >>>
>  Most administrators won't even configure kexec,
>  let alone virtio serial on top of it. 
> >>>
> >>> It should be done by the OS vendor, not the individual admin.
> >>>
>  The hypercall requires zero host
>  side config, and zero guest side config, which IMHO is what we need 
>  for
>  this feature.
> >>>
> >>> If it was this one feature, yes.  But we keep getting more and more
> >>> features like that and we bloat the hypervisor.  There's a reason we
> >>> have a host-to-guest channel, we should use it.
> >>>
> >>
> >> I donot know how to use virtio-serial.
> >
> > I don't either, copying Amit.
> >
> >> I start vm like this:
> >> qemu ...\
> >>-device virtio-serial \
> >>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
> >>   -device virtserialport,chardev=foo,name=port1 ...
> >>
> >> You said that there are too many channels. Does it mean /tmp/foo is a 
> >> channel?
> >
> > Probably.
> 
>  Hmm, if we use virtio-serial, the guest kernel writes something into the 
>  channel when
>  the os is panicked. Is it right?
> >>>
> >>> Depends on how you want to use it.  It could be the kernel, or it
> >>> could be a userspace program which monitors syslogs for panic
> >>> information and passes on that info to the virtio-serial channel.
> >>
> >> When the kernel is panicked, we cannot use userspace program.
> >>
> >>>
>  If so, is this channel visible to guest userspace? If the channle is 
>  visible to guest
>  userspace, the program running in userspace may write the same message 
>  to the channel.
> >>>
> >>> Access control is via permissions.  You can have udev scripts assign
> >>> whatever uid and gid to the port of your interest.  By default, all
> >>> ports are only accessible to the root user.
> >>
> >> We should also prevent root user writing message to this channel if it is
> >> used for panicked notification.
> >>
> > Why? Root user can also call panic hypercall if he wishes so.
> 
> IIRC, the instruction vmcall needs to run on ring0. The root user is in ring3.
> 
And who will stop the root from loading kernel module?
 
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
> At 03/14/2012 06:52 PM, Avi Kivity Wrote:
> > On 03/14/2012 12:52 PM, Wen Congyang wrote:
> >>>
>  If so, is this channel visible to guest userspace? If the channle is 
>  visible to guest
>  userspace, the program running in userspace may write the same message 
>  to the channel.
> >>>
> >>> Access control is via permissions.  You can have udev scripts assign
> >>> whatever uid and gid to the port of your interest.  By default, all
> >>> ports are only accessible to the root user.
> >>
> >> We should also prevent root user writing message to this channel if it is
> >> used for panicked notification.
> >>
> > 
> > Why?  root can easily cause a panic.
> > 
> 
> root user can write the same message to virtio-serial while the guest is 
> running...

Unless you are running a MAC policy which strictly confines the root
account, root can cause a kernel panic regardless of virtio-serial
permissions in the guest:

  echo c > /proc/sysrq-trigger

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
> At 03/14/2012 06:52 PM, Avi Kivity Wrote:
> > On 03/14/2012 12:52 PM, Wen Congyang wrote:
> >>>
>  If so, is this channel visible to guest userspace? If the channle is 
>  visible to guest
>  userspace, the program running in userspace may write the same message 
>  to the channel.
> >>>
> >>> Access control is via permissions.  You can have udev scripts assign
> >>> whatever uid and gid to the port of your interest.  By default, all
> >>> ports are only accessible to the root user.
> >>
> >> We should also prevent root user writing message to this channel if it is
> >> used for panicked notification.
> >>
> > 
> > Why?  root can easily cause a panic.
> > 
> 
> root user can write the same message to virtio-serial while the guest is 
> running...
> 
So what?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:57 PM, Daniel P. Berrange Wrote:
> On Wed, Mar 14, 2012 at 03:21:14PM +0530, Amit Shah wrote:
>> On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
>>> At 03/13/2012 06:47 PM, Avi Kivity Wrote:
 On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
>>> Do you have any other comments about this patch?
>>>
>>
>> Not really, but I'm not 100% convinced the patch is worthwhile.  It's
>> likely to only be used by Linux, which has kexec facilities, and you can
>> put talk to management via virtio-serial and describe the crash in more
>> details than a simple hypercall.
>
> As mentioned before, I don't think virtio-serial is a good fit for this.
> We want something that is simple & guaranteed always available. Using
> virtio-serial requires significant setup work on both the host and guest.

 So what?  It needs to be done anyway for the guest agent.

> Many management application won't know to make a vioserial device 
> available
> to all guests they create. 

 Then they won't know to deal with the panic event either.

> Most administrators won't even configure kexec,
> let alone virtio serial on top of it. 

 It should be done by the OS vendor, not the individual admin.

> The hypercall requires zero host
> side config, and zero guest side config, which IMHO is what we need for
> this feature.

 If it was this one feature, yes.  But we keep getting more and more
 features like that and we bloat the hypervisor.  There's a reason we
 have a host-to-guest channel, we should use it.

>>>
>>> I donot know how to use virtio-serial.
>>>
>>> I start vm like this:
>>> qemu ...\
>>>-device virtio-serial \
>>>   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
>>>   -device virtserialport,chardev=foo,name=port1 ...
>>
>> This is sufficient.  On the host, you can open /tmp/foo using a custom
>> program or nc (nc -U /tmp/foo).  On the guest, you can just open
>> /dev/virtio-ports/port1 and read/write into it.
>>
>> See the following links for more details.
>>
>> https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test
>> http://www.linux-kvm.org/page/Virtio-serial_API
>>
>>> You said that there are too many channels. Does it mean /tmp/foo is a 
>>> channel?
>>
>> You can have several such -device virtserialport.  The -device part
>> describes what the guest will see.  The -chardev part ties that to the
>> host-side part of the channel.
>>
>> /tmp/foo is the host end-point for the channel, in the example above,
>> and /dev/virtio-ports/port1 is the guest-side end-point.
> 
> If we do choose to use virtio-serial for panics (which I don't think
> we should), then we should not expose it in the host filesystem. The
> host side should be a virtual chardev backend internal to QEMU, in
> the same way that 'spicevmc' is handled.

Yes. But we don't decide to choose which now. I prefer to use vmcall.
It is simple and more reliable.

Thanks
Wen Congyang

> 
> Regards,
> Daniel

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: Avoid checking huge page mappings in get_dirty_log()

2012-03-14 Thread Marcelo Tosatti
On Wed, Mar 14, 2012 at 02:34:32PM +0900, Takuya Yoshikawa wrote:
> Marcelo Tosatti  wrote:
> 
> > guest fault enable dirty logging
> > 
> > tdp_page_fault (all _page_fault functions)  kvm_set_memory_region
> > 
> > 
> > level = mapping_level(vcpu, gfn)
> > (finds level == 2 or 3)
> > 
> > 
> > rcu_assign_pointer(slot
> >with 
> >
> > ->dirty_bitmap)
> > synchronize_srcu_expedited()
> 
> Isn't here still in the SRCU read-side critical section?
> 
> > schedule()
> > 
> > kvm_arch_commit_memory_region()
> > spin_lock(mmu_lock)
> > 
> > kvm_mmu_slot_remove_write_access()
> > removes large sptes
> > spin_unlock(mmu_lock)
> > spin_lock(mmu_lock)
> > create large spte accordingly
> > to level above
> > spin_unlock(mmu_lock)
> > 
> 
> If so, we cannot start kvm_arch_commit_memory_region() until the completion
> of the SRCU critical section and this race will not happen.
> 
>   Takuya

Correct.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: fix kvm_write_tsc() TSC matching thinko

2012-03-14 Thread Jan Kiszka
On 2012-03-08 22:46, Marcelo Tosatti wrote:
> 
> kvm_write_tsc() converts from guest TSC to microseconds, not nanoseconds
> as intended. The result is that the window for matching is 1000 seconds,
> not 1 second.
> 
> Microsecond precision is enough for checking whether the TSC write delta
> is within the heuristic values, so use it instead of nanoseconds.
> 
> Noted by Avi Kivity.
> 
> Signed-off-by: Marcelo Tosatti 
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 32096cf..9732e98 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1025,7 +1025,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>   struct kvm *kvm = vcpu->kvm;
>   u64 offset, ns, elapsed;
>   unsigned long flags;
> - s64 nsdiff;
> + s64 usdiff;
>  
>   raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
>   offset = kvm_x86_ops->compute_tsc_offset(vcpu, data);
> @@ -1033,18 +1033,18 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>   elapsed = ns - kvm->arch.last_tsc_nsec;
>  
>   /* n.b - signed multiplication and division required */
> - nsdiff = data - kvm->arch.last_tsc_write;
> + usdiff = data - kvm->arch.last_tsc_write;
>  #ifdef CONFIG_X86_64
> - nsdiff = (nsdiff * 1000) / vcpu->arch.virtual_tsc_khz;
> + usdiff = (usdiff * 1000) / vcpu->arch.virtual_tsc_khz;
>  #else
>   /* do_div() only does unsigned */
>   asm("idivl %2; xor %%edx, %%edx"
> - : "=A"(nsdiff)
> - : "A"(nsdiff * 1000), "rm"(vcpu->arch.virtual_tsc_khz));
> + : "=A"(usdiff)
> + : "A"(usdiff * 1000), "rm"(vcpu->arch.virtual_tsc_khz));
>  #endif
> - nsdiff -= elapsed;
> - if (nsdiff < 0)
> - nsdiff = -nsdiff;
> + usdiff -= elapsed / 1000;

do_div, to please x86-32.

Jan

> + if (usdiff < 0)
> + usdiff = -usdiff;
>  
>   /*
>* Special case: TSC write with a small delta (1 second) of virtual
> @@ -1056,7 +1056,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>* compensation code attempt to catch up if we fall behind, but
>* it's better to try to match offsets from the beginning.
>   */
> - if (nsdiff < NSEC_PER_SEC &&
> + if (usdiff < USEC_PER_SEC &&
>   vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
>   if (!check_tsc_unstable()) {
>   offset = kvm->arch.cur_tsc_offset;
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:59 PM, Daniel P. Berrange Wrote:
> On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
>> At 03/14/2012 06:52 PM, Avi Kivity Wrote:
>>> On 03/14/2012 12:52 PM, Wen Congyang wrote:
>
>> If so, is this channel visible to guest userspace? If the channle is 
>> visible to guest
>> userspace, the program running in userspace may write the same message 
>> to the channel.
>
> Access control is via permissions.  You can have udev scripts assign
> whatever uid and gid to the port of your interest.  By default, all
> ports are only accessible to the root user.

 We should also prevent root user writing message to this channel if it is
 used for panicked notification.

>>>
>>> Why?  root can easily cause a panic.
>>>
>>
>> root user can write the same message to virtio-serial while the guest is 
>> running...
> 
> Unless you are running a MAC policy which strictly confines the root
> account, root can cause a kernel panic regardless of virtio-serial
> permissions in the guest:
> 
>   echo c > /proc/sysrq-trigger

Yes, root user can cause a kernel panic. But if he writes the same message to 
virtio-serial,
the host will see the guest is panicked while the guest is not panicked. The 
host is cheated.

If we use vmcall, and the user causes a kernel panic, we can also know the 
guest is panicked.
It is the thing what we need. We need to know the guest is panicked, and we 
donot aware
why it is panicked. If the guest is not panicked, and the host think the guest 
is panicked, it
is not the thing we need.

Thanks

> 
> Regards,
> Daniel

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:48 PM, Avi Kivity Wrote:
> On 03/14/2012 12:46 PM, Gleb Natapov wrote:
>> On Wed, Mar 14, 2012 at 12:29:57PM +0200, Avi Kivity wrote:
>>> On 03/14/2012 12:26 PM, Wen Congyang wrote:
>> If so, is this channel visible to guest userspace? If the channle is 
>> visible to guest
>> userspace, the program running in userspace may write the same message 
>> to the channel.
>>
>
> Surely there's some kind of access control on channels.

 The virtio-serial depends on more things than touching the hypervisor. So 
 I think touching
 the hypervisor is more reliable than using virtio-serial device, and it is 
 very simple and
 easy to use.

 If we pass something from guest userspace to host, we can use 
 virtio-serial. But If we pass
 something from guest kernelspace to host, I still prefer to touch the 
 hypervisor.
>>>
>>> There's no argument that it's easier.  My concern is different, we're
>>> adding more and more stuff to the hypervisor because it's easier, which
>>> bloats it.  Every time we do it we add to compatibility and security
>>> problems.
>>>
>>> The panic notification is *really* simple, so I don't expect it to cause
>>> a lot of problems.  But still, if it's possible not to change the
>>> hypervisor, we must make an effort in that direction.
>>>
>> One more point against using virtio-serial is that it will be likely
>> compiled as a module which means panic during early boot will not be
>> reported.
> 
> I don't think we want to use the driver.  Instead, have a small piece of
> code that resets the device and pushes out a string (the panic message?)
> without any interrupts etc.
> 
> It's still going to be less reliable than a hypercall, I agree.

Do you still want to use complicated and less reliable way?

I think the other ones prefer to touch the hypervisor.

Thanks
Wen Congyang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 07:06:50PM +0800, Wen Congyang wrote:
> At 03/14/2012 06:59 PM, Daniel P. Berrange Wrote:
> > On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
> >> At 03/14/2012 06:52 PM, Avi Kivity Wrote:
> >>> On 03/14/2012 12:52 PM, Wen Congyang wrote:
> >
> >> If so, is this channel visible to guest userspace? If the channle is 
> >> visible to guest
> >> userspace, the program running in userspace may write the same message 
> >> to the channel.
> >
> > Access control is via permissions.  You can have udev scripts assign
> > whatever uid and gid to the port of your interest.  By default, all
> > ports are only accessible to the root user.
> 
>  We should also prevent root user writing message to this channel if it is
>  used for panicked notification.
> 
> >>>
> >>> Why?  root can easily cause a panic.
> >>>
> >>
> >> root user can write the same message to virtio-serial while the guest is 
> >> running...
> > 
> > Unless you are running a MAC policy which strictly confines the root
> > account, root can cause a kernel panic regardless of virtio-serial
> > permissions in the guest:
> > 
> >   echo c > /proc/sysrq-trigger
> 
> Yes, root user can cause a kernel panic. But if he writes the same message to 
> virtio-serial,
> the host will see the guest is panicked while the guest is not panicked. The 
> host is cheated.
> 
And why is this a problem? If root in a guest wants to cheat host like
that there is no way to stop him. He can load kernel module and do
whatever he wants. Management should treat that condition as if guest
panicked.

> If we use vmcall, and the user causes a kernel panic, we can also know the 
> guest is panicked.
> It is the thing what we need. We need to know the guest is panicked, and we 
> donot aware
> why it is panicked. If the guest is not panicked, and the host think the 
> guest is panicked, it
> is not the thing we need.
> 
Then you cannot get the thing you need and you can as well stop trying.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Wen Congyang
At 03/14/2012 06:58 PM, Gleb Natapov Wrote:
> On Wed, Mar 14, 2012 at 06:57:59PM +0800, Wen Congyang wrote:
>> At 03/14/2012 06:52 PM, Gleb Natapov Wrote:
>>> On Wed, Mar 14, 2012 at 06:52:07PM +0800, Wen Congyang wrote:
 At 03/14/2012 06:37 PM, Amit Shah Wrote:
> On (Wed) 14 Mar 2012 [17:53:00], Wen Congyang wrote:
>> At 03/14/2012 05:24 PM, Avi Kivity Wrote:
>>> On 03/14/2012 10:29 AM, Wen Congyang wrote:
 At 03/13/2012 06:47 PM, Avi Kivity Wrote:
> On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
>> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
>>> On 03/12/2012 11:04 AM, Wen Congyang wrote:
 Do you have any other comments about this patch?

>>>
>>> Not really, but I'm not 100% convinced the patch is worthwhile.  
>>> It's
>>> likely to only be used by Linux, which has kexec facilities, and 
>>> you can
>>> put talk to management via virtio-serial and describe the crash in 
>>> more
>>> details than a simple hypercall.
>>
>> As mentioned before, I don't think virtio-serial is a good fit for 
>> this.
>> We want something that is simple & guaranteed always available. Using
>> virtio-serial requires significant setup work on both the host and 
>> guest.
>
> So what?  It needs to be done anyway for the guest agent.
>
>> Many management application won't know to make a vioserial device 
>> available
>> to all guests they create. 
>
> Then they won't know to deal with the panic event either.
>
>> Most administrators won't even configure kexec,
>> let alone virtio serial on top of it. 
>
> It should be done by the OS vendor, not the individual admin.
>
>> The hypercall requires zero host
>> side config, and zero guest side config, which IMHO is what we need 
>> for
>> this feature.
>
> If it was this one feature, yes.  But we keep getting more and more
> features like that and we bloat the hypervisor.  There's a reason we
> have a host-to-guest channel, we should use it.
>

 I donot know how to use virtio-serial.
>>>
>>> I don't either, copying Amit.
>>>
 I start vm like this:
 qemu ...\
-device virtio-serial \
   -chardev socket,path=/tmp/foo,server,nowait,id=foo \
   -device virtserialport,chardev=foo,name=port1 ...

 You said that there are too many channels. Does it mean /tmp/foo is a 
 channel?
>>>
>>> Probably.
>>
>> Hmm, if we use virtio-serial, the guest kernel writes something into the 
>> channel when
>> the os is panicked. Is it right?
>
> Depends on how you want to use it.  It could be the kernel, or it
> could be a userspace program which monitors syslogs for panic
> information and passes on that info to the virtio-serial channel.

 When the kernel is panicked, we cannot use userspace program.

>
>> If so, is this channel visible to guest userspace? If the channle is 
>> visible to guest
>> userspace, the program running in userspace may write the same message 
>> to the channel.
>
> Access control is via permissions.  You can have udev scripts assign
> whatever uid and gid to the port of your interest.  By default, all
> ports are only accessible to the root user.

 We should also prevent root user writing message to this channel if it is
 used for panicked notification.

>>> Why? Root user can also call panic hypercall if he wishes so.
>>
>> IIRC, the instruction vmcall needs to run on ring0. The root user is in 
>> ring3.
>>
> And who will stop the root from loading kernel module?

Yes, I forgot this.

Thanks
Wen Congyang

>  
> --
>   Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 07:06:50PM +0800, Wen Congyang wrote:
> At 03/14/2012 06:59 PM, Daniel P. Berrange Wrote:
> > On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
> >> At 03/14/2012 06:52 PM, Avi Kivity Wrote:
> >>> On 03/14/2012 12:52 PM, Wen Congyang wrote:
> >
> >> If so, is this channel visible to guest userspace? If the channle is 
> >> visible to guest
> >> userspace, the program running in userspace may write the same message 
> >> to the channel.
> >
> > Access control is via permissions.  You can have udev scripts assign
> > whatever uid and gid to the port of your interest.  By default, all
> > ports are only accessible to the root user.
> 
>  We should also prevent root user writing message to this channel if it is
>  used for panicked notification.
> 
> >>>
> >>> Why?  root can easily cause a panic.
> >>>
> >>
> >> root user can write the same message to virtio-serial while the guest is 
> >> running...
> > 
> > Unless you are running a MAC policy which strictly confines the root
> > account, root can cause a kernel panic regardless of virtio-serial
> > permissions in the guest:
> > 
> >   echo c > /proc/sysrq-trigger
> 
> Yes, root user can cause a kernel panic. But if he writes the same message to 
> virtio-serial,
> the host will see the guest is panicked while the guest is not panicked. The 
> host is cheated.

The host mgmt layer must *ALWAYS* expect that any information originating
from the guest is bogus. It must never trust the guest info. So regardless
of the implementation, you have to expect that the guest might have lied
to you about it being crashed. The same is true even of Xen's panic notifier.

So if an application is automatically triggering core dumps based on this
panic notification, it needs to be aware that the guest can lie and take
steps to avoid the guest causing a DOS attack on the host. Most likely
by rate limiting the frequency of core dumps per guest, and/or setting a
max core dump storage quota per guest.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Kevin Wolf
Am 14.03.2012 08:51, schrieb Amos Kong:
> On 14/03/12 15:27, Paolo Bonzini wrote:
>>
> 
> Hi Paolo,
> 
>> Il 14/03/2012 08:14, Orit Wasserman ha scritto:
>>> if (bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr))<  0)
>>> {   
>>>  closesocket(*fd);
>>>  return -socket_error();
>>> }
>>> return 0;
>>>
>>> and than you will not need ret
>>
>> But closesocket could clobber socket_error(), no?
> 
> Yes, it will effect socket_error()
> 
> How about this fix ?
> 
>  ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
>  if (ret < 0) {
>  ret = -socket_error();
>  closesocket(*fd);
>  }
>  return ret;
> }
> 

But it's still moved (or in this patch copied) code, right?

If so, please move it in one patch, and then fix it in another one on top.

Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] virtio-pci: fix abort when fail to allocate ioeventfd

2012-03-14 Thread Stefan Hajnoczi
On Wed, Mar 14, 2012 at 10:46 AM, Avi Kivity  wrote:
> On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote:
>> On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity  wrote:
>> > On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
>> >> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity  wrote:
>> >> > On 03/13/2012 12:42 PM, Amos Kong wrote:
>> >> >> Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
>> >> >> allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
>> >> >> and check if available ioeventfd exists. If not, virtio-pci will
>> >> >> fallback to userspace, and don't use ioeventfd for io notification.
>> >> >
>> >> > How about an alternative way of solving this, within the memory core:
>> >> > trap those writes in qemu and write to the ioeventfd yourself.  This way
>> >> > ioeventfds work even without kvm:
>> >> >
>> >> >
>> >> >  core: create eventfd
>> >> >  core: install handler for memory address that writes to ioeventfd
>> >> >  kvm (optional): install kernel handler for ioeventfd
>> >> >
>> >> > even if the third step fails, the ioeventfd still works, it's just 
>> >> > slower.
>> >>
>> >> That approach will penalize guests with large numbers of disks - they
>> >> see an extra switch to vcpu thread instead of kvm.ko -> iothread.
>> >
>> > It's only a failure path.  The normal path is expected to have a kvm
>> > ioeventfd installed.
>>
>> It's the normal path when you attach >232 virtio-blk devices to a
>> guest (or 300 in the future).
>
> Well, there's nothing we can do about it.
>
> We'll increase the limit of course, but old kernels will remain out
> there.  The right fix is virtio-scsi anyway.
>
>> >>   It
>> >> seems okay provided we can solve the limit in the kernel once and for
>> >> all by introducing a more dynamic data structure for in-kernel
>> >> devices.  That way future kernels will never hit an arbitrary limit
>> >> below their file descriptor rlimit.
>> >>
>> >> Is there some reason why kvm.ko must use a fixed size array?  Would it
>> >> be possible to use a tree (maybe with a cache for recent lookups)?
>> >
>> > It does use bsearch today IIRC.  We'll expand the limit, but there must
>> > be a limit, and qemu must be prepared to deal with it.
>>
>> Shouldn't the limit be the file descriptor rlimit?  If userspace
>> cannot create more eventfds then it cannot set up more ioeventfds.
>
> You can use the same eventfd for multiple ioeventfds.  If you mean to
> slave kvm's ioeventfd limit to the number of files the process can have,
> that's a good idea.  Surely an ioeventfd occupies less resources than an
> open file.

Yes.

Ultimately I guess you're right in that we still need to have an error
path and virtio-scsi will reduce the pressure on I/O eventfds for
storage.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


User question: balloon memory, currentMemory and Linux/FreeBSD guests

2012-03-14 Thread Andy Smith

Hi,

  I'm a KVM user, using it for the first time and so far very happy 
with it.
I wonder if someone can help me get my head round the issue of balloon 
memory and overallocation.


Basically I have a Debian wheezy/sid KVM server, on which I have a 
couple of Linux guests and several FreeBSD guests and I see different 
behaviour between the two guest types. On FreeBSD I have installed the 
virtio and balloon drivers (disk and network working great!), there is 
definately a balloon process running.
The behaviour I see is that both Linux and FreeBSD guests always show 
the "currentMemory" number when quiered with "qemu-monitor-command --hmp 
ClientName --cmd 'info balloon'". But from a "top" in the debian KVM 
host the "RES" size of the Linux guest processes varies (presumably 
depending on the demand of the guest OS) but the FreeBSD kvm processes 
always sit at exactly the "currentMemory" size.


Basically I'm interested to know what is going on, whether what I see 
is normal and whether balloon currently dynamically manages memory usage 
or if its still the case that it requires administrator intervention to 
re-release memory after the OS has previously been allocated it.


If there's a more appropriate place to ask this type of question please 
let me know,


thanks a lot! Andy.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 01:11 PM, Wen Congyang wrote:
> > 
> > I don't think we want to use the driver.  Instead, have a small piece of
> > code that resets the device and pushes out a string (the panic message?)
> > without any interrupts etc.
> > 
> > It's still going to be less reliable than a hypercall, I agree.
>
> Do you still want to use complicated and less reliable way?

Are you willing to try it out and see how complicated it really is?

While it's more complicated, it's also more flexible.  You can
communicate the panic message, whether the guest is attempting a kdump
and its own recovery or whether it wants the host to do it, etc., you
can communicate less severe failures like oopses.

> I think the other ones prefer to touch the hypervisor.

I understand the sentiment.  Your patches are simple and easy.  But my
feeling is that the kernel has become too complicated already and I'm
looking for ways to limit changes.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 03:07 PM, Avi Kivity wrote:
> On 03/14/2012 01:11 PM, Wen Congyang wrote:
> > > 
> > > I don't think we want to use the driver.  Instead, have a small piece of
> > > code that resets the device and pushes out a string (the panic message?)
> > > without any interrupts etc.
> > > 
> > > It's still going to be less reliable than a hypercall, I agree.
> >
> > Do you still want to use complicated and less reliable way?
>
> Are you willing to try it out and see how complicated it really is?
>
> While it's more complicated, it's also more flexible.  You can
> communicate the panic message, whether the guest is attempting a kdump
> and its own recovery or whether it wants the host to do it, etc., you
> can communicate less severe failures like oopses.

Note, this is similar to how network drivers have a special path (no
interrupts) for netconsole output, this is used during panic as well.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 03:07:46PM +0200, Avi Kivity wrote:
> On 03/14/2012 01:11 PM, Wen Congyang wrote:
> > > 
> > > I don't think we want to use the driver.  Instead, have a small piece of
> > > code that resets the device and pushes out a string (the panic message?)
> > > without any interrupts etc.
> > > 
> > > It's still going to be less reliable than a hypercall, I agree.
> >
> > Do you still want to use complicated and less reliable way?
> 
> Are you willing to try it out and see how complicated it really is?
> 
> While it's more complicated, it's also more flexible.  You can
> communicate the panic message, whether the guest is attempting a kdump
> and its own recovery or whether it wants the host to do it, etc., you
> can communicate less severe failures like oopses.
> 
hypercall can take arguments to achieve the same.

> > I think the other ones prefer to touch the hypervisor.
> 
> I understand the sentiment.  Your patches are simple and easy.  But my
> feeling is that the kernel has become too complicated already and I'm
> looking for ways to limit changes.
> 
Using virtio-serial will not reduce kernel complexity. Quite contrary
since code that will use virtio-serial will be more complicated.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Avi Kivity
On 03/14/2012 03:14 PM, Gleb Natapov wrote:
> On Wed, Mar 14, 2012 at 03:07:46PM +0200, Avi Kivity wrote:
> > On 03/14/2012 01:11 PM, Wen Congyang wrote:
> > > > 
> > > > I don't think we want to use the driver.  Instead, have a small piece of
> > > > code that resets the device and pushes out a string (the panic message?)
> > > > without any interrupts etc.
> > > > 
> > > > It's still going to be less reliable than a hypercall, I agree.
> > >
> > > Do you still want to use complicated and less reliable way?
> > 
> > Are you willing to try it out and see how complicated it really is?
> > 
> > While it's more complicated, it's also more flexible.  You can
> > communicate the panic message, whether the guest is attempting a kdump
> > and its own recovery or whether it wants the host to do it, etc., you
> > can communicate less severe failures like oopses.
> > 
> hypercall can take arguments to achieve the same.

It has to be designed in advance; and every time we notice something's
missing we have to update the host kernel.

> > > I think the other ones prefer to touch the hypervisor.
> > 
> > I understand the sentiment.  Your patches are simple and easy.  But my
> > feeling is that the kernel has become too complicated already and I'm
> > looking for ways to limit changes.
> > 
> Using virtio-serial will not reduce kernel complexity. Quite contrary
> since code that will use virtio-serial will be more complicated.

The host kernel is unmodified though.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Gleb Natapov
On Wed, Mar 14, 2012 at 03:16:05PM +0200, Avi Kivity wrote:
> On 03/14/2012 03:14 PM, Gleb Natapov wrote:
> > On Wed, Mar 14, 2012 at 03:07:46PM +0200, Avi Kivity wrote:
> > > On 03/14/2012 01:11 PM, Wen Congyang wrote:
> > > > > 
> > > > > I don't think we want to use the driver.  Instead, have a small piece 
> > > > > of
> > > > > code that resets the device and pushes out a string (the panic 
> > > > > message?)
> > > > > without any interrupts etc.
> > > > > 
> > > > > It's still going to be less reliable than a hypercall, I agree.
> > > >
> > > > Do you still want to use complicated and less reliable way?
> > > 
> > > Are you willing to try it out and see how complicated it really is?
> > > 
> > > While it's more complicated, it's also more flexible.  You can
> > > communicate the panic message, whether the guest is attempting a kdump
> > > and its own recovery or whether it wants the host to do it, etc., you
> > > can communicate less severe failures like oopses.
> > > 
> > hypercall can take arguments to achieve the same.
> 
> It has to be designed in advance; and every time we notice something's
> missing we have to update the host kernel.
> 

We and in the designed stage now. Not to late to design something flexible
:) Panic hypercall can take GPA of a buffer where host puts panic info
as a parameter.  This buffer can be read by QEMU and passed to management.

> > > > I think the other ones prefer to touch the hypervisor.
> > > 
> > > I understand the sentiment.  Your patches are simple and easy.  But my
> > > feeling is that the kernel has become too complicated already and I'm
> > > looking for ways to limit changes.
> > > 
> > Using virtio-serial will not reduce kernel complexity. Quite contrary
> > since code that will use virtio-serial will be more complicated.
> 
> The host kernel is unmodified though.
> 
Yes, this is trade-off between complexity in hypervisor and a guest
kernel. But in the end we use the same kernel for both.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Michael Roth
On Wed, Mar 14, 2012 at 04:33:14PM +0800, Amos Kong wrote:
> On 14/03/12 00:39, Michael Roth wrote:
> >On Wed, Mar 07, 2012 at 06:47:45AM +0800, Amos Kong wrote:
> >>Introduce tcp_server_start() by moving original code in
> >>tcp_start_incoming_migration().
> >>
> >>Signed-off-by: Amos Kong
> >>---
> >>  net.c |   28 
> >>  qemu_socket.h |2 ++
> >>  2 files changed, 30 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/net.c b/net.c
> >>index c34474f..e90ff23 100644
> >>--- a/net.c
> >>+++ b/net.c
> >>@@ -99,6 +99,34 @@ static int get_str_sep(char *buf, int buf_size, const 
> >>char **pp, int sep)
> >>  return 0;
> >>  }
> >>
> >>+int tcp_server_start(const char *str, int *fd)
> >>+{
> >>+int val, ret;
> >>+struct sockaddr_in saddr;
> >>+
> >>+if (parse_host_port(&saddr, str)<  0) {
> >>+error_report("invalid host/port combination: %s", str);
> >>+return -EINVAL;
> >>+}
> >>+
> >>+*fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> >>+if (fd<  0) {
> >>+perror("socket");
> >>+return -1;
> >>+}
> >>+socket_set_nonblock(*fd);
> >>+
> >>+/* allow fast reuse */
> >>+val = 1;
> >>+setsockopt(*fd, SOL_SOCKET, SO_REUSEADDR, (const char *)&val, 
> >>sizeof(val));
> >>+
> >>+ret = bind(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> >>+if (ret<  0) {
> >>+closesocket(*fd);
> >>+}
> >>+return ret;
> >>+}
> >>+
> >
> >I would combine this with patch 2, since it provides context for why
> >this function is being added. Would also do the same for 3 and 4.
> >
> >I see client the client implementation you need to pass fd back by
> >reference since ret can be set to EINPROGRESS/EWOULDBLOCK on success,
> 
> ret restores 0 or -socket_error()
>  success: 0, -EINPROGRESS
>  fail   : ret < 0 && ret !=-EINTR && ret != -EWOULDBLOCK
> 
> , it should be -EINPROGRESS

I see, I think I was confued by patch #4 where you do a

+ret = tcp_client_start(host_port, &s->fd);
+if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
+DPRINTF("connect in progress");
+qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);

If ret == EWOULDBLOCK is a failure (or if the call isn't supposed to
return EWOULDBLOCK), we should fail it there rather than passing it on to
tcp_wait_for_connect().

> 
> +ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> +if (ret < 0) {
> +ret = -socket_error();
> +if (ret == -EINPROGRESS) {
> +break;
> 
> 
> >but here it looks like fd is only value if ret=0, so might as well just
> >pass back the fd as the function return value.
> 
> Passed *fd tells us if socket creation success
> 'ret' would return 0 or -socket_error()
> 
> -socket_error() is the error of socket creation/connection, it would
> be used by other functions.
> 
> eg: migration-tcp.c:
> int tcp_start_incoming_migration(
> ...
> ret = tcp_server_start(host_port, &s);
> if (ret < 0) {
> fprintf(stderr, "tcp_server_start: %s\n", strerror(-ret));
> 
> We can also add a error msg in tcp_start_outgoing_migration() when
> tcp_client_start() fails.
> 
> >Also, is there any reason we can't re-use
> >qemu-sockets.c:inet_listen()/qemu-sockets.c:inet_connect()? AFAICT they
> >serve the same purpose, and already include some of the work from your
> >PATCH #6.
> 
> We could not directly use it, there are some difference,
> such as tcp_start_incoming_migration() doesn't set socket no-blocked,
> but net_socket_listen_init() sets socket no-blocked.

I think adding a common function with blocking/non-blocking flag and
having inet_listen_opts()/socket_listen_opts() call it with a wrapper
would be reasonable.

A lot of code is being introduced here to solve problems that are
already handled in qemu-sockets.c. inet_listen()/inet_connect()
already handles backeted-enclosed ipv6 addrs, getting port numbers when
there's more than one colon, getaddrinfo()-based connections, and most
importantly it's had ipv6 support from day 1.

Not 100% sure it'll work for what you're doing, but qemu-sockets.c was
specifically added for this type of use-case and is heavilly used
currently (vnc, nbd, Chardev users), so I think we should use it unless
there's a good reason not to.

> 
> There are some repeated code, so I write a tcp_common_start(), and
> use it in net_socket_listen_init()/
> net_socket_connect_init() and tcp_start_incoming_migration()/
> tcp_start_outgoing_migration()
> 
> 
> >>  int parse_host_port(struct sockaddr_in *saddr, const char *str)
> >>  {
> >>  char buf[512];
> >>diff --git a/qemu_socket.h b/qemu_socket.h
> >>index fe4cf6c..d612793 100644
> >>--- a/qemu_socket.h
> >>+++ b/qemu_socket.h
> >>@@ -54,6 +54,8 @@ int unix_listen(const char *path, char *ostr, int olen);
> >>  int unix_connect_opts(QemuOpts *opts);
> >>  int unix_connect(const char *path);
> >>
> >>+int tcp_server_start(const char *str, int *fd);
> >>+

Re: [Qemu-devel] [PATCH v3 3/9] net: introduce tcp_client_start()

2012-03-14 Thread Michael Roth
On Wed, Mar 14, 2012 at 06:19:48PM +0800, Amos Kong wrote:
> On 14/03/12 02:35, Michael Roth wrote:
> >On Wed, Mar 07, 2012 at 06:48:03AM +0800, Amos Kong wrote:
> >>Introduce tcp_client_start() by moving original code in
> >>tcp_start_outgoing_migration().
> >>
> >>Signed-off-by: Amos Kong
> >>---
> >>  net.c |   41 +
> >>  qemu_socket.h |1 +
> >>  2 files changed, 42 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/net.c b/net.c
> >>index e90ff23..9afb0d1 100644
> >>--- a/net.c
> >>+++ b/net.c
> >>@@ -127,6 +127,47 @@ int tcp_server_start(const char *str, int *fd)
> >>  return ret;
> >>  }
> >>
> >>+int tcp_client_start(const char *str, int *fd)
> >>+{
> 
> ...
> 
> Hi Michael,
> 
> 
> >>+*fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> >>+if (fd<  0) {
> >>+perror("socket");
> >>+return -1;
> >>+}
> >>+socket_set_nonblock(*fd);
> >>+
> >>+for (;;) {
> >>+ret = connect(*fd, (struct sockaddr *)&saddr, sizeof(saddr));
> >>+if (ret<  0) {
> >>+ret = -socket_error();
> >>+if (ret == -EINPROGRESS) {
> >>+break;
> >
> >The previous implementation and your next patch seem to be expecting a break 
> >on
> >-EWOULDBLOCK/-EAGAIN as well. Was the behavior changed on purpose?
> 
> In original tcp_start_outgoing_migration():
>   break:  -EINPROGRES
>   cont :  -EINTR or -EWOULDBLOCK
> 
> In original net_socket_connect_init():
>   break:  -EINPROGRES or -EWOULDBLOCK
>   cont :  -EINTR
> 
> 
> http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_15.html
> EWOULDBLOCK
> socket has nonblocking mode set, and there are no pending
> connections immediately available.
> 
> So continue to re-connect if EWOULDBLOCK or EINTR returned by
> socket_error() in tcp_client_start()
> 

That seems to be for accept(), not connect(). And in the accept()/incoming
case I don't think it's an issue to keep retrying.

On the connect()/outgoing case I think we need to be careful because we can
hang both the monitor and the guest indefinitely if there's an issue affecting
outgoing connection attempts on the source-side. It's much safer to fail in
this situation rather than loop indefinitely, and originally that's what the
code did, albeit somewhat indirectly. That behavior is changed with your
implementation:

tcp_start_outgoing_migration() originally:
...
socket_set_nonblock(s->fd);

do {
ret = connect(s->fd, (struct sockaddr *)&addr, sizeof(addr));
if (ret == -1) {
ret = -socket_error();
}
if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
return 0;
}
} while (ret == -EINTR);
...

tcp_start_output_migration() with your changes:
...
ret = tcp_client_start(host_port, &s->fd);
if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
DPRINTF("connect in progress");
qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);

tcp_client_start():

static int tcp_client_connect(int fd, struct sockaddr_in *saddr)
{
int ret;

do {
ret = connect(fd, (struct sockaddr *)saddr, sizeof(*saddr));
if (ret == -1) {
ret = -socket_error();
}
} while (ret == -EINTR || ret == -EWOULDBLOCK);

> 
> >  I'm not
> >sure what the proper handling is for -EAGAIN: whether a non-blocking 
> >connect()
> >can eventually succeed or not. I suspect that it's not, but that previously 
> >we
> >treated it synonymously with -EINPROGRESS, then eventually got an error via
> >getsockopt() before failing the migration. If so, we're now changing the
> >behavior to retry until successful, but given the man page entry I don't
> >think that's a good idea since you might block indefinitely:
> >
> >  EAGAIN No  more  free local ports or insufficient
> >   entries in the routing cache.  For AF_INET
> >   seethedescription   of
> >   /proc/sys/net/ipv4/ip_local_port_range
> >   ip(7)  for  information on how to increase
> >   the number of local ports.
> 
> 
> We didn't process EAGAIN specially, you mean EINTR ?

I was referring to the EWOULDBLOCK handling, but on linux at least,
EAGAIN == EWOULDBLOCK.

> 
> 
> >
> >>+#ifdef _WIN32
> >>+} else if (ret == -WSAEALREADY || ret == -WSAEINVAL) {
> >>+break;
> >>+#endif
> >>+} else if (ret != -EINTR&&  ret != -EWOULDBLOCK) {
> >>+perror("connect");
> >>+closesocket(*fd);
> >>+return ret;
> 
> -EAGAIN would go this path.

When EAGAIN == EWOULDBLOCK, it would loop, and I'm not aware of any hosts where
this won't be the case. BSD maybe?

> 
> 
> >>+}
> >>+} else {
> >>+break;
> >>+}
> >>+}
> >>+
> >>+retur

Re: [Qemu-devel] [PATCH v3 9/9] net: support to include ipv6 address by brackets

2012-03-14 Thread Michael Roth
On Wed, Mar 14, 2012 at 05:58:20PM +0800, Amos Kong wrote:
> On 14/03/12 03:47, Michael Roth wrote:
> >On Wed, Mar 07, 2012 at 06:48:57AM +0800, Amos Kong wrote:
> >>That method of representing an IPv6 address with a port is
> >
> >I'm not sure what "that" is referencing.
> 
> 2312::8274:5200  (5200 is a port)
> 
> >I assumed the previous patch
> >but the representation seems to be the same?
> 
> 2312::8274:5200
> 
> [2312::8274]:5200
> 
> The second one is better.

The commit message for the previous patch also uses "[2312::8274]:5200"

If that format wasn't supported till this patch you should fix up the
example in patch 8 to be "2312::8274:5200". Small nit, but it gives the
impression [host]:port was already supported in some form, which is all
the more confusing because [host]:port *is* already supported in some
form, depending on where you are in QEMU :)

> 
> >>discouraged because of its ambiguity. Referencing to RFC5952,
> >>the recommended format is:
> >>
> >>  [2312::8274]:5200
> >>
> >>For IPv6 brackets must be mandatory if you require a port.
> >>
> >>test status: Successed
> >>listen side: qemu-kvm  -incoming tcp:[2312::8274]:5200
> >>client side: qemu-kvm ...
> >>  (qemu) migrate -d tcp:[2312::8274]:5200
> >>
> >>Signed-off-by: Amos Kong
> >>---
> >>  net.c |6 ++
> >>  1 files changed, 6 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/net.c b/net.c
> >>index d6ce1fa..499ed1d 100644
> >>--- a/net.c
> >>+++ b/net.c
> >>@@ -88,6 +88,12 @@ static int get_str_sep(char *buf, int buf_size, const 
> >>char **pp, int sep)
> >>  if (!p1)
> >>  return -1;
> >>  len = p1 - p;
> >>+/* remove brackets which includes hostname */
> >>+if (*p == '['&&  *(p1-1) == ']') {
> >>+p += 1;
> >>+len -= 2;
> >>+}
> >
> >Sorry, looking again I guess net/slirp.c actually has it's own copy of
> >get_str_sep(), so modifying this doesn't look like it would break
> >anything currently.
> 
> >It might cause some confusion though :). And I think
> >the special handling for brackets should be done in
> >parse_host_port_info() since get_str_sep() is pretty generically named.
> 
> agree.
> 
> >>+
> >>  p1++;
> >>  if (buf_size>  0) {
> >>  if (len>  buf_size - 1)
> >>
> >>
> 
> -- 
>   Amos.
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/kvm: Fix magic page vs. 32-bit RTAS on ppc64

2012-03-14 Thread Alexander Graf

On 14.03.2012, at 08:52, Benjamin Herrenschmidt wrote:

> When the kernel calls into RTAS, it switches to 32-bit mode. The
> magic page was is longer accessible in that case, causing the
> patched instructions in the RTAS call wrapper to crash.
> 
> This fixes it by making available a 32-bit mapping of the magic
> page in that case. This mapping is flushed whenever we switch
> the kernel back to 64-bit mode.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
> 
> Avi, please consider merging ASAP as this is a fairly annoying
> bug and the fix is reasonably obvious.
> 
> arch/powerpc/kvm/book3s.c|3 +++
> arch/powerpc/kvm/book3s_pr.c |   17 +
> 2 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index e41ac6f..34487d4 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -289,6 +289,9 @@ pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> {
>   ulong mp_pa = vcpu->arch.magic_page_pa;
> 
> + if (!(vcpu->arch.shared->msr & MSR_SF))
> + mp_pa = (uint32_t)mp_pa;
> +
>   /* Magic page override */
>   if (unlikely(mp_pa) &&
>   unlikely(((gfn << PAGE_SHIFT) & KVM_PAM) ==
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index d6851a1..23919d4 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -137,6 +137,20 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
>   }
>   }
> 
> + /*
> +  * When switching from 32 to 64-bit, we may have a stale 32-bit
> +  * magic page around, we need to flush it. Typically 32-bit magic
> +  * page will be instanciated when calling into RTAS. Note: We
> +  * assume that such transition only happens while in kernel mode,
> +  * ie, we never transition from user 32-bit to kernel 64-bit with
> +  * a 32-bit magic page around.
> +  */
> + if (!(old_msr & MSR_PR) && !(old_msr & MSR_SF) && (msr & MSR_SF)) {

This should also check if we're using a magic page at all. I've added that 
check and replaced my patch with yours in kvm-ppc-next.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Book3s: PR: Add HV traps so we can run in HV=1 mode on p7

2012-03-14 Thread Alexander Graf
When running PR KVM on a p7 system in bare metal, we get HV exits instead
of normal supervisor traps. Semantically they are identical though and the
HSRR vs SRR difference is already taken care of in the exit code.

So all we need to do is handle them in addition to our normal exits.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index e2be5ad..62d93b0 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -617,10 +617,13 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
break;
/* We're good on these - the host merely wanted to get our attention */
case BOOK3S_INTERRUPT_DECREMENTER:
+   case BOOK3S_INTERRUPT_HV_DECREMENTER:
vcpu->stat.dec_exits++;
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_EXTERNAL:
+   case BOOK3S_INTERRUPT_EXTERNAL_LEVEL:
+   case BOOK3S_INTERRUPT_EXTERNAL_HV:
vcpu->stat.ext_intr_exits++;
r = RESUME_GUEST;
break;
@@ -628,6 +631,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_PROGRAM:
+   case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
{
enum emulation_result er;
struct kvmppc_book3s_shadow_vcpu *svcpu;
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Book3S: PR: Fix preemption

2012-03-14 Thread Alexander Graf
We were leaking preemption counters. Fix the code to always toggle
between preempt and non-preempt properly.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 62d93b0..457ba68 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -781,6 +781,7 @@ program_interrupt:
}
}
 
+   preempt_disable();
if (!(r & RESUME_HOST)) {
/* To avoid clobbering exit_reason, only check for signals if
 * we aren't already exiting to userspace for some other
@@ -802,8 +803,6 @@ program_interrupt:
run->exit_reason = KVM_EXIT_INTR;
r = -EINTR;
} else {
-   preempt_disable();
-
/* In case an interrupt came in that was triggered
 * from userspace (like DEC), we need to check what
 * to inject now! */
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Ignore unhalt request from kvm_vcpu_block

2012-03-14 Thread Alexander Graf
When running kvm_vcpu_block and it realizes that the CPU is actually good
to run, we get a request bit set for KVM_REQ_UNHALT. Right now, there's
nothing we can do with that bit, so let's unset it right after the call
again so we don't get confused in our later checks for pending work.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c  |1 +
 arch/powerpc/kvm/book3s_pr_papr.c |1 +
 arch/powerpc/kvm/booke.c  |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 457ba68..990e25c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -119,6 +119,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
if (msr & MSR_POW) {
if (!vcpu->arch.pending_exceptions) {
kvm_vcpu_block(vcpu);
+   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
vcpu->stat.halt_wakeup++;
 
/* Unset POW bit after we woke up */
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c 
b/arch/powerpc/kvm/book3s_pr_papr.c
index 6d1bfe2..60ac0e7 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -224,6 +224,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
return kvmppc_h_pr_bulk_remove(vcpu);
case H_CEDE:
kvm_vcpu_block(vcpu);
+   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
vcpu->stat.halt_wakeup++;
return EMULATE_DONE;
}
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 2675dcb..72f13f4 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -449,6 +449,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
if (vcpu->arch.shared->msr & MSR_WE) {
local_irq_enable();
kvm_vcpu_block(vcpu);
+   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
local_irq_disable();
 
kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 07/12] target-ppc: Prepare finalizer for PowerPCCPU

2012-03-14 Thread Andreas Färber
Signed-off-by: Andreas Färber 
---
 target-ppc/cpu.h|1 +
 target-ppc/helper.c |1 -
 target-ppc/kvm.c|1 +
 target-ppc/translate_init.c |6 ++
 4 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index ff28843..3ff2156 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1128,6 +1128,7 @@ void ppc_store_msr (CPUPPCState *env, target_ulong value);
 void ppc_cpu_list (FILE *f, fprintf_function cpu_fprintf);
 
 void ppc_cpu_initfn(Object *obj);
+void ppc_cpu_finalize(Object *obj);
 const char *ppc_find_by_pvr(uint32_t pvr);
 PowerPCCPU *cpu_ppc_find_by_name(const char *name);
 
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 7d26cb5..1467cf7 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -3163,6 +3163,5 @@ CPUPPCState *cpu_ppc_init(const char *cpu_model)
 
 void cpu_ppc_close(CPUPPCState *env)
 {
-/* Should also remove all opcode tables... */
 object_delete(OBJECT(ppc_env_get_cpu(env)));
 }
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 2ee5bc0..8be235b 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -942,6 +942,7 @@ static void kvmppc_register_types(void)
 .name = "host",
 .instance_size = sizeof(PowerPCCPU),
 .instance_init = ppc_cpu_initfn,
+.instance_finalize = ppc_cpu_finalize,
 .class_size = sizeof(PowerPCCPUClass),
 .class_init = kvmppc_host_cpu_class_init,
 };
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 52264c8..c167595 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -10212,6 +10212,11 @@ static void ppc_cpu_reset(CPUState *c)
 tlb_flush(env, 1);
 }
 
+void ppc_cpu_finalize(Object *obj)
+{
+/* Should remove all opcode tables... */
+}
+
 static bool ppc_cpu_usable(const PowerPCCPUInfo *def)
 {
 #if defined(TARGET_PPCEMB)
@@ -10336,6 +10341,7 @@ static void ppc_register_cpu(const PowerPCCPUInfo *info)
 .parent = TYPE_POWERPC_CPU,
 .instance_size = sizeof(PowerPCCPU),
 .instance_init = ppc_cpu_initfn,
+.instance_finalize = ppc_cpu_finalize,
 .class_size = sizeof(PowerPCCPUClass),
 .class_init = ppc_cpu_class_init,
 .class_data = (void *)info,
-- 
1.7.7

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 06/12] target-ppc: QOM'ify CPU

2012-03-14 Thread Andreas Färber
Signed-off-by: Andreas Färber 
---
 target-ppc/cpu-qom.h|   84 ++
 target-ppc/cpu.h|   25 +
 target-ppc/helper.c |   72 ++--
 target-ppc/kvm.c|   29 +++--
 target-ppc/kvm_ppc.h|6 -
 target-ppc/translate.c  |2 +-
 target-ppc/translate_init.c |  264 ---
 7 files changed, 342 insertions(+), 140 deletions(-)
 create mode 100644 target-ppc/cpu-qom.h

diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
new file mode 100644
index 000..9236dcc
--- /dev/null
+++ b/target-ppc/cpu-qom.h
@@ -0,0 +1,84 @@
+/*
+ * QEMU PowerPC CPU
+ *
+ * Copyright (c) 2012 SUSE LINUX Products GmbH
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+#ifndef QEMU_PPC_CPU_QOM_H
+#define QEMU_PPC_CPU_QOM_H
+
+#include "qemu/cpu.h"
+#include "cpu.h"
+
+#define TYPE_POWERPC_CPU "powerpc-cpu"
+
+#define POWERPC_CPU_CLASS(klass) \
+OBJECT_CLASS_CHECK(PowerPCCPUClass, (klass), TYPE_POWERPC_CPU)
+#define POWERPC_CPU(obj) \
+OBJECT_CHECK(PowerPCCPU, (obj), TYPE_POWERPC_CPU)
+#define POWERPC_CPU_GET_CLASS(obj) \
+OBJECT_GET_CLASS(PowerPCCPUClass, (obj), TYPE_POWERPC_CPU)
+
+/**
+ * PowerPCCPUClass:
+ * @parent_reset: The parent class' reset handler.
+ *
+ * A PowerPC CPU model.
+ */
+typedef struct PowerPCCPUClass {
+/*< private >*/
+CPUClass parent_class;
+/*< public >*/
+
+void (*parent_reset)(CPUState *cpu);
+
+uint32_t pvr;
+uint32_t svr;
+uint64_t insns_flags;
+uint64_t insns_flags2;
+uint64_t msr_mask;
+powerpc_mmu_t   mmu_model;
+powerpc_excp_t  excp_model;
+powerpc_input_t bus_model;
+uint32_t flags;
+int bfd_mach;
+void (*init_proc)(CPUPPCState *env);
+int  (*check_pow)(CPUPPCState *env);
+} PowerPCCPUClass;
+
+/**
+ * PowerPCCPU:
+ * @env: Legacy CPU state.
+ *
+ * A PowerPC CPU.
+ */
+typedef struct PowerPCCPU {
+/*< private >*/
+CPUState parent_obj;
+/*< public >*/
+
+CPUPPCState env;
+} PowerPCCPU;
+
+static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env)
+{
+return POWERPC_CPU(container_of(env, PowerPCCPU, env));
+}
+
+#define ENV_GET_CPU(e) CPU(ppc_env_get_cpu(e))
+
+
+#endif
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index ad09cbe..ff28843 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -305,7 +305,6 @@ enum powerpc_input_t {
 #define PPC_INPUT(env) (env->bus_model)
 
 /*/
-typedef struct ppc_def_t ppc_def_t;
 typedef struct opc_handler_t opc_handler_t;
 
 /*/
@@ -877,22 +876,6 @@ enum {
 /* The whole PowerPC CPU context */
 #define NB_MMU_MODES 3
 
-struct ppc_def_t {
-const char *name;
-uint32_t pvr;
-uint32_t svr;
-uint64_t insns_flags;
-uint64_t insns_flags2;
-uint64_t msr_mask;
-powerpc_mmu_t   mmu_model;
-powerpc_excp_t  excp_model;
-powerpc_input_t bus_model;
-uint32_t flags;
-int bfd_mach;
-void (*init_proc)(CPUPPCState *env);
-int  (*check_pow)(CPUPPCState *env);
-};
-
 struct CPUPPCState {
 /* First are the most commonly used resources
  * during translated code execution
@@ -1096,6 +1079,8 @@ struct mmu_ctx_t {
 };
 #endif
 
+#include "cpu-qom.h"
+
 /*/
 CPUPPCState *cpu_ppc_init (const char *cpu_model);
 void ppc_translate_init(void);
@@ -1142,9 +1127,9 @@ void ppc_store_msr (CPUPPCState *env, target_ulong value);
 
 void ppc_cpu_list (FILE *f, fprintf_function cpu_fprintf);
 
-const ppc_def_t *ppc_find_by_pvr(uint32_t pvr);
-const ppc_def_t *cpu_ppc_find_by_name (const char *name);
-int cpu_ppc_register_internal (CPUPPCState *env, const ppc_def_t *def);
+void ppc_cpu_initfn(Object *obj);
+const char *ppc_find_by_pvr(uint32_t pvr);
+PowerPCCPU *cpu_ppc_find_by_name(const char *name);
 
 /* Time-base and decrementer management */
 #ifndef NO_CPU_IO_DEFS
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index bd711b6..7d26cb5 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -3138,85 +3138,31 @@ void cpu_dump_rfi (target_ulong RA, target_ulong msr)
 
 void cpu_state_reset(CPUPPCState *env)
 {
-target_ulong msr;
-
-if (qemu_logleve

Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Eric Northup
On Wed, Mar 14, 2012 at 6:25 AM, Gleb Natapov  wrote:
> On Wed, Mar 14, 2012 at 03:16:05PM +0200, Avi Kivity wrote:
>> On 03/14/2012 03:14 PM, Gleb Natapov wrote:
>> > On Wed, Mar 14, 2012 at 03:07:46PM +0200, Avi Kivity wrote:
>> > > On 03/14/2012 01:11 PM, Wen Congyang wrote:
>> > > > >
>> > > > > I don't think we want to use the driver.  Instead, have a small 
>> > > > > piece of
>> > > > > code that resets the device and pushes out a string (the panic 
>> > > > > message?)
>> > > > > without any interrupts etc.
>> > > > >
>> > > > > It's still going to be less reliable than a hypercall, I agree.
>> > > >
>> > > > Do you still want to use complicated and less reliable way?
>> > >
>> > > Are you willing to try it out and see how complicated it really is?
>> > >
>> > > While it's more complicated, it's also more flexible.  You can
>> > > communicate the panic message, whether the guest is attempting a kdump
>> > > and its own recovery or whether it wants the host to do it, etc., you
>> > > can communicate less severe failures like oopses.
>> > >
>> > hypercall can take arguments to achieve the same.
>>
>> It has to be designed in advance; and every time we notice something's
>> missing we have to update the host kernel.
>>
>
> We and in the designed stage now. Not to late to design something flexible
> :) Panic hypercall can take GPA of a buffer where host puts panic info
> as a parameter.  This buffer can be read by QEMU and passed to management.

If a host kernel change is in the works, I think it might be cleanest
to have the host kernel export a new kind of VCPU exit for
unhandled-by-KVM hypercalls.  Then usermode can respond to the
hypercall as appropriate.  This would permit adding or changing future
hypercalls without host kernel changes.

"Guest panic" is almost the definition of not-a-fast-path, and so
what's the reason to handle it in the host kernel.

Punting the functionality to user-space isn't a magic bullet for
getting a good interface designed, but in my opinion it is a better
place to be doing this.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 15/43] target-i386: Don't overuse CPUState

2012-03-14 Thread Andreas Färber
Scripted conversion:
  sed -i "s/CPUState/CPUX86State/g" target-i386/*.[hc]
  sed -i "s/#define CPUX86State/#define CPUState/" target-i386/cpu.h

Signed-off-by: Andreas Färber 
Acked-by: Anthony Liguori 
---
 target-i386/cpu.h   |   34 +-
 target-i386/helper.c|   38 +-
 target-i386/kvm.c   |   76 ++--
 target-i386/machine.c   |  172 +++---
 target-i386/op_helper.c |   34 +-
 target-i386/translate.c |   82 +++---
 6 files changed, 218 insertions(+), 218 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 36e3d29..6e26d21 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -788,7 +788,7 @@ int cpu_x86_exec(CPUX86State *s);
 void cpu_x86_close(CPUX86State *s);
 void x86_cpu_list (FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 void x86_cpudef_setup(void);
-int cpu_x86_support_mca_broadcast(CPUState *env);
+int cpu_x86_support_mca_broadcast(CPUX86State *env);
 
 int cpu_get_pic_interrupt(CPUX86State *s);
 /* MSDOS compatibility mode FPU exception support */
@@ -970,7 +970,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define MMU_MODE0_SUFFIX _kernel
 #define MMU_MODE1_SUFFIX _user
 #define MMU_USER_IDX 1
-static inline int cpu_mmu_index (CPUState *env)
+static inline int cpu_mmu_index (CPUX86State *env)
 {
 return (env->hflags & HF_CPL_MASK) == 3 ? 1 : 0;
 }
@@ -1009,7 +1009,7 @@ static inline int cpu_mmu_index (CPUState *env)
 void optimize_flags_init(void);
 
 #if defined(CONFIG_USER_ONLY)
-static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
+static inline void cpu_clone_regs(CPUX86State *env, target_ulong newsp)
 {
 if (newsp)
 env->regs[R_ESP] = newsp;
@@ -1024,7 +1024,7 @@ static inline void cpu_clone_regs(CPUState *env, 
target_ulong newsp)
 #include "hw/apic.h"
 #endif
 
-static inline bool cpu_has_work(CPUState *env)
+static inline bool cpu_has_work(CPUX86State *env)
 {
 return ((env->interrupt_request & CPU_INTERRUPT_HARD) &&
 (env->eflags & IF_MASK)) ||
@@ -1036,12 +1036,12 @@ static inline bool cpu_has_work(CPUState *env)
 
 #include "exec-all.h"
 
-static inline void cpu_pc_from_tb(CPUState *env, TranslationBlock *tb)
+static inline void cpu_pc_from_tb(CPUX86State *env, TranslationBlock *tb)
 {
 env->eip = tb->pc - tb->cs_base;
 }
 
-static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
+static inline void cpu_get_tb_cpu_state(CPUX86State *env, target_ulong *pc,
 target_ulong *cs_base, int *flags)
 {
 *cs_base = env->segs[R_CS].base;
@@ -1050,29 +1050,29 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, 
target_ulong *pc,
 (env->eflags & (IOPL_MASK | TF_MASK | RF_MASK | VM_MASK));
 }
 
-void do_cpu_init(CPUState *env);
-void do_cpu_sipi(CPUState *env);
+void do_cpu_init(CPUX86State *env);
+void do_cpu_sipi(CPUX86State *env);
 
 #define MCE_INJECT_BROADCAST1
 #define MCE_INJECT_UNCOND_AO2
 
-void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
+void cpu_x86_inject_mce(Monitor *mon, CPUX86State *cenv, int bank,
 uint64_t status, uint64_t mcg_status, uint64_t addr,
 uint64_t misc, int flags);
 
 /* op_helper.c */
-void do_interrupt(CPUState *env);
-void do_interrupt_x86_hardirq(CPUState *env, int intno, int is_hw);
-void QEMU_NORETURN raise_exception_env(int exception_index, CPUState *nenv);
-void QEMU_NORETURN raise_exception_err_env(CPUState *nenv, int exception_index,
+void do_interrupt(CPUX86State *env);
+void do_interrupt_x86_hardirq(CPUX86State *env, int intno, int is_hw);
+void QEMU_NORETURN raise_exception_env(int exception_index, CPUX86State *nenv);
+void QEMU_NORETURN raise_exception_err_env(CPUX86State *nenv, int 
exception_index,
int error_code);
 
-void do_smm_enter(CPUState *env1);
+void do_smm_enter(CPUX86State *env1);
 
-void svm_check_intercept(CPUState *env1, uint32_t type);
+void svm_check_intercept(CPUX86State *env1, uint32_t type);
 
-uint32_t cpu_cc_compute_all(CPUState *env1, int op);
+uint32_t cpu_cc_compute_all(CPUX86State *env1, int op);
 
-void cpu_report_tpr_access(CPUState *env, TPRAccess access);
+void cpu_report_tpr_access(CPUX86State *env, TPRAccess access);
 
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 140c696..83122bf 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -27,7 +27,7 @@
 //#define DEBUG_MMU
 
 /* NOTE: must be called outside the CPU execute loop */
-void cpu_state_reset(CPUState *env)
+void cpu_state_reset(CPUX86State *env)
 {
 int i;
 
@@ -106,7 +106,7 @@ void cpu_x86_close(CPUX86State *env)
 g_free(env);
 }
 
-static void cpu_x86_version(CPUState *env, int *family, int *model)
+static void cpu_x86_version(CPUX86State *env, int *family, int *model)
 {
 int cpuver = env->cpuid_version;
 
@@ -119,7 +119,7 @

[PATCH v5 20/43] target-ppc: Don't overuse CPUState

2012-03-14 Thread Andreas Färber
Scripted conversion:
  sed -i "s/CPUState/CPUPPCState/g" target-ppc/*.[hc]
  sed -i "s/#define CPUPPCState/#define CPUState/" target-ppc/cpu.h

Signed-off-by: Andreas Färber 
Acked-by: Anthony Liguori 
---
 target-ppc/cpu.h|   38 ++--
 target-ppc/helper.c |   80 +-
 target-ppc/kvm.c|   38 ++--
 target-ppc/kvm_ppc.h|   12 +++---
 target-ppc/machine.c|4 +-
 target-ppc/op_helper.c  |   12 +++---
 target-ppc/translate.c  |   78 +-
 target-ppc/translate_init.c |   42 +++---
 8 files changed, 152 insertions(+), 152 deletions(-)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index ac753f3..3508d8a 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1173,12 +1173,12 @@ void store_40x_dbcr0 (CPUPPCState *env, uint32_t val);
 void store_40x_sler (CPUPPCState *env, uint32_t val);
 void store_booke_tcr (CPUPPCState *env, target_ulong val);
 void store_booke_tsr (CPUPPCState *env, target_ulong val);
-void booke206_flush_tlb(CPUState *env, int flags, const int check_iprot);
-target_phys_addr_t booke206_tlb_to_page_size(CPUState *env, ppcmas_tlb_t *tlb);
-int ppcemb_tlb_check(CPUState *env, ppcemb_tlb_t *tlb,
+void booke206_flush_tlb(CPUPPCState *env, int flags, const int check_iprot);
+target_phys_addr_t booke206_tlb_to_page_size(CPUPPCState *env, ppcmas_tlb_t 
*tlb);
+int ppcemb_tlb_check(CPUPPCState *env, ppcemb_tlb_t *tlb,
  target_phys_addr_t *raddrp, target_ulong address,
  uint32_t pid, int ext, int i);
-int ppcmas_tlb_check(CPUState *env, ppcmas_tlb_t *tlb,
+int ppcmas_tlb_check(CPUPPCState *env, ppcmas_tlb_t *tlb,
  target_phys_addr_t *raddrp, target_ulong address,
  uint32_t pid);
 void ppc_tlb_invalidate_all (CPUPPCState *env);
@@ -1226,13 +1226,13 @@ int ppc_dcr_write (ppc_dcr_t *dcr_env, int dcrn, 
uint32_t val);
 #define MMU_MODE1_SUFFIX _kernel
 #define MMU_MODE2_SUFFIX _hypv
 #define MMU_USER_IDX 0
-static inline int cpu_mmu_index (CPUState *env)
+static inline int cpu_mmu_index (CPUPPCState *env)
 {
 return env->mmu_idx;
 }
 
 #if defined(CONFIG_USER_ONLY)
-static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
+static inline void cpu_clone_regs(CPUPPCState *env, target_ulong newsp)
 {
 if (newsp)
 env->gpr[1] = newsp;
@@ -2056,7 +2056,7 @@ enum {
 
 /*/
 
-static inline void cpu_get_tb_cpu_state(CPUState *env, target_ulong *pc,
+static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc,
 target_ulong *cs_base, int *flags)
 {
 *pc = env->nip;
@@ -2064,7 +2064,7 @@ static inline void cpu_get_tb_cpu_state(CPUState *env, 
target_ulong *pc,
 *flags = env->hflags;
 }
 
-static inline void cpu_set_tls(CPUState *env, target_ulong newtls)
+static inline void cpu_set_tls(CPUPPCState *env, target_ulong newtls)
 {
 #if defined(TARGET_PPC64)
 /* The kernel checks TIF_32BIT here; we don't support loading 32-bit
@@ -2076,7 +2076,7 @@ static inline void cpu_set_tls(CPUState *env, 
target_ulong newtls)
 }
 
 #if !defined(CONFIG_USER_ONLY)
-static inline int booke206_tlbm_id(CPUState *env, ppcmas_tlb_t *tlbm)
+static inline int booke206_tlbm_id(CPUPPCState *env, ppcmas_tlb_t *tlbm)
 {
 uintptr_t tlbml = (uintptr_t)tlbm;
 uintptr_t tlbl = (uintptr_t)env->tlb.tlbm;
@@ -2084,21 +2084,21 @@ static inline int booke206_tlbm_id(CPUState *env, 
ppcmas_tlb_t *tlbm)
 return (tlbml - tlbl) / sizeof(env->tlb.tlbm[0]);
 }
 
-static inline int booke206_tlb_size(CPUState *env, int tlbn)
+static inline int booke206_tlb_size(CPUPPCState *env, int tlbn)
 {
 uint32_t tlbncfg = env->spr[SPR_BOOKE_TLB0CFG + tlbn];
 int r = tlbncfg & TLBnCFG_N_ENTRY;
 return r;
 }
 
-static inline int booke206_tlb_ways(CPUState *env, int tlbn)
+static inline int booke206_tlb_ways(CPUPPCState *env, int tlbn)
 {
 uint32_t tlbncfg = env->spr[SPR_BOOKE_TLB0CFG + tlbn];
 int r = tlbncfg >> TLBnCFG_ASSOC_SHIFT;
 return r;
 }
 
-static inline int booke206_tlbm_to_tlbn(CPUState *env, ppcmas_tlb_t *tlbm)
+static inline int booke206_tlbm_to_tlbn(CPUPPCState *env, ppcmas_tlb_t *tlbm)
 {
 int id = booke206_tlbm_id(env, tlbm);
 int end = 0;
@@ -2115,14 +2115,14 @@ static inline int booke206_tlbm_to_tlbn(CPUState *env, 
ppcmas_tlb_t *tlbm)
 return 0;
 }
 
-static inline int booke206_tlbm_to_way(CPUState *env, ppcmas_tlb_t *tlb)
+static inline int booke206_tlbm_to_way(CPUPPCState *env, ppcmas_tlb_t *tlb)
 {
 int tlbn = booke206_tlbm_to_tlbn(env, tlb);
 int tlbid = booke206_tlbm_id(env, tlb);
 return tlbid & (booke206_tlb_ways(env, tlbn) - 1);
 }
 
-static inline ppcmas_tlb_t *booke206_get_tlbm(CPUState *env, const int tlbn,
+static inline ppcmas_tlb_t *booke206_get_tlbm(CP

[PATCH v5 21/43] target-s390x: Don't overuse CPUState

2012-03-14 Thread Andreas Färber
Scripted conversion:
  sed -i "s/CPUState/CPUS390XState/g" target-s390x/*.[hc]
  sed -i "s/#define CPUS390XState/#define CPUState/" target-s390x/cpu.h

Signed-off-by: Andreas Färber 
Acked-by: Anthony Liguori 
---
 target-s390x/cpu.h   |   46 +++---
 target-s390x/helper.c|   38 
 target-s390x/kvm.c   |   56 ++--
 target-s390x/op_helper.c |   70 +++---
 target-s390x/translate.c |   56 ++--
 5 files changed, 133 insertions(+), 133 deletions(-)

diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index e892bec..af6cc4e 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -106,7 +106,7 @@ typedef struct CPUS390XState {
 } CPUS390XState;
 
 #if defined(CONFIG_USER_ONLY)
-static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
+static inline void cpu_clone_regs(CPUS390XState *env, target_ulong newsp)
 {
 if (newsp) {
 env->regs[15] = newsp;
@@ -233,7 +233,7 @@ static inline void cpu_clone_regs(CPUState *env, 
target_ulong newsp)
 #define FLAG_MASK_64(PSW_MASK_64 >> 32)
 #define FLAG_MASK_320x1000
 
-static inline int cpu_mmu_index (CPUState *env)
+static inline int cpu_mmu_index (CPUS390XState *env)
 {
 if (env->psw.mask & PSW_MASK_PSTATE) {
 return 1;
@@ -242,7 +242,7 @@ static inline int cpu_mmu_index (CPUState *env)
 return 0;
 }
 
-static inline void cpu_get_tb_cpu_state(CPUState* env, target_ulong *pc,
+static inline void cpu_get_tb_cpu_state(CPUS390XState* env, target_ulong *pc,
 target_ulong *cs_base, int *flags)
 {
 *pc = env->psw.addr;
@@ -275,7 +275,7 @@ CPUS390XState *cpu_s390x_init(const char *cpu_model);
 void s390x_translate_init(void);
 int cpu_s390x_exec(CPUS390XState *s);
 void cpu_s390x_close(CPUS390XState *s);
-void do_interrupt (CPUState *env);
+void do_interrupt (CPUS390XState *env);
 
 /* you can call this signal handler from your SIGBUS and SIGSEGV
signal handlers to inform the virtual CPU of exceptions. non zero
@@ -288,42 +288,42 @@ int cpu_s390x_handle_mmu_fault (CPUS390XState *env, 
target_ulong address, int rw
 
 
 #ifndef CONFIG_USER_ONLY
-int s390_virtio_hypercall(CPUState *env, uint64_t mem, uint64_t hypercall);
+int s390_virtio_hypercall(CPUS390XState *env, uint64_t mem, uint64_t 
hypercall);
 
 #ifdef CONFIG_KVM
-void kvm_s390_interrupt(CPUState *env, int type, uint32_t code);
-void kvm_s390_virtio_irq(CPUState *env, int config_change, uint64_t token);
-void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
+void kvm_s390_interrupt(CPUS390XState *env, int type, uint32_t code);
+void kvm_s390_virtio_irq(CPUS390XState *env, int config_change, uint64_t 
token);
+void kvm_s390_interrupt_internal(CPUS390XState *env, int type, uint32_t parm,
  uint64_t parm64, int vm);
 #else
-static inline void kvm_s390_interrupt(CPUState *env, int type, uint32_t code)
+static inline void kvm_s390_interrupt(CPUS390XState *env, int type, uint32_t 
code)
 {
 }
 
-static inline void kvm_s390_virtio_irq(CPUState *env, int config_change,
+static inline void kvm_s390_virtio_irq(CPUS390XState *env, int config_change,
uint64_t token)
 {
 }
 
-static inline void kvm_s390_interrupt_internal(CPUState *env, int type,
+static inline void kvm_s390_interrupt_internal(CPUS390XState *env, int type,
uint32_t parm, uint64_t parm64,
int vm)
 {
 }
 #endif
-CPUState *s390_cpu_addr2state(uint16_t cpu_addr);
-void s390_add_running_cpu(CPUState *env);
-unsigned s390_del_running_cpu(CPUState *env);
+CPUS390XState *s390_cpu_addr2state(uint16_t cpu_addr);
+void s390_add_running_cpu(CPUS390XState *env);
+unsigned s390_del_running_cpu(CPUS390XState *env);
 
 /* from s390-virtio-bus */
 extern const target_phys_addr_t virtio_size;
 
 #else
-static inline void s390_add_running_cpu(CPUState *env)
+static inline void s390_add_running_cpu(CPUS390XState *env)
 {
 }
 
-static inline unsigned s390_del_running_cpu(CPUState *env)
+static inline unsigned s390_del_running_cpu(CPUS390XState *env)
 {
 return 0;
 }
@@ -944,11 +944,11 @@ static inline void ebcdic_put(uint8_t *p, const char 
*ascii, int len)
 #define SIGP_STAT_INVALID_ORDER 0x0002UL
 #define SIGP_STAT_RECEIVER_CHECK0x0001UL
 
-void load_psw(CPUState *env, uint64_t mask, uint64_t addr);
-int mmu_translate(CPUState *env, target_ulong vaddr, int rw, uint64_t asc,
+void load_psw(CPUS390XState *env, uint64_t mask, uint64_t addr);
+int mmu_translate(CPUS390XState *env, target_ulong vaddr, int rw, uint64_t asc,
   target_ulong *raddr, int *flags);
-int sclp_service_call(CPUState *env, uint32_t sccb, uint64_t code);
-uint32_t calc_cc(CPUState *env, uint32_t cc_op, uint64_t src, uint6

Re: [Qemu-devel] [PATCH 0/2] Lost VNC patches

2012-03-14 Thread Anthony Liguori

On 03/14/2012 01:58 AM, Corentin Chary wrote:

Hi Anthony,

Please merge these two patchs from another age, they fix crash in the VNC
server (the iohandler one is only for the threaded server).


Applied.  Thanks.

Regards,

Anthony Liguori



Thanks,

Corentin Chary (1):
   vnc: don't mess up with iohandlers in the vnc thread

Stefan Weil (1):
   vnc: Limit r/w access to size of allocated memory

  ui/vnc-jobs-async.c |   48 +---
  ui/vnc-jobs.h   |1 +
  ui/vnc.c|   15 +++
  ui/vnc.h|2 ++
  4 files changed, 47 insertions(+), 19 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 0/2] Lost VNC patches

2012-03-14 Thread Stefan Weil

Am 14.03.2012 22:46, schrieb Anthony Liguori:

On 03/14/2012 01:58 AM, Corentin Chary wrote:

Hi Anthony,

Please merge these two patchs from another age, they fix crash in the 
VNC

server (the iohandler one is only for the threaded server).


Applied.  Thanks.

Regards,

Anthony Liguori



The commit time of my patch was modified here. I had sent the patch on
March 15, 2011, 6:45 p.m., so you could also have waited a day longer
until its first birthday :-)

There is a more serious background why I write this mail: commit
requests should not modify the time when a patch was written.

See http://patchwork.ozlabs.org/patch/87029/ for the original.

Cheers,
Stefan W.





Thanks,

Corentin Chary (1):
   vnc: don't mess up with iohandlers in the vnc thread

Stefan Weil (1):
   vnc: Limit r/w access to size of allocated memory

  ui/vnc-jobs-async.c |   48 
+---

  ui/vnc-jobs.h   |1 +
  ui/vnc.c|   15 +++
  ui/vnc.h|2 ++
  4 files changed, 47 insertions(+), 19 deletions(-)





--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 0/7] RTC: New logic to emulate RTC

2012-03-14 Thread Zhang, Yang Z
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Wednesday, March 14, 2012 4:54 PM
> To: Zhang, Yang Z
> Cc: qemu-de...@nongnu.org; Jan Kiszka; kvm@vger.kernel.org;
> aligu...@us.ibm.com; Marcelo Tosatti
> Subject: Re: [PATCH v3 0/7] RTC: New logic to emulate RTC
> 
> Il 14/03/2012 09:52, Zhang, Yang Z ha scritto:
> > Is there any comments with the version 3?
> >>>
> >>> Can you explain why you dropped the logic to set the timer to the
> >>> next event?
> > Do you mean why I change the rtc logic? The reason is that: When a
> > guest is idle, the main activity inside qemu is the rtc update
> > timer(2 per second). In our experience(running 64 rhel6u1 guests), it
> > will decrease pkg C6 residency about 6%(6% means 2 watts in my box).
> > And normally, the guest will not use the update-end interrupt and
> > alarm. So there has no need to run a periodic timer when guest isn't
> > using it.
> 
> No, why you're keeping roughly the same logic as current QEMU, instead
> of the more radical changes that were in v2.
You are right. Actually, the v4 is ready and it uses the same logic with v2. 
Since I have other high priority task in hand, I don't test v4 too much. So i 
plan to delay it for a while and hope v3 can be accepted before v4 is ready.
If you really doesn't like the v3, I will pay more effort in v4 and will send 
out it ASAP. :)

best regards
yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: call invvpid only when EPT is disabled

2012-03-14 Thread Davidlohr Bueso
From: Davidlohr Bueso 

With EPT enabled it is not required to explicitly run invvpid to invalidate 
tagged TLB entries, as
KVM does not force vm-exits for cr3 writes and invlpg. Run invvpid only when 
these instructions
are emulated and shadow pages are used.

Signed-off-by: Davidlohr Bueso 
---
 arch/x86/kvm/vmx.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2c22fc7..51c7fb9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2861,12 +2861,13 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
-   vpid_sync_context(to_vmx(vcpu));
if (enable_ept) {
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
ept_sync_context(construct_eptp(vcpu->arch.mmu.root_hpa));
}
+   else
+   vpid_sync_context(to_vmx(vcpu));
 }
 
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
-- 
1.7.4.1



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: fix kvm_write_tsc() TSC matching thinko

2012-03-14 Thread Marcelo Tosatti
On Wed, Mar 14, 2012 at 12:02:41PM +0100, Jan Kiszka wrote:
> On 2012-03-08 22:46, Marcelo Tosatti wrote:
> > 
> > kvm_write_tsc() converts from guest TSC to microseconds, not nanoseconds
> > as intended. The result is that the window for matching is 1000 seconds,
> > not 1 second.
> > 
> > Microsecond precision is enough for checking whether the TSC write delta
> > is within the heuristic values, so use it instead of nanoseconds.
> > 
> > Noted by Avi Kivity.
> > 
> > Signed-off-by: Marcelo Tosatti 
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 32096cf..9732e98 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1025,7 +1025,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
> > struct kvm *kvm = vcpu->kvm;
> > u64 offset, ns, elapsed;
> > unsigned long flags;
> > -   s64 nsdiff;
> > +   s64 usdiff;
> >  
> > raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
> > offset = kvm_x86_ops->compute_tsc_offset(vcpu, data);
> > @@ -1033,18 +1033,18 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
> > elapsed = ns - kvm->arch.last_tsc_nsec;
> >  
> > /* n.b - signed multiplication and division required */
> > -   nsdiff = data - kvm->arch.last_tsc_write;
> > +   usdiff = data - kvm->arch.last_tsc_write;
> >  #ifdef CONFIG_X86_64
> > -   nsdiff = (nsdiff * 1000) / vcpu->arch.virtual_tsc_khz;
> > +   usdiff = (usdiff * 1000) / vcpu->arch.virtual_tsc_khz;
> >  #else
> > /* do_div() only does unsigned */
> > asm("idivl %2; xor %%edx, %%edx"
> > -   : "=A"(nsdiff)
> > -   : "A"(nsdiff * 1000), "rm"(vcpu->arch.virtual_tsc_khz));
> > +   : "=A"(usdiff)
> > +   : "A"(usdiff * 1000), "rm"(vcpu->arch.virtual_tsc_khz));
> >  #endif
> > -   nsdiff -= elapsed;
> > -   if (nsdiff < 0)
> > -   nsdiff = -nsdiff;
> > +   usdiff -= elapsed / 1000;
> 
> do_div, to please x86-32.
> 
> Jan

Fixed, thanks Jan.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 0/2] Lost VNC patches

2012-03-14 Thread Corentin Chary
On Wed, Mar 14, 2012 at 11:16 PM, Stefan Weil  wrote:
> Am 14.03.2012 22:46, schrieb Anthony Liguori:
>
>> On 03/14/2012 01:58 AM, Corentin Chary wrote:
>>>
>>> Hi Anthony,
>>>
>>> Please merge these two patchs from another age, they fix crash in the VNC
>>> server (the iohandler one is only for the threaded server).
>>
>>
>> Applied.  Thanks.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>
> The commit time of my patch was modified here. I had sent the patch on
> March 15, 2011, 6:45 p.m., so you could also have waited a day longer
> until its first birthday :-)
>
> There is a more serious background why I write this mail: commit
> requests should not modify the time when a patch was written.
>
> See http://patchwork.ozlabs.org/patch/87029/ for the original.

Hello Stefan,

Sorry for that, but date was correct on the patch I've sent, it seems
that some smtp server (or the list ?) took the liberty to change it.
I should have sent a pull link instead.

Thanks,

-- 
Corentin Chary
http://xf.iksaif.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html