Re: [Qemu-devel] [PATCH 07/25] qdev: use object_property_print in info qtree

2012-05-16 Thread Paolo Bonzini
> Am 03.04.2012 15:05, schrieb Paolo Bonzini:
> > Il 03/04/2012 14:28, Jan Kiszka ha scritto:
>   if (object_property_get_type(OBJECT(dev), legacy_name,
>   NULL)) {
>   value = object_property_get_str(OBJECT(dev),
>   legacy_name, &err);
> >> [...] We should either
> >> assert(print && parse) or handle their non-existence properly.
> > 
> > Yes, asserting that print/parse are always present in pairs (if at
> > all) would be good.
> 
> Paolo, has this issue been addressed somewhere?

No, but it's a simple assertion.  It doesn't look like 1.1 material.
I should send and rebase the bus series today, I'll include it.

Paolo



[Qemu-devel] [PATCH] Prevent disk data loss when closing qemu

2012-05-16 Thread Pavel Dovgaluk
Prevent disk data loss when closing qemu window.

Signed-off-by: Pavel Dovgalyuk 
---
 vl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vl.c b/vl.c
index 23ab3a3..b6cfd29 100644
--- a/vl.c
+++ b/vl.c
@@ -3650,10 +3650,10 @@ int main(int argc, char **argv, char **envp)
 }
 
 os_setup_post();
+atexit(bdrv_close_all);
 
 resume_all_vcpus();
 main_loop();
-bdrv_close_all();
 pause_all_vcpus();
 net_cleanup();
 res_free();






Re: [Qemu-devel] [PATCH 07/25] qdev: use object_property_print in info qtree

2012-05-16 Thread Paolo Bonzini


- Messaggio originale -
> Da: "Paolo Bonzini" 
> A: "Andreas Färber" 
> Cc: "Jan Kiszka" , aligu...@us.ibm.com, 
> qemu-devel@nongnu.org
> Inviato: Mercoledì, 16 maggio 2012 9:40:12
> Oggetto: Re: [Qemu-devel] [PATCH 07/25] qdev: use object_property_print in 
> info qtree
>
> > Am 03.04.2012 15:05, schrieb Paolo Bonzini:
> > > Il 03/04/2012 14:28, Jan Kiszka ha scritto:
> >   if (object_property_get_type(OBJECT(dev),
> >   legacy_name,
> >   NULL)) {
> >   value = object_property_get_str(OBJECT(dev),
> >   legacy_name, &err);
> > >> [...] We should either
> > >> assert(print && parse) or handle their non-existence properly.
> > >
> > > Yes, asserting that print/parse are always present in pairs (if
> > > at
> > > all) would be good.
> >
> > Paolo, has this issue been addressed somewhere?
>
> No, but it's a simple assertion.  It doesn't look like 1.1 material.
> I should send and rebase the bus series today, I'll include it.

Doh, it was handled by

commit 68ee356941801d0a17fdc43b11ac3e6b72fcd597
Author: Paolo Bonzini 
Date:   Thu Feb 2 10:17:19 2012 +0100

qdev: allow reusing get/set for legacy property

print/parse do not have to be present in pairs anymore.

Paolo



Re: [Qemu-devel] Get current env within io_handler ?

2012-05-16 Thread nicolas.sauzede
> Yes, it's entirely intentional to prevent io handlers from accessing CPUState.
> 
> For what you're doing, you need to hook more deeply into target-arm before 
> the 
> dispatch actually happens.

Ok, but then I guess that this kind of hooks may be less generic than the 
traditional
"io_handler" I used for my qemu/TLM binding, right ?
Are there any example code showing such kind of hooks in qemu source ?
And is this method advisable, in term of maintenance, etc. ?
(what if we want to support multiple qemu targets, should we duplicate hooks 
code ?)

> Regards,
> 
> Anthony Liguori

Thanks for your support,
NS.


Une messagerie gratuite, garantie à vie et des services en plus, ça vous tente ?
Je crée ma boîte mail www.laposte.net



Re: [Qemu-devel] Get current env within io_handler ?

2012-05-16 Thread nicolas.sauzede
> First, please don't top-post and please don't use HTML emails.

Sorry about that.

> Yes, there is work towards getting rid of implicit AREG0 env. This will
> be leading towards removing the register-pinned AREG0.

Will this AREG0 removal be optional/configurable if the patches hit the 
mainstream ?

> So you should rather figure out why you want to tie emulation of devices
> to the CPU requesting it.

Well, for example, we have the issue where we need to know if the cpu that 
performs a hardware io is in priviledged/secure more, because some HW devices 
implemented in TLM requires such special flags on certain register accesses.
How can we know that access property, when called back into an "io_handler" ?

Also, I think about some specific IPs such as local timers and such, all seen 
at the same address by all the smp cpu, then how can we know what cpu number 
originated the io transaction ?

> Andreas
>

Thanks again,
NS.

Une messagerie gratuite, garantie à vie et des services en plus, ça vous tente ?
Je crée ma boîte mail www.laposte.net



Re: [Qemu-devel] [PATCH 2/3 v6] Replace bdrv_* to bdrv_aio_* functions in DMA mode in fdc.c

2012-05-16 Thread Zhi Hui Li

On 2012年05月15日 17:27, Paolo Bonzini wrote:

Il 15/05/2012 11:17, Li Zhi Hui ha scritto:

Signed-off-by: Paolo Bonzini
Signed-off-by: Li Zhi Hui
---
  hw/fdc.c |  313 +
  1 files changed, 210 insertions(+), 103 deletions(-)


To reviewers, this is obviously missing migration support. :)



Yes, you are right.
I am sorry that I am a newer, I don't know more about migration, so I 
want to send some simple code first .

It is also missing a good commit message.  It should say that the
support in the existing code for scan commands has a lot of "weirdness",
for example:

-if ((ret<  0&&  fdctrl->data_dir == FD_DIR_SCANL) ||
-(ret>  0&&  fdctrl->data_dir == FD_DIR_SCANH)) {
-status2 = 0x00;
-goto end_transfer;

...

- end_transfer:
-len = fdctrl->data_pos - start_pos;
-FLOPPY_DPRINTF("end transfer %d %d %d\n",
-   fdctrl->data_pos, len, fdctrl->data_len);
-if (fdctrl->data_dir == FD_DIR_SCANE ||
-fdctrl->data_dir == FD_DIR_SCANL ||
-fdctrl->data_dir == FD_DIR_SCANH)
-status2 = FD_SR2_SEH;

which blindly overwrites status2.  Hence the new code was not written
based on it.  However, the new code is untested as far as I know.


Ok, I will add the commit message.

+if (fdctrl->data_dir == FD_DIR_SCANE ||
+fdctrl->data_dir == FD_DIR_SCANL ||
+fdctrl->data_dir == FD_DIR_SCANH) {
+fdctrl->dma_status2 = FD_SR2_SEH;
+}


This should be FD_SR2_SNS (Spec for the FDC is at
http://www.cepece.info/amstrad/docs/i8272/8272sp.htm: "If the conditions
for scan are not met between the the starting sector (as specified by R)
and the last sector on the cylinder (EOT), then the FDC sets the SN
(Scan Not Satisfied) flag of Status Register 2 to a 1 (high), and
terminates the Scan Command").


ok.

Thank you very much for your feedback. :)





Re: [Qemu-devel] [PATCH 4/4] xen: Fix PV-on-HVM

2012-05-16 Thread Paolo Bonzini
Il 15/05/2012 23:00, Michael S. Tsirkin ha scritto:
> On Tue, May 15, 2012 at 04:26:39PM +0100, Anthony PERARD wrote:
>> In the context of PV-on-HVM under Xen, the emulated nics are supposed to be
>> unplug before the guest drivers are initialized. This mean that there must be
>> unplug without the consent of the guest.
>>
>> Without this patch, the guest end up with two nics with the same MAC, the
>> emulated nic and the PV nic.
>>
>> Signed-off-by: Anthony PERARD 
> 
> OK, so on Xen there are special devices that can be safely removed
> without telling the guest?  Does there need to be regular hotplug for
> these devices too? Or can it be always surprise removal?

On Xen the PV drivers can ask the firmware to surprise-remove the
emulated NICs.  Of course it has to do it early enough so that the guest
doesn't crash.

Paolo




Re: [Qemu-devel] [PATCH 4/4] xen: Fix PV-on-HVM

2012-05-16 Thread Michael S. Tsirkin
On Wed, May 16, 2012 at 10:06:25AM +0200, Paolo Bonzini wrote:
> Il 15/05/2012 23:00, Michael S. Tsirkin ha scritto:
> > On Tue, May 15, 2012 at 04:26:39PM +0100, Anthony PERARD wrote:
> >> In the context of PV-on-HVM under Xen, the emulated nics are supposed to be
> >> unplug before the guest drivers are initialized. This mean that there must 
> >> be
> >> unplug without the consent of the guest.
> >>
> >> Without this patch, the guest end up with two nics with the same MAC, the
> >> emulated nic and the PV nic.
> >>
> >> Signed-off-by: Anthony PERARD 
> > 
> > OK, so on Xen there are special devices that can be safely removed
> > without telling the guest?  Does there need to be regular hotplug for
> > these devices too? Or can it be always surprise removal?
> 
> On Xen the PV drivers can ask the firmware to surprise-remove the
> emulated NICs.

So driver tells firmware (meaning acpi? how?) that it's ok
to do surprize removal?

> Of course it has to do it early enough so that the guest
> doesn't crash.
> 
> Paolo

What does early enough mean and how do we ensure that?

-- 
MST



Re: [Qemu-devel] [PATCH 04/21] qom: make Object a type

2012-05-16 Thread Andreas Färber
Am 02.05.2012 13:30, schrieb Paolo Bonzini:
> Right now the base Object class has a special NULL type.  Change this so
> that we will be able to add class_init and class_base_init callbacks.
> To do this, remove some special casing of ObjectClass that is not really
> necessary.
> 
> Signed-off-by: Paolo Bonzini 

Anthony, you had Reviewed-by an earlier version of this patch but then
did it differently in your series. Should I apply this to qom-next or is
there some issue with it? The issue of unintended NULL .parent that I
raised is being addressed with a followup patch.

Thanks,
Andreas

> ---
>  include/qemu/object.h |2 +-
>  qom/object.c  |   59 
> +
>  2 files changed, 31 insertions(+), 30 deletions(-)
> 
> diff --git a/include/qemu/object.h b/include/qemu/object.h
> index 597a2f6..9c49d99 100644
> --- a/include/qemu/object.h
> +++ b/include/qemu/object.h
> @@ -33,7 +33,7 @@ typedef struct TypeInfo TypeInfo;
>  typedef struct InterfaceClass InterfaceClass;
>  typedef struct InterfaceInfo InterfaceInfo;
>  
> -#define TYPE_OBJECT NULL
> +#define TYPE_OBJECT "object"
>  
>  /**
>   * SECTION:object.h
> diff --git a/qom/object.c b/qom/object.c
> index e66d3ab..88ec417 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -210,7 +210,7 @@ static void type_class_interface_init(TypeImpl *ti, 
> InterfaceImpl *iface)
>  
>  static void type_initialize(TypeImpl *ti)
>  {
> -size_t class_size = sizeof(ObjectClass);
> +TypeImpl *parent;
>  int i;
>  
>  if (ti->class) {
> @@ -221,30 +221,24 @@ static void type_initialize(TypeImpl *ti)
>  ti->instance_size = type_object_get_size(ti);
>  
>  ti->class = g_malloc0(ti->class_size);
> -ti->class->type = ti;
> -
> -if (type_has_parent(ti)) {
> -TypeImpl *parent = type_get_parent(ti);
>  
> +parent = type_get_parent(ti);
> +if (parent) {
>  type_initialize(parent);
>  
> -class_size = parent->class_size;
>  g_assert(parent->class_size <= ti->class_size);
> +memcpy(ti->class, parent->class, parent->class_size);
> +}
>  
> -memcpy((void *)ti->class + sizeof(ObjectClass),
> -   (void *)parent->class + sizeof(ObjectClass),
> -   parent->class_size - sizeof(ObjectClass));
> +ti->class->type = ti;
>  
> -while (parent) {
> -if (parent->class_base_init) {
> -parent->class_base_init(ti->class, ti->class_data);
> -}
> -parent = type_get_parent(parent);
> +while (parent) {
> +if (parent->class_base_init) {
> +parent->class_base_init(ti->class, ti->class_data);
>  }
> +parent = type_get_parent(parent);
>  }
>  
> -memset((void *)ti->class + class_size, 0, ti->class_size - class_size);
> -
>  for (i = 0; i < ti->num_interfaces; i++) {
>  type_class_interface_init(ti, &ti->interfaces[i]);
>  }
> @@ -467,19 +461,6 @@ Object *object_dynamic_cast(Object *obj, const char 
> *typename)
>  }
>  
>  
> -static void register_types(void)
> -{
> -static TypeInfo interface_info = {
> -.name = TYPE_INTERFACE,
> -.instance_size = sizeof(Interface),
> -.abstract = true,
> -};
> -
> -type_interface = type_register_static(&interface_info);
> -}
> -
> -type_init(register_types)
> -
>  Object *object_dynamic_cast_assert(Object *obj, const char *typename)
>  {
>  Object *inst;
> @@ -1216,3 +1197,23 @@ void object_property_add_str(Object *obj, const char 
> *name,
>  property_release_str,
>  prop, errp);
>  }
> +
> +static void register_types(void)
> +{
> +static TypeInfo interface_info = {
> +.name = TYPE_INTERFACE,
> +.instance_size = sizeof(Interface),
> +.abstract = true,
> +};
> +
> +static TypeInfo object_info = {
> +.name = TYPE_OBJECT,
> +.instance_size = sizeof(Object),
> +.abstract = true,
> +};
> +
> +type_interface = type_register_static(&interface_info);
> +type_register_static(&object_info);
> +}
> +
> +type_init(register_types)


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH] Prevent disk data loss when closing qemu

2012-05-16 Thread dunrong huang
What's the difference of these two method to call bdrv_close_all?

If you close qemu window, the main_loop will return immediately, and call
bdrv_close_all.

2012/5/16 Pavel Dovgaluk 

> Prevent disk data loss when closing qemu window.
>
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  vl.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index 23ab3a3..b6cfd29 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3650,10 +3650,10 @@ int main(int argc, char **argv, char **envp)
> }
>
> os_setup_post();
> +atexit(bdrv_close_all);
>
> resume_all_vcpus();
> main_loop();
> -bdrv_close_all();
> pause_all_vcpus();
> net_cleanup();
> res_free();
>
>
>
>
>


-- 
linuxer and emacser and pythoner living in beijing
blog: http://mathslinux.org
twitter: https://twitter.com/mathslinux
google+: https://plus.google.com/118129852578326338750


Re: [Qemu-devel] [PATCH] Avoid segfault in cpu_dump_state

2012-05-16 Thread Fabien Chouteau
On 05/16/2012 05:50 AM, Andreas Färber wrote:
> Am 15.05.2012 18:08, schrieb Fabien Chouteau:
>> On 05/15/2012 03:31 PM, Andreas Färber wrote:
>>> Am 15.05.2012 11:39, schrieb Fabien Chouteau:
 Do not call cpu_dump_state if logfile is NULL.
>>>
>>> And where is log_cpu_state() being called from? Its caller is passing
>>> NULL already then.
>>>
>>
>> No, logfile is a global variable. log_cpu_state() takes only CPUState
>> and flags parameters.
> 
> Ah, I see now that f is a different f here, logfile becomes
> log_cpu_state()'s f. Unfortunate naming.
> 
> Your fix looks OK then but I would recommend turning it into a static
> inline function to avoid the line breaks.
> 

In this case I can rewrite all the macros in qemu-log.h to static inline.

-- 
Fabien Chouteau



Re: [Qemu-devel] [PATCH 4/4] xen: Fix PV-on-HVM

2012-05-16 Thread Paolo Bonzini
Il 16/05/2012 10:13, Michael S. Tsirkin ha scritto:
> On Wed, May 16, 2012 at 10:06:25AM +0200, Paolo Bonzini wrote:
>> On Xen the PV drivers can ask the firmware to surprise-remove the
>> emulated NICs.
> 
> So driver tells firmware (meaning acpi? how?) that it's ok
> to do surprize removal?

It writes something to some I/O port, and then QEMU surprise-removes the
NICs.

>> Of course it has to do it early enough so that the guest
>> doesn't crash.
> 
> What does early enough mean and how do we ensure that?

Early enough means that the I/O port is written very early in the boot
process, even before the PCI bus is scanned by the OS.

You don't ensure it, it's up to the OS.  The OS knows whether its
drivers can cope properly with surprise removal.  If they can, in
principle it could write the magic value whenever it wants to.

Paolo



[Qemu-devel] A error abou t gtester-report, can anyone help me, thank you !

2012-05-16 Thread Zhi Hui Li

When I use gtest.

Compiler did not have a problem, the use of gtester implementation and 
generate xml file testlog also no problem, but in the use of

gterster-report, the output is as follows:

gtester-report  test.log > test-report.html



Traceback (most recent call last):
  File "/usr/bin/gtester-report", line 492, in 
main()
  File "/usr/bin/gtester-report", line 486, in main
HTMLReportWriter(rr.get_info(), rr.binary_list()).printout()
  File "/usr/bin/gtester-report", line 350, in printout
self.handle_info ()
  File "/usr/bin/gtester-report", line 244, in handle_info
self.oprint ('Package: %(package)s, version: 
%(version)s\n' % self.info)

KeyError: 'package'
li@mm:~/ceshi/g_test$ gtester-report test.log > test-report.html
Traceback (most recent call last):
  File "/usr/bin/gtester-report", line 492, in 
main()
  File "/usr/bin/gtester-report", line 486, in main
HTMLReportWriter(rr.get_info(), rr.binary_list()).printout()
  File "/usr/bin/gtester-report", line 350, in printout
self.handle_info ()
  File "/usr/bin/gtester-report", line 244, in handle_info
self.oprint ('Package: %(package)s, version: 
%(version)s\n' % self.info)

KeyError: 'package'


I don't know why the error " KeyError: 'package' " appear. can anybody 
help me ?

Thank you very much!




Re: [Qemu-devel] [PATCH 2/3 v6] Replace bdrv_* to bdrv_aio_* functions in DMA mode in fdc.c

2012-05-16 Thread Paolo Bonzini
Il 16/05/2012 10:23, Zhi Hui Li ha scritto:
> 
> Yes , I think maybe Paolo is right.
> 
> Because the spec is incredibly complex and obscure and I am newer.
> To write the whole code's qtest beyond my ability. I am afraid I can't
> finish it. so I want only do a qtest about basic read/write in PIO
> and DMA modes. I don't know whether it is OK.

That's a start, go ahead.

Paolo




Re: [Qemu-devel] [PATCH 2/3 v6] Replace bdrv_* to bdrv_aio_* functions in DMA mode in fdc.c

2012-05-16 Thread Zhi Hui Li

On 2012年05月15日 17:38, Paolo Bonzini wrote:

Il 15/05/2012 11:33, Kevin Wolf ha scritto:

which blindly overwrites status2.  Hence the new code was not written
based on it.  However, the new code is untested as far as I know.

In the thread of an earlier version of this series, I said that a qtest
for floppy is required. This only confirms it.


The problem with writing a qtest is that the spec is incredibly complex
and obscure.  It's probably even better to rip out code that cannot be
tested properly, so you don't have to test it at all...

(Mostly tongue-in-cheek of course.  A qtest for basic read/write in PIO
and DMA modes is indeed a very good idea).

Paolo




Yes , I think maybe Paolo is right.

Because the spec is incredibly complex and obscure and I am newer.
To write the whole code's qtest beyond my ability. I am afraid I can't 
finish it. so I want only do a qtest about basic read/write in PIO

and DMA modes. I don't know whether it is OK.


(I don't know whether we can use qtest to replace the real test, 
especially on PIO mode 's test.)


Thank you very much.





Re: [Qemu-devel] [PATCH next v2 00/74] QOM CPUState, part 3: CPU reset

2012-05-16 Thread Andreas Färber
Am 15.05.2012 17:16, schrieb Igor Mammedov:
> On Thu, May 10, 2012 at 02:13:38AM +0200, Andreas Färber wrote:
>>   target-i386: Pass X86CPU to do_cpu_{init,sipi}()
>>   target-i386: Let cpu_x86_init() return X86CPU
>>   pc: Use cpu_x86_init() to obtain X86CPU
>>   pc: Pass X86CPU to pc_cpu_reset()
> 
> Reviewed-by: Igor Mammedov 

Thanks, applied to qom-next:
http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/qom-next

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH 1.1] linux-user: Fix stale tbs after mmap

2012-05-16 Thread Riku Voipio
On Tue, May 15, 2012 at 04:32:51PM -0500, Anthony Liguori wrote:
> Riku,
> 
> Can you review/ack this patch?

Acked-by: Riku Voipio 

Are there any other Linux-user patches to consider for 1.1 ?

> 
> Regards,
> 
> Anthony Liguori
> 
> 
> On 05/15/2012 03:35 PM, Peter Maydell wrote:
> >Ping? This is 1.1 material in my opinion...
> >
> >(patchwork url: http://patchwork.ozlabs.org/patch/158556/)
> >
> >-- PMM
> >
> >On 11 May 2012 17:25, Peter Maydell  wrote:
> >>On 11 May 2012 09:40, Alexander Graf  wrote:
> >>>If we execute linux-user code that does the following:
> >>>
> >>>  * A = mmap()
> >>>  * execute code in A
> >>>  * munmap(A)
> >>>  * B = mmap(), but mmap returns the same address as A
> >>>  * execute code in B
> >>>
> >>>we end up executing a stale cached tb that contains translated code
> >>>from A, while we want new code from B.
> >>>
> >>>This patch adds a TB flush for mmap'ed regions, before we return them,
> >>>avoiding the whole issue. It also adds a flush for munmap, so that we
> >>>don't execute stale TBs instead of getting a segfault.
> >>>
> >>>Reported-by: Peter Maydell
> >>>Signed-off-by: Alexander Graf
> >>
> >>Reviewed-by: Peter Maydell
> >>
> >>-- PMM
> >



Re: [Qemu-devel] [PATCH 1.1] linux-user: Fix stale tbs after mmap

2012-05-16 Thread Andreas Färber
Am 16.05.2012 11:26, schrieb Riku Voipio:
> On Tue, May 15, 2012 at 04:32:51PM -0500, Anthony Liguori wrote:
>> Riku,
>>
>> Can you review/ack this patch?
> 
> Acked-by: Riku Voipio 
> 
> Are there any other Linux-user patches to consider for 1.1 ?

I had noticed that Alex' binfmt wrappers did not get applied, but
although they fix argv[0] issues they're more of a feature than a bugfix.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [PATCH] Add event notification for guest balloon changes

2012-05-16 Thread Daniel P. Berrange
From: "Daniel P. Berrange" 

After setting a balloon target value, applications have to
continually poll 'query-balloon' to determine whether the
guest has reacted to this request. The virtio-balloon backend
knows exactly when the guest has reacted though, and thus it
is possible to emit a JSON event to tell the mgmt application
whenever the guest balloon changes.

This introduces a new 'qemu_balloon_change()' API which is
to be called by balloon driver backends, whenever they have
a change in balloon value. This takes the 'actual' balloon
value, as would be found in the BalloonInfo struct.

The qemu_balloon_change API emits a JSON monitor event which
looks like:

  {"timestamp": {"seconds": 1337162462, "microseconds": 814521},
   "event": "BALLOON_CHANGE", "data": {"actual": 944766976}}

* balloon.c, balloon.h: Introduce qemu_balloon_change() for
  emitting balloon change events on the monitor
* hw/virtio-balloon.c: Invoke qemu_balloon_change() whenever
  the guest changes the balloon actual value
* monitor.c, monitor.h: Define QEVENT_BALLOON_CHANGE

Signed-off-by: Daniel P. Berrange 
---
 balloon.c   |   14 ++
 balloon.h   |2 ++
 hw/virtio-balloon.c |5 +
 monitor.c   |3 +++
 monitor.h   |1 +
 5 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/balloon.c b/balloon.c
index aa354f7..913862b 100644
--- a/balloon.c
+++ b/balloon.c
@@ -30,6 +30,7 @@
 #include "balloon.h"
 #include "trace.h"
 #include "qmp-commands.h"
+#include "qjson.h"
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
@@ -80,6 +81,19 @@ static int qemu_balloon_status(BalloonInfo *info)
 return 1;
 }
 
+void qemu_balloon_change(int64_t actual)
+{
+QObject *data;
+
+data = qobject_from_jsonf("{ 'actual': %" PRId64 " }",
+  actual);
+
+monitor_protocol_event(QEVENT_BALLOON_CHANGE, data);
+
+qobject_decref(data);
+}
+
+
 BalloonInfo *qmp_query_balloon(Error **errp)
 {
 BalloonInfo *info;
diff --git a/balloon.h b/balloon.h
index b60fd5d..2ebac0d 100644
--- a/balloon.h
+++ b/balloon.h
@@ -24,4 +24,6 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
 QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 
+void qemu_balloon_change(int64_t actual);
+
 #endif
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index ce9d2c9..9137573 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -146,8 +146,13 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 {
 VirtIOBalloon *dev = to_virtio_balloon(vdev);
 struct virtio_balloon_config config;
+uint32_t oldactual = dev->actual;
 memcpy(&config, config_data, 8);
 dev->actual = le32_to_cpu(config.actual);
+if (dev->actual != oldactual) {
+qemu_balloon_change(ram_size -
+(dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
+}
 }
 
 static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f)
diff --git a/monitor.c b/monitor.c
index 12a6fe2..ef59cd9 100644
--- a/monitor.c
+++ b/monitor.c
@@ -493,6 +493,9 @@ void monitor_protocol_event(MonitorEvent event, QObject 
*data)
 case QEVENT_WAKEUP:
 event_name = "WAKEUP";
 break;
+case QEVENT_BALLOON_CHANGE:
+event_name = "BALLOON_CHANGE";
+break;
 default:
 abort();
 break;
diff --git a/monitor.h b/monitor.h
index 0d49800..8de0160 100644
--- a/monitor.h
+++ b/monitor.h
@@ -41,6 +41,7 @@ typedef enum MonitorEvent {
 QEVENT_DEVICE_TRAY_MOVED,
 QEVENT_SUSPEND,
 QEVENT_WAKEUP,
+QEVENT_BALLOON_CHANGE,
 QEVENT_MAX,
 } MonitorEvent;
 
-- 
1.7.7.6




Re: [Qemu-devel] [PATCH] Prevent disk data loss when closing qemu

2012-05-16 Thread Pavel Dovgaluk
I use qemu under Windows and it has two windows when executes - console and SDL 
ones.
When I close SDL window main loop function terminates correctly, and when I 
close 
console window to terminate qemu then the code after main loop is not executed.

Pavel Dovgaluk

From: dunrong huang [mailto:riegama...@gmail.com] 
Sent: Wednesday, May 16, 2012 12:17 PM
To: Pavel Dovgaluk
Cc: qemu-devel
Subject: Re: [Qemu-devel] [PATCH] Prevent disk data loss when closing qemu

What's the difference of these two method to call bdrv_close_all?

If you close qemu window, the main_loop will return immediately, and call 
bdrv_close_all.
2012/5/16 Pavel Dovgaluk 
Prevent disk data loss when closing qemu window.

Signed-off-by: Pavel Dovgalyuk 
---
 vl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vl.c b/vl.c
index 23ab3a3..b6cfd29 100644
--- a/vl.c
+++ b/vl.c
@@ -3650,10 +3650,10 @@ int main(int argc, char **argv, char **envp)
    }

    os_setup_post();
+    atexit(bdrv_close_all);

    resume_all_vcpus();
    main_loop();
-    bdrv_close_all();
    pause_all_vcpus();
    net_cleanup();
    res_free();






-- 
linuxer and emacser and pythoner living in beijing
blog: http://mathslinux.org
twitter: https://twitter.com/mathslinux
google+: https://plus.google.com/118129852578326338750




Re: [Qemu-devel] [PATCH 4/4] xen: Fix PV-on-HVM

2012-05-16 Thread Stefano Stabellini
On Wed, 16 May 2012, Paolo Bonzini wrote:
> Il 16/05/2012 10:13, Michael S. Tsirkin ha scritto:
> > On Wed, May 16, 2012 at 10:06:25AM +0200, Paolo Bonzini wrote:
> >> On Xen the PV drivers can ask the firmware to surprise-remove the
> >> emulated NICs.
> > 
> > So driver tells firmware (meaning acpi? how?) that it's ok
> > to do surprize removal?
> 
> It writes something to some I/O port, and then QEMU surprise-removes the
> NICs.

Yes, writing to a static I/O port provided by the Xen platform PCI
device, see hw/xen_platform.c:platform_fixed_ioport_writew.

The guest can ask to unplug emulated NICs and disks this way.
Surprise-removal is OK in these cases.


> >> Of course it has to do it early enough so that the guest
> >> doesn't crash.
> > 
> > What does early enough mean and how do we ensure that?
> 
> Early enough means that the I/O port is written very early in the boot
> process, even before the PCI bus is scanned by the OS.
> 
> You don't ensure it, it's up to the OS.  The OS knows whether its
> drivers can cope properly with surprise removal.  If they can, in
> principle it could write the magic value whenever it wants to.

Right, it is up to the OS, in general before the PCI bus is scanned.
In Linux we do it from hypervisor_x86->init_platform.



[Qemu-devel] [PATCH 1.1] virtio: check virtio_load return code

2012-05-16 Thread Paolo Bonzini
From: Orit Wassermann 

Otherwise we crash on error.

Signed-off-by: Ulrich Obergfell 
Signed-off-by: Orit Wassermann 
Signed-off-by: Paolo Bonzini 
---
 hw/virtio-balloon.c|6 +-
 hw/virtio-blk.c|7 ++-
 hw/virtio-net.c|6 +-
 hw/virtio-scsi.c   |7 ++-
 hw/virtio-serial-bus.c |6 +-
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index ce9d2c9..075ed87 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -211,11 +211,15 @@ static void virtio_balloon_save(QEMUFile *f, void *opaque)
 static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id)
 {
 VirtIOBalloon *s = opaque;
+int ret;
 
 if (version_id != 1)
 return -EINVAL;
 
-virtio_load(&s->vdev, f);
+ret = virtio_load(&s->vdev, f);
+if (ret) {
+return ret;
+}
 
 s->num_pages = qemu_get_be32(f);
 s->actual = qemu_get_be32(f);
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 12312ea..73b411f 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -533,11 +533,16 @@ static void virtio_blk_save(QEMUFile *f, void *opaque)
 static int virtio_blk_load(QEMUFile *f, void *opaque, int version_id)
 {
 VirtIOBlock *s = opaque;
+int ret;
 
 if (version_id != 2)
 return -EINVAL;
 
-virtio_load(&s->vdev, f);
+ret = virtio_load(&s->vdev, f);
+if (ret) {
+return ret;
+}
+
 while (qemu_get_sbyte(f)) {
 VirtIOBlockReq *req = virtio_blk_alloc_request(s);
 qemu_get_buffer(f, (unsigned char*)&req->elem, sizeof(req->elem));
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index bc5e3a8..3f190d4 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -891,11 +891,15 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int 
version_id)
 {
 VirtIONet *n = opaque;
 int i;
+int ret;
 
 if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
 return -EINVAL;
 
-virtio_load(&n->vdev, f);
+ret = virtio_load(&n->vdev, f);
+if (ret) {
+return ret;
+}
 
 qemu_get_buffer(f, n->mac, ETH_ALEN);
 n->tx_waiting = qemu_get_be32(f);
diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index e8328f4..5e39ce9 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -564,7 +564,12 @@ static void virtio_scsi_save(QEMUFile *f, void *opaque)
 static int virtio_scsi_load(QEMUFile *f, void *opaque, int version_id)
 {
 VirtIOSCSI *s = opaque;
-virtio_load(&s->vdev, f);
+int ret;
+
+ret = virtio_load(&s->vdev, f);
+if (ret) {
+return ret;
+}
 return 0;
 }
 
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index ffbdfc2..72287d1 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -637,13 +637,17 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 VirtIOSerialPort *port;
 uint32_t max_nr_ports, nr_active_ports, ports_map;
 unsigned int i;
+int ret;
 
 if (version_id > 3) {
 return -EINVAL;
 }
 
 /* The virtio device */
-virtio_load(&s->vdev, f);
+ret = virtio_load(&s->vdev, f);
+if (ret) {
+return ret;
+}
 
 if (version_id < 2) {
 return 0;
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH] Fix size checking in load_image_targphys to accept max_size > INT_MAX

2012-05-16 Thread Christian Borntraeger
On 03/05/12 16:33, Peter Maydell wrote:
> On 3 May 2012 15:27, Christian Borntraeger  wrote:
>> commit 17df768c1e4580f03301d18ea938d3557d441911
>>load_image_targphys() should enforce the max size
>>
>> caused some problems with external kernel and specific ram sizes on s390:
>>
>> We load the external kernel with
>>
>> [...]
>>kernel_size = load_image_targphys(kernel_filename, 0, ram_size);
>> [...]
>>
>> Since load_image_targphys is declared as taking an int for max_sz, this will
>> fail for ram sizes > INT_MAX.
>> Lets change the max_sz parameter to a uint64_t.
>>
>> Signed-off-by: Christian Borntraeger 
> 
> A patch equivalent to this has already been submitted:
>   http://patchwork.ozlabs.org/patch/146165/
> We should be applying that one, it has already been reviewed.
> 
> Anthony?

Anthony,

can you apply the patchwork patch for 1.1?

Christian




Re: [Qemu-devel] [PATCH 4/4] xen: Fix PV-on-HVM

2012-05-16 Thread Michael S. Tsirkin
On Wed, May 16, 2012 at 11:19:53AM +0100, Stefano Stabellini wrote:
> On Wed, 16 May 2012, Paolo Bonzini wrote:
> > Il 16/05/2012 10:13, Michael S. Tsirkin ha scritto:
> > > On Wed, May 16, 2012 at 10:06:25AM +0200, Paolo Bonzini wrote:
> > >> On Xen the PV drivers can ask the firmware to surprise-remove the
> > >> emulated NICs.
> > > 
> > > So driver tells firmware (meaning acpi? how?) that it's ok
> > > to do surprize removal?
> > 
> > It writes something to some I/O port, and then QEMU surprise-removes the
> > NICs.
> 
> Yes, writing to a static I/O port provided by the Xen platform PCI
> device, see hw/xen_platform.c:platform_fixed_ioport_writew.
> 
> The guest can ask to unplug emulated NICs and disks this way.
> Surprise-removal is OK in these cases.

Confused.
Don't you want to just remove the device on unplug?
In fact the equivalent of guest calling _EJ0?

> > >> Of course it has to do it early enough so that the guest
> > >> doesn't crash.
> > > 
> > > What does early enough mean and how do we ensure that?
> > 
> > Early enough means that the I/O port is written very early in the boot
> > process, even before the PCI bus is scanned by the OS.
> > 
> > You don't ensure it, it's up to the OS.  The OS knows whether its
> > drivers can cope properly with surprise removal.  If they can, in
> > principle it could write the magic value whenever it wants to.
> 
> Right, it is up to the OS, in general before the PCI bus is scanned.
> In Linux we do it from hypervisor_x86->init_platform.

So early on boot you decide you want PV and so you unplug all emulated
devices?




Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Stefano Stabellini
On Tue, 15 May 2012, Michael S. Tsirkin wrote:
> On Tue, May 15, 2012 at 04:26:36PM +0100, Anthony PERARD wrote:
> > This hotplug state will be used to remove a device without the guest
> > cooperation.
> > 
> > Signed-off-by: Anthony PERARD 
> 
> This can crash guest, can't it? If you are fine with crashing guest,
> we already let you do this:
> - delete device
> - reset guest
> no need for new flags.

Given that the guest is not going to crash (if it knows what it is
doing), we could just:


diff --git a/hw/xen_platform.c b/hw/xen_platform.c
index a9c52a6..a1e1a33 100644
--- a/hw/xen_platform.c
+++ b/hw/xen_platform.c
@@ -88,6 +88,7 @@ static void unplug_nic(PCIBus *b, PCIDevice *d)
 if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
 PCI_CLASS_NETWORK_ETHERNET) {
 qdev_unplug(&(d->qdev), NULL);
+qdev_free(&(d->qdev));
 }
 }
 

Anthony, can you confirm that this solves the problem for you?



Re: [Qemu-devel] [PATCH 4/4] xen: Fix PV-on-HVM

2012-05-16 Thread Stefano Stabellini
On Wed, 16 May 2012, Michael S. Tsirkin wrote:
> On Wed, May 16, 2012 at 11:19:53AM +0100, Stefano Stabellini wrote:
> > On Wed, 16 May 2012, Paolo Bonzini wrote:
> > > Il 16/05/2012 10:13, Michael S. Tsirkin ha scritto:
> > > > On Wed, May 16, 2012 at 10:06:25AM +0200, Paolo Bonzini wrote:
> > > >> On Xen the PV drivers can ask the firmware to surprise-remove the
> > > >> emulated NICs.
> > > > 
> > > > So driver tells firmware (meaning acpi? how?) that it's ok
> > > > to do surprize removal?
> > > 
> > > It writes something to some I/O port, and then QEMU surprise-removes the
> > > NICs.
> > 
> > Yes, writing to a static I/O port provided by the Xen platform PCI
> > device, see hw/xen_platform.c:platform_fixed_ioport_writew.
> > 
> > The guest can ask to unplug emulated NICs and disks this way.
> > Surprise-removal is OK in these cases.
> 
> Confused.
> Don't you want to just remove the device on unplug?

Yes, the NIC needs to "disappear" from the PCI bus.


> In fact the equivalent of guest calling _EJ0?

Except that _EJ0 can or cannot be implemented, while this doesn't have
to go through ACPI or PCI hotplug and it is supposed to always work.


> > > >> Of course it has to do it early enough so that the guest
> > > >> doesn't crash.
> > > > 
> > > > What does early enough mean and how do we ensure that?
> > > 
> > > Early enough means that the I/O port is written very early in the boot
> > > process, even before the PCI bus is scanned by the OS.
> > > 
> > > You don't ensure it, it's up to the OS.  The OS knows whether its
> > > drivers can cope properly with surprise removal.  If they can, in
> > > principle it could write the magic value whenever it wants to.
> > 
> > Right, it is up to the OS, in general before the PCI bus is scanned.
> > In Linux we do it from hypervisor_x86->init_platform.
> 
> So early on boot you decide you want PV and so you unplug all emulated
> devices?

Yes, but only the emulated devices that can collide with PV devices:
disk and network.



[Qemu-devel] [PATCH 1.1 2/4] virtio-blk: blockdev_mark_auto_del is transport-independent

2012-05-16 Thread Paolo Bonzini
Move it from virtio_blk_exit_pci to virtio_blk_exit.

This is included here because the next patch removes proxy->block.

Signed-off-by: Paolo Bonzini 
---
 hw/virtio-blk.c |1 +
 hw/virtio-pci.c |1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 7148572..0569382 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -626,5 +626,6 @@ void virtio_blk_exit(VirtIODevice *vdev)
 {
 VirtIOBlock *s = to_virtio_blk(vdev);
 unregister_savevm(s->qdev, "virtio-blk", s);
+blockdev_mark_auto_del(s->bs);
 virtio_cleanup(vdev);
 }
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 4a4413d..7b2d576 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -726,7 +726,6 @@ static int virtio_blk_exit_pci(PCIDevice *pci_dev)
 
 virtio_pci_stop_ioeventfd(proxy);
 virtio_blk_exit(proxy->vdev);
-blockdev_mark_auto_del(proxy->block.bs);
 return virtio_exit_pci(pci_dev);
 }
 
-- 
1.7.10.1





[Qemu-devel] [PATCH 1.1 0/4] decouple VIRTIO_BLK_F_SCSI from SG_IO support

2012-05-16 Thread Paolo Bonzini
Previous versions of these patches have been posted already, but they
were lost.  Sorry for realizing this quite late.

VIRTIO_BLK_F_SCSI is supposed to mean whether the host can *parse*
SCSI requests, not *execute* them.  So it should always be enabled,
and the scsi=on/off property tied to a separate configuration variable
that is not guest visible.

With this change, Linux has problems understanding failed requests, so
patch 1 works around the Linux bugs.

Important: because we need to do this to fix a migration compatibility
problem when QEMU might be invoked with an old machine type, we must do
this unconditionally.  This more or less assumes that no one ever invoked
QEMU with scsi=off, as it breaks migration from new QEMU, scsi=off to
old QEMU, also scsi=off.  However new->old is not supported upstream.

S390 compile-tested only.

Paolo Bonzini (4):
  virtio-blk: report non-zero status when failing SG_IO requests
  virtio-blk: blockdev_mark_auto_del is transport-independent
  virtio-blk: define VirtIOBlkConf
  virtio-blk: always enable VIRTIO_BLK_F_SCSI

 hw/s390-virtio-bus.c |   10 ---
 hw/s390-virtio-bus.h |4 +--
 hw/virtio-blk.c  |   80 --
 hw/virtio-blk.h  |   14 +
 hw/virtio-pci.c  |   11 +++
 hw/virtio-pci.h  |4 +--
 hw/virtio.h  |4 +--
 7 files changed, 64 insertions(+), 63 deletions(-)

-- 
1.7.10.1




[Qemu-devel] [PATCH 1.1 3/4] virtio-blk: define VirtIOBlkConf

2012-05-16 Thread Paolo Bonzini
We will have to add another field to the virtio-blk configuration in
the next patch.  Avoid a proliferation of arguments to virtio_blk_init.

Signed-off-by: Paolo Bonzini 
---
 hw/s390-virtio-bus.c |7 +++
 hw/s390-virtio-bus.h |4 ++--
 hw/virtio-blk.c  |   27 +--
 hw/virtio-blk.h  |7 +++
 hw/virtio-pci.c  |7 +++
 hw/virtio-pci.h  |4 ++--
 hw/virtio.h  |4 ++--
 7 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index 63ccd5c..43647a7 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -163,8 +163,7 @@ static int s390_virtio_blk_init(VirtIOS390Device *dev)
 {
 VirtIODevice *vdev;
 
-vdev = virtio_blk_init((DeviceState *)dev, &dev->block,
-   &dev->block_serial);
+vdev = virtio_blk_init((DeviceState *)dev, &dev->blk);
 if (!vdev) {
 return -1;
 }
@@ -400,8 +399,8 @@ static TypeInfo s390_virtio_net = {
 };
 
 static Property s390_virtio_blk_properties[] = {
-DEFINE_BLOCK_PROPERTIES(VirtIOS390Device, block),
-DEFINE_PROP_STRING("serial", VirtIOS390Device, block_serial),
+DEFINE_BLOCK_PROPERTIES(VirtIOS390Device, blk.conf),
+DEFINE_PROP_STRING("serial", VirtIOS390Device, blk.serial),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390-virtio-bus.h b/hw/s390-virtio-bus.h
index 49e6c46..4b99d02 100644
--- a/hw/s390-virtio-bus.h
+++ b/hw/s390-virtio-bus.h
@@ -17,6 +17,7 @@
  * License along with this library; if not, see .
  */
 
+#include "virtio-blk.h"
 #include "virtio-net.h"
 #include "virtio-serial.h"
 #include "virtio-scsi.h"
@@ -64,8 +65,7 @@ struct VirtIOS390Device {
 ram_addr_t feat_offs;
 uint8_t feat_len;
 VirtIODevice *vdev;
-BlockConf block;
-char *block_serial;
+VirtIOBlkConf blk;
 NICConf nic;
 uint32_t host_features;
 virtio_serial_conf serial;
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 0569382..b46ecb3 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -29,7 +29,7 @@ typedef struct VirtIOBlock
 void *rq;
 QEMUBH *bh;
 BlockConf *conf;
-char *serial;
+VirtIOBlkConf *blk;
 unsigned short sector_mask;
 DeviceState *qdev;
 } VirtIOBlock;
@@ -389,7 +389,7 @@ static void virtio_blk_handle_request(VirtIOBlockReq *req,
  * terminated by '\0' only when shorter than buffer.
  */
 strncpy(req->elem.in_sg[0].iov_base,
-s->serial ? s->serial : "",
+s->blk->serial ? s->blk->serial : "",
 MIN(req->elem.in_sg[0].iov_len, VIRTIO_BLK_ID_BYTES));
 virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 g_free(req);
@@ -568,28 +568,27 @@ static const BlockDevOps virtio_block_ops = {
 .resize_cb = virtio_blk_resize,
 };
 
-VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,
-  char **serial)
+VirtIODevice *virtio_blk_init(DeviceState *dev, VirtIOBlkConf *blk)
 {
 VirtIOBlock *s;
 int cylinders, heads, secs;
 static int virtio_blk_id;
 DriveInfo *dinfo;
 
-if (!conf->bs) {
+if (!blk->conf.bs) {
 error_report("drive property not set");
 return NULL;
 }
-if (!bdrv_is_inserted(conf->bs)) {
+if (!bdrv_is_inserted(blk->conf.bs)) {
 error_report("Device needs media, but drive is empty");
 return NULL;
 }
 
-if (!*serial) {
+if (!blk->serial) {
 /* try to fall back to value set with legacy -drive serial=... */
-dinfo = drive_get_by_blockdev(conf->bs);
+dinfo = drive_get_by_blockdev(blk->conf.bs);
 if (*dinfo->serial) {
-*serial = strdup(dinfo->serial);
+blk->serial = strdup(dinfo->serial);
 }
 }
 
@@ -600,9 +599,9 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf 
*conf,
 s->vdev.get_config = virtio_blk_update_config;
 s->vdev.get_features = virtio_blk_get_features;
 s->vdev.reset = virtio_blk_reset;
-s->bs = conf->bs;
-s->conf = conf;
-s->serial = *serial;
+s->bs = blk->conf.bs;
+s->conf = &blk->conf;
+s->blk = blk;
 s->rq = NULL;
 s->sector_mask = (s->conf->logical_block_size / BDRV_SECTOR_SIZE) - 1;
 bdrv_guess_geometry(s->bs, &cylinders, &heads, &secs);
@@ -614,10 +613,10 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf 
*conf,
 register_savevm(dev, "virtio-blk", virtio_blk_id++, 2,
 virtio_blk_save, virtio_blk_load, s);
 bdrv_set_dev_ops(s->bs, &virtio_block_ops, s);
-bdrv_set_buffer_alignment(s->bs, conf->logical_block_size);
+bdrv_set_buffer_alignment(s->bs, s->conf->logical_block_size);
 
 bdrv_iostatus_enable(s->bs);
-add_boot_device_path(conf->bootindex, dev, "/disk@0,0");
+add_boot_device_path(s->conf->bootindex, dev, "/disk@0,0");
 
 return &s->vdev;
 }
diff --git a/hw/virtio-blk.h b/hw/virtio-blk

[Qemu-devel] [PATCH 1.1 1/4] virtio-blk: report non-zero status when failing SG_IO requests

2012-05-16 Thread Paolo Bonzini
Linux really looks only at scsi->errors for SG_IO requests; it does
not look at the virtio request status at all.  Because of this, when
a SG_IO request is failed early with virtio_blk_req_complete(req,
VIRTIO_BLK_S_UNSUPP), without writing hdr.status, it will look like
a success to the guest.

This is their bug, but we can make it safe for older guests now by
forcing scsi->errors to have a non-zero value whenever a request
has to be failed.

But if we fix the bug in the guest driver, we will have another problem
because QEMU returns VIRTIO_BLK_S_IOERR if the status is non-zero, and
Linux translates that to -EIO.  Rather, the guest should succeed the
request and pass the non-zero status via the userspace-provided SG_IO
structure.  So, remove the case where virtio_blk_handle_scsi can
return VIRTIO_BLK_S_IOERR.

Signed-off-by: Paolo Bonzini 
---
 hw/virtio-blk.c |   51 +++
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index d4bb400..7148572 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -145,20 +145,12 @@ static VirtIOBlockReq *virtio_blk_get_request(VirtIOBlock 
*s)
 return req;
 }
 
-#ifdef __linux__
 static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 {
-struct sg_io_hdr hdr;
 int ret;
-int status;
+int status = VIRTIO_BLK_S_OK;
 int i;
 
-if ((req->dev->vdev.guest_features & (1 << VIRTIO_BLK_F_SCSI)) == 0) {
-virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-g_free(req);
-return;
-}
-
 /*
  * We require at least one output segment each for the virtio_blk_outhdr
  * and the SCSI command block.
@@ -173,20 +165,26 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 }
 
 /*
- * No support for bidirection commands yet.
+ * The scsi inhdr is placed in the second-to-last input segment, just
+ * before the regular inhdr.
  */
-if (req->elem.out_num > 2 && req->elem.in_num > 3) {
-virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-g_free(req);
-return;
+req->scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
+
+if ((req->dev->vdev.guest_features & (1 << VIRTIO_BLK_F_SCSI)) == 0) {
+status = VIRTIO_BLK_S_UNSUPP;
+goto fail;
 }
 
 /*
- * The scsi inhdr is placed in the second-to-last input segment, just
- * before the regular inhdr.
+ * No support for bidirection commands yet.
  */
-req->scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
+if (req->elem.out_num > 2 && req->elem.in_num > 3) {
+status = VIRTIO_BLK_S_UNSUPP;
+goto fail;
+}
 
+#ifdef __linux__
+struct sg_io_hdr hdr;
 memset(&hdr, 0, sizeof(struct sg_io_hdr));
 hdr.interface_id = 'S';
 hdr.cmd_len = req->elem.out_sg[1].iov_len;
@@ -230,12 +228,7 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 ret = bdrv_ioctl(req->dev->bs, SG_IO, &hdr);
 if (ret) {
 status = VIRTIO_BLK_S_UNSUPP;
-hdr.status = ret;
-hdr.resid = hdr.dxfer_len;
-} else if (hdr.status) {
-status = VIRTIO_BLK_S_IOERR;
-} else {
-status = VIRTIO_BLK_S_OK;
+goto fail;
 }
 
 /*
@@ -258,14 +251,16 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 
 virtio_blk_req_complete(req, status);
 g_free(req);
-}
 #else
-static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
-{
-virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+abort();
+#endif
+
+fail:
+/* Just put anything nonzero so that the ioctl fails in the guest.  */
+stl_p(&req->scsi->errors, 255);
+virtio_blk_req_complete(req, status);
 g_free(req);
 }
-#endif /* __linux__ */
 
 typedef struct MultiReqBuffer {
 BlockRequestblkreq[32];
-- 
1.7.10.1





[Qemu-devel] [PATCH 1.1 4/4] virtio-blk: always enable VIRTIO_BLK_F_SCSI

2012-05-16 Thread Paolo Bonzini
VIRTIO_BLK_F_SCSI is supposed to mean whether the host can *parse*
SCSI requests, not *execute* them.  You could run QEMU with scsi=on
and a file-backed disk, and QEMU would fail all SCSI requests even
though it advertises VIRTIO_BLK_F_SCSI.

Because we need to do this to fix a migration compatibility problem
related to how QEMU is invoked by management, we must do this
unconditionally even on older machine types.  This more or less assumes
that no one ever invoked QEMU with scsi=off.

Here is how testing goes:

- old QEMU, scsi=on -> new QEMU, scsi=on
- new QEMU, scsi=on -> old QEMU, scsi=on
- old QEMU, scsi=off -> new QEMU, scsi=on
- new QEMU, scsi=off -> old QEMU, scsi=on
ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)

- old QEMU, scsi=off -> new QEMU, scsi=off
ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)

- old QEMU, scsi=on -> new QEMU, scsi=off
ok, bug fixed

- new QEMU, scsi=on -> old QEMU, scsi=off
doesn't work (same as: old QEMU, scsi=on -> old QEMU, scsi=off)

- new QEMU, scsi=off -> old QEMU, scsi=off
broken by the patch

Signed-off-by: Paolo Bonzini 
---
 hw/s390-virtio-bus.c |3 +++
 hw/virtio-blk.c  |3 ++-
 hw/virtio-blk.h  |7 +--
 hw/virtio-pci.c  |3 +++
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index 43647a7..1d38a8f 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -401,6 +401,9 @@ static TypeInfo s390_virtio_net = {
 static Property s390_virtio_blk_properties[] = {
 DEFINE_BLOCK_PROPERTIES(VirtIOS390Device, blk.conf),
 DEFINE_PROP_STRING("serial", VirtIOS390Device, blk.serial),
+#ifdef __linux__
+DEFINE_PROP_BIT("scsi", VirtIOS390Device, blk.scsi, 0, true),
+#endif
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index b46ecb3..f9e1896 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -170,7 +170,7 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
  */
 req->scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
 
-if ((req->dev->vdev.guest_features & (1 << VIRTIO_BLK_F_SCSI)) == 0) {
+if (!req->dev->blk->scsi) {
 status = VIRTIO_BLK_S_UNSUPP;
 goto fail;
 }
@@ -504,6 +504,7 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, 
uint32_t features)
 features |= (1 << VIRTIO_BLK_F_GEOMETRY);
 features |= (1 << VIRTIO_BLK_F_TOPOLOGY);
 features |= (1 << VIRTIO_BLK_F_BLK_SIZE);
+features |= (1 << VIRTIO_BLK_F_SCSI);
 
 if (bdrv_enable_write_cache(s->bs))
 features |= (1 << VIRTIO_BLK_F_WCACHE);
diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h
index 70564a1..d785001 100644
--- a/hw/virtio-blk.h
+++ b/hw/virtio-blk.h
@@ -101,15 +101,10 @@ struct VirtIOBlkConf
 {
 BlockConf conf;
 char *serial;
+uint32_t scsi;
 };
 
-#ifdef __linux__
-#define DEFINE_VIRTIO_BLK_FEATURES(_state, _field) \
-DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
-DEFINE_PROP_BIT("scsi", _state, _field, VIRTIO_BLK_F_SCSI, true)
-#else
 #define DEFINE_VIRTIO_BLK_FEATURES(_state, _field) \
 DEFINE_VIRTIO_COMMON_FEATURES(_state, _field)
-#endif
 
 #endif
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index ae3ddd8..7932392 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -814,6 +814,9 @@ static Property virtio_blk_properties[] = {
 DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
 DEFINE_BLOCK_PROPERTIES(VirtIOPCIProxy, blk.conf),
 DEFINE_PROP_STRING("serial", VirtIOPCIProxy, blk.serial),
+#ifdef __linux__
+DEFINE_PROP_BIT("scsi", VirtIOPCIProxy, blk.scsi, 0, true),
+#endif
 DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, 
VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
 DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
-- 
1.7.10.1




Re: [Qemu-devel] [PATCH 2/3 v6] Replace bdrv_* to bdrv_aio_* functions in DMA mode in fdc.c

2012-05-16 Thread Kevin Wolf
Am 16.05.2012 10:23, schrieb Zhi Hui Li:
> On 2012年05月15日 17:38, Paolo Bonzini wrote:
>> Il 15/05/2012 11:33, Kevin Wolf ha scritto:
> which blindly overwrites status2.  Hence the new code was not written
> based on it.  However, the new code is untested as far as I know.
>>> In the thread of an earlier version of this series, I said that a qtest
>>> for floppy is required. This only confirms it.
>>
>> The problem with writing a qtest is that the spec is incredibly complex
>> and obscure.  It's probably even better to rip out code that cannot be
>> tested properly, so you don't have to test it at all...
>>
>> (Mostly tongue-in-cheek of course.  A qtest for basic read/write in PIO
>> and DMA modes is indeed a very good idea).
>>
>> Paolo
>>
>>
> 
> Yes , I think maybe Paolo is right.
> 
> Because the spec is incredibly complex and obscure and I am newer.
> To write the whole code's qtest beyond my ability. I am afraid I can't 
> finish it. so I want only do a qtest about basic read/write in PIO
> and DMA modes. I don't know whether it is OK.

Don't worry, any test is better than no test. We should try to add a
qtest for basic operation to fdc-test.c. More detailed tests can be
added later, or maybe we find good additions during review.

I know that the floppy controller spec is hard to read. Writing test
cases basically means translating it into clearer requirements.

> (I don't know whether we can use qtest to replace the real test, 
> especially on PIO mode 's test.)

In theory yes, qtest can do everything if you have a complete set of
test cases to cover the whole spec. In practice it will just help to
find regressions earlier (floppy isn't tested very often manually).

Kevin



Re: [Qemu-devel] [PATCH 1.1] tests: Add rtc-test (fix test regression)

2012-05-16 Thread Kevin Wolf
Am 16.05.2012 06:07, schrieb Andreas Färber:
> Am 15.05.2012 18:19, schrieb Stefan Weil:
>> Commit 93e9eb6808c886f5f1c903b7ced1eed65de2ba39 added fdc-test,
>> but accidentally removed rtc-test because check-qtest-i386-y was
>> not enhanced but set twice.
>>
>> This patch adds rtc-test again (and sorts both tests alphabetically).
>>
>> Signed-off-by: Stefan Weil 
> 
> Reviewed-by: Andreas Färber 
> 
> I wonder though if it might be better to always just use += because
> especially an alphabetical order risks us introducing the same fault again.

Yes, I think that would be better.

Kevin



Re: [Qemu-devel] [PATCH v3 3/8] kvm: Introduce basic MSI support for in-kernel irqchips

2012-05-16 Thread Jan Kiszka
On 2012-05-16 00:27, Marcelo Tosatti wrote:
> On Thu, May 10, 2012 at 06:02:52PM -0300, Jan Kiszka wrote:
>> This patch basically adds kvm_irqchip_send_msi, a service for sending
>> arbitrary MSI messages to KVM's in-kernel irqchip models.
>>
>> As the current KVI API requires us to establish a static route from a
>> pseudo GSI to the target MSI message and inject the MSI via toggling
>> that GSI, we need to play some tricks to make this unfortunately
>> interface transparent. We create those routes on demand and keep them
>> in a hash table. Succeeding messages can then search for an existing
>> route in the table first and reuse it whenever possible. If we should
>> run out of limited GSIs, we simply flush the table and rebuild it as
>> messages are sent.
>>
>> This approach is rather simple and could be optimized further. However,
>> future kernels will contain a more efficient MSI injection interface
>> that will obsolete the GSI-based dynamic injection.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  kvm-all.c |  139 
>> -
>>  kvm.h |1 +
>>  2 files changed, 139 insertions(+), 1 deletions(-)
>>
>> diff --git a/kvm-all.c b/kvm-all.c
>> index 2d82d54..e7ed510 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -24,6 +24,7 @@
>>  #include "qemu-barrier.h"
>>  #include "sysemu.h"
>>  #include "hw/hw.h"
>> +#include "hw/msi.h"
>>  #include "gdbstub.h"
>>  #include "kvm.h"
>>  #include "bswap.h"
>> @@ -48,6 +49,8 @@
>>  do { } while (0)
>>  #endif
>>  
>> +#define KVM_MSI_HASHTAB_SIZE256
>> +
>>  typedef struct KVMSlot
>>  {
>>  target_phys_addr_t start_addr;
>> @@ -59,6 +62,11 @@ typedef struct KVMSlot
>>  
>>  typedef struct kvm_dirty_log KVMDirtyLog;
>>  
>> +typedef struct KVMMSIRoute {
>> +struct kvm_irq_routing_entry kroute;
>> +QTAILQ_ENTRY(KVMMSIRoute) entry;
>> +} KVMMSIRoute;
>> +
>>  struct KVMState
>>  {
>>  KVMSlot slots[32];
>> @@ -87,6 +95,7 @@ struct KVMState
>>  int nr_allocated_irq_routes;
>>  uint32_t *used_gsi_bitmap;
>>  unsigned int gsi_count;
>> +QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
>>  #endif
>>  };
>>  
>> @@ -862,9 +871,14 @@ static void set_gsi(KVMState *s, unsigned int gsi)
>>  s->used_gsi_bitmap[gsi / 32] |= 1U << (gsi % 32);
>>  }
>>  
>> +static void clear_gsi(KVMState *s, unsigned int gsi)
>> +{
>> +s->used_gsi_bitmap[gsi / 32] &= ~(1U << (gsi % 32));
>> +}
>> +
>>  static void kvm_init_irq_routing(KVMState *s)
>>  {
>> -int gsi_count;
>> +int gsi_count, i;
>>  
>>  gsi_count = kvm_check_extension(s, KVM_CAP_IRQ_ROUTING);
>>  if (gsi_count > 0) {
>> @@ -884,6 +898,10 @@ static void kvm_init_irq_routing(KVMState *s)
>>  s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
>>  s->nr_allocated_irq_routes = 0;
>>  
>> +for (i = 0; i < KVM_MSI_HASHTAB_SIZE; i++) {
>> +QTAILQ_INIT(&s->msi_hashtab[i]);
>> +}
>> +
>>  kvm_arch_init_irq_routing(s);
>>  }
>>  
>> @@ -934,11 +952,130 @@ int kvm_irqchip_commit_routes(KVMState *s)
>>  return kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
>>  }
>>  
>> +static void kvm_release_gsi(KVMState *s, int gsi)
>> +{
>> +struct kvm_irq_routing_entry *e;
>> +int i;
>> +
>> +for (i = 0; i < s->irq_routes->nr; i++) {
>> +e = &s->irq_routes->entries[i];
>> +if (e->gsi == gsi) {
>> +s->irq_routes->nr--;
>> +*e = s->irq_routes->entries[s->irq_routes->nr];
>> +}
>> +}
>> +clear_gsi(s, gsi);
>> +}
>> +
>> +static unsigned int kvm_hash_msi(uint32_t data)
>> +{
>> +/* This is optimized for IA32 MSI layout. However, no other arch shall
>> + * repeat the mistake of not providing a direct MSI injection API. */
>> +return data & 0xff;
>> +}
>> +
>> +static void kvm_flush_dynamic_msi_routes(KVMState *s)
>> +{
>> +KVMMSIRoute *route, *next;
>> +unsigned int hash;
>> +
>> +for (hash = 0; hash < KVM_MSI_HASHTAB_SIZE; hash++) {
>> +QTAILQ_FOREACH_SAFE(route, &s->msi_hashtab[hash], entry, next) {
>> +kvm_release_gsi(s, route->kroute.gsi);
>> +QTAILQ_REMOVE(&s->msi_hashtab[hash], route, entry);
>> +g_free(route);
>> +}
>> +}
>> +}
>> +
>> +static int kvm_get_pseudo_gsi(KVMState *s)
>> +{
>> +uint32_t *word = s->used_gsi_bitmap;
>> +int max_words = ALIGN(s->gsi_count, 32) / 32;
>> +int i, bit;
>> +bool retry = true;
>> +
>> +again:
>> +/* Return the lowest unused GSI in the bitmap */
>> +for (i = 0; i < max_words; i++) {
>> +bit = ffs(~word[i]);
>> +if (!bit) {
>> +continue;
>> +}
>> +
>> +return bit - 1 + i * 32;
>> +}
>> +if (retry) {
>> +retry = false;
>> +kvm_flush_dynamic_msi_routes(s);
>> +goto again;
>> +}
> 
> Guests with more MSI vectors than GSIs available, multiplexing are going
> to be so slow that its best to disal

Re: [Qemu-devel] [Xen-devel] [PATCH V8 RESEND 2/8] configure: Introduce --enable-xen-pci-passthrough.

2012-05-16 Thread Konrad Rzeszutek Wilk
On Mon, May 14, 2012 at 11:49:46AM +0100, Anthony PERARD wrote:
> On Sat, May 12, 2012 at 2:42 AM, Konrad Rzeszutek Wilk
>  wrote:
> >
> > I thought I reviewed this last time? Is there a reason for not
> > attaching 'Reviewed-by: Konrad Rzeszutek Wilk
> > ' on this patch?
> 
> Yes, one: you reviewed a later patch :-)

Duh!
> 
> Your reviewed-by is in the last version of this series, V11.
> 
> -- 
> Anthony PERARD
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel



Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Anthony PERARD
On Wed, May 16, 2012 at 11:32 AM, Stefano Stabellini
 wrote:
> On Tue, 15 May 2012, Michael S. Tsirkin wrote:
>> On Tue, May 15, 2012 at 04:26:36PM +0100, Anthony PERARD wrote:
>> > This hotplug state will be used to remove a device without the guest
>> > cooperation.
>> >
>> > Signed-off-by: Anthony PERARD 
>>
>> This can crash guest, can't it? If you are fine with crashing guest,
>> we already let you do this:
>> - delete device
>> - reset guest
>> no need for new flags.
>
> Given that the guest is not going to crash (if it knows what it is
> doing), we could just:
>
>
> diff --git a/hw/xen_platform.c b/hw/xen_platform.c
> index a9c52a6..a1e1a33 100644
> --- a/hw/xen_platform.c
> +++ b/hw/xen_platform.c
> @@ -88,6 +88,7 @@ static void unplug_nic(PCIBus *b, PCIDevice *d)
>     if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
>             PCI_CLASS_NETWORK_ETHERNET) {
>         qdev_unplug(&(d->qdev), NULL);
> +        qdev_free(&(d->qdev));
>     }
>  }
>
>
> Anthony, can you confirm that this solves the problem for you?

This work until I try to hotplug a new device to the guest at wish
point I have this:
ERROR:/local/home/anthony/work/qemu/qom/object.c:389:object_delete:
assertion failed: (obj->ref == 0)

This is because there is still a pending request of the hotunplug in
the acpi piix4.
If I call qdev_free without qdev_unplug, I hit the same assert, but
rigth away. This is way something new.

-- 
Anthony PERARD



Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Paolo Bonzini
Il 16/05/2012 13:15, Anthony PERARD ha scritto:
>> > qdev_unplug(&(d->qdev), NULL);
>> > +qdev_free(&(d->qdev));
>> > }
>> >  }
>> >
>> >
>> > Anthony, can you confirm that this solves the problem for you?
> This work until I try to hotplug a new device to the guest at wish
> point I have this:
> ERROR:/local/home/anthony/work/qemu/qom/object.c:389:object_delete:
> assertion failed: (obj->ref == 0)
> 
> This is because there is still a pending request of the hotunplug in
> the acpi piix4.
> If I call qdev_free without qdev_unplug, I hit the same assert, but
> rigth away. This is way something new.

Because it's missing the object_unparent done by qdev_unplug.  Does
object_unparent+qdev_free work?  (I believe object_unparent should be
done by qdev_free rather than qdev_unplug, but that's something for 1.2).

Paolo



Re: [Qemu-devel] [Xen-devel] [PATCH V11 3/8] Introduce XenHostPCIDevice to access a pci device on the host.

2012-05-16 Thread Konrad Rzeszutek Wilk
On Tue, Apr 03, 2012 at 04:32:38PM +0100, Anthony PERARD wrote:
> Signed-off-by: Anthony PERARD 

Looks good, thought I've just couple of tiny comments:

> +#define XEN_HOST_PCI_RESSOURCE_BUFFER_SIZE 512

You might want a comment explaining why 512, and not
a more precise number like 741.
> +{
.. snip..
> +do {
> +rc = read(fd, &buf, sizeof (buf) - 1);
> +if (rc < 0 && errno != EINTR) {
> +rc = -errno;
> +goto out;
> +}
> +} while (rc < 0);

Ok, you read it in. Maybe my 'wc' magic is gone, but this
is what I get:
[root@localhost :00:02.0]# cat resource | wc -c
741
.. snip..
> +#define XEN_HOST_PCI_GET_VALUE_BUFFER_SIZE 42

The answer to the life? Can you provide a comment explaining
the reason why it is 42, please?

> +static int xen_host_pci_get_value(XenHostPCIDevice *d, const char *name,
> +  unsigned int *pvalue, int base)
> +{



[Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Amit Shah
The Linux kernel already has a virtio-rng driver, this is the device
implementation.

When Linux needs more entropy, it puts a buffer in the vq.  We then put
entropy into that buffer, and push it back to the guest.

Feeding randomness from host's /dev/urandom into the guest is
sufficient, so this is a simple implementation that opens /dev/urandom
and reads from it whenever required.

Invocation is simple:

  qemu ... -device virtio-rng-pci

In the guest, we see

  $ cat /sys/devices/virtual/misc/hw_random/rng_available
  virtio

  $ cat /sys/devices/virtual/misc/hw_random/rng_current
  virtio

There are ways to extend the device to be more generic and collect
entropy from other sources, but this is simple enough and works for now.

Signed-off-by: Amit Shah 
---
 Makefile.objs   |1 +
 hw/pci.h|1 +
 hw/virtio-pci.c |   50 +
 hw/virtio-rng.c |  130 +++
 hw/virtio-rng.h |   18 
 hw/virtio.h |2 +
 6 files changed, 202 insertions(+), 0 deletions(-)
 create mode 100644 hw/virtio-rng.c
 create mode 100644 hw/virtio-rng.h

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..5850762 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -210,6 +210,7 @@ user-obj-y += $(qom-obj-twice-y)
 hw-obj-y =
 hw-obj-y += vl.o loader.o
 hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
+hw-obj-$(CONFIG_VIRTIO) += virtio-rng.o
 hw-obj-y += usb/libhw.o
 hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
diff --git a/hw/pci.h b/hw/pci.h
index 8d0aa49..0a22f91 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -76,6 +76,7 @@
 #define PCI_DEVICE_ID_VIRTIO_BALLOON 0x1002
 #define PCI_DEVICE_ID_VIRTIO_CONSOLE 0x1003
 #define PCI_DEVICE_ID_VIRTIO_SCSI0x1004
+#define PCI_DEVICE_ID_VIRTIO_RNG 0x1005
 
 #define FMT_PCIBUS  PRIx64
 
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 4a4413d..7f2d630 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -812,6 +812,28 @@ static int virtio_balloon_exit_pci(PCIDevice *pci_dev)
 return virtio_exit_pci(pci_dev);
 }
 
+static int virtio_rng_init_pci(PCIDevice *pci_dev)
+{
+VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
+VirtIODevice *vdev;
+
+vdev = virtio_rng_init(&pci_dev->qdev);
+if (!vdev) {
+return -1;
+}
+virtio_init_pci(proxy, vdev);
+return 0;
+}
+
+static int virtio_rng_exit_pci(PCIDevice *pci_dev)
+{
+VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
+
+virtio_pci_stop_ioeventfd(proxy);
+virtio_rng_exit(proxy->vdev);
+return virtio_exit_pci(pci_dev);
+}
+
 static Property virtio_blk_properties[] = {
 DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
 DEFINE_BLOCK_PROPERTIES(VirtIOPCIProxy, block),
@@ -937,6 +959,33 @@ static TypeInfo virtio_balloon_info = {
 .class_init= virtio_balloon_class_init,
 };
 
+static Property virtio_rng_properties[] = {
+DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_rng_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k->init = virtio_rng_init_pci;
+k->exit = virtio_rng_exit_pci;
+k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+k->device_id = PCI_DEVICE_ID_VIRTIO_RNG;
+k->revision = VIRTIO_PCI_ABI_VERSION;
+k->class_id = PCI_CLASS_OTHERS;
+dc->reset = virtio_pci_reset;
+dc->props = virtio_rng_properties;
+}
+
+static TypeInfo virtio_rng_info = {
+.name  = "virtio-rng-pci",
+.parent= TYPE_PCI_DEVICE,
+.instance_size = sizeof(VirtIOPCIProxy),
+.class_init= virtio_rng_class_init,
+};
+
 static int virtio_scsi_init_pci(PCIDevice *pci_dev)
 {
 VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
@@ -998,6 +1047,7 @@ static void virtio_pci_register_types(void)
 type_register_static(&virtio_serial_info);
 type_register_static(&virtio_balloon_info);
 type_register_static(&virtio_scsi_info);
+type_register_static(&virtio_rng_info);
 }
 
 type_init(virtio_pci_register_types)
diff --git a/hw/virtio-rng.c b/hw/virtio-rng.c
new file mode 100644
index 000..e1f3d1c
--- /dev/null
+++ b/hw/virtio-rng.c
@@ -0,0 +1,130 @@
+/* A virtio device for feeding entropy into a guest.
+ *
+ * Copyright 2012 Red Hat, Inc.
+ * Copyright 2012 Amit Shah 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "iov.h"
+#include "qdev.h"
+#include "virtio.h"
+#include "virtio-rng.h"
+
+typedef struct VirtIORNG {
+VirtIODevice vdev;
+
+DeviceState *qdev;
+
+/* Only one vq - guest puts a buffer on it when it needs entropy */
+VirtQueue *vq;
+
+int input_fd;
+} VirtIORNG;
+
+static void handle_input(VirtIODev

[Qemu-devel] [PATCH] kvm: update vmxcap for EPT A/D, INVPCID, RDRAND, VMFUNC

2012-05-16 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 scripts/kvm/vmxcap |   13 +
 1 file changed, 13 insertions(+)

diff --git a/scripts/kvm/vmxcap b/scripts/kvm/vmxcap
index a74ce71..cbe6440 100755
--- a/scripts/kvm/vmxcap
+++ b/scripts/kvm/vmxcap
@@ -22,6 +22,7 @@ MSR_IA32_VMX_TRUE_PINBASED_CTLS = 0x48D
 MSR_IA32_VMX_TRUE_PROCBASED_CTLS = 0x48E
 MSR_IA32_VMX_TRUE_EXIT_CTLS = 0x48F
 MSR_IA32_VMX_TRUE_ENTRY_CTLS = 0x490
+MSR_IA32_VMX_VMFUNC = 0x491
 
 class msr(object):
 def __init__(self):
@@ -147,6 +148,9 @@ controls = [
 6: 'WBINVD exiting',
 7: 'Unrestricted guest',
 10: 'PAUSE-loop exiting',
+11: 'RDRAND exiting',
+12: 'Enable INVPCID',
+13: 'Enable VM functions',
 },
 cap_msr = MSR_IA32_VMX_PROCBASED_CTLS2,
 ),
@@ -193,6 +197,7 @@ controls = [
 8: 'Wait-for-SIPI activity state',
 (16,24): 'Number of CR3-target values',
 (25,27): 'MSR-load/store count recommenation',
+28: 'IA32_SMM_MONITOR_CTL[2] can be set to 1',
 (32,62): 'MSEG revision identifier',
 },
 msr = MSR_IA32_VMX_MISC_CTLS,
@@ -208,6 +213,7 @@ controls = [
 16: '2MB EPT pages',
 17: '1GB EPT pages',
 20: 'INVEPT supported',
+21: 'EPT accessed and dirty flags',
 25: 'Single-context INVEPT',
 26: 'All-context INVEPT',
 32: 'INVVPID supported',
@@ -218,6 +224,13 @@ controls = [
 },
 msr = MSR_IA32_VMX_EPT_VPID_CAP,
 ),
+Misc(
+name = 'VM Functions',
+bits = {
+0: 'EPTP Switching',
+},
+msr = MSR_IA32_VMX_VMFUNC,
+),
 ]
 
 for c in controls:
-- 
1.7.10.1




Re: [Qemu-devel] [Xen-devel] [PATCH V11 5/8] Introduce Xen PCI Passthrough, qdevice (1/3)

2012-05-16 Thread Konrad Rzeszutek Wilk
> +void xen_pt_log(const PCIDevice *d, const char *f, ...)
> +{
> +va_list ap;
> +
> +va_start(ap, f);
> +if (d) {
> +fprintf(stderr, "[%02x:%02x.%x] ", pci_bus_num(d->bus),

%d at the end and in the other location pls fix it up..

> +PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> +}
> +vfprintf(stderr, f, ap);
> +va_end(ap);
> +}
> +
> +/* Config Space */
> +

.. Is there a way to make this function (and the following one) squahed?

> +static uint32_t xen_pt_pci_read_config(PCIDevice *d, uint32_t addr, int len)
> +{
> +XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
> +uint32_t val = 0;
> +XenPTRegGroup *reg_grp_entry = NULL;
> +XenPTReg *reg_entry = NULL;
> +int rc = 0;
> +int emul_len = 0;
> +uint32_t find_addr = addr;
> +
> +if (xen_pt_pci_config_access_check(d, addr, len)) {
> +goto exit;
> +}
> +
> +/* find register group entry */
> +reg_grp_entry = xen_pt_find_reg_grp(s, addr);
> +if (reg_grp_entry) {
> +/* check 0-Hardwired register group */
> +if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED) {
> +/* no need to emulate, just return 0 */
> +val = 0;
> +goto exit;
> +}
> +}
> +
> +/* read I/O device register value */
> +rc = xen_host_pci_get_block(&s->real_device, addr, (uint8_t *)&val, len);
> +if (rc < 0) {
> +XEN_PT_ERR(d, "pci_read_block failed. return value: %d.\n", rc);
> +memset(&val, 0xff, len);
> +}
> +
> +/* just return the I/O device register value for
> + * passthrough type register group */
> +if (reg_grp_entry == NULL) {
> +goto exit;
> +}
> +
> +/* adjust the read value to appropriate CFC-CFF window */
> +val <<= (addr & 3) << 3;
> +emul_len = len;
> +
> +/* loop around the guest requested size */
> +while (emul_len > 0) {
> +/* find register entry to be emulated */
> +reg_entry = xen_pt_find_reg(reg_grp_entry, find_addr);
> +if (reg_entry) {
> +XenPTRegInfo *reg = reg_entry->reg;
> +uint32_t real_offset = reg_grp_entry->base_offset + reg->offset;
> +uint32_t valid_mask = 0x >> ((4 - emul_len) << 3);
> +uint8_t *ptr_val = NULL;
> +
> +valid_mask <<= (find_addr - real_offset) << 3;
> +ptr_val = (uint8_t *)&val + (real_offset & 3);
> +
> +/* do emulation based on register size */
> +switch (reg->size) {
> +case 1:
> +if (reg->u.b.read) {
> +rc = reg->u.b.read(s, reg_entry, ptr_val, valid_mask);
> +}
> +break;
> +case 2:
> +if (reg->u.w.read) {
> +rc = reg->u.w.read(s, reg_entry,
> +   (uint16_t *)ptr_val, valid_mask);
> +}
> +break;
> +case 4:
> +if (reg->u.dw.read) {
> +rc = reg->u.dw.read(s, reg_entry,
> +(uint32_t *)ptr_val, valid_mask);
> +}
> +break;
> +}
> +
> +if (rc < 0) {
> +xen_shutdown_fatal_error("Internal error: Invalid read "
> + "emulation. (%s, rc: %d)\n",
> + __func__, rc);
> +return 0;
> +}
> +
> +/* calculate next address to find */
> +emul_len -= reg->size;
> +if (emul_len > 0) {
> +find_addr = real_offset + reg->size;
> +}
> +} else {
> +/* nothing to do with passthrough type register,
> + * continue to find next byte */
> +emul_len--;
> +find_addr++;
> +}
> +}
> +
> +/* need to shift back before returning them to pci bus emulator */
> +val >>= ((addr & 3) << 3);
> +
> +exit:
> +XEN_PT_LOG_CONFIG(d, addr, val, len);
> +return val;
> +}
> +
> +static void xen_pt_pci_write_config(PCIDevice *d, uint32_t addr,
> +uint32_t val, int len)
> +{
> +XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
> +int index = 0;
> +XenPTRegGroup *reg_grp_entry = NULL;
> +int rc = 0;
> +uint32_t read_val = 0;
> +int emul_len = 0;
> +XenPTReg *reg_entry = NULL;
> +uint32_t find_addr = addr;
> +XenPTRegInfo *reg = NULL;
> +
> +if (xen_pt_pci_config_access_check(d, addr, len)) {
> +return;
> +}
> +
> +XEN_PT_LOG_CONFIG(d, addr, val, len);
> +
> +/* check unused BAR register */
> +index = xen_pt_bar_offset_to_index(addr);
> +if ((index >= 0) && (val > 0 && val < XEN_PT_BAR_ALLF) &&
> +(s->bases[index].bar_flag == XEN_PT_BAR_FLAG_UNUSED)) {
> +XEN_PT_WARN(

Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Paolo Bonzini
Il 16/05/2012 13:30, Amit Shah ha scritto:
>  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
> +hw-obj-$(CONFIG_VIRTIO) += virtio-rng.o

This needs to be conditional on CONFIG_LINUX too.

Paolo



Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Anthony PERARD
On Wed, May 16, 2012 at 12:23 PM, Paolo Bonzini  wrote:
> Il 16/05/2012 13:15, Anthony PERARD ha scritto:
>>> >         qdev_unplug(&(d->qdev), NULL);
>>> > +        qdev_free(&(d->qdev));
>>> >     }
>>> >  }
>>> >
>>> >
>>> > Anthony, can you confirm that this solves the problem for you?
>> This work until I try to hotplug a new device to the guest at wish
>> point I have this:
>> ERROR:/local/home/anthony/work/qemu/qom/object.c:389:object_delete:
>> assertion failed: (obj->ref == 0)
>>
>> This is because there is still a pending request of the hotunplug in
>> the acpi piix4.
>> If I call qdev_free without qdev_unplug, I hit the same assert, but
>> rigth away. This is way something new.
>
> Because it's missing the object_unparent done by qdev_unplug.  Does
> object_unparent+qdev_free work?  (I believe object_unparent should be
> done by qdev_free rather than qdev_unplug, but that's something for 1.2).

Cool, this seems to work fine. Thanks.

I'll test a bit more and resend a patch with only object_unparent+qdev_free.

-- 
Anthony PERARD



[Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app

2012-05-16 Thread Orit Wasserman
Changes from v9:
- move cache implementation to separate files. Kept our own 
implementation because GCache or GHashTable have no size limit.
- Add migrate_set_parameter function
- removed XBZRLE option from migrate command
- add cache size information to query_migrate command
- add documantation file
- write/read the exact XBZRLE header format
- fix other review comments by Anthony and Juan

Changes from v8:
Implement more effiecent cache_resize method
fix set_cachesize command 

Changes from v7:
Copy current page before encoding it, this will prevents page content
change during the encoding.
Allow changing the cache size during an active migration.
Fix comments by Avi.

Changes from v6:
 1) add assert checks to ULEB encoding/decoding
 2) no need to send last zero run

Changes from v5:
1) Add migration capabilities
2) Use ULEB to encode run length
3) Do not send unmodified (dirty) page
3) Fix other patch comments

Using GCache or GHashTable requires allocating new buffer on every content 
change and have no size limit ,
so I decided to keep the simple cache implementation.

Changes from v4:
1) Rebase
2) divide patch into 9 patches
3) move memory allocation into cache_insert

Future work :
 Use SSE for encoding.
 Page ranking acording to their dirty rate and automatic 
activation/deactivation of the feature - will be sent in a separate patch 
series.  

By using XBZRLE (Xor Based Zero Run Length Encoding) we can reduce VM downtime
and total live-migration time of VMs running memory write intensive workloads
typical of large enterprise applications such as SAP ERP Systems, and generally
speaking for any application with a sparse memory update pattern.

The compression format uses the fact that we will have many zero (zero 
represents
an unchanged value). 
We repesent the page data delta by zero and non zero runs.
We represent a zero run with it's length (in bytes). 
We represent a non zero run with it's length (in bytes) and the data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)

page = zrun nzrun
   | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from an LRU cache (default size of 512 MB). The
receiving side uses the existing page content and XBZRLE to decode the new page
content.

This is a more compact way to store the delta than the previous version.

This work was originally based on research results published VEE 2011: 
Evaluation of
Delta Compression Techniques for Efficient Live Migration of Large Virtual
Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
XBRLE was improved further using XBZRLE instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
ideal for in-line, real-time encoding such as is needed for live-migration.

A typical usage scenario:
{qemu} migrate_set_cachesize 256m
{qemu} migrate_set_parameter xbzrle
{qemu} migrate -d tcp:destination.host:
{qemu} info migrate
...
transferred ram: A kbytes
remaining ram: B kbytes
total ram: C kbytes
cache size: D bytes
xbzrle transferred: E kbytes
xbzrle pages: F pages
xbzrle cache miss: G
xbzrle overflow : H

Testing: live migration with XBZRLE completed in 110 seconds, without live
migration was not able to complete.

A simple synthetic memory r/w load generator:
..include 
..include 
..int main()
..{
..char *buf = (char *) calloc(4096, 4096);
..while (1) {
..int i;
..for (i = 0; i < 4096 * 4; i++) {
..buf[i * 4096 / 4]++;
..}
..printf(".");
..}
..}

Signed-off-by: Benoit Hudzia 
Signed-off-by: Petter Svard 
Signed-off-by: Aidan Shribman 

Orit Wasserman (9):
  From: Isaku Yamahata 
  Add migration capabilites
  Add XBZRLE documentation
  Add cache handling functions
  Add uleb encoding/decoding functions
  Add save_block_hdr function
  Add XBZRLE to ram_save_block and ram_save_live
  Add set_cachesize command
  Add XBZRLE statistics

 Makefile.objs|1 +
 arch_init.c  |  317 +
 block-migration.c|8 +-
 cache.c  |  212 +
 cutils.c |   29 +
 docs/xbzrle.txt  |   97 +++
 hmp-commands.hx  |   31 +
 hmp.c|   67 +++
 hmp.h|3 +
 include/qemu/cache.h |   81 +
 migration.c  |  134 --
 migration.h  |   29 +-
 monitor.c|7 +
 qapi-schema.json |   84 +-
 qemu-common.h|   14 +++
 qmp-commands.hx  |   97 +++
 savevm.c

Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Amit Shah
On (Wed) 16 May 2012 [13:38:06], Paolo Bonzini wrote:
> Il 16/05/2012 13:30, Amit Shah ha scritto:
> >  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
> > +hw-obj-$(CONFIG_VIRTIO) += virtio-rng.o
> 
> This needs to be conditional on CONFIG_LINUX too.

Right.

Thanks,

Amit



[Qemu-devel] [PATCH v10 0/9] XBZRLE delta for live migration of large memory app

2012-05-16 Thread Orit Wasserman
Changes from v9:
- move cache implementation to separate files. Kept our own 
implementation because GCache or GHashTable have no size limit.
- Add migrate_set_parameter function
- removed XBZRLE option from migrate command
- add cache size information to query_migrate command
- add documantation file
- write/read the exact XBZRLE header format
- fix other review comments by Anthony and Juan

Changes from v8:
Implement more effiecent cache_resize method
fix set_cachesize command 

Changes from v7:
Copy current page before encoding it, this will prevents page content
change during the encoding.
Allow changing the cache size during an active migration.
Fix comments by Avi.

Changes from v6:
 1) add assert checks to ULEB encoding/decoding
 2) no need to send last zero run

Changes from v5:
1) Add migration capabilities
2) Use ULEB to encode run length
3) Do not send unmodified (dirty) page
3) Fix other patch comments

Using GCache or GHashTable requires allocating new buffer on every content 
change and have no size limit ,
so I decided to keep the simple cache implementation.

Changes from v4:
1) Rebase
2) divide patch into 9 patches
3) move memory allocation into cache_insert

Future work :
 Use SSE for encoding.
 Page ranking acording to their dirty rate and automatic 
activation/deactivation of the feature - will be sent in a separate patch 
series.  

By using XBZRLE (Xor Based Zero Run Length Encoding) we can reduce VM downtime
and total live-migration time of VMs running memory write intensive workloads
typical of large enterprise applications such as SAP ERP Systems, and generally
speaking for any application with a sparse memory update pattern.

The compression format uses the fact that we will have many zero (zero 
represents
an unchanged value). 
We repesent the page data delta by zero and non zero runs.
We represent a zero run with it's length (in bytes). 
We represent a non zero run with it's length (in bytes) and the data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)

page = zrun nzrun
   | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from an LRU cache (default size of 512 MB). The
receiving side uses the existing page content and XBZRLE to decode the new page
content.

This is a more compact way to store the delta than the previous version.

This work was originally based on research results published VEE 2011: 
Evaluation of
Delta Compression Techniques for Efficient Live Migration of Large Virtual
Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
XBRLE was improved further using XBZRLE instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
ideal for in-line, real-time encoding such as is needed for live-migration.

A typical usage scenario:
{qemu} migrate_set_cachesize 256m
{qemu} migrate_set_parameter xbzrle
{qemu} migrate -d tcp:destination.host:
{qemu} info migrate
...
transferred ram: A kbytes
remaining ram: B kbytes
total ram: C kbytes
cache size: D bytes
xbzrle transferred: E kbytes
xbzrle pages: F pages
xbzrle cache miss: G
xbzrle overflow : H

Testing: live migration with XBZRLE completed in 110 seconds, without live
migration was not able to complete.

A simple synthetic memory r/w load generator:
..include 
..include 
..int main()
..{
..char *buf = (char *) calloc(4096, 4096);
..while (1) {
..int i;
..for (i = 0; i < 4096 * 4; i++) {
..buf[i * 4096 / 4]++;
..}
..printf(".");
..}
..}

Signed-off-by: Benoit Hudzia 
Signed-off-by: Petter Svard 
Signed-off-by: Aidan Shribman 

Orit Wasserman (9):
  From: Isaku Yamahata 
  Add migration capabilites
  Add XBZRLE documentation
  Add cache handling functions
  Add uleb encoding/decoding functions
  Add save_block_hdr function
  Add XBZRLE to ram_save_block and ram_save_live
  Add set_cachesize command
  Add XBZRLE statistics

 Makefile.objs|1 +
 arch_init.c  |  317 +
 block-migration.c|8 +-
 cache.c  |  212 +
 cutils.c |   29 +
 docs/xbzrle.txt  |   97 +++
 hmp-commands.hx  |   31 +
 hmp.c|   67 +++
 hmp.h|3 +
 include/qemu/cache.h |   81 +
 migration.c  |  134 --
 migration.h  |   29 +-
 monitor.c|7 +
 qapi-schema.json |   84 +-
 qemu-common.h|   14 +++
 qmp-commands.hx  |   97 +++
 savevm.c

[Qemu-devel] [PATCH v10 8/9] Add set_cachesize command

2012-05-16 Thread Orit Wasserman
Change XBZRLE cache size in bytes (the size should be a power of 2).
If XBZRLE cache size is too small there will be many cache miss.

Signed-off-by: Benoit Hudzia 
Signed-off-by: Petter Svard 
Signed-off-by: Aidan Shribman 
Signed-off-by: Orit Wasserman 
---
 arch_init.c  |7 +++
 hmp-commands.hx  |   15 +++
 hmp.c|   13 +
 hmp.h|1 +
 migration.c  |   32 +++-
 migration.h  |2 ++
 qapi-schema.json |   13 +
 qemu-common.h|5 +
 qmp-commands.hx  |   22 ++
 9 files changed, 109 insertions(+), 1 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 7ebdb7a..851e45d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifndef _WIN32
 #include 
 #include 
@@ -153,6 +154,12 @@ static struct {
 Cache *cache;
 } XBZRLE = {0};
 
+
+void xbzrle_cache_resize(int64_t order)
+{
+cache_resize(XBZRLE.cache, pow(2, order));
+}
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
 int cont, int flag)
 {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e14e7be..abc9403 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -829,6 +829,21 @@ STEXI
 @item migrate_cancel
 @findex migrate_cancel
 Cancel the current VM migration.
+
+ETEXI
+
+{
+.name   = "migrate_set_cachesize",
+.args_type  = "value:o",
+.params = "value",
+.help   = "set cache size (in bytes) for XBZRLE migrations. The 
cache size effects the number of cache misses. In case of a high cache miss 
ratio you need to increase the cache size",
+.mhandler.cmd = hmp_migrate_set_cachesize,
+},
+
+STEXI
+@item migrate_set_cachesize @var{value}
+@findex migrate_set_cache
+Set cache size to @var{value} (in bytes) for xbzrle migrations.
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index e73132b..0e4d63a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -751,6 +751,19 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict 
*qdict)
 qmp_migrate_set_downtime(value, NULL);
 }
 
+void hmp_migrate_set_cachesize(Monitor *mon, const QDict *qdict)
+{
+int64_t value = qdict_get_int(qdict, "value");
+Error *err = NULL;
+
+qmp_migrate_set_cachesize(value, &err);
+if (err) {
+monitor_printf(mon, "%s\n", error_get_pretty(err));
+error_free(err);
+return;
+}
+}
+
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict)
 {
 int64_t value = qdict_get_int(qdict, "value");
diff --git a/hmp.h b/hmp.h
index 5f9d842..9559559 100644
--- a/hmp.h
+++ b/hmp.h
@@ -53,6 +53,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_cachesize(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration.c b/migration.c
index ba11adb..4fb3b8a 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qmp-commands.h"
+#include 
 
 //#define DEBUG_MIGRATION
 
@@ -43,7 +44,7 @@ enum {
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
 
-/* Migration XBZRLE cache size */
+/* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
 static NotifierList migration_state_notifiers =
@@ -501,6 +502,35 @@ void qmp_migrate_cancel(Error **errp)
 migrate_fd_cancel(migrate_get_current());
 }
 
+void qmp_migrate_set_cachesize(int64_t value, Error **errp)
+{
+MigrationState *s = migrate_get_current();
+
+/* Check for truncation */
+if (value != (size_t)value) {
+error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+  "exceeding address space");
+return;
+}
+
+value = MIN(UINT64_MAX, value);
+
+/* no change */
+if (value == s->xbzrle_cache_size) {
+return;
+}
+
+/* power of 2 */
+if (value != 1 && !is_power_of_2(value)) {
+error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
+  "needs to be power of 2");
+return;
+}
+
+s->xbzrle_cache_size = value;
+xbzrle_cache_resize(log2(value));
+}
+
 void qmp_migrate_set_speed(int64_t value, Error **errp)
 {
 MigrationState *s;
diff --git a/migration.h b/migration.h
index 175c729..6a5bc0e 100644
--- a/migration.h
+++ b/migration.h
@@ -106,4 +106,6 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t 
*dst, int dlen);
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
 
+void xbzrle_cache_resize(int64_t new_size);
+
 #endif
diff --git a/qapi-schema.json 

[Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata

2012-05-16 Thread Orit Wasserman
Add MigrationParams structure

Signed-off-by: Isaku Yamahata 
---
 block-migration.c |8 
 migration.c   |   13 -
 migration.h   |8 ++--
 qemu-common.h |1 +
 savevm.c  |   11 ---
 sysemu.h  |3 ++-
 vmstate.h |2 +-
 7 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index fd2..b95b4e1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-block_mig_state.blk_enable = blk_enable;
-block_mig_state.shared_base = shared_base;
+block_mig_state.blk_enable = params->blk;
+block_mig_state.shared_base = params->shared;
 
 /* shared base means that blk_enable = 1 */
-block_mig_state.blk_enable |= shared_base;
+block_mig_state.blk_enable |= params->shared;
 }
 
 void blk_mig_init(void)
diff --git a/migration.c b/migration.c
index f9e968e..9d1d925 100644
--- a/migration.c
+++ b/migration.c
@@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s)
   migrate_fd_close);
 
 DPRINTF("beginning savevm\n");
-ret = qemu_savevm_state_begin(s->file, s->blk, s->shared);
+ret = qemu_savevm_state_begin(s->file, &s->params);
 if (ret < 0) {
 DPRINTF("failed, %d\n", ret);
 migrate_fd_error(s);
@@ -361,15 +361,14 @@ void migrate_fd_connect(MigrationState *s)
 migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(int blk, int inc)
+static MigrationState *migrate_init(const MigrationParams *params)
 {
 MigrationState *s = migrate_get_current();
 int64_t bandwidth_limit = s->bandwidth_limit;
 
 memset(s, 0, sizeof(*s));
 s->bandwidth_limit = bandwidth_limit;
-s->blk = blk;
-s->shared = inc;
+s->params = *params;
 
 s->bandwidth_limit = bandwidth_limit;
 s->state = MIG_STATE_SETUP;
@@ -394,9 +393,13 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
  Error **errp)
 {
 MigrationState *s = migrate_get_current();
+MigrationParams params;
 const char *p;
 int ret;
 
+params.blk = blk;
+params.shared = inc;
+
 if (s->state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
@@ -411,7 +414,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 return;
 }
 
-s = migrate_init(blk, inc);
+s = migrate_init(¶ms);
 
 if (strstart(uri, "tcp:", &p)) {
 ret = tcp_start_outgoing_migration(s, p);
diff --git a/migration.h b/migration.h
index 691b367..9e3bba7 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include "notify.h"
 #include "error.h"
 
+struct MigrationParams {
+int blk;
+int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -31,8 +36,7 @@ struct MigrationState
 int (*close)(MigrationState *s);
 int (*write)(MigrationState *s, const void *buff, size_t size);
 void *opaque;
-int blk;
-int shared;
+MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index 50f659a..30c59c8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -17,6 +17,7 @@ typedef struct DeviceState DeviceState;
 
 struct Monitor;
 typedef struct Monitor Monitor;
+typedef struct MigrationParams MigrationParams;
 
 /* we put basic includes here to avoid repeating them in device drivers */
 #include 
diff --git a/savevm.c b/savevm.c
index 2d18bab..dd66f2c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1561,7 +1561,8 @@ bool qemu_savevm_state_blocked(Error **errp)
 return false;
 }
 
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
+int qemu_savevm_state_begin(QEMUFile *f,
+const MigrationParams *params)
 {
 SaveStateEntry *se;
 int ret;
@@ -1570,7 +1571,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, 
int shared)
 if(se->set_params == NULL) {
 continue;
}
-   se->set_params(blk_enable, shared, se->opaque);
+se->set_params(params, se->opaque);
 }
 
 qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1708,13 +1709,17 @@ void qemu_savevm_state_cancel(QEMUFile *f)
 static int qemu_savevm_state(QEMUFile *f)
 {
 int ret;
+MigrationParams params = {
+.blk = 0,
+.shared = 0
+};
 
 if (qemu_savevm_state_blocked(NULL)) {
 ret = -EINVAL;
 goto out;
 }
 
-ret = qemu_savevm_state_begin(f, 0, 0);
+ret = qemu_savevm_state_begin(f, ¶ms);
 if (ret < 0)
 goto out;
 
diff --git a/sysemu.h b/sysemu.h
index bc2c788..6540c79 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -77,7 +77,8 @@ void do_info_sna

Re: [Qemu-devel] [PATCH qom-next 00/22] ARM: QOM cpu_reset() followups

2012-05-16 Thread Igor Mitsyanko

On 05/14/2012 09:31 PM, Andreas Färber wrote:

Hello Peter,

Following up on your remark about ugly naming as a consequence of my CPU reset
patches and my OMAP field rename, here's a few more patches.

Patch 1 is spelling fixes in a comment and could be pulled into 1.1.

Patches 2-7 fix some more naming ugliness that I missed when running
  because they were just "cpu->cpu". Again, fix by
using "mpu" rather than "cpu".

Patches 8-19 are what I had in mind with my choice of variable placement in
the reset series. They clean up expressions in some targets, at the same time
incur changes to other targets which would be needed for wiring up the CPU
later anyway.

Patches 20-22 are preparations cherry-picked from QOM CPUState part 4, which
I'm including here since you indicated there might be a conflict with cp15 or
qemu-linaro work.

Please ack and indicate which merge order you would prefer.

Regards,
Andreas

Cc: Peter Maydell

Andreas Färber (22):
   arm_boot: Fix typos in comment
   omap_sx1: Rename omap_mpu_state_s variable
   palm: Rename omap_mpu_state_s variable
   mainstone: Rename PXA2xxState variable
   spitz: Rename PXA2xxState variable
   tosa: Rename PXA2xxState variable
   z2: Rename PXA2xxState variable
   strongarm: Use cpu_arm_init() to store ARMCPU in StrongARMState
   integratorcp: Use cpu_arm_init() to obtain ARMCPU
   musicpal: Use cpu_arm_init() to obtain ARMCPU
   versatilepb: Use cpu_arm_init() to obtain ARMCPU
   xilinx_zynq: Use cpu_arm_init() to obtain ARMCPU
   arm_boot: Pass ARMCPU to arm_boot_info::write_secondary_boot()
   arm_boot: Pass ARMCPU to arm_boot_info::secondary_cpu_reset_hook()
   arm_boot: Pass ARMCPU to arm_load_kernel()
   realview: Use cpu_arm_init() to obtain ARMCPU
   vexpress: Use cpu_arm_init() to obtain ARMCPU
   exynos4210: Use cpu_arm_init() to store ARMCPU
   arm_pic: Pass ARMCPU to arm_pic_init_cpu()
   pxa2xx: Pass ARMCPU to pxa2xx_pic_init()
   pxa2xx_pic: Store ARMCPU in PXA2xxPICState
   pxa2xx_gpio: Store ARMCPU in PXA2xxGPIOInfo

  hw/arm-misc.h   |   12 ++--
  hw/arm_boot.c   |   14 --
  hw/arm_pic.c|8 +---
  hw/armv7m.c |2 +-
  hw/collie.c |2 +-
  hw/exynos4210.c |9 +
  hw/exynos4210.h |4 ++--
  hw/exynos4_boards.c |4 ++--
  hw/highbank.c   |   13 +++--
  hw/integratorcp.c   |   14 --
  hw/mainstone.c  |   16 
  hw/musicpal.c   |   10 +-
  hw/nseries.c|2 +-
  hw/omap1.c  |2 +-
  hw/omap2.c  |2 +-
  hw/omap_sx1.c   |6 +++---
  hw/palm.c   |   12 ++--
  hw/pxa.h|2 +-
  hw/pxa2xx.c |4 ++--
  hw/pxa2xx_gpio.c|9 +
  hw/pxa2xx_pic.c |   32 ++--
  hw/realview.c   |   12 +++-
  hw/spitz.c  |   24 
  hw/strongarm.c  |6 +++---
  hw/strongarm.h  |2 +-
  hw/tosa.c   |   14 +++---
  hw/versatilepb.c|   13 +++--
  hw/vexpress.c   |   18 +-
  hw/xilinx_zynq.c|   10 +-
  hw/z2.c |   26 +-
  30 files changed, 160 insertions(+), 144 deletions(-)



Nice, works fine. Exynos part:

Acked-by: Igor Mitsyanko 

--
Mitsyanko Igor
ASWG, Moscow R&D center, Samsung Electronics
email: i.mitsya...@samsung.com




Re: [Qemu-devel] [PATCH next v2 73/74] linux-user: Use cpu_reset() after cpu_init() / cpu_copy()

2012-05-16 Thread Igor Mammedov

On 05/11/2012 01:26 PM, Andreas Färber wrote:

Am 11.05.2012 13:22, schrieb Peter Maydell:

On 10 May 2012 01:14, Andreas Färber  wrote:

Eliminates cpu_state_reset() usage.

Signed-off-by: Andreas Färber
---
  linux-user/main.c|2 +-
  linux-user/syscall.c |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 191b750..49108b8 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3405,7 +3405,7 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
  #if defined(TARGET_I386) || defined(TARGET_SPARC) || defined(TARGET_PPC)
-cpu_state_reset(env);
+cpu_reset(ENV_GET_CPU(env));
  #endif

 thread_env = env;
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 20d2a74..539af3f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4262,7 +4262,7 @@ static int do_fork(CPUArchState *env, unsigned int flags, 
abi_ulong newsp,
 /* we create a new CPU instance. */
 new_env = cpu_copy(env);
  #if defined(TARGET_I386) || defined(TARGET_SPARC) || defined(TARGET_PPC)
-cpu_state_reset(new_env);
+cpu_reset(ENV_GET_CPU(new_env));
  #endif
 /* Init regs that differ from the parent.  */
 cpu_clone_regs(new_env, newsp);
--


Do you have any plans to try to rationalise the handling of reset
so that we consistently either do or don't reset the cpu here,
rather than having it done based on a TARGET_* ifdef ?


Igor had an RFC for x86; sparc and ppc reset I haven't looked into yet.
Cc'ing Alex and Blue.

I'll rebase RFC for x86 and post patches today and will remove it from here
by the last patch in patchset so that when this patch applied we could remove
unnecessary call.
So ACK for target-i386 here.

--
-
 Igor



[Qemu-devel] [PATCH v10 2/9] Add migration capabilites

2012-05-16 Thread Orit Wasserman
Add migration capabiltes that can be queried by the management.
The managment can query the source QEMU and the destination QEMU in order to
verify both support some  migration capability (currently only XBZRLE).
The managment can enable a capabilty for the next migration by using
migrate_set_parameter command.

Signed-off-by: Orit Wasserman 
---
 hmp-commands.hx  |   16 
 hmp.c|   41 +
 hmp.h|2 ++
 migration.c  |   53 +++--
 migration.h  |2 ++
 monitor.c|7 +++
 qapi-schema.json |   46 +-
 qmp-commands.hx  |   47 +++
 savevm.c |2 +-
 9 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..e14e7be 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -861,6 +861,20 @@ Set maximum tolerated downtime (in seconds) for migration.
 ETEXI
 
 {
+.name   = "migrate_set_parameter",
+.args_type  = "parameter:s",
+.params = "parameter",
+.help   = "Enable the usage of a capability for migration",
+.mhandler.cmd = hmp_migrate_set_parameter,
+},
+
+STEXI
+@item migrate_set_parameter @var{parameter}
+@findex migrate_set_parameter
+Enable the usage of a capability @var{parameter} for migration.
+ETEXI
+
+{
 .name   = "client_migrate_info",
 .args_type  = 
"protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
 .params = "protocol hostname port tls-port cert-subject",
@@ -1393,6 +1407,8 @@ show CPU statistics
 show user network stack connection states
 @item info migrate
 show migration status
+@item info migration_capabilities
+show migration capabilities
 @item info balloon
 show balloon information
 @item info qtree
diff --git a/hmp.c b/hmp.c
index 1f9fe0e..e73132b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -128,9 +128,18 @@ void hmp_info_mice(Monitor *mon)
 void hmp_info_migrate(Monitor *mon)
 {
 MigrationInfo *info;
+MigrationCapabilityInfoList *cap;
 
 info = qmp_query_migrate(NULL);
 
+if (info->has_params && info->params) {
+monitor_printf(mon, "params: ");
+for (cap = info->params; cap; cap = cap->next) {
+monitor_printf(mon, "%s",
+   MigrationCapability_lookup[cap->value->capability]);
+}
+monitor_printf(mon, "\n");
+}
 if (info->has_status) {
 monitor_printf(mon, "Migration status: %s\n", info->status);
 }
@@ -156,6 +165,24 @@ void hmp_info_migrate(Monitor *mon)
 qapi_free_MigrationInfo(info);
 }
 
+void hmp_info_migration_capabilities(Monitor *mon)
+{
+MigrationCapabilityInfoList *caps_list, *cap;
+
+caps_list = qmp_query_migration_capabilities(NULL);
+if (!caps_list) {
+monitor_printf(mon, "No migration capabilities found\n");
+return;
+}
+
+for (cap = caps_list; cap; cap = cap->next) {
+monitor_printf(mon, "%s ",
+   MigrationCapability_lookup[cap->value->capability]);
+}
+
+qapi_free_MigrationCapabilityInfoList(caps_list);
+}
+
 void hmp_info_cpus(Monitor *mon)
 {
 CpuInfoList *cpu_list, *cpu;
@@ -730,6 +757,20 @@ void hmp_migrate_set_speed(Monitor *mon, const QDict 
*qdict)
 qmp_migrate_set_speed(value, NULL);
 }
 
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
+{
+const char *value = qdict_get_str(qdict, "parameter");
+Error *err = NULL;
+
+qmp_migrate_set_parameter(value, &err);
+
+if (err) {
+monitor_printf(mon, "migrate_set_parameter: %s\n",
+   error_get_pretty(err));
+error_free(err);
+}
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 443b812..5f9d842 100644
--- a/hmp.h
+++ b/hmp.h
@@ -25,6 +25,7 @@ void hmp_info_uuid(Monitor *mon);
 void hmp_info_chardev(Monitor *mon);
 void hmp_info_mice(Monitor *mon);
 void hmp_info_migrate(Monitor *mon);
+void hmp_info_migration_capabilities(Monitor *mon);
 void hmp_info_cpus(Monitor *mon);
 void hmp_info_block(Monitor *mon);
 void hmp_info_blockstats(Monitor *mon);
@@ -51,6 +52,7 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration.c b/migration.c
index 9d1d925..66e71a3 100644
--- a/migration.c
+++ b/migration.c
@@ -117,10 +117,22 @@ Migrat

Re: [Qemu-devel] [PATCH v10 1/9] From: Isaku Yamahata

2012-05-16 Thread Peter Maydell
On 16 May 2012 12:59, Orit Wasserman  wrote:
> Add MigrationParams structure
>
> Signed-off-by: Isaku Yamahata 

You seem to have managed to get the From: authorship line in
the subject commit summary line somehow in this patch...

-- PMM



[Qemu-devel] [PATCH v10 4/9] Add cache handling functions

2012-05-16 Thread Orit Wasserman
Add LRU page cache mechanism.
The page are accessed by their address.

Signed-off-by: Benoit Hudzia 
Signed-off-by: Petter Svard 
Signed-off-by: Aidan Shribman 
Signed-off-by: Orit Wasserman 
---
 Makefile.objs|1 +
 cache.c  |  212 ++
 include/qemu/cache.h |   81 +++
 3 files changed, 294 insertions(+), 0 deletions(-)
 create mode 100644 cache.c
 create mode 100644 include/qemu/cache.h

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..8fed055 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -137,6 +137,7 @@ common-obj-y += qdev.o qdev-properties.o qdev-monitor.o
 common-obj-y += block-migration.o iohandler.o
 common-obj-y += pflib.o
 common-obj-y += bitmap.o bitops.o
+common-obj-y += cache.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/cache.c b/cache.c
new file mode 100644
index 000..c300fa6
--- /dev/null
+++ b/cache.c
@@ -0,0 +1,212 @@
+/*
+ * Page cache for qemu
+ * The cache is base on a hash on the page address
+ *
+ * Copyright 2011 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Orit Wasserman  
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qemu/cache.h"
+
+#ifdef DEBUG_CACHE
+#define DPRINTF(fmt, ...) \
+do { fprintf(stdout, "cache: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+typedef struct CacheItem CacheItem;
+
+struct CacheItem {
+uint64_t it_addr;
+unsigned long it_age;
+uint8_t *it_data;
+};
+
+struct Cache {
+CacheItem *page_cache;
+unsigned int page_size;
+int64_t max_num_items;
+uint64_t max_item_age;
+int64_t num_items;
+};
+
+Cache *cache_init(int64_t num_pages, unsigned int page_size)
+{
+int i;
+
+Cache *cache = g_malloc(sizeof(Cache));
+if (!cache) {
+DPRINTF("Error allocation Cache\n");
+return NULL;
+}
+
+if (num_pages <= 0) {
+DPRINTF("invalid number pages\n");
+return NULL;
+}
+
+cache->page_size = page_size;
+cache->num_items = 0;
+cache->max_item_age = 0;
+cache->max_num_items = num_pages;
+
+DPRINTF("Setting cache buckets to %lu\n", cache->max_num_items);
+
+cache->page_cache = g_malloc((cache->max_num_items) *
+ sizeof(CacheItem));
+if (!cache->page_cache) {
+DPRINTF("could not allocate cache\n");
+g_free(cache);
+return NULL;
+}
+
+for (i = 0; i < cache->max_num_items; i++) {
+cache->page_cache[i].it_data = NULL;
+cache->page_cache[i].it_age = 0;
+cache->page_cache[i].it_addr = -1;
+}
+
+return cache;
+}
+
+void cache_fini(Cache *cache)
+{
+int i;
+
+g_assert(cache);
+g_assert(cache->page_cache);
+
+for (i = 0; i < cache->max_num_items; i++) {
+g_free(cache->page_cache[i].it_data);
+cache->page_cache[i].it_data = 0;
+}
+
+g_free(cache->page_cache);
+cache->page_cache = NULL;
+}
+
+static unsigned long cache_get_cache_pos(const Cache *cache, uint64_t address)
+{
+unsigned long pos;
+
+g_assert(cache->max_num_items);
+pos = (address/cache->page_size) & (cache->max_num_items - 1);
+return pos;
+}
+
+bool cache_is_cached(const Cache *cache, uint64_t addr)
+{
+unsigned long pos;
+
+g_assert(cache);
+g_assert(cache->page_cache);
+
+pos = cache_get_cache_pos(cache, addr);
+
+return (cache->page_cache[pos].it_addr == addr);
+}
+
+static CacheItem *cache_get_by_addr(const Cache *cache, uint64_t addr)
+{
+unsigned long pos;
+
+g_assert(cache);
+g_assert(cache->page_cache);
+
+pos = cache_get_cache_pos(cache, addr);
+
+return &cache->page_cache[pos];
+}
+
+uint8_t *get_cached_data(const Cache *cache, uint64_t addr)
+{
+return cache_get_by_addr(cache, addr)->it_data;
+}
+
+void cache_insert(Cache *cache, unsigned long addr, uint8_t *pdata)
+{
+
+CacheItem *it = NULL;
+
+g_assert(cache);
+g_assert(cache->page_cache);
+
+/* actual update of entry */
+it = cache_get_by_addr(cache, addr);
+
+if (!it->it_data) {
+cache->num_items++;
+}
+
+it->it_data = pdata;
+it->it_age = ++cache->max_item_age;
+it->it_addr = addr;
+}
+
+int cache_resize(Cache *cache, int64_t new_num_pages)
+{
+Cache *new_cache;
+int i;
+
+CacheItem *old_it, *new_it;
+
+g_assert(cache);
+
+/* same size */
+if (new_num_pages == cache->max_num_items) {
+return 0;
+}
+
+/* cache was not inited */
+if (cache->page_cache == NULL) {
+  

[Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation

2012-05-16 Thread Orit Wasserman
Signed-off-by: Orit Wasserman 
---
 docs/xbzrle.txt |   97 +++
 1 files changed, 97 insertions(+), 0 deletions(-)
 create mode 100644 docs/xbzrle.txt

diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
new file mode 100644
index 000..aafdb84
--- /dev/null
+++ b/docs/xbzrle.txt
@@ -0,0 +1,97 @@
+XBZRLE (Xor Based Zero Run Length Encoding)
+===
+
+Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction of 
VM downtime
+and the total live-migration time of Virtual machines. It is particularly 
useful for virtual machines running memory write intensive workloads that are 
typical of large enterprise applications such as SAP ERP Systems, and generally
+speaking for any application that uses a sparse memory update pattern.
+
+Instead of sending the changed guest memory page this solution will send a 
compressed version of the updates, thus reducing the amount of data sent during 
live migration.
+In order to be able to calculate the update, the previous memory pages needed 
to be stored. Those pages are stored in a dedicated cache (hash table) and are 
accessed by their address.
+The larger the cache size the better the chances are that the page has already 
been stored in the cache. A Small cache size will result in high cache miss 
rate.
+
+Format
+===
+
+The compression format uses the zero value, where zero represents an unchanged 
value.
+The page data delta is represented by zero and non zero runs.
+A zero run is represented by it's length (in bytes).
+A non zero run is represented by it's length (in bytes) and the data.
+The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+
+page = zrun nzrun
+   | zrun nzrun page
+
+zrun = length
+
+nzrun = length byte...
+
+length = uleb128 encoded integer
+
+On the sender side XBZRLE is used as a compact delta encoding of page updates,
+retrieving the old page content from the cache (default size of 512 MB). The
+receiving side uses the existing page's content and XBZRLE to decode the new 
page's content.
+
+This is a more compact way to store the deltas than the previous version.
+
+This work was originally based on research results published VEE 2011: 
Evaluation of
+Delta Compression Techniques for Efficient Live Migration of Large Virtual
+Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
+XBRLE was improved further using the XBZRLE instead.
+
+XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
+ideal for in-line, real-time encoding such as is needed for live-migration.
+
+Migration Capabilities
+==
+In order to use XBZRLE the destination QEMU version should be able to
+decode the new format.
+Adding a new migration capabilities command that will allow external management
+to query for it support.
+A typical use for the destination
+{qemu} info migrate_capabilities
+{qemu} xbzrle, ...
+
+In order to enable capabilities for future live migration,
+a new command migrate_set_parameter is introduced:
+{qemu} migrate_set_parameter xbzrle
+
+Usage
+==
+
+1. Activate xbzrle
+2. Set the XBZRLE cache size - the cache size is in MBytes and should be a 
power of 2. The cache default value is 64MBytes.
+3. start outgoing migration
+
+A typical usage scenario:
+{qemu} migrate_set_parameter xbzrle
+{qemu} migrate_set_cachesize 256m
+{qemu} migrate -d tcp:destination.host:
+{qemu} info migrate
+...
+transferred ram-duplicate: A kbytes
+transferred ram-normal: B kbytes
+transferred ram-xbrle: C kbytes
+overflow ram-xbrle: D pages
+cache-miss ram-xbrle: E pages
+
+cache-miss: the number of cache misses to date - high cache-miss rate
+indicates that the cache size is set too low.
+overflow: the number of overflows in the decoding which where the delta could 
not be compressed. This can happen if the changes in the pages are too large
+or there are many short changes for example change every second byte (half a 
page).
+
+Testing: Testing indicated that live migration with XBZRLE was completed in 
110 seconds, whereas without it would not be able to complete.
+
+A simple synthetic memory r/w load generator:
+..include 
+..include 
+..int main()
+..{
+..char *buf = (char *) calloc(4096, 4096);
+..while (1) {
+..int i;
+..for (i = 0; i < 4096 * 4; i++) {
+..buf[i * 4096 / 4]++;
+..}
+..printf(".");
+..}
+..}
-- 
1.7.7.6




[Qemu-devel] [PATCH] Add 'query-events' command to QMP to query async events

2012-05-16 Thread Daniel P. Berrange
From: "Daniel P. Berrange" 

Sometimes it is neccessary for an application to determine
whether a particular QMP event is available, so they can
decide whether to use compatibility code instead. This
introduces a new 'query-events' command to QMP todo just
that

 { "execute": "query-events" }
 {"return": [{"name": "WAKEUP"},
 {"name": "SUSPEND"},
 {"name": "DEVICE_TRAY_MOVED"},
 {"name": "BLOCK_JOB_CANCELLED"},
 {"name": "BLOCK_JOB_COMPLETED"},
 ...snip...
 {"name": "SHUTDOWN"}]}

* monitor.c: Split out MonitorEvent -> string conversion
  into monitor_protocol_event_name() API. Add impl of
  qmp_query_events monitor command handler
* qapi-schema.json, qmp-commands.hx: Define contract of
  query-events command

Signed-off-by: Daniel P. Berrange 
---
 monitor.c|   93 ++---
 qapi-schema.json |   22 +
 qmp-commands.hx  |   37 +
 3 files changed, 119 insertions(+), 33 deletions(-)

diff --git a/monitor.c b/monitor.c
index ef59cd9..73ecf21 100644
--- a/monitor.c
+++ b/monitor.c
@@ -422,84 +422,93 @@ static void timestamp_put(QDict *qdict)
 qdict_put_obj(qdict, "timestamp", obj);
 }
 
-/**
- * monitor_protocol_event(): Generate a Monitor event
- *
- * Event-specific data can be emitted through the (optional) 'data' parameter.
- */
-void monitor_protocol_event(MonitorEvent event, QObject *data)
-{
-QDict *qmp;
-const char *event_name;
-Monitor *mon;
-
-assert(event < QEVENT_MAX);
 
+static const char *monitor_protocol_event_name(MonitorEvent event)
+{
 switch (event) {
 case QEVENT_SHUTDOWN:
-event_name = "SHUTDOWN";
+return "SHUTDOWN";
 break;
 case QEVENT_RESET:
-event_name = "RESET";
+return "RESET";
 break;
 case QEVENT_POWERDOWN:
-event_name = "POWERDOWN";
+return "POWERDOWN";
 break;
 case QEVENT_STOP:
-event_name = "STOP";
+return "STOP";
 break;
 case QEVENT_RESUME:
-event_name = "RESUME";
+return "RESUME";
 break;
 case QEVENT_VNC_CONNECTED:
-event_name = "VNC_CONNECTED";
+return "VNC_CONNECTED";
 break;
 case QEVENT_VNC_INITIALIZED:
-event_name = "VNC_INITIALIZED";
+return "VNC_INITIALIZED";
 break;
 case QEVENT_VNC_DISCONNECTED:
-event_name = "VNC_DISCONNECTED";
+return "VNC_DISCONNECTED";
 break;
 case QEVENT_BLOCK_IO_ERROR:
-event_name = "BLOCK_IO_ERROR";
+return "BLOCK_IO_ERROR";
 break;
 case QEVENT_RTC_CHANGE:
-event_name = "RTC_CHANGE";
+return "RTC_CHANGE";
 break;
 case QEVENT_WATCHDOG:
-event_name = "WATCHDOG";
+return "WATCHDOG";
 break;
 case QEVENT_SPICE_CONNECTED:
-event_name = "SPICE_CONNECTED";
+return "SPICE_CONNECTED";
 break;
 case QEVENT_SPICE_INITIALIZED:
-event_name = "SPICE_INITIALIZED";
+return "SPICE_INITIALIZED";
 break;
 case QEVENT_SPICE_DISCONNECTED:
-event_name = "SPICE_DISCONNECTED";
+return "SPICE_DISCONNECTED";
 break;
 case QEVENT_BLOCK_JOB_COMPLETED:
-event_name = "BLOCK_JOB_COMPLETED";
+return "BLOCK_JOB_COMPLETED";
 break;
 case QEVENT_BLOCK_JOB_CANCELLED:
-event_name = "BLOCK_JOB_CANCELLED";
+return "BLOCK_JOB_CANCELLED";
 break;
 case QEVENT_DEVICE_TRAY_MOVED:
- event_name = "DEVICE_TRAY_MOVED";
+ return "DEVICE_TRAY_MOVED";
 break;
 case QEVENT_SUSPEND:
-event_name = "SUSPEND";
+return "SUSPEND";
 break;
 case QEVENT_WAKEUP:
-event_name = "WAKEUP";
+return "WAKEUP";
 break;
 case QEVENT_BALLOON_CHANGE:
-event_name = "BALLOON_CHANGE";
+return "BALLOON_CHANGE";
 break;
 default:
-abort();
+return NULL;
 break;
 }
+}
+
+/**
+ * monitor_protocol_event(): Generate a Monitor event
+ *
+ * Event-specific data can be emitted through the (optional) 'data' parameter.
+ */
+void monitor_protocol_event(MonitorEvent event, QObject *data)
+{
+QDict *qmp;
+const char *event_name;
+Monitor *mon;
+
+assert(event < QEVENT_MAX);
+
+event_name = monitor_protocol_event_name(event);
+if (event_name == NULL) {
+abort();
+}
 
 qmp = qdict_new();
 timestamp_put(qmp);
@@ -741,6 +750,24 @@ CommandInfoList *qmp_query_commands(Error **errp)
 return cmd_lis

[Qemu-devel] [PATCH v10 9/9] Add XBZRLE statistics

2012-05-16 Thread Orit Wasserman
Signed-off-by: Benoit Hudzia 
Signed-off-by: Petter Svard 
Signed-off-by: Aidan Shribman 
Signed-off-by: Orit Wasserman 
---
 arch_init.c  |   68 +-
 hmp.c|   13 ++
 migration.c  |   12 +
 migration.h  |9 +++
 qapi-schema.json |   27 +++--
 qmp-commands.hx  |   28 ++
 6 files changed, 153 insertions(+), 4 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 851e45d..1c35b26 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -160,8 +160,66 @@ void xbzrle_cache_resize(int64_t order)
 cache_resize(XBZRLE.cache, pow(2, order));
 }
 
+/* accounting */
+typedef struct AccountingInfo {
+uint64_t dup_pages;
+uint64_t norm_pages;
+uint64_t xbzrle_bytes;
+uint64_t xbzrle_pages;
+uint64_t xbzrle_cache_miss;
+uint64_t iterations;
+uint64_t xbzrle_overflows;
+} AccountingInfo;
+
+static AccountingInfo acct_info;
+
+static void acct_clear(void)
+{
+memset(&acct_info, 0, sizeof(acct_info));
+}
+
+uint64_t dup_mig_bytes_transferred(void)
+{
+return acct_info.dup_pages * TARGET_PAGE_SIZE;
+}
+
+uint64_t dup_mig_pages_transferred(void)
+{
+return acct_info.dup_pages;
+}
+
+uint64_t norm_mig_bytes_transferred(void)
+{
+return acct_info.norm_pages * TARGET_PAGE_SIZE;
+}
+
+uint64_t norm_mig_pages_transferred(void)
+{
+return acct_info.norm_pages;
+}
+
+uint64_t xbzrle_mig_bytes_transferred(void)
+{
+return acct_info.xbzrle_bytes;
+}
+
+uint64_t xbzrle_mig_pages_transferred(void)
+{
+return acct_info.xbzrle_pages;
+}
+
+uint64_t xbzrle_mig_pages_cache_miss(void)
+{
+return acct_info.xbzrle_cache_miss;
+}
+
+uint64_t xbzrle_mig_pages_overflow(void)
+{
+return acct_info.xbzrle_overflows;
+}
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
-int cont, int flag)
+   int cont, int flag)
 {
 qemu_put_be64(f, offset | cont | flag);
 if (!cont) {
@@ -186,6 +244,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
*current_data,
 if (!cache_is_cached(XBZRLE.cache, current_addr)) {
 cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
   TARGET_PAGE_SIZE));
+acct_info.xbzrle_cache_miss++;
 goto done;
 }
 
@@ -202,6 +261,7 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
*current_data,
 } else if (encoded_len == -1) {
 bytes_sent = -1;
 DPRINTF("Overflow\n");
+acct_info.xbzrle_overflows++;
 /* update data in the cache */
 memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
 goto done;
@@ -222,7 +282,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t 
*current_data,
 qemu_put_be16(f, hdr.xh_len);
 qemu_put_be32(f, hdr.xh_cksum);
 qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
+acct_info.xbzrle_pages++;
 bytes_sent = encoded_len + sizeof(hdr);
+acct_info.xbzrle_bytes += bytes_sent;
 
 done:
 return bytes_sent;
@@ -257,6 +319,7 @@ static int ram_save_block(QEMUFile *f, int stage)
 p = memory_region_get_ram_ptr(mr) + offset;
 
 if (is_dup_page(p)) {
+acct_info.dup_pages++;
 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
 qemu_put_byte(f, *p);
 bytes_sent = 1;
@@ -279,6 +342,7 @@ static int ram_save_block(QEMUFile *f, int stage)
 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
 bytes_sent = TARGET_PAGE_SIZE;
+acct_info.norm_pages++;
 }
 
 break;
@@ -410,6 +474,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 return -1;
 }
 XBZRLE.encoded_buf = g_malloc0(TARGET_PAGE_SIZE);
+acct_clear();
 }
 
 /* Make sure all dirty bits are set */
@@ -444,6 +509,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
bytes_sent -1 represent no more blocks*/
 if (bytes_sent > 0) {
 bytes_transferred += bytes_sent;
+acct_info.iterations++;
 } else if (bytes_sent == -1) { /* no more blocks */
 break;
 }
diff --git a/hmp.c b/hmp.c
index 0e4d63a..8829d3a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -162,6 +162,19 @@ void hmp_info_migrate(Monitor *mon)
info->disk->total >> 10);
 }
 
+if (info->has_cache) {
+monitor_printf(mon, "cache size: %" PRIu64 " bytes\n",
+   info->cache->cache_size);
+monitor_printf(mon, "xbzrle transferred: %" PRIu64 " kbytes\n",
+   info->cache->xbzrle_bytes >> 10);
+monitor_printf(mon, "xbzrle pages: %" PRIu64 " pages\n",
+   info->cache->xbzrle_pages);
+mon

[Qemu-devel] [PATCH v10 6/9] Add save_block_hdr function

2012-05-16 Thread Orit Wasserman
Signed-off-by: Benoit Hudzia 
Signed-off-by: Petter Svard 
Signed-off-by: Aidan Shribman 
Signed-off-by: Orit Wasserman 
---
 arch_init.c |   26 ++
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9a35aee..a334a2e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -127,6 +127,18 @@ static int is_dup_page(uint8_t *page)
 return 1;
 }
 
+static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
+int cont, int flag)
+{
+qemu_put_be64(f, offset | cont | flag);
+if (!cont) {
+qemu_put_byte(f, strlen(block->idstr));
+qemu_put_buffer(f, (uint8_t *)block->idstr,
+strlen(block->idstr));
+}
+
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -153,21 +165,11 @@ static int ram_save_block(QEMUFile *f)
 p = memory_region_get_ram_ptr(mr) + offset;
 
 if (is_dup_page(p)) {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-if (!cont) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr,
-strlen(block->idstr));
-}
+save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
 qemu_put_byte(f, *p);
 bytes_sent = 1;
 } else {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-if (!cont) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr,
-strlen(block->idstr));
-}
+save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
 qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
 bytes_sent = TARGET_PAGE_SIZE;
 }
-- 
1.7.7.6




Re: [Qemu-devel] [PATCH next v2 73/74] linux-user: Use cpu_reset() after cpu_init() / cpu_copy()

2012-05-16 Thread Andreas Färber
Am 16.05.2012 14:36, schrieb Igor Mammedov:
> On 05/11/2012 01:26 PM, Andreas Färber wrote:
>> Am 11.05.2012 13:22, schrieb Peter Maydell:
>>> On 10 May 2012 01:14, Andreas Färber  wrote:
 Eliminates cpu_state_reset() usage.

 Signed-off-by: Andreas Färber
 ---
   linux-user/main.c|2 +-
   linux-user/syscall.c |2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/linux-user/main.c b/linux-user/main.c
 index 191b750..49108b8 100644
 --- a/linux-user/main.c
 +++ b/linux-user/main.c
 @@ -3405,7 +3405,7 @@ int main(int argc, char **argv, char **envp)
  exit(1);
  }
   #if defined(TARGET_I386) || defined(TARGET_SPARC) ||
 defined(TARGET_PPC)
 -cpu_state_reset(env);
 +cpu_reset(ENV_GET_CPU(env));
   #endif

  thread_env = env;
 diff --git a/linux-user/syscall.c b/linux-user/syscall.c
 index 20d2a74..539af3f 100644
 --- a/linux-user/syscall.c
 +++ b/linux-user/syscall.c
 @@ -4262,7 +4262,7 @@ static int do_fork(CPUArchState *env, unsigned
 int flags, abi_ulong newsp,
  /* we create a new CPU instance. */
  new_env = cpu_copy(env);
   #if defined(TARGET_I386) || defined(TARGET_SPARC) ||
 defined(TARGET_PPC)
 -cpu_state_reset(new_env);
 +cpu_reset(ENV_GET_CPU(new_env));
   #endif
  /* Init regs that differ from the parent.  */
  cpu_clone_regs(new_env, newsp);
 -- 
>>>
>>> Do you have any plans to try to rationalise the handling of reset
>>> so that we consistently either do or don't reset the cpu here,
>>> rather than having it done based on a TARGET_* ifdef ?
>>
>> Igor had an RFC for x86; sparc and ppc reset I haven't looked into yet.
>> Cc'ing Alex and Blue.
> I'll rebase RFC for x86 and post patches today and will remove it from here
> by the last patch in patchset so that when this patch applied we could
> remove
> unnecessary call.
> So ACK for target-i386 here.

Since back then Peter and I have discussed whether we can rather just
remove the #ifdef here and reset for all targets.

Unfortunately I'm still not clear about some patches that stand in the
way of ObjectClass::realize - if cpu_reset() is moved to realizefn for
all targets then we can just call realize here.

Actually, all we'd need is ObjectClass::realize field, so I'm
considering extracting the intersection between Paolo and me, stick a
CPU-specific wrapper method on top as requested by Anthony and then we
can move ahead here...

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [PATCH v10 7/9] Add XBZRLE to ram_save_block and ram_save_live

2012-05-16 Thread Orit Wasserman
In the outgoing migration check to see if the page is cached and
changed than send compressed page by using save_xbrle_page function.
In the incoming migration check to see if RAM_SAVE_FLAG_XBRLE is set
and decompress the page (by using load_xbrle function).

Signed-off-by: Benoit Hudzia 
Signed-off-by: Petter Svard 
Signed-off-by: Aidan Shribman 
Signed-off-by: Orit Wasserman 
---
 arch_init.c |  220 +++
 migration.c |   26 +++-
 migration.h |8 ++
 savevm.c|   91 
 4 files changed, 329 insertions(+), 16 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a334a2e..7ebdb7a 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -43,6 +43,15 @@
 #include "hw/smbios.h"
 #include "exec-memory.h"
 #include "hw/pcspk.h"
+#include "qemu/cache.h"
+
+#ifdef DEBUG_ARCH_INIT
+#define DPRINTF(fmt, ...) \
+do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
 
 #ifdef TARGET_SPARC
 int graphic_width = 1024;
@@ -95,6 +104,7 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_PAGE 0x08
 #define RAM_SAVE_FLAG_EOS  0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
+#define RAM_SAVE_FLAG_XBZRLE   0x40
 
 #ifdef __ALTIVEC__
 #include 
@@ -127,6 +137,22 @@ static int is_dup_page(uint8_t *page)
 return 1;
 }
 
+/* XBZRLE (Xor Based Zero Length Encoding */
+typedef struct XBZRLEHeader {
+uint32_t xh_cksum;
+uint16_t xh_len;
+uint8_t xh_flags;
+} XBZRLEHeader;
+
+/* struct contains XBZRLE cache and a static page
+   used by the compression */
+static struct {
+/* buffer used for XBZRLE encoding */
+uint8_t *encoded_buf;
+/* Cache for XBZRLE */
+Cache *cache;
+} XBZRLE = {0};
+
 static void save_block_hdr(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
 int cont, int flag)
 {
@@ -139,19 +165,78 @@ static void save_block_hdr(QEMUFile *f, RAMBlock *block, 
ram_addr_t offset,
 
 }
 
+#define ENCODING_FLAG_XBZRLE 0x1
+
+static int save_xbzrle_page(QEMUFile *f, uint8_t *current_data,
+ram_addr_t current_addr, RAMBlock *block,
+ram_addr_t offset, int cont)
+{
+int encoded_len = 0, bytes_sent = -1, ret = -1;
+XBZRLEHeader hdr = {0};
+uint8_t *prev_cached_page;
+
+/* check to see if page is cached , if not cache and return */
+if (!cache_is_cached(XBZRLE.cache, current_addr)) {
+cache_insert(XBZRLE.cache, current_addr, g_memdup(current_data,
+  TARGET_PAGE_SIZE));
+goto done;
+}
+
+prev_cached_page = get_cached_data(XBZRLE.cache, current_addr);
+
+/* XBZRLE encoding (if there is no overflow) */
+encoded_len = xbzrle_encode_buffer(prev_cached_page, current_data,
+   TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
+   TARGET_PAGE_SIZE);
+if (encoded_len == 0) {
+bytes_sent = 0;
+DPRINTF("Unmodifed page or overflow skipping\n");
+goto done;
+} else if (encoded_len == -1) {
+bytes_sent = -1;
+DPRINTF("Overflow\n");
+/* update data in the cache */
+memcpy(prev_cached_page, current_data, TARGET_PAGE_SIZE);
+goto done;
+}
+
+/* we need to update the data in the cache, in order to get the same data
+   we cached we decode the encoded page on the cached data */
+ret = xbzrle_decode_buffer(XBZRLE.encoded_buf, encoded_len,
+   prev_cached_page, TARGET_PAGE_SIZE);
+g_assert(ret != -1);
+
+hdr.xh_len = encoded_len;
+hdr.xh_flags |= ENCODING_FLAG_XBZRLE;
+
+/* Send XBZRLE based compressed page */
+save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_XBZRLE);
+qemu_put_byte(f, hdr.xh_flags);
+qemu_put_be16(f, hdr.xh_len);
+qemu_put_be32(f, hdr.xh_cksum);
+qemu_put_buffer(f, XBZRLE.encoded_buf, encoded_len);
+bytes_sent = encoded_len + sizeof(hdr);
+
+done:
+return bytes_sent;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+static int ram_save_block(QEMUFile *f, int stage)
 {
 RAMBlock *block = last_block;
 ram_addr_t offset = last_offset;
-int bytes_sent = 0;
+int bytes_sent = -1;
 MemoryRegion *mr;
+ram_addr_t current_addr;
 
 if (!block)
 block = QLIST_FIRST(&ram_list.blocks);
 
+current_addr = block->offset + offset;
+
 do {
 mr = block->mr;
 if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
@@ -168,7 +253,22 @@ static int ram_save_block(QEMUFile *f)
 save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_COMPRESS);
 qemu_put_byte(f, *p);
 bytes_sent = 1;
-} else {
+} else if (migrate_use_xbzrle()) {
+/* in stage 1 none of the pages ar

Re: [Qemu-devel] [PATCH next v2 73/74] linux-user: Use cpu_reset() after cpu_init() / cpu_copy()

2012-05-16 Thread Peter Maydell
On 16 May 2012 14:01, Andreas Färber  wrote:
> Am 16.05.2012 14:36, schrieb Igor Mammedov:
>> On 05/11/2012 01:26 PM, Andreas Färber wrote:
>>> Am 11.05.2012 13:22, schrieb Peter Maydell:
 On 10 May 2012 01:14, Andreas Färber  wrote:
> Eliminates cpu_state_reset() usage.
>
> Signed-off-by: Andreas Färber
> ---
>   linux-user/main.c    |    2 +-
>   linux-user/syscall.c |    2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 191b750..49108b8 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -3405,7 +3405,7 @@ int main(int argc, char **argv, char **envp)
>          exit(1);
>      }
>   #if defined(TARGET_I386) || defined(TARGET_SPARC) ||
> defined(TARGET_PPC)
> -    cpu_state_reset(env);
> +    cpu_reset(ENV_GET_CPU(env));
>   #endif
>
>      thread_env = env;
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 20d2a74..539af3f 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -4262,7 +4262,7 @@ static int do_fork(CPUArchState *env, unsigned
> int flags, abi_ulong newsp,
>          /* we create a new CPU instance. */
>          new_env = cpu_copy(env);
>   #if defined(TARGET_I386) || defined(TARGET_SPARC) ||
> defined(TARGET_PPC)
> -        cpu_state_reset(new_env);
> +        cpu_reset(ENV_GET_CPU(new_env));
>   #endif
>          /* Init regs that differ from the parent.  */
>          cpu_clone_regs(new_env, newsp);
> --

 Do you have any plans to try to rationalise the handling of reset
 so that we consistently either do or don't reset the cpu here,
 rather than having it done based on a TARGET_* ifdef ?
>>>
>>> Igor had an RFC for x86; sparc and ppc reset I haven't looked into yet.
>>> Cc'ing Alex and Blue.
>> I'll rebase RFC for x86 and post patches today and will remove it from here
>> by the last patch in patchset so that when this patch applied we could
>> remove
>> unnecessary call.
>> So ACK for target-i386 here.
>
> Since back then Peter and I have discussed whether we can rather just
> remove the #ifdef here and reset for all targets.
>
> Unfortunately I'm still not clear about some patches that stand in the
> way of ObjectClass::realize - if cpu_reset() is moved to realizefn for
> all targets then we can just call realize here.

I don't think we need to tangle this up with realize. What I think
we should do is:
 (1) remove the #ifdefs on the cpu reset in linux-user/main.c, so
we reset for all target CPU types
 (2) remove the cpu reset from the do_fork code in linux-user and
instead do a cpu reset inside exec.c:cpu_copy(), just after we call
cpu_init(). (This is actually fixing a bug -- the thread clone syscall
is supposed to leave register values alone, which is why we have
cpu_copy in the first place, and calling cpu_reset breaks this.)
 (3) remove any cpu reset calls from inside target cpu_init functions

This cleans things up so all targets work the same (cpu_init never
does a reset) and linux-user works the same as system mode (it's
the responsibility of the code which creates the cpu to reset it
at some point before use).

-- PMM



[Qemu-devel] [PATCH v10 5/9] Add uleb encoding/decoding functions

2012-05-16 Thread Orit Wasserman
Implement Unsigned Little Endian Base 128.

Signed-off-by: Orit Wasserman 
---
 cutils.c  |   29 +
 qemu-common.h |8 
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/cutils.c b/cutils.c
index af308cd..60fb7c8 100644
--- a/cutils.c
+++ b/cutils.c
@@ -549,3 +549,32 @@ int qemu_sendv(int sockfd, struct iovec *iov, int len, int 
iov_offset)
 return do_sendv_recvv(sockfd, iov, len, iov_offset, 1);
 }
 
+/*
+ * Implementation of  ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+ * Input is limited to 14-bit numbers
+ */
+int uleb128_encode_small(uint8_t *out, uint32_t n)
+{
+g_assert(n <= 0x3fff);
+if (n < 0x80) {
+*out++ = n;
+return 1;
+} else {
+*out++ = (n & 0x7f) | 0x80;
+*out++ = n >> 7;
+return 2;
+}
+}
+
+int uleb128_decode_small(const uint8_t *in, uint32_t *n)
+{
+if (!(*in & 0x80)) {
+*n = *in++;
+return 1;
+} else {
+*n = *in++ & 0x7f;
+g_assert(!(*in & 0x80));
+*n |= *in++ << 7;
+return 2;
+}
+}
diff --git a/qemu-common.h b/qemu-common.h
index 30c59c8..3d0f66f 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -407,4 +407,12 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, 
uint32_t c)
 
 #include "module.h"
 
+/*
+ * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+ * Input is limited to 14-bit numbers
+ */
+
+int uleb128_encode_small(uint8_t *out, uint32_t n);
+int uleb128_decode_small(const uint8_t *in, uint32_t *n);
+
 #endif
-- 
1.7.7.6




[Qemu-devel] [PATCH 1/6] qcow2: don't leak buffer for unexpected qcow_version in header

2012-05-16 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 655799c..f3388bf 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -919,6 +919,7 @@ int qcow2_update_header(BlockDriverState *bs)
 ret = sizeof(*header);
 break;
 default:
+free(buf);
 return -EINVAL;
 }

-- 
1.7.10.2.520.g6a4a482




[Qemu-devel] [PATCH 0/6] plug memory and file-descriptor leaks

2012-05-16 Thread Jim Meyering
From: Jim Meyering 

These changes fix most of the legitimate coverity-reported leak warnings.

Jim Meyering (6):
  qcow2: don't leak buffer for unexpected qcow_version in header
  qemu-ga: avoid unconditional lockfile file descriptor leak
  linux-user: do_msgrcv: don't leak host_mb upon TARGET_EFAULT failure
  sheepdog: don't leak socket file descriptor upon connection failure
  arm-semi: don't leak 1kb user string lock buffer upon TARGET_SYS_OPEN
  softmmu-semi: fix lock_user* functions not to deref NULL upon OOM

 arm-semi.c   | 13 +++--
 block/qcow2.c|  1 +
 block/sheepdog.c |  1 +
 linux-user/syscall.c |  1 +
 qemu-ga.c| 10 --
 softmmu-semi.h   |  6 +++---
 6 files changed, 21 insertions(+), 11 deletions(-)

--
1.7.10.2.520.g6a4a482



[Qemu-devel] [PATCH 4/6] sheepdog: don't leak socket file descriptor upon connection failure

2012-05-16 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 block/sheepdog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index e01d371..a5c834f 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -489,6 +489,7 @@ static int connect_to_sdog(const char *addr, const char 
*port)
 if (errno == EINTR) {
 goto reconnect;
 }
+close(fd);
 break;
 }

-- 
1.7.10.2.520.g6a4a482




[Qemu-devel] [PATCH 3/6] linux-user: do_msgrcv: don't leak host_mb upon TARGET_EFAULT failure

2012-05-16 Thread Jim Meyering
From: Jim Meyering 


Signed-off-by: Jim Meyering 
---
 linux-user/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 20d2a74..bdf8ce0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2814,6 +2814,7 @@ static inline abi_long do_msgrcv(int msqid, abi_long msgp,
 end:
 if (target_mb)
 unlock_user_struct(target_mb, msgp, 1);
+free(host_mb);
 return ret;
 }

-- 
1.7.10.2.520.g6a4a482




[Qemu-devel] [PATCH 6/6] softmmu-semi: fix lock_user* functions not to deref NULL upon OOM

2012-05-16 Thread Jim Meyering
From: Jim Meyering 

Use g_malloc/g_free in place of malloc/free.

Signed-off-by: Jim Meyering 
---
 softmmu-semi.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/softmmu-semi.h b/softmmu-semi.h
index 648cb95..996e0f7 100644
--- a/softmmu-semi.h
+++ b/softmmu-semi.h
@@ -39,7 +39,7 @@ static void *softmmu_lock_user(CPUArchState *env, uint32_t 
addr, uint32_t len,
 {
 uint8_t *p;
 /* TODO: Make this something that isn't fixed size.  */
-p = malloc(len);
+p = g_malloc(len);
 if (copy)
 cpu_memory_rw_debug(env, addr, p, len, 0);
 return p;
@@ -51,7 +51,7 @@ static char *softmmu_lock_user_string(CPUArchState *env, 
uint32_t addr)
 char *s;
 uint8_t c;
 /* TODO: Make this something that isn't fixed size.  */
-s = p = malloc(1024);
+s = p = g_malloc(1024);
 do {
 cpu_memory_rw_debug(env, addr, &c, 1, 0);
 addr++;
@@ -65,6 +65,6 @@ static void softmmu_unlock_user(CPUArchState *env, void *p, 
target_ulong addr,
 {
 if (len)
 cpu_memory_rw_debug(env, addr, p, len, 1);
-free(p);
+g_free(p);
 }
 #define unlock_user(s, args, len) softmmu_unlock_user(env, s, args, len)
-- 
1.7.10.2.520.g6a4a482




Re: [Qemu-devel] A error abou t gtester-report, can anyone help me, thank you !

2012-05-16 Thread Anthony Liguori

On 05/16/2012 02:51 AM, Zhi Hui Li wrote:

When I use gtest.

Compiler did not have a problem, the use of gtester implementation and generate
xml file testlog also no problem, but in the use of
gterster-report, the output is as follows:

gtester-report test.log > test-report.html


There's a bug in gtester-report on some distros.

If you use 'make checkreport.html', it works around the bug using some scripts 
in the tree.


Regards,

Anthony Liguori





Traceback (most recent call last):
File "/usr/bin/gtester-report", line 492, in 
main()
File "/usr/bin/gtester-report", line 486, in main
HTMLReportWriter(rr.get_info(), rr.binary_list()).printout()
File "/usr/bin/gtester-report", line 350, in printout
self.handle_info ()
File "/usr/bin/gtester-report", line 244, in handle_info
self.oprint ('Package: %(package)s, version: %(version)s\n' % 
self.info)
KeyError: 'package'
li@mm:~/ceshi/g_test$ gtester-report test.log > test-report.html
Traceback (most recent call last):
File "/usr/bin/gtester-report", line 492, in 
main()
File "/usr/bin/gtester-report", line 486, in main
HTMLReportWriter(rr.get_info(), rr.binary_list()).printout()
File "/usr/bin/gtester-report", line 350, in printout
self.handle_info ()
File "/usr/bin/gtester-report", line 244, in handle_info
self.oprint ('Package: %(package)s, version: %(version)s\n' % 
self.info)
KeyError: 'package'


I don't know why the error " KeyError: 'package' " appear. can anybody help me ?
Thank you very much!







Re: [Qemu-devel] [PATCH 5/6] arm-semi: don't leak 1kb user string lock buffer upon TARGET_SYS_OPEN

2012-05-16 Thread Peter Maydell
On 16 May 2012 14:08, Jim Meyering  wrote:
> From: Jim Meyering 
>
> Always call unlock_user before returning.
>
> .

Gratuitous dot in your commit message here, but

>
> Signed-off-by: Jim Meyering 

Reviewed-by: Peter Maydell 

-- PMM



[Qemu-devel] [PATCH 2/6] qemu-ga: avoid unconditional lockfile file descriptor leak

2012-05-16 Thread Jim Meyering
From: Jim Meyering 

Do not leak a file descriptor.
Also, do not forget to unlink the lockfile upon failed lockf.
Always close the lockfile file descriptor, taking care
to diagnose close, as well as open and write, failure.

Signed-off-by: Jim Meyering 
---
 qemu-ga.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/qemu-ga.c b/qemu-ga.c
index 680997e..6c6de55 100644
--- a/qemu-ga.c
+++ b/qemu-ga.c
@@ -241,12 +241,13 @@ void ga_set_response_delimited(GAState *s)
 static bool ga_open_pidfile(const char *pidfile)
 {
 int pidfd;
+int write_err;
 char pidstr[32];

 pidfd = open(pidfile, O_CREAT|O_WRONLY, S_IRUSR|S_IWUSR);
 if (pidfd == -1 || lockf(pidfd, F_TLOCK, 0)) {
 g_critical("Cannot lock pid file, %s", strerror(errno));
-return false;
+goto fail;
 }

 if (ftruncate(pidfd, 0) || lseek(pidfd, 0, SEEK_SET)) {
@@ -254,7 +255,9 @@ static bool ga_open_pidfile(const char *pidfile)
 goto fail;
 }
 sprintf(pidstr, "%d", getpid());
-if (write(pidfd, pidstr, strlen(pidstr)) != strlen(pidstr)) {
+write_err = write(pidfd, pidstr, strlen(pidstr)) != strlen(pidstr);
+if (close(pidfd) || write_err) {
+pidfd = -1;
 g_critical("Failed to write pid file");
 goto fail;
 }
@@ -262,6 +265,9 @@ static bool ga_open_pidfile(const char *pidfile)
 return true;

 fail:
+if (pidfd != -1) {
+close(pidfd);
+}
 unlink(pidfile);
 return false;
 }
-- 
1.7.10.2.520.g6a4a482




Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Michael S. Tsirkin
On Wed, May 16, 2012 at 12:37:54PM +0100, Anthony PERARD wrote:
> On Wed, May 16, 2012 at 12:23 PM, Paolo Bonzini  wrote:
> > Il 16/05/2012 13:15, Anthony PERARD ha scritto:
> >>> >         qdev_unplug(&(d->qdev), NULL);
> >>> > +        qdev_free(&(d->qdev));
> >>> >     }
> >>> >  }
> >>> >
> >>> >
> >>> > Anthony, can you confirm that this solves the problem for you?
> >> This work until I try to hotplug a new device to the guest at wish
> >> point I have this:
> >> ERROR:/local/home/anthony/work/qemu/qom/object.c:389:object_delete:
> >> assertion failed: (obj->ref == 0)
> >>
> >> This is because there is still a pending request of the hotunplug in
> >> the acpi piix4.
> >> If I call qdev_free without qdev_unplug, I hit the same assert, but
> >> rigth away. This is way something new.
> >
> > Because it's missing the object_unparent done by qdev_unplug.  Does
> > object_unparent+qdev_free work?  (I believe object_unparent should be
> > done by qdev_free rather than qdev_unplug, but that's something for 1.2).
> 
> Cool, this seems to work fine. Thanks.
> 
> I'll test a bit more and resend a patch with only object_unparent+qdev_free.

Separately it was suggested to make qdev_free do object_unparent
automatically. Anthony is yet to respond.

> -- 
> Anthony PERARD



Re: [Qemu-devel] [PATCH 3/6] linux-user: do_msgrcv: don't leak host_mb upon TARGET_EFAULT failure

2012-05-16 Thread Peter Maydell
On 16 May 2012 14:07, Jim Meyering  wrote:
> From: Jim Meyering 
>
>
> Signed-off-by: Jim Meyering 
> ---
>  linux-user/syscall.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 20d2a74..bdf8ce0 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -2814,6 +2814,7 @@ static inline abi_long do_msgrcv(int msqid, abi_long 
> msgp,
>  end:
>     if (target_mb)
>         unlock_user_struct(target_mb, msgp, 1);
> +    free(host_mb);
>     return ret;
>  }

This will cause us to free() host_mb twice in the normal-return case.

-- PMM



[Qemu-devel] [PATCH 5/6] arm-semi: don't leak 1kb user string lock buffer upon TARGET_SYS_OPEN

2012-05-16 Thread Jim Meyering
From: Jim Meyering 

Always call unlock_user before returning.

.

Signed-off-by: Jim Meyering 
---
 arm-semi.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arm-semi.c b/arm-semi.c
index 88ca9bb..5d2a2d2 100644
--- a/arm-semi.c
+++ b/arm-semi.c
@@ -194,18 +194,19 @@ uint32_t do_arm_semihosting(CPUARMState *env)
 if (!(s = lock_user_string(ARG(0
 /* FIXME - should this error code be -TARGET_EFAULT ? */
 return (uint32_t)-1;
-if (ARG(1) >= 12)
+if (ARG(1) >= 12) {
+unlock_user(s, ARG(0), 0);
 return (uint32_t)-1;
+}
 if (strcmp(s, ":tt") == 0) {
-if (ARG(1) < 4)
-return STDIN_FILENO;
-else
-return STDOUT_FILENO;
+int result_fileno = ARG(1) < 4 ? STDIN_FILENO : STDOUT_FILENO;
+unlock_user(s, ARG(0), 0);
+return result_fileno;
 }
 if (use_gdb_syscalls()) {
 gdb_do_syscall(arm_semi_cb, "open,%s,%x,1a4", ARG(0),
   (int)ARG(2)+1, gdb_open_modeflags[ARG(1)]);
-return env->regs[0];
+ret = env->regs[0];
 } else {
 ret = set_swi_errno(ts, open(s, open_modeflags[ARG(1)], 0644));
 }
-- 
1.7.10.2.520.g6a4a482




Re: [Qemu-devel] [PATCH 4/4] xen: Fix PV-on-HVM

2012-05-16 Thread Michael S. Tsirkin
On Wed, May 16, 2012 at 11:37:02AM +0100, Stefano Stabellini wrote:
> On Wed, 16 May 2012, Michael S. Tsirkin wrote:
> > On Wed, May 16, 2012 at 11:19:53AM +0100, Stefano Stabellini wrote:
> > > On Wed, 16 May 2012, Paolo Bonzini wrote:
> > > > Il 16/05/2012 10:13, Michael S. Tsirkin ha scritto:
> > > > > On Wed, May 16, 2012 at 10:06:25AM +0200, Paolo Bonzini wrote:
> > > > >> On Xen the PV drivers can ask the firmware to surprise-remove the
> > > > >> emulated NICs.
> > > > > 
> > > > > So driver tells firmware (meaning acpi? how?) that it's ok
> > > > > to do surprize removal?
> > > > 
> > > > It writes something to some I/O port, and then QEMU surprise-removes the
> > > > NICs.
> > > 
> > > Yes, writing to a static I/O port provided by the Xen platform PCI
> > > device, see hw/xen_platform.c:platform_fixed_ioport_writew.
> > > 
> > > The guest can ask to unplug emulated NICs and disks this way.
> > > Surprise-removal is OK in these cases.
> > 
> > Confused.
> > Don't you want to just remove the device on unplug?
> 
> Yes, the NIC needs to "disappear" from the PCI bus.
> 
> 
> > In fact the equivalent of guest calling _EJ0?
> 
> Except that _EJ0 can or cannot be implemented, while this doesn't have
> to go through ACPI or PCI hotplug and it is supposed to always work.

So the answer is to simply free on unplug exactly
like _EJ0 does. There's no forcing and no surprise removal here.

-- 
MST



Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Anthony Liguori

On 05/16/2012 06:30 AM, Amit Shah wrote:

The Linux kernel already has a virtio-rng driver, this is the device
implementation.

When Linux needs more entropy, it puts a buffer in the vq.  We then put
entropy into that buffer, and push it back to the guest.

Feeding randomness from host's /dev/urandom into the guest is
sufficient, so this is a simple implementation that opens /dev/urandom
and reads from it whenever required.

Invocation is simple:

   qemu ... -device virtio-rng-pci

In the guest, we see

   $ cat /sys/devices/virtual/misc/hw_random/rng_available
   virtio

   $ cat /sys/devices/virtual/misc/hw_random/rng_current
   virtio

There are ways to extend the device to be more generic and collect
entropy from other sources, but this is simple enough and works for now.

Signed-off-by: Amit Shah


It's not this simple unfortunately.

If you did this with libvirt, one guest could exhaust the available entropy for 
the remaining guests.  This could be used as a mechanism for one guest to attack 
another (reducing the available entropy for key generation).


You need to rate limit the amount of entropy that a guest can obtain to allow 
management tools to mitigate this attack.


Regards,

Anthony Liguori


---
  Makefile.objs   |1 +
  hw/pci.h|1 +
  hw/virtio-pci.c |   50 +
  hw/virtio-rng.c |  130 +++
  hw/virtio-rng.h |   18 
  hw/virtio.h |2 +
  6 files changed, 202 insertions(+), 0 deletions(-)
  create mode 100644 hw/virtio-rng.c
  create mode 100644 hw/virtio-rng.h

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..5850762 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -210,6 +210,7 @@ user-obj-y += $(qom-obj-twice-y)
  hw-obj-y =
  hw-obj-y += vl.o loader.o
  hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
+hw-obj-$(CONFIG_VIRTIO) += virtio-rng.o
  hw-obj-y += usb/libhw.o
  hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
  hw-obj-y += fw_cfg.o
diff --git a/hw/pci.h b/hw/pci.h
index 8d0aa49..0a22f91 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -76,6 +76,7 @@
  #define PCI_DEVICE_ID_VIRTIO_BALLOON 0x1002
  #define PCI_DEVICE_ID_VIRTIO_CONSOLE 0x1003
  #define PCI_DEVICE_ID_VIRTIO_SCSI0x1004
+#define PCI_DEVICE_ID_VIRTIO_RNG 0x1005

  #define FMT_PCIBUS  PRIx64

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 4a4413d..7f2d630 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -812,6 +812,28 @@ static int virtio_balloon_exit_pci(PCIDevice *pci_dev)
  return virtio_exit_pci(pci_dev);
  }

+static int virtio_rng_init_pci(PCIDevice *pci_dev)
+{
+VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
+VirtIODevice *vdev;
+
+vdev = virtio_rng_init(&pci_dev->qdev);
+if (!vdev) {
+return -1;
+}
+virtio_init_pci(proxy, vdev);
+return 0;
+}
+
+static int virtio_rng_exit_pci(PCIDevice *pci_dev)
+{
+VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
+
+virtio_pci_stop_ioeventfd(proxy);
+virtio_rng_exit(proxy->vdev);
+return virtio_exit_pci(pci_dev);
+}
+
  static Property virtio_blk_properties[] = {
  DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
  DEFINE_BLOCK_PROPERTIES(VirtIOPCIProxy, block),
@@ -937,6 +959,33 @@ static TypeInfo virtio_balloon_info = {
  .class_init= virtio_balloon_class_init,
  };

+static Property virtio_rng_properties[] = {
+DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_rng_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k->init = virtio_rng_init_pci;
+k->exit = virtio_rng_exit_pci;
+k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+k->device_id = PCI_DEVICE_ID_VIRTIO_RNG;
+k->revision = VIRTIO_PCI_ABI_VERSION;
+k->class_id = PCI_CLASS_OTHERS;
+dc->reset = virtio_pci_reset;
+dc->props = virtio_rng_properties;
+}
+
+static TypeInfo virtio_rng_info = {
+.name  = "virtio-rng-pci",
+.parent= TYPE_PCI_DEVICE,
+.instance_size = sizeof(VirtIOPCIProxy),
+.class_init= virtio_rng_class_init,
+};
+
  static int virtio_scsi_init_pci(PCIDevice *pci_dev)
  {
  VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
@@ -998,6 +1047,7 @@ static void virtio_pci_register_types(void)
  type_register_static(&virtio_serial_info);
  type_register_static(&virtio_balloon_info);
  type_register_static(&virtio_scsi_info);
+type_register_static(&virtio_rng_info);
  }

  type_init(virtio_pci_register_types)
diff --git a/hw/virtio-rng.c b/hw/virtio-rng.c
new file mode 100644
index 000..e1f3d1c
--- /dev/null
+++ b/hw/virtio-rng.c
@@ -0,0 +1,130 @@
+/* A virtio device for feeding entropy into a guest.
+ *
+ * Copyright 2012 Red Hat, Inc.
+ * Copyright 2012 Amit Shah
+ *
+ * This wo

Re: [Qemu-devel] [PATCH] Avoid segfault in cpu_dump_state

2012-05-16 Thread Fabien Chouteau
On 05/16/2012 10:29 AM, Fabien Chouteau wrote:
> On 05/16/2012 05:50 AM, Andreas Färber wrote:
>> Am 15.05.2012 18:08, schrieb Fabien Chouteau:
>>> On 05/15/2012 03:31 PM, Andreas Färber wrote:
 Am 15.05.2012 11:39, schrieb Fabien Chouteau:
> Do not call cpu_dump_state if logfile is NULL.

 And where is log_cpu_state() being called from? Its caller is passing
 NULL already then.

>>>
>>> No, logfile is a global variable. log_cpu_state() takes only CPUState
>>> and flags parameters.
>>
>> Ah, I see now that f is a different f here, logfile becomes
>> log_cpu_state()'s f. Unfortunate naming.
>>
>> Your fix looks OK then but I would recommend turning it into a static
>> inline function to avoid the line breaks.
>>
> 
> In this case I can rewrite all the macros in qemu-log.h to static inline.
> 

This is more complex than expected...

 1 - GCC rejects inlined variadic functions

 2 - Moving from macro to inline implies use of types defined in cpu.h
 (target_ulong, CPUArchState...), which I cannot include because
 qemu-log.h is used in tools (i.e.  without cpu.h).

Conclusion: unless someone volunteer for a massive restructuring of
qemu-log we have to keep the marcro for log_cpu_state.

-- 
Fabien Chouteau



Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Daniel P. Berrange
On Wed, May 16, 2012 at 08:24:22AM -0500, Anthony Liguori wrote:
> On 05/16/2012 06:30 AM, Amit Shah wrote:
> >The Linux kernel already has a virtio-rng driver, this is the device
> >implementation.
> >
> >When Linux needs more entropy, it puts a buffer in the vq.  We then put
> >entropy into that buffer, and push it back to the guest.
> >
> >Feeding randomness from host's /dev/urandom into the guest is
> >sufficient, so this is a simple implementation that opens /dev/urandom
> >and reads from it whenever required.
> >
> >Invocation is simple:
> >
> >   qemu ... -device virtio-rng-pci
> >
> >In the guest, we see
> >
> >   $ cat /sys/devices/virtual/misc/hw_random/rng_available
> >   virtio
> >
> >   $ cat /sys/devices/virtual/misc/hw_random/rng_current
> >   virtio
> >
> >There are ways to extend the device to be more generic and collect
> >entropy from other sources, but this is simple enough and works for now.
> >
> >Signed-off-by: Amit Shah
> 
> It's not this simple unfortunately.
> 
> If you did this with libvirt, one guest could exhaust the available
> entropy for the remaining guests.  This could be used as a mechanism
> for one guest to attack another (reducing the available entropy for
> key generation).
> 
> You need to rate limit the amount of entropy that a guest can obtain
> to allow management tools to mitigate this attack.

Ultimately I think you need to have a push mechanism, where an external
process feeds entropy to QEMU, rather than a pull mechanism where QEMU
grabs entropy itself.

I tend to think that virtio-rng should have a chardev backend associated
with it. The driver should just read from this chardev to get its entropy.
Either libvirtd, or better yet a separate virt-entropyd daemonm would
connect to each guest & feed the entropy into each guest according to
some desired metrics.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH 2/2] Get system state configuration from QEMU and patcth DSDT with it.

2012-05-16 Thread Gleb Natapov
On Tue, May 15, 2012 at 07:18:10PM -0400, Kevin O'Connor wrote:
> On Tue, May 15, 2012 at 11:06:05AM +0300, Gleb Natapov wrote:
> > On Mon, May 14, 2012 at 09:43:19PM -0400, Kevin O'Connor wrote:
> > > On Mon, May 14, 2012 at 03:35:23PM +0300, Gleb Natapov wrote:
> > > > QEMU may want to disable guest's S3/S4 support and it wants to 
> > > > distinguish
> > > > between regular powerdown and S4 powerdown. To support that new fw_cfg
> > > > option was added that passes supported system states and what value 
> > > > should
> > > > guest use to enter each state. States are passed in 6 byte array. Each
> > > > byte represents one system state. If byte at offset X has its MSB set
> > > > it means that system state X is supported and to enter it guest should
> > > > use the value from lowest 7 bits. Patch also detects old QEMU and uses
> > > > values that work in backwards compatible way there.
> > > 
> > > A couple of comments - see below.
> > > 
> > > [...]
> > > > --- a/src/acpi-dsdt.dsl
> > > > +++ b/src/acpi-dsdt.dsl
> > > > @@ -613,6 +613,7 @@ DefinitionBlock (
> > > >   * S3 (suspend-to-ram), S4 (suspend-to-disk) and S5 (power-off) 
> > > > type codes:
> > > >   * must match piix4 emulation.
> > > >   */
> > > > +ACPI_EXTRACT_NAME_STRING acpi_s3_name
> > > >  Name (\_S3, Package (0x04)
> > > >  {
> > > >  0x01,  /* PM1a_CNT.SLP_TYP */
> > > > @@ -620,10 +621,12 @@ DefinitionBlock (
> > > >  Zero,  /* reserved */
> > > >  Zero   /* reserved */
> > > >  })
> > > > +ACPI_EXTRACT_NAME_STRING acpi_s4_name
> > > > +ACPI_EXTRACT_PKG_START acpi_s4_pkg
> > > 
> > > The DSDT is quite complex and has a diverse usage.  I'd feel more
> > > comfortable leaving it as static and doing any dynamic work in an
> > > SSDT.  In this particular case, can't the objects be turned into
> > > methods which calculate the associated values and return the correct
> > > results?
> > Checked with WindowsXP and Linux and they work if I make _S3 to be a
> > method that returns package, so we can drop ACPI_EXTRACT_PKG_START and
> > do runtime calculation, but what this calculation will be based on? We
> > will have to pass QEMU S4 value to AML somehow and this will involve
> > patching of something eventually.
> 
> As in the other recent discussion, a struct can be built by the BIOS
> and a pointer passed in via a dynamic SSDT (eg, BDAT).  Whatever data
> is needed can then be passed in via that struct.
> 
I saw that, but I don't get why doing it this way instead of defining
the object in AML and patching it? I can define Name(S4VL, 0x2) and path
0x2 to whatever QEMU wants me to use, or I can patch Package directly
like I did.

> >And of course we will still have to
> > patch out _S3/_S4 names in case qemu want to disable them. I do not see
> > how we can disable them in any other way.
> 
> If the mere existence of _S3 tells the OS that S3 is supported, then
> it will have to be patched in.
Seems to be so.

> 
> > I think the use of patching will only increase now after we let that
> > genie out of the bottle, so moving each part that we want to patch in
> > separate SSDT will not scale.
> 
> Why?  Just put the definitions in ssdp_pcihp.dsl instead of
> acpi-dsdt.dsl - there's no real difference.
Fine by me. I verified and Windows/Linux can cope with _Sx definitions
being in SSDT. If we a going to move all the code that needs patching to
this file may we should rename it to something like ssdp_dynamic.dsl?

> 
> > > > +int qemu_cfg_system_states(char *states)
> > > > +{
> > > 
> > > I'd prefer to see any new fw_cfg entries use the QEMU_CFG_FILE_DIR
> > > mechanism so that seabios can use romfile_loadfile (or similar).
> > > 
> > The number of files you can pass over fw_cfg interface is limited due to
> > implementation details. I think we should continue using regular
> > fw_cfg entries for code that is QEMU specific and files for code that is
> > shared with coreboot.
> 
> The QEMU_CFG_FILE_DIR is just a list of "names" and "sizes" for each
> "port".  There's no fundamental limitation to the interface.  If QEMU
> has a limit, we should just fix that.
>
Each time Seabios wants to read a file it need to iterate over all/most
existing files. I can understand advantage of using files for code that
is shared between coreboot and qemu since files is what real HW uses,
but for QEMU internal code it is just overhead for the sake of it.  I do
not have strong fillings about this issue. If you think that files is
the only way forward may be you should communicate this to QEMU and put
a comment in hw/fw_cfg.h explaining that and increasing FW_CFG_FILE_SLOTS
to some ridiculously large value.

--
Gleb.



Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Anthony Liguori

On 05/16/2012 08:45 AM, Daniel P. Berrange wrote:

On Wed, May 16, 2012 at 08:24:22AM -0500, Anthony Liguori wrote:

On 05/16/2012 06:30 AM, Amit Shah wrote:

The Linux kernel already has a virtio-rng driver, this is the device
implementation.

When Linux needs more entropy, it puts a buffer in the vq.  We then put
entropy into that buffer, and push it back to the guest.

Feeding randomness from host's /dev/urandom into the guest is
sufficient, so this is a simple implementation that opens /dev/urandom
and reads from it whenever required.

Invocation is simple:

   qemu ... -device virtio-rng-pci

In the guest, we see

   $ cat /sys/devices/virtual/misc/hw_random/rng_available
   virtio

   $ cat /sys/devices/virtual/misc/hw_random/rng_current
   virtio

There are ways to extend the device to be more generic and collect
entropy from other sources, but this is simple enough and works for now.

Signed-off-by: Amit Shah


It's not this simple unfortunately.

If you did this with libvirt, one guest could exhaust the available
entropy for the remaining guests.  This could be used as a mechanism
for one guest to attack another (reducing the available entropy for
key generation).

You need to rate limit the amount of entropy that a guest can obtain
to allow management tools to mitigate this attack.


Ultimately I think you need to have a push mechanism, where an external
process feeds entropy to QEMU, rather than a pull mechanism where QEMU
grabs entropy itself.


A previous patch didn't open urandom directly but instead talked to an entropy 
daemon.  This approach would allow libvirt to hand out entropy as it saw fit 
without requiring a new driver.


Regards,

Anthony Liguori



I tend to think that virtio-rng should have a chardev backend associated
with it. The driver should just read from this chardev to get its entropy.
Either libvirtd, or better yet a separate virt-entropyd daemonm would
connect to each guest&  feed the entropy into each guest according to
some desired metrics.

Regards,
Daniel





Re: [Qemu-devel] [PATCH 3/6] linux-user: do_msgrcv: don't leak host_mb upon TARGET_EFAULT failure

2012-05-16 Thread Jim Meyering
Peter Maydell wrote:

> On 16 May 2012 14:07, Jim Meyering  wrote:
>> From: Jim Meyering 
>>
>>
>> Signed-off-by: Jim Meyering 
>> ---
>>  linux-user/syscall.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>> index 20d2a74..bdf8ce0 100644
>> --- a/linux-user/syscall.c
>> +++ b/linux-user/syscall.c
>> @@ -2814,6 +2814,7 @@ static inline abi_long do_msgrcv(int msqid, abi_long 
>> msgp,
>>  end:
>>     if (target_mb)
>>         unlock_user_struct(target_mb, msgp, 1);
>> +    free(host_mb);
>>     return ret;
>>  }
>
> This will cause us to free() host_mb twice in the normal-return case.

Good catch.  Thanks.
V2 corrects that.



[Qemu-devel] [PATCHv2 3/6] linux-user: do_msgrcv: don't leak host_mb upon TARGET_EFAULT failure

2012-05-16 Thread Jim Meyering

Also, use g_malloc to avoid NULL-deref upon OOM.

Signed-off-by: Jim Meyering 
---
There are other, similar NULL-deref risks in this file.
TBD separately.

 linux-user/syscall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 20d2a74..9bf0b28 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2794,7 +2794,7 @@ static inline abi_long do_msgrcv(int msqid, abi_long msgp,
 if (!lock_user_struct(VERIFY_WRITE, target_mb, msgp, 0))
 return -TARGET_EFAULT;

-host_mb = malloc(msgsz+sizeof(long));
+host_mb = g_malloc(msgsz+sizeof(long));
 ret = get_errno(msgrcv(msqid, host_mb, msgsz, tswapal(msgtyp), msgflg));

 if (ret > 0) {
@@ -2809,11 +2809,11 @@ static inline abi_long do_msgrcv(int msqid, abi_long 
msgp,
 }

 target_mb->mtype = tswapal(host_mb->mtype);
-free(host_mb);

 end:
 if (target_mb)
 unlock_user_struct(target_mb, msgp, 1);
+g_free(host_mb);
 return ret;
 }

--
1.7.10.2.520.g6a4a482



Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Daniel P. Berrange
On Wed, May 16, 2012 at 08:48:20AM -0500, Anthony Liguori wrote:
> On 05/16/2012 08:45 AM, Daniel P. Berrange wrote:
> >On Wed, May 16, 2012 at 08:24:22AM -0500, Anthony Liguori wrote:
> >>On 05/16/2012 06:30 AM, Amit Shah wrote:
> >>>The Linux kernel already has a virtio-rng driver, this is the device
> >>>implementation.
> >>>
> >>>When Linux needs more entropy, it puts a buffer in the vq.  We then put
> >>>entropy into that buffer, and push it back to the guest.
> >>>
> >>>Feeding randomness from host's /dev/urandom into the guest is
> >>>sufficient, so this is a simple implementation that opens /dev/urandom
> >>>and reads from it whenever required.
> >>>
> >>>Invocation is simple:
> >>>
> >>>   qemu ... -device virtio-rng-pci
> >>>
> >>>In the guest, we see
> >>>
> >>>   $ cat /sys/devices/virtual/misc/hw_random/rng_available
> >>>   virtio
> >>>
> >>>   $ cat /sys/devices/virtual/misc/hw_random/rng_current
> >>>   virtio
> >>>
> >>>There are ways to extend the device to be more generic and collect
> >>>entropy from other sources, but this is simple enough and works for now.
> >>>
> >>>Signed-off-by: Amit Shah
> >>
> >>It's not this simple unfortunately.
> >>
> >>If you did this with libvirt, one guest could exhaust the available
> >>entropy for the remaining guests.  This could be used as a mechanism
> >>for one guest to attack another (reducing the available entropy for
> >>key generation).
> >>
> >>You need to rate limit the amount of entropy that a guest can obtain
> >>to allow management tools to mitigate this attack.
> >
> >Ultimately I think you need to have a push mechanism, where an external
> >process feeds entropy to QEMU, rather than a pull mechanism where QEMU
> >grabs entropy itself.
> 
> A previous patch didn't open urandom directly but instead talked to
> an entropy daemon.  This approach would allow libvirt to hand out
> entropy as it saw fit without requiring a new driver.

The nice thing about just using a plain chardev backend for virtiorng
is that it would let you have the flexibility to integrate with any kind
of entropy daemon, or even just run without a daemon & rely on some other
process to periodically open the chardev & feed in data.

> >I tend to think that virtio-rng should have a chardev backend associated
> >with it. The driver should just read from this chardev to get its entropy.
> >Either libvirtd, or better yet a separate virt-entropyd daemonm would
> >connect to each guest&  feed the entropy into each guest according to
> >some desired metrics.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH 1/1] virtio-rng: device to send host entropy to guest

2012-05-16 Thread Anthony Liguori

On 05/16/2012 08:53 AM, Daniel P. Berrange wrote:

On Wed, May 16, 2012 at 08:48:20AM -0500, Anthony Liguori wrote:

On 05/16/2012 08:45 AM, Daniel P. Berrange wrote:

On Wed, May 16, 2012 at 08:24:22AM -0500, Anthony Liguori wrote:

On 05/16/2012 06:30 AM, Amit Shah wrote:

The Linux kernel already has a virtio-rng driver, this is the device
implementation.

When Linux needs more entropy, it puts a buffer in the vq.  We then put
entropy into that buffer, and push it back to the guest.

Feeding randomness from host's /dev/urandom into the guest is
sufficient, so this is a simple implementation that opens /dev/urandom
and reads from it whenever required.

Invocation is simple:

   qemu ... -device virtio-rng-pci

In the guest, we see

   $ cat /sys/devices/virtual/misc/hw_random/rng_available
   virtio

   $ cat /sys/devices/virtual/misc/hw_random/rng_current
   virtio

There are ways to extend the device to be more generic and collect
entropy from other sources, but this is simple enough and works for now.

Signed-off-by: Amit Shah


It's not this simple unfortunately.

If you did this with libvirt, one guest could exhaust the available
entropy for the remaining guests.  This could be used as a mechanism
for one guest to attack another (reducing the available entropy for
key generation).

You need to rate limit the amount of entropy that a guest can obtain
to allow management tools to mitigate this attack.


Ultimately I think you need to have a push mechanism, where an external
process feeds entropy to QEMU, rather than a pull mechanism where QEMU
grabs entropy itself.


A previous patch didn't open urandom directly but instead talked to
an entropy daemon.  This approach would allow libvirt to hand out
entropy as it saw fit without requiring a new driver.


The nice thing about just using a plain chardev backend for virtiorng
is that it would let you have the flexibility to integrate with any kind
of entropy daemon, or even just run without a daemon&  rely on some other
process to periodically open the chardev&  feed in data.


Ack.

But there is an entropy daemon that does use a protocol.  The protocol may be 
"just read raw random data" but we should at least check to make sure that is 
the protocol.


Regards,

Anthony Liguori




I tend to think that virtio-rng should have a chardev backend associated
with it. The driver should just read from this chardev to get its entropy.
Either libvirtd, or better yet a separate virt-entropyd daemonm would
connect to each guest&   feed the entropy into each guest according to
some desired metrics.


Regards,
Daniel





Re: [Qemu-devel] [Qemu-discuss] Issue of usb emulation with qemu

2012-05-16 Thread Linux Porting
On 04/09/2012 06:42 AM, tangming wrote: 
Recentlly I have an usb emulation issue with qemu. There is one usb device on 
host, and now I need this usb device to be used by several VMs.
I enable the usb emulation with the VMs' qemu parameter "-usbdevice 
host::", with individual qemu instance for each vm.
but seems that only one vm can use this usb normally. Is there any way to share 
the emulated usb device with two or more VMs?
Any suggestions will be appreciated, 
>
>

mike


Hi, Guys Recentlly I have an usb emulation issue with qemu. There is one usb 
device on 
host, and now I need this usb device to be used by several VMs.
I enable the usb emulation with the VMs' qemu parameter "-usbdevice 
host::", with individual qemu instance for each vm.
but seems that only one vm can use this usb normally.  Is there any way to 
share the emulated usb device with two or more VMs? 
Any suggestions will be appreciated, Best Regards, Ming 
//***
hi . qemu users and developers,
please tell me how to emulate usb with guest os (single vm)

i  m using following configuration..


vikas@vikas-ThinkCentre-A70:~$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.04
DISTRIB_CODENAME=natty
DISTRIB_DESCRIPTION="Ubuntu 11.04"

vikas@vikas-ThinkCentre-A70:~$ uname -a
Linux vikas-ThinkCentre-A70 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:50 
UTC 2011 i686 i686 i386 GNU/Linux

QEMU emulator version 1.0,1, Copyright (c) 2003-2008 Fabrice Bellard

i am not able to use with single guest os ..

i have tried all efforts by using (-usb -usbdevice host:xxx:xxx) also 

please  help me

Thanks and Regards
vikas pandey

[Qemu-devel] Fw: [Qemu-discuss] Issue of usb emulation with qemu

2012-05-16 Thread Linux Porting


u





dear sir,

this is vikas pandey,i am using qemu and i am abe to run some of kernel image  
and initrd which is available on qemu site for testing purpose.
now my ami is to connect the host usb and access the content after running  iso 
image or specifing kernel and initrd image on qemu .

for this task i have followed some procedure which is inlined
//

cat /etc/lsb-release
Host pc discription : 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.04
DISTRIB_CODENAME=natty
DISTRIB_DESCRIPTION="Ubuntu 11.04"
==>uname -a
Linux vikas-ThinkCentre-A70 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:50 
UTC 2011 i686 i686 i386 GNU/Linux

QEMU emulator version 1.0,1, Copyright (c) 2003-2008 Fabrice Bellard

fist i have checked with iso images
step 1:
dd of=ubuntuimage bs=1024 seek=10485760 count=0  
step 2:
qemu -hda ubuntuimage -cdrom m1040_wifi.iso -usbdevice host:1e3d:2093 -m 192 
-boot d
usb_create: no bus specified, using "usb.0" for "usb-host"
(it is a problem which is not permitting guest os to access completly even 
media will be detected on console while booting on qemu  and we can add usb in 
qemu but in qemue gues os shelll it is not detecting any where  )
husb: open device 1.6
husb: config #1 need -1
husb: 1 interfaces claimed for configuration 1
husb: grabbed usb device 1.6
husb: config #1 need 1
husb: 1 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 1 interfaces claimed for configuration 1  
=>2nd i have used kernel and initrd image 
step >qemu-system-arm -kernel zImage.integrator -initrd arm_root.img -hda 
ubuntuimage -usb -usbdevice host:1e3d:2093  -m 128
segmantation fault
//
please suggest me as soon as possible ,and please tell me that is it compulsary 
that use kvm and libvirt with qemu for using host usb or not..
because i have also tried alot  with libvirt but every time it is having some 
different -2 error currently error  related with libvirt is ..
==>virsh -c qemu:///system list

virsh: /usr/lib/libvirt.so.0: version `LIBVIRT_0.9.3' not found (required by 
virsh)
virsh: /usr/lib/libvirt.so.0: version `LIBVIRT_0.9.0' not found (required by 
virsh)
virsh: /usr/lib/libvirt.so.0: version `LIBVIRT_PRIVATE_0.9.3' not found 
(required by virsh)
virsh: /usr/lib/libvirt.so.0: version `LIBVIRT_0.9.2' not found (required by 
virsh

plese tell me now what should ido...
thanks& regardsvikas pandey
Thanks and Regards
vikas pandey

Re: [Qemu-devel] [PATCH v3] pci: clean all funcs when hot-removing multifunc device

2012-05-16 Thread Bjorn Helgaas
On Fri, May 11, 2012 at 8:00 AM, Jiang Liu  wrote:
> On 05/11/2012 08:24 AM, Amos Kong wrote:
>> On 05/11/2012 07:54 AM, Amos Kong wrote:
>>> On 05/11/2012 02:55 AM, Michael S. Tsirkin wrote:
 On Fri, May 11, 2012 at 01:09:13AM +0800, Jiang Liu wrote:
> On 05/10/2012 11:44 PM, Amos Kong wrote:
>
>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
>> b/drivers/pci/hotplug/acpiphp_glue.c
>> index 806c44f..a7442d9 100644
>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>> @@ -885,7 +885,7 @@ static void disable_bridges(struct pci_bus *bus)
>>  static int disable_device(struct acpiphp_slot *slot)
>>  {
>>   struct acpiphp_func *func;
>> - struct pci_dev *pdev;
>> + struct pci_dev *pdev, *tmp;
>>   struct pci_bus *bus = slot->bridge->pci_bus;
>>
>>   /* The slot will be enabled when func 0 is added, so check
>> @@ -902,9 +902,10 @@ static int disable_device(struct acpiphp_slot *slot)
>>                   func->bridge = NULL;
>>           }
>>
>> -         pdev = pci_get_slot(slot->bridge->pci_bus,
>> -                             PCI_DEVFN(slot->device, func->function));
>> -         if (pdev) {
>> +         list_for_each_entry_safe(pdev, tmp, &bus->devices, bus_list) {
>> +                 if (PCI_SLOT(pdev->devfn) != slot->device)
>> +                         continue;

I think the concept is good: in enable_device(), we use
pci_scan_slot(), which scans all possible functions in the slot.  So
in disable_device() we should do something symmetric to remove all the
functions.

>> +
> The pci_bus_sem lock should be acquired when walking the bus->devices 
> list.
> Otherwise it may cause invalid memory access if another thread is 
> modifying
> the bus->devices list concurrently.
>>
>> pci_bus_sem lock is only request for writing &bus->devices list, right ?
>> and this protection already exists in pci_destory_dev().
> That's for writer. For reader to walk the pci_bus->devices list, you also need
> to acquire the reader lock by down_read(&pci_bus_sem). Please refer to
> pci_get_slot() for example. This especially import for native OS because there
> may be multiple PCI slots/devices on the bus.

There is a lot of existing code that walks bus->devices without
holding pci_bus_sem, but most of it is boot-time code that is arguably
safe (though I think things like pcibios_fixup_bus() are poorly
designed and don't fit well in the hotplug-enabled world).

In this case, I do think we need to protect against updates while
we're walking bus->devices.  It's probably not trivial because
__pci_remove_bus_device() calls pci_destroy_dev(), where we do the
down_write(), so simply wrapping the whole thing with down_read() will
cause a deadlock.

Kenji-san, Yinghai, do you have any input?

Bjorn



Re: [Qemu-devel] [PATCH 2/2] Get system state configuration from QEMU and patcth DSDT with it.

2012-05-16 Thread Paolo Bonzini
Il 16/05/2012 15:46, Gleb Natapov ha scritto:
> I saw that, but I don't get why doing it this way instead of defining
> the object in AML and patching it? I can define Name(S4VL, 0x2) and path
> 0x2 to whatever QEMU wants me to use, or I can patch Package directly
> like I did.
> 

Can we build an SSDT that includes the contents of fw_cfg (e.g.
FW_CFG_SIGNATURE at offset 0, FW_CFG_UUID at offset 4, FW_CFG_NOGRAPHIC
at offset 16... the entry <-> offset mapping and the defaults would be
part of SeaBIOS), and then read that data from normal DSDT methods?

That would be similar to Gerd's patch, but without letting the OSPM use
the real fw_cfg device.

Paolo



Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Paolo Bonzini
Il 16/05/2012 15:20, Michael S. Tsirkin ha scritto:
>>> Because it's missing the object_unparent done by qdev_unplug.  Does
>>> object_unparent+qdev_free work?  (I believe object_unparent should be
>>> done by qdev_free rather than qdev_unplug, but that's something for 1.2).
>>
>> Cool, this seems to work fine. Thanks.
>>
>> I'll test a bit more and resend a patch with only object_unparent+qdev_free.
> 
> Separately it was suggested to make qdev_free do object_unparent
> automatically. Anthony is yet to respond.

Yes, but for 1.1 this Xen-only patch would be nice.

Paolo




Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Anthony Liguori

On 05/16/2012 06:23 AM, Paolo Bonzini wrote:

Il 16/05/2012 13:15, Anthony PERARD ha scritto:

 qdev_unplug(&(d->qdev), NULL);
+qdev_free(&(d->qdev));
 }
  }


Anthony, can you confirm that this solves the problem for you?

This work until I try to hotplug a new device to the guest at wish
point I have this:
ERROR:/local/home/anthony/work/qemu/qom/object.c:389:object_delete:
assertion failed: (obj->ref == 0)

This is because there is still a pending request of the hotunplug in
the acpi piix4.
If I call qdev_free without qdev_unplug, I hit the same assert, but
rigth away. This is way something new.


Because it's missing the object_unparent done by qdev_unplug.  Does
object_unparent+qdev_free work?  (I believe object_unparent should be
done by qdev_free rather than qdev_unplug, but that's something for 1.2).


qdev_free() is trivially object_delete today.

What we should do is make an object_destroy() which emits a destroy event and 
then decrements the reference count.  When ref == 0, we should emit a delete event.


We could then register a slot to object_unparent in the destroy event handler, 
and then object_new() could register a free handler in the delete event.


Then object_delete()/qdev_free() just become trivial invocations of 
object_unref().

But for 1.1, we definitely should just do an explicit object_unparent().

Regards,

Anthony Liguori



Paolo





Re: [Qemu-devel] [PATCH] Add 'query-events' command to QMP to query async events

2012-05-16 Thread Eric Blake
On 05/16/2012 06:55 AM, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" 
> 
> Sometimes it is neccessary for an application to determine
> whether a particular QMP event is available, so they can
> decide whether to use compatibility code instead. This
> introduces a new 'query-events' command to QMP todo just

s/todo/to do/

> that
> 
>  { "execute": "query-events" }
>  {"return": [{"name": "WAKEUP"},
>  {"name": "SUSPEND"},
>  {"name": "DEVICE_TRAY_MOVED"},
>  {"name": "BLOCK_JOB_CANCELLED"},
>  {"name": "BLOCK_JOB_COMPLETED"},
>  ...snip...
>  {"name": "SHUTDOWN"}]}
> 
> * monitor.c: Split out MonitorEvent -> string conversion
>   into monitor_protocol_event_name() API. Add impl of
>   qmp_query_events monitor command handler
> * qapi-schema.json, qmp-commands.hx: Define contract of
>   query-events command

Definitely useful for libvirt.


> +static const char *monitor_protocol_event_name(MonitorEvent event)
> +{
>  switch (event) {
>  case QEVENT_SHUTDOWN:
> -event_name = "SHUTDOWN";
> +return "SHUTDOWN";
>  break;

These 'break' statements are now unreachable; does this matter in qemu
coding style?


> +query-events
> +--
> +
> +List QMP available events.
> +
> +Each event is represented by a json-object, the returned value is a 
> json-array
> +of all events.
> +
> +Each json-object contain:

s/contain/contains/

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH next v2 73/74] linux-user: Use cpu_reset() after cpu_init() / cpu_copy()

2012-05-16 Thread Igor Mammedov

On 05/16/2012 03:01 PM, Andreas Färber wrote:

Am 16.05.2012 14:36, schrieb Igor Mammedov:

On 05/11/2012 01:26 PM, Andreas Färber wrote:

Am 11.05.2012 13:22, schrieb Peter Maydell:

On 10 May 2012 01:14, Andreas Färber   wrote:

Eliminates cpu_state_reset() usage.

Signed-off-by: Andreas Färber
---
   linux-user/main.c|2 +-
   linux-user/syscall.c |2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 191b750..49108b8 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3405,7 +3405,7 @@ int main(int argc, char **argv, char **envp)
  exit(1);
  }
   #if defined(TARGET_I386) || defined(TARGET_SPARC) ||
defined(TARGET_PPC)
-cpu_state_reset(env);
+cpu_reset(ENV_GET_CPU(env));
   #endif

  thread_env = env;
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 20d2a74..539af3f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4262,7 +4262,7 @@ static int do_fork(CPUArchState *env, unsigned
int flags, abi_ulong newsp,
  /* we create a new CPU instance. */
  new_env = cpu_copy(env);
   #if defined(TARGET_I386) || defined(TARGET_SPARC) ||
defined(TARGET_PPC)
-cpu_state_reset(new_env);
+cpu_reset(ENV_GET_CPU(new_env));
   #endif
  /* Init regs that differ from the parent.  */
  cpu_clone_regs(new_env, newsp);
--


Do you have any plans to try to rationalise the handling of reset
so that we consistently either do or don't reset the cpu here,
rather than having it done based on a TARGET_* ifdef ?


Igor had an RFC for x86; sparc and ppc reset I haven't looked into yet.
Cc'ing Alex and Blue.

I'll rebase RFC for x86 and post patches today and will remove it from here
by the last patch in patchset so that when this patch applied we could
remove
unnecessary call.
So ACK for target-i386 here.


Since back then Peter and I have discussed whether we can rather just
remove the #ifdef here and reset for all targets.

Unfortunately I'm still not clear about some patches that stand in the
way of ObjectClass::realize - if cpu_reset() is moved to realizefn for
all targets then we can just call realize here.


1. I'd like to have cpu_reset in realizefn - which is kind of equivalent
of cpu power-on, after which cpu should be in known state
(i.e. the state after reset, at least for target-i386).

2. cpu_reset in realizefn as well will free us from calling cpu_reset
in device_add when implementing hotplug and will allow to remove it from
hw/pc.c.

3. regarding exec.c:cpu_copy and do_fork on target-i386
 cpu_copy first creates cpu then copies all over it CPUArchState of original
 cpu perserving only cpu_index. It's beyond my understanding why
 anyone would/need do this.
 So I'll not touch cpu_reset in do_fork since I haven't a clue what's
 going on. Albeit adding cpu_reset in realizefn shouldn't break anything here.



Actually, all we'd need is ObjectClass::realize field, so I'm
considering extracting the intersection between Paolo and me, stick a
CPU-specific wrapper method on top as requested by Anthony and then we
can move ahead here...

Andreas



--
-
 Igor



Re: [Qemu-devel] [PATCH next v2 73/74] linux-user: Use cpu_reset() after cpu_init() / cpu_copy()

2012-05-16 Thread Peter Maydell
On 16 May 2012 17:05, Igor Mammedov  wrote:
> 1. I'd like to have cpu_reset in realizefn - which is kind of equivalent
> of cpu power-on, after which cpu should be in known state
> (i.e. the state after reset, at least for target-i386).

Realize is not the same as "has come out of reset". Consider
CPUs where on coming out of reset they read the initial program
counter from RAM. (We don't really handle that properly anyway,
though.)

> 2. cpu_reset in realizefn as well will free us from calling cpu_reset
> in device_add when implementing hotplug and will allow to remove it from
> hw/pc.c.
>
> 3. regarding exec.c:cpu_copy and do_fork on target-i386
>  cpu_copy first creates cpu then copies all over it CPUArchState of original
>  cpu perserving only cpu_index. It's beyond my understanding why
>  anyone would/need do this.
>  So I'll not touch cpu_reset in do_fork since I haven't a clue what's
>  going on. Albeit adding cpu_reset in realizefn shouldn't break anything
> here.

The purpose of cpu_copy is to handle creating a new pthread in
linux-user mode. The new thread is supposed to start out with
a CPU state (as far as user mode is concerned) which is identical
to that of its parent thread. So we create a new CPU which is
a copy of the old one. memcpy() is a cheap hack to do this, which
is creaking at the seams rather these days. If you delve back in
the git history to when it was first put in it looked a bit cleaner...

The fact that do_fork then calls cpu_reset (for some targets) is
a bug -- we need to call cpu_reset after cpu_init and before the
copying of the registers, so in do_fork is too late (it wipes
out the copied registers we've just gone to the effort of setting
up).

-- PMM



Re: [Qemu-devel] [PATCH] Add 'query-events' command to QMP to query async events

2012-05-16 Thread Anthony Liguori

On 05/16/2012 11:04 AM, Eric Blake wrote:

On 05/16/2012 06:55 AM, Daniel P. Berrange wrote:

From: "Daniel P. Berrange"

Sometimes it is neccessary for an application to determine
whether a particular QMP event is available, so they can
decide whether to use compatibility code instead. This
introduces a new 'query-events' command to QMP todo just


s/todo/to do/


that

  { "execute": "query-events" }
  {"return": [{"name": "WAKEUP"},
  {"name": "SUSPEND"},
  {"name": "DEVICE_TRAY_MOVED"},
  {"name": "BLOCK_JOB_CANCELLED"},
  {"name": "BLOCK_JOB_COMPLETED"},
  ...snip...
  {"name": "SHUTDOWN"}]}

* monitor.c: Split out MonitorEvent ->  string conversion
   into monitor_protocol_event_name() API. Add impl of
   qmp_query_events monitor command handler
* qapi-schema.json, qmp-commands.hx: Define contract of
   query-events command


Definitely useful for libvirt.



+static const char *monitor_protocol_event_name(MonitorEvent event)
+{
  switch (event) {
  case QEVENT_SHUTDOWN:
-event_name = "SHUTDOWN";
+return "SHUTDOWN";
  break;


These 'break' statements are now unreachable; does this matter in qemu
coding style?


Seems like we should just take the opportunity to kill off the function entirely 
and replace it with a table:


static const char *monitor_event_names[] = {
 [QEVENT_SHUTDOWN] = "SHUTDOWN",
 ...
};

Regards,

Anthony Liguori



Re: [Qemu-devel] [PATCH] Add 'query-events' command to QMP to query async events

2012-05-16 Thread Daniel P. Berrange
On Wed, May 16, 2012 at 11:18:22AM -0500, Anthony Liguori wrote:
> On 05/16/2012 11:04 AM, Eric Blake wrote:
> >On 05/16/2012 06:55 AM, Daniel P. Berrange wrote:
> >>From: "Daniel P. Berrange"
> >>
> >>Sometimes it is neccessary for an application to determine
> >>whether a particular QMP event is available, so they can
> >>decide whether to use compatibility code instead. This
> >>introduces a new 'query-events' command to QMP todo just
> >
> >s/todo/to do/
> >
> >>that
> >>
> >>  { "execute": "query-events" }
> >>  {"return": [{"name": "WAKEUP"},
> >>  {"name": "SUSPEND"},
> >>  {"name": "DEVICE_TRAY_MOVED"},
> >>  {"name": "BLOCK_JOB_CANCELLED"},
> >>  {"name": "BLOCK_JOB_COMPLETED"},
> >>  ...snip...
> >>  {"name": "SHUTDOWN"}]}
> >>
> >>* monitor.c: Split out MonitorEvent ->  string conversion
> >>   into monitor_protocol_event_name() API. Add impl of
> >>   qmp_query_events monitor command handler
> >>* qapi-schema.json, qmp-commands.hx: Define contract of
> >>   query-events command
> >
> >Definitely useful for libvirt.
> >
> >
> >>+static const char *monitor_protocol_event_name(MonitorEvent event)
> >>+{
> >>  switch (event) {
> >>  case QEVENT_SHUTDOWN:
> >>-event_name = "SHUTDOWN";
> >>+return "SHUTDOWN";
> >>  break;
> >
> >These 'break' statements are now unreachable; does this matter in qemu
> >coding style?
> 
> Seems like we should just take the opportunity to kill off the
> function entirely and replace it with a table:
> 
> static const char *monitor_event_names[] = {
>  [QEVENT_SHUTDOWN] = "SHUTDOWN",
>  ...
> };

That sounds good to me, I'll update the patch accordingly.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH 05/13] pci: New pci_acs_enabled()

2012-05-16 Thread Alex Williamson
On Wed, 2012-05-16 at 09:29 -0400, Don Dutile wrote:
> On 05/15/2012 05:09 PM, Alex Williamson wrote:
> > On Tue, 2012-05-15 at 13:56 -0600, Bjorn Helgaas wrote:
> >> On Mon, May 14, 2012 at 4:49 PM, Alex Williamson
> >>   wrote:
> >>> On Mon, 2012-05-14 at 16:02 -0600, Bjorn Helgaas wrote:
>  On Fri, May 11, 2012 at 4:56 PM, Alex Williamson
>    wrote:
> > In a PCIe environment, transactions aren't always required to
> > reach the root bus before being re-routed.  Peer-to-peer DMA
> > may actually not be seen by the IOMMU in these cases.  For
> > IOMMU groups, we want to provide IOMMU drivers a way to detect
> > these restrictions.  Provided with a PCI device, pci_acs_enabled
> > returns the furthest downstream device with a complete PCI ACS
> > chain.  This information can then be used in grouping to create
> > fully isolated groups.  ACS chain logic extracted from libvirt.
> 
>  The name "pci_acs_enabled()" sounds like it returns a boolean, but it 
>  doesn't.
> >>>
> >>> Right, maybe this should be:
> >>>
> >>> struct pci_dev *pci_find_upstream_acs(struct pci_dev *pdev);
> >>>
> +1; there is a global in the PCI code, pci_acs_enable,
> and a function pci_enable_acs(), which the above name certainly
> confuses.  I recommend  pci_find_top_acs_bridge()
> would be most descriptive.

Yep, the new API I'm working with is:

bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
bool pci_acs_path_enabled(struct pci_dev *start,
  struct pci_dev *end, u16 acs_flags);

>  I'm not sure what "a complete PCI ACS chain" means.
> 
>  The function starts from "dev" and searches *upstream*, so I'm
>  guessing it returns the root of a subtree that must be contained in a
>  group.
> >>>
> >>> Any intermediate switch between an endpoint and the root bus can
> >>> redirect a dma access without iommu translation,
> >>
> >> Is this "redirection" just the normal PCI bridge forwarding that
> >> allows peer-to-peer transactions, i.e., the rule (from P2P bridge
> >> spec, rev 1.2, sec 4.1) that the bridge apertures define address
> >> ranges that are forwarded from primary to secondary interface, and the
> >> inverse ranges are forwarded from secondary to primary?  For example,
> >> here:
> >>
> >> ^
> >> |
> >>++---+
> >>||
> >> +--+-++-++-+
> >> | Downstream || Downstream |
> >> |Port||Port|
> >> |   06:05.0  ||   06:06.0  |
> >> +--+-++--+-+
> >>| |
> >>   +v+   +v+
> >>   | Endpoint|   | Endpoint|
> >>   | 07:00.0 |   | 08:00.0 |
> >>   +-+   +-+
> >>
> >> that rule is all that's needed for a transaction from 07:00.0 to be
> >> forwarded from upstream to the internal switch bus 06, then claimed by
> >> 06:06.0 and forwarded downstream to 08:00.0.  This is plain old PCI,
> >> nothing specific to PCIe.
> >
> > Right, I think the main PCI difference is the point-to-point nature of
> > PCIe vs legacy PCI bus.  On a legacy PCI bus there's no way to prevent
> > devices talking to each other, but on PCIe the transaction makes a
> > U-turn at some point and heads out another downstream port.  ACS allows
> > us to prevent that from happening.
> >
> detail: PCIe up/downstream routing is really done by an internal switch;
>  ACS forces the legacy, PCI base-limit address routing and *forces*
>  the switch to always route the transaction from a downstream port
>  to the upstream port.
> 
> >> I don't understand ACS very well, but it looks like it basically
> >> provides ways to prevent that peer-to-peer forwarding, so transactions
> >> would be sent upstream toward the root (and specifically, the IOMMU)
> >> instead of being directly claimed by 06:06.0.
> >
> > Yep, that's my meager understanding as well.
> >
> +1
> 
> >>> so we're looking for
> >>> the furthest upstream device for which acs is enabled all the way up to
> >>> the root bus.
> >>
> >> Correct me if this is wrong: To force device A's DMAs to be processed
> >> by an IOMMU, ACS must be enabled on the root port and every downstream
> >> port along the path to A.
> >
> > Yes, modulo this comment in libvirt source:
> >
> >  /* if we have no parent, and this is the root bus, ACS doesn't come
> >   * into play since devices on the root bus can't P2P without going
> >   * through the root IOMMU.
> >   */
> >
> Correct. PCIe spec says roots must support ACS. I believe all the
> root bridges that have an IOMMU have ACS wired in/on.

Would you mind looking for the paragraph that says this?  I'd rather
code this into the iommu driver callers than core PCI code if this is
just a platform standard.

> > So we assume that a redirect at the point of the iommu will factor in
> > iommu tra

Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Michael S. Tsirkin
On Wed, May 16, 2012 at 11:02:30AM -0500, Anthony Liguori wrote:
> On 05/16/2012 06:23 AM, Paolo Bonzini wrote:
> >Il 16/05/2012 13:15, Anthony PERARD ha scritto:
>  qdev_unplug(&(d->qdev), NULL);
> +qdev_free(&(d->qdev));
>  }
>   }
> 
> 
> Anthony, can you confirm that this solves the problem for you?
> >>This work until I try to hotplug a new device to the guest at wish
> >>point I have this:
> >>ERROR:/local/home/anthony/work/qemu/qom/object.c:389:object_delete:
> >>assertion failed: (obj->ref == 0)
> >>
> >>This is because there is still a pending request of the hotunplug in
> >>the acpi piix4.
> >>If I call qdev_free without qdev_unplug, I hit the same assert, but
> >>rigth away. This is way something new.
> >
> >Because it's missing the object_unparent done by qdev_unplug.  Does
> >object_unparent+qdev_free work?  (I believe object_unparent should be
> >done by qdev_free rather than qdev_unplug, but that's something for 1.2).
> 
> qdev_free() is trivially object_delete today.
> 
> What we should do is make an object_destroy() which emits a destroy
> event and then decrements the reference count.  When ref == 0, we
> should emit a delete event.
> 
> We could then register a slot to object_unparent in the destroy
> event handler, and then object_new() could register a free handler
> in the delete event.
> 
> Then object_delete()/qdev_free() just become trivial invocations of 
> object_unref().
> 
> But for 1.1, we definitely should just do an explicit object_unparent().
> 
> Regards,
> 
> Anthony Liguori
> 
> >
> >Paolo

Okay. Can you fix the bug Amos reported this way too
to avoid confusion?  Or prefer Amos to do it?

-- 
MST



Re: [Qemu-devel] [PATCH 1/4] Introduce a new hotplug state: Force eject.

2012-05-16 Thread Michael S. Tsirkin
On Wed, May 16, 2012 at 05:52:17PM +0200, Paolo Bonzini wrote:
> Il 16/05/2012 15:20, Michael S. Tsirkin ha scritto:
> >>> Because it's missing the object_unparent done by qdev_unplug.  Does
> >>> object_unparent+qdev_free work?  (I believe object_unparent should be
> >>> done by qdev_free rather than qdev_unplug, but that's something for 1.2).
> >>
> >> Cool, this seems to work fine. Thanks.
> >>
> >> I'll test a bit more and resend a patch with only 
> >> object_unparent+qdev_free.
> > 
> > Separately it was suggested to make qdev_free do object_unparent
> > automatically. Anthony is yet to respond.
> 
> Yes, but for 1.1 this Xen-only patch would be nice.
> 
> Paolo

No sure which this patch is meant, the force eject is not a good idea.
Freeing directly from Xen is fine.

-- 
MST



Re: [Qemu-devel] [PATCH] Add 'query-events' command to QMP to query async events

2012-05-16 Thread Luiz Capitulino
On Wed, 16 May 2012 11:18:22 -0500
Anthony Liguori  wrote:

> On 05/16/2012 11:04 AM, Eric Blake wrote:
> > On 05/16/2012 06:55 AM, Daniel P. Berrange wrote:
> >> From: "Daniel P. Berrange"
> >>
> >> Sometimes it is neccessary for an application to determine
> >> whether a particular QMP event is available, so they can
> >> decide whether to use compatibility code instead. This
> >> introduces a new 'query-events' command to QMP todo just
> >
> > s/todo/to do/
> >
> >> that
> >>
> >>   { "execute": "query-events" }
> >>   {"return": [{"name": "WAKEUP"},
> >>   {"name": "SUSPEND"},
> >>   {"name": "DEVICE_TRAY_MOVED"},
> >>   {"name": "BLOCK_JOB_CANCELLED"},
> >>   {"name": "BLOCK_JOB_COMPLETED"},
> >>   ...snip...
> >>   {"name": "SHUTDOWN"}]}
> >>
> >> * monitor.c: Split out MonitorEvent ->  string conversion
> >>into monitor_protocol_event_name() API. Add impl of
> >>qmp_query_events monitor command handler
> >> * qapi-schema.json, qmp-commands.hx: Define contract of
> >>query-events command
> >
> > Definitely useful for libvirt.
> >
> >
> >> +static const char *monitor_protocol_event_name(MonitorEvent event)
> >> +{
> >>   switch (event) {
> >>   case QEVENT_SHUTDOWN:
> >> -event_name = "SHUTDOWN";
> >> +return "SHUTDOWN";
> >>   break;
> >
> > These 'break' statements are now unreachable; does this matter in qemu
> > coding style?
> 
> Seems like we should just take the opportunity to kill off the function 
> entirely 
> and replace it with a table:
> 
> static const char *monitor_event_names[] = {
>   [QEVENT_SHUTDOWN] = "SHUTDOWN",
>   ...
> };

Yes, that's really better.

Btw, the ideal way of having this would be to add the event type to the
qapi and then add schema introspection.

As I'm not sure I can guarantee that for 1.2, I'm ok with this command.

Anthony, I'm assuming you're ok too.



Re: [Qemu-devel] [PATCH 2/2] Get system state configuration from QEMU and patcth DSDT with it.

2012-05-16 Thread Paolo Bonzini
Il 16/05/2012 18:40, Gleb Natapov ha scritto:
> On Wed, May 16, 2012 at 05:50:31PM +0200, Paolo Bonzini wrote:
>> Il 16/05/2012 15:46, Gleb Natapov ha scritto:
>>> I saw that, but I don't get why doing it this way instead of defining
>>> the object in AML and patching it? I can define Name(S4VL, 0x2) and path
>>> 0x2 to whatever QEMU wants me to use, or I can patch Package directly
>>> like I did.
>>>
>>
>> Can we build an SSDT that includes the contents of fw_cfg (e.g.
>> FW_CFG_SIGNATURE at offset 0, FW_CFG_UUID at offset 4, FW_CFG_NOGRAPHIC
>> at offset 16... the entry <-> offset mapping and the defaults would be
>> part of SeaBIOS), and then read that data from normal DSDT methods?
> 
> Kevin does not want to use offsets any more :) He wants to use files, so
> this will not work for new entries.

Then we can have:
- a table in SeaBIOS with (filenames, expected length) pairs
- an ACPI table with a list of offsets for each file (-1 if file not
found or length < expected length), with the same indices as the
previous table
- and another blob with all the files concatenated

The idea is the same, just pass the fw_cfg data to the DSDT and read it
from there.  As long as Windows and Linux can cope with the more complex
AML, there is no need to do complicated patching IMO...

>>
>> That would be similar to Gerd's patch, but without letting the OSPM use
>> the real fw_cfg device.
>>
> Latest Gerd's patch does not use fw_cfg device.

Yes, I meant the same as his first patch, not really the machanics of
creating the BDAT.

Paolo



Re: [Qemu-devel] [PATCH v10 3/9] Add XBZRLE documentation

2012-05-16 Thread Eric Blake
On 05/16/2012 05:59 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman 
> ---
>  docs/xbzrle.txt |   97 
> +++
>  1 files changed, 97 insertions(+), 0 deletions(-)
>  create mode 100644 docs/xbzrle.txt
> 
> diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
> new file mode 100644
> index 000..aafdb84
> --- /dev/null
> +++ b/docs/xbzrle.txt
> @@ -0,0 +1,97 @@
> +XBZRLE (Xor Based Zero Run Length Encoding)
> +===
> +
> +Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction 
> of VM downtime
> +and the total live-migration time of Virtual machines. It is particularly 
> useful for virtual machines running memory write intensive workloads that are 
> typical of large enterprise applications such as SAP ERP Systems, and 
> generally

Any reason you aren't wrapping at column 80?

> +speaking for any application that uses a sparse memory update pattern.
> +
> +Instead of sending the changed guest memory page this solution will send a 
> compressed version of the updates, thus reducing the amount of data sent 
> during live migration.
> +In order to be able to calculate the update, the previous memory pages 
> needed to be stored. Those pages are stored in a dedicated cache (hash table) 
> and are accessed by their address.
> +The larger the cache size the better the chances are that the page has 
> already been stored in the cache. A Small cache size will result in high 
> cache miss rate.

s/Small/small/

> +Usage
> +==
> +
> +1. Activate xbzrle
> +2. Set the XBZRLE cache size - the cache size is in MBytes and should be a 
> power of 2. The cache default value is 64MBytes.
> +3. start outgoing migration
> +
> +A typical usage scenario:
> +{qemu} migrate_set_parameter xbzrle
> +{qemu} migrate_set_cachesize 256m
> +{qemu} migrate -d tcp:destination.host:
> +{qemu} info migrate
> +...
> +transferred ram-duplicate: A kbytes
> +transferred ram-normal: B kbytes
> +transferred ram-xbrle: C kbytes
> +overflow ram-xbrle: D pages
> +cache-miss ram-xbrle: E pages
> +
> +cache-miss: the number of cache misses to date - high cache-miss rate
> +indicates that the cache size is set too low.
> +overflow: the number of overflows in the decoding which where the delta 
> could not be compressed. This can happen if the changes in the pages are too 
> large
> +or there are many short changes for example change every second byte (half a 
> page).

Can cachesize be modified during an in-progress migration?  Do both
source and destination need to agree on cache size?

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v10 8/9] Add set_cachesize command

2012-05-16 Thread Orit Wasserman
On 05/16/2012 07:45 PM, Eric Blake wrote:
> On 05/16/2012 05:59 AM, Orit Wasserman wrote:
>> Change XBZRLE cache size in bytes (the size should be a power of 2).
>> If XBZRLE cache size is too small there will be many cache miss.
>>
>> Signed-off-by: Benoit Hudzia 
>> Signed-off-by: Petter Svard 
>> Signed-off-by: Aidan Shribman 
>> Signed-off-by: Orit Wasserman 
> 
>>  
>> +
>> +void xbzrle_cache_resize(int64_t order)
>> +{
>> +cache_resize(XBZRLE.cache, pow(2, order));
> 
> '1 << order' is much more efficient than a call to pow().
ok
> 
> 
>> +void qmp_migrate_set_cachesize(int64_t value, Error **errp)
> 
>> +
>> +/* power of 2 */
>> +if (value != 1 && !is_power_of_2(value)) {
>> +error_set(errp, QERR_INVALID_PARAMETER_VALUE, "cache size",
>> +  "needs to be power of 2");
> 
> We already have QERR_PROPERTY_VALUE_NOT_POWER_OF_2, why aren't you using
> that here?
I will update it.
> 
>> +return;
>> +}
>> +
>> +s->xbzrle_cache_size = value;
>> +xbzrle_cache_resize(log2(value));
> 
> log2() is rather expensive, ffs() from  is more efficient at
> converting a single bit into the appropriate order.
ok
> 
> 
>>  ##
>> +# @migrate_set_cachesize
>> +#
>> +# Set XBZRLE cache size
>> +#
>> +# @value: cache size in bytes
>> +#
>> +# Returns: nothing on success
> 
> Document the error for a non-power-of-2 or for overflow.
> 
> Document whether this command is safe for an ongoing migration, or
> whether it must be called in advance of a migration.
sure
> 
>> +#
>> +# Since: 1.1
> 
> 1.2.
> 
> 
>> +static inline bool is_power_of_2(int64_t value)
>> +{
>> +return !(value & (value - 1));
>> +}
> 
> This says '0' is a power of 2, which is not true.  Either fix the logic
> to exclude 0, or fix the function name to state that you are really
> checking that at most one bit is set.
> 
> Also, if value is 0x8000, you are triggering unspecified
> behavior per C99.  Is it worth using uint64_t for defined behavior, or
> do you need to take precautions regarding negative values?
The input is int64 so I prefer to keep it this way.
The calling function does the check for 0 , negative numbers and overflow
but I can add those checks here too.

> 
> 
>> +SQMP
>> +migrate_set_cachesize
>> +-
>> +
>> +Set cache size to be used by XBZRLE migration
>> +
>> +Arguments:
>> +
>> +- "value": cache size in bytes (json-int)
> 
> Would it be any easier to take 'order' (log2 of the size) instead of the
> actual cache size?  That is, instead of calling "value":1048576, I would
> rather type "value":20.
Well the user is considering how much memory is going to be used and I though 
that it
is simpler to use 1G than 30.
But I guess the user is libvirt so it can be changed to order.

> 
>> +
>> +Example:
>> +
>> +-> { "execute": "migrate_set_cachesize", "arguments": { "value": 512 } }
> 
> Isn't 512 bytes rather small?  And given my argument about taking order
> rather than bytes as being easier to use, don't you really mean 512
> megabytes (order 29) rather than 512 bytes (order 9)?
> 
correct 512M not bytes ...

Orit




  1   2   >