date:20130131

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Peter Lieven

Am 31.01.2013 um 08:47 schrieb Orit Wasserman :

> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>> Hi,
>> 
>> I just came across an idea and would like to have feedback if it makes sence 
>> or not.
>> 
>> If a VM is started without preallocated memory all memory that has not been 
>> written to
>> reads as zeros, right?
> Hi,
> No the memory will be unmapped (we allocate on demand).

Yes, but those unmapped pages will read as zeroes if the guest accesses it?

>> If a VM with a lot of unwritten memory is migrated or if the memory contains 
>> a lot
>> of zeroed out memory (e.g. Windows or Linux guest with page sanitization) 
>> all this memory
>> is allocated on the target during live migration. Especially with KSM this 
>> leads
>> to the problem that this memory is allocated and might be not available 
>> completely as
>> merging of the pages will happen async.
>> 
>> Wouldn't it make sense to not send zero pages in the first round where the 
>> complete
>> ram is sent (if it is detectable that we are in this stage)?
> We send one byte per zero page at the moment (see is_dup_page) we can further 
> optimizing it
> by not sending it.
> I have to point out that this is a very idle guest and we need to work on a 
> loaded guest 
> which is the more hard problem in migration.

I was not talking about saving one byte (+ 8 bytes for header), my concern was 
that we memset all (dup) pages
including the special case of a zero dup page on the migration target. This 
allocates the memory or does it not?

If my above assumption that the guest reads unmapped memory as zeroes is right, 
this mapping
is not necessary in the case of a zero dup page.

We just have to make sure that we are still in the very first round when 
deciding not to sent
a zero page, because otherwise it could be a page that has become zero during 
migration and
this of course has to be transferred.

> 
> Also I notice that the bottle neck in migrating unmapped pages is the 
> detection of those pages
> because we map the pages in order to check them, for a large guest this is 
> very expensive as mapping a page
> results in a page fault in the host.
> So what will be very helpful is actually locating those pages without mapping 
> them
> which looks very complicated.

This would be a nice improvement, but as you said a guest will sooner or later 
allocate
all memory if it is not totally idle. However, bigger parts of this memory 
might have been reset to zeroes.
This happens on page deallocation in a Windows Guest by default and can also be 
enforced in LInux
with page sanitization.

Peter

> 
> Regards,
> Orit
>> 
>> Peter
>> 
>> 
>

Re: [Qemu-devel] [PATCH WIP 0/4] vhost-scsi: new device supporting the tcm_vhost Linux kernel module

2013-01-31 Thread Nicholas A. Bellinger

Hi Paolo,

On Wed, 2013-01-30 at 17:41 +0100, Paolo Bonzini wrote:
> Ok, so here is my attempt at a vhost-scsi device.  I'm creating an
> entirely separate device, with the common parts of virtio-scsi and
> vhost-scsi (actually little more than the initialization) grouped into
> a VirtIOSCSICommon type.  The device is used simply like "-device
> vhost-scsi-pci,wwpn=WWPN", with all configuration done in configfs
> beforehand.
> 

Cool.  :)

> As expected, using a separate device finds a snag: vhost-scsi is passing
> force=false to vhost_dev_init, and the BIOS does not use MSI-X so it
> will actually use the non-vhost implementation which is wrong.  I fixed
> this by passing force=true; I'm not sure what that would break, but I
> figured "not much" since the BIOS polls and does not rely on interrupts.
> 
> That makes vhost start, but it still doesn't work for me with a 3.7.2
> kernel on the host.  Even Nick's patches hang the guest as soon as vhost
> starts, and I get the same behavior with mine.

After bisection this evening, the change that ended up breaking
vhost-scsi is vhost backend guest notifier masking support patch in
commit f56a12475.

After adding the two new notifiers in vhost-scsi/virtio-scsi following
in qemu.git/virtio-scsi-nab code, the tcm_vhost LUN scan is functioning
again..

>   (Of course with my
> patches the BIOS hangs and you never reach Linux; but try a BIOS without
> virtio-scsi support, and you'll see Linux hanging in the same way).
> 
> Here is my configuration:
> 
>   cd /sys/kernel/config/target
>   mkdir -p core/fileio_0/fileio
>   echo 'fd_dev_name=/home/pbonzini/test.img,fd_dev_size=5905580032' > 
> core/fileio_0/fileio/control 
>   echo 1 > core/fileio_0/fileio/enable
>   mkdir -p vhost/naa.600140554cf3a18e/tpgt_0/lun/lun_0
>   cd vhost/naa.600140554cf3a18e/tpgt_0
>   ln -sf ../../../../../core/fileio_0/fileio/ lun/lun_0/virtual_scsi_port
>   echo naa.60014053226f0388 > nexus
> 
> Nick's patches are run with "-vhost-scsi 
> id=vs,tpgt=0,wwpn=naa.600140554cf3a18e
> -device virtio-scsi-pci,vhost-scsi=vs".  Perhaps I'm doing something wrong.

So after adding the same vhost backend guest notifiers to the new
VirtIOSCSICommon vhost-scsi code, I'm now hitting an QEMU invalid
option:

./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 4 -m 2048 -serial
file:/tmp/vhost-serial.txt
-hda /usr/src/qemu-vhost.git/debian_squeeze_amd64_standard-old.qcow2
-vhost-scsi id=vs,tpgt=0,wwpn=naa.600140579ad21088 -device
virtio-scsi-pci,vhost-scsi=vs
qemu-system-x86_64: -vhost-scsi: invalid option

Debugging this now..

> 
> Another small bug I found is an ordering problem between
> VHOST_SET_VRING_KICK and VHOST_SCSI_SET_ENDPOINT.  Starting the vq
> causes a "vhost_scsi_handle_vq endpoint not set" error in dmesg.
> Because of this I added the first two patches, which let me do
> VHOST_SCSI_SET_ENDPOINT before VHOST_SET_VRING_KICK but after setting
> up the vring.
> 
> Unfortunately, this is not enough to fix the hang.  And anyway, it's
> probably simpler to avoid the two patches and remove this test from the
> tcm_vhost.c vhost_scsi_set/clear_endpoint functions:
> 
> mutex_lock(&vs->dev.mutex);
> /* Verify that ring has been setup correctly. */
> for (index = 0; index < vs->dev.nvqs; ++index) {
> /* Verify that ring has been setup correctly. */
> if (!vhost_vq_access_ok(&vs->vqs[index])) {
> mutex_unlock(&vs->dev.mutex);
> return -EFAULT;
> }
> }
> mutex_unlock(&vs->dev.mutex);
> 
> This way, VHOST_SCSI_SET_ENDPOINT can simply be invoked right after
> vhost_dev_init, and likewise VHOST_SCSI_CLEAR_ENDPOINT in vhost_scsi_exit.
> 

, Ok, I'll generate a patch for this soon.

> I placed both sets of patches on two branches (vhost-scsi-nab and
> vhost-scsi) of my github repo at git://github.com/bonzini/qemu.git.
> One thing I haven't done due to lack of time is applying Nick's patches
> to a tree from last September.  If it works, we can bisect.  But this is
> pretty much all the time I can devote to vhost-scsi.  Nick/Asias, if
> you want to pick it up please do.
> 

Thanks Paolo!

--nab

> Paolo
> 
> Paolo Bonzini (4):
>   vhost: do VHOST_SET_VRING_KICK after setting up all vrings
>   vhost: add set_vhost_endpoint and clear_vhost_endpoint callbacks
>   virtio-scsi: create VirtIOSCSICommon
>   vhost-scsi: new device supporting the tcm_vhost Linux kernel module
> 
>  hw/Makefile.objs |   5 +-
>  hw/s390-virtio-bus.c |  35 +
>  hw/vhost-scsi.c  | 188 
>  hw/vhost-scsi.h  |  62 
>  hw/vhost.c   |  75 ---
>  hw/virtio-pci.c  |  59 +++
>  hw/virtio-scsi.c | 199 
> +--
>  hw/virtio-scsi.h | 129 +
>  hw/virtio.h  |   2 +
>  include/qemu/osdep.h |   4 ++
>  1

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Orit Wasserman

On 01/31/2013 10:10 AM, Peter Lieven wrote:
> 
> Am 31.01.2013 um 08:47 schrieb Orit Wasserman :
> 
>> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>>> Hi,
>>>
>>> I just came across an idea and would like to have feedback if it makes 
>>> sence or not.
>>>
>>> If a VM is started without preallocated memory all memory that has not been 
>>> written to
>>> reads as zeros, right?
>> Hi,
>> No the memory will be unmapped (we allocate on demand).
> 
> Yes, but those unmapped pages will read as zeroes if the guest accesses it?
yes.
> 
>>> If a VM with a lot of unwritten memory is migrated or if the memory 
>>> contains a lot
>>> of zeroed out memory (e.g. Windows or Linux guest with page sanitization) 
>>> all this memory
>>> is allocated on the target during live migration. Especially with KSM this 
>>> leads
>>> to the problem that this memory is allocated and might be not available 
>>> completely as
>>> merging of the pages will happen async.
>>>
>>> Wouldn't it make sense to not send zero pages in the first round where the 
>>> complete
>>> ram is sent (if it is detectable that we are in this stage)?
>> We send one byte per zero page at the moment (see is_dup_page) we can 
>> further optimizing it
>> by not sending it.
>> I have to point out that this is a very idle guest and we need to work on a 
>> loaded guest 
>> which is the more hard problem in migration.
> 
> I was not talking about saving one byte (+ 8 bytes for header), my concern 
> was that we memset all (dup) pages
> including the special case of a zero dup page on the migration target. This 
> allocates the memory or does it not?
> 

> If my above assumption that the guest reads unmapped memory as zeroes is 
> right, this mapping
> is not necessary in the case of a zero dup page.
> 
> We just have to make sure that we are still in the very first round when 
> deciding not to sent
> a zero page, because otherwise it could be a page that has become zero during 
> migration and
> this of course has to be transferred.

OK, so if we won't send the pages than it won't be allocate in the dst and it 
can improve both 
memory usage and reduce cpu consumption on it.
That can be good for over commit scenario.
> 
>>
>> Also I notice that the bottle neck in migrating unmapped pages is the 
>> detection of those pages
>> because we map the pages in order to check them, for a large guest this is 
>> very expensive as mapping a page
>> results in a page fault in the host.
>> So what will be very helpful is actually locating those pages without 
>> mapping them
>> which looks very complicated.
> 
> This would be a nice improvement, but as you said a guest will sooner or 
> later allocate
> all memory if it is not totally idle. However, bigger parts of this memory 
> might have been reset to zeroes.
> This happens on page deallocation in a Windows Guest by default and can also 
> be enforced in LInux
> with page sanitization.

true, but it those cases we will want to zero the page in the dst as this is 
done for security reasons

Orit
> 
> Peter
> 
>>
>> Regards,
>> Orit
>>>
>>> Peter
>>>
>>>
>>
>

Re: [Qemu-devel] [PATCH for-1.4 0/2] target-s390x: CPU cleanups preparing for 1.5

2013-01-31 Thread Cornelia Huck

On Wed, 30 Jan 2013 23:48:23 +0100
Andreas Färber  wrote:

> Hi Alex,
> 
> Here's a cleanup of all of cpu_inject_*(), as requested by Cornelia, plus
> another API preparation for my CPUState part 8 series, to go along with my
> debug output bug fixes.
> 
> As a reminder here's a link to one of my original discussions of the new 
> types:
> https://lists.nongnu.org/archive/html/qemu-devel/2012-05/msg01286.html
> 
> That is, for any non-TCG functions (TCG does not support CPUState yet) an
> S390CPU argument should be preferred over CPUS390XState since it allows cheap
> access to its own fields, CPUState's via CPU() and to CPUS390XState via ->env.
> Doing this consistently avoids costs of casting back and forth unnecessarily.
> 
> s390 code should use s390_env_get_cpu() where needed, not ENV_GET_CPU().
> 
> As a rule of thumb, any field in include/exec/cpu-defs.h:CPU_COMMON can be
> expected to end up in CPUState (or accessible from there) sooner or later.
> Per-target functions can be expected to change to CPUState soon.
> 
> New fields that do not need to be accessed via TCGv or from a hot TCG helper
> function should be added to S390CPU, not to CPUS390XState.
> 
> Regards,
> Andreas
> 
> Cc: Alexander Graf 
> Cc: Cornelia Huck 
> Cc: Christian Borntraeger 
> 
> Andreas Färber (2):
>   target-s390x: Clean up cpu_inject_*() signatures
>   target-s390x: Pass S390CPU to s390_{add,del}_running_cpu()
> 
>  hw/s390x/ipl.c |6 --
>  hw/s390x/s390-virtio-bus.c |4 +---
>  hw/s390x/s390-virtio.c |8 ++--
>  target-s390x/cpu.c |2 +-
>  target-s390x/cpu.h |   23 ++-
>  target-s390x/helper.c  |   11 +++
>  target-s390x/interrupt.c   |2 +-
>  target-s390x/kvm.c |   13 +
>  8 Dateien geändert, 39 Zeilen hinzugefügt(+), 30 Zeilen entfernt(-)
> 

Looks sane and works for me.

Acked-by: Cornelia Huck

[Qemu-devel] [PATCH V17 01/10] build: add command check-clean

2013-01-31 Thread Wenchao Xia

  This command will package the clean operations in tests,
to make it easy to be extended. Now root Makefile simply calls
the command and do not care the details of it any more.

Signed-off-by: Wenchao Xia 
---
 Makefile   |1 -
 tests/Makefile |8 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 0d9099a..b2d7798 100644
--- a/Makefile
+++ b/Makefile
@@ -225,7 +225,6 @@ clean:
rm -f $(foreach f,$(GENERATED_SOURCES),$(f) $(f)-timestamp)
rm -rf qapi-generated
rm -rf qga/qapi-generated
-   $(MAKE) -C tests/tcg clean
for d in $(ALL_SUBDIRS); do \
if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \
rm -f $$d/qemu-options.def; \
diff --git a/tests/Makefile b/tests/Makefile
index c681ceb..884eee5 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -144,6 +144,7 @@ check-help:
@echo " make check-unit   Run qobject tests"
@echo " make check-block  Run block tests"
@echo " make check-report.htmlGenerates an HTML test report"
+   @echo " make check-clean  Clean the tests"
@echo
@echo "Please note that HTML reports do not regenerate if the unit 
tests"
@echo "has not changed."
@@ -203,10 +204,15 @@ check-tests/qemu-iotests-quick.sh: 
tests/qemu-iotests-quick.sh qemu-img$(EXESUF)
 
 # Consolidated targets
 
-.PHONY: check-qtest check-unit check
+.PHONY: check-qtest check-unit check check-clean
 check-qtest: $(patsubst %,check-qtest-%, $(QTEST_TARGETS))
 check-unit: $(patsubst %,check-%, $(check-unit-y))
 check-block: $(patsubst %,check-%, $(check-block-y))
 check: check-unit check-qtest
 
+check-clean:
+   $(MAKE) -C tests/tcg clean
+   rm -rf $(check-unit-y) $(check-qtest-i386-y) $(check-qtest-x86_64-y) 
$(check-qtest-sparc64-y) $(check-qtest-sparc-y) tests/*.o
+
+clean: check-clean
 -include $(wildcard tests/*.d)
-- 
1.7.1

[Qemu-devel] [PATCH V17 10/10] libqblock: test: libqblock test example

2013-01-31 Thread Wenchao Xia

  In this example, first it will create some qcow2 images, then try get
information including backing file relationship, then it will do sync IO on
the image.

Signed-off-by: Wenchao Xia 
---
 tests/Makefile|2 +
 tests/check-libqblock-qcow2.c |  392 +
 2 files changed, 394 insertions(+), 0 deletions(-)
 create mode 100644 tests/check-libqblock-qcow2.c

diff --git a/tests/Makefile b/tests/Makefile
index a6a421f..ff9f423 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -136,6 +136,8 @@ $(check-qtest-y): $(qtest-obj-y)
 
 #libqblock build rules
 
+check-libqblock-$(CONFIG_LIBQBLOCK) = tests/check-libqblock-qcow2$(EXESUF)
+
 $(check-libqblock-y): QEMU_INCLUDES += -I$(SRC_PATH)/tests 
-I$(SRC_PATH)/libqblock
 
 $(check-libqblock-y): %$(EXESUF): %.o libqblock.la
diff --git a/tests/check-libqblock-qcow2.c b/tests/check-libqblock-qcow2.c
new file mode 100644
index 000..7f92e73
--- /dev/null
+++ b/tests/check-libqblock-qcow2.c
@@ -0,0 +1,392 @@
+/*
+ * QEMU block layer library test
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Wenchao Xia   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ * Limitation:
+ *1 filename do not support relative path, to save trouble in creating
+ * backing files.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+#include "libqblock.h"
+#include "libqtest.h"
+
+#define LIBQB_TEST_ENV_DIR "LIBQBLOCK_TEST_DIR"
+#define LIBQB_TEST_DEFAULT_DIR "/tmp/libqblock_test"
+#define LIBQB_TEST_DEFAULT_FILENAME "libqblock_qcow2_test_img"
+
+typedef struct LibqbTestSettings {
+const char *image_filename;
+uint64_t image_size;
+unsigned int num_backings;
+unsigned int io_buf_size;
+uint64_t io_offset;
+int print_flag;
+} LibqbTestSettings;
+
+LibqbTestSettings libqb_test_settings;
+
+static void print_loc(const QBlockLocationInfo *loc)
+{
+if (loc == NULL) {
+printf("loc is NULL.");
+return;
+}
+switch (loc->prot_type) {
+case QB_PROTO_NONE:
+printf("protocol type [none].");
+break;
+case QB_PROTO_FILE:
+printf("protocol type [file], filename [%s].",
+   loc->o_file.filename);
+break;
+default:
+printf("protocol type not supported.");
+break;
+}
+}
+
+static void print_info_image_static(QBlockStaticInfo *info)
+{
+const uint64_t *virt_size = qb_get_virt_size(info);
+const QBlockLocationInfo *backing_loc = qb_get_backing_loc(info);
+g_assert(virt_size != NULL);
+
+printf("===image location:\n");
+print_loc(&info->loc);
+printf("\nvirtual_size %" PRId64 ", format type %d [%s]",
+   *(virt_size),
+   info->fmt.fmt_type, qb_formattype2str(info->fmt.fmt_type));
+printf("\nbacking image location:\n");
+print_loc(backing_loc);
+printf("\n");
+}
+
+static char *generate_backing_filename(const char *filename, int index)
+{
+char *backing_filename = NULL;
+
+backing_filename = g_strdup_printf("%s_backing_%d", filename, index);
+return backing_filename;
+}
+
+/* get filename in a full path */
+static const char *get_filename(const char *path)
+{
+const char *filename;
+filename = strrchr(path, '/');
+if (filename == NULL) {
+filename = path;
+} else {
+filename++;
+}
+return filename;
+}
+
+/* create a chain of files, num_backings must >= 0. */
+static void files_create_qcow2(const char *filename,
+   int num_backings,
+   uint64_t virt_size)
+{
+QBlockContext *context = NULL;
+QBlockImage *qbi = NULL;
+QBlockLocationInfo *loc_info = NULL;
+QBlockFormatInfo *fmt_info = NULL;
+int ret;
+int index;
+int flag;
+char *backing_filename = NULL, *new_filename = NULL;
+const char *relative_filename = NULL;
+
+ret = qb_context_new(&context);
+g_assert(ret == 0);
+
+ret = qb_image_new(context, &qbi);
+g_assert(ret == 0);
+
+ret = qb_location_info_new(context, &loc_info);
+g_assert(ret == 0);
+
+ret = qb_format_info_new(context, &fmt_info);
+g_assert(ret == 0);
+
+loc_info->prot_type = QB_PROTO_FILE;
+fmt_info->fmt_type = QB_FORMAT_QCOW2;
+fmt_info->virt_size = virt_size;
+flag = 0;
+
+index = 0;
+while (index < num_backings) {
+new_filename = generate_backing_filename(filename, index);
+loc_info->o_file.filename = new_filename;
+if (backing_filename != NULL) {
+fmt_info->o_qcow2.backing_loc.prot_type = QB_PROTO_FILE;
+relative_filename = get_filename(backing_filename);
+fmt_info->o_qcow2.backing_loc.o_file.filename =
+ relative_filename;
+}
+ret = qb_create(context, qbi, loc_inf

[Qemu-devel] [PATCH V17 07/10] libqblock: libqblock API design and type defines

2013-01-31 Thread Wenchao Xia

  Public API design header files: libqblock.h, libqblock-error.h.
Public type define header files: libqblock-types.h. Private internal used
header files: libqblock-internal. For ABI some reserved bytes are used in
structure defines. Macro QEMU_DLL_PUBLIC is used to mark exported function.

Important APIs:
  1 QBlockContext. This structure was used to retrieve errors, every thread
must create one first.
  2 QBlockImage. It stands for an block image object.
  3 QBlockStaticInfo. It contains static information such as location, backing
file, size.
  4 Sync I/O. It is similar to C file open, read, write and close operations.

Signed-off-by: Wenchao Xia 
---
 libqblock/libqblock-error.h|   50 ++
 libqblock/libqblock-internal.h |   68 
 libqblock/libqblock-types.h|  245 +++
 libqblock/libqblock.h  |  358 
 4 files changed, 721 insertions(+), 0 deletions(-)
 create mode 100644 libqblock/libqblock-internal.h

diff --git a/libqblock/libqblock-error.h b/libqblock/libqblock-error.h
index e69de29..1907ecf 100644
--- a/libqblock/libqblock-error.h
+++ b/libqblock/libqblock-error.h
@@ -0,0 +1,50 @@
+/*
+ * QEMU block layer library
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Wenchao Xia   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef LIBQBLOCK_ERROR
+#define LIBQBLOCK_ERROR
+
+#include "libqblock-types.h"
+
+#define QB_ERR_INTERNAL_ERR (-1)
+#define QB_ERR_FATAL_ERR (-2)
+#define QB_ERR_INVALID_PARAM (-100)
+#define QB_ERR_BLOCK_OUT_OF_RANGE (-101)
+
+/* error handling */
+/**
+ * qb_error_get_human_str: get human readable error string.
+ *
+ * return a human readable string, it would be truncated if buf is not big
+ *  enough.
+ *
+ * @context: operation context, must be valid.
+ * @buf: buf to receive the string.
+ * @buf_size: the size of the string buf.
+ */
+QEMU_DLL_PUBLIC
+void qb_error_get_human_str(QBlockContext *context,
+char *buf, size_t buf_size);
+
+/**
+ * qb_error_get_errno: get error number, only valid when err_ret is
+ *   QB_ERR_INTERNAL_ERR.
+ *
+ * return negative errno if last error is QB_ERR_INTERNAL_ERR, otherwise 0.
+ *
+ * @context: operation context.
+ */
+QEMU_DLL_PUBLIC
+int qb_error_get_errno(QBlockContext *context);
+
+#endif
diff --git a/libqblock/libqblock-internal.h b/libqblock/libqblock-internal.h
new file mode 100644
index 000..91521f5
--- /dev/null
+++ b/libqblock/libqblock-internal.h
@@ -0,0 +1,68 @@
+/*
+ * QEMU block layer library
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Wenchao Xia   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef LIBQBLOCK_INTERNAL
+#define LIBQBLOCK_INTERNAL
+
+#include 
+
+#include "block/block.h"
+#include "libqblock-types.h"
+
+/* this file contains defines and types used inside the library. */
+
+#define QB_FREE(p) { \
+g_free(p); \
+(p) = NULL; \
+}
+
+/* details should be hidden to user */
+struct QBlockImage {
+BlockDriverState *bdrvs;
+/* internal used file name now, if it is not NULL, it means
+   image was opened.
+*/
+char *filename;
+int ref_count;
+};
+
+struct QBlockContext {
+/* last error */
+GError *g_error;
+int err_ret; /* 1st level of error, the libqblock error number */
+int err_no; /* 2nd level of error, errno what below reports */
+};
+
+/**
+ * QBlockStaticInfoAddr: a structure contains a set of pointer.
+ *
+ *this struct contains a set of pointer pointing to some
+ *  property related to format or protocol. If a property is not available,
+ *  it will be set as NULL. User could use this to get properties directly.
+ *
+ *  @backing_loc: backing file location.
+ *  @encrypt: encryption flag.
+*/
+
+typedef struct QBlockStaticInfoAddr {
+QBlockLocationInfo *backing_loc;
+bool *encrypt;
+} QBlockStaticInfoAddr;
+
+#define G_LIBQBLOCK_ERROR g_libqblock_error_quark()
+
+static inline GQuark g_libqblock_error_quark(void)
+{
+return g_quark_from_static_string("g-libqblock-error-quark");
+}
+#endif
diff --git a/libqblock/libqblock-types.h b/libqblock/libqblock-types.h
index e69de29..d49b321 100644
--- a/libqblock/libqblock-types.h
+++ b/libqblock/libqblock-types.h
@@ -0,0 +1,245 @@
+/*
+ * QEMU block layer library
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Wenchao Xia   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef LIBQBLOCK_TYPES_H
+#define LIBQBLOCK_TYPES_H
+
+#include 
+#include 
+#include 
+
+/*
+ * In case libqblock is used by other project which have not defined
+ * QEMU_DLL_PUBLIC.
+ */
+#ifndef QEMU_DLL_PUBLIC
+#define QEMU_DLL_PUBLIC
+#endif
+
+/* this library is desig

Re: [Qemu-devel] [PATCH V5 02/13] block: add bdrv_get_filename() function

2013-01-31 Thread Wenchao Xia


于 2013-1-29 20:53, Kevin Wolf 写道:

Am 24.01.2013 03:57, schrieb Wenchao Xia:

   This function will simply return the uri or filename used
to open the image.

Reviewed-by: Eric Blake 
Signed-off-by: Wenchao Xia 


The only user of this function in the series is in block.c and could
have called it without making it public.

Kevin



OK.

--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH 0/4] Fix misuse of atomics in trace/simple.c

2013-01-31 Thread Markus Armbruster

Ping?

Markus Armbruster  writes:

> More old news: casting pointers considered harmful, and atomics
> considered hard to use correctly.
>
> Markus Armbruster (4):
>   trace: Fix simple trace dropped event record for big endian
>   trace: Direct access of atomics is verboten, use the API
>   trace: Clean up the "try to update atomic until it worked" loops
>   trace: Fix location of simpletrace.py in docs
>
>  docs/tracing.txt |  4 ++--
>  trace/simple.c   | 39 ---
>  2 files changed, 18 insertions(+), 25 deletions(-)

Re: [Qemu-devel] [PATCH V5 03/13] block: add bdrv_can_read_snapshot() function

2013-01-31 Thread Wenchao Xia


于 2013-1-29 20:37, Kevin Wolf 写道:

Am 25.01.2013 19:11, schrieb Eric Blake:

On 01/23/2013 07:57 PM, Wenchao Xia wrote:

   Compared to bdrv_can_snapshot(), this function return whether
bs* is ready to read snapshot info from instead of write. If yes,
caller can then query snapshot information, but taking snapshot
is not always possible for that *bs may be read only.

Signed-off-by: Wenchao Xia 
---
  block.c   |   19 +++
  include/block/block.h |1 +
  2 files changed, 20 insertions(+), 0 deletions(-)





+/* return whether internal snapshot can be read on @bs */
+int bdrv_can_read_snapshot(BlockDriverState *bs)
+{



+/* return whether internal snapshot can be write on @bs */
  int bdrv_can_snapshot(BlockDriverState *bs)


I see you just copied existing code; but any reason why these functions
return int instead of bool?  Would that be worth a separate cleanup patch?


More importantly, you shouldn't copy code. Make both of them small
wrappers around a static helper functions that contains the existing code.

Kevin


  I tried that before, but found that hard to wrapper around a common
function, because both need to recusively check some condition.



--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH V5 04/13] block: add snapshot info query function bdrv_query_snapshot_infolist()

2013-01-31 Thread Wenchao Xia


于 2013-1-29 20:48, Kevin Wolf 写道:

Am 24.01.2013 03:57, schrieb Wenchao Xia:

   This patch add function bdrv_query_snapshot_infolist(), which will
return snapshot info of an image in qmp object format. The implementation
code are mostly copied from qemu-img.c with modification to fit more
for qmp based block layer API.
   To help filter out snapshot info not needed, a call back function is
added in bdrv_query_snapshot_infolist().
   bdrv_can_read_snapshot() should be called before call this function,
to avoid got *errp set unexpectedly.

Signed-off-by: Wenchao Xia 
Reviewed-by: Eric Blake 
---
  block.c   |   60 +
  include/block/block.h |7 +
  2 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 934bb3f..7cdb6c6 100644
--- a/block.c
+++ b/block.c
@@ -2842,6 +2842,66 @@ int coroutine_fn 
bdrv_co_is_allocated_above(BlockDriverState *top,
  return 0;
  }

+SnapshotInfoList *bdrv_query_snapshot_infolist(BlockDriverState *bs,
+   SnapshotFilterFunc filter,


This filter things looks overengineered to me, and no patch in this
series really uses it. Do we really need it?



  It is used later, which filter out the internal ones not exist
in every block device. I think this provide a clearer way, than
original code in savevm.c which use two "for" to get the correct
snapshots.


+   void *opaque,
+   Error **errp)
+{
+int i, sn_count;
+QEMUSnapshotInfo *sn_tab = NULL;
+SnapshotInfoList *info_list, *cur_item = NULL, *head = NULL;
+
+sn_count = bdrv_snapshot_list(bs, &sn_tab);
+if (sn_count < 0) {
+/* Now bdrv_snapshot_list() use negative value to tip error, so a check
+ * of it was done here. In future errp can be set by that function
+ * itself, by changing the call back functions in c files in ./block.
+ */
+const char *dev = bdrv_get_device_name(bs);
+if (sn_count == -ENOMEDIUM) {
+error_setg(errp, "Device '%s' is not inserted.", dev);
+} else if (sn_count == -ENOTSUP) {
+error_setg(errp,
+   "Device '%s' does not support internal snapshot.",
+   dev);
+} else {
+error_setg(errp,
+   "Device '%s' got %d for bdrv_snapshot_list().",
+   dev, sn_count);


strerror(-sn_count) would make the error message more useful.

You could also consider using switch instead of an if/else if chain.


 OK.


+}
+return NULL;
+}
+
+for (i = 0; i < sn_count; i++) {
+if (filter && filter(&sn_tab[i], opaque) != 0) {
+continue;
+}
+
+info_list = g_new0(SnapshotInfoList, 1);
+
+info_list->value= g_new0(SnapshotInfo, 1);
+info_list->value->id= g_strdup(sn_tab[i].id_str);
+info_list->value->name  = g_strdup(sn_tab[i].name);
+info_list->value->vm_state_size = sn_tab[i].vm_state_size;
+info_list->value->date_sec  = sn_tab[i].date_sec;
+info_list->value->date_nsec = sn_tab[i].date_nsec;
+info_list->value->vm_clock_sec  = sn_tab[i].vm_clock_nsec / 10;
+info_list->value->vm_clock_nsec = sn_tab[i].vm_clock_nsec % 10;


Did I already mention somewhere that I'm a fan of compound literals?
Looks so much nicer than repeating info_list->value-> in each line. ;-)


  Do you mean value=info_list->value and then value->***=***?



Kevin




--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH V5 05/13] block: add image info query function bdrv_query_image_info()

2013-01-31 Thread Wenchao Xia


于 2013-1-29 20:55, Kevin Wolf 写道:

Am 24.01.2013 03:57, schrieb Wenchao Xia:

   This patch add function bdrv_query_image_info(), which will return
image info in qmp object format. The implementation code are mostly
copied from qemu-img.c, but use block layer function to get snapshot
info.


Don't copy code, reuse it.

Can you move the existing qemu-img code to block.c and make qemu-img use
it from there?


  A bit hard, original code are adjusted to form two API:image_info and
snapshot info. Those code can be moved to block.c in one patch and
in following patch modified, if you like.


   A check with bdrv_can_read_snapshot(), was done before collecting
snapshot info.

Signed-off-by: Wenchao Xia 
Reviewed-by: Eric Blake 
---
  block.c   |   73 +
  include/block/block.h |1 +
  2 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 7cdb6c6..14bf653 100644
--- a/block.c
+++ b/block.c
@@ -2902,6 +2902,79 @@ SnapshotInfoList 
*bdrv_query_snapshot_infolist(BlockDriverState *bs,
  return head;
  }

+/* collect all internal snapshot info in a image for ImageInfo */
+static void collect_snapshots_info(BlockDriverState *bs,
+   ImageInfo *info,
+   Error **errp)
+{
+SnapshotInfoList *info_list;
+
+if (!bdrv_can_read_snapshot(bs)) {
+return;
+}
+info_list = bdrv_query_snapshot_infolist(bs, NULL, NULL, errp);
+if (info_list != NULL) {
+info->has_snapshots = true;
+info->snapshots = info_list;
+}
+}
+
+static void collect_image_info(BlockDriverState *bs,
+   ImageInfo *info)
+{
+uint64_t total_sectors;
+char backing_filename[1024];
+char backing_filename2[1024];
+BlockDriverInfo bdi;
+const char *filename;
+
+filename = bdrv_get_filename(bs);
+bdrv_get_geometry(bs, &total_sectors);
+
+info->filename= g_strdup(filename);
+info->format  = g_strdup(bdrv_get_format_name(bs));
+info->virtual_size= total_sectors * 512;
+info->actual_size = bdrv_get_allocated_file_size(bs);
+info->has_actual_size = info->actual_size >= 0;
+if (bdrv_is_encrypted(bs)) {
+info->encrypted = true;
+info->has_encrypted = true;
+}
+if (bdrv_get_info(bs, &bdi) >= 0) {
+if (bdi.cluster_size != 0) {
+info->cluster_size = bdi.cluster_size;
+info->has_cluster_size = true;
+}
+info->dirty_flag = bdi.is_dirty;
+info->has_dirty_flag = true;
+}
+bdrv_get_backing_filename(bs, backing_filename, sizeof(backing_filename));


No need to copy this, you can directly access bs->backing_file.


  OK.


Kevin




--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH V5 06/13] qemu-img: switch image retrieving function

2013-01-31 Thread Wenchao Xia


于 2013-1-29 20:56, Kevin Wolf 写道:

Am 24.01.2013 03:57, schrieb Wenchao Xia:

   Now qemu-img call block layer function to get image info and check
if error happens.

Signed-off-by: Wenchao Xia 
Reviewed-by: Eric Blake 


Merge this with patch 5.

Kevin


OK.

--
Best Regards

Wenchao Xia

[Qemu-devel] [PATCH V17 02/10] build: hide symbols in *.lo

2013-01-31 Thread Wenchao Xia

  This patch will hide all symbols except those marked with
QEMU_DLL_PUBLIC in .lo files. Because vsclient linked some
function in qemu, so export them as well. All symbols
in libcacard.syms are marked as public.

Signed-off-by: Wenchao Xia 
---
 configure   |   18 ++
 include/qemu/sockets.h  |1 +
 include/qemu/thread.h   |6 ++
 libcacard/cac.h |2 ++
 libcacard/card_7816.h   |9 +
 libcacard/vcard.h   |   22 ++
 libcacard/vcard_emul.h  |   12 
 libcacard/vcard_emul_type.h |3 +++
 libcacard/vcardt.h  |8 
 libcacard/vevent.h  |6 ++
 libcacard/vreader.h |   23 +++
 rules.mak   |1 +
 12 files changed, 111 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index b7635e4..64bde7b 100755
--- a/configure
+++ b/configure
@@ -290,6 +290,7 @@ if test "$debug_info" = "yes"; then
 CFLAGS="-g $CFLAGS"
 LDFLAGS="-g $LDFLAGS"
 fi
+QEMU_DLL_CFLAGS=""
 
 # make source path absolute
 source_path=`cd "$source_path"; pwd`
@@ -1276,6 +1277,21 @@ EOF
   fi
 fi
 
+# GNUC symbol hiding macro
+cat > $TMPC << EOF
+#if defined(__GNUC__) && __GNUC__ >= 4
+int main(void) { return 0; }
+#else
+#error GNUC with verison >=4 not found.
+#endif
+EOF
+if compile_prog "-Werror" "" ; then
+  QEMU_DLL_CFLAGS="-fvisibility=hidden $QEMU_DLL_CFLAGS"
+  QEMU_CFLAGS="-D 
QEMU_DLL_PUBLIC='__attribute__((__visibility__(\"default\")))' $QEMU_CFLAGS"
+else
+  QEMU_CFLAGS="-D QEMU_DLL_PUBLIC='' $QEMU_CFLAGS"
+fi
+
 #
 # Solaris specific configure tool chain decisions
 #
@@ -3263,6 +3279,7 @@ echo "Host C compiler   $host_cc"
 echo "Objective-C compiler $objcc"
 echo "CFLAGS$CFLAGS"
 echo "QEMU_CFLAGS   $QEMU_CFLAGS"
+echo "QEMU_DLL_CFLAGS   $QEMU_DLL_CFLAGS"
 echo "LDFLAGS   $LDFLAGS"
 echo "make  $make"
 echo "install   $install"
@@ -3770,6 +3787,7 @@ echo "WINDRES=$windres" >> $config_host_mak
 echo "LIBTOOL=$libtool" >> $config_host_mak
 echo "CFLAGS=$CFLAGS" >> $config_host_mak
 echo "QEMU_CFLAGS=$QEMU_CFLAGS" >> $config_host_mak
+echo "QEMU_DLL_CFLAGS=$QEMU_DLL_CFLAGS" >> $config_host_mak
 echo "QEMU_INCLUDES=$QEMU_INCLUDES" >> $config_host_mak
 if test "$sparse" = "yes" ; then
   echo "CC   := REAL_CC=\"\$(CC)\" cgcc"   >> $config_host_mak
diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 803ae17..916a57c 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -31,6 +31,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 #include "qapi/qmp/qerror.h"
 
 /* misc helpers */
+QEMU_DLL_PUBLIC
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 int socket_set_cork(int fd, int v);
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index c02404b..8274173 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -18,15 +18,19 @@ typedef struct QemuThread QemuThread;
 #define QEMU_THREAD_JOINABLE 0
 #define QEMU_THREAD_DETACHED 1
 
+QEMU_DLL_PUBLIC
 void qemu_mutex_init(QemuMutex *mutex);
 void qemu_mutex_destroy(QemuMutex *mutex);
+QEMU_DLL_PUBLIC
 void qemu_mutex_lock(QemuMutex *mutex);
 int qemu_mutex_trylock(QemuMutex *mutex);
+QEMU_DLL_PUBLIC
 void qemu_mutex_unlock(QemuMutex *mutex);
 
 #define rcu_read_lock() do { } while (0)
 #define rcu_read_unlock() do { } while (0)
 
+QEMU_DLL_PUBLIC
 void qemu_cond_init(QemuCond *cond);
 void qemu_cond_destroy(QemuCond *cond);
 
@@ -35,8 +39,10 @@ void qemu_cond_destroy(QemuCond *cond);
  * and pthread_cond_broadcast can be called except while the same mutex is
  * held as in the corresponding pthread_cond_wait calls!
  */
+QEMU_DLL_PUBLIC
 void qemu_cond_signal(QemuCond *cond);
 void qemu_cond_broadcast(QemuCond *cond);
+QEMU_DLL_PUBLIC
 void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex);
 
 void qemu_sem_init(QemuSemaphore *sem, int init);
diff --git a/libcacard/cac.h b/libcacard/cac.h
index 15a61be..c7b85f2 100644
--- a/libcacard/cac.h
+++ b/libcacard/cac.h
@@ -13,11 +13,13 @@
  * Initialize the cac card. This is the only public function in this file. All
  * the rest are connected through function pointers.
  */
+QEMU_DLL_PUBLIC
 VCardStatus cac_card_init(VReader *reader, VCard *card, const char *params,
   unsigned char * const *cert, int cert_len[],
   VCardKey *key[] /* adopt the keys*/,
   int cert_count);
 
 /* not yet implemented */
+QEMU_DLL_PUBLIC
 VCardStatus cac_is_cac_card(VReader *reader);
 #endif
diff --git a/libcacard/card_7816.h b/libcacard/card_7816.h
index 4a01993..207385a 100644
--- a/libcacard/card_7816.h
+++ b/libcacard/card_7816.h
@@ -14,21 +14,26 @@
  * constructors for VCardResponse's
  */
 /* response from a return buffer and a status */
+QEMU_DLL_PUBLIC
 VCardResponse *vcard_response_new(VCard *card, unsigned char *buf, int len,

Re: [Qemu-devel] [PATCH V5 07/13] block: rename bdrv_query_info to bdrv_query_block_info

2013-01-31 Thread Wenchao Xia


于 2013-1-29 21:05, Kevin Wolf 写道:

Am 29.01.2013 14:04, schrieb Kevin Wolf:

Am 24.01.2013 03:57, schrieb Wenchao Xia:

   Now that we have bdrv_query_image_info, rename this function to make it
more obvious what it is doing.

Reviewed-by: Eric Blake 
Signed-off-by: Wenchao Xia 


Should this provide an option to give information on each image in the
backing file chain, as well as the BlockDriverStates stacked via bs->file?


Sorry, this was meant as a reply to patch 8.

Kevin


  I think so, how about:
# @query-images:
#
# Get a list of DeviceImageInfo for all virtual block devices.
#
# Returns: a list of @DeviceImageInfo describing each virtual block device
#
# Since: 1.4
##
{ 'command': 'query-images',
   'data': { '*all': 'bool' },
   'returns': ['DeviceImageInfo'] }



--
Best Regards

Wenchao Xia

[Qemu-devel] [PATCH V17 05/10] libqblock: build: add packaging support

2013-01-31 Thread Wenchao Xia

  Now libqblock can be packaged and installed by
"sudo make install-libqblock'.

Signed-off-by: Wenchao Xia 
---
 libqblock/Makefile  |   34 +-
 libqblock/libqblock.pc.in   |   13 +
 2 files changed, 42 insertions(+), 5 deletions(-)
 create mode 100644 libqblock/libqblock-error.h
 create mode 100644 libqblock/libqblock-types.h
 create mode 100644 libqblock/libqblock.h
 create mode 100644 libqblock/libqblock.pc.in

diff --git a/libqblock/Makefile b/libqblock/Makefile
index a6be721..8a45ce1 100644
--- a/libqblock/Makefile
+++ b/libqblock/Makefile
@@ -1,4 +1,4 @@
-all: libqblock.la
+libqblock_includedir= $(includedir)/qblock
 
 # objects linked into a shared library, built with libtool with -fPIC if 
required
 libqblock-obj-y = libqblock/libqblock.o libqblock/libqblock-error.o
@@ -7,10 +7,18 @@ libqblock-obj-y += $(util-obj-y) $(block-obj-y)
 
 libqblock-lobj-y=$(patsubst %.o, %.lo, $(libqblock-obj-y))
 
+libqblock-pub-headers= $(SRC_PATH)/libqblock/libqblock.h \
+   $(SRC_PATH)/libqblock/libqblock-types.h \
+   $(SRC_PATH)/libqblock/libqblock-error.h
+
+libqblock-requires-$(CONFIG_CURL) += curl
+libqblock-requires-$(CONFIG_LIBCAP) += cap-ng
+libqblock-requires-$(CONFIG_UUID) += uuid
+
 # libtool will build the .o files, too
 $(libqblock-obj-y): | $(libqblock-lobj-y)
 
-all: libqblock.la
+all: libqblock.la libqblock.pc
 
 #
 # Rules for building libqblock standalone library
@@ -20,10 +28,26 @@ libqblock.la: LDFLAGS += -rpath $(libdir) -no-undefined \
 libqblock.la: $(libqblock-lobj-y)
$(call LINK,$^)
 
-
-.PHONY: libqblock-clean
+libqblock.pc: $(SRC_PATH)/libqblock/libqblock.pc.in
+   $(call quiet-command,sed -e 's|@LIBDIR@|$(libdir)|' \
+   -e 's|@INCLUDEDIR@|$(libqblock_includedir)|' \
+   -e 's|@VERSION@|$(shell cat $(SRC_PATH)/VERSION)|' \
+   -e 's|@PREFIX@|$(prefix)|' \
+   -e 's|@REQUIRES@|$(libqblock-requires-y)|' \
+   $< > libqblock.pc,\
+   "  GEN   $@")
+
+.PHONY: libqblock-clean install-libqblock
+
+install-libqblock: libqblock.la libqblock.pc
+   $(INSTALL_DIR) "$(DESTDIR)$(libdir)"
+   $(INSTALL_DIR) "$(DESTDIR)$(libdir)/pkgconfig"
+   $(INSTALL_DIR) "$(DESTDIR)$(libqblock_includedir)"
+   $(INSTALL_LIB) libqblock.la "$(DESTDIR)$(libdir)"
+   $(INSTALL_DATA) libqblock.pc "$(DESTDIR)$(libdir)/pkgconfig"
+   $(INSTALL_DATA) $(libqblock-pub-headers) 
"$(DESTDIR)$(libqblock_includedir)"
 
 libqblock-clean:
-   rm -rf $(libqblock-lobj-y) libqblock.la libqblock/.libs
+   rm -rf $(libqblock-lobj-y) libqblock.la libqblock.pc libqblock/.libs
 
 clean: libqblock-clean
diff --git a/libqblock/libqblock-error.h b/libqblock/libqblock-error.h
new file mode 100644
index 000..e69de29
diff --git a/libqblock/libqblock-types.h b/libqblock/libqblock-types.h
new file mode 100644
index 000..e69de29
diff --git a/libqblock/libqblock.h b/libqblock/libqblock.h
new file mode 100644
index 000..e69de29
diff --git a/libqblock/libqblock.pc.in b/libqblock/libqblock.pc.in
new file mode 100644
index 000..0901672
--- /dev/null
+++ b/libqblock/libqblock.pc.in
@@ -0,0 +1,13 @@
+prefix=@PREFIX@
+exec_prefix=${prefix}
+libdir=@LIBDIR@
+includedir=@INCLUDEDIR@
+
+Name: qblock
+Description: Qemu block layer library
+Version: @VERSION@
+
+Requires: @REQUIRES@
+Libs: -L${libdir} -lqblock
+Libs.private:
+Cflags: -I${includedir}
-- 
1.7.1

[Qemu-devel] [PATCH V17 04/10] libqblock: build: add rule for libqblock.la

2013-01-31 Thread Wenchao Xia

  Now libqblock.la can be built with neccessary object files,
and can be automatically cleaned by make clean in root directory.
make libqblock-clean also clean it. -fvisibility=hidden was used
to hide symbols, and a special macro was introduced to export
symbols that marked as public.

Signed-off-by: Wenchao Xia 
---
 libqblock/Makefile  |   29 +++--
 1 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 libqblock/libqblock-error.c
 create mode 100644 libqblock/libqblock.c

diff --git a/libqblock/Makefile b/libqblock/Makefile
index 8173da7..a6be721 100644
--- a/libqblock/Makefile
+++ b/libqblock/Makefile
@@ -1,4 +1,29 @@
 all: libqblock.la
 
-libqblock.la:
-   @true
+# objects linked into a shared library, built with libtool with -fPIC if 
required
+libqblock-obj-y = libqblock/libqblock.o libqblock/libqblock-error.o
+libqblock-obj-y += $(filter-out stubs/set-fd-handler.o, $(stub-obj-y))
+libqblock-obj-y += $(util-obj-y) $(block-obj-y)
+
+libqblock-lobj-y=$(patsubst %.o, %.lo, $(libqblock-obj-y))
+
+# libtool will build the .o files, too
+$(libqblock-obj-y): | $(libqblock-lobj-y)
+
+all: libqblock.la
+
+#
+# Rules for building libqblock standalone library
+
+libqblock.la: LDFLAGS += -rpath $(libdir) -no-undefined \
+   -export-syms $(SRC_PATH)/libqblock/libqblock.syms
+libqblock.la: $(libqblock-lobj-y)
+   $(call LINK,$^)
+
+
+.PHONY: libqblock-clean
+
+libqblock-clean:
+   rm -rf $(libqblock-lobj-y) libqblock.la libqblock/.libs
+
+clean: libqblock-clean
diff --git a/libqblock/libqblock-error.c b/libqblock/libqblock-error.c
new file mode 100644
index 000..e69de29
diff --git a/libqblock/libqblock.c b/libqblock/libqblock.c
new file mode 100644
index 000..e69de29
-- 
1.7.1

[Qemu-devel] [PATCH V17 00/10] libqblock qemu block layer library

2013-01-31 Thread Wenchao Xia

  These patches introduce libqblock API, make subdir-libqblock and make
check-libqblock could build this library.
Functionalities:
 1 create a new image.
 2 sync access of an image.
 3 basic image information retrieving such as backing file.
 4 detect if a sector is allocated in an image.
Supported Formats:
 ALL using file protocols.

  Patch 1 to 3 prepares qemu to accept libqblock.

v2:
  Insert reserved bytes into union.
  Use uint64_t instead of size_t, offset.
  Use const char * in filename pointer.
  Initialization function removed and it was automatically executed when
library is loaded.
  Added compile flag visibility=hidden, to avoid name space pollution.
  Structure naming style changed.
  Using byte unit instead of sector for every API.
  Added a member in image static information structure, to report logical
sector size, which is always 512 now.
  Read and write API can take request not aligned to 512 now. It returns the
byte number that have succeed in operation, but now either negative value
or the number requested would be returned, because qemu block sync I/O API
would not return such number.
  Typo fix due to comments and improved documents.

v3:
  Removed the code about OOM error, introduced GError.
  Used a table to map from string to enum types about format.
  Use typedef for every structure.
  Improved the gcc compiler macro to warn if gcc was not used.
  Global variable name changed with prefix libqb_.
  The struct QBlockStaticInfo was changed to folder full format related
information inside, and a new member with pointers pointing to the mostly used
members, such as backing file, virt size, was added. This would allow the user
to get full information about how it is created in the future.
  Each patch in the serial can work with qemu now.
  Typo fixes.

v4:
  Renamed QBroker to QBlockContext.
  Removed tool objs out of libqblock.
  Added a check in initialization about structure size for ABI.
  Added a new helper API to duplicate protocol information, helps to open files
in a backing file chain.
  Check-libqblock will not rebuild libqblock every time now.
  Test case file renamed to "libqblock-[FMT].c".
  Test use gtest framework now.
  Test do random creation of test file now, added check for information API in
it.
  Test do random sync io instead of fixed offset io now.
  Test accept one parameter about where to place the test image, now it is
./tests/libqblock/test_images.

v5:
  Makefile of libqblock was adjusted to be similar as libcacard, added spec
file and install section.
  Removed warning when GCC was not found.
  Structure names were changed to better ones.
  Removed the union typedef that contain reserved bytes to reduce the folder
depth.
  Some format related enum options was changed to better name.
  Added accessors about image static information, hide indirect accessing
member detail in the structure.
  Test Makefile do not create diretory now, test case create it themself.
  Test build system do not use libtool now, and removed qtest-obj-y in its
dependency, make check will automatically execute test anyway now.
  Removed "ifeq ($(LIBTOOL),)" in Makefile.

v6:
  Remove address pointer member in image static info structure.

v7:
  Support out of tree building.

v8:
  Fix a bug in out of tree building.

v9:
  Rebase and splitted out small fix patch for qemu.

v10:
  Rebased to upstream, adjusted libqblock build system according to Paolo's
comments.

v11:
  Adjusting code in patch 4 to 7, details are in the child patch's commit
message.

v12:
  Split a patch to add a function in stubs, other change are in patch 4 to 7
commit messages.

v13:
  Moved another function into stubs, added xml rule in tests/makefile, little
changes in patch 4, 6, 7.

v14:
  all: Rebased.
  1/10, 2/10: automatically call subdir's clean command if subdir's Makefile
added $SUBDIR_CLEAN_RULES, so tests/Makefile do not need to be always included,
libqblock's rule can also use it.
  3/10: seperated patch for configure support, modified as libcacard's style.
  4/10: modifed as libcacard's rule.
  5/10: seperated patch, also changed a bit to be a mirror as libcacard's rule.
  8/10: use bdrv_pread/bdrv_pwrite, instead of handling the buf allignment by
libqblock itself. Removed libqblock-aio.c because most function are in
block-obj-y now.
  9/10: seperated patch, use LINK instead of custom LT_LINK rule, and now
libqblock.la is a dependence in the link rule of test program, to make
LINK invoke libtool.

v15:
  1/9: merged from 1/10, 2/10 of previous version, and use dependce of "clean"
in sub Makefile instead of a intermedia variable.
  2/9: drop $TOOLS adding in libqblock's Makefile, the rule is added in "all".
  3/9: drop libqblock-y in Makefile.objs, directly add them in libqblock's
Makfile.
  6/9: use __visibility__ in special public marking macro, change macro name
to LIBQB_DLL_PUBLIC to avoid confilict in name space, use sizeof(uint64_t)
instead of magic number 8 in reserved member, spelling fix. Da

Re: [Qemu-devel] [PATCH V5 09/13] block: export function bdrv_find_snapshot()

2013-01-31 Thread Wenchao Xia


于 2013-1-29 21:15, Kevin Wolf 写道:

Am 24.01.2013 03:57, schrieb Wenchao Xia:

   This patch move it from savevm.c to block.c and export it. To make
it clear about id and name in searching, the API was changed a bit
to distinguish them. Caller can choose to search by id or name now.

Signed-off-by: Wenchao Xia 


Please keep code motion and semantic changes separate. One change per patch.


OK.


---
  block.c   |   51 +
  include/block/block.h |2 +
  savevm.c  |   32 -
  3 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/block.c b/block.c
index 02c74dc..6631d9a 100644
--- a/block.c
+++ b/block.c
@@ -3395,6 +3395,57 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs,
  return -ENOTSUP;
  }

+/*
+ * Try find an internal snapshot with @id or @name, @id have higher priority
+ * in searching.
+ * @bs block device to search on, must not be NULL.
+ * @sn_info snapshot information to be filled in, must not be NULL.
+ * @id snapshot id to search with, can be NULL.
+ * @name snapshot name to search with, can be NULL.
+ * returns 0 and @sn_info is filled with related information if found,
+ * otherwise it returns negative value, -ENOENT.
+ */
+int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
+   const char *id, const char *name)
+{
+QEMUSnapshotInfo *sn_tab, *sn;
+int nb_sns, i, ret;
+
+ret = -ENOENT;
+nb_sns = bdrv_snapshot_list(bs, &sn_tab);
+if (nb_sns < 0) {
+return ret;


Why don't you return the real error?


  OK.


+}
+
+/* search by id */
+if (id) {
+for (i = 0; i < nb_sns; i++) {
+sn = &sn_tab[i];
+if (!strcmp(sn->id_str, id)) {
+*sn_info = *sn;
+ret = 0;
+goto out;
+}
+}
+}
+
+/* search by name */
+if (name) {
+for (i = 0; i < nb_sns; i++) {
+sn = &sn_tab[i];
+if (!strcmp(sn->name, name)) {
+*sn_info = *sn;
+ret = 0;
+goto out;
+}
+}
+}


What's the motivation for duplicating the code instead of keeping it an
strcmp() || strcmp() like in the original place?


  The logic is slightly different: original code will try compare name
before id comparation complete, make the caller harder to use.


+
+ out:
+g_free(sn_tab);
+return ret;
+}
+
  /* backing_file can either be relative, or absolute, or a protocol.  If it is
   * relative, it must be relative to the chain.  So, passing in bs->filename
   * from a BDS as backing_file should not be done, as that may be relative to
diff --git a/include/block/block.h b/include/block/block.h
index 9838c48..1e7f4a1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -340,6 +340,8 @@ int bdrv_snapshot_list(BlockDriverState *bs,
  int bdrv_snapshot_load_tmp(BlockDriverState *bs,
 const char *snapshot_name);
  char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
+int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
+   const char *id, const char *name);

  char *get_human_readable_size(char *buf, int buf_size, int64_t size);
  int path_is_absolute(const char *path);
diff --git a/savevm.c b/savevm.c
index 304d1ef..dccaece 100644
--- a/savevm.c
+++ b/savevm.c
@@ -2029,28 +2029,6 @@ out:
  return ret;
  }

-static int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
-  const char *name)
-{
-QEMUSnapshotInfo *sn_tab, *sn;
-int nb_sns, i, ret;
-
-ret = -ENOENT;
-nb_sns = bdrv_snapshot_list(bs, &sn_tab);
-if (nb_sns < 0)
-return ret;
-for(i = 0; i < nb_sns; i++) {
-sn = &sn_tab[i];
-if (!strcmp(sn->id_str, name) || !strcmp(sn->name, name)) {
-*sn_info = *sn;
-ret = 0;
-break;
-}
-}
-g_free(sn_tab);
-return ret;
-}
-
  /*
   * Deletes snapshots of a given name in all opened images.
   */
@@ -2063,7 +2041,7 @@ static int del_existing_snapshots(Monitor *mon, const 
char *name)
  bs = NULL;
  while ((bs = bdrv_next(bs))) {
  if (bdrv_can_snapshot(bs) &&
-bdrv_snapshot_find(bs, snapshot, name) >= 0)
+bdrv_snapshot_find(bs, snapshot, name, name) >= 0)
  {
  ret = bdrv_snapshot_delete(bs, name);
  if (ret < 0) {
@@ -2123,7 +2101,7 @@ void do_savevm(Monitor *mon, const QDict *qdict)
  sn->vm_clock_nsec = qemu_get_clock_ns(vm_clock);

  if (name) {
-ret = bdrv_snapshot_find(bs, old_sn, name);
+ret = bdrv_snapshot_find(bs, old_sn, name, name);
  if (ret >= 0) {
  pstrcpy(sn->name, sizeof(sn->name), old_sn->name);
  pstrcpy(sn->id_str, sizeof(sn->id_str), old_sn->id_str);
@@ -2214,7 +2192,7 @@

[Qemu-devel] [PATCH V17 08/10] libqblock: libqblock API implement

2013-01-31 Thread Wenchao Xia

  This patch contains implemention for APIs. Basically it is a layer
above qemu block general layer now.
  qb_context_new() will try do init for this library.

Signed-off-by: Wenchao Xia 
---
 libqblock/libqblock-error.c |   60 +++
 libqblock/libqblock.c   | 1092 +++
 2 files changed, 1152 insertions(+), 0 deletions(-)

diff --git a/libqblock/libqblock-error.c b/libqblock/libqblock-error.c
index e69de29..c1781ff 100644
--- a/libqblock/libqblock-error.c
+++ b/libqblock/libqblock-error.c
@@ -0,0 +1,60 @@
+/*
+ * QEMU block layer library
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Wenchao Xia   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "libqblock-error.h"
+#include "libqblock-internal.h"
+
+void qb_error_get_human_str(QBlockContext *context,
+char *buf, size_t buf_size)
+{
+const char *err_ret_str;
+switch (context->err_ret) {
+case QB_ERR_INTERNAL_ERR:
+err_ret_str = "Internal error.";
+break;
+case QB_ERR_FATAL_ERR:
+err_ret_str = "Fatal error.";
+break;
+case QB_ERR_INVALID_PARAM:
+err_ret_str = "Invalid param.";
+break;
+case QB_ERR_BLOCK_OUT_OF_RANGE:
+err_ret_str = "request is out of image's range.";
+break;
+default:
+err_ret_str = "Unknown error.";
+break;
+}
+if (context == NULL) {
+snprintf(buf, buf_size, "%s", err_ret_str);
+return;
+}
+
+if (context->err_ret == QB_ERR_INTERNAL_ERR) {
+snprintf(buf, buf_size, "%s %s errno [%d]. strerror [%s].",
+ err_ret_str, context->g_error->message,
+ context->err_no, strerror(-context->err_no));
+} else {
+snprintf(buf, buf_size, "%s %s",
+ err_ret_str, context->g_error->message);
+}
+return;
+}
+
+int qb_error_get_errno(QBlockContext *context)
+{
+if (context->err_ret == QB_ERR_INTERNAL_ERR) {
+return context->err_no;
+}
+return 0;
+}
diff --git a/libqblock/libqblock.c b/libqblock/libqblock.c
index e69de29..90c8729 100644
--- a/libqblock/libqblock.c
+++ b/libqblock/libqblock.c
@@ -0,0 +1,1092 @@
+/*
+ * QEMU block layer library
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Wenchao Xia   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+#include 
+
+#include "libqblock.h"
+#include "libqblock-internal.h"
+
+#include "block/block_int.h"
+
+#define LIBQB_FILENAME_MAX 4096
+
+typedef struct LibqblockGlobalData {
+int init_flag;
+pthread_mutex_t mutex;
+} LibqblockGlobalData;
+
+LibqblockGlobalData libqb_global_data;
+
+typedef struct LibqbFormatStrMapping {
+const char *fmt_str;
+QBlockFormat fmt_type;
+} LibqbFormatStrMapping;
+
+LibqbFormatStrMapping libqb_formatstr_table[] = {
+{"cow", QB_FORMAT_COW},
+{"qed", QB_FORMAT_QED},
+{"qcow", QB_FORMAT_QCOW},
+{"qcow2", QB_FORMAT_QCOW2},
+{"raw", QB_FORMAT_RAW},
+{"rbd", QB_FORMAT_RBD},
+{"sheepdog", QB_FORMAT_SHEEPDOG},
+{"vdi", QB_FORMAT_VDI},
+{"vmdk", QB_FORMAT_VMDK},
+{"vpc", QB_FORMAT_VPC},
+{NULL, 0},
+};
+
+__attribute__((constructor))
+static void libqblock_init(void)
+{
+/* Todo: add an assertion about the ABI. */
+libqb_global_data.init_flag = 0;
+pthread_mutex_init(&libqb_global_data.mutex, NULL);
+}
+
+const char *qb_formattype2str(QBlockFormat fmt_type)
+{
+int i = 0;
+LibqbFormatStrMapping *tb = libqb_formatstr_table;
+
+if ((fmt_type <= QB_FORMAT_NONE) || (fmt_type >= QB_FORMAT_MAX)) {
+return NULL;
+}
+while (tb[i].fmt_str != NULL) {
+if (tb[i].fmt_type == fmt_type) {
+return tb[i].fmt_str;
+}
+i++;
+}
+return NULL;
+}
+
+QBlockFormat qb_str2fmttype(const char *fmt_str)
+{
+int i = 0;
+LibqbFormatStrMapping *tb = libqb_formatstr_table;
+
+if (fmt_str == NULL) {
+return QB_FORMAT_NONE;
+}
+while (tb[i].fmt_str != NULL) {
+if ((strcmp(fmt_str, tb[i].fmt_str) == 0)) {
+return tb[i].fmt_type;
+}
+i++;
+}
+return QB_FORMAT_NONE;
+}
+
+static void set_context_err(QBlockContext *context, int err_ret,
+const char *fmt, ...)
+{
+va_list ap;
+
+if (context->g_error != NULL) {
+g_error_free(context->g_error);
+}
+
+va_start(ap, fmt);
+context->g_error = g_error_new_valist(G_LIBQBLOCK_ERROR, err_ret, fmt, ap);
+va_end(ap);
+
+context->err_ret = err_ret;
+if (err_ret == QB_ERR_INTERNAL_ERR) {
+context->err_no = -errno;
+} else {
+context->err_no = 0;
+}
+}
+
+int qb_context_new(QBlockContext **p_context)
+{
+/*
+ * library init

[Qemu-devel] qemu-ga: user tool in /usr/bin ?

2013-01-31 Thread Michael Tokarev


I wonder why qemu-ga is a user-callable binary as installed
by `make install'.  It appears to be a system daemon which,
at the very least, requires root privileges to write its
pidfile and open the transport device.

Shouldn't it go to $sbindir or even $libexecdir instead of
the current $bindif ?  I think it has nothing to do in the
main $bindir.

I understand there's no infrastructure in Makefiles to
install tools into dirs other than $bindir, but it is not
difficult to add.  The question is whenever there are other
reasons for qemu-ga going into /usr/bin.

Thanks,

/mjt

[Qemu-devel] [PATCH V17 09/10] libqblock: build: add rules for test case

2013-01-31 Thread Wenchao Xia

  Libtool will be used for final link, the rules do nothing if
libqblock was disabled. Temp directory was used to store image
created in test, which will be deleted in clean.

Signed-off-by: Wenchao Xia 
---
 tests/Makefile |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 884eee5..a6a421f 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -134,6 +134,15 @@ qtest-obj-y = tests/libqtest.o libqemuutil.a libqemustub.a
 qtest-obj-y += tests/libi2c.o tests/libi2c-omap.o
 $(check-qtest-y): $(qtest-obj-y)
 
+#libqblock build rules
+
+$(check-libqblock-y): QEMU_INCLUDES += -I$(SRC_PATH)/tests 
-I$(SRC_PATH)/libqblock
+
+$(check-libqblock-y): %$(EXESUF): %.o libqblock.la
+   $(call LINK, $^)
+
+check-unit-$(CONFIG_LIBQBLOCK) += $(check-libqblock-y)
+
 .PHONY: check-help
 check-help:
@echo "Regression testing targets:"
-- 
1.7.1

Re: [Qemu-devel] [RFC] qemu snapshot enchancement

2013-01-31 Thread Wenchao Xia


于 2013-1-30 1:44, Paolo Bonzini 写道:

Il 29/01/2013 17:44, Stefan Hajnoczi ha scritto:


I'm planning to add offline mirroring to qemu-img.  If you use an NBD
server as the destination, it can be used to send only the delta between
two snapshots via NBD.

I think this is the opposite of what you suggested, which is to run
qemu-nbd on the image and query the server.

Yep, exactly.  The nice thing about your approach is that we don't
need an explicit API for dirty block tracking, it's implicit in the
writes that we send to the backup application's NBD server.


Of course the ugly thing is that it is much less flexible than letting
the backup application do its own thing.  Also, there is just one dirty
bitmap that you need to use for everything, though in the future the
"block filters" patch will come up magically and will let you have
multiple dirty bitmaps...

Paolo


  Hi, Paolo. I didn't quite understand how offline mirroring will work
and provide the delta data, do you have a page for it? I think
all application cares is the delta data access API while VM is active,
the internal data orgnanization detail is not really the point.



--
Best Regards

Wenchao Xia

[Qemu-devel] [PATCH V17 06/10] block: export function path_has_protocol()

2013-01-31 Thread Wenchao Xia

  libqblock need to use it.

Signed-off-by: Wenchao Xia 
---
 block.c   |2 +-
 include/block/block.h |2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index ba67c0d..44db62f 100644
--- a/block.c
+++ b/block.c
@@ -195,7 +195,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with ":" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
 const char *p;
 
diff --git a/include/block/block.h b/include/block/block.h
index 5c3b911..40b63b1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -395,6 +395,8 @@ void bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie 
*cookie,
 int64_t bytes, enum BlockAcctType type);
 void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie);
 
+int path_has_protocol(const char *path);
+
 typedef enum {
 BLKDBG_L1_UPDATE,
 
-- 
1.7.1

[Qemu-devel] [PATCH V17 03/10] libqblock: build: add configure support

2013-01-31 Thread Wenchao Xia

  Rule for libqblock.la will be included if it is enabled, and
will be added to 'all' to be automatically built.
  Only support Linux now, to save trouble in building on windows.

Signed-off-by: Wenchao Xia 
---
 Makefile   |3 +++
 configure  |   34 ++
 libqblock/Makefile |4 
 3 files changed, 41 insertions(+), 0 deletions(-)
 create mode 100644 libqblock/Makefile

diff --git a/Makefile b/Makefile
index b2d7798..a472fd3 100644
--- a/Makefile
+++ b/Makefile
@@ -114,6 +114,9 @@ endif
 ifeq ($(CONFIG_SMARTCARD_NSS),y)
 include $(SRC_PATH)/libcacard/Makefile
 endif
+ifeq ($(CONFIG_LIBQBLOCK),y)
+include $(SRC_PATH)/libqblock/Makefile
+endif
 
 all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all
 
diff --git a/configure b/configure
index 64bde7b..3ce9166 100755
--- a/configure
+++ b/configure
@@ -226,6 +226,8 @@ coroutine=""
 seccomp=""
 glusterfs=""
 virtio_blk_data_plane=""
+libqblock=""
+libqblock_requires_y=""
 
 # parse CC options first
 for opt do
@@ -898,6 +900,10 @@ for opt do
   ;;
   --enable-virtio-blk-data-plane) virtio_blk_data_plane="yes"
   ;;
+  --disable-libqblock) libqblock="no"
+  ;;
+  --enable-libqblock) libqblock="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1147,6 +1153,8 @@ echo "  --enable-glusterfs   enable GlusterFS backend"
 echo "  --disable-glusterfs  disable GlusterFS backend"
 echo "  --enable-gcovenable test coverage analysis with gcov"
 echo "  --gcov=GCOV  use specified gcov [$gcov_tool]"
+echo "  --enable-libqblock   enable shared library libqblock"
+echo "  --disable-libqblock  disable shared library libqblock"
 echo ""
 echo "NOTE: The object files are built at the place where configure is 
launched"
 exit 1
@@ -2445,6 +2453,25 @@ EOF
   fi
 fi
 
+##
+# libqblock probe
+if test "$libqblock" != "no"; then
+if test -n "$libtool" &&
+$pkg_config --atleast-version=$glib_req_ver glib-2.0 > /dev/null 
2>&1 && \
+$pkg_config --atleast-version=$glib_req_ver gthread-2.0 > 
/dev/null 2>&1 && \
+test "$linux" = "yes" \
+; then
+# Only support Linux now, check for librt and libz are missing, 
add it later.
+libqblock="yes"
+libqblock_requires_y="gthread-2.0 glib-2.0 rt z"
+else
+if test "$libqblock" = "yes"; then
+feature_not_found "libqblock"
+fi
+libqblock="no"
+fi
+fi
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -3361,6 +3388,7 @@ echo "GlusterFS support $glusterfs"
 echo "virtio-blk-data-plane $virtio_blk_data_plane"
 echo "gcov  $gcov_tool"
 echo "gcov enabled  $gcov"
+echo "libqblock support $libqblock"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3717,6 +3745,11 @@ if test "$virtio_blk_data_plane" = "yes" ; then
   echo "CONFIG_VIRTIO_BLK_DATA_PLANE=y" >> $config_host_mak
 fi
 
+if test "$libqblock" = "yes" ; then
+  echo "CONFIG_LIBQBLOCK=y" >> $config_host_mak
+  echo "libqblock-requires-y=$libqblock_requires_y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)
@@ -4301,6 +4334,7 @@ DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32"
 DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas"
 DIRS="$DIRS roms/seabios roms/vgabios"
 DIRS="$DIRS qapi-generated"
+DIRS="$DIRS libqblock"
 FILES="Makefile tests/tcg/Makefile qdict-test-data.txt"
 FILES="$FILES tests/tcg/cris/Makefile tests/tcg/cris/.gdbinit"
 FILES="$FILES tests/tcg/lm32/Makefile"
diff --git a/libqblock/Makefile b/libqblock/Makefile
new file mode 100644
index 000..8173da7
--- /dev/null
+++ b/libqblock/Makefile
@@ -0,0 +1,4 @@
+all: libqblock.la
+
+libqblock.la:
+   @true
-- 
1.7.1

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Gleb Natapov

On Thu, Jan 31, 2013 at 09:47:24AM +0200, Orit Wasserman wrote:
> On 01/31/2013 08:57 AM, Peter Lieven wrote:
> > Hi,
> > 
> > I just came across an idea and would like to have feedback if it makes 
> > sence or not.
> > 
> > If a VM is started without preallocated memory all memory that has not been 
> > written to
> > reads as zeros, right?
> Hi,
> No the memory will be unmapped (we allocate on demand).
> > If a VM with a lot of unwritten memory is migrated or if the memory 
> > contains a lot
> > of zeroed out memory (e.g. Windows or Linux guest with page sanitization) 
> > all this memory
> > is allocated on the target during live migration. Especially with KSM this 
> > leads
> > to the problem that this memory is allocated and might be not available 
> > completely as
> > merging of the pages will happen async.
> >
> > Wouldn't it make sense to not send zero pages in the first round where the 
> > complete
> > ram is sent (if it is detectable that we are in this stage)?
> We send one byte per zero page at the moment (see is_dup_page) we can further 
> optimizing it
> by not sending it.
> I have to point out that this is a very idle guest and we need to work on a 
> loaded guest 
> which is the more hard problem in migration.
> 
> Also I notice that the bottle neck in migrating unmapped pages is the 
> detection of those pages
> because we map the pages in order to check them, for a large guest this is 
> very expensive as mapping a page
> results in a page fault in the host.
> So what will be very helpful is actually locating those pages without mapping 
> them
> which looks very complicated.
> 
What is wrong with mincore()?

--
Gleb.

Re: [Qemu-devel] [PATCH] Fix monitor 'info registers' command on multi processor guests

2013-01-31 Thread Markus Armbruster

"Erlon Cruz"  writes:

> QEMU monitor command 'info registers' only displays information for the first
> CPU. This fix that by show registers information for each CPU in the system

This is incorrect.  It displays information for the *current* CPU.
Monitor command "cpu" selects the current CPU.

I doubt we want to change this command.  But I'm reviewing it regardless
of my doubts.

> Signed-off-by: erlon.c...@fit-tecnologia.org.br
> ---
>  monitor.c |   19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/monitor.c b/monitor.c
> index c0e32d6..70e9227 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -898,8 +898,14 @@ int monitor_get_cpu_index(void)
>  static void do_info_registers(Monitor *mon)
>  {
>  CPUArchState *env;
> -env = mon_get_cpu();
> -cpu_dump_state(env, (FILE *)mon, monitor_fprintf, CPU_DUMP_FPU);
> +
> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
> +monitor_fprintf((FILE *)mon, "= "
> +"Registers on CPU %d =\n",
> +env->cpu_index);
> +cpu_dump_state(env, (FILE *)mon, monitor_fprintf, CPU_DUMP_FPU);
> +monitor_fprintf((FILE *)mon, "\n");
> +}
>  }
>  

You lost the cpu_synchronize_state() in mon_get_cpu().  Why is that
okay?

>  static void do_info_jit(Monitor *mon)
> @@ -930,8 +936,13 @@ static void do_info_cpu_stats(Monitor *mon)
>  {
>  CPUArchState *env;
>  
> -env = mon_get_cpu();
> -cpu_dump_statistics(env, (FILE *)mon, &monitor_fprintf, 0);
> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
> +monitor_fprintf((FILE *)mon, "= "
> +"Statistics for CPU %d =\n",
> +env->cpu_index);
> +cpu_dump_statistics(env, (FILE *)mon, &monitor_fprintf, 0);
> +monitor_fprintf((FILE *)mon, "\n");
> +}
>  }
>  #endif

Same here.

Furthermore, you neglected to update the command's documentation in
hmp-commands.hx, and the reference to it in qmp-commands.hx.

[Qemu-devel] [PATCH] configure: Undefine _FORTIFY_SOURCE prior using it

2013-01-31 Thread Michal Privoznik

Currently, we are enforcing the _FORTIFY_SOURCE=2 without any
previous detection if the macro has been already defined, e.g.
by environment, or is just enabled by compiler by default.

Signed-off-by: Michal Privoznik 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index b7635e4..97070eb 100755
--- a/configure
+++ b/configure
@@ -3159,7 +3159,7 @@ if test "$gcov" = "yes" ; then
   CFLAGS="-fprofile-arcs -ftest-coverage -g $CFLAGS"
   LDFLAGS="-fprofile-arcs -ftest-coverage $LDFLAGS"
 elif test "$debug" = "no" ; then
-  CFLAGS="-O2 -D_FORTIFY_SOURCE=2 $CFLAGS"
+  CFLAGS="-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $CFLAGS"
 fi
 
 
-- 
1.8.0.2

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Orit Wasserman

On 01/31/2013 11:48 AM, Gleb Natapov wrote:
> On Thu, Jan 31, 2013 at 09:47:24AM +0200, Orit Wasserman wrote:
>> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>>> Hi,
>>>
>>> I just came across an idea and would like to have feedback if it makes 
>>> sence or not.
>>>
>>> If a VM is started without preallocated memory all memory that has not been 
>>> written to
>>> reads as zeros, right?
>> Hi,
>> No the memory will be unmapped (we allocate on demand).
>>> If a VM with a lot of unwritten memory is migrated or if the memory 
>>> contains a lot
>>> of zeroed out memory (e.g. Windows or Linux guest with page sanitization) 
>>> all this memory
>>> is allocated on the target during live migration. Especially with KSM this 
>>> leads
>>> to the problem that this memory is allocated and might be not available 
>>> completely as
>>> merging of the pages will happen async.
>>>
>>> Wouldn't it make sense to not send zero pages in the first round where the 
>>> complete
>>> ram is sent (if it is detectable that we are in this stage)?
>> We send one byte per zero page at the moment (see is_dup_page) we can 
>> further optimizing it
>> by not sending it.
>> I have to point out that this is a very idle guest and we need to work on a 
>> loaded guest 
>> which is the more hard problem in migration.
>>
>> Also I notice that the bottle neck in migrating unmapped pages is the 
>> detection of those pages
>> because we map the pages in order to check them, for a large guest this is 
>> very expensive as mapping a page
>> results in a page fault in the host.
>> So what will be very helpful is actually locating those pages without 
>> mapping them
>> which looks very complicated.
>>
> What is wrong with mincore()?

Avi/Michael do you remember why mincore can't be used to check if a guest page 
is unmapped?

> 
> --
>   Gleb.
>

Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-31 Thread Michael S. Tsirkin

On Wed, Jan 30, 2013 at 04:28:30PM -0700, Alex Williamson wrote:
> On Thu, 2013-01-31 at 10:02 +1100, Benjamin Herrenschmidt wrote:
> > On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > > > In practice they do (VGA at least)
> > > > 
> > > > >From a SW modelling standpoint, I don't think it's worth
> > > differentiating
> > > > PCI and PCIE.
> > > > 
> > > > Cheers,
> > > > Ben.
> > > 
> > > Interesting.
> > > Do you have such hardware? Could you please dump
> > > the output of lspci -vv?
> > 
> > Any ATI or nVidia card still supports hard decoding of VGA regions for
> > the sake of legacy operating systems and BIOSes :-) I don't know about
> > Intel but I suppose it's the same.
> 
> For example:
> 
> -[:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI 
> bridge (external gfx0 p
>+-04.0-[02]--+-00.0  Advanced Micro Devices [AMD] nee ATI Cedar 
> PRO [Radeon HD 5450/6350]
> 
> 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI 
> bridge (PCI express gpp port D) (prog-if 00 [Normal decode])
>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx-
>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-  SERR-Latency: 0, Cache Line Size: 64 bytes
>   Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
>   I/O behind bridge: c000-cfff
>   Memory behind bridge: fd10-fd1f
>   Prefetchable memory behind bridge: d000-dfff
>   Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
>   
> VGA+ (VGA Enable) indicates positive decode of 0x3b0 - 0x3bb, 0x3c0 -
> 0x3df, and 0xa - 0xbfff.  Device 2:00.0 of course doesn't report
> these "ISA" ranges as they're implicit in the VGA class code.

OK but this appears behind a bridge.  So the bridge configuration tells
the root complex where to send accesses to the VGA.

But qemu currently puts devices directly on root bus.

And as far as I can tell when we present devices directly on bus 0, we
pretend these are integrated in the root complex. The spec seems to
say explicitly that root complex integrated devices should not use legacy
addresses or support hotplug. So I would be surprised if such one
appears in real world.

Luckily guests do not seem to be worried as long as we use ACPI.

> 
> BTW, I've been working on vfio-pci support of VGA assignment which makes
> use of the VGA arbiter in the host to manipulate the VGA Enable control
> register, allowing us to select which device to access.  The qemu side
> is simply registering memory regions for the VGA areas and expecting to
> be used with -vga none, but I'll adopt whatever strategy we choose for
> hard coded address range support.  Current base patches at the links
> below.  Thanks,
> 
> Alex
> 
> https://github.com/awilliam/qemu-vfio/commit/ea2befa59010a429dcf13c10dbccdf8b64e82fbd
> https://github.com/awilliam/linux-vfio/commit/bae182d929229cbf1eaeb01e5fad4f77f81a4c61

Re: [Qemu-devel] [RFC PATCH RDMA support v1: 4/5] connection-setup code between client/server

2013-01-31 Thread Orit Wasserman

Hi Michael,
It will be cleaner to create a new file migration-rdma.c and put all the
RDMA logic into it, this way you won't effect TCP migration code.

More comments inline

On 01/29/2013 12:01 AM, mrhi...@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines" 
> 
> 
> Signed-off-by: Michael R. Hines 
> ---
>  migration-tcp.c |   53 +
>  migration.c |   41 +
>  2 files changed, 94 insertions(+)
> 
> diff --git a/migration-tcp.c b/migration-tcp.c
> index e78a296..b6c53ba 100644
> --- a/migration-tcp.c
> +++ b/migration-tcp.c
> @@ -14,10 +14,12 @@
>   */
>  
>  #include "qemu-common.h"
> +#include "qemu/rdma.h"
>  #include "qemu/sockets.h"
>  #include "migration/migration.h"
>  #include "migration/qemu-file.h"
>  #include "block/block.h"
> +#include 
>  
>  //#define DEBUG_MIGRATION_TCP
>  
> @@ -55,6 +57,9 @@ static void tcp_wait_for_connect(int fd, void *opaque)
>  
>  if (fd < 0) {
>  DPRINTF("migrate connect error\n");
> +if (qemu_use_rdma_migration()) {
> +qemu_rdma_migration_cleanup(&rdma_mdata);
> +}
>  s->fd = -1;
>  migrate_fd_error(s);
>  } else {
> @@ -62,6 +67,25 @@ static void tcp_wait_for_connect(int fd, void *opaque)
>  s->fd = fd;
>  socket_set_block(s->fd);
>  migrate_fd_connect(s);
> +
> +/* RDMA initailization */
> +if (qemu_use_rdma_migration()) {
> +if (qemu_rdma_migration_client_init(&rdma_mdata)) {
> +migrate_fd_error(s);
> +return;
> +}
> +
> +fprintf(stderr, "qemu_rdma_migration_client_init success\n");

Please use DPRINTF or trace for debug.
> +
> +if (qemu_rdma_migration_client_connect(&rdma_mdata)) {
> +migrate_fd_error(s);
> +close(s->fd);
> +return;
> +}
> +
> +fprintf(stderr, "qemu_rdma_migration_client_connect success\n");
> +}
> +
>  }
>  }
>  
> @@ -101,6 +125,16 @@ static void tcp_accept_incoming_migration(void *opaque)
>  goto out;
>  }
>  
> +if (qemu_use_rdma_migration()) {
> +printf("listen on port started: %d\n", rdma_mdata.port);
> +
> +if (qemu_rdma_migration_server_wait_for_client(&rdma_mdata)) {
> +fprintf(stderr, "rdma migration: error waiting for client!\n");
> +close(s);
> +return;
> +}
> +}
> +
>  process_incoming_migration(f);
>  return;
>  
> @@ -117,6 +151,25 @@ void tcp_start_incoming_migration(const char *host_port, 
> Error **errp)
>  return;
>  }
>  
> +if (qemu_use_rdma_migration()) {
> +int ret;
> +
> +ret = qemu_rdma_migration_server_init(&rdma_mdata);
> +if (ret) {
> +fprintf(stderr, "rdma migration: error init server!\n");

Please set errp in case of an error, I would suggest passing it into 
qemu_rdma_migration_server_init
that will set the relevant error.   
> +close(s);
> +return;
> +}
> +
> +ret = qemu_rdma_migration_server_prepare(&rdma_mdata);
> +if (ret) {
> +fprintf(stderr, "rdma migration: error preparing server!\n");
> +close(s);
> +return;
> +}
> +
> +}
> +
>  qemu_set_fd_handler2(s, NULL, tcp_accept_incoming_migration, NULL,
>   (void *)(intptr_t)s);
>  }
> diff --git a/migration.c b/migration.c
> index 77c1971..cffe16f 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -22,6 +22,7 @@
>  #include "qemu/sockets.h"
>  #include "migration/block.h"
>  #include "qemu/thread.h"
> +#include "qemu/rdma.h"
>  #include "qmp-commands.h"
>  
>  //#define DEBUG_MIGRATION
> @@ -279,6 +280,11 @@ static int migrate_fd_cleanup(MigrationState *s)
>  }
>  
>  assert(s->fd == -1);
> +
> +if (qemu_rdma_migration_enabled()) {
> +qemu_rdma_migration_cleanup(&rdma_mdata);
> +}
> +
>  return ret;
>  }
>  
> @@ -481,6 +487,41 @@ int64_t qmp_query_migrate_cache_size(Error **errp)
>  return migrate_xbzrle_cache_size();
>  }
>  
> +void qmp_migrate_set_rdma_port(int64_t port, Error **errp)
> +{
> +MigrationState *s = migrate_get_current();
> +if (s && (s->state == MIG_STATE_ACTIVE)) {
> +return;
> +}

Please return error here by setting errp.
> +printf("rdma migration port: %" PRId64 "\n", port);
You forgot a printf here ...
> +rdma_mdata.port = port;
> +}
> +
> +void qmp_migrate_set_rdma_host(const char *host, Error **errp)
> +{
> +MigrationState *s = migrate_get_current();
> +if (s && (s->state == MIG_STATE_ACTIVE)) {
> +return;
> +}
> +printf("rdma migration host name: %s\n", host);
> +strncpy(rdma_mdata.host, host, 64);

Use g_strdup here

regards,
Orit
> +rdma_mdata.host[63] = '\0';

> +}
> +
> +void qmp_migrate_rdma_prepare(Error **errp)
> +{
> +Migra

Re: [Qemu-devel] [kvmarm] [RFC v5 7/8] hw/kvm/arm_gic: Implement support for KVM in-kernel ARM GIC

2013-01-31 Thread KONRAD Frédéric


On 24/01/2013 16:43, Peter Maydell wrote:

Implement support for using the KVM in-kernel GIC for ARM.

Signed-off-by: Peter Maydell 
---
  hw/a15mpcore.c   |8 ++-
  hw/arm/Makefile.objs |1 +
  hw/kvm/arm_gic.c |  169 ++
  3 files changed, 177 insertions(+), 1 deletion(-)
  create mode 100644 hw/kvm/arm_gic.c

diff --git a/hw/a15mpcore.c b/hw/a15mpcore.c
index fe6c34c..1ca6f28 100644
--- a/hw/a15mpcore.c
+++ b/hw/a15mpcore.c
@@ -19,6 +19,7 @@
   */
  
  #include "sysbus.h"

+#include "sysemu/kvm.h"
  
  /* A15MP private memory region.  */
  
@@ -40,8 +41,13 @@ static int a15mp_priv_init(SysBusDevice *dev)

  {
  A15MPPrivState *s = FROM_SYSBUS(A15MPPrivState, dev);
  SysBusDevice *busdev;
+const char *gictype = "arm-gic";

s/arm-gic/arm_gic/ ^^ ?

Christoffer and I had trouble with that:

qemu-system-arm: Unknown device 'arm-gic' for default sysbus

Fred
  
-s->gic = qdev_create(NULL, "arm_gic");

+if (kvm_irqchip_in_kernel()) {
+gictype = "kvm-arm-gic";
+}
+
+s->gic = qdev_create(NULL, gictype);
  qdev_prop_set_uint32(s->gic, "num-cpu", s->num_cpu);
  qdev_prop_set_uint32(s->gic, "num-irq", s->num_irq);

[Qemu-devel] [PATCH 00/11] main-loop: switch to g_poll(3) on POSIX hosts

2013-01-31 Thread Stefan Hajnoczi

Amos Kong  reported that file descriptors numbered higher
than 1024 could crash QEMU.  This is due to the fixed size of the fd_set type
used for select(2) event polling.

This series converts the main-loop.c and aio-posix.c select(2) calls to
g_poll(3).  This eliminates the fd_set type and allows QEMU to scale to high
numbers of file descriptors.

The g_poll(3) interface is a portable version of the poll(2) system call.  The
difference to select(2) is that fine-grained events (G_IO_IN, G_IO_OUT,
G_IO_HUP, G_IO_ERR, G_IO_PRI) can be monitored instead of just
read/write/exception.  Also, there is no limit to the file descriptor numbers
that may be used, allowing applications to scale to many file descriptors.  See
the documentation for details:

  http://developer.gnome.org/glib/2.28/glib-The-Main-Event-Loop.html#g-poll

The QEMU main loop works as follows today:

1. Call out to slirp, iohandlers, and glib sources to fill rfds/wfds/xfds with
   the file descriptors to select(2).
2. Perform the select(2) call.
3. Call out to slirp, iohandlers, and glib sources to handle events polled in
   rfds/wfds/xfds.

The plan of attack is as follows:

1. Add a Poller type for growable GPollFD arrays.  The is the new type that
   will be used instead of fd_set.

2. Replace select(2) with g_poll(3).  Use glue that converts between
   rfds/wfds/xfds and Poller so that the unconverted QEMU components still
   work.

3. Convert slirp, iohandlers, and glib source fill/poll functions to use Poller
   directly instead of rfds/wfds/xfds.

4. Drop the glue since all components now natively use Poller.

5. Convert aio-posix.c to g_poll(3) by reusing Poller.

I have tested that the series builds and is bisectable on Linux and Windows
hosts.  But I have not done extensive testing on other host platforms or with
long-term guests to check for performance regressions.

Stefan Hajnoczi (11):
  main-loop: fix select_ret uninitialized variable warning
  poller: add Poller for growable GPollFD arrays
  poller: add poller_fill() and poller_poll()
  main-loop: switch to g_poll() on POSIX hosts
  main-loop: switch POSIX glib integration to Poller
  slirp: switch to Poller
  iohandler: switch to Poller
  main-loop: drop rfds/wfds/xfds for good
  aio: extract aio_dispatch() from aio_poll()
  aio: convert aio_poll() to g_poll(3)
  aio: support G_IO_HUP and G_IO_ERR

 aio-posix.c  | 125 ---
 async.c  |   2 +
 include/block/aio.h  |   4 ++
 include/qemu/main-loop.h |   5 +-
 include/qemu/poller.h|  68 ++
 iohandler.c  |  33 +
 main-loop.c  | 110 +
 slirp/libslirp.h |   8 +--
 slirp/main.h |   1 -
 slirp/slirp.c| 113 +-
 slirp/socket.c   |   9 
 slirp/socket.h   |   2 +
 stubs/slirp.c|   6 +--
 util/Makefile.objs   |   1 +
 util/poller.c| 100 +
 15 files changed, 365 insertions(+), 222 deletions(-)
 create mode 100644 include/qemu/poller.h
 create mode 100644 util/poller.c

-- 
1.8.1

[Qemu-devel] [PATCH 11/11] aio: support G_IO_HUP and G_IO_ERR

2013-01-31 Thread Stefan Hajnoczi

aio-posix.c could not take advantage of G_IO_HUP and G_IO_ERR because
select(2) does not have equivalent events.  Now that g_poll(3) is used
we can support G_IO_HUP and G_IO_ERR.

Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 303e19f..558ac99 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -88,8 +88,8 @@ void aio_set_fd_handler(AioContext *ctx,
 node->opaque = opaque;
 node->poller_idx = -1;
 
-node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP : 0);
-node->pfd.events |= (io_write ? G_IO_OUT : 0);
+node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
+node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
 }
 
 aio_notify(ctx);
@@ -112,13 +112,6 @@ bool aio_pending(AioContext *ctx)
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
 int revents;
 
-/*
- * FIXME: right now we cannot get G_IO_HUP and G_IO_ERR because
- * main-loop.c is still select based (due to the slirp legacy).
- * If main-loop.c ever switches to poll, G_IO_ERR should be
- * tested too.  Dispatching G_IO_ERR to both handlers should be
- * okay, since handlers need to be ready for spurious wakeups.
- */
 revents = node->pfd.revents & node->pfd.events;
 if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
 return true;
@@ -150,7 +143,6 @@ static bool aio_dispatch(AioContext *ctx)
 revents = node->pfd.revents & node->pfd.events;
 node->pfd.revents = 0;
 
-/* See comment in aio_pending.  */
 if (!node->deleted &&
 (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
 node->io_read) {
-- 
1.8.1

[Qemu-devel] [PATCH 08/11] main-loop: drop rfds/wfds/xfds for good

2013-01-31 Thread Stefan Hajnoczi

Now that all *_fill() and *_poll() functions use Poller we no longer
need rfds/wfds/xfds or poller_from_select()/poller_to_select().

>From now on everything uses Poller.

Signed-off-by: Stefan Hajnoczi 
---
 main-loop.c | 73 ++---
 1 file changed, 2 insertions(+), 71 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index b638afb..4209b7c 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -142,62 +142,8 @@ int qemu_init_main_loop(void)
 }
 
 static Poller poller;
-static fd_set rfds, wfds, xfds;
-static int nfds;
 static int max_priority;
 
-/* Load rfds/wfds/xfds into Poller.  Will be removed a few commits later. */
-static void poller_from_select(void)
-{
-int fd;
-for (fd = 0; fd <= nfds; fd++) {
-int events = 0;
-if (FD_ISSET(fd, &rfds)) {
-events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
-}
-if (FD_ISSET(fd, &wfds)) {
-events |= G_IO_OUT | G_IO_ERR;
-}
-if (FD_ISSET(fd, &xfds)) {
-events |= G_IO_PRI;
-}
-if (events) {
-poller_add_fd(&poller, fd, events);
-}
-}
-}
-
-/* Store Poller revents into rfds/wfds/xfds.  Will be removed a few commits
- * later.
- */
-static void poller_to_select(int ret)
-{
-int i;
-
-FD_ZERO(&rfds);
-FD_ZERO(&wfds);
-FD_ZERO(&xfds);
-
-if (ret <= 0) {
-return;
-}
-
-for (i = 0; i < poller.nfds; i++) {
-int fd = poller.poll_fds[i].fd;
-int revents = poller.poll_fds[i].revents;
-
-if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
-FD_SET(fd, &rfds);
-}
-if (revents & (G_IO_OUT | G_IO_ERR)) {
-FD_SET(fd, &wfds);
-}
-if (revents & G_IO_PRI) {
-FD_SET(fd, &xfds);
-}
-}
-}
-
 #ifndef _WIN32
 static int glib_poller_idx;
 static int glib_n_poll_fds;
@@ -256,15 +202,8 @@ static int os_host_main_loop_wait(uint32_t timeout)
 qemu_mutex_unlock_iothread();
 }
 
-/* We'll eventually drop fd_set completely.  But for now we still have
- * *_fill() and *_poll() functions that use rfds/wfds/xfds.
- */
-poller_from_select();
-
 ret = g_poll(poller.poll_fds, poller.nfds, timeout);
 
-poller_to_select(ret);
-
 if (timeout > 0) {
 qemu_mutex_lock_iothread();
 }
@@ -373,6 +312,8 @@ static int os_host_main_loop_wait(uint32_t timeout)
 WaitObjects *w = &wait_objects;
 gint poll_timeout;
 static struct timeval tv0;
+fd_set rfds, wfds, xfds;
+int nfds;
 
 /* XXX: need to suppress polling by better using win32 events */
 ret = 0;
@@ -419,10 +360,6 @@ static int os_host_main_loop_wait(uint32_t timeout)
  * improve socket latency.
  */
 
-/* This back-and-forth between Poller and select(2) is temporary.  We'll
- * drop it in a couple of patches, I promise :).
- */
-poller_from_select();
 FD_ZERO(&rfds);
 FD_ZERO(&wfds);
 FD_ZERO(&xfds);
@@ -434,7 +371,6 @@ static int os_host_main_loop_wait(uint32_t timeout)
 poller_poll(&poller, nfds, &rfds, &wfds, &xfds);
 }
 }
-poller_to_select(select_ret || g_poll_ret);
 
 return select_ret || g_poll_ret;
 }
@@ -452,11 +388,6 @@ int main_loop_wait(int nonblocking)
 /* poll any events */
 poller_reset(&poller);
 /* XXX: separate device handlers from system ones */
-nfds = -1;
-FD_ZERO(&rfds);
-FD_ZERO(&wfds);
-FD_ZERO(&xfds);
-
 #ifdef CONFIG_SLIRP
 slirp_update_timeout(&timeout);
 slirp_poller_fill(&poller);
-- 
1.8.1

[Qemu-devel] [PATCH 09/11] aio: extract aio_dispatch() from aio_poll()

2013-01-31 Thread Stefan Hajnoczi

We will need to loop over AioHandlers calling ->io_read()/->io_write()
when aio_poll() is converted from select(2) to g_poll(2).

Luckily the code for this already exists, extract it into the new
aio_dispatch() function.

Two small changes:

 * aio_poll() checks !node->deleted to avoid calling handlers that have
   been deleted.

 * Fix typo 'then' -> 'them' in aio_poll() comment.

Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 57 +++--
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index fe4dbb4..35131a3 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -129,30 +129,12 @@ bool aio_pending(AioContext *ctx)
 return false;
 }
 
-bool aio_poll(AioContext *ctx, bool blocking)
+static bool aio_dispatch(AioContext *ctx)
 {
-static struct timeval tv0;
 AioHandler *node;
-fd_set rdfds, wrfds;
-int max_fd = -1;
-int ret;
-bool busy, progress;
-
-progress = false;
-
-/*
- * If there are callbacks left that have been queued, we need to call then.
- * Do not call select in this case, because it is possible that the caller
- * does not need a complete flush (as is the case for qemu_aio_wait loops).
- */
-if (aio_bh_poll(ctx)) {
-blocking = false;
-progress = true;
-}
+bool progress = false;
 
 /*
- * Then dispatch any pending callbacks from the GSource.
- *
  * We have to walk very carefully in case qemu_aio_set_fd_handler is
  * called while we're walking.
  */
@@ -167,11 +149,15 @@ bool aio_poll(AioContext *ctx, bool blocking)
 node->pfd.revents = 0;
 
 /* See comment in aio_pending.  */
-if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
+if (!node->deleted &&
+(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
+node->io_read) {
 node->io_read(node->opaque);
 progress = true;
 }
-if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
+if (!node->deleted &&
+(revents & (G_IO_OUT | G_IO_ERR)) &&
+node->io_write) {
 node->io_write(node->opaque);
 progress = true;
 }
@@ -186,6 +172,33 @@ bool aio_poll(AioContext *ctx, bool blocking)
 g_free(tmp);
 }
 }
+return progress;
+}
+
+bool aio_poll(AioContext *ctx, bool blocking)
+{
+static struct timeval tv0;
+AioHandler *node;
+fd_set rdfds, wrfds;
+int max_fd = -1;
+int ret;
+bool busy, progress;
+
+progress = false;
+
+/*
+ * If there are callbacks left that have been queued, we need to call them.
+ * Do not call select in this case, because it is possible that the caller
+ * does not need a complete flush (as is the case for qemu_aio_wait loops).
+ */
+if (aio_bh_poll(ctx)) {
+blocking = false;
+progress = true;
+}
+
+if (aio_dispatch(ctx)) {
+progress = true;
+}
 
 if (progress && !blocking) {
 return true;
-- 
1.8.1

Re: [Qemu-devel] [RFC PATCH RDMA support v1: 5/5] send memory over RDMA as blocks are iterated

2013-01-31 Thread Orit Wasserman

Hi Michael,
I maybe missing something here but why do you need a RAM_SAVE_FLAG_RDMA flag?
You don't do any decoding in the destination.

I would suggest creating a QEMUFileRDMA and moving the write/read code
You can either add a new rdma_buffer QEMUFileOps or add the address to 
put_buffer.

you also have some white space damage in the beginning of savevm.c.

Regards,
Orit

On 01/29/2013 12:01 AM, mrhi...@linux.vnet.ibm.com wrote
> From: "Michael R. Hines" 
> 
> 
> Signed-off-by: Michael R. Hines 
> ---
>  arch_init.c   |  116 
> +++--
>  include/migration/qemu-file.h |1 +
>  savevm.c  |   90 +++-
>  3 files changed, 189 insertions(+), 18 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index dada6de..7633fa6 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -42,6 +42,7 @@
>  #include "migration/migration.h"
>  #include "exec/gdbstub.h"
>  #include "hw/smbios.h"
> +#include "qemu/rdma.h"
>  #include "exec/address-spaces.h"
>  #include "hw/pcspk.h"
>  #include "migration/page_cache.h"
> @@ -113,6 +114,7 @@ const uint32_t arch_type = QEMU_ARCH;
>  #define RAM_SAVE_FLAG_EOS  0x10
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
> +#define RAM_SAVE_FLAG_RDMA 0x80
>  
>  #ifdef __ALTIVEC__
>  #include 
> @@ -434,6 +436,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>  int bytes_sent = 0;
>  MemoryRegion *mr;
>  ram_addr_t current_addr;
> +static int not_sent = 1;
>  
>  if (!block)
>  block = QTAILQ_FIRST(&ram_list.blocks);
> @@ -457,23 +460,75 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>  int cont = (block == last_sent_block) ?
>  RAM_SAVE_FLAG_CONTINUE : 0;
>  
> +current_addr = block->offset + offset;
>  p = memory_region_get_ram_ptr(mr) + offset;
>  
>  /* In doubt sent page as normal */
>  bytes_sent = -1;
> -if (is_dup_page(p)) {
> +
> +/*
> + * RFC RDMA: The empirical cost of searching for zero pages here
> + *   plus the cost of communicating with the other side
> + *   seems to take significantly more time than simply
> + *   dumping the page into remote memory.
> + */
> +if (!qemu_rdma_migration_enabled() && is_dup_page(p)) {
>  acct_info.dup_pages++;
>  bytes_sent = save_block_hdr(f, block, offset, cont,
>  RAM_SAVE_FLAG_COMPRESS);
>  qemu_put_byte(f, *p);
>  bytes_sent += 1;
> +/*
> + * RFC RDMA: Same comment as above. time(run-length encoding)
> + *   + time(communication) is too big. RDMA throughput 
> tanks
> + *   when this feature is enabled. But there's no need
> + *   to change the code since the feature is optional.
> + */
>  } else if (migrate_use_xbzrle()) {
> -current_addr = block->offset + offset;
>  bytes_sent = save_xbzrle_page(f, p, current_addr, block,
>offset, cont, last_stage);
>  if (!last_stage) {
>  p = get_cached_data(XBZRLE.cache, current_addr);
>  }
> +} else if (qemu_rdma_migration_enabled()) {
> +int ret;
> +
> +/*
> + * RFC RDMA: This bad hack was to cause the loop on the
> + *   receiving side to break. Comments are welcome
> + *   on how to get rid of it.
> + */
> +if (not_sent == 1) {
> +not_sent = 0;
> +bytes_sent = save_block_hdr(f, block, offset,
> +cont, RAM_SAVE_FLAG_RDMA);
> +}
> +acct_info.norm_pages++;
> +/*
> + * use RDMA to send page
> + */
> +if (qemu_rdma_migration_write(&rdma_mdata, current_addr,
> +TARGET_PAGE_SIZE)) {
> +fprintf(stderr, "rdma migration: write error!\n");
> +qemu_file_set_error(f, -EIO);
> +return 0;
> +}
> +
> +/*
> + * do some polling
> + */
> +while (1) {
> +ret = qemu_rdma_migration_poll(&rdma_mdata);
> +if (ret == QEMU_RDMA_MIGRATION_WRID_NONE) {
> +break;
> +}
> +if (ret < 0) {
> +fprintf(stderr, "rdma migration: polling error!\n");
> +qemu_file_set_error(f,

[Qemu-devel] [PATCH 04/11] main-loop: switch to g_poll() on POSIX hosts

2013-01-31 Thread Stefan Hajnoczi

Use g_poll(3) instead of select(2).  Well, this is kind of a cheat.
It's true that we're now using g_poll(3) on POSIX hosts but the *_fill()
and *_poll() functions are still using rfds/wfds/xfds.

We've set the scene to start converting *_fill() and *_poll() functions
step-by-step until no more rfds/wfds/xfds users remain.  Then we'll drop
poller_from_select() and poller_to_select() completely and be left with
native g_poll(2).

On Windows things are a little crazy: convert from rfds/wfds/xfds to
Poller, back to rfds/wfds/xfds, call select(2), rfds/wfds/xfds back to
Poller, and finally back to rfds/wfds/xfds again.  This is only
temporary and keeps the Windows build working through the following
patches.  We'll drop this excessive conversion later and be left with a
single Poller -> select(2) -> Poller sequence that allows Windows to use
select(2) while the rest of QEMU only knows about Poller.

Signed-off-by: Stefan Hajnoczi 
---
 main-loop.c | 81 +++--
 1 file changed, 73 insertions(+), 8 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index d0d8fe4..8d552d4 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -26,6 +26,7 @@
 #include "qemu/timer.h"
 #include "slirp/slirp.h"
 #include "qemu/main-loop.h"
+#include "qemu/poller.h"
 #include "block/aio.h"
 
 #ifndef _WIN32
@@ -140,12 +141,65 @@ int qemu_init_main_loop(void)
 return 0;
 }
 
+static Poller poller;
 static fd_set rfds, wfds, xfds;
 static int nfds;
 static GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
 static int n_poll_fds;
 static int max_priority;
 
+/* Load rfds/wfds/xfds into Poller.  Will be removed a few commits later. */
+static void poller_from_select(void)
+{
+int fd;
+for (fd = 0; fd <= nfds; fd++) {
+int events = 0;
+if (FD_ISSET(fd, &rfds)) {
+events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
+}
+if (FD_ISSET(fd, &wfds)) {
+events |= G_IO_OUT | G_IO_ERR;
+}
+if (FD_ISSET(fd, &xfds)) {
+events |= G_IO_PRI;
+}
+if (events) {
+poller_add_fd(&poller, fd, events);
+}
+}
+}
+
+/* Store Poller revents into rfds/wfds/xfds.  Will be removed a few commits
+ * later.
+ */
+static void poller_to_select(int ret)
+{
+int i;
+
+FD_ZERO(&rfds);
+FD_ZERO(&wfds);
+FD_ZERO(&xfds);
+
+if (ret <= 0) {
+return;
+}
+
+for (i = 0; i < poller.nfds; i++) {
+int fd = poller.poll_fds[i].fd;
+int revents = poller.poll_fds[i].revents;
+
+if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
+FD_SET(fd, &rfds);
+}
+if (revents & (G_IO_OUT | G_IO_ERR)) {
+FD_SET(fd, &wfds);
+}
+if (revents & G_IO_PRI) {
+FD_SET(fd, &xfds);
+}
+}
+}
+
 #ifndef _WIN32
 static void glib_select_fill(int *max_fd, fd_set *rfds, fd_set *wfds,
  fd_set *xfds, uint32_t *cur_timeout)
@@ -212,22 +266,22 @@ static void glib_select_poll(fd_set *rfds, fd_set *wfds, 
fd_set *xfds,
 
 static int os_host_main_loop_wait(uint32_t timeout)
 {
-struct timeval tv, *tvarg = NULL;
 int ret;
 
 glib_select_fill(&nfds, &rfds, &wfds, &xfds, &timeout);
 
-if (timeout < UINT32_MAX) {
-tvarg = &tv;
-tv.tv_sec = timeout / 1000;
-tv.tv_usec = (timeout % 1000) * 1000;
-}
-
 if (timeout > 0) {
 qemu_mutex_unlock_iothread();
 }
 
-ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg);
+/* We'll eventually drop fd_set completely.  But for now we still have
+ * *_fill() and *_poll() functions that use rfds/wfds/xfds.
+ */
+poller_from_select();
+
+ret = g_poll(poller.poll_fds, poller.nfds, timeout);
+
+poller_to_select(ret);
 
 if (timeout > 0) {
 qemu_mutex_lock_iothread();
@@ -382,12 +436,22 @@ static int os_host_main_loop_wait(uint32_t timeout)
  * improve socket latency.
  */
 
+/* This back-and-forth between Poller and select(2) is temporary.  We'll
+ * drop it in a couple of patches, I promise :).
+ */
+poller_from_select();
+FD_ZERO(&rfds);
+FD_ZERO(&wfds);
+FD_ZERO(&xfds);
+nfds = poller_fill(&poller, &rfds, &wfds, &xfds);
 if (nfds >= 0) {
 select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
 if (select_ret != 0) {
 timeout = 0;
+poller_poll(&poller, nfds, &rfds, &wfds, &xfds);
 }
 }
+poller_to_select(select_ret || g_poll_ret);
 
 return select_ret || g_poll_ret;
 }
@@ -403,6 +467,7 @@ int main_loop_wait(int nonblocking)
 }
 
 /* poll any events */
+poller_reset(&poller);
 /* XXX: separate device handlers from system ones */
 nfds = -1;
 FD_ZERO(&rfds);
-- 
1.8.1

Re: [Qemu-devel] [PATCH WIP 0/4] vhost-scsi: new device supporting the tcm_vhost Linux kernel module

2013-01-31 Thread Michael S. Tsirkin

On Wed, Jan 30, 2013 at 05:41:22PM +0100, Paolo Bonzini wrote:
> Ok, so here is my attempt at a vhost-scsi device.  I'm creating an
> entirely separate device, with the common parts of virtio-scsi and
> vhost-scsi (actually little more than the initialization) grouped into
> a VirtIOSCSICommon type.  The device is used simply like "-device
> vhost-scsi-pci,wwpn=WWPN", with all configuration done in configfs
> beforehand.
> 
> As expected, using a separate device finds a snag: vhost-scsi is passing
> force=false to vhost_dev_init, and the BIOS does not use MSI-X so it
> will actually use the non-vhost implementation which is wrong.  I fixed
> this by passing force=true; I'm not sure what that would break, but I
> figured "not much" since the BIOS polls and does not rely on interrupts.
> 
> That makes vhost start, but it still doesn't work for me with a 3.7.2
> kernel on the host.  Even Nick's patches hang the guest as soon as vhost
> starts, and I get the same behavior with mine.  (Of course with my
> patches the BIOS hangs and you never reach Linux; but try a BIOS without
> virtio-scsi support, and you'll see Linux hanging in the same way).
> 
> Here is my configuration:
> 
>   cd /sys/kernel/config/target
>   mkdir -p core/fileio_0/fileio
>   echo 'fd_dev_name=/home/pbonzini/test.img,fd_dev_size=5905580032' > 
> core/fileio_0/fileio/control 
>   echo 1 > core/fileio_0/fileio/enable
>   mkdir -p vhost/naa.600140554cf3a18e/tpgt_0/lun/lun_0
>   cd vhost/naa.600140554cf3a18e/tpgt_0
>   ln -sf ../../../../../core/fileio_0/fileio/ lun/lun_0/virtual_scsi_port
>   echo naa.60014053226f0388 > nexus
> 
> Nick's patches are run with "-vhost-scsi 
> id=vs,tpgt=0,wwpn=naa.600140554cf3a18e
> -device virtio-scsi-pci,vhost-scsi=vs".  Perhaps I'm doing something wrong.
> 
> Another small bug I found is an ordering problem between
> VHOST_SET_VRING_KICK and VHOST_SCSI_SET_ENDPOINT.  Starting the vq
> causes a "vhost_scsi_handle_vq endpoint not set" error in dmesg.
> Because of this I added the first two patches, which let me do
> VHOST_SCSI_SET_ENDPOINT before VHOST_SET_VRING_KICK but after setting
> up the vring.
> 
> Unfortunately, this is not enough to fix the hang.  And anyway, it's
> probably simpler to avoid the two patches and remove this test from the
> tcm_vhost.c vhost_scsi_set/clear_endpoint functions:
> 
> mutex_lock(&vs->dev.mutex);
> /* Verify that ring has been setup correctly. */
> for (index = 0; index < vs->dev.nvqs; ++index) {
> /* Verify that ring has been setup correctly. */
> if (!vhost_vq_access_ok(&vs->vqs[index])) {
> mutex_unlock(&vs->dev.mutex);
> return -EFAULT;
> }
> }
> mutex_unlock(&vs->dev.mutex);

Well userspace should initialize the kick eventfd to 0,
it seems to init it to 1 which is why we get the error.
But I think the only issue is pr_err: vhost-net already
ignores such a kick with no backend. So let's just
remove it, preferably for 3.8.

--->
tcm_vhost: fix pr_err on early kick

It's OK to get kick before backend is set or after
it is cleared, we can just ignore it.

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index b20df5c..22321cf 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -575,10 +575,8 @@ static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
 
/* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
tv_tpg = vs->vs_tpg;
-   if (unlikely(!tv_tpg)) {
-   pr_err("%s endpoint not set\n", __func__);
+   if (unlikely(!tv_tpg))
return;
-   }
 
mutex_lock(&vq->mutex);
vhost_disable_notify(&vs->dev, vq);

Re: [Qemu-devel] [PATCH 6/8] linux-user: Rewrite __get_user/__put_user with __builtin_choose_expr

2013-01-31 Thread Peter Maydell

On 23 January 2013 18:31, Laurent Desnogues  wrote:
> On Sat, Jan 5, 2013 at 1:39 AM, Richard Henderson  wrote:
>> +#define __get_user_e(x, hptr, e)\
>> +  ((x) =\
>> +   __builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p,  \
>> +   __builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,\
>> +   __builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p, \
>> +   __builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p, abort   \
>> + (hptr), 0)
>
> For 8- and 16-bit quantities the load is explicitly unsigned
> through the use of ldub and lduw.  But for 32-bit, ldl_[bl]e_p
> return an int, so if x is a 64-bit variable sign-extension will
> happen.  I'm not sure this is desirable, for instance when
> using get_user_u32 which makes one think the result is an
> unsigned 32-bit value.  Shouldn't ldul*_p functions be added
> and used in __get_user_e?
>
> Note I found this in private code, but wonder if some public
> code isn't affected by this.

I just did an audit of all the uses of get_user_u32 in the codebase
and I think the only one that runs into this (ie does get_user_u32
into a variable which is 64 bits wide) is the PPC do_store_exclusive()
in linux-user/main.c. So probably this patch broke PPC64 linux-user
32 bit exclusive stores.

-- PMM

Re: [Qemu-devel] [PATCH 1/4] trace: Fix simple trace dropped event record for big endian

2013-01-31 Thread Laszlo Ersek

comments in-line

On 01/25/13 16:43, Markus Armbruster wrote:
> We use atomic operations to keep track of dropped events.
> 
> Inconveniently, GLib supports only int and void * atomics, but the
> counter dropped_events is uint64_t.

That's too bad. Instead of (or in addition to) int, glib should target
the (biggest) natural word size, ie. the integer whose size equals that
of "void *". size_t or uintptr_t looks like a good choice.

> Can't stop commit 62bab732: a
> quick (gint *)&dropped_events bludgeons the compiler into submission.
> 
> That cast is okay only when int is exactly 64 bits wide, which it
> commonly isn't.
> 
> If int is even wider, we clobber whatever follows dropped_events.  Not
> worth worrying about, as none of the machines that interest us have
> such morbidly obese ints.
> 
> That leaves the common case: int narrower than 64 bits.
> 
> Harmless on little endian hosts: we just don't access the most
> significant bits of dropped_events.  They remain zero.
> 
> On big endian hosts, we use only the most significant bits of
> dropped_events as counter.  The least significant bits remain zero.
> However, we write out the full value, which is the correct counter
> shifted left a bunch of places.
> 
> Fix by changing the variables involved to int.
> 
> There's another, equally suspicious-looking (gint *)&trace_idx
> argument to g_atomic_int_compare_and_exchange(), but that one casts
> unsigned *, so it's okay.  But it's also superfluous, because GLib's
> atomic int operations work just fine for unsigned.  Drop it.

Agree with "OK", disagree with "superfluous".

In the intersection of signed int's and unsigned int's domains, the
object representation is indeed required to be the same. But that
doesn't make "signed int" a type compatible with "unsigned int".

Since we have a prototype for g_atomic_int_compare_and_exchange(), "the
arguments are implicitly converted, as if by assignment"; however, for
the pointer-to-pointer assignment here, the argument-ptr and the
parameter-ptr would have to point to compatible types, which "signed
int" and "unsigned int" are not.

gcc's "-pedantic" emits

warning: pointer targets in passing argument N of 'F' differ in signedness

Anyway I don't insist reinstating the cast, I'm just saying that ISO C
would require it.


> 
> Signed-off-by: Markus Armbruster 
> ---
>  trace/simple.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/trace/simple.c b/trace/simple.c
> index ce17d64..ccbdb6a 100644
> --- a/trace/simple.c
> +++ b/trace/simple.c
> @@ -53,7 +53,7 @@ enum {
>  uint8_t trace_buf[TRACE_BUF_LEN];
>  static unsigned int trace_idx;
>  static unsigned int writeout_idx;
> -static uint64_t dropped_events;
> +static int dropped_events;
>  static FILE *trace_fp;
>  static char *trace_file_name;
>  
> @@ -63,7 +63,7 @@ typedef struct {
>  uint64_t timestamp_ns;
>  uint32_t length;   /*in bytes */
>  uint32_t reserved; /*unused */
> -uint8_t arguments[];
> +uint64_t arguments[];
>  } TraceRecord;
>  
>  typedef struct {
> @@ -160,7 +160,7 @@ static gpointer writeout_thread(gpointer opaque)
>  uint8_t bytes[sizeof(TraceRecord) + sizeof(uint64_t)];
>  } dropped;
>  unsigned int idx = 0;
> -uint64_t dropped_count;
> +int dropped_count;
>  size_t unused __attribute__ ((unused));
>  
>  for (;;) {
> @@ -169,16 +169,16 @@ static gpointer writeout_thread(gpointer opaque)
>  if (dropped_events) {
>  dropped.rec.event = DROPPED_EVENT_ID,
>  dropped.rec.timestamp_ns = get_clock();
> -dropped.rec.length = sizeof(TraceRecord) + 
> sizeof(dropped_events),
> +dropped.rec.length = sizeof(TraceRecord) + sizeof(uint64_t),
>  dropped.rec.reserved = 0;
>  while (1) {
>  dropped_count = dropped_events;
> -if (g_atomic_int_compare_and_exchange((gint 
> *)&dropped_events,
> +if (g_atomic_int_compare_and_exchange(&dropped_events,
>dropped_count, 0)) {
>  break;
>  }
>  }
> -memcpy(dropped.rec.arguments, &dropped_count, sizeof(uint64_t));
> +dropped.rec.arguments[0] = dropped_count;

I *think* I'd prefer if the element type of TraceRecord.arguments[] were
not changed; an array of u8's seems to be more flexible. The element
type change appears both unrelated and not really necessary for this
patch. (You'd have to keep the memcpy(), but it doesn't hurt.)

BTW I wonder if trace files are endianness-dependent by design --
normally you'd store them in network byte order (= big endian) only and
use htonX() when writing, an ntohX() when reading.


>  unused = fwrite(&dropped.rec, dropped.rec.length, 1, trace_fp);
>  }
>  
> @@ -220,11 +220,11 @@ int trace_record_start(TraceBufferRecord *rec, 
> TraceEventID event, size_t datasi

Re: [Qemu-devel] How many msi-x vectors should be allocated for the virtio-serial device?

2013-01-31 Thread Michael S. Tsirkin

On Thu, Jan 31, 2013 at 10:13:11AM +0200, Gal Hammer wrote:
> Hi,
> 
> How many msi-x vectors should be allocated for the virtio-serial device?
> 
> I'm asking this as it seems that a proposed patch
> (http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg02094.html)
> was not accepted and I re-encountered this issue while trying to
> upgrade the virtio-serial's Windows driver to use msi-x vectors.
> 
> The virtio-serial's Linux module tries to use one vector per
> virtqueue (every serial port have two virtqueues) with a fall back
> to using only two vectors
> (http://lxr.linux.no/linux+v3.7.2/drivers/virtio/virtio_pci.c#L539).
> The problem is that qemu's virtio-pci device allocate less vectors
> than the modules expects. So, for example, if a serial device have
> 16 ports, 17 vectors are allocated. The module tries to use 34
> vectors, fails and choose to use only 2, leaving 15 unused vectors.
> 
> Is it possible to increase the vectors number from
> "proxy->serial.max_virtserial_ports + 1" to
> "(proxy->serial.max_virtserial_ports + 1) * 2"?
> 
> Thanks,
> 
> Gal.

Allocated MSI-X vectors on x86 are a limited resource.  Let's not waste
them. From what you say virtio serial works fine with two
vectors and I do not see any reason to let us use more
until someone can show an important workload where this helps.

So I think we need something like the below, but we also
need to handle 1.3 and older compatibility so the
patch below shouldn't be applied. Want to try
writing up the complete patch?

-->

serial: use 2 vectors by default

Guests only utilize 2 vectors and this seems to be enough,
so let's only allocate as much.
serial is likely not so performance intensive to require more,
but if yes users can always override the nvectors property.

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 9abbcdf..bb0f60e 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -975,8 +975,7 @@ static int virtio_serial_init_pci(PCIDevice *pci_dev)
 if (!vdev) {
 return -1;
 }
-vdev->nvectors = proxy->nvectors == DEV_NVECTORS_UNSPECIFIED
-? proxy->serial.max_virtserial_ports + 
1
+vdev->nvectors = proxy->nvectors == DEV_NVECTORS_UNSPECIFIED ? 2
 : proxy->nvectors;
 virtio_init_pci(proxy, vdev);
 proxy->nvectors = vdev->nvectors;
-- 
MST

Re: [Qemu-devel] [PATCH 2/4] trace: Direct access of atomics is verboten, use the API

2013-01-31 Thread Laszlo Ersek

On 01/25/13 16:43, Markus Armbruster wrote:
> The GLib Reference Manual says:
> 
> It is very important that all accesses to a particular integer or
> pointer be performed using only this API and that different sizes
> of operation are not mixed or used on overlapping memory
> regions. Never read or assign directly from or to a value --
> always use this API.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  trace/simple.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)

Nothing beats a patch summary with a word that English loans from the
submitter's mother tongue! :) Extra points for the Vau.

Reviewed-by: Laszlo Ersek

Re: [Qemu-devel] [PATCH 3/4] trace: Clean up the "try to update atomic until it worked" loops

2013-01-31 Thread Laszlo Ersek

On 01/25/13 16:43, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster 
> ---
>  trace/simple.c | 18 +-
>  1 file changed, 5 insertions(+), 13 deletions(-)

Reviewed-by: Laszlo Ersek

[Qemu-devel] [PATCH 03/11] poller: add poller_fill() and poller_poll()

2013-01-31 Thread Stefan Hajnoczi

The Windows event loop cannot be converted entirely to g_poll(3).  It
will still need select(2) so add functions that convert between Poller
and rfds/wfds/xfds.  This way most code only needs to be aware of Poller
but it will still be possible to invoke select(2).

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/poller.h | 12 
 util/poller.c | 46 ++
 2 files changed, 58 insertions(+)

diff --git a/include/qemu/poller.h b/include/qemu/poller.h
index 7630099..39e07d2 100644
--- a/include/qemu/poller.h
+++ b/include/qemu/poller.h
@@ -53,4 +53,16 @@ int poller_add_fd(Poller *p, int fd, int events);
  */
 int poller_get_revents(Poller *p, int index);
 
+/**
+ * poller_fill: Fill fd_sets from a Poller
+ *
+ * Return the maximum file descriptor number.
+ */
+int poller_fill(Poller *p, fd_set *rfds, fd_set *wfds, fd_set *xfds);
+
+/**
+ * poller_poll: Bitwise OR fd_set results into Poller revents
+ */
+void poller_poll(Poller *p, int nfds, fd_set *rfds, fd_set *wfds, fd_set 
*xfds);
+
 #endif /* QEMU_POLLER_H */
diff --git a/util/poller.c b/util/poller.c
index 663e7a9..41e9b96 100644
--- a/util/poller.c
+++ b/util/poller.c
@@ -52,3 +52,49 @@ int poller_get_revents(Poller *p, int index)
 assert(index < p->nfds);
 return p->poll_fds[index].revents;
 }
+
+int poller_fill(Poller *p, fd_set *rfds, fd_set *wfds, fd_set *xfds)
+{
+int nfds = -1;
+int i;
+
+for (i = 0; i < p->nfds; i++) {
+int fd = p->poll_fds[i].fd;
+int events = p->poll_fds[i].events;
+if (events & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
+FD_SET(fd, rfds);
+nfds = MAX(nfds, fd);
+}
+if (events & (G_IO_OUT | G_IO_ERR)) {
+FD_SET(fd, wfds);
+nfds = MAX(nfds, fd);
+}
+if (events & G_IO_PRI) {
+FD_SET(fd, xfds);
+nfds = MAX(nfds, fd);
+}
+}
+return nfds;
+}
+
+void poller_poll(Poller *p, int nfds, fd_set *rfds, fd_set *wfds, fd_set *xfds)
+{
+int i;
+
+for (i = 0; i < p->nfds; i++) {
+int fd = p->poll_fds[i].fd;
+int revents = 0;
+
+if (FD_ISSET(fd, rfds)) {
+revents |= G_IO_IN | G_IO_HUP | G_IO_ERR;
+}
+if (FD_ISSET(fd, wfds)) {
+revents |= G_IO_OUT | G_IO_ERR;
+}
+if (FD_ISSET(fd, xfds)) {
+revents |= G_IO_PRI;
+}
+revents &= p->poll_fds[i].events;
+p->poll_fds[i].revents |= revents;
+}
+}
-- 
1.8.1

Re: [Qemu-devel] [PATCH 4/4] trace: Fix location of simpletrace.py in docs

2013-01-31 Thread Laszlo Ersek

On 01/25/13 16:43, Markus Armbruster wrote:
> Missed when commit 4c3b5a48 moved it.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  docs/tracing.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Laszlo Ersek

Re: [Qemu-devel] [kvmarm] [RFC v5 7/8] hw/kvm/arm_gic: Implement support for KVM in-kernel ARM GIC

2013-01-31 Thread Peter Maydell

On 31 January 2013 10:54, Andreas Färber  wrote:
> Am 31.01.2013 11:52, schrieb KONRAD Frédéric:
>>> +const char *gictype = "arm-gic";
>> s/arm-gic/arm_gic/ ^^ ?
>>
>> Christoffer and I had trouble with that:
>>
>> qemu-system-arm: Unknown device 'arm-gic' for default sysbus

Oops, nice catch.

> Since you already ran into issues here, even better would be to use a
> TYPE_ARM_GIC constant or so.

Hmm, I kind of agree, but QOM idiom doesn't seem to encourage
having that define be publicly visible. Should we have a
hw/my_device.h [with the public bits like the TYPE_ and
FOO_CLASS/FOO_GET_CLASS macros] for every type? (and a
hw/my_device_priv.h if needed]

-- PMM

Re: [Qemu-devel] [PATCH] Fix monitor 'info registers' command on multi processor guests

2013-01-31 Thread Luiz Capitulino

On Thu, 31 Jan 2013 10:51:10 +0100
Markus Armbruster  wrote:

> "Erlon Cruz"  writes:
> 
> > QEMU monitor command 'info registers' only displays information for the 
> > first
> > CPU. This fix that by show registers information for each CPU in the system
> 
> This is incorrect.  It displays information for the *current* CPU.
> Monitor command "cpu" selects the current CPU.
> 
> I doubt we want to change this command.  But I'm reviewing it regardless
> of my doubts.

I think a better way of doing this would be to add two new optional
arguments to 'info registers': -i (cpu index) and -a (print all CPUs). By
default the command would do what it does today.

[Qemu-devel] [PATCH 05/11] main-loop: switch POSIX glib integration to Poller

2013-01-31 Thread Stefan Hajnoczi

Convert glib file descriptor polling from rfds/wfds/xfds to Poller.

The Windows code still needs poll_fds[] and n_poll_fds but they can now
become local variables.

Signed-off-by: Stefan Hajnoczi 
---
 main-loop.c | 75 -
 1 file changed, 29 insertions(+), 46 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index 8d552d4..c48c8f5 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -144,8 +144,6 @@ int qemu_init_main_loop(void)
 static Poller poller;
 static fd_set rfds, wfds, xfds;
 static int nfds;
-static GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
-static int n_poll_fds;
 static int max_priority;
 
 /* Load rfds/wfds/xfds into Poller.  Will be removed a few commits later. */
@@ -201,34 +199,35 @@ static void poller_to_select(int ret)
 }
 
 #ifndef _WIN32
-static void glib_select_fill(int *max_fd, fd_set *rfds, fd_set *wfds,
- fd_set *xfds, uint32_t *cur_timeout)
+static int glib_poller_idx;
+static int glib_n_poll_fds;
+
+static void glib_poller_fill(uint32_t *cur_timeout)
 {
 GMainContext *context = g_main_context_default();
-int i;
 int timeout = 0;
+int n, n_fds;
 
 g_main_context_prepare(context, &max_priority);
 
-n_poll_fds = g_main_context_query(context, max_priority, &timeout,
-  poll_fds, ARRAY_SIZE(poll_fds));
-g_assert(n_poll_fds <= ARRAY_SIZE(poll_fds));
+n_fds = poller.max - poller.nfds;
+while ((n = g_main_context_query(context, max_priority, &timeout,
+ &poller.poll_fds[poller.nfds],
+ n_fds)) > n_fds) {
+poller.poll_fds = g_realloc(poller.poll_fds,
+(poller.nfds + n) *
+sizeof(poller.poll_fds[0]));
+n_fds = n;
+}
 
-for (i = 0; i < n_poll_fds; i++) {
-GPollFD *p = &poll_fds[i];
+/* Stash away for later */
+glib_poller_idx = poller.nfds;
+glib_n_poll_fds = n;
 
-if ((p->events & G_IO_IN)) {
-FD_SET(p->fd, rfds);
-*max_fd = MAX(*max_fd, p->fd);
-}
-if ((p->events & G_IO_OUT)) {
-FD_SET(p->fd, wfds);
-*max_fd = MAX(*max_fd, p->fd);
-}
-if ((p->events & G_IO_ERR)) {
-FD_SET(p->fd, xfds);
-*max_fd = MAX(*max_fd, p->fd);
-}
+/* Update poller book-keeping */
+poller.nfds += n;
+if (poller.nfds > poller.max) {
+poller.max = poller.nfds;
 }
 
 if (timeout >= 0 && timeout < *cur_timeout) {
@@ -236,30 +235,13 @@ static void glib_select_fill(int *max_fd, fd_set *rfds, 
fd_set *wfds,
 }
 }
 
-static void glib_select_poll(fd_set *rfds, fd_set *wfds, fd_set *xfds,
- bool err)
+static void glib_poller_poll(void)
 {
 GMainContext *context = g_main_context_default();
 
-if (!err) {
-int i;
-
-for (i = 0; i < n_poll_fds; i++) {
-GPollFD *p = &poll_fds[i];
-
-if ((p->events & G_IO_IN) && FD_ISSET(p->fd, rfds)) {
-p->revents |= G_IO_IN;
-}
-if ((p->events & G_IO_OUT) && FD_ISSET(p->fd, wfds)) {
-p->revents |= G_IO_OUT;
-}
-if ((p->events & G_IO_ERR) && FD_ISSET(p->fd, xfds)) {
-p->revents |= G_IO_ERR;
-}
-}
-}
-
-if (g_main_context_check(context, max_priority, poll_fds, n_poll_fds)) {
+if (g_main_context_check(context, max_priority,
+ &poller.poll_fds[glib_poller_idx],
+ glib_n_poll_fds)) {
 g_main_context_dispatch(context);
 }
 }
@@ -268,7 +250,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
 {
 int ret;
 
-glib_select_fill(&nfds, &rfds, &wfds, &xfds, &timeout);
+glib_poller_fill(&timeout);
 
 if (timeout > 0) {
 qemu_mutex_unlock_iothread();
@@ -287,7 +269,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
 qemu_mutex_lock_iothread();
 }
 
-glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
+glib_poller_poll();
 return ret;
 }
 #else
@@ -384,8 +366,9 @@ void qemu_fd_register(int fd)
 static int os_host_main_loop_wait(uint32_t timeout)
 {
 GMainContext *context = g_main_context_default();
+GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
 int select_ret = 0;
-int g_poll_ret, ret, i;
+int g_poll_ret, ret, i, n_poll_fds;
 PollingEntry *pe;
 WaitObjects *w = &wait_objects;
 gint poll_timeout;
-- 
1.8.1

[Qemu-devel] [PATCH 10/11] aio: convert aio_poll() to g_poll(3)

2013-01-31 Thread Stefan Hajnoczi

AioHandler already has a GPollFD so we can directly use its
events/revents.

Add the int poller_idx field to AioContext so we can map g_poll(3)
results back to AioHandlers.

Reuse aio_dispatch() to invoke handlers after g_poll(3).

Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 62 -
 async.c |  2 ++
 include/block/aio.h |  4 
 3 files changed, 25 insertions(+), 43 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 35131a3..303e19f 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -25,6 +25,7 @@ struct AioHandler
 IOHandler *io_write;
 AioFlushHandler *io_flush;
 int deleted;
+int poller_idx;
 void *opaque;
 QLIST_ENTRY(AioHandler) node;
 };
@@ -85,6 +86,7 @@ void aio_set_fd_handler(AioContext *ctx,
 node->io_write = io_write;
 node->io_flush = io_flush;
 node->opaque = opaque;
+node->poller_idx = -1;
 
 node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP : 0);
 node->pfd.events |= (io_write ? G_IO_OUT : 0);
@@ -177,10 +179,7 @@ static bool aio_dispatch(AioContext *ctx)
 
 bool aio_poll(AioContext *ctx, bool blocking)
 {
-static struct timeval tv0;
 AioHandler *node;
-fd_set rdfds, wrfds;
-int max_fd = -1;
 int ret;
 bool busy, progress;
 
@@ -206,12 +205,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 ctx->walking_handlers++;
 
-FD_ZERO(&rdfds);
-FD_ZERO(&wrfds);
+poller_reset(&ctx->poller);
 
-/* fill fd sets */
+/* fill poller */
 busy = false;
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+node->poller_idx = -1;
+
 /* If there aren't pending AIO operations, don't invoke callbacks.
  * Otherwise, if there are no AIO requests, qemu_aio_wait() would
  * wait indefinitely.
@@ -222,13 +222,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
 }
 busy = true;
 }
-if (!node->deleted && node->io_read) {
-FD_SET(node->pfd.fd, &rdfds);
-max_fd = MAX(max_fd, node->pfd.fd + 1);
-}
-if (!node->deleted && node->io_write) {
-FD_SET(node->pfd.fd, &wrfds);
-max_fd = MAX(max_fd, node->pfd.fd + 1);
+if (!node->deleted && node->pfd.events) {
+node->poller_idx = poller_add_fd(&ctx->poller, node->pfd.fd,
+ node->pfd.events);
 }
 }
 
@@ -240,41 +236,21 @@ bool aio_poll(AioContext *ctx, bool blocking)
 }
 
 /* wait until next event */
-ret = select(max_fd, &rdfds, &wrfds, NULL, blocking ? NULL : &tv0);
+ret = g_poll(ctx->poller.poll_fds,
+ ctx->poller.nfds,
+ blocking ? -1 : 0);
 
 /* if we have any readable fds, dispatch event */
 if (ret > 0) {
-/* we have to walk very carefully in case
- * qemu_aio_set_fd_handler is called while we're walking */
-node = QLIST_FIRST(&ctx->aio_handlers);
-while (node) {
-AioHandler *tmp;
-
-ctx->walking_handlers++;
-
-if (!node->deleted &&
-FD_ISSET(node->pfd.fd, &rdfds) &&
-node->io_read) {
-node->io_read(node->opaque);
-progress = true;
-}
-if (!node->deleted &&
-FD_ISSET(node->pfd.fd, &wrfds) &&
-node->io_write) {
-node->io_write(node->opaque);
-progress = true;
-}
-
-tmp = node;
-node = QLIST_NEXT(node, node);
-
-ctx->walking_handlers--;
-
-if (!ctx->walking_handlers && tmp->deleted) {
-QLIST_REMOVE(tmp, node);
-g_free(tmp);
+QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+if (node->poller_idx != -1) {
+node->pfd.revents |=
+poller_get_revents(&ctx->poller, node->poller_idx);
 }
 }
+if (aio_dispatch(ctx)) {
+progress = true;
+}
 }
 
 assert(progress || busy);
diff --git a/async.c b/async.c
index 72d268a..1cd1b89 100644
--- a/async.c
+++ b/async.c
@@ -174,6 +174,7 @@ aio_ctx_finalize(GSource *source)
 
 aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
 event_notifier_cleanup(&ctx->notifier);
+poller_cleanup(&ctx->poller);
 }
 
 static GSourceFuncs aio_source_funcs = {
@@ -198,6 +199,7 @@ AioContext *aio_context_new(void)
 {
 AioContext *ctx;
 ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+poller_init(&ctx->poller);
 event_notifier_init(&ctx->notifier, false);
 aio_set_event_notifier(ctx, &ctx->notifier, 
(EventNotifierHandler *)
diff --git a/include/block/aio.h b/include/block/aio.h
index 8eda924..45ff047 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -17,6 +17,7 @@
 #include "qemu-co

[Qemu-devel] [PATCH 01/11] main-loop: fix select_ret uninitialized variable warning

2013-01-31 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 main-loop.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/main-loop.c b/main-loop.c
index 6f52ac3..d0d8fe4 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -330,7 +330,8 @@ void qemu_fd_register(int fd)
 static int os_host_main_loop_wait(uint32_t timeout)
 {
 GMainContext *context = g_main_context_default();
-int select_ret, g_poll_ret, ret, i;
+int select_ret = 0;
+int g_poll_ret, ret, i;
 PollingEntry *pe;
 WaitObjects *w = &wait_objects;
 gint poll_timeout;
-- 
1.8.1

Re: [Qemu-devel] [PATCH 1/4] trace: Fix simple trace dropped event record for big endian

2013-01-31 Thread Harsh Bora


On 01/25/2013 09:13 PM, Markus Armbruster wrote:

We use atomic operations to keep track of dropped events.

Inconveniently, GLib supports only int and void * atomics, but the
counter dropped_events is uint64_t.  Can't stop commit 62bab732: a
quick (gint *)&dropped_events bludgeons the compiler into submission.

That cast is okay only when int is exactly 64 bits wide, which it
commonly isn't.

If int is even wider, we clobber whatever follows dropped_events.  Not
worth worrying about, as none of the machines that interest us have
such morbidly obese ints.

That leaves the common case: int narrower than 64 bits.

Harmless on little endian hosts: we just don't access the most
significant bits of dropped_events.  They remain zero.

On big endian hosts, we use only the most significant bits of
dropped_events as counter.  The least significant bits remain zero.
However, we write out the full value, which is the correct counter
shifted left a bunch of places.

Fix by changing the variables involved to int.

There's another, equally suspicious-looking (gint *)&trace_idx
argument to g_atomic_int_compare_and_exchange(), but that one casts
unsigned *, so it's okay.  But it's also superfluous, because GLib's
atomic int operations work just fine for unsigned.  Drop it.

Signed-off-by: Markus Armbruster 
---
  trace/simple.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index ce17d64..ccbdb6a 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -53,7 +53,7 @@ enum {
  uint8_t trace_buf[TRACE_BUF_LEN];
  static unsigned int trace_idx;
  static unsigned int writeout_idx;
-static uint64_t dropped_events;
+static int dropped_events;
  static FILE *trace_fp;
  static char *trace_file_name;

@@ -63,7 +63,7 @@ typedef struct {
  uint64_t timestamp_ns;
  uint32_t length;   /*in bytes */
  uint32_t reserved; /*unused */
-uint8_t arguments[];
+uint64_t arguments[];


I am just paranoid about this change, however if the patch has gone 
through a stress testing, I have no objections.



  } TraceRecord;

  typedef struct {
@@ -160,7 +160,7 @@ static gpointer writeout_thread(gpointer opaque)
  uint8_t bytes[sizeof(TraceRecord) + sizeof(uint64_t)];
  } dropped;
  unsigned int idx = 0;
-uint64_t dropped_count;
+int dropped_count;


Just because of Glib's limitations, we are limiting our counting 
ability, that's bad.



  size_t unused __attribute__ ((unused));

  for (;;) {
@@ -169,16 +169,16 @@ static gpointer writeout_thread(gpointer opaque)
  if (dropped_events) {
  dropped.rec.event = DROPPED_EVENT_ID,
  dropped.rec.timestamp_ns = get_clock();
-dropped.rec.length = sizeof(TraceRecord) + sizeof(dropped_events),
+dropped.rec.length = sizeof(TraceRecord) + sizeof(uint64_t),
  dropped.rec.reserved = 0;
  while (1) {
  dropped_count = dropped_events;
-if (g_atomic_int_compare_and_exchange((gint *)&dropped_events,
+if (g_atomic_int_compare_and_exchange(&dropped_events,
dropped_count, 0)) {
  break;
  }
  }
-memcpy(dropped.rec.arguments, &dropped_count, sizeof(uint64_t));
+dropped.rec.arguments[0] = dropped_count;


I like the above improvement for the price paid.

Lets see what Stefan has to say about this patch.

thanks,
Harsh


  unused = fwrite(&dropped.rec, dropped.rec.length, 1, trace_fp);
  }

@@ -220,11 +220,11 @@ int trace_record_start(TraceBufferRecord *rec, 
TraceEventID event, size_t datasi

  if (new_idx - writeout_idx > TRACE_BUF_LEN) {
  /* Trace Buffer Full, Event dropped ! */
-g_atomic_int_inc((gint *)&dropped_events);
+g_atomic_int_inc(&dropped_events);
  return -ENOSPC;
  }

-if (g_atomic_int_compare_and_exchange((gint *)&trace_idx,
+if (g_atomic_int_compare_and_exchange(&trace_idx,
old_idx, new_idx)) {
  break;
  }

Re: [Qemu-devel] [PATCH 2/4] trace: Direct access of atomics is verboten, use the API

2013-01-31 Thread Harsh Bora


On 01/25/2013 09:13 PM, Markus Armbruster wrote:

The GLib Reference Manual says:

 It is very important that all accesses to a particular integer or
 pointer be performed using only this API and that different sizes
 of operation are not mixed or used on overlapping memory
 regions. Never read or assign directly from or to a value --
 always use this API.

Signed-off-by: Markus Armbruster 


Reviewed-by: Harsh Prateek Bora 


---
  trace/simple.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index ccbdb6a..592ff48 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -166,13 +166,13 @@ static gpointer writeout_thread(gpointer opaque)
  for (;;) {
  wait_for_trace_records_available();

-if (dropped_events) {
+if (g_atomic_int_get(&dropped_events)) {
  dropped.rec.event = DROPPED_EVENT_ID,
  dropped.rec.timestamp_ns = get_clock();
  dropped.rec.length = sizeof(TraceRecord) + sizeof(uint64_t),
  dropped.rec.reserved = 0;
  while (1) {
-dropped_count = dropped_events;
+dropped_count = g_atomic_int_get(&dropped_events);
  if (g_atomic_int_compare_and_exchange(&dropped_events,
dropped_count, 0)) {
  break;
@@ -214,7 +214,7 @@ int trace_record_start(TraceBufferRecord *rec, TraceEventID 
event, size_t datasi
  uint64_t timestamp_ns = get_clock();

  while (1) {
-old_idx = trace_idx;
+old_idx = g_atomic_int_get(&trace_idx);
  smp_rmb();
  new_idx = old_idx + rec_len;

@@ -275,7 +275,8 @@ void trace_record_finish(TraceBufferRecord *rec)
  record.event |= TRACE_RECORD_VALID;
  write_to_buffer(rec->tbuf_idx, &record, sizeof(TraceRecord));

-if ((trace_idx - writeout_idx) > TRACE_BUF_FLUSH_THRESHOLD) {
+if ((g_atomic_int_get(&trace_idx) - writeout_idx)
+> TRACE_BUF_FLUSH_THRESHOLD) {
  flush_trace_file(false);
  }
  }

Re: [Qemu-devel] [PATCH 1/4] trace: Fix simple trace dropped event record for big endian

2013-01-31 Thread Markus Armbruster

Laszlo Ersek  writes:

> comments in-line
>
> On 01/25/13 16:43, Markus Armbruster wrote:
>> We use atomic operations to keep track of dropped events.
>> 
>> Inconveniently, GLib supports only int and void * atomics, but the
>> counter dropped_events is uint64_t.
>
> That's too bad. Instead of (or in addition to) int, glib should target
> the (biggest) natural word size, ie. the integer whose size equals that
> of "void *". size_t or uintptr_t looks like a good choice.
>
>> Can't stop commit 62bab732: a
>> quick (gint *)&dropped_events bludgeons the compiler into submission.
>> 
>> That cast is okay only when int is exactly 64 bits wide, which it
>> commonly isn't.
>> 
>> If int is even wider, we clobber whatever follows dropped_events.  Not
>> worth worrying about, as none of the machines that interest us have
>> such morbidly obese ints.
>> 
>> That leaves the common case: int narrower than 64 bits.
>> 
>> Harmless on little endian hosts: we just don't access the most
>> significant bits of dropped_events.  They remain zero.
>> 
>> On big endian hosts, we use only the most significant bits of
>> dropped_events as counter.  The least significant bits remain zero.
>> However, we write out the full value, which is the correct counter
>> shifted left a bunch of places.
>> 
>> Fix by changing the variables involved to int.
>> 
>> There's another, equally suspicious-looking (gint *)&trace_idx
>> argument to g_atomic_int_compare_and_exchange(), but that one casts
>> unsigned *, so it's okay.  But it's also superfluous, because GLib's
>> atomic int operations work just fine for unsigned.  Drop it.
>
> Agree with "OK", disagree with "superfluous".

http://developer.gnome.org/glib/stable/glib-Atomic-Operations.html

The following is a collection of compiler macros to provide atomic
access to integer and pointer-sized values.

The macros that have 'int' in the name will operate on pointers to
gint and guint.

> In the intersection of signed int's and unsigned int's domains, the
> object representation is indeed required to be the same. But that
> doesn't make "signed int" a type compatible with "unsigned int".
>
> Since we have a prototype for g_atomic_int_compare_and_exchange(), "the
> arguments are implicitly converted, as if by assignment"; however, for
> the pointer-to-pointer assignment here, the argument-ptr and the
> parameter-ptr would have to point to compatible types, which "signed
> int" and "unsigned int" are not.
>
> gcc's "-pedantic" emits
>
> warning: pointer targets in passing argument N of 'F' differ in signedness

So GLib reneges on the promise it made in its documentation.  Meh.

> Anyway I don't insist reinstating the cast, I'm just saying that ISO C
> would require it.

I'm leaving the decision to the maintainer.  Stefan?

>> Signed-off-by: Markus Armbruster 
>> ---
>>  trace/simple.c | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>> 
>> diff --git a/trace/simple.c b/trace/simple.c
>> index ce17d64..ccbdb6a 100644
>> --- a/trace/simple.c
>> +++ b/trace/simple.c
>> @@ -53,7 +53,7 @@ enum {
>>  uint8_t trace_buf[TRACE_BUF_LEN];
>>  static unsigned int trace_idx;
>>  static unsigned int writeout_idx;
>> -static uint64_t dropped_events;
>> +static int dropped_events;
>>  static FILE *trace_fp;
>>  static char *trace_file_name;
>>  
>> @@ -63,7 +63,7 @@ typedef struct {
>>  uint64_t timestamp_ns;
>>  uint32_t length;   /*in bytes */
>>  uint32_t reserved; /*unused */
>> -uint8_t arguments[];
>> +uint64_t arguments[];
>>  } TraceRecord;
>>  
>>  typedef struct {
>> @@ -160,7 +160,7 @@ static gpointer writeout_thread(gpointer opaque)
>>  uint8_t bytes[sizeof(TraceRecord) + sizeof(uint64_t)];
>>  } dropped;
>>  unsigned int idx = 0;
>> -uint64_t dropped_count;
>> +int dropped_count;
>>  size_t unused __attribute__ ((unused));
>>  
>>  for (;;) {
>> @@ -169,16 +169,16 @@ static gpointer writeout_thread(gpointer opaque)
>>  if (dropped_events) {
>>  dropped.rec.event = DROPPED_EVENT_ID,
>>  dropped.rec.timestamp_ns = get_clock();
>> -dropped.rec.length = sizeof(TraceRecord) + 
>> sizeof(dropped_events),
>> +dropped.rec.length = sizeof(TraceRecord) + sizeof(uint64_t),
>>  dropped.rec.reserved = 0;
>>  while (1) {
>>  dropped_count = dropped_events;
>> -if (g_atomic_int_compare_and_exchange((gint 
>> *)&dropped_events,
>> +if (g_atomic_int_compare_and_exchange(&dropped_events,
>>dropped_count, 0)) {
>>  break;
>>  }
>>  }
>> -memcpy(dropped.rec.arguments, &dropped_count, sizeof(uint64_t));
>> +dropped.rec.arguments[0] = dropped_count;
>
> I *think* I'd prefer if the element type of TraceRecord.arguments[] were
> not changed; an array of u8's seems to be

Re: [Qemu-devel] [PATCH] Fix monitor 'info registers' command on multi processor guests

2013-01-31 Thread Markus Armbruster

Luiz Capitulino  writes:

> On Thu, 31 Jan 2013 10:51:10 +0100
> Markus Armbruster  wrote:
>
>> "Erlon Cruz"  writes:
>> 
>> > QEMU monitor command 'info registers' only displays information
>> > for the first
>> > CPU. This fix that by show registers information for each CPU in the system
>> 
>> This is incorrect.  It displays information for the *current* CPU.
>> Monitor command "cpu" selects the current CPU.
>> 
>> I doubt we want to change this command.  But I'm reviewing it regardless
>> of my doubts.
>
> I think a better way of doing this would be to add two new optional
> arguments to 'info registers': -i (cpu index) and -a (print all CPUs). By
> default the command would do what it does today.

Sounds good to me.

[Qemu-devel] [PATCH 07/11] iohandler: switch to Poller

2013-01-31 Thread Stefan Hajnoczi

Convert iohandler_select_fill() and iohandler_select_poll() to use
Poller instead of rfds/wfds/xfds.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/main-loop.h |  5 +++--
 iohandler.c  | 33 +++--
 main-loop.c  |  4 ++--
 3 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index e8059c3..49d2abf 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -26,6 +26,7 @@
 #define QEMU_MAIN_LOOP_H 1
 
 #include "block/aio.h"
+#include "qemu/poller.h"
 
 #define SIG_IPI SIGUSR1
 
@@ -297,8 +298,8 @@ void qemu_mutex_unlock_iothread(void);
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
-void qemu_iohandler_fill(int *pnfds, fd_set *readfds, fd_set *writefds, fd_set 
*xfds);
-void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int 
rc);
+void qemu_iohandler_fill(Poller *poller);
+void qemu_iohandler_poll(Poller *poller, int rc);
 
 QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
 void qemu_bh_schedule_idle(QEMUBH *bh);
diff --git a/iohandler.c b/iohandler.c
index 2523adc..a48bcce 100644
--- a/iohandler.c
+++ b/iohandler.c
@@ -39,6 +39,7 @@ typedef struct IOHandlerRecord {
 void *opaque;
 QLIST_ENTRY(IOHandlerRecord) next;
 int fd;
+int poller_idx;
 bool deleted;
 } IOHandlerRecord;
 
@@ -78,6 +79,7 @@ int qemu_set_fd_handler2(int fd,
 ioh->fd_read = fd_read;
 ioh->fd_write = fd_write;
 ioh->opaque = opaque;
+ioh->poller_idx = -1;
 ioh->deleted = 0;
 qemu_notify_event();
 }
@@ -92,38 +94,49 @@ int qemu_set_fd_handler(int fd,
 return qemu_set_fd_handler2(fd, NULL, fd_read, fd_write, opaque);
 }
 
-void qemu_iohandler_fill(int *pnfds, fd_set *readfds, fd_set *writefds, fd_set 
*xfds)
+void qemu_iohandler_fill(Poller *poller)
 {
 IOHandlerRecord *ioh;
 
 QLIST_FOREACH(ioh, &io_handlers, next) {
+int events = 0;
+
 if (ioh->deleted)
 continue;
 if (ioh->fd_read &&
 (!ioh->fd_read_poll ||
  ioh->fd_read_poll(ioh->opaque) != 0)) {
-FD_SET(ioh->fd, readfds);
-if (ioh->fd > *pnfds)
-*pnfds = ioh->fd;
+events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
 }
 if (ioh->fd_write) {
-FD_SET(ioh->fd, writefds);
-if (ioh->fd > *pnfds)
-*pnfds = ioh->fd;
+events |= G_IO_OUT | G_IO_ERR;
+}
+if (events) {
+ioh->poller_idx = poller_add_fd(poller, ioh->fd, events);
+} else {
+ioh->poller_idx = -1;
 }
 }
 }
 
-void qemu_iohandler_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds, int 
ret)
+void qemu_iohandler_poll(Poller *poller, int ret)
 {
 if (ret > 0) {
 IOHandlerRecord *pioh, *ioh;
 
 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
-if (!ioh->deleted && ioh->fd_read && FD_ISSET(ioh->fd, readfds)) {
+int revents = 0;
+
+if (!ioh->deleted && ioh->poller_idx != -1) {
+revents = poller_get_revents(poller, ioh->poller_idx);
+}
+
+if (!ioh->deleted && ioh->fd_read &&
+(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR))) {
 ioh->fd_read(ioh->opaque);
 }
-if (!ioh->deleted && ioh->fd_write && FD_ISSET(ioh->fd, writefds)) 
{
+if (!ioh->deleted && ioh->fd_write &&
+(revents & (G_IO_OUT | G_IO_ERR))) {
 ioh->fd_write(ioh->opaque);
 }
 
diff --git a/main-loop.c b/main-loop.c
index 3ecefb6..b638afb 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -461,9 +461,9 @@ int main_loop_wait(int nonblocking)
 slirp_update_timeout(&timeout);
 slirp_poller_fill(&poller);
 #endif
-qemu_iohandler_fill(&nfds, &rfds, &wfds, &xfds);
+qemu_iohandler_fill(&poller);
 ret = os_host_main_loop_wait(timeout);
-qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
+qemu_iohandler_poll(&poller, ret);
 #ifdef CONFIG_SLIRP
 slirp_poller_poll(&poller, (ret < 0));
 #endif
-- 
1.8.1

[Qemu-devel] [PATCH 06/11] slirp: switch to Poller

2013-01-31 Thread Stefan Hajnoczi

Slirp uses rfds/wfds/xfds more extensively than other QEMU components.

The rarely-used out-of-band TCP data feature is used.  That means we
need the full table of select(2) to g_poll(3) events:

  rfds -> G_IO_IN | G_IO_HUP | G_IO_ERR
  wfds -> G_IO_OUT | G_IO_ERR
  xfds -> G_IO_PRI

I came up with this table by looking at Linux fs/select.c which maps
select(2) to poll(2) internally.

Another detail to watch out for are the global variables that reference
rfds/wfds/xfds during slirp_select_poll().  sofcantrcvmore() and
sofcantsendmore() use these globals to clear fd_set bits.  When
sofcantrcvmore() is called, the wfds bit is cleared so that the write
handler will no longer be run for this iteration of the event loop.

This actually seems buggy to me since TCP connections can be half-closed
and we'd still want to handle data in half-duplex fashion.  I think the
real intention is to avoid running the read/write handler when the
socket has been fully closed.  This is indicated with the SS_NOFDREF
state bit so we now check for it before invoking the TCP write handler.
Note that UDP/ICMP code paths don't care because they are
connectionless.

Note that slirp/ has a lot of tabs and sometimes mixed tabs with spaces.
I followed the style of the surrounding code.

Signed-off-by: Stefan Hajnoczi 
---
 main-loop.c  |   4 +-
 slirp/libslirp.h |   8 ++--
 slirp/main.h |   1 -
 slirp/slirp.c| 113 ---
 slirp/socket.c   |   9 -
 slirp/socket.h   |   2 +
 stubs/slirp.c|   6 +--
 7 files changed, 68 insertions(+), 75 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index c48c8f5..3ecefb6 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -459,13 +459,13 @@ int main_loop_wait(int nonblocking)
 
 #ifdef CONFIG_SLIRP
 slirp_update_timeout(&timeout);
-slirp_select_fill(&nfds, &rfds, &wfds, &xfds);
+slirp_poller_fill(&poller);
 #endif
 qemu_iohandler_fill(&nfds, &rfds, &wfds, &xfds);
 ret = os_host_main_loop_wait(timeout);
 qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
 #ifdef CONFIG_SLIRP
-slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
+slirp_poller_poll(&poller, (ret < 0));
 #endif
 
 qemu_run_all_timers();
diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index 49609c2..839813d 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -2,6 +2,8 @@
 #define _LIBSLIRP_H
 
 #include "qemu-common.h"
+#include "qemu/main-loop.h"
+#include "qemu/poller.h"
 
 struct Slirp;
 typedef struct Slirp Slirp;
@@ -17,11 +19,9 @@ Slirp *slirp_init(int restricted, struct in_addr vnetwork,
 void slirp_cleanup(Slirp *slirp);
 
 void slirp_update_timeout(uint32_t *timeout);
-void slirp_select_fill(int *pnfds,
-   fd_set *readfds, fd_set *writefds, fd_set *xfds);
+void slirp_poller_fill(Poller *poller);
 
-void slirp_select_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds,
-   int select_error);
+void slirp_poller_poll(Poller *poller, int select_error);
 
 void slirp_input(Slirp *slirp, const uint8_t *pkt, int pkt_len);
 
diff --git a/slirp/main.h b/slirp/main.h
index 66e4f92..f2e58cf 100644
--- a/slirp/main.h
+++ b/slirp/main.h
@@ -31,7 +31,6 @@ extern int ctty_closed;
 extern char *slirp_tty;
 extern char *exec_shell;
 extern u_int curtime;
-extern fd_set *global_readfds, *global_writefds, *global_xfds;
 extern struct in_addr loopback_addr;
 extern unsigned long loopback_mask;
 extern char *username;
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 0e6e232..175b089 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -39,9 +39,6 @@ static const uint8_t special_ethaddr[ETH_ALEN] = {
 
 static const uint8_t zero_ethaddr[ETH_ALEN] = { 0, 0, 0, 0, 0, 0 };
 
-/* XXX: suppress those select globals */
-fd_set *global_readfds, *global_writefds, *global_xfds;
-
 u_int curtime;
 static u_int time_fasttimo, last_slowtimo;
 static int do_slowtimo;
@@ -261,7 +258,6 @@ void slirp_cleanup(Slirp *slirp)
 
 #define CONN_CANFSEND(so) (((so)->so_state & 
(SS_FCANTSENDMORE|SS_ISFCONNECTED)) == SS_ISFCONNECTED)
 #define CONN_CANFRCV(so) (((so)->so_state & (SS_FCANTRCVMORE|SS_ISFCONNECTED)) 
== SS_ISFCONNECTED)
-#define UPD_NFDS(x) if (nfds < (x)) nfds = (x)
 
 void slirp_update_timeout(uint32_t *timeout)
 {
@@ -270,23 +266,15 @@ void slirp_update_timeout(uint32_t *timeout)
 }
 }
 
-void slirp_select_fill(int *pnfds,
-   fd_set *readfds, fd_set *writefds, fd_set *xfds)
+void slirp_poller_fill(Poller *poller)
 {
 Slirp *slirp;
 struct socket *so, *so_next;
-int nfds;
 
 if (QTAILQ_EMPTY(&slirp_instances)) {
 return;
 }
 
-/* fail safe */
-global_readfds = NULL;
-global_writefds = NULL;
-global_xfds = NULL;
-
-nfds = *pnfds;
/*
 * First, TCP sockets
 */
@@ -302,8 +290,12 @@ void slirp_select_fill(int *pnfds,
 
for (so = slirp->tcb.so_next; so != &slirp->tcb;
 so = so_next) {
+

[Qemu-devel] [PULL] virtio,make,pci,e1000,vfio,piix

2013-01-31 Thread Michael S. Tsirkin

The following changes since commit 0893d46014b0300fb8aec92df94effea34d04b61:

  Merge remote-tracking branch 'qemu-kvm/uq/master' into staging (2013-01-29 
16:57:41 -0600)

are available in the git repository at:


  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony

for you to fetch changes up to 6a659bbff991b0033d1bf1ff71b7d550e0367d99:

  vfio-pci: Enable PCIe extended config space (2013-01-30 01:31:09 +0200)


virtio,make,pci,e1000,vfio,piix

This includes my timestamp generation cleanup,
Amos's and my work on virtio net commands,
pci,e1000,vfio and piix fixes.

Signed-off-by: Michael S. Tsirkin 


Alex Williamson (1):
  vfio-pci: Enable PCIe extended config space

Amos Kong (2):
  virtio-net: introduce a new macaddr control
  virtio-net: rename ctrl rx commands

Jason Baron (1):
  ich9: add support for pci assignment

Laszlo Ersek (1):
  PIIX3: reset the VM when the Reset Control Register's RCPU bit gets set

Michael S. Tsirkin (6):
  e1000: document ICS read behaviour
  rules.mak: cleanup config generation rules
  Makefile: clean timestamp generation rule
  rules/mak: make clean should blow away timestamp files
  virtio-net: revert mac on reset
  virtio-net: remove layout assumptions for ctrl vq

 hw/e1000.c  |  10 
 hw/ich9.h   |   1 +
 hw/lpc_ich9.c   |  33 
 hw/pc_piix.c|   4 ++
 hw/pc_q35.c |   1 +
 hw/piix_pci.c   |  70 -
 hw/vfio_pci.c   |   4 ++
 hw/virtio-net.c | 143 
 hw/virtio-net.h |  26 ++
 rules.mak   |  14 +++--
 trace/Makefile.objs |   4 +-
 11 files changed, 238 insertions(+), 72 deletions(-)

Re: [Qemu-devel] [PATCH v2 01/12] host-utils: add ffsl

2013-01-31 Thread Stefan Hajnoczi

On Wed, Jan 16, 2013 at 06:31:08PM +0100, Paolo Bonzini wrote:
> We can provide fast versions based on the other functions defined
> by host-utils.h.  Some care is required on glibc, which provides
> ffsl already.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/qemu/host-utils.h |   26 ++
>  1 files changed, 26 insertions(+), 0 deletions(-)

Speaking of ffsl() issues, mingw32 now complains about ffsl() being
undefined in hbitmap:

In file included from /home/stefanha/qemu/include/block/block_int.h:35:0,
 from qemu-img.c:32:
/home/stefanha/qemu/include/qemu/hbitmap.h: In function 'hbitmap_iter_next':
/home/stefanha/qemu/include/qemu/hbitmap.h:173:5: warning: implicit declaration 
of function 'ffsl' [-Wimplicit-function-declaration]

Using mingw32-gcc-4.7.2-7.fc18.  I haven't looked into this but I
thought ffsl() was a gcc built-in and it wouldn't require any headers.

Stefan

[Qemu-devel] [PATCH 02/11] poller: add Poller for growable GPollFD arrays

2013-01-31 Thread Stefan Hajnoczi

QEMU currently uses select(2)-style rfds/wfds/xfds for file descriptor
event polling.  Unfortunately the underlying fd_set type and its macros
(FD_SET()) have a hardcoded maximum for file descriptors.  It is
possible to exceed this limit so we need a more scalable event polling
structure.

Poller is a growable array of GPollFDs that will allow us to use
g_poll(3) instead of select(2) in the future.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/poller.h | 56 +++
 util/Makefile.objs|  1 +
 util/poller.c | 54 +
 3 files changed, 111 insertions(+)
 create mode 100644 include/qemu/poller.h
 create mode 100644 util/poller.c

diff --git a/include/qemu/poller.h b/include/qemu/poller.h
new file mode 100644
index 000..7630099
--- /dev/null
+++ b/include/qemu/poller.h
@@ -0,0 +1,56 @@
+/*
+ * Growable GPollFD arrays
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_POLLER_H
+#define QEMU_POLLER_H
+
+#include "qemu-common.h"
+
+typedef struct {
+GPollFD *poll_fds;
+int nfds;
+int max;
+} Poller;
+
+void poller_init(Poller *p);
+void poller_cleanup(Poller *p);
+
+/**
+ * poller_reset: Forget all added file descriptors
+ */
+void poller_reset(Poller *p);
+
+/**
+ * poller_add_fd: Enable event polling on a file descriptor
+ *
+ * Called by *_fill() functions that are invoked by the main loop to enable
+ * event polling on file descriptors that they care about.  The results can be
+ * fetched with poller_get_revents() when the corresponding *_poll() function
+ * is invoked by the main loop.
+ *
+ * Return an index that can be used with poller_get_revents().
+ *
+ * @events: GIOCondition (G_IO_IN, G_IO_OUT, etc) bitmask
+ */
+int poller_add_fd(Poller *p, int fd, int events);
+
+/**
+ * poller_get_revents: Fetch event polling results for an added file descriptor
+ *
+ * Return the GIOCondition (G_IO_IN, G_IO_OUT, etc) bitmask.
+ *
+ * @index: Return value from poller_add_fd().  This is not an fd!
+ */
+int poller_get_revents(Poller *p, int index);
+
+#endif /* QEMU_POLLER_H */
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 495a178..7ff18d6 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -8,3 +8,4 @@ util-obj-y += error.o qemu-error.o
 util-obj-$(CONFIG_POSIX) += compatfd.o
 util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
 util-obj-y += qemu-option.o qemu-progress.o
+util-obj-y += poller.o
diff --git a/util/poller.c b/util/poller.c
new file mode 100644
index 000..663e7a9
--- /dev/null
+++ b/util/poller.c
@@ -0,0 +1,54 @@
+/*
+ * Growable GPollFD arrays
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+#include "qemu/poller.h"
+
+void poller_init(Poller *p)
+{
+memset(p, 0, sizeof(*p));
+}
+
+void poller_cleanup(Poller *p)
+{
+g_free(p->poll_fds);
+}
+
+void poller_reset(Poller *p)
+{
+p->nfds = 0;
+}
+
+int poller_add_fd(Poller *p, int fd, int events)
+{
+int index;
+
+index = p->nfds++;
+if (index >= p->max) {
+p->max += 64; /* avoid constant reallocation */
+p->poll_fds = g_realloc(p->poll_fds, p->max * sizeof(p->poll_fds[0]));
+}
+
+p->poll_fds[index] = (GPollFD){
+.fd = fd,
+.events = events,
+};
+return index;
+}
+
+int poller_get_revents(Poller *p, int index)
+{
+assert(index < p->nfds);
+return p->poll_fds[index].revents;
+}
-- 
1.8.1

Re: [Qemu-devel] [PATCH 1/4] trace: Fix simple trace dropped event record for big endian

2013-01-31 Thread Laszlo Ersek

On 01/31/13 13:10, Markus Armbruster wrote:
> Laszlo Ersek  writes:
>> On 01/25/13 16:43, Markus Armbruster wrote:

>>> @@ -63,7 +63,7 @@ typedef struct {
>>>  uint64_t timestamp_ns;
>>>  uint32_t length;   /*in bytes */
>>>  uint32_t reserved; /*unused */
>>> -uint8_t arguments[];
>>> +uint64_t arguments[];
>>>  } TraceRecord;
>>>  
>>>  typedef struct {
>>> @@ -160,7 +160,7 @@ static gpointer writeout_thread(gpointer opaque)
>>>  uint8_t bytes[sizeof(TraceRecord) + sizeof(uint64_t)];
>>>  } dropped;
>>>  unsigned int idx = 0;
>>> -uint64_t dropped_count;
>>> +int dropped_count;
>>>  size_t unused __attribute__ ((unused));
>>>  
>>>  for (;;) {
>>> @@ -169,16 +169,16 @@ static gpointer writeout_thread(gpointer opaque)
>>>  if (dropped_events) {
>>>  dropped.rec.event = DROPPED_EVENT_ID,
>>>  dropped.rec.timestamp_ns = get_clock();
>>> -dropped.rec.length = sizeof(TraceRecord) + 
>>> sizeof(dropped_events),
>>> +dropped.rec.length = sizeof(TraceRecord) + sizeof(uint64_t),
>>>  dropped.rec.reserved = 0;
>>>  while (1) {
>>>  dropped_count = dropped_events;
>>> -if (g_atomic_int_compare_and_exchange((gint 
>>> *)&dropped_events,
>>> +if (g_atomic_int_compare_and_exchange(&dropped_events,
>>>dropped_count, 0)) {
>>>  break;
>>>  }
>>>  }
>>> -memcpy(dropped.rec.arguments, &dropped_count, 
>>> sizeof(uint64_t));
>>> +dropped.rec.arguments[0] = dropped_count;
>>
>> I *think* I'd prefer if the element type of TraceRecord.arguments[] were
>> not changed; an array of u8's seems to be more flexible. The element
>> type change appears both unrelated and not really necessary for this
>> patch. (You'd have to keep the memcpy(), but it doesn't hurt.)
> 
> Casting uint64_t[] to uint8_t is safe.  Casting uint8_t[] to FOO * isn't
> when alignof(FOO) > 1.  That's why I really don't like uint8_t[] there.

The thought of casting &arguments[N] to another pointer type didn't
cross my mind (independently of the element type). Who would do such a
thing? memcpy() is pretty much a requirement in this case, in my world.

> It happens to be amply aligned, and it happens to be used only with
> memcpy().  But neither is immediately obvious, or obviously bound to
> stay that way.

Changing the element type to uint64_t doesn't guarantee in general that
(type*)&arguments[N] will be a correctly aligned pointer. (The
guaranteed cases are when "type" is a character type, void, or uint64_t).

Anyway my reviewed-by stands.

Laszlo

Re: [Qemu-devel] [PATCH 4/4] trace: Fix location of simpletrace.py in docs

2013-01-31 Thread Harsh Bora


On 01/25/2013 09:13 PM, Markus Armbruster wrote:

Missed when commit 4c3b5a48 moved it.

Signed-off-by: Markus Armbruster 


Reviewed-by: Harsh Prateek Bora 


---
  docs/tracing.txt | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index 453cc4a..14db3bf 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -23,7 +23,7 @@ for debugging, profiling, and observing execution.

  4. Pretty-print the binary trace file:

-./simpletrace.py trace-events trace-*
+./scripts/simpletrace.py trace-events trace-*

  == Trace events ==

@@ -198,7 +198,7 @@ The "simple" backend produces binary trace files that can 
be formatted with the
  simpletrace.py script.  The script takes the "trace-events" file and the 
binary
  trace:

-./simpletrace.py trace-events trace-12345
+./scripts/simpletrace.py trace-events trace-12345

  You must ensure that the same "trace-events" file was used to build QEMU,
  otherwise trace event declarations may have changed and output will not be

[Qemu-devel] [PATCH for-1.4] linux-user: Restore cast to target type in get_user()

2013-01-31 Thread Peter Maydell

Commit 658f2dc97 accidentally dropped the cast to the target type of
the value loaded by get_user().  The most visible effect of this would
be that the sequence "uint64_t v; get_user_u32(v, addr)" would sign
extend the 32 bit loaded value into v rather than zero extending as
would be expected for a _u32 accessor.  Put the cast back again to
restore the old behaviour.

Signed-off-by: Peter Maydell 
---
 linux-user/qemu.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 31a220a..b10e957 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -306,12 +306,12 @@ static inline int access_ok(int type, abi_ulong addr, 
abi_ulong size)
  ((hptr), (x)), 0)
 
 #define __get_user_e(x, hptr, e)\
-  ((x) =\
+  ((x) = (typeof(*hptr))(   \
__builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p,  \
__builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,\
__builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p, \
__builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p, abort   \
- (hptr), 0)
+ (hptr)), 0)
 
 #ifdef TARGET_WORDS_BIGENDIAN
 # define __put_user(x, hptr)  __put_user_e(x, hptr, be)
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH 3/4] trace: Clean up the "try to update atomic until it worked" loops

2013-01-31 Thread Harsh Bora


On 01/25/2013 09:13 PM, Markus Armbruster wrote:

Signed-off-by: Markus Armbruster 
---
  trace/simple.c | 18 +-
  1 file changed, 5 insertions(+), 13 deletions(-)


Reviewed-by: Harsh Prateek Bora 



diff --git a/trace/simple.c b/trace/simple.c
index 592ff48..74701e3 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -171,13 +171,10 @@ static gpointer writeout_thread(gpointer opaque)
  dropped.rec.timestamp_ns = get_clock();
  dropped.rec.length = sizeof(TraceRecord) + sizeof(uint64_t),
  dropped.rec.reserved = 0;
-while (1) {
+do {
  dropped_count = g_atomic_int_get(&dropped_events);
-if (g_atomic_int_compare_and_exchange(&dropped_events,
-  dropped_count, 0)) {
-break;
-}
-}
+} while (!g_atomic_int_compare_and_exchange(&dropped_events,
+dropped_count, 0));
  dropped.rec.arguments[0] = dropped_count;
  unused = fwrite(&dropped.rec, dropped.rec.length, 1, trace_fp);
  }
@@ -213,7 +210,7 @@ int trace_record_start(TraceBufferRecord *rec, TraceEventID 
event, size_t datasi
  uint32_t rec_len = sizeof(TraceRecord) + datasize;
  uint64_t timestamp_ns = get_clock();

-while (1) {
+do {
  old_idx = g_atomic_int_get(&trace_idx);
  smp_rmb();
  new_idx = old_idx + rec_len;
@@ -223,12 +220,7 @@ int trace_record_start(TraceBufferRecord *rec, 
TraceEventID event, size_t datasi
  g_atomic_int_inc(&dropped_events);
  return -ENOSPC;
  }
-
-if (g_atomic_int_compare_and_exchange(&trace_idx,
-  old_idx, new_idx)) {
-break;
-}
-}
+} while (!g_atomic_int_compare_and_exchange(&trace_idx, old_idx, new_idx));

  idx = old_idx % TRACE_BUF_LEN;

Re: [Qemu-devel] [kvmarm] [RFC v5 7/8] hw/kvm/arm_gic: Implement support for KVM in-kernel ARM GIC

2013-01-31 Thread Andreas Färber

Am 31.01.2013 11:52, schrieb KONRAD Frédéric:
> On 24/01/2013 16:43, Peter Maydell wrote:
>> Implement support for using the KVM in-kernel GIC for ARM.
>>
>> Signed-off-by: Peter Maydell 
>> ---
>>   hw/a15mpcore.c   |8 ++-
>>   hw/arm/Makefile.objs |1 +
>>   hw/kvm/arm_gic.c |  169
>> ++
>>   3 files changed, 177 insertions(+), 1 deletion(-)
>>   create mode 100644 hw/kvm/arm_gic.c
>>
>> diff --git a/hw/a15mpcore.c b/hw/a15mpcore.c
>> index fe6c34c..1ca6f28 100644
>> --- a/hw/a15mpcore.c
>> +++ b/hw/a15mpcore.c
>> @@ -19,6 +19,7 @@
>>*/
>> #include "sysbus.h"
>> +#include "sysemu/kvm.h"
>> /* A15MP private memory region.  */
>>   @@ -40,8 +41,13 @@ static int a15mp_priv_init(SysBusDevice *dev)
>>   {
>>   A15MPPrivState *s = FROM_SYSBUS(A15MPPrivState, dev);
>>   SysBusDevice *busdev;
>> +const char *gictype = "arm-gic";
> s/arm-gic/arm_gic/ ^^ ?
> 
> Christoffer and I had trouble with that:
> 
> qemu-system-arm: Unknown device 'arm-gic' for default sysbus

Since you already ran into issues here, even better would be to use a
TYPE_ARM_GIC constant or so.

Andreas

> 
> Fred
>>   -s->gic = qdev_create(NULL, "arm_gic");
>> +if (kvm_irqchip_in_kernel()) {
>> +gictype = "kvm-arm-gic";
>> +}
>> +
>> +s->gic = qdev_create(NULL, gictype);
>>   qdev_prop_set_uint32(s->gic, "num-cpu", s->num_cpu);
>>   qdev_prop_set_uint32(s->gic, "num-irq", s->num_irq);
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] QEMU buildbot maintenance state

2013-01-31 Thread Stefan Hajnoczi

On Wed, Jan 30, 2013 at 10:31:22AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel
> > and Christian?  It would be awesome if you could do this given your
> > experience running and customizing buildbot.
> 
> I'll try to set aside some time for that.  Christians idea to host the
> config at github is good, that certainly makes it easier to balance
> things to more people.
> 
> Another thing which would be helpful:  Any chance we can setup a
> maintainer tree mirror @ git.qemu.org?  A single repository where each
> maintainer tree shows up as a branch?
> 
> This would make the buildbot setup *alot* easier.  We can go for a
> AnyBranchScheduler then with BuildFactory and BuildConfig shared,
> instead of needing one BuildFactory and BuildConfig per branch.  Also
> makes the buildbot web interface less cluttered as we don't have a
> insane amount of BuildConfigs any more.  And saves some resources
> (bandwidth + diskspace) for the buildslaves.
> 
> I think people who want to look what is coming or who want to test stuff
> cooking it would be a nice service too if they have a one-stop shop
> where they can get everything.

I sent a pull request that makes the BuildFactory definitions simpler
using a single create_build_factory() function:

https://github.com/b1-systems/buildbot/pull/1

Keep in mind that BuildFactories differ not just by repo/branch but
also:
 * in-tree or out-of-tree
 * extra ./configure arguments
 * gmake instead of make

I think this means it is not as simple as defining a single
BuildFactory.

Stefan

Re: [Qemu-devel] QEMU buildbot maintenance state

2013-01-31 Thread Stefan Hajnoczi

On Wed, Jan 30, 2013 at 10:31:22AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel
> > and Christian?  It would be awesome if you could do this given your
> > experience running and customizing buildbot.
> 
> I'll try to set aside some time for that.

Excellent, thank you!

Stefan

Re: [Qemu-devel] [PATCH 02/11] poller: add Poller for growable GPollFD arrays

2013-01-31 Thread Anthony Liguori

Stefan Hajnoczi  writes:

> QEMU currently uses select(2)-style rfds/wfds/xfds for file descriptor
> event polling.  Unfortunately the underlying fd_set type and its macros
> (FD_SET()) have a hardcoded maximum for file descriptors.  It is
> possible to exceed this limit so we need a more scalable event polling
> structure.
>
> Poller is a growable array of GPollFDs that will allow us to use
> g_poll(3) instead of select(2) in the future.
>
> Signed-off-by: Stefan Hajnoczi 

How about using a GArray instead?

Regards,

Anthony Liguori

> ---
>  include/qemu/poller.h | 56 
> +++
>  util/Makefile.objs|  1 +
>  util/poller.c | 54 +
>  3 files changed, 111 insertions(+)
>  create mode 100644 include/qemu/poller.h
>  create mode 100644 util/poller.c
>
> diff --git a/include/qemu/poller.h b/include/qemu/poller.h
> new file mode 100644
> index 000..7630099
> --- /dev/null
> +++ b/include/qemu/poller.h
> @@ -0,0 +1,56 @@
> +/*
> + * Growable GPollFD arrays
> + *
> + * Copyright 2013 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *   Stefan Hajnoczi
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_POLLER_H
> +#define QEMU_POLLER_H
> +
> +#include "qemu-common.h"
> +
> +typedef struct {
> +GPollFD *poll_fds;
> +int nfds;
> +int max;
> +} Poller;
> +
> +void poller_init(Poller *p);
> +void poller_cleanup(Poller *p);
> +
> +/**
> + * poller_reset: Forget all added file descriptors
> + */
> +void poller_reset(Poller *p);
> +
> +/**
> + * poller_add_fd: Enable event polling on a file descriptor
> + *
> + * Called by *_fill() functions that are invoked by the main loop to enable
> + * event polling on file descriptors that they care about.  The results can 
> be
> + * fetched with poller_get_revents() when the corresponding *_poll() function
> + * is invoked by the main loop.
> + *
> + * Return an index that can be used with poller_get_revents().
> + *
> + * @events: GIOCondition (G_IO_IN, G_IO_OUT, etc) bitmask
> + */
> +int poller_add_fd(Poller *p, int fd, int events);
> +
> +/**
> + * poller_get_revents: Fetch event polling results for an added file 
> descriptor
> + *
> + * Return the GIOCondition (G_IO_IN, G_IO_OUT, etc) bitmask.
> + *
> + * @index: Return value from poller_add_fd().  This is not an fd!
> + */
> +int poller_get_revents(Poller *p, int index);
> +
> +#endif /* QEMU_POLLER_H */
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index 495a178..7ff18d6 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -8,3 +8,4 @@ util-obj-y += error.o qemu-error.o
>  util-obj-$(CONFIG_POSIX) += compatfd.o
>  util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
>  util-obj-y += qemu-option.o qemu-progress.o
> +util-obj-y += poller.o
> diff --git a/util/poller.c b/util/poller.c
> new file mode 100644
> index 000..663e7a9
> --- /dev/null
> +++ b/util/poller.c
> @@ -0,0 +1,54 @@
> +/*
> + * Growable GPollFD arrays
> + *
> + * Copyright 2013 Red Hat, Inc. and/or its affiliates
> + *
> + * Authors:
> + *   Stefan Hajnoczi
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include 
> +#include 
> +#include "qemu/poller.h"
> +
> +void poller_init(Poller *p)
> +{
> +memset(p, 0, sizeof(*p));
> +}
> +
> +void poller_cleanup(Poller *p)
> +{
> +g_free(p->poll_fds);
> +}
> +
> +void poller_reset(Poller *p)
> +{
> +p->nfds = 0;
> +}
> +
> +int poller_add_fd(Poller *p, int fd, int events)
> +{
> +int index;
> +
> +index = p->nfds++;
> +if (index >= p->max) {
> +p->max += 64; /* avoid constant reallocation */
> +p->poll_fds = g_realloc(p->poll_fds, p->max * 
> sizeof(p->poll_fds[0]));
> +}
> +
> +p->poll_fds[index] = (GPollFD){
> +.fd = fd,
> +.events = events,
> +};
> +return index;
> +}
> +
> +int poller_get_revents(Poller *p, int index)
> +{
> +assert(index < p->nfds);
> +return p->poll_fds[index].revents;
> +}
> -- 
> 1.8.1

Re: [Qemu-devel] QEMU buildbot maintenance state

2013-01-31 Thread Christian Berendt


On 01/31/2013 01:54 PM, Stefan Hajnoczi wrote:

I sent a pull request that makes the BuildFactory definitions simpler
using a single create_build_factory() function:

https://github.com/b1-systems/buildbot/pull/1


Stefan, I'll have a look later this day.

Christian.

--
Christian Berendt
Tel.: +49-171-5542175
Mail: bere...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Peter Lieven

RFC patch is attached. Comments appreciated.
I have two concerns left:
a) what happens if a page turns from zero to non-zero in the first stage. Is
this page transferred in the same round or in the next?
b) what happens if live migration fails or is aborted and then again
a migration is started to the same target (if this is possible). Is the
memory at the target reinitialized?

Am 31.01.2013 um 10:37 schrieb Orit Wasserman :

> On 01/31/2013 11:25 AM, Peter Lieven wrote:
>> 
>> Am 31.01.2013 um 10:19 schrieb Orit Wasserman :
>> 
>>> On 01/31/2013 11:00 AM, Peter Lieven wrote:
 
 Am 31.01.2013 um 09:59 schrieb Orit Wasserman :
 
> On 01/31/2013 10:37 AM, Peter Lieven wrote:
>> 
>> Am 31.01.2013 um 09:33 schrieb Orit Wasserman :
>> 
>>> On 01/31/2013 10:10 AM, Peter Lieven wrote:
 
 Am 31.01.2013 um 08:47 schrieb Orit Wasserman :
 
> On 01/31/2013 08:57 AM, Peter Lieven wrote:
>> Hi,
>> 
>> I just came across an idea and would like to have feedback if it 
>> makes sence or not.
>> 
>> If a VM is started without preallocated memory all memory that has 
>> not been written to
>> reads as zeros, right?
> Hi,
> No the memory will be unmapped (we allocate on demand).
 
 Yes, but those unmapped pages will read as zeroes if the guest 
 accesses it?
>>> yes.
 
>> If a VM with a lot of unwritten memory is migrated or if the memory 
>> contains a lot
>> of zeroed out memory (e.g. Windows or Linux guest with page 
>> sanitization) all this memory
>> is allocated on the target during live migration. Especially with 
>> KSM this leads
>> to the problem that this memory is allocated and might be not 
>> available completely as
>> merging of the pages will happen async.
>> 
>> Wouldn't it make sense to not send zero pages in the first round 
>> where the complete
>> ram is sent (if it is detectable that we are in this stage)?
> We send one byte per zero page at the moment (see is_dup_page) we can 
> further optimizing it
> by not sending it.
> I have to point out that this is a very idle guest and we need to 
> work on a loaded guest 
> which is the more hard problem in migration.
 
 I was not talking about saving one byte (+ 8 bytes for header), my 
 concern was that we memset all (dup) pages
 including the special case of a zero dup page on the migration target. 
 This allocates the memory or does it not?
 
>>> 
 If my above assumption that the guest reads unmapped memory as zeroes 
 is right, this mapping
 is not necessary in the case of a zero dup page.
 
 We just have to make sure that we are still in the very first round 
 when deciding not to sent
 a zero page, because otherwise it could be a page that has become zero 
 during migration and
 this of course has to be transferred.
>>> 
>>> OK, so if we won't send the pages than it won't be allocate in the dst 
>>> and it can improve both 
>>> memory usage and reduce cpu consumption on it.
>>> That can be good for over commit scenario.
>> 
>> Yes. On the Source host those zero pages have likely all been merged by 
>> KSM already, but on the destination
>> they are allocated and initially consume real memory. This can be a 
>> problem if a lot of incoming migrations happen
>> at the same time.
> 
> That can be very effective.
>> 
 
> 
> Also I notice that the bottle neck in migrating unmapped pages is the 
> detection of those pages
> because we map the pages in order to check them, for a large guest 
> this is very expensive as mapping a page
> results in a page fault in the host.
> So what will be very helpful is actually locating those pages without 
> mapping them
> which looks very complicated.
 
 This would be a nice improvement, but as you said a guest will sooner 
 or later allocate
 all memory if it is not totally idle. However, bigger parts of this 
 memory might have been reset to zeroes.
 This happens on page deallocation in a Windows Guest by default and 
 can also be enforced in LInux
 with page sanitization.
>>> 
>>> true, but it those cases we will want to zero the page in the dst as 
>>> this is done for security reasons.
>> 
>> if i migrate it to a destination where initially all memory is unmapped 
>> not migrating the zero page turns it
>> into an unmapped page (which reads a zero?). where is the security 
>> problem? its like rethinning on a storage.
>> Or do I understa

Re: [Qemu-devel] [PATCH] vmdk: Allow space in file name

2013-01-31 Thread Stefan Hajnoczi

On Tue, Jan 29, 2013 at 10:50:31PM +0100, Philipp Hahn wrote:
> The previous scanf() format string stopped parsing the file name on the
> first white white space, which seems to be allowed at least by VMware
> Wokrstation.
> 
> Change the format string to collect everything between the first and
> second quote as the file name, disallowing line breaks.
> 
> Signed-off-by: Philipp Hahn 
> ---
>  block/vmdk.c |   10 +-
>  1 files changed, 1 insertions(+), 9 deletions(-)
> 
> [V2] Also remove " striping code.
>  Add \n\r to stop character set.

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

Re: [Qemu-devel] [PATCH] Fix monitor 'info registers' command on multi processor guests

2013-01-31 Thread Erlon Cruz

On 01/31/2013 07:51 AM, Markus Armbruster wrote:
> "Erlon Cruz"  writes:
>
>> QEMU monitor command 'info registers' only displays information for the first
>> CPU. This fix that by show registers information for each CPU in the system
> This is incorrect.  It displays information for the *current* CPU.
> Monitor command "cpu" selects the current CPU.
That is a bit confusing. May be pointing in the command documentation 
that the current cpu could be changed would help.
>
> I doubt we want to change this command.  But I'm reviewing it regardless
> of my doubts.
>
>> Signed-off-by: erlon.c...@fit-tecnologia.org.br
>> ---
>>   monitor.c |   19 +++
>>   1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index c0e32d6..70e9227 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -898,8 +898,14 @@ int monitor_get_cpu_index(void)
>>   static void do_info_registers(Monitor *mon)
>>   {
>>   CPUArchState *env;
>> -env = mon_get_cpu();
>> -cpu_dump_state(env, (FILE *)mon, monitor_fprintf, CPU_DUMP_FPU);
>> +
>> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> +monitor_fprintf((FILE *)mon, "= "
>> +"Registers on CPU %d =\n",
>> +env->cpu_index);
>> +cpu_dump_state(env, (FILE *)mon, monitor_fprintf, CPU_DUMP_FPU);
>> +monitor_fprintf((FILE *)mon, "\n");
>> +}
>>   }
>>   
> You lost the cpu_synchronize_state() in mon_get_cpu().  Why is that
> okay?
I missed that. It will show inconsistent information on KVM guests if we 
dont use cpu_syncronize_state().
>
>>   static void do_info_jit(Monitor *mon)
>> @@ -930,8 +936,13 @@ static void do_info_cpu_stats(Monitor *mon)
>>   {
>>   CPUArchState *env;
>>   
>> -env = mon_get_cpu();
>> -cpu_dump_statistics(env, (FILE *)mon, &monitor_fprintf, 0);
>> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> +monitor_fprintf((FILE *)mon, "= "
>> +"Statistics for CPU %d =\n",
>> +env->cpu_index);
>> +cpu_dump_statistics(env, (FILE *)mon, &monitor_fprintf, 0);
>> +monitor_fprintf((FILE *)mon, "\n");
>> +}
>>   }
>>   #endif
> Same here.
>
> Furthermore, you neglected to update the command's documentation in
> hmp-commands.hx, and the reference to it in qmp-commands.hx.
> .
In this case the command syntax didn't  changed right? What would need 
to be changed in the documentation?

Legal Disclaimer:
The information contained in this message may be privileged and confidential. 
It is intended to be read only by the individual or entity to whom it is 
addressed or by their designee. If the reader of this message is not the 
intended recipient, you are on notice that any distribution of this message, in 
any form, is strictly prohibited. If you have received this message in error, 
please immediately notify the sender and delete or destroy any copy of this 
message!

Re: [Qemu-devel] [PATCH] Fix monitor 'info registers' command on multi processor guests

2013-01-31 Thread Erlon Cruz

On 01/31/2013 09:33 AM, Luiz Capitulino wrote:
> On Thu, 31 Jan 2013 10:51:10 +0100
> Markus Armbruster  wrote:
>
>> "Erlon Cruz"  writes:
>>
>>> QEMU monitor command 'info registers' only displays information for the 
>>> first
>>> CPU. This fix that by show registers information for each CPU in the system
>> This is incorrect.  It displays information for the *current* CPU.
>> Monitor command "cpu" selects the current CPU.
>>
>> I doubt we want to change this command.  But I'm reviewing it regardless
>> of my doubts.
> I think a better way of doing this would be to add two new optional
> arguments to 'info registers': -i (cpu index) and -a (print all CPUs). By
> default the command would do what it does today.
> .
How about also printing the CPU index before the infos?

Legal Disclaimer:
The information contained in this message may be privileged and confidential. 
It is intended to be read only by the individual or entity to whom it is 
addressed or by their designee. If the reader of this message is not the 
intended recipient, you are on notice that any distribution of this message, in 
any form, is strictly prohibited. If you have received this message in error, 
please immediately notify the sender and delete or destroy any copy of this 
message!

[Qemu-devel] traversal termination in qbus_find_recursive() due to "max_dev"

2013-01-31 Thread Laszlo Ersek

Hi,

I'm working on propagating errors out to do_device_add(). I've come
accross qbus_find_recursive().

Re commit 1395af6f ("qdev: add a maximum device allowed field for the
bus."), could someone please explain:

(1) why the early termination due to max_index vs. max_dev merits an
error message -- both non-recursive call-sites print an error of their
own anyway,

(2) why the

  max_dev != 0 &&
  max_dev <= max_index &&
  name != NULL

condition justifies traversal termination -- in general, a bus called
"foo", albeit full of devices, could allow some of those devices to be a
child bus. Let's call one such (direct) child bus "bar".

If we're looking for "bar", or for one of its descendants, the current
logic seems to stop the search early, at "foo".

I think "bus_class->max_dev" should be checked only when we're actually
trying to add the device to the bus. (In qdev_set_parent_bus() -->
bus_add_child().)

- bus_add_child() should check the limit and set an Error,
- qdev_set_parent_bus_nofail() would abort() if there's an Error,
- qdev_set_parent_bus() would change signature and propagate error,
- all callers would be converted to either the new _nofail() or the old
function with the new signature.

I can try to write the patch if there's consensus about what should happen.

Thoughts?

Thanks!
Laszlo

Re: [Qemu-devel] [PATCH V4 00/22] Multiqueue virtio-net

2013-01-31 Thread Eric Blake

On 01/31/2013 12:00 AM, Jason Wang wrote:
> On 01/31/2013 02:29 AM, Eric Blake wrote:
>> On 01/30/2013 04:12 AM, Jason Wang wrote:
>>
>>> With this changes, user could start a multiqueue virtio-net device through
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
>>> virtio-net-pci,netdev=hn0
>>>
>>> Management tools such as libvirt can pass multiple pre-created fds/vhostfds 
>>> through
>>>
>>> ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
>>> virtio-net-pci,netdev=hn0
>> Do we really need specific fds= parsing, or can we reuse the existing
>> -add-fd command line option to our advantage?  I guess what I'm asking
>> is how hotplug will work; and if hotplug takes a file name, shouldn't
>> the command line also take a name; and if the command line takes a name,
>> what's wrong with:
>>
>> ./qemu -add-fd fdset=1,fd=X -add-fd fdset=2,fd=Y -add-fd fdset=3,fd=M
>> -add-fd fdset=4,fd=N -netdev
>> tap,id=hn0,fds=/dev/fdset/1:/dev/fdset/2,vhostfds=/dev/fdset/3:/dev/fdset/4
>> -device virtio-net-pci,netdev=hn0
>>
> 
> AFAIK, tap does not support fdset now, so this requirement is beyond the
> scope of multiqueue itself. We can do this in the future. Btw does
> libvirt support add-fd now?

Anything that uses qemu_open() supports fdset now.  The question I'm
asking is whether the command line has a way to pass /path/to/name
(which can be presented as /dev/fdset/nnn for add-fd usage) now, or
whether it only supports fds=integers.

> 
> For hotplug, it just work if you pass multiple file descriptors one by
> one through getfd and then use fds=X:Y,vhostfds=M:N.

For hotplug, you can't pass integers; you have to name the fds either
way.  Either you name it with getfd, or you name it with add-fd.  But
getfd is not as nice as add-fd when it comes to ensuring that fds are
not leaked in qemu, even when the management app such as libvirt
restarts.  Furthermore, if it is possible to specify taps by pathname
instead of by fd inheritance, then using getfd means you have to support
two different approaches in QMP to distinguish which string is being
supplied, while supporting add-fd means you only have to support
qemu_open() which handles both direct names and fd passing in a single
string interface.

As for libvirt support of add-fd, I'm currently working with Stefan
Berger to get patches applied; the goal is tha libvirt 1.0.3 (end of
February) will support add-fd.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH for-1.4 v2] target-ppc: Fix target_ulong vs. hwaddr format mismatches

2013-01-31 Thread Andreas Färber

Since HWADDR_PRIx is always the same now, use %016 for TARGET_PPC64 and
%08 for common code. This may slightly change the ppc64 debug output.

Signed-off-by: Andreas Färber 
---
 target-ppc/mmu_helper.c |6 +++---
 1 Datei geändert, 3 Zeilen hinzugefügt(+), 3 Zeilen entfernt(-)

diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 0aee7a9..564c80d 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -587,7 +587,7 @@ static inline int find_pte2(CPUPPCState *env, mmu_ctx_t 
*ctx, int is_64b, int h,
 }
 
 r = pte64_check(ctx, pte0, pte1, h, rw, type);
-LOG_MMU("Load pte from " TARGET_FMT_lx " => " TARGET_FMT_lx " "
+LOG_MMU("Load pte from %016" HWADDR_PRIx " => " TARGET_FMT_lx " "
 TARGET_FMT_lx " %d %d %d " TARGET_FMT_lx "\n",
 pteg_off + (i * 16), pte0, pte1, (int)(pte0 & 1), h,
 (int)((pte0 >> 1) & 1), ctx->ptem);
@@ -602,7 +602,7 @@ static inline int find_pte2(CPUPPCState *env, mmu_ctx_t 
*ctx, int is_64b, int h,
 pte1 = ldl_phys(env->htab_base + pteg_off + (i * 8) + 4);
 }
 r = pte32_check(ctx, pte0, pte1, h, rw, type);
-LOG_MMU("Load pte from " TARGET_FMT_lx " => " TARGET_FMT_lx " "
+LOG_MMU("Load pte from %08" HWADDR_PRIx " => " TARGET_FMT_lx " "
 TARGET_FMT_lx " %d %d %d " TARGET_FMT_lx "\n",
 pteg_off + (i * 8), pte0, pte1, (int)(pte0 >> 31), h,
 (int)((pte0 >> 6) & 1), ctx->ptem);
@@ -633,7 +633,7 @@ static inline int find_pte2(CPUPPCState *env, mmu_ctx_t 
*ctx, int is_64b, int h,
 }
 if (good != -1) {
 done:
-LOG_MMU("found PTE at addr " TARGET_FMT_lx " prot=%01x ret=%d\n",
+LOG_MMU("found PTE at addr %08" HWADDR_PRIx " prot=%01x ret=%d\n",
 ctx->raddr, ctx->prot, ret);
 /* Update page flags */
 pte1 = ctx->raddr;
-- 
1.7.10.4

Re: [Qemu-devel] traversal termination in qbus_find_recursive() due to "max_dev"

2013-01-31 Thread KONRAD Frédéric


On 31/01/2013 14:23, Laszlo Ersek wrote:

Hi,

I'm working on propagating errors out to do_device_add(). I've come
accross qbus_find_recursive().

Re commit 1395af6f ("qdev: add a maximum device allowed field for the
bus."), could someone please explain:

(1) why the early termination due to max_index vs. max_dev merits an
error message -- both non-recursive call-sites print an error of their
own anyway,

(2) why the

   max_dev != 0 &&
   max_dev <= max_index &&
   name != NULL

condition justifies traversal termination -- in general, a bus called
"foo", albeit full of devices, could allow some of those devices to be a
child bus. Let's call one such (direct) child bus "bar".

If we're looking for "bar", or for one of its descendants, the current
logic seems to stop the search early, at "foo".

I think "bus_class->max_dev" should be checked only when we're actually
trying to add the device to the bus. (In qdev_set_parent_bus() -->
bus_add_child().)

- bus_add_child() should check the limit and set an Error,
- qdev_set_parent_bus_nofail() would abort() if there's an Error,
- qdev_set_parent_bus() would change signature and propagate error,
- all callers would be converted to either the new _nofail() or the old
function with the new signature.

I can try to write the patch if there's consensus about what should happen.

Thoughts?

Thanks!
Laszlo

Hi,

I don't think I understand what is a traversal termination.

But, this was added to specify a maximum amount of device on a bus.
(for virtio-bus which can have only one device connected)

1 - When you explicitly set the bus in command line with bus=...and it 
is full

  it returns the error:
  qemu-system-i386: --device virtio-net,bus=virtio-pci-bus.0: Bus 
'virtio-pci-bus.0' not found

  That's why I added the error.

2 - I understand what you mean, but does your idea allows us to connect 
a device to the first

 non-full bus?

Thanks,
Fred

Re: [Qemu-devel] [PATCH V4 00/22] Multiqueue virtio-net

2013-01-31 Thread Michael S. Tsirkin

On Thu, Jan 31, 2013 at 06:44:49AM -0700, Eric Blake wrote:
> On 01/31/2013 12:00 AM, Jason Wang wrote:
> > On 01/31/2013 02:29 AM, Eric Blake wrote:
> >> On 01/30/2013 04:12 AM, Jason Wang wrote:
> >>
> >>> With this changes, user could start a multiqueue virtio-net device through
> >>>
> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
> >>> virtio-net-pci,netdev=hn0
> >>>
> >>> Management tools such as libvirt can pass multiple pre-created 
> >>> fds/vhostfds through
> >>>
> >>> ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
> >>> virtio-net-pci,netdev=hn0
> >> Do we really need specific fds= parsing, or can we reuse the existing
> >> -add-fd command line option to our advantage?  I guess what I'm asking
> >> is how hotplug will work; and if hotplug takes a file name, shouldn't
> >> the command line also take a name; and if the command line takes a name,
> >> what's wrong with:
> >>
> >> ./qemu -add-fd fdset=1,fd=X -add-fd fdset=2,fd=Y -add-fd fdset=3,fd=M
> >> -add-fd fdset=4,fd=N -netdev
> >> tap,id=hn0,fds=/dev/fdset/1:/dev/fdset/2,vhostfds=/dev/fdset/3:/dev/fdset/4
> >> -device virtio-net-pci,netdev=hn0
> >>
> > 
> > AFAIK, tap does not support fdset now, so this requirement is beyond the
> > scope of multiqueue itself. We can do this in the future. Btw does
> > libvirt support add-fd now?
> 
> Anything that uses qemu_open() supports fdset now.  The question I'm
> asking is whether the command line has a way to pass /path/to/name
> (which can be presented as /dev/fdset/nnn for add-fd usage) now, or
> whether it only supports fds=integers.
> 
> > 
> > For hotplug, it just work if you pass multiple file descriptors one by
> > one through getfd and then use fds=X:Y,vhostfds=M:N.
> 
> For hotplug, you can't pass integers; you have to name the fds either
> way.  Either you name it with getfd, or you name it with add-fd.  But
> getfd is not as nice as add-fd when it comes to ensuring that fds are
> not leaked in qemu, even when the management app such as libvirt
> restarts.  Furthermore, if it is possible to specify taps by pathname
> instead of by fd inheritance,

I don't think there's a way to specify taps by pathname.

> then using getfd means you have to support
> two different approaches in QMP to distinguish which string is being
> supplied, while supporting add-fd means you only have to support
> qemu_open() which handles both direct names and fd passing in a single
> string interface.
> 
> As for libvirt support of add-fd, I'm currently working with Stefan
> Berger to get patches applied; the goal is tha libvirt 1.0.3 (end of
> February) will support add-fd.
> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

Re: [Qemu-devel] [PATCH v3 0/5] XBZRLE fixes

2013-01-31 Thread Eric Blake

On 01/31/2013 12:12 AM, Orit Wasserman wrote:
> Changes for V3:
> Fix year in xbzrle.c copyright notice
> Add copyright notice to test-xbzrle.c, use stack variable
> for 2 byte array and fix spacing.
> 
> This series add some testing to the XBZRLE compression
> and fixes the error in loading a guest from a compressed file with
> XBZRLE by allowing encoding of XBZRLE without activating it.
> 

Series: Reviewed-by: Eric Blake 

But see my comment in 1/5; I'm wondering if that can just be fixed by
whoever applies the series.

> Orit Wasserman (5):
>   Move XBZRLE encoding code to a separate file to allow testing
>   Add XBZRLE testing
>   Fix example for query-migrate-capabilities
>   Allow XBZRLE decoding without enabling the capability
>   Fix error message in migrate_set_capability HMP command
> 
>  Makefile.objs   |   2 +-
>  arch_init.c |   3 -
>  hmp.c   |   2 +-
>  qmp-commands.hx |   6 +-
>  savevm.c| 159 --
>  tests/Makefile  |   3 +
>  tests/test-xbzrle.c | 196 
> 
>  xbzrle.c| 173 ++
>  8 files changed, 376 insertions(+), 168 deletions(-)
>  create mode 100644 tests/test-xbzrle.c
>  create mode 100644 xbzrle.c
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 1/5] Move XBZRLE encoding code to a separate file to allow testing

2013-01-31 Thread Eric Blake

On 01/31/2013 12:12 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman 
> Reviewed-by: Paolo Bonzini 
> ---
>  Makefile.objs |   2 +-
>  savevm.c  | 159 -
>  xbzrle.c  | 173 
> ++
>  3 files changed, 174 insertions(+), 160 deletions(-)
>  create mode 100644 xbzrle.c
> 

> +++ b/xbzrle.c
> @@ -0,0 +1,173 @@
> +/*
> + * Xor Based Zero Run Length Encoding
> + *
> + * Copyright 2013 Red Hat, Inc. and/or its affiliates

Actually, I would have written 2012-2013 - while this file is 2013 only,
it comes from code that was first written in 2012.

But I'm not sure it is worth a v4 just to fix this.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Peter Lieven


Am 31.01.2013 um 14:59 schrieb Avi Kivity :

> 
> On Jan 31, 2013 12:29 PM, "Orit Wasserman"  wrote:
> >
> > On 01/31/2013 11:48 AM, Gleb Natapov wrote:
> > > On Thu, Jan 31, 2013 at 09:47:24AM +0200, Orit Wasserman wrote:
> > >> On 01/31/2013 08:57 AM, Peter Lieven wrote:
> > >>> Hi,
> > >>>
> > >>> I just came across an idea and would like to have feedback if it makes 
> > >>> sence or not.
> > >>>
> > >>> If a VM is started without preallocated memory all memory that has not 
> > >>> been written to
> > >>> reads as zeros, right?
> > >> Hi,
> > >> No the memory will be unmapped (we allocate on demand).
> > >>> If a VM with a lot of unwritten memory is migrated or if the memory 
> > >>> contains a lot
> > >>> of zeroed out memory (e.g. Windows or Linux guest with page 
> > >>> sanitization) all this memory
> > >>> is allocated on the target during live migration. Especially with KSM 
> > >>> this leads
> > >>> to the problem that this memory is allocated and might be not available 
> > >>> completely as
> > >>> merging of the pages will happen async.
> > >>>
> > >>> Wouldn't it make sense to not send zero pages in the first round where 
> > >>> the complete
> > >>> ram is sent (if it is detectable that we are in this stage)?
> > >> We send one byte per zero page at the moment (see is_dup_page) we can 
> > >> further optimizing it
> > >> by not sending it.
> > >> I have to point out that this is a very idle guest and we need to work 
> > >> on a loaded guest
> > >> which is the more hard problem in migration.
> > >>
> > >> Also I notice that the bottle neck in migrating unmapped pages is the 
> > >> detection of those pages
> > >> because we map the pages in order to check them, for a large guest this 
> > >> is very expensive as mapping a page
> > >> results in a page fault in the host.
> > >> So what will be very helpful is actually locating those pages without 
> > >> mapping them
> > >> which looks very complicated.
> > >>
> > > What is wrong with mincore()?
> >
> > Avi/Michael do you remember why mincore can't be used to check if a guest 
> > page is unmapped?
> 
> A page may be not in core, but also nonzero (for example swap).
What about hugetables? Afaik, there are not swappable - at least they where 
somewhen in the past.
Maybe it would be possible to have a performance improvement if they are used.

Peter

Re: [Qemu-devel] [PATCH V4 00/22] Multiqueue virtio-net

2013-01-31 Thread Michael S. Tsirkin

On Wed, Jan 30, 2013 at 07:12:19PM +0800, Jason Wang wrote:
> Hello all:
> 
> This seires is an update of last version of multiqueue virtio-net support.
> 
> This series tries to brings multiqueue support to virtio-net through a
> multiqueue support tap backend and multiple vhost threads.
> 
> Patch 1 converts bitfield in TAPState to bool. Patch 2 replace assert(0) with
> abort() in tap.
> 
> To support this, multiqueue nic support were added to qemu. This is done by
> introducing an array of NetClientStates in NICState, and make each pair of 
> peers
> to be an queue of the nic. This is done in patch 3-9.
> 
> Tap were also converted to be able to create a multiple queue
> backend. Currently, only linux support this by issuing TUNSETIFF N times with
> the same device name to create N queues. Each fd returned by TUNSETIFF were a
> queue supported by kernel. Three new command lines were introduced, "queues"
> were used to tell how many queues will be created by qemu; "fds" were used to
> pass multiple pre-created tap file descriptors to qemu; "vhostfds" were used 
> to
> pass multiple pre-created vhost descriptors to qemu. This is done in patch 
> 10-15.
> 
> A method of deleting a queue and queue_index were also introduce for virtio,
> this is done in patch 16-17.
> 
> Vhost were also changed to support multiqueue by introducing a start vq index
> which tracks the first virtqueue that will be used by vhost instead of the
> assumption that the vhost always use virtqueue from index 0. This is done in
> patch 18.
> 
> The last part is the multiqueue userspace changes, this is done in patch 
> 19-22.
> 
> With this changes, user could start a multiqueue virtio-net device through
> 
> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0
> 
> Management tools such as libvirt can pass multiple pre-created fds/vhostfds 
> through
> 
> ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
> virtio-net-pci,netdev=hn0
> 
> For the one who wants to try, a git tree is available at:
> git://github.com/jasowang/qemu.git
> 
> Changes from V3:
> - convert bitfield to bool in TAPState (Blue)
> - use abort() instead of assert(0) in tap code (Blue)
> - rebase to the latest
> - fix a bug that breaks the non-tap network

This conflicts with the pull request I sent, in partucular this adds a
layout assumption.  In the hope this will accelerate things, I did a
rebase and a trivial test with single queue only and it seems ok:

git://github.com/mstsirkin/qemu.git pci

There were some warnings about whitespace at EOF but
otherwise seems ok.

-- 
MST

Re: [Qemu-devel] [PATCH V17 03/10] libqblock: build: add configure support

2013-01-31 Thread Stefan Hajnoczi

On Thu, Jan 31, 2013 at 04:53:43PM +0800, Wenchao Xia wrote:
> +##
> +# libqblock probe
> +if test "$libqblock" != "no"; then
> +if test -n "$libtool" &&
> +$pkg_config --atleast-version=$glib_req_ver glib-2.0 > /dev/null 
> 2>&1 && \
> +$pkg_config --atleast-version=$glib_req_ver gthread-2.0 > 
> /dev/null 2>&1 && \
> +test "$linux" = "yes" \
> +; then
> +# Only support Linux now, check for librt and libz are missing, 
> add it later.

Please reuse probing results instead of duplicating them for libqblock.

Re: [Qemu-devel] [PATCH for-1.4] linux-user: Restore cast to target type in get_user()

2013-01-31 Thread Laurent Desnogues

On Thu, Jan 31, 2013 at 1:50 PM, Peter Maydell  wrote:
> Commit 658f2dc97 accidentally dropped the cast to the target type of
> the value loaded by get_user().  The most visible effect of this would
> be that the sequence "uint64_t v; get_user_u32(v, addr)" would sign
> extend the 32 bit loaded value into v rather than zero extending as
> would be expected for a _u32 accessor.  Put the cast back again to
> restore the old behaviour.

This fixes the issue I mentioned a week ago, thanks.

Reviewed-by: Laurent Desnogues 


Laurent

> Signed-off-by: Peter Maydell 
> ---
>  linux-user/qemu.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
> index 31a220a..b10e957 100644
> --- a/linux-user/qemu.h
> +++ b/linux-user/qemu.h
> @@ -306,12 +306,12 @@ static inline int access_ok(int type, abi_ulong addr, 
> abi_ulong size)
>   ((hptr), (x)), 0)
>
>  #define __get_user_e(x, hptr, e)\
> -  ((x) =\
> +  ((x) = (typeof(*hptr))(   \
> __builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p,  \
> __builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,\
> __builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p, \
> __builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p, abort   \
> - (hptr), 0)
> + (hptr)), 0)
>
>  #ifdef TARGET_WORDS_BIGENDIAN
>  # define __put_user(x, hptr)  __put_user_e(x, hptr, be)
> --
> 1.7.9.5
>

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Gleb Natapov

On Thu, Jan 31, 2013 at 03:59:23PM +0200, Avi Kivity wrote:
> On Jan 31, 2013 12:29 PM, "Orit Wasserman"  wrote:
> >
> > On 01/31/2013 11:48 AM, Gleb Natapov wrote:
> > > On Thu, Jan 31, 2013 at 09:47:24AM +0200, Orit Wasserman wrote:
> > >> On 01/31/2013 08:57 AM, Peter Lieven wrote:
> > >>> Hi,
> > >>>
> > >>> I just came across an idea and would like to have feedback if it
> makes sence or not.
> > >>>
> > >>> If a VM is started without preallocated memory all memory that has
> not been written to
> > >>> reads as zeros, right?
> > >> Hi,
> > >> No the memory will be unmapped (we allocate on demand).
> > >>> If a VM with a lot of unwritten memory is migrated or if the memory
> contains a lot
> > >>> of zeroed out memory (e.g. Windows or Linux guest with page
> sanitization) all this memory
> > >>> is allocated on the target during live migration. Especially with KSM
> this leads
> > >>> to the problem that this memory is allocated and might be not
> available completely as
> > >>> merging of the pages will happen async.
> > >>>
> > >>> Wouldn't it make sense to not send zero pages in the first round
> where the complete
> > >>> ram is sent (if it is detectable that we are in this stage)?
> > >> We send one byte per zero page at the moment (see is_dup_page) we can
> further optimizing it
> > >> by not sending it.
> > >> I have to point out that this is a very idle guest and we need to work
> on a loaded guest
> > >> which is the more hard problem in migration.
> > >>
> > >> Also I notice that the bottle neck in migrating unmapped pages is the
> detection of those pages
> > >> because we map the pages in order to check them, for a large guest
> this is very expensive as mapping a page
> > >> results in a page fault in the host.
> > >> So what will be very helpful is actually locating those pages without
> mapping them
> > >> which looks very complicated.
> > >>
> > > What is wrong with mincore()?
> >
> > Avi/Michael do you remember why mincore can't be used to check if a guest
> page is unmapped?
> 
> A page may be not in core, but also nonzero (for example swap).
Yes, mincore() should not be used as zero page check, but it can be used
as an indicator that page can be dealt with in dead stage of migration
since it is likely zero and will not take much time to send. It is
possible to call madvise(MADV_WILLNEED) on them meanwhile to pre-load
swap without faulting on each page individually.

--
Gleb.

Re: [Qemu-devel] [PATCH V4 00/22] Multiqueue virtio-net

2013-01-31 Thread Michael S. Tsirkin

On Thu, Jan 31, 2013 at 04:21:49PM +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 30, 2013 at 07:12:19PM +0800, Jason Wang wrote:
> > Hello all:
> > 
> > This seires is an update of last version of multiqueue virtio-net support.
> > 
> > This series tries to brings multiqueue support to virtio-net through a
> > multiqueue support tap backend and multiple vhost threads.
> > 
> > Patch 1 converts bitfield in TAPState to bool. Patch 2 replace assert(0) 
> > with
> > abort() in tap.
> > 
> > To support this, multiqueue nic support were added to qemu. This is done by
> > introducing an array of NetClientStates in NICState, and make each pair of 
> > peers
> > to be an queue of the nic. This is done in patch 3-9.
> > 
> > Tap were also converted to be able to create a multiple queue
> > backend. Currently, only linux support this by issuing TUNSETIFF N times 
> > with
> > the same device name to create N queues. Each fd returned by TUNSETIFF were 
> > a
> > queue supported by kernel. Three new command lines were introduced, "queues"
> > were used to tell how many queues will be created by qemu; "fds" were used 
> > to
> > pass multiple pre-created tap file descriptors to qemu; "vhostfds" were 
> > used to
> > pass multiple pre-created vhost descriptors to qemu. This is done in patch 
> > 10-15.
> > 
> > A method of deleting a queue and queue_index were also introduce for virtio,
> > this is done in patch 16-17.
> > 
> > Vhost were also changed to support multiqueue by introducing a start vq 
> > index
> > which tracks the first virtqueue that will be used by vhost instead of the
> > assumption that the vhost always use virtqueue from index 0. This is done in
> > patch 18.
> > 
> > The last part is the multiqueue userspace changes, this is done in patch 
> > 19-22.
> > 
> > With this changes, user could start a multiqueue virtio-net device through
> > 
> > ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
> > virtio-net-pci,netdev=hn0
> > 
> > Management tools such as libvirt can pass multiple pre-created fds/vhostfds 
> > through
> > 
> > ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
> > virtio-net-pci,netdev=hn0
> > 
> > For the one who wants to try, a git tree is available at:
> > git://github.com/jasowang/qemu.git
> > 
> > Changes from V3:
> > - convert bitfield to bool in TAPState (Blue)
> > - use abort() instead of assert(0) in tap code (Blue)
> > - rebase to the latest
> > - fix a bug that breaks the non-tap network
> 
> This conflicts with the pull request I sent, in partucular this adds a
> layout assumption.  In the hope this will accelerate things, I did a
> rebase and a trivial test with single queue only and it seems ok:
> 
> git://github.com/mstsirkin/qemu.git pci
> 
> There were some warnings about whitespace at EOF but
> otherwise seems ok.

Pushed to my pci branch on kernel.org too.

> -- 
> MST

Re: [Qemu-devel] [PATCH V17 04/10] libqblock: build: add rule for libqblock.la

2013-01-31 Thread Stefan Hajnoczi

On Thu, Jan 31, 2013 at 04:53:44PM +0800, Wenchao Xia wrote:
> diff --git a/libqblock/Makefile b/libqblock/Makefile
> index 8173da7..a6be721 100644
> --- a/libqblock/Makefile
> +++ b/libqblock/Makefile
> @@ -1,4 +1,29 @@
>  all: libqblock.la
>  
> -libqblock.la:
> - @true
> +# objects linked into a shared library, built with libtool with -fPIC if 
> required
> +libqblock-obj-y = libqblock/libqblock.o libqblock/libqblock-error.o
> +libqblock-obj-y += $(filter-out stubs/set-fd-handler.o, $(stub-obj-y))

I guess you will reimplement the function yourself later?  Please
include a comment explaining the reason for making this exception.

[Qemu-devel] [PATCH for-1.4 v2] block/raw-posix: Build fix for O_ASYNC

2013-01-31 Thread Andreas Färber

Commit eeb6b45d48800e96f67ef2a5c80332557fd45ddb (block: raw-posix image
file reopen) broke the build on OpenIndiana.

illumos has no O_ASYNC. Exclude it from flags to be compared
and instead assert that it is not set where defined.

Cf. e61ab1da7e98357da47c54d8f893b9bd6ff2f7f9 for qemu-ga.

Cc: qemu-sta...@nongnu.org (1.3.x)
Cc: Jeff Cody 
Suggested-by: Paolo Bonzini 
Signed-off-by: Andreas Färber 
---
 v1 -> v2:
 * Instead of excluding O_ASYNC from flag comparison only for CONFIG_SOLARIS,
   assert that O_ASYNC is not set if defined. Suggested by Paolo.

 block/raw-posix.c |   11 ++-
 1 Datei geändert, 10 Zeilen hinzugefügt(+), 1 Zeile entfernt(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 657af95..8b6b926 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -345,11 +345,20 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 
 raw_s->fd = -1;
 
-int fcntl_flags = O_APPEND | O_ASYNC | O_NONBLOCK;
+int fcntl_flags = O_APPEND | O_NONBLOCK;
 #ifdef O_NOATIME
 fcntl_flags |= O_NOATIME;
 #endif
 
+#ifdef O_ASYNC
+/* Not all operating systems have O_ASYNC, and those that don't
+ * will not let us track the state into raw_s->open_flags (typically
+ * you achieve the same effect with an ioctl, for example I_SETSIG
+ * on Solaris). But we do not use O_ASYNC, so that's fine.
+ */
+assert((s->open_flags & O_ASYNC) == 0);
+#endif
+
 if ((raw_s->open_flags & ~fcntl_flags) == (s->open_flags & ~fcntl_flags)) {
 /* dup the original fd */
 /* TODO: use qemu fcntl wrapper */
-- 
1.7.10.4

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Gleb Natapov

On Thu, Jan 31, 2013 at 04:36:25PM +0200, Avi Kivity wrote:
> On Thu, Jan 31, 2013 at 4:30 PM, Gleb Natapov  wrote:
> >> >
> >> > Avi/Michael do you remember why mincore can't be used to check if a guest
> >> page is unmapped?
> >>
> >> A page may be not in core, but also nonzero (for example swap).
> > Yes, mincore() should not be used as zero page check, but it can be used
> > as an indicator that page can be dealt with in dead stage of migration
> > since it is likely zero and will not take much time to send.
> 
> Or it's swapped, in which case we know nothing about it.
> 
> > It is
Of course, that is why I said "likely' :)

> > possible to call madvise(MADV_WILLNEED) on them meanwhile to pre-load
> > swap without faulting on each page individually.
> 
> During migration you're faulting like mad anyway.
That's guest faulting on dirty bit logging, as far as I understand Orit
says that in addition to that she sees a lot of faults generated by
migration reading unmapped guest memory. She wants to eliminate at least
those.

--
Gleb.

Re: [Qemu-devel] [RFC PATCH RDMA support v1: 4/5] connection-setup code between client/server

2013-01-31 Thread Michael R. hines

I was wondering about that possibility myself, I'll give this a try 
(along with the other comments).


Thanks,
- Michael

On 01/31/2013 05:51 AM, Orit Wasserman wrote:

Hi Michael,
It will be cleaner to create a new file migration-rdma.c and put all the
RDMA logic into it, this way you won't effect TCP migration code.

More comments inline

On 01/29/2013 12:01 AM, mrhi...@linux.vnet.ibm.com wrote:

From: "Michael R. Hines" 


Signed-off-by: Michael R. Hines 
---
  migration-tcp.c |   53 +
  migration.c |   41 +
  2 files changed, 94 insertions(+)

diff --git a/migration-tcp.c b/migration-tcp.c
index e78a296..b6c53ba 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -14,10 +14,12 @@
   */
  
  #include "qemu-common.h"

+#include "qemu/rdma.h"
  #include "qemu/sockets.h"
  #include "migration/migration.h"
  #include "migration/qemu-file.h"
  #include "block/block.h"
+#include 
  
  //#define DEBUG_MIGRATION_TCP
  
@@ -55,6 +57,9 @@ static void tcp_wait_for_connect(int fd, void *opaque)
  
  if (fd < 0) {

  DPRINTF("migrate connect error\n");
+if (qemu_use_rdma_migration()) {
+qemu_rdma_migration_cleanup(&rdma_mdata);
+}
  s->fd = -1;
  migrate_fd_error(s);
  } else {
@@ -62,6 +67,25 @@ static void tcp_wait_for_connect(int fd, void *opaque)
  s->fd = fd;
  socket_set_block(s->fd);
  migrate_fd_connect(s);
+
+/* RDMA initailization */
+if (qemu_use_rdma_migration()) {
+if (qemu_rdma_migration_client_init(&rdma_mdata)) {
+migrate_fd_error(s);
+return;
+}
+
+fprintf(stderr, "qemu_rdma_migration_client_init success\n");

Please use DPRINTF or trace for debug.

+
+if (qemu_rdma_migration_client_connect(&rdma_mdata)) {
+migrate_fd_error(s);
+close(s->fd);
+return;
+}
+
+fprintf(stderr, "qemu_rdma_migration_client_connect success\n");
+}
+
  }
  }
  
@@ -101,6 +125,16 @@ static void tcp_accept_incoming_migration(void *opaque)

  goto out;
  }
  
+if (qemu_use_rdma_migration()) {

+printf("listen on port started: %d\n", rdma_mdata.port);
+
+if (qemu_rdma_migration_server_wait_for_client(&rdma_mdata)) {
+fprintf(stderr, "rdma migration: error waiting for client!\n");
+close(s);
+return;
+}
+}
+
  process_incoming_migration(f);
  return;
  
@@ -117,6 +151,25 @@ void tcp_start_incoming_migration(const char *host_port, Error **errp)

  return;
  }
  
+if (qemu_use_rdma_migration()) {

+int ret;
+
+ret = qemu_rdma_migration_server_init(&rdma_mdata);
+if (ret) {
+fprintf(stderr, "rdma migration: error init server!\n");

Please set errp in case of an error, I would suggest passing it into 
qemu_rdma_migration_server_init
that will set the relevant error.   

+close(s);
+return;
+}
+
+ret = qemu_rdma_migration_server_prepare(&rdma_mdata);
+if (ret) {
+fprintf(stderr, "rdma migration: error preparing server!\n");
+close(s);
+return;
+}
+
+}
+
  qemu_set_fd_handler2(s, NULL, tcp_accept_incoming_migration, NULL,
   (void *)(intptr_t)s);
  }
diff --git a/migration.c b/migration.c
index 77c1971..cffe16f 100644
--- a/migration.c
+++ b/migration.c
@@ -22,6 +22,7 @@
  #include "qemu/sockets.h"
  #include "migration/block.h"
  #include "qemu/thread.h"
+#include "qemu/rdma.h"
  #include "qmp-commands.h"
  
  //#define DEBUG_MIGRATION

@@ -279,6 +280,11 @@ static int migrate_fd_cleanup(MigrationState *s)
  }
  
  assert(s->fd == -1);

+
+if (qemu_rdma_migration_enabled()) {
+qemu_rdma_migration_cleanup(&rdma_mdata);
+}
+
  return ret;
  }
  
@@ -481,6 +487,41 @@ int64_t qmp_query_migrate_cache_size(Error **errp)

  return migrate_xbzrle_cache_size();
  }
  
+void qmp_migrate_set_rdma_port(int64_t port, Error **errp)

+{
+MigrationState *s = migrate_get_current();
+if (s && (s->state == MIG_STATE_ACTIVE)) {
+return;
+}

Please return error here by setting errp.

+printf("rdma migration port: %" PRId64 "\n", port);

You forgot a printf here ...

+rdma_mdata.port = port;
+}
+
+void qmp_migrate_set_rdma_host(const char *host, Error **errp)
+{
+MigrationState *s = migrate_get_current();
+if (s && (s->state == MIG_STATE_ACTIVE)) {
+return;
+}
+printf("rdma migration host name: %s\n", host);
+strncpy(rdma_mdata.host, host, 64);

Use g_strdup here

regards,
Orit

+rdma_mdata.host[63] = '\0';
+}
+
+void qmp_migrate_rdma_prepare(Error **errp)
+{
+MigrationState *s = migrate_get_current();
+if (s && (s->state == MIG_STATE_ACTI

Re: [Qemu-devel] [PATCH V4 00/22] Multiqueue virtio-net

2013-01-31 Thread Jason Wang

On 01/31/2013 09:44 PM, Eric Blake wrote:
> On 01/31/2013 12:00 AM, Jason Wang wrote:
>> On 01/31/2013 02:29 AM, Eric Blake wrote:
>>> On 01/30/2013 04:12 AM, Jason Wang wrote:
>>>
 With this changes, user could start a multiqueue virtio-net device through

 ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
 virtio-net-pci,netdev=hn0

 Management tools such as libvirt can pass multiple pre-created 
 fds/vhostfds through

 ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
 virtio-net-pci,netdev=hn0
>>> Do we really need specific fds= parsing, or can we reuse the existing
>>> -add-fd command line option to our advantage?  I guess what I'm asking
>>> is how hotplug will work; and if hotplug takes a file name, shouldn't
>>> the command line also take a name; and if the command line takes a name,
>>> what's wrong with:
>>>
>>> ./qemu -add-fd fdset=1,fd=X -add-fd fdset=2,fd=Y -add-fd fdset=3,fd=M
>>> -add-fd fdset=4,fd=N -netdev
>>> tap,id=hn0,fds=/dev/fdset/1:/dev/fdset/2,vhostfds=/dev/fdset/3:/dev/fdset/4
>>> -device virtio-net-pci,netdev=hn0
>>>
>> AFAIK, tap does not support fdset now, so this requirement is beyond the
>> scope of multiqueue itself. We can do this in the future. Btw does
>> libvirt support add-fd now?
> Anything that uses qemu_open() supports fdset now.  The question I'm
> asking is whether the command line has a way to pass /path/to/name
> (which can be presented as /dev/fdset/nnn for add-fd usage) now, or
> whether it only supports fds=integers.

Nothing special with 'fds' and 'vhostfds', it just split the params by
':' and pass them one by one through monitor_handle_fd_param() just like
"fd" and "vhostfd". So if 'fd' and 'vhostfd' supports /path/to/name, so
do 'fds' and 'vhostfds'.

So for command line, you do can pass /path/to/name to fd/vhostfd but it
won't work since monitor_handle_fd_param() can not handle it because 1)
it's not an integer 2) it was not named before. But for hotplug,
non-integers works since it has already named by getfd, so does fds and
vhostfds.

For management such as libvirt, what's needed is just to connect the
fdname with ':'.

>> For hotplug, it just work if you pass multiple file descriptors one by
>> one through getfd and then use fds=X:Y,vhostfds=M:N.
> For hotplug, you can't pass integers; you have to name the fds either
> way.  Either you name it with getfd, or you name it with add-fd.  But
> getfd is not as nice as add-fd when it comes to ensuring that fds are
> not leaked in qemu, even when the management app such as libvirt
> restarts.  Furthermore, if it is possible to specify taps by pathname
> instead of by fd inheritance, then using getfd means you have to support
> two different approaches in QMP to distinguish which string is being
> supplied, while supporting add-fd means you only have to support
> qemu_open() which handles both direct names and fd passing in a single
> string interface.
>
> As for libvirt support of add-fd, I'm currently working with Stefan
> Berger to get patches applied; the goal is tha libvirt 1.0.3 (end of
> February) will support add-fd.
>

Thanks, I know there are advantages of add-fd, but current tap does not
use qemu_open() which means it can't support fdset. We can add this in
the future.

Re: [Qemu-devel] [PATCH V4 00/22] Multiqueue virtio-net

2013-01-31 Thread Jason Wang

On 01/31/2013 10:36 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 31, 2013 at 04:21:49PM +0200, Michael S. Tsirkin wrote:
>> On Wed, Jan 30, 2013 at 07:12:19PM +0800, Jason Wang wrote:
>>> Hello all:
>>>
>>> This seires is an update of last version of multiqueue virtio-net support.
>>>
>>> This series tries to brings multiqueue support to virtio-net through a
>>> multiqueue support tap backend and multiple vhost threads.
>>>
>>> Patch 1 converts bitfield in TAPState to bool. Patch 2 replace assert(0) 
>>> with
>>> abort() in tap.
>>>
>>> To support this, multiqueue nic support were added to qemu. This is done by
>>> introducing an array of NetClientStates in NICState, and make each pair of 
>>> peers
>>> to be an queue of the nic. This is done in patch 3-9.
>>>
>>> Tap were also converted to be able to create a multiple queue
>>> backend. Currently, only linux support this by issuing TUNSETIFF N times 
>>> with
>>> the same device name to create N queues. Each fd returned by TUNSETIFF were 
>>> a
>>> queue supported by kernel. Three new command lines were introduced, "queues"
>>> were used to tell how many queues will be created by qemu; "fds" were used 
>>> to
>>> pass multiple pre-created tap file descriptors to qemu; "vhostfds" were 
>>> used to
>>> pass multiple pre-created vhost descriptors to qemu. This is done in patch 
>>> 10-15.
>>>
>>> A method of deleting a queue and queue_index were also introduce for virtio,
>>> this is done in patch 16-17.
>>>
>>> Vhost were also changed to support multiqueue by introducing a start vq 
>>> index
>>> which tracks the first virtqueue that will be used by vhost instead of the
>>> assumption that the vhost always use virtqueue from index 0. This is done in
>>> patch 18.
>>>
>>> The last part is the multiqueue userspace changes, this is done in patch 
>>> 19-22.
>>>
>>> With this changes, user could start a multiqueue virtio-net device through
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
>>> virtio-net-pci,netdev=hn0
>>>
>>> Management tools such as libvirt can pass multiple pre-created fds/vhostfds 
>>> through
>>>
>>> ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
>>> virtio-net-pci,netdev=hn0
>>>
>>> For the one who wants to try, a git tree is available at:
>>> git://github.com/jasowang/qemu.git
>>>
>>> Changes from V3:
>>> - convert bitfield to bool in TAPState (Blue)
>>> - use abort() instead of assert(0) in tap code (Blue)
>>> - rebase to the latest
>>> - fix a bug that breaks the non-tap network
>> This conflicts with the pull request I sent, in partucular this adds a
>> layout assumption.  In the hope this will accelerate things, I did a
>> rebase and a trivial test with single queue only and it seems ok:
>>
>> git://github.com/mstsirkin/qemu.git pci
>>
>> There were some warnings about whitespace at EOF but
>> otherwise seems ok.
> Pushed to my pci branch on kernel.org too.

Tested with mq, it works well.

Thanks.
>
>> -- 
>> MST

[Qemu-devel] [PATCH for-1.4?] target-i386: Pass X86CPU to cpu_x86_set_a20()

2013-01-31 Thread Andreas Färber

Prepares for cpu_interrupt() changing argument to CPUState.

While touching it, rename to x86_cpu_...() now that it takes an X86CPU.

Signed-off-by: Andreas Färber 
---
 Extracted from my qom-cpu-8 queue.

 hw/pc.c  |7 ---
 target-i386/cpu.h|2 +-
 target-i386/helper.c |4 +++-
 3 Dateien geändert, 8 Zeilen hinzugefügt(+), 5 Zeilen entfernt(-)

diff --git a/hw/pc.c b/hw/pc.c
index 34b6dff..53cc173 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -527,11 +527,11 @@ type_init(port92_register_types)
 
 static void handle_a20_line_change(void *opaque, int irq, int level)
 {
-CPUX86State *cpu = opaque;
+X86CPU *cpu = opaque;
 
 /* XXX: send to all CPUs ? */
 /* XXX: add logic to handle multiple A20 line sources */
-cpu_x86_set_a20(cpu, level);
+x86_cpu_set_a20(cpu, level);
 }
 
 int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
@@ -1085,7 +1085,8 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
 }
 }
 
-a20_line = qemu_allocate_irqs(handle_a20_line_change, first_cpu, 2);
+a20_line = qemu_allocate_irqs(handle_a20_line_change,
+  x86_env_get_cpu(first_cpu), 2);
 i8042 = isa_create_simple(isa_bus, "i8042");
 i8042_setup_a20_line(i8042, &a20_line[0]);
 if (!no_vmport) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 62508dc..9e6e1a6 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1011,7 +1011,7 @@ void host_cpuid(uint32_t function, uint32_t count,
 int cpu_x86_handle_mmu_fault(CPUX86State *env, target_ulong addr,
  int is_write, int mmu_idx);
 #define cpu_handle_mmu_fault cpu_x86_handle_mmu_fault
-void cpu_x86_set_a20(CPUX86State *env, int a20_state);
+void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
 
 static inline bool hw_local_breakpoint_enabled(unsigned long dr7, int index)
 {
diff --git a/target-i386/helper.c b/target-i386/helper.c
index bf43d6a..29217ef 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -366,8 +366,10 @@ void cpu_dump_state(CPUX86State *env, FILE *f, 
fprintf_function cpu_fprintf,
 /* x86 mmu */
 /* XXX: add PGE support */
 
-void cpu_x86_set_a20(CPUX86State *env, int a20_state)
+void x86_cpu_set_a20(X86CPU *cpu, int a20_state)
 {
+CPUX86State *env = &cpu->env;
+
 a20_state = (a20_state != 0);
 if (a20_state != ((env->a20_mask >> 20) & 1)) {
 #if defined(DEBUG_MMU)
-- 
1.7.10.4

Re: [Qemu-devel] [RFC PATCH RDMA support v1: 5/5] send memory over RDMA as blocks are iterated

2013-01-31 Thread Michael R. hines

Yes, I was hoping for a comment about this in particular (before I send 
out another patchest with the proper coverletter and so forth).


So here's the problem I was experiencing inside savevm.c, 
qemu_loadvm_state():


I was having a little trouble serializing the client/server protocol in 
my brain.


When the server-side begins loading pages into ram, qemu_loadvm_state() 
goes into a tight loop waiting for each memory page to be transmitted, 
one-by-one.


Now, for RDMA, this loop is not necessary, but the loop was stuck 
waiting for TCP messages that did not need to be sent. So, the extra 
flag you saw was a hack to break out of the loop


 but according to you, I should bypass this loop entirely?
Should I write a brand new function in savevm.c which skips this function?

Also, with the QEMUFileRDMA, I have a question: Since RDMA does not 
require a file-like abstraction of any kind (meaning there is no 
explicit handshaking during an RDMA transfer), should I really create 
one of these? Unlike sockets and snapshot files that a typical migration 
would normally need, an RDMA-based migration doesn't operate this way 
anymore.


Thanks for all the comments ... keep them coming =)

- Michael


On 01/31/2013 06:04 AM, Orit Wasserman wrote:

Hi Michael,
I maybe missing something here but why do you need a RAM_SAVE_FLAG_RDMA flag?
You don't do any decoding in the destination.

I would suggest creating a QEMUFileRDMA and moving the write/read code
You can either add a new rdma_buffer QEMUFileOps or add the address to 
put_buffer.

you also have some white space damage in the beginning of savevm.c.

Regards,
Orit

On 01/29/2013 12:01 AM, mrhi...@linux.vnet.ibm.com wrote

From: "Michael R. Hines" 


Signed-off-by: Michael R. Hines 
---
  arch_init.c   |  116 +++--
  include/migration/qemu-file.h |1 +
  savevm.c  |   90 +++-
  3 files changed, 189 insertions(+), 18 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index dada6de..7633fa6 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -42,6 +42,7 @@
  #include "migration/migration.h"
  #include "exec/gdbstub.h"
  #include "hw/smbios.h"
+#include "qemu/rdma.h"
  #include "exec/address-spaces.h"
  #include "hw/pcspk.h"
  #include "migration/page_cache.h"
@@ -113,6 +114,7 @@ const uint32_t arch_type = QEMU_ARCH;
  #define RAM_SAVE_FLAG_EOS  0x10
  #define RAM_SAVE_FLAG_CONTINUE 0x20
  #define RAM_SAVE_FLAG_XBZRLE   0x40
+#define RAM_SAVE_FLAG_RDMA 0x80
  
  #ifdef __ALTIVEC__

  #include 
@@ -434,6 +436,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
  int bytes_sent = 0;
  MemoryRegion *mr;
  ram_addr_t current_addr;
+static int not_sent = 1;
  
  if (!block)

  block = QTAILQ_FIRST(&ram_list.blocks);
@@ -457,23 +460,75 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
  int cont = (block == last_sent_block) ?
  RAM_SAVE_FLAG_CONTINUE : 0;
  
+current_addr = block->offset + offset;

  p = memory_region_get_ram_ptr(mr) + offset;
  
  /* In doubt sent page as normal */

  bytes_sent = -1;
-if (is_dup_page(p)) {
+
+/*
+ * RFC RDMA: The empirical cost of searching for zero pages here
+ *   plus the cost of communicating with the other side
+ *   seems to take significantly more time than simply
+ *   dumping the page into remote memory.
+ */
+if (!qemu_rdma_migration_enabled() && is_dup_page(p)) {
  acct_info.dup_pages++;
  bytes_sent = save_block_hdr(f, block, offset, cont,
  RAM_SAVE_FLAG_COMPRESS);
  qemu_put_byte(f, *p);
  bytes_sent += 1;
+/*
+ * RFC RDMA: Same comment as above. time(run-length encoding)
+ *   + time(communication) is too big. RDMA throughput 
tanks
+ *   when this feature is enabled. But there's no need
+ *   to change the code since the feature is optional.
+ */
  } else if (migrate_use_xbzrle()) {
-current_addr = block->offset + offset;
  bytes_sent = save_xbzrle_page(f, p, current_addr, block,
offset, cont, last_stage);
  if (!last_stage) {
  p = get_cached_data(XBZRLE.cache, current_addr);
  }
+} else if (qemu_rdma_migration_enabled()) {
+int ret;
+
+/*
+ * RFC RDMA: This bad hack was to cause the loop on the
+ *   receiving side to break. Comments are welcome
+ *   on how to get rid of it.
+ */
+if (no

Re: [Qemu-devel] [PATCH] Fix monitor 'info registers' command on multi processor guests

2013-01-31 Thread Luiz Capitulino

On Thu, 31 Jan 2013 05:12:25 -0800
"Erlon Cruz"  wrote:

> Legal Disclaimer:
> The information contained in this message may be privileged and confidential. 
> It is intended to be read only by the individual or entity to whom it is 
> addressed or by their designee. If the reader of this message is not the 
> intended recipient, you are on notice that any distribution of this message, 
> in any form, is strictly prohibited. If you have received this message in 
> error, please immediately notify the sender and delete or destroy any copy of 
> this message!

This is not appropriate for a public ML (or for sending patches), please
drop this.

PS: I won't reply to messages containing such disclaimer nor take patch
from you if they have it.

Re: [Qemu-devel] RFC migration of zero pages

2013-01-31 Thread Michael R. hines


What about using the linux pagemap?

With RDMA, I have exactly this problem. The cost of mapping and faulting 
and checking for zeros is higher than the cost of simply dumping the 
page into remote memory on the server side.


If we could avoid accessing the page, it was save huge amounts of time

- Michael

On 01/31/2013 02:47 AM, Orit Wasserman wrote:
We send one byte per zero page at the moment (see is_dup_page) we can 
further optimizing it by not sending it. I have to point out that this 
is a very idle guest and we need to work on a loaded guest which is 
the more hard problem in migration. Also I notice that the bottle neck 
in migrating unmapped pages is the detection of those pages because we 
map the pages in order to check them, for a large guest this is very 
expensive as mapping a page results in a page fault in the host. So 
what will be very helpful is actually locating those pages without 
mapping them which looks very complicated. Regards, Orit

Peter

Re: [Qemu-devel] [PATCH V4 00/22] Multiqueue virtio-net

2013-01-31 Thread Eric Blake

On 01/31/2013 06:58 AM, Michael S. Tsirkin wrote:

>>> For hotplug, it just work if you pass multiple file descriptors one by
>>> one through getfd and then use fds=X:Y,vhostfds=M:N.
>>
>> For hotplug, you can't pass integers; you have to name the fds either
>> way.  Either you name it with getfd, or you name it with add-fd.  But
>> getfd is not as nice as add-fd when it comes to ensuring that fds are
>> not leaked in qemu, even when the management app such as libvirt
>> restarts.  Furthermore, if it is possible to specify taps by pathname
>> instead of by fd inheritance,
> 
> I don't think there's a way to specify taps by pathname.

Then using fds=integer:integer on the command line makes the most sense,
and QMP uses fds=name:name where name was specified by 'getfd', and
there is no way to wire up qemu_open() nor any need to use 'add-fd'.
Okay, my question has been answered, your approach looks right now that
I know more about how -netdev works to begin with.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH for-1.4 0/2] target-s390x: CPU cleanups preparing for 1.5

2013-01-31 Thread Andreas Färber

Am 30.01.2013 23:48, schrieb Andreas Färber:
> As a reminder here's a link to one of my original discussions of the new 
> types:
> https://lists.nongnu.org/archive/html/qemu-devel/2012-05/msg01286.html
> 
> That is, for any non-TCG functions (TCG does not support CPUState yet) an
> S390CPU argument should be preferred over CPUS390XState since it allows cheap
> access to its own fields, CPUState's via CPU() and to CPUS390XState via ->env.
> Doing this consistently avoids costs of casting back and forth unnecessarily.
> 
> s390 code should use s390_env_get_cpu() where needed, not ENV_GET_CPU().
> 
> As a rule of thumb, any field in include/exec/cpu-defs.h:CPU_COMMON can be
> expected to end up in CPUState (or accessible from there) sooner or later.

> Per-target functions can be expected to change to CPUState soon.

Maybe too brief: This was referring to functions like kvm_arch_*() that
each target implements, knowing its CPU type. In particular
do_interrupt() is one of my next candidates.

Andreas

> 
> New fields that do not need to be accessed via TCGv or from a hot TCG helper
> function should be added to S390CPU, not to CPUS390XState.

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 2/2] usb-ehci: add Faraday FUSBH200 support

2013-01-31 Thread Andreas Färber

Am 30.01.2013 01:44, schrieb Kuo-Jung Su:
> 
> 2013/1/29 Andreas Färber mailto:afaer...@suse.de>>
> 
> Gerd, what are your thoughts? If Kuo-Jung doesn't mind, I would offer to
> send a v2 implementing the changes I suggested in the way you prefer.
> 
> 
> Don't worry about me, I'm O.K to any changes. It does not bother me at all,
> what's really killing me is only the patch rules

I started looking into this but I feel we should defer both my realizefn
patch and the fusbh200 addition to 1.5, so that it can get sufficient
review. It turned out less trivial than I thought. ;)
Will send an RFC the next days.

Regards,
Andreas

P.S. Kuo-Jung, please try to avoid HTML-formatted replies to the list,
the quoting breaks then. Thanks.

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [RFC 1/2] qbus_find_recursive(): don't abort search for named bus on full bus node

2013-01-31 Thread Laszlo Ersek

The bus we're looking for could be in the sub-tree rooted at the node
being checked; don't skip looping over the children.

Signed-off-by: Laszlo Ersek 
---
 hw/qdev-monitor.c |   10 +-
 1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
index 4e2a92b..34f5014 100644
--- a/hw/qdev-monitor.c
+++ b/hw/qdev-monitor.c
@@ -295,15 +295,7 @@ static BusState *qbus_find_recursive(BusState *bus, const 
char *name,
 match = 0;
 }
 if ((bus_class->max_dev != 0) && (bus_class->max_dev <= bus->max_index)) {
-if (name != NULL) {
-/* bus was explicitly specified: return an error. */
-qerror_report(ERROR_CLASS_GENERIC_ERROR, "Bus '%s' is full",
-  bus->name);
-return NULL;
-} else {
-/* bus was not specified: try to find another one. */
-match = 0;
-}
+match = 0;
 }
 if (match) {
 return bus;
-- 
1.7.1

1 2 3 >

1 - 100 of 221 matches

Mail list logo