date:20140923

Re: [Qemu-devel] [RFC v2 0/3] Q35/AHCI -cdrom/-hda desugaring

2014-09-23 Thread Markus Armbruster

John Snow  writes:

> On 09/19/2014 05:53 AM, Markus Armbruster wrote:
>> John Snow  writes:
>>
>>> This is an extremely rough/quick sketch of
>>> a -cdrom/-hda desugaring fix for Q35/AHCI.
>>>
>>> Before I spent any time on it, I wanted feedback
>>> from Markus or anyone else who had concerns about
>>> how this problem would get fixed.
>>>
>>> This is, then, rough approach #2.
>>>
>>> Highlights:
>>> (1) Add a board property (instead of a HBA property, sigh)
>>>  that defines how we should map (index, (bus,unit)).
>>
>> Imperfect, but it'll do for now.  The place in the boards that sets it
>> should point to the HBA in a comment.
>>
>>> (2) Modify drive_new to accept the MachineClass instead of
>>>  the default interface type. This does not affect how
>>>  default drives get added, because any over-rides to
>>>  the "default type" get handled in options, so while
>>>  it appears we have removed the type of default drives,
>>>  we have not.
>>>
>>> (3) Create helpers for AHCI to assist the Q35 board in
>>>  populating the AHCI device with the IDE drives.
>>>
>>> (4) Create a helper to whine at us for oversights and
>>>  help bug reporters give us more meaningful information.
>>
>> General approach looks good to me; I can see only coding bugs, not
>> design flaws.
>>
>
> I rewrote this series and was about to send it out, but it does fail
> the bios-tables-test because this test uses this command line:
>
> -net none -display none -machine q35,accel=tcg -drive
> file=tests/acpi-test-disk.raw,id=hd0 -device ide-hd,drive=hd0,

qemu: -device ide-hd,drive=hd: Property 'ide-hd.drive' can't take value 'hd', 
it's in use

> Notice it doesn't say if=none for the drive, so after fixing Q35, this
> actually creates a new failure in this test because we will create the
> drive (and device), then fail when trying to create the second device
> attached to the same drive.

Exactly.

Code in question:

const char *device = "";

if (!g_strcmp0(data->machine, MACHINE_Q35)) {
device = ",id=hd -device ide-hd,drive=hd";
}

args = g_strdup_printf("-net none -display none %s -drive file=%s%s,",
   params ? params : "", disk, device);
qtest_start(args);

Called twice:

1. data->machine = MACHINE_PC (really: "pc")
   params = "-machine accel=tcg"

   Thus, args = "-net none -display none -machine accel=tcg"
   " -drive file=tests/acpi-test-disk.raw,"

   Neither if, bus, unit, nor index specified with -drive, so this asks
   board code to set up an IDE disk (bus, unit) = (0,0), and the board
   code complies.

2. data->machine == MACHINE_Q35 (really: "q35",
   params = "-machine q35,accel=tcg"

   Thus, args = ""-net none -display none -machine q35,accel=tcg"
   " -drive file=tests/acpi-test-disk.raw,id=hd -device ide-hd,drive=hd,"

   Again, this asks board code for IDE disk (0,0).  Before your patch,
   board code silently ignores the request.  Afterwards, it complies.

   Additionally, it sets up another IDE disk (0,0) with device,
   exploiting the misfeature that -device *can* use a drive with if=ide.
   Uses the same drive with two device models, which is unsafe and duly
   rejected.

   The special case for q35 is for coping with the lack of a working
   -drive if=ide.  A working if=ide makes it unnecessary and brings out
   its flaws.

   Even before your patch, the special case is unnecessary: "-drive
   if=none,id=hd,file=... -drive ide-hd,drive=hd" would do for both
   machine types.  I'd make that change in a preliminary patch, because
   that avoids polluting the meat of your patching with correcting test
   bugs.

> I think this test is at fault, but I wanted to be duly diligent and
> ask the question: "Is it a big deal if I break backwards compatibility
> with broken scripts?"

Where "broken scripts" means "our users' broken scripts (if any)".

On the one hand, we've always told users that -device goes with a -drive
if=none.  On the other hand, using a drive not picked up by board code
has always worked anyway.

If we want to remain bug-compatible, we need to make the q35 board code
pick up the IF_IDE drives only for new machine types.  Feasible.  But is
it worthwhile?

[Qemu-devel] [PATCH] target-xtensa: add definition for XTHAL_INTTYPE_PROFILING

2014-09-23 Thread Max Filippov

There's new interrupt type in the recent Xtensa releases that may appear
in configuration overlay. Add definition so that new cores that use it
could be automatically imported.

Signed-off-by: Max Filippov 
---
 target-xtensa/cpu.h  | 1 +
 target-xtensa/overlay_tool.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/target-xtensa/cpu.h b/target-xtensa/cpu.h
index 9cf5275..beb5486 100644
--- a/target-xtensa/cpu.h
+++ b/target-xtensa/cpu.h
@@ -266,6 +266,7 @@ typedef enum {
 INTTYPE_TIMER,
 INTTYPE_DEBUG,
 INTTYPE_WRITE_ERR,
+INTTYPE_PROFILING,
 INTTYPE_MAX
 } interrupt_type;
 
diff --git a/target-xtensa/overlay_tool.h b/target-xtensa/overlay_tool.h
index 4c0de7f..5a1353e 100644
--- a/target-xtensa/overlay_tool.h
+++ b/target-xtensa/overlay_tool.h
@@ -163,6 +163,7 @@
 #define XTHAL_INTTYPE_TBD1 INTTYPE_DEBUG
 #define XTHAL_INTTYPE_TBD2 INTTYPE_WRITE_ERR
 #define XTHAL_INTTYPE_WRITE_ERROR INTTYPE_WRITE_ERR
+#define XTHAL_INTTYPE_PROFILING INTTYPE_PROFILING
 
 
 #define INTERRUPT(i) { \
-- 
1.8.1.4

[Qemu-devel] [PATCH] target-xtensa: tests: pre-process tests linker script

2014-09-23 Thread Max Filippov

Xtensa cores have configurable interrupt vectors and endiannes. This
information is needed to link executable images correctly for a specific
core configuration. Instead of hard-coding dc232 defaults pull endianness,
number of high-priority interrupts and location of vectors from the core
configuration and pass it through the C preprocessor.

While at it clean up tabs and align the initial stack on 16 bytes.

Signed-off-by: Max Filippov 
---
 tests/tcg/xtensa/Makefile|  10 +++-
 tests/tcg/xtensa/linker.ld   | 112 -
 tests/tcg/xtensa/linker.ld.S | 130 +++
 3 files changed, 137 insertions(+), 115 deletions(-)
 delete mode 100644 tests/tcg/xtensa/linker.ld
 create mode 100644 tests/tcg/xtensa/linker.ld.S

diff --git a/tests/tcg/xtensa/Makefile b/tests/tcg/xtensa/Makefile
index a70c92b..522a63e 100644
--- a/tests/tcg/xtensa/Makefile
+++ b/tests/tcg/xtensa/Makefile
@@ -13,6 +13,7 @@ SIMFLAGS = --xtensa-core=DC_B_232L --exit_with_target_code 
$(EXTFLAGS)
 SIMDEBUG = --gdbserve=0
 endif
 
+HOST_CC = gcc
 CC  = $(CROSS)gcc
 AS  = $(CROSS)gcc -x assembler-with-cpp
 LD  = $(CROSS)ld
@@ -21,7 +22,7 @@ XTENSA_SRC_PATH = $(SRC_PATH)/tests/tcg/xtensa
 INCLUDE_DIRS = $(XTENSA_SRC_PATH) $(SRC_PATH)/target-xtensa/core-$(CORE)
 XTENSA_INC = $(addprefix -I,$(INCLUDE_DIRS))
 
-LDFLAGS = -T$(XTENSA_SRC_PATH)/linker.ld
+LDFLAGS = -Tlinker.ld
 
 CRT= crt.o vectors.o
 
@@ -59,13 +60,16 @@ TESTCASES += test_windowed.tst
 
 all: build
 
+linker.ld: $(XTENSA_SRC_PATH)/linker.ld.S
+   $(HOST_CC) $(XTENSA_INC) -E -P $< -o $@
+
 %.o: $(XTENSA_SRC_PATH)/%.c
$(CC) $(XTENSA_INC) $(CFLAGS) -c $< -o $@
 
 %.o: $(XTENSA_SRC_PATH)/%.S
$(CC) $(XTENSA_INC) $(ASFLAGS) -c $< -o $@
 
-%.tst: %.o $(XTENSA_SRC_PATH)/macros.inc $(CRT) Makefile
+%.tst: %.o linker.ld $(XTENSA_SRC_PATH)/macros.inc $(CRT) Makefile
$(LD) $(LDFLAGS) $(NOSTDFLAGS) $(CRT) $< -o $@
 
 build: $(TESTCASES)
@@ -85,4 +89,4 @@ host-debug-%.tst: %.tst
gdb --args $(SIM) $(SIMFLAGS) ./$<
 
 clean:
-   $(RM) -fr $(TESTCASES) $(CRT)
+   $(RM) -fr $(TESTCASES) $(CRT) linker.ld
diff --git a/tests/tcg/xtensa/linker.ld b/tests/tcg/xtensa/linker.ld
deleted file mode 100644
index 4d0b307..000
--- a/tests/tcg/xtensa/linker.ld
+++ /dev/null
@@ -1,112 +0,0 @@
-OUTPUT_FORMAT("elf32-xtensa-le")
-ENTRY(_start)
-
-__DYNAMIC = 0;
-
-MEMORY {
-   ram : ORIGIN = 0xd000, LENGTH = 0x0800  /* 128M */
-   rom : ORIGIN = 0xfe00, LENGTH = 0x1000  /* 4k */
-}
-
-SECTIONS
-{
-.init :
-{
-*(.init)
-   *(.init.*)
-} > rom
-
-.vector :
-{
-. = 0x;
-*(.vector.window_overflow_4)
-*(.vector.window_overflow_4.*)
-. = 0x0040;
-*(.vector.window_underflow_4)
-*(.vector.window_underflow_4.*)
-. = 0x0080;
-*(.vector.window_overflow_8)
-*(.vector.window_overflow_8.*)
-. = 0x00c0;
-*(.vector.window_underflow_8)
-*(.vector.window_underflow_8.*)
-. = 0x0100;
-*(.vector.window_overflow_12)
-*(.vector.window_overflow_12.*)
-. = 0x0140;
-*(.vector.window_underflow_12)
-*(.vector.window_underflow_12.*)
-
-. = 0x0180;
-*(.vector.level2)
-*(.vector.level2.*)
-. = 0x01c0;
-*(.vector.level3)
-*(.vector.level3.*)
-. = 0x0200;
-*(.vector.level4)
-*(.vector.level4.*)
-. = 0x0240;
-*(.vector.level5)
-*(.vector.level5.*)
-. = 0x0280;
-*(.vector.level6)
-*(.vector.level6.*)
-. = 0x02c0;
-*(.vector.level7)
-*(.vector.level7.*)
-
-. = 0x0300;
-*(.vector.kernel)
-*(.vector.kernel.*)
-. = 0x0340;
-*(.vector.user)
-*(.vector.user.*)
-. = 0x03c0;
-*(.vector.double)
-*(.vector.double.*)
-} > ram
-
-   .text :
-   {
-   _ftext = .;
-   *(.text .stub .text.* .gnu.linkonce.t.* .literal .literal.*)
-   _etext = .;
-   } > ram
-
-   .rodata :
-   {
-   . = ALIGN(4);
-   _frodata = .;
-   *(.rodata .rodata.* .gnu.linkonce.r.*)
-   *(.rodata1)
-   _erodata = .;
-   } > ram
-
-   .data :
-   {
-   . = ALIGN(4);
-   _fdata = .;
-   *(.data .data.* .gnu.linkonce.d.*)
-   *(.data1)
-   _gp = ALIGN(16);
-   *(.sdata .sdata.* .gnu.linkonce.s.*)
-   _edata = .;
-   } > ram
-
-   .bss :
-   {
-   . = ALIGN(4);
-   _fbss = .;
-   *(.dynsbss)
-   *(.sbss .sbss.* .gnu.linkonce.sb.*)
-   *(.scommon)
-   *(.dynbss)
-   *(.bss .bss.* .gnu.linkonce.b.*)
-   *(COMMON)
-   _

[Qemu-devel] [PATCH v3 6/7] virtio-scsi: Call bdrv_io_plug/bdrv_io_unplug in cmd request handling

2014-09-23 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 395178e..09a39cb 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -488,6 +488,8 @@ bool virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 virtio_scsi_complete_cmd_req(req);
 return false;
 }
+scsi_req_ref(req->sreq);
+bdrv_io_plug(d->conf.bs);
 return true;
 }
 
@@ -496,6 +498,8 @@ void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 if (scsi_req_enqueue(req->sreq)) {
 scsi_req_continue(req->sreq);
 }
+bdrv_io_unplug(req->sreq->dev->conf.bs);
+scsi_req_unref(req->sreq);
 }
 
 static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
-- 
1.9.3

[Qemu-devel] [PATCH v3 1/7] virtio-scsi-dataplane: Code to run virtio-scsi on iothread

2014-09-23 Thread Fam Zheng

This implements the core part of dataplane feature of virtio-scsi.

A few fields are added in VirtIOSCSICommon to maintain the dataplane
status. These fields are managed by a new source file:
virtio-scsi-dataplane.c.

Most code in this file will run on an iothread, unless otherwise
commented as in a global mutex context, such as those functions to
start, stop and setting the iothread property.

Upon start, we set up guest/host event notifiers, in a same way as
virtio-blk does. The handlers then pop request from vring and call into
virtio-scsi.c functions to process it. So we need to make sure make all
those called functions work with iothread, too.

Signed-off-by: Fam Zheng 
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/Makefile.objs   |   2 +-
 hw/scsi/virtio-scsi-dataplane.c | 222 
 include/hw/virtio/virtio-scsi.h |  25 -
 3 files changed, 245 insertions(+), 4 deletions(-)
 create mode 100644 hw/scsi/virtio-scsi-dataplane.c

diff --git a/hw/scsi/Makefile.objs b/hw/scsi/Makefile.objs
index 121ddc5..40c79d3 100644
--- a/hw/scsi/Makefile.objs
+++ b/hw/scsi/Makefile.objs
@@ -8,6 +8,6 @@ common-obj-$(CONFIG_ESP_PCI) += esp-pci.o
 obj-$(CONFIG_PSERIES) += spapr_vscsi.o
 
 ifeq ($(CONFIG_VIRTIO),y)
-obj-y += virtio-scsi.o
+obj-y += virtio-scsi.o virtio-scsi-dataplane.o
 obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o
 endif
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
new file mode 100644
index 000..acbf622
--- /dev/null
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -0,0 +1,222 @@
+/*
+ * Virtio SCSI dataplane
+ *
+ * Copyright Red Hat, Inc. 2014
+ *
+ * Authors:
+ *   Fam Zheng 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/virtio/virtio-scsi.h"
+#include "qemu/error-report.h"
+#include 
+#include 
+#include 
+#include "hw/virtio/virtio-access.h"
+#include "stdio.h"
+
+/* Context: QEMU global mutex held */
+void virtio_scsi_set_iothread(VirtIOSCSI *s, IOThread *iothread)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+
+assert(!s->ctx);
+s->ctx = iothread_get_aio_context(vs->conf.iothread);
+
+/* Don't try if transport does not support notifiers. */
+if (!k->set_guest_notifiers || !k->set_host_notifier) {
+fprintf(stderr, "virtio-scsi: Failed to set iothread "
+   "(transport does not support notifiers)");
+exit(1);
+}
+}
+
+static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
+   VirtQueue *vq,
+   EventNotifierHandler *handler,
+   int n)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+VirtIOSCSIVring *r = g_slice_new(VirtIOSCSIVring);
+
+/* Set up virtqueue notify */
+if (k->set_host_notifier(qbus->parent, n, true) != 0) {
+fprintf(stderr, "virtio-scsi: Failed to set host notifier\n");
+exit(1);
+}
+r->host_notifier = *virtio_queue_get_host_notifier(vq);
+r->guest_notifier = *virtio_queue_get_guest_notifier(vq);
+aio_set_event_notifier(s->ctx, &r->host_notifier, handler);
+
+r->parent = s;
+
+if (!vring_setup(&r->vring, VIRTIO_DEVICE(s), n)) {
+fprintf(stderr, "virtio-scsi: VRing setup failed\n");
+exit(1);
+}
+return r;
+}
+
+VirtIOSCSIReq *virtio_scsi_pop_req_vring(VirtIOSCSI *s,
+ VirtIOSCSIVring *vring)
+{
+VirtIOSCSIReq *req = virtio_scsi_init_req(s, NULL);
+int r;
+
+req->vring = vring;
+r = vring_pop((VirtIODevice *)s, &vring->vring, &req->elem);
+if (r < 0) {
+virtio_scsi_free_req(req);
+req = NULL;
+}
+return req;
+}
+
+void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
+{
+vring_push(&req->vring->vring, &req->elem,
+   req->qsgl.size + req->resp_iov.size);
+event_notifier_set(&req->vring->guest_notifier);
+}
+
+static void virtio_scsi_iothread_handle_ctrl(EventNotifier *notifier)
+{
+VirtIOSCSIVring *vring = container_of(notifier,
+  VirtIOSCSIVring, host_notifier);
+VirtIOSCSI *s = VIRTIO_SCSI(vring->parent);
+VirtIOSCSIReq *req;
+
+event_notifier_test_and_clear(notifier);
+while ((req = virtio_scsi_pop_req_vring(s, vring))) {
+virtio_scsi_handle_ctrl_req(s, req);
+}
+}
+
+static void virtio_scsi_iothread_handle_event(EventNotifier *notifier)
+{
+VirtIOSCSIVring *vring = container_of(notifier,
+  VirtIOSCSIVring, host_notifier);
+VirtIOSCSI *s = vring->parent;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+event_notifier_test_and_clear(

[Qemu-devel] [PATCH v3 0/7] virtio-scsi: Dataplane on single iothread

2014-09-23 Thread Fam Zheng

The second half of previous series while rebasing on Paolo's scsi-next branch.

Changes include:

 - Move dataplane fields from VirtIOSCSICommon to VirtIOSCSI.
 - Assert s->ctx in virtio_scsi_set_iothread.
 - No virtio_scsi_aio_acquire, just acquire/release in virtio_scsi_push_event.
 - Add migration state notifier.
 - Add bdrv_io_plug / bdrv_io_unplug.

Thanks,
Fam

Fam Zheng (7):
  virtio-scsi-dataplane: Code to run virtio-scsi on iothread
  virtio-scsi: Hook up with dataplane
  virtio-scsi: Add migration state notifier for dataplane code
  virtio-scsi: Two stages processing of cmd request
  virtio-scsi: Batched prepare for cmd reqs
  virtio-scsi: Call bdrv_io_plug/bdrv_io_unplug in cmd request handling
  virtio-scsi: Process ".iothread" property

 hw/scsi/Makefile.objs   |   2 +-
 hw/scsi/virtio-scsi-dataplane.c | 229 
 hw/scsi/virtio-scsi.c   | 116 +---
 include/hw/virtio/virtio-scsi.h |  33 +-
 4 files changed, 362 insertions(+), 18 deletions(-)
 create mode 100644 hw/scsi/virtio-scsi-dataplane.c

-- 
1.9.3

[Qemu-devel] [PATCH v3 4/7] virtio-scsi: Two stages processing of cmd request

2014-09-23 Thread Fam Zheng

Mechanical change, in preparation for bdrv_io_plug/bdrv_io_unplug.

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi-dataplane.c |  4 +++-
 hw/scsi/virtio-scsi.c   | 20 
 include/hw/virtio/virtio-scsi.h |  3 ++-
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index acbf622..11f5705 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -126,7 +126,9 @@ static void virtio_scsi_iothread_handle_cmd(EventNotifier 
*notifier)
 
 event_notifier_test_and_clear(notifier);
 while ((req = virtio_scsi_pop_req_vring(s, vring))) {
-virtio_scsi_handle_cmd_req(s, req);
+if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
+virtio_scsi_handle_cmd_req_submit(s, req);
+}
 }
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 57efe65..6cf070f 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -449,10 +449,9 @@ static void virtio_scsi_fail_cmd_req(VirtIOSCSIReq *req)
 virtio_scsi_complete_cmd_req(req);
 }
 
-void virtio_scsi_handle_cmd_req(VirtIOSCSI *s, VirtIOSCSIReq *req)
+bool virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
 VirtIOSCSICommon *vs = &s->parent_obj;
-int n;
 SCSIDevice *d;
 int rc;
 
@@ -464,14 +463,14 @@ void virtio_scsi_handle_cmd_req(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 } else {
 virtio_scsi_bad_req();
 }
-return;
+return false;
 }
 
 d = virtio_scsi_device_find(s, req->req.cmd.lun);
 if (!d) {
 req->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
 virtio_scsi_complete_cmd_req(req);
-return;
+return false;
 }
 if (s->dataplane_started && bdrv_get_aio_context(d->conf.bs) != s->ctx) {
 aio_context_acquire(s->ctx);
@@ -487,11 +486,14 @@ void virtio_scsi_handle_cmd_req(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 req->sreq->cmd.xfer > req->qsgl.size)) {
 req->resp.cmd.response = VIRTIO_SCSI_S_OVERRUN;
 virtio_scsi_complete_cmd_req(req);
-return;
+return false;
 }
+return true;
+}
 
-n = scsi_req_enqueue(req->sreq);
-if (n) {
+void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
+{
+if (scsi_req_enqueue(req->sreq)) {
 scsi_req_continue(req->sreq);
 }
 }
@@ -507,7 +509,9 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, 
VirtQueue *vq)
 return;
 }
 while ((req = virtio_scsi_pop_req(s, vq))) {
-virtio_scsi_handle_cmd_req(s, req);
+if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
+virtio_scsi_handle_cmd_req_submit(s, req);
+}
 }
 }
 
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 1a6a919..1cc759a 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -256,7 +256,8 @@ void virtio_scsi_common_realize(DeviceState *dev, Error 
**errp,
 
 void virtio_scsi_common_unrealize(DeviceState *dev, Error **errp);
 void virtio_scsi_handle_ctrl_req(VirtIOSCSI *s, VirtIOSCSIReq *req);
-void virtio_scsi_handle_cmd_req(VirtIOSCSI *s, VirtIOSCSIReq *req);
+bool virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req);
+void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req);
 VirtIOSCSIReq *virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq);
 void virtio_scsi_free_req(VirtIOSCSIReq *req);
 void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
-- 
1.9.3

[Qemu-devel] [PATCH v3 3/7] virtio-scsi: Add migration state notifier for dataplane code

2014-09-23 Thread Fam Zheng

Similar to virtio-blk-dataplane, we stop the iothread while migration
starts and restart it when migration finishes.

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi.c   | 35 ---
 include/hw/virtio/virtio-scsi.h |  2 ++
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index dc705f0..57efe65 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include "hw/virtio/virtio-access.h"
+#include "migration/migration.h"
 
 static inline int virtio_scsi_get_lun(uint8_t *lun)
 {
@@ -357,7 +358,7 @@ static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, 
VirtQueue *vq)
 VirtIOSCSI *s = (VirtIOSCSI *)vdev;
 VirtIOSCSIReq *req;
 
-if (s->ctx) {
+if (s->ctx && !s->dataplane_disabled) {
 virtio_scsi_dataplane_start(s);
 return;
 }
@@ -501,7 +502,7 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, 
VirtQueue *vq)
 VirtIOSCSI *s = (VirtIOSCSI *)vdev;
 VirtIOSCSIReq *req;
 
-if (s->ctx) {
+if (s->ctx && !s->dataplane_disabled) {
 virtio_scsi_dataplane_start(s);
 return;
 }
@@ -651,7 +652,7 @@ static void virtio_scsi_handle_event(VirtIODevice *vdev, 
VirtQueue *vq)
 {
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
-if (s->ctx) {
+if (s->ctx && !s->dataplane_disabled) {
 virtio_scsi_dataplane_start(s);
 return;
 }
@@ -742,6 +743,31 @@ void virtio_scsi_common_realize(DeviceState *dev, Error 
**errp,
 }
 }
 
+/* Disable dataplane thread during live migration since it does not
+ * update the dirty memory bitmap yet.
+ */
+static void virtio_scsi_migration_state_changed(Notifier *notifier, void *data)
+{
+VirtIOSCSI *s = container_of(notifier, VirtIOSCSI,
+ migration_state_notifier);
+MigrationState *mig = data;
+
+if (migration_in_setup(mig)) {
+if (!s->dataplane_started) {
+return;
+}
+virtio_scsi_dataplane_stop(s);
+s->dataplane_disabled = true;
+} else if (migration_has_finished(mig) ||
+   migration_has_failed(mig)) {
+if (s->dataplane_started) {
+return;
+}
+bdrv_drain_all(); /* complete in-flight non-dataplane requests */
+s->dataplane_disabled = false;
+}
+}
+
 static void virtio_scsi_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -770,6 +796,8 @@ static void virtio_scsi_device_realize(DeviceState *dev, 
Error **errp)
 
 register_savevm(dev, "virtio-scsi", virtio_scsi_id++, 1,
 virtio_scsi_save, virtio_scsi_load, s);
+s->migration_state_notifier.notify = virtio_scsi_migration_state_changed;
+add_migration_state_change_notifier(&s->migration_state_notifier);
 }
 
 static void virtio_scsi_instance_init(Object *obj)
@@ -796,6 +824,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, 
Error **errp)
 VirtIOSCSI *s = VIRTIO_SCSI(dev);
 
 unregister_savevm(dev, "virtio-scsi", s);
+remove_migration_state_change_notifier(&s->migration_state_notifier);
 
 virtio_scsi_common_unrealize(dev, errp);
 }
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 8e1968f..1a6a919 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -194,6 +194,8 @@ typedef struct VirtIOSCSI {
 bool dataplane_started;
 bool dataplane_starting;
 bool dataplane_stopping;
+bool dataplane_disabled;
+Notifier migration_state_notifier;
 } VirtIOSCSI;
 
 typedef struct VirtIOSCSIReq {
-- 
1.9.3

[Qemu-devel] [PATCH v3 2/7] virtio-scsi: Hook up with dataplane

2014-09-23 Thread Fam Zheng

This enables the virtio-scsi-dataplane code by setting the iothread
in virtio-scsi device, and makes any function that is called by
back from dataplane to cooperate with the caller: they need to be
vring/iothread aware when handling the requests and using scsi devices
on the bus.

Signed-off-by: Fam Zheng 
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/virtio-scsi.c | 52 +++
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 91ead26..dc705f0 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -69,13 +69,19 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
 qemu_iovec_from_buf(&req->resp_iov, 0, &req->resp, req->resp_size);
-virtqueue_push(vq, &req->elem, req->qsgl.size + req->resp_iov.size);
+if (req->vring) {
+assert(req->vq == NULL);
+virtio_scsi_vring_push_notify(req);
+} else {
+virtqueue_push(vq, &req->elem, req->qsgl.size + req->resp_iov.size);
+virtio_notify(vdev, vq);
+}
+
 if (req->sreq) {
 req->sreq->hba_private = NULL;
 scsi_req_unref(req->sreq);
 }
 virtio_scsi_free_req(req);
-virtio_notify(vdev, vq);
 }
 
 static void virtio_scsi_bad_req(void)
@@ -208,6 +214,11 @@ static void virtio_scsi_do_tmf(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 BusChild *kid;
 int target;
 
+if (s->dataplane_started && bdrv_get_aio_context(d->conf.bs) != s->ctx) {
+aio_context_acquire(s->ctx);
+bdrv_set_aio_context(d->conf.bs, s->ctx);
+aio_context_release(s->ctx);
+}
 /* Here VIRTIO_SCSI_S_OK means "FUNCTION COMPLETE".  */
 req->resp.tmf.response = VIRTIO_SCSI_S_OK;
 
@@ -346,6 +357,10 @@ static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, 
VirtQueue *vq)
 VirtIOSCSI *s = (VirtIOSCSI *)vdev;
 VirtIOSCSIReq *req;
 
+if (s->ctx) {
+virtio_scsi_dataplane_start(s);
+return;
+}
 while ((req = virtio_scsi_pop_req(s, vq))) {
 virtio_scsi_handle_ctrl_req(s, req);
 }
@@ -457,6 +472,11 @@ void virtio_scsi_handle_cmd_req(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 virtio_scsi_complete_cmd_req(req);
 return;
 }
+if (s->dataplane_started && bdrv_get_aio_context(d->conf.bs) != s->ctx) {
+aio_context_acquire(s->ctx);
+bdrv_set_aio_context(d->conf.bs, s->ctx);
+aio_context_release(s->ctx);
+}
 req->sreq = scsi_req_new(d, req->req.cmd.tag,
  virtio_scsi_get_lun(req->req.cmd.lun),
  req->req.cdb, req);
@@ -481,6 +501,10 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, 
VirtQueue *vq)
 VirtIOSCSI *s = (VirtIOSCSI *)vdev;
 VirtIOSCSIReq *req;
 
+if (s->ctx) {
+virtio_scsi_dataplane_start(s);
+return;
+}
 while ((req = virtio_scsi_pop_req(s, vq))) {
 virtio_scsi_handle_cmd_req(s, req);
 }
@@ -531,6 +555,9 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 
+if (s->ctx) {
+virtio_scsi_dataplane_stop(s);
+}
 s->resetting++;
 qbus_reset_all(&s->bus.qbus);
 s->resetting--;
@@ -573,10 +600,19 @@ void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice 
*dev,
 return;
 }
 
-req = virtio_scsi_pop_req(s, vs->event_vq);
+if (s->dataplane_started) {
+assert(s->ctx);
+aio_context_acquire(s->ctx);
+}
+
+if (s->dataplane_started) {
+req = virtio_scsi_pop_req_vring(s, s->event_vring);
+} else {
+req = virtio_scsi_pop_req(s, vs->event_vq);
+}
 if (!req) {
 s->events_dropped = true;
-return;
+goto out;
 }
 
 if (s->events_dropped) {
@@ -605,12 +641,20 @@ void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice 
*dev,
 evt->lun[3] = dev->lun & 0xFF;
 }
 virtio_scsi_complete_req(req);
+out:
+if (s->dataplane_started) {
+aio_context_release(s->ctx);
+}
 }
 
 static void virtio_scsi_handle_event(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
+if (s->ctx) {
+virtio_scsi_dataplane_start(s);
+return;
+}
 if (s->events_dropped) {
 virtio_scsi_push_event(s, NULL, VIRTIO_SCSI_T_NO_EVENT, 0);
 }
-- 
1.9.3

[Qemu-devel] [PATCH v3 5/7] virtio-scsi: Batched prepare for cmd reqs

2014-09-23 Thread Fam Zheng

Queue the popped requests while calling
virtio_scsi_handle_cmd_req_prepare(), then submit them after all
prepared.

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi-dataplane.c | 9 +++--
 hw/scsi/virtio-scsi.c   | 9 +++--
 include/hw/virtio/virtio-scsi.h | 3 +++
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 11f5705..b778e05 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -122,14 +122,19 @@ static void virtio_scsi_iothread_handle_cmd(EventNotifier 
*notifier)
 VirtIOSCSIVring *vring = container_of(notifier,
   VirtIOSCSIVring, host_notifier);
 VirtIOSCSI *s = (VirtIOSCSI *)vring->parent;
-VirtIOSCSIReq *req;
+VirtIOSCSIReq *req, *next;
+QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
 event_notifier_test_and_clear(notifier);
 while ((req = virtio_scsi_pop_req_vring(s, vring))) {
 if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
-virtio_scsi_handle_cmd_req_submit(s, req);
+QTAILQ_INSERT_TAIL(&reqs, req, next);
 }
 }
+
+QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
+virtio_scsi_handle_cmd_req_submit(s, req);
+}
 }
 
 /* Context: QEMU global mutex held */
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 6cf070f..395178e 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -502,7 +502,8 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, 
VirtQueue *vq)
 {
 /* use non-QOM casts in the data path */
 VirtIOSCSI *s = (VirtIOSCSI *)vdev;
-VirtIOSCSIReq *req;
+VirtIOSCSIReq *req, *next;
+QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
 if (s->ctx && !s->dataplane_disabled) {
 virtio_scsi_dataplane_start(s);
@@ -510,9 +511,13 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, 
VirtQueue *vq)
 }
 while ((req = virtio_scsi_pop_req(s, vq))) {
 if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
-virtio_scsi_handle_cmd_req_submit(s, req);
+QTAILQ_INSERT_TAIL(&reqs, req, next);
 }
 }
+
+QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
+virtio_scsi_handle_cmd_req_submit(s, req);
+}
 }
 
 static void virtio_scsi_get_config(VirtIODevice *vdev,
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 1cc759a..6134c0b 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -213,6 +213,9 @@ typedef struct VirtIOSCSIReq {
 VirtQueueElement elem;
 /* Set by dataplane code. */
 VirtIOSCSIVring *vring;
+/* Used by two stages request submitting */
+QTAILQ_ENTRY(VirtIOSCSIReq) next;
+
 SCSIRequest *sreq;
 size_t resp_size;
 enum SCSIXferMode mode;
-- 
1.9.3

[Qemu-devel] [PATCH v3 7/7] virtio-scsi: Process ".iothread" property

2014-09-23 Thread Fam Zheng

We are ready, now let's effectively enable dataplane.

Signed-off-by: Fam Zheng 
---
 hw/scsi/virtio-scsi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 09a39cb..fa36e23 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -754,6 +754,10 @@ void virtio_scsi_common_realize(DeviceState *dev, Error 
**errp,
 s->cmd_vqs[i] = virtio_add_queue(vdev, VIRTIO_SCSI_VQ_SIZE,
  cmd);
 }
+
+if (s->conf.iothread) {
+virtio_scsi_set_iothread(VIRTIO_SCSI(s), s->conf.iothread);
+}
 }
 
 /* Disable dataplane thread during live migration since it does not
-- 
1.9.3

Re: [Qemu-devel] [PATCH v3 2/2] Add configure option --enable-pc-1-0-qemu-kvm

2014-09-23 Thread Markus Armbruster

"Michael S. Tsirkin"  writes:

> On Mon, Sep 22, 2014 at 05:32:16PM +0200, Markus Armbruster wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Sun, Sep 21, 2014 at 03:38:59PM +0100, Alex Bligh wrote:
>> >> Add a configure option --enable-pc-1-0-qemu-kvm and the
>> >> corresponding --disable-pc-1-0-qemu-kvm, defaulting
>> >> to disabled.
>> >> 
>> >> Rename machine type pc-1.0 to pc-1.0-qemu-git.
>> >> 
>> >> Make pc-1.0 machine type an alias of either pc-1.0-qemu-kvm
>> >> or pc-1.0-qemu-git depending on the value of the config
>> >> option.
>> >> 
>> >> Signed-off-by: Alex Bligh 
>> >
>> > I have to say, this one bothers me.
>> > We end up not being able to predict what does pc-1.0
>> > reference.
>> >
>> > Users also don't get qemu from git so I don't see
>> > why does git make sense in the name?
>> >
>> > Legacy management applications invoked qemu as qemu-kvm -
>> > how about detecting that name and switching
>> > the machine types?
>> 
>> Ugh!  I like that even less than a configure option.
>
> configure options are tested even less than runtime ones:
> each distro has to pick up a set of options and all
> users are stuck with it.
> So let's assume Ubuntu sets this and something is broken.  How is a
> Fedora user to test this? Find out configure flags that Ubuntu uses
> and rebuild from source? It's hard to get people to triage bugs as it
> is.
> That's why changing behaviour in subtle ways depending on
> configure options is a very bad idea.

All it changes is a default machine type.  I wouldn't classify that as
"very bad".  Fortunately, our difference of opinion doesn't matter,
because the conclusion of the thread is "leave it to the distros".

> So a flag might be better, but if we want to be compatible
> with qemu-kvm, poking at argv[0] still better than configure.

That I do consider a genuinely bad idea.

Example of usage messed up by it: I symlink from my ~/bin to executables
in various build trees.  If one of these programs changed behavior
depending on what it finds in argv[0], I'd have to know *exactly* how it
matches argv[0] to choose suitable names for my symlinks.

The name "qemu-kvm" is not a useful indicator of compatibility with the
qemu-kvm fork of QEMU anyway.  There are programs out there calling
themselves qemu-kvm that aren't compatible with the qemu-kvm fork.  And
there are programs out there that are compatible, but call themselves
something else.

[Qemu-devel] [PATCH] vl: Adjust the place of calling mlockall to speedup VM's startup

2014-09-23 Thread zhanghailiang

If we configure mlock=on and memory policy=bind at the same time,
It will consume lots of time for system to treat with memory,
especially when call mbind after mlockall.

Adjust the place of calling mlockall, calling mbind before mlockall
can remarkably reduce the time of VM's startup.

Signed-off-by: zhanghailiang 
---
Hi,

Actually, for mbind and mlockall, i have made a test about the time consuming 
for the different call sequence. 

The results is shown below. It is obviously that mlockall called before mbind 
is 
more time-consuming.

Besides, this patch is OK with memory hotplug.

TEST CODE:
if (mbind_first) {
printf("mbind --> mlockall\n");
mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2,
  MPOL_MF_STRICT | MPOL_MF_MOVE);
mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
  MPOL_MF_STRICT | MPOL_MF_MOVE);
mlockall(MCL_CURRENT | MCL_FUTURE);
} else {
printf("mlockall --> mbind\n");
mlockall(MCL_CURRENT | MCL_FUTURE);
mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2 ,
  MPOL_MF_STRICT | MPOL_MF_MOVE);
mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
  MPOL_MF_STRICT | MPOL_MF_MOVE);
}

RESULT 1:
#time /home/test_mbind 10240 0
memroy size 10737418240
mlockall --> mbind

real0m11.886s
user0m0.004s
sys 0m11.865s
#time /home/test_mbind 10240 1
memroy size 10737418240
mbind --> mlockall

real0m5.334s
user0m0.000s
sys 0m5.324s

RESULT 2:
#time /home/test_mbind 4096 0
memroy size 4294967296
mlockall --> mbind

real0m5.503s
user0m0.000s
sys 0m5.492s
#time /home/test_mbind 4096 1
memroy size 4294967296
mbind --> mlockall

real0m2.139s
user0m0.000s
sys 0m2.132s

Best Regards,
zhanghailiang
---
 vl.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/vl.c b/vl.c
index dc792fe..adf4770 100644
--- a/vl.c
+++ b/vl.c
@@ -134,6 +134,7 @@ const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
+int enable_mlock = false;
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
@@ -1421,12 +1422,8 @@ static void smp_parse(QemuOpts *opts)
 
 }
 
-static void configure_realtime(QemuOpts *opts)
+static void realtime_init(void)
 {
-bool enable_mlock;
-
-enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
-
 if (enable_mlock) {
 if (os_mlock() < 0) {
 fprintf(stderr, "qemu: locking memory failed\n");
@@ -3973,7 +3970,7 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
-configure_realtime(opts);
+enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
 break;
 case QEMU_OPTION_msg:
 opts = qemu_opts_parse(qemu_find_opts("msg"), optarg, 0);
@@ -4441,6 +4438,8 @@ int main(int argc, char **argv, char **envp)
 
 machine_class->init(current_machine);
 
+realtime_init();
+
 audio_init();
 
 cpu_synchronize_all_post_init();
-- 
1.7.12.4

Re: [Qemu-devel] [PATCH v4 02/19] qapi: Ignore files created during make check

2014-09-23 Thread Markus Armbruster

Eric Blake  writes:

> After an in-tree build and run of 'make check-{qapi-schema,unit}',
> I noticed some leftover files.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Wenchao Xia 

Reviewed-by: Markus Armbruster

[Qemu-devel] [PATCH v3] pc-dimm/numa: Fix stat of memory size in node when hotplug memory

2014-09-23 Thread zhanghailiang

When do memory hotplug, if there is numa node, we should add
the memory size to the corresponding node memory size.

For now, it mainly affects the result of hmp command "info numa".

Signed-off-by: zhanghailiang 
---
 v3:
- cold-plugged memory should not be excluded when stat memory size (Igor 
Mammedov)
 v2:
- Don't modify the numa_info.node_mem directly when treating hotplug memory,
  fix the "info numa" instead (suggested by Igor Mammedov)
---
 hw/mem/pc-dimm.c | 30 ++
 include/hw/mem/pc-dimm.h |  2 ++
 include/sysemu/sysemu.h  |  1 +
 monitor.c|  6 +-
 numa.c   | 15 +++
 5 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 5bfc5b7..8e80d74 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -195,6 +195,36 @@ out:
 return ret;
 }
 
+static int pc_dimm_stat_mem_size(Object *obj, void *opaque)
+{
+uint64_t *node_mem = opaque;
+int ret;
+
+if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+DeviceState *dev = DEVICE(obj);
+
+if (dev->realized) {
+PCDIMMDevice *dimm = PC_DIMM(obj);
+int size;
+
+size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP,
+   NULL);
+if (size < 0) {
+return -1;
+}
+node_mem[dimm->node] += size;
+}
+}
+
+ret = object_child_foreach(obj, pc_dimm_stat_mem_size, opaque);
+return ret;
+}
+
+void pc_dimm_stat_node_mem(uint64_t *node_mem)
+{
+object_child_foreach(qdev_get_machine(), pc_dimm_stat_mem_size, node_mem);
+}
+
 static Property pc_dimm_properties[] = {
 DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0),
 DEFINE_PROP_UINT32(PC_DIMM_NODE_PROP, PCDIMMDevice, node, 0),
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 761eeef..0c9a8eb 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -78,4 +78,6 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
 int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
 int qmp_pc_dimm_device_list(Object *obj, void *opaque);
+
+void pc_dimm_stat_node_mem(uint64_t *node_mem);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d8539fd..cfc1592 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -160,6 +160,7 @@ typedef struct node_info {
 extern NodeInfo numa_info[MAX_NODES];
 void set_numa_nodes(void);
 void set_numa_modes(void);
+int query_numa_node_mem(uint64_t *node_mem);
 extern QemuOptsList qemu_numa_opts;
 int numa_init_func(QemuOpts *opts, void *opaque);
 
diff --git a/monitor.c b/monitor.c
index 7467521..c8c812f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1948,7 +1948,10 @@ static void do_info_numa(Monitor *mon, const QDict 
*qdict)
 {
 int i;
 CPUState *cpu;
+uint64_t *node_mem;
 
+node_mem = g_new0(uint64_t, nb_numa_nodes);
+query_numa_node_mem(node_mem);
 monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
 for (i = 0; i < nb_numa_nodes; i++) {
 monitor_printf(mon, "node %d cpus:", i);
@@ -1959,8 +1962,9 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
 }
 monitor_printf(mon, "\n");
 monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-numa_info[i].node_mem >> 20);
+node_mem[i] >> 20);
 }
+g_free(node_mem);
 }
 
 #ifdef CONFIG_PROFILER
diff --git a/numa.c b/numa.c
index 3b98135..4e27dd8 100644
--- a/numa.c
+++ b/numa.c
@@ -35,6 +35,7 @@
 #include "hw/boards.h"
 #include "sysemu/hostmem.h"
 #include "qmp-commands.h"
+#include "hw/mem/pc-dimm.h"
 
 QemuOptsList qemu_numa_opts = {
 .name = "numa",
@@ -315,6 +316,20 @@ void memory_region_allocate_system_memory(MemoryRegion 
*mr, Object *owner,
 }
 }
 
+int query_numa_node_mem(uint64_t *node_mem)
+{
+int i;
+
+if (nb_numa_nodes <= 0) {
+return 0;
+}
+pc_dimm_stat_node_mem(node_mem);
+for (i = 0; i < nb_numa_nodes; i++) {
+node_mem[i] += numa_info[i].node_mem;
+}
+return 0;
+}
+
 static int query_memdev(Object *obj, void *opaque)
 {
 MemdevList **list = opaque;
-- 
1.7.12.4

Re: [Qemu-devel] [PATCH v3 2/2] Add configure option --enable-pc-1-0-qemu-kvm

2014-09-23 Thread Michael S. Tsirkin

On Tue, Sep 23, 2014 at 09:59:17AM +0200, Markus Armbruster wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Mon, Sep 22, 2014 at 05:32:16PM +0200, Markus Armbruster wrote:
> >> "Michael S. Tsirkin"  writes:
> >> 
> >> > On Sun, Sep 21, 2014 at 03:38:59PM +0100, Alex Bligh wrote:
> >> >> Add a configure option --enable-pc-1-0-qemu-kvm and the
> >> >> corresponding --disable-pc-1-0-qemu-kvm, defaulting
> >> >> to disabled.
> >> >> 
> >> >> Rename machine type pc-1.0 to pc-1.0-qemu-git.
> >> >> 
> >> >> Make pc-1.0 machine type an alias of either pc-1.0-qemu-kvm
> >> >> or pc-1.0-qemu-git depending on the value of the config
> >> >> option.
> >> >> 
> >> >> Signed-off-by: Alex Bligh 
> >> >
> >> > I have to say, this one bothers me.
> >> > We end up not being able to predict what does pc-1.0
> >> > reference.
> >> >
> >> > Users also don't get qemu from git so I don't see
> >> > why does git make sense in the name?
> >> >
> >> > Legacy management applications invoked qemu as qemu-kvm -
> >> > how about detecting that name and switching
> >> > the machine types?
> >> 
> >> Ugh!  I like that even less than a configure option.
> >
> > configure options are tested even less than runtime ones:
> > each distro has to pick up a set of options and all
> > users are stuck with it.
> > So let's assume Ubuntu sets this and something is broken.  How is a
> > Fedora user to test this? Find out configure flags that Ubuntu uses
> > and rebuild from source? It's hard to get people to triage bugs as it
> > is.
> > That's why changing behaviour in subtle ways depending on
> > configure options is a very bad idea.
> 
> All it changes is a default machine type.  I wouldn't classify that as
> "very bad".  Fortunately, our difference of opinion doesn't matter,
> because the conclusion of the thread is "leave it to the distros".
> 
> > So a flag might be better, but if we want to be compatible
> > with qemu-kvm, poking at argv[0] still better than configure.
> 
> That I do consider a genuinely bad idea.
> 
> Example of usage messed up by it: I symlink from my ~/bin to executables
> in various build trees.  If one of these programs changed behavior
> depending on what it finds in argv[0], I'd have to know *exactly* how it
> matches argv[0] to choose suitable names for my symlinks.
> 
> The name "qemu-kvm" is not a useful indicator of compatibility with the
> qemu-kvm fork of QEMU anyway.  There are programs out there calling
> themselves qemu-kvm that aren't compatible with the qemu-kvm fork.  And
> there are programs out there that are compatible, but call themselves
> something else.

I think the approach v4 takes is reasonable, though.
I didn't look at the implementation yet.

-- 
MST

[Qemu-devel] [Bug 1101210] Re: qemu 1.4.2: usb keyboard not fully working

2014-09-23 Thread Bjoern Kerler

After some digging, it seems that windows itself is responsible for that
bug. On international keyboards, it tend to send "Ctrl" and "Alt"
instead of "AltGr" keycode.

A possible patch would be to add an option to QEMU command line to
handle Ctrl and Alt keystrokes being pressed at the same time as AltGr
keystroke.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1101210

Title:
  qemu 1.4.2: usb keyboard not fully working

Status in QEMU:
  Confirmed

Bug description:
  When using the usb keyboard, I can't type the | character. I'm using
  german keyboard layout (de) on the host and inside the guest. As a
  guest OS, I use Linux (e.g. a recent KNOPPIX cd). To obtain the |
  character on a german keyboard, I need to press AltGr + the < or >
  key, i.e. the key right to the left shift.

  The qemu command line is something like this:
  ./qemu-system-i386 -device pci-ohci -device usb-kbd
  I also tried
  ./qemu-system-i386 -usb -usbdevice keyboard
  with the same effect.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1101210/+subscriptions

Re: [Qemu-devel] [Bug 1363641] Re: Build of v2.1.0 fails on armv7l due to undeclared __NR_select

2014-09-23 Thread Michael R. Hines


On 09/09/2014 02:09 AM, Dr. David Alan Gilbert wrote:

(cc'ing Michael Hines who owns and knows the RDMA code)

* Karl-Philipp Richter (krichter...@aol.de) wrote:

** Description changed:

   After `make clean` and `git clean -x -f -d` `git checkout v2.1.0 &&
   configure --prefix=/home/user/prefix-qemu-2.1.0 && make` fails due to
   missing declarations
   
   CCqemu-seccomp.o

   qemu-seccomp.c:28:1: error: '__NR_select' undeclared here (not in a 
function)
   qemu-seccomp.c:36:1: error: '__NR_mmap' undeclared here (not in a 
function)
   qemu-seccomp.c:57:1: error: '__NR_getrlimit' undeclared here (not in a 
function)
   qemu-seccomp.c:96:1: error: '__NR_time' undeclared here (not in a 
function)
 GEN   qmp-marshal.c
   qemu-seccomp.c:186:1: error: '__NR_alarm' undeclared here (not in a 
function)
   make: *** [qemu-seccomp.o] Error 1
   
   Same errors for master 8b3030114a449e66c68450acaac4b66f26d91416.

   `configure`should not succeed for a failing build. `config.log` for
   v2.1.0 and 8b303011... attached. The content is mostly compiler output
   which I think is unusual for `config.log`, but see for yourself.
   
   I'm building on a debian 7.6 chroot on Synology DSM 5.0. `uname -a` says

   `Linux diskstatation 3.2.40 #4493 SMP Thu Aug 21 21:43:02 CST 2014
   armv7l GNU/Linux`.
+
+ After installing some of the missing header files (-> configure should
+ fail at the right point with a good error message), i.e. `apt-get
+ install liblzo2-dev libbsd-dev syslinux-common libhwloc-dev librdmacm-
+ dev libsnappy-dev libibverbs-dev valgrind linux-headers-3.2.0-4-common`
+ I'm getting
+
+  CCmigration-rdma.o
+ migration-rdma.c: In function 'ram_chunk_start':
+ migration-rdma.c:523:12: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]

is that:
 return (uint8_t *) (((uintptr_t) rdma_ram_block->local_host_addr)
 + (i << RDMA_REG_CHUNK_SHIFT));

244:uint8_t  *local_host_addr; /* local virtual address */

in which case I think the problem is the 'i' which is a uint64_t.


+ migration-rdma.c: In function '__qemu_rdma_add_block':
+ migration-rdma.c:556:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:557:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function '__qemu_rdma_delete_block':
+ migration-rdma.c:664:45: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:699:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function 'qemu_rdma_search_ram_block':
+ migration-rdma.c:1113:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function 'qemu_rdma_register_and_get_keys':
+ migration-rdma.c:1176:50: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1177:29: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1177:51: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1178:29: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c: In function 'qemu_rdma_post_send_control':
+ migration-rdma.c:1562:36: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c: In function 'qemu_rdma_post_recv_control':
+ migration-rdma.c:1616:37: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c: In function 'qemu_rdma_write_one':
+ migration-rdma.c:1864:16: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1868:53: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1922:52: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1923:50: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1977:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1998:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:2010:58: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function 'qemu_rdma_registration_handle':
+ migration-rdma.c:3027:21: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:3092:41: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]

Re: [Qemu-devel] [Bug 1363641] Re: Build of v2.1.0 fails on armv7l due to undeclared __NR_select

2014-09-23 Thread Michael R. Hines


On 09/09/2014 02:09 AM, Dr. David Alan Gilbert wrote:

(cc'ing Michael Hines who owns and knows the RDMA code)

* Karl-Philipp Richter (krichter...@aol.de) wrote:

** Description changed:

   After `make clean` and `git clean -x -f -d` `git checkout v2.1.0 &&
   configure --prefix=/home/user/prefix-qemu-2.1.0 && make` fails due to
   missing declarations

   CCqemu-seccomp.o
   qemu-seccomp.c:28:1: error: '__NR_select' undeclared here (not in a 
function)
   qemu-seccomp.c:36:1: error: '__NR_mmap' undeclared here (not in a 
function)
   qemu-seccomp.c:57:1: error: '__NR_getrlimit' undeclared here (not in a 
function)
   qemu-seccomp.c:96:1: error: '__NR_time' undeclared here (not in a 
function)
 GEN   qmp-marshal.c
   qemu-seccomp.c:186:1: error: '__NR_alarm' undeclared here (not in a 
function)
   make: *** [qemu-seccomp.o] Error 1

   Same errors for master 8b3030114a449e66c68450acaac4b66f26d91416.
   `configure`should not succeed for a failing build. `config.log` for
   v2.1.0 and 8b303011... attached. The content is mostly compiler output
   which I think is unusual for `config.log`, but see for yourself.

   I'm building on a debian 7.6 chroot on Synology DSM 5.0. `uname -a` says
   `Linux diskstatation 3.2.40 #4493 SMP Thu Aug 21 21:43:02 CST 2014
   armv7l GNU/Linux`.
+
+ After installing some of the missing header files (-> configure should
+ fail at the right point with a good error message), i.e. `apt-get
+ install liblzo2-dev libbsd-dev syslinux-common libhwloc-dev librdmacm-
+ dev libsnappy-dev libibverbs-dev valgrind linux-headers-3.2.0-4-common`
+ I'm getting
+
+  CCmigration-rdma.o
+ migration-rdma.c: In function 'ram_chunk_start':
+ migration-rdma.c:523:12: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]

is that:
 return (uint8_t *) (((uintptr_t) rdma_ram_block->local_host_addr)
 + (i << RDMA_REG_CHUNK_SHIFT));

244:uint8_t  *local_host_addr; /* local virtual address */

in which case I think the problem is the 'i' which is a uint64_t.


+ migration-rdma.c: In function '__qemu_rdma_add_block':
+ migration-rdma.c:556:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:557:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function '__qemu_rdma_delete_block':
+ migration-rdma.c:664:45: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:699:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function 'qemu_rdma_search_ram_block':
+ migration-rdma.c:1113:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function 'qemu_rdma_register_and_get_keys':
+ migration-rdma.c:1176:50: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1177:29: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1177:51: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1178:29: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c: In function 'qemu_rdma_post_send_control':
+ migration-rdma.c:1562:36: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c: In function 'qemu_rdma_post_recv_control':
+ migration-rdma.c:1616:37: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c: In function 'qemu_rdma_write_one':
+ migration-rdma.c:1864:16: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:1868:53: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1922:52: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1923:50: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1977:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:1998:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c:2010:58: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
+ migration-rdma.c: In function 'qemu_rdma_registration_handle':
+ migration-rdma.c:3027:21: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ migration-rdma.c:3092:41: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
+ cc1:

Re: [Qemu-devel] [PATCH] vl: Adjust the place of calling mlockall to speedup VM's startup

2014-09-23 Thread Michael S. Tsirkin

On Tue, Sep 23, 2014 at 03:57:47PM +0800, zhanghailiang wrote:
> If we configure mlock=on and memory policy=bind at the same time,
> It will consume lots of time for system to treat with memory,
> especially when call mbind after mlockall.
> 
> Adjust the place of calling mlockall, calling mbind before mlockall
> can remarkably reduce the time of VM's startup.
> 
> Signed-off-by: zhanghailiang 

The idea makes absolute sense to me:
bind after lock will force data copy of
all pages. bind before lock gives us an
indication where to put data on fault in.

Acked-by: Michael S. Tsirkin 


> ---
> Hi,
> 
> Actually, for mbind and mlockall, i have made a test about the time consuming 
> for the different call sequence. 
> 
> The results is shown below. It is obviously that mlockall called before mbind 
> is 
> more time-consuming.
> 
> Besides, this patch is OK with memory hotplug.
> 
> TEST CODE:
> if (mbind_first) {
> printf("mbind --> mlockall\n");
> mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2,
>   MPOL_MF_STRICT | MPOL_MF_MOVE);
> mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
>   MPOL_MF_STRICT | MPOL_MF_MOVE);
> mlockall(MCL_CURRENT | MCL_FUTURE);
> } else {
> printf("mlockall --> mbind\n");
> mlockall(MCL_CURRENT | MCL_FUTURE);
> mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2 ,
>   MPOL_MF_STRICT | MPOL_MF_MOVE);
> mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
>   MPOL_MF_STRICT | MPOL_MF_MOVE);
> }
> 
> RESULT 1:
> #time /home/test_mbind 10240 0
> memroy size 10737418240
> mlockall --> mbind
> 
> real0m11.886s
> user0m0.004s
> sys 0m11.865s
> #time /home/test_mbind 10240 1
> memroy size 10737418240
> mbind --> mlockall
> 
> real0m5.334s
> user0m0.000s
> sys 0m5.324s
> 
> RESULT 2:
> #time /home/test_mbind 4096 0
> memroy size 4294967296
> mlockall --> mbind
> 
> real0m5.503s
> user0m0.000s
> sys 0m5.492s
> #time /home/test_mbind 4096 1
> memroy size 4294967296
> mbind --> mlockall
> 
> real0m2.139s
> user0m0.000s
> sys 0m2.132s
> 
> Best Regards,
> zhanghailiang
> ---
>  vl.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index dc792fe..adf4770 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -134,6 +134,7 @@ const char* keyboard_layout = NULL;
>  ram_addr_t ram_size;
>  const char *mem_path = NULL;
>  int mem_prealloc = 0; /* force preallocation of physical target memory */
> +int enable_mlock = false;
>  int nb_nics;
>  NICInfo nd_table[MAX_NICS];
>  int autostart;
> @@ -1421,12 +1422,8 @@ static void smp_parse(QemuOpts *opts)
>  
>  }
>  
> -static void configure_realtime(QemuOpts *opts)
> +static void realtime_init(void)
>  {
> -bool enable_mlock;
> -
> -enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
> -
>  if (enable_mlock) {
>  if (os_mlock() < 0) {
>  fprintf(stderr, "qemu: locking memory failed\n");
> @@ -3973,7 +3970,7 @@ int main(int argc, char **argv, char **envp)
>  if (!opts) {
>  exit(1);
>  }
> -configure_realtime(opts);
> +enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
>  break;
>  case QEMU_OPTION_msg:
>  opts = qemu_opts_parse(qemu_find_opts("msg"), optarg, 0);
> @@ -4441,6 +4438,8 @@ int main(int argc, char **argv, char **envp)
>  
>  machine_class->init(current_machine);
>  
> +realtime_init();
> +
>  audio_init();
>  
>  cpu_synchronize_all_post_init();
> -- 
> 1.7.12.4
>

Re: [Qemu-devel] [RFC patch 0/6] vfio based pci pass-through for qemu/KVM on s390

2014-09-23 Thread Alexander Graf



On 23.09.14 00:28, Alex Williamson wrote:
> On Tue, 2014-09-23 at 00:08 +0200, Alexander Graf wrote:
>>
>> On 22.09.14 22:47, Alex Williamson wrote:
>>> On Fri, 2014-09-19 at 13:54 +0200, frank.blasc...@de.ibm.com wrote:
 This set of patches implements a vfio based solution for pci
 pass-through on the s390 platform. The kernel stuff is pretty
 much straight forward, but qemu needs more work.

 Most interesting patch is:
   vfio: make vfio run on s390 platform

 I hope Alex & Alex can give me some guidance how to do the changes
 in an appropriate way. After creating a separate iommmu address space
 for each attached PCI device I can successfully run the vfio type1
 iommu. So If we could extend type1 not registering all guest memory
 (see patch) I think we do not need a special vfio iommu for s390
 for the moment.

 The patches implement the base pass-through support. s390 specific
 virtualization functions are currently not included. This would
 be a second step after the base support is done.

 kernel patches apply to linux-kvm-next

 KVM: s390: Enable PCI instructions
 iommu: add iommu for s390 platform
 vfio: make vfio build on s390

 qemu patches apply to qemu-master

 s390: Add PCI bus support
 s390: implement pci instruction
 vfio: make vfio run on s390 platform

 Thx for feedback and review comments
>>>
>>> Sending patches as attachments makes it difficult to comment inline.
>>>
>>> 2/6
>>>  - careful of the namespace as you're changing functions from static and
>>> exporting them
>>>  - doesn't seem like functions need to be exported, just non-static to
>>> call from s390-iommu.c
>>>
>>> 6/6
>>>  - We shouldn't need to globally disable mmap, each VFIO region reports
>>> whether it supports mmap and vfio-pci on s390 should indicate mmap is
>>> not supported on the platform.
>>
>> Can we emulate MMIO on mmap'ed regions by routing every memory access
>> via the kernel? It'd be slow, but at least make existing VFIO code
>> compatible.
> 
> Isn't that effectively what we do when we use memory_region_init_io() vs
> memory_region_init_ram_ptr() or are you suggesting something that can
> handle the MMIO without bouncing out to QEMU?  VFIO is already
> compatible with regions that cannot be mmap'd, the kernel just needs to
> report it as such.  Thanks,

Ah, cool. I guess I missed that part :). Then all is well.


Alex

Re: [Qemu-devel] [PATCH] vl: Adjust the place of calling mlockall to speedup VM's startup

2014-09-23 Thread Hu Tao

On Tue, Sep 23, 2014 at 11:30:26AM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 23, 2014 at 03:57:47PM +0800, zhanghailiang wrote:
> > If we configure mlock=on and memory policy=bind at the same time,
> > It will consume lots of time for system to treat with memory,
> > especially when call mbind after mlockall.
> > 
> > Adjust the place of calling mlockall, calling mbind before mlockall
> > can remarkably reduce the time of VM's startup.
> > 
> > Signed-off-by: zhanghailiang 
> 
> The idea makes absolute sense to me:
> bind after lock will force data copy of
> all pages. bind before lock gives us an
> indication where to put data on fault in.

Agreed.

> 
> Acked-by: Michael S. Tsirkin 
> 
> 
> > ---
> > Hi,
> > 
> > Actually, for mbind and mlockall, i have made a test about the time 
> > consuming 
> > for the different call sequence. 
> > 
> > The results is shown below. It is obviously that mlockall called before 
> > mbind is 
> > more time-consuming.
> > 
> > Besides, this patch is OK with memory hotplug.
> > 
> > TEST CODE:
> > if (mbind_first) {
> > printf("mbind --> mlockall\n");
> > mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2,
> >   MPOL_MF_STRICT | MPOL_MF_MOVE);
> > mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
> >   MPOL_MF_STRICT | MPOL_MF_MOVE);
> > mlockall(MCL_CURRENT | MCL_FUTURE);
> > } else {
> > printf("mlockall --> mbind\n");
> > mlockall(MCL_CURRENT | MCL_FUTURE);
> > mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2 ,
> >   MPOL_MF_STRICT | MPOL_MF_MOVE);
> > mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
> >   MPOL_MF_STRICT | MPOL_MF_MOVE);
> > }
> > 
> > RESULT 1:
> > #time /home/test_mbind 10240 0
> > memroy size 10737418240
> > mlockall --> mbind
> > 
> > real0m11.886s
> > user0m0.004s
> > sys 0m11.865s
> > #time /home/test_mbind 10240 1
> > memroy size 10737418240
> > mbind --> mlockall
> > 
> > real0m5.334s
> > user0m0.000s
> > sys 0m5.324s
> > 
> > RESULT 2:
> > #time /home/test_mbind 4096 0
> > memroy size 4294967296
> > mlockall --> mbind
> > 
> > real0m5.503s
> > user0m0.000s
> > sys 0m5.492s
> > #time /home/test_mbind 4096 1
> > memroy size 4294967296
> > mbind --> mlockall
> > 
> > real0m2.139s
> > user0m0.000s
> > sys 0m2.132s
> > 
> > Best Regards,
> > zhanghailiang
> > ---
> >  vl.c | 11 +--
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index dc792fe..adf4770 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -134,6 +134,7 @@ const char* keyboard_layout = NULL;
> >  ram_addr_t ram_size;
> >  const char *mem_path = NULL;
> >  int mem_prealloc = 0; /* force preallocation of physical target memory */
> > +int enable_mlock = false;

Why not bool?

Regards,
Hu

Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties

2014-09-23 Thread Paolo Bonzini

Il 23/09/2014 05:09, Gonglei (Arei) ha scritto:
> Hi,
> 
>> This doesn't change the fact that ObjectProperty is a generic struct,
>> and adding alias-specific fields there is wrong.
>
> OK, Maybe I should find other ways to attach this purpose and
> avoid layering violation. Thanks!

 Unfortunately I cannot think of any.

 We could add a description field to ObjectProperty, and replace
 legacy_name with a description.  The output then would be

 virtio-blk.drive=str (drive)
> 
> There is a question that the QOM properties are added dynamically.
> When we call qdev_alias_all_properties() adding alias properties to
> the source object all qdev properties on the target DeviceState, how do we
> judge the property's name and set the value of corresponding description 
> field?
> Such as setting virtio-blk-pci.drive.description = "drive".

You use the legacy_name field of PropertyInfo to set the description of
a qdev property, and then let object_property_add_alias() copy the
description.

Paolo

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread Igor Mammedov

On Mon, 22 Sep 2014 17:03:12 +0800
Tang Chen  wrote:

> Hi Igor,
> 
> On 09/19/2014 08:26 PM, Igor Mammedov wrote:
> > On Wed, 17 Sep 2014 16:32:20 +0800
> > Hu Tao  wrote:
> >
> >> On Tue, Sep 16, 2014 at 06:39:15PM +0800, zhanghailiang wrote:
> >>> If we do not configure numa option, memory hotplug should work as well.
> >>> It should not depend on numa option.
> >>>
> >>> Steps to reproduce:
> >>> (1) Start VM: qemu-kvm -m 1024,slots=4,maxmem=8G
> >>> (2) Hotplug memory
> >>> It will fail and reports:
> >>> "'DIMM property node has value 0' which exceeds the number of numa nodes: 
> >>> 0"
> >>>
> >> I rememberd Tang Chen had a patch for this bug, this is what Andrey 
> >> suggested:
> >>
> >>I thnk that there should be no
> >>cases when dimm is plugged (and check from patch is fired up) without
> >>actually populated NUMA, because not every OS will workaround this by
> >>faking the node.
> > This doesn't take in to account that dimm device by itself has nothing to do
> > with numa (numa is just optional property of its representation in ACPI land
> > and nothing else).
> >
> > In case initial memory is converted to dimm devices, qemu can be
> > started without numa option and it still must work.
> >
> > So I'm in favor of this path.
> 
> I just did some tests. Even if I modify qemu code and make hotpluggable 
> bit in SRAT 0,
> memory hotplug will still work on Linux guest, which means Linux kernel 
> doesn't check
> SRAT info after system is up when doing memory hotplug.
> 
> I did the following modification in hw/i386/acpi-build.c
> -ram_addr_t hotplugabble_address_space_size =
> -object_property_get_int(OBJECT(pcms), PC_MACHINE_MEMHP_REGION_SIZE,
> -NULL);
> +ram_addr_t hotplugabble_address_space_size = 0;
> 
> And when the guest is up, no memory should be hotpluggable, I think. But 
> I hot-added
> memory successfully.
> 
> IMHO, I think memory hotplug should based on ACPI on Linux. And SRAT 
> tells system
> which memory ranges are hotpluggable, and we should follow it. So I 
> think Linux kernel
> has some problem in this issue.
It's fine to use SRAT for these purposes on baremetal NUMA systems since
due to used chipset constrains it's possible statically allocate ranges
for every possible DIMM socket.
However SRAT(which is optional table BTW) entries are not mandatory
and override-able by ACPI Device's _PXM/_CRS methods replacing needs
for SRAT entries and QEMU uses this fact by supplying these methods.
QEMU adds FAKE SRAT entry only to workaround Windows limitation,
and for nothing else.

I think Linux does not violate ACPI spec and behaves as expected, moreover
it's more correct than Windows since memory hotplug will work on non NUMA
machines as well.

Hence I think this patch is correct and allows memory hotplug in absence
of NUMA configuration. It also would allow to use pc-dimm as replacement
for initial memory for non-NUMA configs (which is on my TODO list)

As for the Windows, QEMU has no idea what OS it would be running,
I see 2 ways to solve issue:
 1. user should know that memory hotplug on Windows requires NUMA machine
and specify "-numa ..." option for this case.
   (I've discussed this with libvirt folks and was promised that
if user enables memory hotplug, libvirt would provide "-numa" option
to workaround Windows issue)

 2. QEMU could unconditionally create single NUMA if memory hotplug is
enabled. (but that should be enable only for 2.2 or late machines
to avoid migration issues)

> 
> I'd like to fix it like this:
> 
> 1. Send patches to make Linux kernel to check SRAT info when doing 
> memory hotplug.
>  (Now, SRAT is only checked at boot time.)
> 
> 2. In qemu, when users gave a memory hotplug option without NUMA 
> options, we create
>  node0 and node1, and make node1 hotpluggable.
>  This is because when using MOVABLE_NODE, node0 in which the kernel 
> resides in should
>  not be hotpluggable.
>  Of course, make part of memory in node0 hotpluggable is OK, but on 
> a real machine, no
>  one will do this, I think. So I suggest above idea.
> 
> Thanks. :)
> 
> >
> >> https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04587.html
> >>
> >> Have you tested this patch with Windows guest?
> >>
> >> Regards,
> >> Hu
> >
> > .
> >
> 
>

[Qemu-devel] [RFC PATCH] qcow2: Fix race in cache invalidation

2014-09-23 Thread Alexey Kardashevskiy

On 09/19/2014 06:47 PM, Kevin Wolf wrote:> Am 16.09.2014 um 14:59 hat Paolo 
Bonzini geschrieben:
>> Il 16/09/2014 14:52, Kevin Wolf ha scritto:
>>> Yes, that's true. We can't fix this problem in qcow2, though, because
>>> it's a more general one.  I think we must make sure that
>>> bdrv_invalidate_cache() doesn't yield.
>>>
>>> Either by forbidding to run bdrv_invalidate_cache() in a coroutine and
>>> moving the problem to the caller (where and why is it even called from a
>>> coroutine?), or possibly by creating a new coroutine for the driver
>>> callback and running that in a nested event loop that only handles
>>> bdrv_invalidate_cache() callbacks, so that the NBD server doesn't get a
>>> chance to process new requests in this thread.
>>
>> Incoming migration runs in a coroutine (the coroutine entry point is
>> process_incoming_migration_co).  But everything after qemu_fclose() can
>> probably be moved into a separate bottom half, so that it gets out of
>> coroutine context.
> 
> Alexey, you should probably rather try this (and add a bdrv_drain_all()
> in bdrv_invalidate_cache) than messing around with qcow2 locks. This
> isn't a problem that can be completely fixed in qcow2.


Ok. Tried :) Not very successful though. The patch is below.

Is that the correct bottom half? When I did it, I started getting crashes
in various sport on accesses to s->l1_cache which is NULL after qcow2_close.
Normally the code would check s->l1_size and then use but they are out of sync.

So I clear it in qcow2_close(). This allowed migrated guest to work and not
to crash until I shut it down when it aborted at "HERE IT FAILS ON SHUTDOWN".

Here I realized I am missing something in this picture again, what is it?
Thanks!


---
 block.c |  2 ++
 block/qcow2-cache.c |  2 +-
 block/qcow2.c   | 50 --
 block/qcow2.h   |  4 
 4 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index d06dd51..1e6dfd1 100644
--- a/block.c
+++ b/block.c
@@ -5044,6 +5044,8 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error 
**errp)
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
 return;
 }
+
+bdrv_drain_all();
 }
 
 void bdrv_invalidate_cache_all(Error **errp)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 904f6b1..59ff48c 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -259,7 +259,7 @@ static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
 if (min_index == -1) {
 /* This can't happen in current synchronous code, but leave the check
  * here as a reminder for whoever starts using AIO with the cache */
-abort();
+abort(); // < HERE IT FAILS ON SHUTDOWN
 }
 return min_index;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index f9e045f..2b84562 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1399,6 +1399,7 @@ static void qcow2_close(BlockDriverState *bs)
 qemu_vfree(s->l1_table);
 /* else pre-write overlap checks in cache_destroy may crash */
 s->l1_table = NULL;
+s->l1_size = 0;
 
 if (!(bs->open_flags & BDRV_O_INCOMING)) {
 qcow2_cache_flush(bs, s->l2_table_cache);
@@ -1419,16 +1420,11 @@ static void qcow2_close(BlockDriverState *bs)
 qcow2_free_snapshots(bs);
 }
 
+static void qcow2_invalidate_cache_bh_cb(void *opaque);
+
 static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 BDRVQcowState *s = bs->opaque;
-int flags = s->flags;
-AES_KEY aes_encrypt_key;
-AES_KEY aes_decrypt_key;
-uint32_t crypt_method = 0;
-QDict *options;
-Error *local_err = NULL;
-int ret;
 
 /*
  * Backing files are read-only which makes all of their metadata immutable,
@@ -1436,13 +1432,28 @@ static void qcow2_invalidate_cache(BlockDriverState 
*bs, Error **errp)
  */
 
 if (s->crypt_method) {
-crypt_method = s->crypt_method;
-memcpy(&aes_encrypt_key, &s->aes_encrypt_key, sizeof(aes_encrypt_key));
-memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
+s->bh_crypt_method = s->crypt_method;
+memcpy(&s->bh_aes_encrypt_key, &s->aes_encrypt_key, 
sizeof(s->bh_aes_encrypt_key));
+memcpy(&s->bh_aes_decrypt_key, &s->aes_decrypt_key, 
sizeof(s->bh_aes_decrypt_key));
+} else {
+s->bh_crypt_method = 0;
 }
 
 qcow2_close(bs);
 
+s->cache_inv_bh = aio_bh_new(bdrv_get_aio_context(bs),
+ qcow2_invalidate_cache_bh_cb,
+ bs);
+}
+
+static void qcow2_invalidate_cache_bh(BlockDriverState *bs, Error **errp)
+{
+BDRVQcowState *s = bs->opaque;
+int flags = s->flags;
+QDict *options;
+Error *local_err = NULL;
+int ret;
+
 bdrv_invalidate_cache(bs->file, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
@@ -1464,11 +1475,22 @@ static void qcow2_invalidate_cache(Bl

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-23 Thread Kevin Wolf

Am 08.09.2014 um 16:54 hat Peter Lieven geschrieben:
> On 08.09.2014 16:42, Paolo Bonzini wrote:
> >Il 08/09/2014 16:35, Peter Lieven ha scritto:
> >>Whats your opinion changed the max_xfer_len to 0x regardsless
> >>of use_16_for_rw in iSCSI?
> >If you implemented request splitting in the block layer, it would be
> >okay to force max_xfer_len to 0x.
> 
> Unfortunately, I currently have no time for that. It will include some 
> shuffling with
> qiovs that has to be properly tested.
> 
> Regarding iSCSI: In fact currently the limit is 0x for all iSCSI Targets 
> < 2TB.
> So I thought that its not obvious at all why a > 2TB target can handle bigger 
> requests.
> 
> To the root cause of this patch multiwrite_merge I still have some thoughts:
>  - why are we merging requests for raw (especially host devices and/or iSCSI?)
>The original patch from Kevin was to mitigate a QCOW2 performance 
> regression.

The problem wasn't in qcow2, though, it just became more apparent there
because lots of small requests are deadly to performance during cluster
allocation. Installation simply took ages compared to IDE. If you do a
real benchmark, you'll probably see (smaller) differences with raw, too.

The core problem is virtio-blk getting much smaller requests. IIRC, I
got an average of 128k-512k per request for IDE and something as poor as
4k-32k for virtio-blk. If I read this thread correctly, you're saying
that this is still the case. I suspect there is something wrong with the
guest driver, but somebody would have to dig into that.

>For iSCSI the qiov concats are destroying all the zerocopy efforts we made.

If this is true, it just means that the iscsi driver sucks for vectored
I/O and needs to be fixed.

>  - should we only merge requests within the same cluster?

Does it hurt to merge everything we can? The block driver needs to be
able to take things apart anyway, the large request could come from
somewhere else (guest, block job, builtin NBD server, etc.)

>  - why is there no multiread_merge?

Because nobody implemented it. :-)

As I said above, writes hurt a lot because of qcow2 cluster allocation.
Reads are probably losing some performance as well (someone would need
to check this), but processing them separately isn't quite as painful.

Kevin

Re: [Qemu-devel] [PATCH v2 0/2] list supported machine types in well-defined order

2014-09-23 Thread Michael S. Tsirkin

On Mon, Sep 22, 2014 at 10:38:34PM +0200, Laszlo Ersek wrote:
> The first patch introduces a generic comparator. This comparator should
> cover all machine types at once that don't belong to machine type
> "families". Hence, for example, the output it produces for
> 
>   qemu-system-aarch64 -M \?
> 
> is meant to be final. (See examples in the patches.)
> 
> The second patch files piix and q35 machine types into their respective
> families.
> 
> Paolo said we needed to care about "pseries, pc, q35", but I got no clue
> about "pseries", so I didn't touch that. It shouldn't be hard for
> someone who knows "pseries" to post a followup patch that covers it.
> Until then, "pseries" machine types are listed in alphabetical order (no
> families).
> 
> Laszlo Ersek (2):
>   well-defined listing order for machine types
>   i386/pc: add piix and q35 machtypes to sorting families for -M \?
> 
>  include/hw/boards.h |  2 ++
>  hw/i386/pc.c|  1 +
>  hw/i386/pc_piix.c   |  1 +
>  hw/i386/pc_q35.c|  1 +
>  vl.c| 38 +-
>  5 files changed, 42 insertions(+), 1 deletion(-)

Acked-by: Michael S. Tsirkin 


> -- 
> 1.8.3.1

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-23 Thread Peter Lieven


On 23.09.2014 10:47, Kevin Wolf wrote:

Am 08.09.2014 um 16:54 hat Peter Lieven geschrieben:

On 08.09.2014 16:42, Paolo Bonzini wrote:

Il 08/09/2014 16:35, Peter Lieven ha scritto:

Whats your opinion changed the max_xfer_len to 0x regardsless
of use_16_for_rw in iSCSI?

If you implemented request splitting in the block layer, it would be
okay to force max_xfer_len to 0x.

Unfortunately, I currently have no time for that. It will include some 
shuffling with
qiovs that has to be properly tested.

Regarding iSCSI: In fact currently the limit is 0x for all iSCSI Targets < 
2TB.
So I thought that its not obvious at all why a > 2TB target can handle bigger 
requests.

To the root cause of this patch multiwrite_merge I still have some thoughts:
  - why are we merging requests for raw (especially host devices and/or iSCSI?)
The original patch from Kevin was to mitigate a QCOW2 performance 
regression.

The problem wasn't in qcow2, though, it just became more apparent there
because lots of small requests are deadly to performance during cluster
allocation. Installation simply took ages compared to IDE. If you do a
real benchmark, you'll probably see (smaller) differences with raw, too.

The core problem is virtio-blk getting much smaller requests. IIRC, I
got an average of 128k-512k per request for IDE and something as poor as
4k-32k for virtio-blk. If I read this thread correctly, you're saying
that this is still the case. I suspect there is something wrong with the
guest driver, but somebody would have to dig into that.


This seems to be the case. I will check if this disappears with most
recent kernels virtio-blk versions.




For iSCSI the qiov concats are destroying all the zerocopy efforts we made.

If this is true, it just means that the iscsi driver sucks for vectored
I/O and needs to be fixed.


It works quite well, I misunderstood the iovec_concat routine at first. I 
thought
it was copying payload data around. The overhead for bilding the qiov structure
only should be neglectible.





  - should we only merge requests within the same cluster?

Does it hurt to merge everything we can? The block driver needs to be
able to take things apart anyway, the large request could come from
somewhere else (guest, block job, builtin NBD server, etc.)


I was just thinking. It was unclear what the original intention was.

My concern was that merging too much will increase latency for
the specific I/O even if it increases throughput.




  - why is there no multiread_merge?

Because nobody implemented it. :-)


Maybe its worth looking at this. Taking the multiwrite_merge stuff as
a basis it shouldn't be too hard.



As I said above, writes hurt a lot because of qcow2 cluster allocation.
Reads are probably losing some performance as well (someone would need
to check this), but processing them separately isn't quite as painful.


I will try to sort that out.

Peter

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread Tang Chen



On 09/23/2014 04:40 PM, Igor Mammedov wrote:

..
It's fine to use SRAT for these purposes on baremetal NUMA systems since
due to used chipset constrains it's possible statically allocate ranges
for every possible DIMM socket.
However SRAT(which is optional table BTW) entries are not mandatory
and override-able by ACPI Device's _PXM/_CRS methods replacing needs
for SRAT entries and QEMU uses this fact by supplying these methods.
QEMU adds FAKE SRAT entry only to workaround Windows limitation,
and for nothing else.

I think Linux does not violate ACPI spec and behaves as expected, moreover
it's more correct than Windows since memory hotplug will work on non NUMA
machines as well.

Hence I think this patch is correct and allows memory hotplug in absence
of NUMA configuration. It also would allow to use pc-dimm as replacement
for initial memory for non-NUMA configs (which is on my TODO list)

As for the Windows, QEMU has no idea what OS it would be running,
I see 2 ways to solve issue:
  1. user should know that memory hotplug on Windows requires NUMA machine
 and specify "-numa ..." option for this case.
(I've discussed this with libvirt folks and was promised that
 if user enables memory hotplug, libvirt would provide "-numa" option
 to workaround Windows issue)

  2. QEMU could unconditionally create single NUMA if memory hotplug is
 enabled. (but that should be enable only for 2.2 or late machines
 to avoid migration issues)


I prefer 2. I'll try to send patches for it if Zhang is also OK with it.

Thanks. :)

Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge

2014-09-23 Thread Kevin Wolf

Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
> On 22.09.2014 21:06, Paolo Bonzini wrote:
> >Il 22/09/2014 11:43, Peter Lieven ha scritto:
> >>This series aims not at touching default behaviour. The default for 
> >>max_transfer_length
> >>is 0 (no limit). max_transfer_length is a limit that MUST be satisfied 
> >>otherwise the request
> >>will fail. And Patch 2 aims at catching this fail earlier in the stack.
> >Understood.  But the right fix is to avoid that backend limits transpire
> >into guest ABI, not to catch the limits earlier.  So the right fix would
> >be to implement request splitting.
> 
> Since you proposed to add traces for this would you leave those in?
> And since iSCSI is the only user of this at the moment would you
> go for implementing this check in the iSCSI block layer?
> 
> As for the split logic would you think it is enough to build new qiov's
> out of the too big iov without copying the contents? This would work
> as long as a single iov inside the qiov is not bigger the max_transfer_length.
> Otherwise I would need to allocate temporary buffers and copy around.

You can split single iovs, too. There are functions that make this very
easy, they copy a sub-qiov with a byte granularity offset and length
(qemu_iovec_concat and friends). qcow2 uses the same to split requests
at (fragmented) cluster boundaries.

Kevin

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread Igor Mammedov

On Mon, 22 Sep 2014 14:17:28 +0300
"Michael S. Tsirkin"  wrote:

> On Fri, Sep 19, 2014 at 02:37:46PM +0200, Igor Mammedov wrote:
> > On Tue, 16 Sep 2014 18:39:15 +0800
> > zhanghailiang  wrote:
> > 
> > > If we do not configure numa option, memory hotplug should work as well.
> > > It should not depend on numa option.
> > > 
> > > Steps to reproduce:
> > > (1) Start VM: qemu-kvm -m 1024,slots=4,maxmem=8G
> > > (2) Hotplug memory
> > > It will fail and reports:
> > > "'DIMM property node has value 0' which exceeds the number of numa nodes: 
> > > 0"
> > > 
> > > Signed-off-by: zhanghailiang 
> > > ---
> > >  hw/mem/pc-dimm.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > index 5bfc5b7..a800ea7 100644
> > > --- a/hw/mem/pc-dimm.c
> > > +++ b/hw/mem/pc-dimm.c
> > > @@ -252,7 +252,7 @@ static void pc_dimm_realize(DeviceState *dev, Error 
> > > **errp)
> > >  error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not 
> > > set");
> > >  return;
> > >  }
> > > -if (dimm->node >= nb_numa_nodes) {
> > > +if ((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) {
> > >  error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value 
> > > %"
> > > PRIu32 "' which exceeds the number of numa nodes: %d",
> > > dimm->node, nb_numa_nodes);
> > 
> > Reviewed-By: Igor Mammedov 
> 
> 
> I read:
> > Hmm, I have just tested this, and Yes, it didn't work for Windows guest.
> > Thanks for your kind reminder.;)
> 
> So should I expect v2 which works with windows?
Hotplug wouldn't work with Windows without -numa (it's Windows limitation)
and more importantly pc-dimm shouldn't be limited only to NUMA configs
which this patch fixes.
This patch is fine and should go to stable as well.

On top of this we could add automatic NUMA node creation when
memory hotplug is enabled if this Windows workaround is acceptable.

[Qemu-devel] [PATCH] qemu-log: add log category for MIPS MMU fault info

2014-09-23 Thread Antony Pavlov

Running barebox on qemu-system-mips* with '-d unimp' overloads
stderr by very very many mips_cpu_handle_mmu_fault() messages:

  mips_cpu_handle_mmu_fault address=b80003fd ret 0 physical 180003fd 
prot 3
  mips_cpu_handle_mmu_fault address=a0800884 ret 0 physical 00800884 
prot 3
  mips_cpu_handle_mmu_fault pc a080cd80 ad b80003fd rw 0 mmu_idx 0

So it's very difficult to find LOG_UNIMP message.

The mips_cpu_handle_mmu_fault() messages appears on enabling ANY
logging! It's not very handy.

Adding separate log category for mips_cpu_handle_mmu_fault()
logging fixes the problem.

Signed-off-by: Antony Pavlov 
---
 include/qemu/log.h   | 1 +
 qemu-log.c   | 2 ++
 target-mips/helper.c | 6 --
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index d515424..195f665 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -40,6 +40,7 @@ static inline bool qemu_log_enabled(void)
 #define CPU_LOG_RESET  (1 << 9)
 #define LOG_UNIMP  (1 << 10)
 #define LOG_GUEST_ERROR(1 << 11)
+#define CPU_LOG_MMU(1 << 12)
 
 /* Returns true if a bit is set in the current loglevel mask
  */
diff --git a/qemu-log.c b/qemu-log.c
index 797f2af..d27766a 100644
--- a/qemu-log.c
+++ b/qemu-log.c
@@ -110,6 +110,8 @@ const QEMULogItem qemu_log_items[] = {
   "x86 only: show protected mode far calls/returns/exceptions" },
 { CPU_LOG_RESET, "cpu_reset",
   "x86 only: show CPU state before CPU resets" },
+{ CPU_LOG_MMU, "mmu",
+  "mips only: show MMU fault handling information" },
 { CPU_LOG_IOPORT, "ioport",
   "show all i/o ports accesses" },
 { LOG_UNIMP, "unimp",
diff --git a/target-mips/helper.c b/target-mips/helper.c
index 8a997e4..cb41061 100644
--- a/target-mips/helper.c
+++ b/target-mips/helper.c
@@ -309,7 +309,8 @@ int mips_cpu_handle_mmu_fault(CPUState *cs, vaddr address, 
int rw,
 #if 0
 log_cpu_state(cs, 0);
 #endif
-qemu_log("%s pc " TARGET_FMT_lx " ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
+qemu_log_mask(CPU_LOG_MMU,
+  "%s pc " TARGET_FMT_lx " ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
   __func__, env->active_tc.PC, address, rw, mmu_idx);
 
 rw &= 1;
@@ -321,7 +322,8 @@ int mips_cpu_handle_mmu_fault(CPUState *cs, vaddr address, 
int rw,
 access_type = ACCESS_INT;
 ret = get_physical_address(env, &physical, &prot,
address, rw, access_type);
-qemu_log("%s address=%" VADDR_PRIx " ret %d physical " TARGET_FMT_plx
+qemu_log_mask(CPU_LOG_MMU,
+ "%s address=%" VADDR_PRIx " ret %d physical " TARGET_FMT_plx
  " prot %d\n",
  __func__, address, ret, physical, prot);
 if (ret == TLBRET_MATCH) {
-- 
2.1.0

Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge

2014-09-23 Thread Peter Lieven


On 23.09.2014 10:59, Kevin Wolf wrote:

Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:

On 22.09.2014 21:06, Paolo Bonzini wrote:

Il 22/09/2014 11:43, Peter Lieven ha scritto:

This series aims not at touching default behaviour. The default for 
max_transfer_length
is 0 (no limit). max_transfer_length is a limit that MUST be satisfied 
otherwise the request
will fail. And Patch 2 aims at catching this fail earlier in the stack.

Understood.  But the right fix is to avoid that backend limits transpire
into guest ABI, not to catch the limits earlier.  So the right fix would
be to implement request splitting.

Since you proposed to add traces for this would you leave those in?
And since iSCSI is the only user of this at the moment would you
go for implementing this check in the iSCSI block layer?

As for the split logic would you think it is enough to build new qiov's
out of the too big iov without copying the contents? This would work
as long as a single iov inside the qiov is not bigger the max_transfer_length.
Otherwise I would need to allocate temporary buffers and copy around.

You can split single iovs, too. There are functions that make this very
easy, they copy a sub-qiov with a byte granularity offset and length
(qemu_iovec_concat and friends). qcow2 uses the same to split requests
at (fragmented) cluster boundaries.


Thanks for that pointer. Looking at qemu_iovec_concat this seems to be a
nobrainer.

I will first send an updated series dropping Patch 2 and will then send a 
follow-up
that splits requests.

Peter

Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties

2014-09-23 Thread Markus Armbruster

"Gonglei (Arei)"  writes:

> Hi,
>
>>  This doesn't change the fact that ObjectProperty is a generic struct,
>>  and adding alias-specific fields there is wrong.
>> >>>
>> >>> OK, Maybe I should find other ways to attach this purpose and
>> >>> avoid layering violation. Thanks!
>> >>
>> >> Unfortunately I cannot think of any.
>> >>
>> >> We could add a description field to ObjectProperty, and replace
>> >> legacy_name with a description.  The output then would be
>> >>
>> >> virtio-blk.drive=str (drive)
>
> There is a question that the QOM properties are added dynamically.
> When we call qdev_alias_all_properties() adding alias properties to
> the source object all qdev properties on the target DeviceState, how do we
> judge the property's name and set the value of corresponding description 
> field?
> Such as setting virtio-blk-pci.drive.description = "drive".
>
> The only way I think of is using the property' name as the description. In 
> object_property_add_alias() add below code:
>
> +   op->description = g_strdup(name);
>
> After those handling we can get results:
>
> virtio-blk-pci.physical_block_size=uint16 (physical_block_size)
> virtio-blk-pci.logical_block_size=uint16 (logical_block_size)
> virtio-blk-pci.drive=str (drive)
>
> virtio-net-pci.netdev=str (netdev)
> virtio-net-pci.vlan=int (vlan)
> virtio-net-pci.mac=str (mac)
>
> But if using this way, we just need simply modify make_device_property_info()
> like below patch (just assure the new output resemble the old, don't need 
> change
> ObjectProperty struct).  Am I right? Thanks :)

Adding descriptions to properties is a big, but useful task.  The
descriptions can serve as documentation in the code, and they can be
used to provide better help.  This applies both to qdev and to object
properties.

Completing this task in one go is unfortunately not practical.  We need
to add descriptions incrementally.  Until we're done, some properties
will lack descriptions (null pointer rather than a descriptive string).

Reusing the property name as description just to have a description adds
no information.

What about this: add a description string (optional for now) both to
ObjectProperty and to PropertyInfo.  Either add a description string
parameter to object property constructors like object_property_add() and
object_property_add_alias, or, for less churn, add a separate function
to set an object property's description.  Update
qdev_alias_all_properties() to set the object property's description to
the qdev property's.

Then you can fix the help regression by giving all the regressed
properties a useful description.

That fix should also permit retiring legacy_name.

Re: [Qemu-devel] [PATCH] vl: Adjust the place of calling mlockall to speedup VM's startup

2014-09-23 Thread zhanghailiang


On 2014/9/23 16:35, Hu Tao wrote:

On Tue, Sep 23, 2014 at 11:30:26AM +0300, Michael S. Tsirkin wrote:

On Tue, Sep 23, 2014 at 03:57:47PM +0800, zhanghailiang wrote:

If we configure mlock=on and memory policy=bind at the same time,
It will consume lots of time for system to treat with memory,
especially when call mbind after mlockall.

Adjust the place of calling mlockall, calling mbind before mlockall
can remarkably reduce the time of VM's startup.

Signed-off-by: zhanghailiang 


The idea makes absolute sense to me:
bind after lock will force data copy of
all pages. bind before lock gives us an
indication where to put data on fault in.


Agreed.



Acked-by: Michael S. Tsirkin 



---
Hi,

Actually, for mbind and mlockall, i have made a test about the time consuming
for the different call sequence.

The results is shown below. It is obviously that mlockall called before mbind is
more time-consuming.

Besides, this patch is OK with memory hotplug.

TEST CODE:
 if (mbind_first) {
 printf("mbind --> mlockall\n");
 mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 mlockall(MCL_CURRENT | MCL_FUTURE);
 } else {
 printf("mlockall --> mbind\n");
 mlockall(MCL_CURRENT | MCL_FUTURE);
 mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2 ,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 }

RESULT 1:
#time /home/test_mbind 10240 0
memroy size 10737418240
mlockall --> mbind

real0m11.886s
user0m0.004s
sys 0m11.865s
#time /home/test_mbind 10240 1
memroy size 10737418240
mbind --> mlockall

real0m5.334s
user0m0.000s
sys 0m5.324s

RESULT 2:
#time /home/test_mbind 4096 0
memroy size 4294967296
mlockall --> mbind

real0m5.503s
user0m0.000s
sys 0m5.492s
#time /home/test_mbind 4096 1
memroy size 4294967296
mbind --> mlockall

real0m2.139s
user0m0.000s
sys 0m2.132s

Best Regards,
zhanghailiang
---
  vl.c | 11 +--
  1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/vl.c b/vl.c
index dc792fe..adf4770 100644
--- a/vl.c
+++ b/vl.c
@@ -134,6 +134,7 @@ const char* keyboard_layout = NULL;
  ram_addr_t ram_size;
  const char *mem_path = NULL;
  int mem_prealloc = 0; /* force preallocation of physical target memory */
+int enable_mlock = false;


Why not bool?



Er, that is my fault, Will fix it and submit V2, Thanks;)


Regards,
Hu

.

Re: [Qemu-devel] [PATCH] qemu-log: add log category for MIPS MMU fault info

2014-09-23 Thread Peter Maydell

On 23 September 2014 10:04, Antony Pavlov  wrote:
> Running barebox on qemu-system-mips* with '-d unimp' overloads
> stderr by very very many mips_cpu_handle_mmu_fault() messages:
>
>   mips_cpu_handle_mmu_fault address=b80003fd ret 0 physical 180003fd 
> prot 3
>   mips_cpu_handle_mmu_fault address=a0800884 ret 0 physical 00800884 
> prot 3
>   mips_cpu_handle_mmu_fault pc a080cd80 ad b80003fd rw 0 mmu_idx 0
>
> So it's very difficult to find LOG_UNIMP message.
>
> The mips_cpu_handle_mmu_fault() messages appears on enabling ANY
> logging! It's not very handy.
>
> Adding separate log category for mips_cpu_handle_mmu_fault()
> logging fixes the problem.

I don't think we should have CPU-specific logging categories
(the x86-only categories are somewhat legacy). Can you
make this a category that applies to all CPU architectures
rather than a MIPS specific one, please? Given most of them
have an MMU I think that just means removing the "mips
only" text from the help documentation...

thanks
-- PMM

Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests

2014-09-23 Thread Kevin Wolf

Am 23.09.2014 um 10:55 hat Peter Lieven geschrieben:
> >>  - should we only merge requests within the same cluster?
> >Does it hurt to merge everything we can? The block driver needs to be
> >able to take things apart anyway, the large request could come from
> >somewhere else (guest, block job, builtin NBD server, etc.)
> 
> I was just thinking. It was unclear what the original intention was.

I hope it's clearer now. Anyway, for qcow2 merging even across clusters
does help a bit if those clusters are physically contiguous.

> My concern was that merging too much will increase latency for
> the specific I/O even if it increases throughput.

That's probably a valid concern. Perhaps we should add a set of options
to enable or disable certain request manipulations that the block layer
can do.  This includes request merging, but also alignment adjustments,
zero detection (which already has an option) etc.

> >>  - why is there no multiread_merge?
> >Because nobody implemented it. :-)
> 
> Maybe its worth looking at this. Taking the multiwrite_merge stuff as
> a basis it shouldn't be too hard.

Yes, it should be doable. We need numbers, though, before merging
something like this.

> >As I said above, writes hurt a lot because of qcow2 cluster allocation.
> >Reads are probably losing some performance as well (someone would need
> >to check this), but processing them separately isn't quite as painful.
> 
> I will try to sort that out.

Great, thanks!

Kevin

Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties

2014-09-23 Thread Michael S. Tsirkin

On Tue, Sep 23, 2014 at 11:06:02AM +0200, Markus Armbruster wrote:
> "Gonglei (Arei)"  writes:
> 
> > Hi,
> >
> >>  This doesn't change the fact that ObjectProperty is a generic struct,
> >>  and adding alias-specific fields there is wrong.
> >> >>>
> >> >>> OK, Maybe I should find other ways to attach this purpose and
> >> >>> avoid layering violation. Thanks!
> >> >>
> >> >> Unfortunately I cannot think of any.
> >> >>
> >> >> We could add a description field to ObjectProperty, and replace
> >> >> legacy_name with a description.  The output then would be
> >> >>
> >> >> virtio-blk.drive=str (drive)
> >
> > There is a question that the QOM properties are added dynamically.
> > When we call qdev_alias_all_properties() adding alias properties to
> > the source object all qdev properties on the target DeviceState, how do we
> > judge the property's name and set the value of corresponding description 
> > field?
> > Such as setting virtio-blk-pci.drive.description = "drive".
> >
> > The only way I think of is using the property' name as the description. In 
> > object_property_add_alias() add below code:
> >
> > +   op->description = g_strdup(name);
> >
> > After those handling we can get results:
> >
> > virtio-blk-pci.physical_block_size=uint16 (physical_block_size)
> > virtio-blk-pci.logical_block_size=uint16 (logical_block_size)
> > virtio-blk-pci.drive=str (drive)
> >
> > virtio-net-pci.netdev=str (netdev)
> > virtio-net-pci.vlan=int (vlan)
> > virtio-net-pci.mac=str (mac)
> >
> > But if using this way, we just need simply modify 
> > make_device_property_info()
> > like below patch (just assure the new output resemble the old, don't need 
> > change
> > ObjectProperty struct).  Am I right? Thanks :)
> 
> Adding descriptions to properties is a big, but useful task.  The
> descriptions can serve as documentation in the code, and they can be
> used to provide better help.  This applies both to qdev and to object
> properties.
> 
> Completing this task in one go is unfortunately not practical.  We need
> to add descriptions incrementally.  Until we're done, some properties
> will lack descriptions (null pointer rather than a descriptive string).
> 
> Reusing the property name as description just to have a description adds
> no information.
> 
> What about this: add a description string (optional for now) both to
> ObjectProperty and to PropertyInfo.  Either add a description string
> parameter to object property constructors like object_property_add() and
> object_property_add_alias, or, for less churn, add a separate function
> to set an object property's description.  Update
> qdev_alias_all_properties() to set the object property's description to
> the qdev property's.
> 
> Then you can fix the help regression by giving all the regressed
> properties a useful description.
> 
> That fix should also permit retiring legacy_name.

Ack. Sounds good to me.

Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties

2014-09-23 Thread Gonglei (Arei)







> -Original Message-
> From: Markus Armbruster [mailto:arm...@redhat.com]
> Sent: Tuesday, September 23, 2014 5:06 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; Michael S. Tsirkin; ag...@suse.de; Huangweidong (C);
> stefa...@redhat.com; Huangpeng (Peter); qemu-devel@nongnu.org;
> aligu...@amazon.com; lcapitul...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties
> 
> "Gonglei (Arei)"  writes:
> 
> > Hi,
> >
> >>  This doesn't change the fact that ObjectProperty is a generic struct,
> >>  and adding alias-specific fields there is wrong.
> >> >>>
> >> >>> OK, Maybe I should find other ways to attach this purpose and
> >> >>> avoid layering violation. Thanks!
> >> >>
> >> >> Unfortunately I cannot think of any.
> >> >>
> >> >> We could add a description field to ObjectProperty, and replace
> >> >> legacy_name with a description.  The output then would be
> >> >>
> >> >> virtio-blk.drive=str (drive)
> >
> > There is a question that the QOM properties are added dynamically.
> > When we call qdev_alias_all_properties() adding alias properties to
> > the source object all qdev properties on the target DeviceState, how do we
> > judge the property's name and set the value of corresponding description
> field?
> > Such as setting virtio-blk-pci.drive.description = "drive".
> >
> > The only way I think of is using the property' name as the description. In
> > object_property_add_alias() add below code:
> >
> > +   op->description = g_strdup(name);
> >
> > After those handling we can get results:
> >
> > virtio-blk-pci.physical_block_size=uint16 (physical_block_size)
> > virtio-blk-pci.logical_block_size=uint16 (logical_block_size)
> > virtio-blk-pci.drive=str (drive)
> >
> > virtio-net-pci.netdev=str (netdev)
> > virtio-net-pci.vlan=int (vlan)
> > virtio-net-pci.mac=str (mac)
> >
> > But if using this way, we just need simply modify
> make_device_property_info()
> > like below patch (just assure the new output resemble the old, don't need
> change
> > ObjectProperty struct).  Am I right? Thanks :)
> 
> Adding descriptions to properties is a big, but useful task.  The
> descriptions can serve as documentation in the code, and they can be
> used to provide better help.  This applies both to qdev and to object
> properties.
> 
> Completing this task in one go is unfortunately not practical.  We need
> to add descriptions incrementally.  Until we're done, some properties
> will lack descriptions (null pointer rather than a descriptive string).
> 
> Reusing the property name as description just to have a description adds
> no information.
> 
> What about this: add a description string (optional for now) both to
> ObjectProperty and to PropertyInfo.  Either add a description string
> parameter to object property constructors like object_property_add() and
> object_property_add_alias, or, for less churn, add a separate function
> to set an object property's description.  Update
> qdev_alias_all_properties() to set the object property's description to
> the qdev property's.
> 
> Then you can fix the help regression by giving all the regressed
> properties a useful description.
> 
> That fix should also permit retiring legacy_name.

Great! Thanks, I will work for this :)

Best regards,
-Gonglei

Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties

2014-09-23 Thread Gonglei (Arei)

> Subject: Re: [PATCH 0/3] Fix confused output for alias properties
> 
> Il 23/09/2014 05:09, Gonglei (Arei) ha scritto:
> > Hi,
> >
> >> This doesn't change the fact that ObjectProperty is a generic struct,
> >> and adding alias-specific fields there is wrong.
> >
> > OK, Maybe I should find other ways to attach this purpose and
> > avoid layering violation. Thanks!
> 
>  Unfortunately I cannot think of any.
> 
>  We could add a description field to ObjectProperty, and replace
>  legacy_name with a description.  The output then would be
> 
>  virtio-blk.drive=str (drive)
> >
> > There is a question that the QOM properties are added dynamically.
> > When we call qdev_alias_all_properties() adding alias properties to
> > the source object all qdev properties on the target DeviceState, how do we
> > judge the property's name and set the value of corresponding description
> field?
> > Such as setting virtio-blk-pci.drive.description = "drive".
> 
> You use the legacy_name field of PropertyInfo to set the description of
> a qdev property, and then let object_property_add_alias() copy the
> description.
> 
> Paolo

Got it. Thanks for your point. :)

Best regards,
-Gonglei

Re: [Qemu-devel] [PATCH v5] Add HMP command "info memory-devices"

2014-09-23 Thread Igor Mammedov

On Tue, 23 Sep 2014 13:35:19 +0800
Zhu Guihua  wrote:

> Provides HMP equivalent of QMP query-memory-devices command.
> 
> Signed-off-by: Zhu Guihua 
> ---
> Changes since v4:
> - enclose ID in double quotes.
> 
> Changes since v3:
> - optimize the time to print memory devices' information.
> - change output format of di->addr and di->size.
> 
> Changes since v2:
> - print address in hex.
> - change the loop control from while to for.
> - modify some variables' name.
> - optimize the time to print memory devices' kind. 
> 
> Changes since v1:
> - fix bug that accessing info->dimm when MemoryDeviceInfo is not PCDIMMDevice.
> - use enum to replace "dimm", and lookup typename in
>   MemoryDeviceInfoKind_lookup[] instead of opencodding it.
> ---
>  hmp-commands.hx |  2 ++
>  hmp.c   | 38 ++
>  hmp.h   |  1 +
>  monitor.c   |  7 +++
>  4 files changed, 48 insertions(+)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index f859f8d..0b1a4f7 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1778,6 +1778,8 @@ show qdev device model list
>  show roms
>  @item info tpm
>  show the TPM device
> +@item info memory-devices
> +show the memory devices
>  @end table
>  ETEXI
>  
> diff --git a/hmp.c b/hmp.c
> index 40a90da..49238e9 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1718,3 +1718,41 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
>  
>  qapi_free_MemdevList(memdev_list);
>  }
> +
> +void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
> +{
> +Error *err = NULL;
> +MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
> +MemoryDeviceInfoList *info;
> +MemoryDeviceInfo *value;
> +PCDIMMDeviceInfo *di;
> +
> +for (info = info_list; info; info = info->next) {
> +value = info->value;
> +
> +if (value) {
> +switch (value->kind) {
> +case MEMORY_DEVICE_INFO_KIND_DIMM:
> +di = value->dimm;
> +
> +monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
> +   MemoryDeviceInfoKind_lookup[value->kind],
> +   di->id ? di->id : "");
> +monitor_printf(mon, "  addr: 0x%" PRIx64 "\n", di->addr);
> +monitor_printf(mon, "  slot: %" PRId64 "\n", di->slot);
> +monitor_printf(mon, "  node: %" PRId64 "\n", di->node);
> +monitor_printf(mon, "  size: %" PRIu64 "\n", di->size);
> +monitor_printf(mon, "  memdev: %s\n", di->memdev);
> +monitor_printf(mon, "  hotplugged: %s\n",
> +   di->hotplugged ? "true" : "false");
> +monitor_printf(mon, "  hotpluggable: %s\n",
> +   di->hotpluggable ? "true" : "false");
> +break;
> +default:
> +break;
> +}
> +}
> +}
> +
> +qapi_free_MemoryDeviceInfoList(info_list);
> +}
> diff --git a/hmp.h b/hmp.h
> index 4fd3c4a..4bb5dca 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -94,6 +94,7 @@ void hmp_cpu_add(Monitor *mon, const QDict *qdict);
>  void hmp_object_add(Monitor *mon, const QDict *qdict);
>  void hmp_object_del(Monitor *mon, const QDict *qdict);
>  void hmp_info_memdev(Monitor *mon, const QDict *qdict);
> +void hmp_info_memory_devices(Monitor *mon, const QDict *qdict);
>  void object_add_completion(ReadLineState *rs, int nb_args, const char *str);
>  void object_del_completion(ReadLineState *rs, int nb_args, const char *str);
>  void device_add_completion(ReadLineState *rs, int nb_args, const char *str);
> diff --git a/monitor.c b/monitor.c
> index 667efb7..7467521 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2921,6 +2921,13 @@ static mon_cmd_t info_cmds[] = {
>  .mhandler.cmd = hmp_info_memdev,
>  },
>  {
> +.name   = "memory-devices",
> +.args_type  = "",
> +.params = "",
> +.help   = "show memory devices",
> +.mhandler.cmd = hmp_info_memory_devices,
> +},
> +{
>  .name   = NULL,
>  },
>  };

Reviewed-By: Igor Mammedov

[Qemu-devel] [RFC PATCH v2 01/23] QEMUSizedBuffer/QEMUFile

2014-09-23 Thread Yang Hongyang

From: "Dr. David Alan Gilbert" 

** Intended to merge separately; see the separate qemu-devel thread for
   latest version
**

COLO buffer: Use QEMUSizedBuffer/QEMUFile to store migration data

We need a buffer to store migration data.

On save side:
  all saved data is written into colo buffer first, so that we can know
the total size of the migration data. this can also separate the data
transmission from colo control data, we use colo control data over
socket fd to synchronous both side's stat.

On restore side:
  all migration data was read into colo buffer first, then load data
from the buffer: If network error happens during data transmission,
the slave can still function because the migration data is not yet
loaded.

This is based on Stefan Berger's patch that creates a QEMUFile that goes
to a memory buffer; from:

http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html

Using the QEMUFile interface, this patch adds support functions for
operating on in-memory sized buffers that can be written to or read from.

Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 

For minor tweeks/rebase I've done to it:
Signed-off-by: Dr. David Alan Gilbert 

For commit messages about colo buffer:
Signed-off-by: Yang Hongyang 
---
 include/migration/qemu-file.h |  28 +++
 include/qemu/typedefs.h   |   1 +
 qemu-file.c   | 410 ++
 3 files changed, 439 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c90f529..80af3ff 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -25,6 +25,8 @@
 #define QEMU_FILE_H 1
 #include "exec/cpu-common.h"
 
+#include 
+
 /* This function writes a chunk of data to a file at the given position.
  * The pos argument can be ignored if the file is only being used for
  * streaming.  The handler should try to write all of the data it can.
@@ -94,11 +96,21 @@ typedef struct QEMUFileOps {
 QEMURamSaveFunc *save_page;
 } QEMUFileOps;
 
+struct QEMUSizedBuffer {
+struct iovec *iov;
+size_t n_iov;
+size_t size; /* total allocated size in all iov's */
+size_t used; /* number of used bytes */
+};
+
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
+
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
@@ -111,6 +123,22 @@ void qemu_put_byte(QEMUFile *f, int v);
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
 bool qemu_file_mode_is_not_valid(const char *mode);
 
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
+void qsb_free(QEMUSizedBuffer *);
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
+size_t qsb_get_length(const QEMUSizedBuffer *qsb);
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
+   uint8_t **buf);
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
+ off_t pos, size_t count);
+
+
+/*
+ * For use on files opened with qemu_bufopen
+ */
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
+
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
 qemu_put_byte(f, (int)v);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f20b0e..db1153a 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog;
 typedef struct PCIEAERErr PCIEAERErr;
 typedef struct PCIEPort PCIEPort;
 typedef struct PCIESlot PCIESlot;
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct MSIMessage MSIMessage;
 typedef struct SerialState SerialState;
 typedef struct PCMCIACardState PCMCIACardState;
diff --git a/qemu-file.c b/qemu-file.c
index a8e3912..d64bee2 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -878,3 +878,413 @@ uint64_t qemu_get_be64(QEMUFile *f)
 v |= qemu_get_be32(f);
 return v;
 }
+
+#define QSB_CHUNK_SIZE  (1 << 10)
+#define QSB_MAX_CHUNK_SIZE  (10 * QSB_CHUNK_SIZE)
+
+/**
+ * Create a QEMUSizedBuffer
+ * This type of buffer uses scatter-gather lists internally and
+ * can grow to any size. Any data array in the scatter-gather list
+ * can hold different amount of bytes.
+ *
+ * @buffer: Optional buffer to copy into the QSB
+ * @len: size of initial buffer; if @buffer is given, buffer must
+ *   hold at least len bytes
+ *
+ * Returns a pointer to a QEMUSizedBuffer
+ */
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
+{
+QEMUSizedBuffer *qsb;
+size_t alloc_len, num_chunks, i, to_copy;
+size_t chunk_size =

[Qemu-devel] [RFC PATCH v2 00/23] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-09-23 Thread Yang Hongyang

Virtual machine (VM) replication is a well known technique for
providing application-agnostic software-implemented hardware fault
tolerance "non-stop service". COLO is a high availability solution.
Both primary VM (PVM) and secondary VM (SVM) run in parallel. They
receive the same request from client, and generate response in parallel
too. If the response packets from PVM and SVM are identical, they are
released immediately. Otherwise, a VM checkpoint (on demand) is
conducted. The idea is presented in Xen summit 2012, and 2013,
and academia paper in SOCC 2013. It's also presented in KVM forum
2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
Please refer to above document for detailed information. 
Please also refer to previous posted RFC proposal:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html

The patchset is also hosted on github:
https://github.com/macrosheep/qemu/tree/colo_v0.5

v2:
use QEMUSizedBuffer/QEMUFile as COLO buffer
colo support is enabled by default
add nic replication support
addressed comments from Eric Blake and Dr. David Alan Gilbert

v1:
implement the frame of colo

This patchset is RFC, But it is ready for demo the COLO idea
with QEMU-KVM.
Steps using this patchset to get an overview of COLO:
1. configure
2. compile
3. just like QEMU's normal migration, run 2 QEMU VM:
   - Primary VM 
   - Secondary VM with -incoming tcp:[IP]:[PORT] option
4. on Primary VM's QEMU monitor, run following command:
   migrate_set_capability colo on
   migrate tcp:[IP]:[PORT]
5. done
you will see two runing VMs, whenever you make changes to PVM, SVM
will be synced to PVM's state.

TODO list:
1. failover (will require heartbeat module: 
http://www.linux-ha.org/wiki/Downloads)
2. disk replication[COLO Disk manager]

Any comments/feedbacks are warmly welcomed.

Thanks,
Yang


Dr. David Alan Gilbert (1):
  QEMUSizedBuffer/QEMUFile

Yang Hongyang (22):
  configure: add CONFIG_COLO to switch COLO support
  COLO: introduce an api colo_supported() to indicate COLO support
  COLO migration: add a migration capability 'colo'
  COLO info: use colo info to tell migration target colo is enabled
  COLO save: integrate COLO checkpointed save into qemu migration
  COLO restore: integrate COLO checkpointed restore into qemu restore
  COLO: disable qdev hotplug
  COLO ctl: implement API's that communicate with colo agent
  COLO ctl: introduce is_slave() and is_master()
  COLO ctl: implement colo checkpoint protocol
  COLO ctl: add a RunState RUN_STATE_COLO
  COLO ctl: implement colo save
  COLO ctl: implement colo restore
  COLO save: reuse migration bitmap under colo checkpoint
  COLO ram cache: implement colo ram cache on slave
  HACK: trigger checkpoint every 500ms
  COLO nic: add command line switch
  COLO nic: init/remove colo nic devices when add/cleanup tap devices
  COLO nic: implement colo nic device interface support_colo()
  COLO nic: implement colo nic device interface configure()
  COLO nic: export colo nic APIs
  COLO nic: setup/teardown colo nic devices

 Makefile.objs  |   2 +
 arch_init.c| 174 +++-
 configure  |  14 +
 include/exec/cpu-all.h |   1 +
 include/migration/migration-colo.h |  36 +++
 include/migration/migration.h  |  13 +
 include/migration/qemu-file.h  |  28 ++
 include/net/colo-nic.h |  20 ++
 include/net/net.h  |   4 +
 include/qemu/typedefs.h|   1 +
 migration-colo-comm.c  |  78 ++
 migration-colo.c   | 540 +
 migration.c|  47 ++--
 net/Makefile.objs  |   1 +
 net/colo-nic.c | 227 
 net/tap.c  |  45 +++-
 network-colo   | 194 +
 qapi-schema.json   |  18 +-
 qemu-file.c| 410 
 qemu-options.hx|  10 +-
 stubs/Makefile.objs|   1 +
 stubs/migration-colo.c |  34 +++
 vl.c   |  12 +
 23 files changed, 1879 insertions(+), 31 deletions(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 include/net/colo-nic.h
 create mode 100644 migration-colo-comm.c
 create mode 100644 migration-colo.c
 create mode 100644 net/colo-nic.c
 create mode 100755 network-colo
 create mode 100644 stubs/migration-colo.c

-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 06/23] COLO save: integrate COLO checkpointed save into qemu migration

2014-09-23 Thread Yang Hongyang

  Integrate COLO checkpointed save flow into qemu migration.
  Add a migrate state: MIG_STATE_COLO, enter this migrate state
after the first live migration successfully finished.
  Create a colo thread to do the checkpointed save.

Signed-off-by: Yang Hongyang 
---
 include/migration/migration-colo.h |  4 
 include/migration/migration.h  | 13 +++
 migration-colo-comm.c  |  2 +-
 migration-colo.c   | 46 ++
 migration.c| 36 -
 stubs/migration-colo.c |  4 
 6 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
index e3735d8..24589c0 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -18,4 +18,8 @@ void colo_info_mig_init(void);
 
 bool colo_supported(void);
 
+/* save */
+bool migrate_use_colo(void);
+void colo_init_checkpointer(MigrationState *s);
+
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..3e81a27 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -64,6 +64,19 @@ struct MigrationState
 int64_t dirty_sync_count;
 };
 
+enum {
+MIG_STATE_ERROR = -1,
+MIG_STATE_NONE,
+MIG_STATE_SETUP,
+MIG_STATE_CANCELLING,
+MIG_STATE_CANCELLED,
+MIG_STATE_ACTIVE,
+MIG_STATE_COLO,
+MIG_STATE_COMPLETED,
+};
+
+void migrate_set_state(MigrationState *s, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration-colo-comm.c b/migration-colo-comm.c
index ccbc246..4504ceb 100644
--- a/migration-colo-comm.c
+++ b/migration-colo-comm.c
@@ -25,7 +25,7 @@ static bool colo_requested;
 
 /* save */
 
-static bool migrate_use_colo(void)
+bool migrate_use_colo(void)
 {
 MigrationState *s = migrate_get_current();
 return s->enabled_capabilities[MIGRATION_CAPABILITY_COLO];
diff --git a/migration-colo.c b/migration-colo.c
index 1d3bef8..c3fc313 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -8,9 +8,55 @@
  * the COPYING file in the top-level directory.
  */
 
+#include "qemu/main-loop.h"
+#include "qemu/thread.h"
 #include "migration/migration-colo.h"
 
+static QEMUBH *colo_bh;
+
 bool colo_supported(void)
 {
 return true;
 }
+
+/* save */
+
+static void *colo_thread(void *opaque)
+{
+MigrationState *s = opaque;
+
+/*TODO: COLO checkpointed save loop*/
+
+migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
+
+qemu_mutex_lock_iothread();
+qemu_bh_schedule(s->cleanup_bh);
+qemu_mutex_unlock_iothread();
+
+return NULL;
+}
+
+static void colo_start_checkpointer(void *opaque)
+{
+MigrationState *s = opaque;
+
+if (colo_bh) {
+qemu_bh_delete(colo_bh);
+colo_bh = NULL;
+}
+
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&s->thread);
+qemu_mutex_lock_iothread();
+
+migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COLO);
+
+qemu_thread_create(&s->thread, "colo", colo_thread, s,
+   QEMU_THREAD_JOINABLE);
+}
+
+void colo_init_checkpointer(MigrationState *s)
+{
+colo_bh = qemu_bh_new(colo_start_checkpointer, s);
+qemu_bh_schedule(colo_bh);
+}
diff --git a/migration.c b/migration.c
index 5d28237..6a6dce5 100644
--- a/migration.c
+++ b/migration.c
@@ -27,16 +27,6 @@
 #include "trace.h"
 #include "migration/migration-colo.h"
 
-enum {
-MIG_STATE_ERROR = -1,
-MIG_STATE_NONE,
-MIG_STATE_SETUP,
-MIG_STATE_CANCELLING,
-MIG_STATE_CANCELLED,
-MIG_STATE_ACTIVE,
-MIG_STATE_COMPLETED,
-};
-
 #define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
 
 /* Amount of time to allocate to each "chunk" of bandwidth-throttled
@@ -229,6 +219,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 get_xbzrle_cache_stats(info);
 break;
+case MIG_STATE_COLO:
+info->has_status = true;
+info->status = g_strdup("colo");
+/* TODO: display COLO specific informations(checkpoint info etc.),*/
+break;
 case MIG_STATE_COMPLETED:
 get_xbzrle_cache_stats(info);
 
@@ -272,7 +267,8 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 MigrationState *s = migrate_get_current();
 MigrationCapabilityStatusList *cap;
 
-if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+s->state == MIG_STATE_COLO) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -291,7 +287,7 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void migrate_set_state(MigrationState *s, int old_s

[Qemu-devel] [RFC PATCH v2 09/23] COLO ctl: implement API's that communicate with colo agent

2014-09-23 Thread Yang Hongyang

We use COLO agent to compare the packets returned by
Primary VM and Secondary VM, and decide whether to start a
checkpoint according to some rules. It is a linux kernel
module for host.
COLO controller communicate with the agent through ioctl().

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 117 +--
 1 file changed, 114 insertions(+), 3 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index 1e6cd70..8610e15 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -12,7 +12,17 @@
 #include "qemu/thread.h"
 #include "block/coroutine.h"
 #include "hw/qdev-core.h"
+#include "qemu/timer.h"
 #include "migration/migration-colo.h"
+#include 
+#include "qemu/error-report.h"
+
+/*
+ * checkpoint timer: unit ms
+ * this is large because COLO checkpoint will mostly depend on
+ * COLO compare module.
+ */
+#define CHKPOINT_TIMER 1
 
 static QEMUBH *colo_bh;
 
@@ -21,16 +31,100 @@ bool colo_supported(void)
 return true;
 }
 
+/* colo compare */
+#define COMP_IOC_MAGIC 'k'
+#define COMP_IOCTWAIT   _IO(COMP_IOC_MAGIC, 0)
+#define COMP_IOCTFLUSH  _IO(COMP_IOC_MAGIC, 1)
+#define COMP_IOCTRESUME _IO(COMP_IOC_MAGIC, 2)
+
+#define COMPARE_DEV "/dev/HA_compare"
+/* COLO compare module FD */
+static int comp_fd = -1;
+
+static int colo_compare_init(void)
+{
+comp_fd = open(COMPARE_DEV, O_RDONLY);
+if (comp_fd < 0) {
+return -1;
+}
+
+return 0;
+}
+
+static void colo_compare_destroy(void)
+{
+if (comp_fd >= 0) {
+close(comp_fd);
+comp_fd = -1;
+}
+}
+
+/*
+ * Communicate with COLO Agent through ioctl.
+ * return:
+ * 0: start a checkpoint
+ * other: errno == ETIME or ERESTART, try again
+ *errno == other, error, quit colo save
+ */
+static int colo_compare(void)
+{
+return ioctl(comp_fd, COMP_IOCTWAIT, 250);
+}
+
+static __attribute__((unused)) int colo_compare_flush(void)
+{
+return ioctl(comp_fd, COMP_IOCTFLUSH, 1);
+}
+
+static __attribute__((unused)) int colo_compare_resume(void)
+{
+return ioctl(comp_fd, COMP_IOCTRESUME, 1);
+}
+
 /* save */
 
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
-int dev_hotplug = qdev_hotplug;
+int dev_hotplug = qdev_hotplug, wait_cp = 0;
+int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+int64_t current_time;
+
+if (colo_compare_init() < 0) {
+error_report("Init colo compare error");
+goto out;
+}
 
 qdev_hotplug = 0;
 
-/*TODO: COLO checkpointed save loop*/
+while (s->state == MIG_STATE_COLO) {
+/* wait for a colo checkpoint */
+wait_cp = colo_compare();
+if (wait_cp) {
+if (errno != ETIME && errno != ERESTART) {
+error_report("compare module failed(%s)", strerror(errno));
+goto out;
+}
+/*
+ * no checkpoint is needed, wait for 1ms and then
+ * check if we need checkpoint
+ */
+current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+if (current_time - start_time < CHKPOINT_TIMER) {
+usleep(1000);
+continue;
+}
+}
+
+/* start a colo checkpoint */
+
+/*TODO: COLO save */
+
+start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+}
+
+out:
+colo_compare_destroy();
 
 migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
 
@@ -72,6 +166,17 @@ void colo_init_checkpointer(MigrationState *s)
 
 static Coroutine *colo;
 
+/*
+ * return:
+ * 0: start a checkpoint
+ * 1: some error happend, exit colo restore
+ */
+static int slave_wait_new_checkpoint(QEMUFile *f)
+{
+/* TODO: wait checkpoint start command from master */
+return 1;
+}
+
 void colo_process_incoming_checkpoints(QEMUFile *f)
 {
 int dev_hotplug = qdev_hotplug;
@@ -85,7 +190,13 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 colo = qemu_coroutine_self();
 assert(colo != NULL);
 
-/* TODO: COLO checkpointed restore loop */
+while (true) {
+if (slave_wait_new_checkpoint(f)) {
+break;
+}
+
+/* TODO: COLO restore */
+}
 
 colo = NULL;
 restore_exit_colo();
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 05/23] COLO info: use colo info to tell migration target colo is enabled

2014-09-23 Thread Yang Hongyang

migrate colo info to migration target to tell the target colo is
enabled.

Signed-off-by: Yang Hongyang 
---
 Makefile.objs  |  1 +
 include/migration/migration-colo.h |  3 ++
 migration-colo-comm.c  | 68 ++
 vl.c   |  4 +++
 4 files changed, 76 insertions(+)
 create mode 100644 migration-colo-comm.c

diff --git a/Makefile.objs b/Makefile.objs
index 9654c04..e305444 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -49,6 +49,7 @@ common-obj-$(CONFIG_POSIX) += os-posix.o
 common-obj-$(CONFIG_LINUX) += fsdev/
 
 common-obj-y += migration.o migration-tcp.o
+common-obj-y += migration-colo-comm.o
 common-obj-$(CONFIG_COLO) += migration-colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o
diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
index 35b384c..e3735d8 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -12,6 +12,9 @@
 #define QEMU_MIGRATION_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
+
+void colo_info_mig_init(void);
 
 bool colo_supported(void);
 
diff --git a/migration-colo-comm.c b/migration-colo-comm.c
new file mode 100644
index 000..ccbc246
--- /dev/null
+++ b/migration-colo-comm.c
@@ -0,0 +1,68 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ *  Copyright (C) 2014 FUJITSU LIMITED
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+
+#define DEBUG_COLO
+
+#ifdef DEBUG_COLO
+#define DPRINTF(fmt, ...) \
+do { fprintf(stdout, "COLO: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+static bool colo_requested;
+
+/* save */
+
+static bool migrate_use_colo(void)
+{
+MigrationState *s = migrate_get_current();
+return s->enabled_capabilities[MIGRATION_CAPABILITY_COLO];
+}
+
+static void colo_info_save(QEMUFile *f, void *opaque)
+{
+qemu_put_byte(f, migrate_use_colo());
+}
+
+/* restore */
+
+static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
+{
+int value = qemu_get_byte(f);
+
+if (value && !colo_supported()) {
+fprintf(stderr, "COLO is not supported\n");
+return -EINVAL;
+}
+
+if (value && !colo_requested) {
+DPRINTF("COLO requested!\n");
+}
+
+colo_requested = value;
+
+return 0;
+}
+
+static SaveVMHandlers savevm_colo_info_handlers = {
+.save_state = colo_info_save,
+.load_state = colo_info_load,
+};
+
+void colo_info_mig_init(void)
+{
+register_savevm_live(NULL, "colo info", -1, 1,
+ &savevm_colo_info_handlers, NULL);
+}
diff --git a/vl.c b/vl.c
index dc792fe..09d9851 100644
--- a/vl.c
+++ b/vl.c
@@ -89,6 +89,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/migration-colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4371,6 +4372,9 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 ram_mig_init();
+if (colo_supported()) {
+colo_info_mig_init();
+}
 
 /* open the virtual block devices */
 if (snapshot)
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 02/23] configure: add CONFIG_COLO to switch COLO support

2014-09-23 Thread Yang Hongyang

./configure --enable-colo/--disable-colo to switch COLO
support on/off.
COLO support is on by default.

Signed-off-by: Yang Hongyang 
---
 configure | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/configure b/configure
index 6c3d6cd..12635cd 100755
--- a/configure
+++ b/configure
@@ -258,6 +258,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -923,6 +924,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1320,6 +1325,10 @@ Advanced options (experts only):
   --disable-slirp  disable SLIRP userspace network connectivity
   --disable-kvmdisable KVM acceleration support
   --enable-kvm enable KVM acceleration support
+  --disable-colo   disable COarse-grain LOck-stepping Virtual
+   Machines for Non-stop Service
+  --enable-coloenable COarse-grain LOck-stepping Virtual
+   Machines for Non-stop Service (default)
   --disable-rdma   disable RDMA-based migration support
   --enable-rdmaenable RDMA-based migration support
   --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)
@@ -4291,6 +4300,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
+echo "COLO support  $colo"
 echo "RDMA support  $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support   $fdt"
@@ -4842,6 +4852,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 08/23] COLO: disable qdev hotplug

2014-09-23 Thread Yang Hongyang

COLO do not support qdev hotplug migration, disable it.

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/migration-colo.c b/migration-colo.c
index 391e0b7..1e6cd70 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -11,6 +11,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/thread.h"
 #include "block/coroutine.h"
+#include "hw/qdev-core.h"
 #include "migration/migration-colo.h"
 
 static QEMUBH *colo_bh;
@@ -25,6 +26,9 @@ bool colo_supported(void)
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
+int dev_hotplug = qdev_hotplug;
+
+qdev_hotplug = 0;
 
 /*TODO: COLO checkpointed save loop*/
 
@@ -34,6 +38,8 @@ static void *colo_thread(void *opaque)
 qemu_bh_schedule(s->cleanup_bh);
 qemu_mutex_unlock_iothread();
 
+qdev_hotplug = dev_hotplug;
+
 return NULL;
 }
 
@@ -68,10 +74,14 @@ static Coroutine *colo;
 
 void colo_process_incoming_checkpoints(QEMUFile *f)
 {
+int dev_hotplug = qdev_hotplug;
+
 if (!restore_use_colo()) {
 return;
 }
 
+qdev_hotplug = 0;
+
 colo = qemu_coroutine_self();
 assert(colo != NULL);
 
@@ -80,5 +90,7 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 colo = NULL;
 restore_exit_colo();
 
+qdev_hotplug = dev_hotplug;
+
 return;
 }
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 04/23] COLO migration: add a migration capability 'colo'

2014-09-23 Thread Yang Hongyang

Add a migration capability 'colo'. If this capability is on,
The migration will never end, and the VM will be continuously
checkpointed.

Signed-off-by: Yang Hongyang 
---
 migration.c  | 8 
 qapi-schema.json | 5 -
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/migration.c b/migration.c
index 8d675b3..5d28237 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "migration/migration-colo.h"
 
 enum {
 MIG_STATE_ERROR = -1,
@@ -277,6 +278,13 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 }
 
 for (cap = params; cap; cap = cap->next) {
+if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
+cap->value->state && !colo_supported()) {
+error_setg(errp, "COLO is not currently supported, please rerun"
+ " configure with --enable-colo option in order to"
+ " support COLO feature");
+continue;
+}
 s->enabled_capabilities[cap->value->capability] = cap->value->state;
 }
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 689b548..8d58ef7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -491,10 +491,13 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #  to speed up convergence of RAM migration. (since 1.6)
 #
+# @colo: If enabled, the migration will never end, and the VM will instead be
+#  continuously checkpointed. (since 2.2)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'colo'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 07/23] COLO restore: integrate COLO checkpointed restore into qemu restore

2014-09-23 Thread Yang Hongyang

enter colo checkpointed restore loop after live migration.

Signed-off-by: Yang Hongyang 
---
 include/migration/migration-colo.h |  6 ++
 migration-colo-comm.c  | 10 ++
 migration-colo.c   | 22 ++
 migration.c|  3 +++
 stubs/migration-colo.c |  4 
 5 files changed, 45 insertions(+)

diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
index 24589c0..861fa27 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -22,4 +22,10 @@ bool colo_supported(void);
 bool migrate_use_colo(void);
 void colo_init_checkpointer(MigrationState *s);
 
+/* restore */
+bool restore_use_colo(void);
+void restore_exit_colo(void);
+
+void colo_process_incoming_checkpoints(QEMUFile *f);
+
 #endif
diff --git a/migration-colo-comm.c b/migration-colo-comm.c
index 4504ceb..b12a57a 100644
--- a/migration-colo-comm.c
+++ b/migration-colo-comm.c
@@ -38,6 +38,16 @@ static void colo_info_save(QEMUFile *f, void *opaque)
 
 /* restore */
 
+bool restore_use_colo(void)
+{
+return colo_requested;
+}
+
+void restore_exit_colo(void)
+{
+colo_requested = false;
+}
+
 static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
 {
 int value = qemu_get_byte(f);
diff --git a/migration-colo.c b/migration-colo.c
index c3fc313..391e0b7 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -10,6 +10,7 @@
 
 #include "qemu/main-loop.h"
 #include "qemu/thread.h"
+#include "block/coroutine.h"
 #include "migration/migration-colo.h"
 
 static QEMUBH *colo_bh;
@@ -60,3 +61,24 @@ void colo_init_checkpointer(MigrationState *s)
 colo_bh = qemu_bh_new(colo_start_checkpointer, s);
 qemu_bh_schedule(colo_bh);
 }
+
+/* restore */
+
+static Coroutine *colo;
+
+void colo_process_incoming_checkpoints(QEMUFile *f)
+{
+if (!restore_use_colo()) {
+return;
+}
+
+colo = qemu_coroutine_self();
+assert(colo != NULL);
+
+/* TODO: COLO checkpointed restore loop */
+
+colo = NULL;
+restore_exit_colo();
+
+return;
+}
diff --git a/migration.c b/migration.c
index 6a6dce5..06165a7 100644
--- a/migration.c
+++ b/migration.c
@@ -86,6 +86,9 @@ static void process_incoming_migration_co(void *opaque)
 int ret;
 
 ret = qemu_loadvm_state(f);
+if (!ret) {
+colo_process_incoming_checkpoints(f);
+}
 qemu_fclose(f);
 free_xbzrle_decoded_buf();
 if (ret < 0) {
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 9013c40..55f0d37 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -18,3 +18,7 @@ bool colo_supported(void)
 void colo_init_checkpointer(MigrationState *s)
 {
 }
+
+void colo_process_incoming_checkpoints(QEMUFile *f)
+{
+}
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 10/23] COLO ctl: introduce is_slave() and is_master()

2014-09-23 Thread Yang Hongyang

is_slaver is to determine whether the QEMU instance is a
slaver(migration target) at runtime.
is_master is to determine whether the QEMU instance is a
master(migration starter) at runtime.
This 2 APIs will be used later.

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/migration-colo.c b/migration-colo.c
index 8610e15..37003f5 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -83,6 +83,12 @@ static __attribute__((unused)) int colo_compare_resume(void)
 
 /* save */
 
+static __attribute__((unused)) bool colo_is_master(void)
+{
+MigrationState *s = migrate_get_current();
+return (s->state == MIG_STATE_COLO);
+}
+
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
@@ -166,6 +172,11 @@ void colo_init_checkpointer(MigrationState *s)
 
 static Coroutine *colo;
 
+static __attribute__((unused)) bool colo_is_slave(void)
+{
+return colo != NULL;
+}
+
 /*
  * return:
  * 0: start a checkpoint
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 13/23] COLO ctl: implement colo save

2014-09-23 Thread Yang Hongyang

implement colo save

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 60 +---
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index 2e478e9..d99342a 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -13,6 +13,7 @@
 #include "block/coroutine.h"
 #include "hw/qdev-core.h"
 #include "qemu/timer.h"
+#include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
 #include 
 #include "qemu/error-report.h"
@@ -106,12 +107,12 @@ static int colo_compare(void)
 return ioctl(comp_fd, COMP_IOCTWAIT, 250);
 }
 
-static __attribute__((unused)) int colo_compare_flush(void)
+static int colo_compare_flush(void)
 {
 return ioctl(comp_fd, COMP_IOCTFLUSH, 1);
 }
 
-static __attribute__((unused)) int colo_compare_resume(void)
+static int colo_compare_resume(void)
 {
 return ioctl(comp_fd, COMP_IOCTRESUME, 1);
 }
@@ -200,6 +201,9 @@ static bool colo_is_master(void)
 static int do_colo_transaction(MigrationState *s, QEMUFile *control)
 {
 int ret;
+uint8_t *buf;
+size_t size;
+QEMUFile *trans = NULL;
 
 ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
 if (ret) {
@@ -211,30 +215,73 @@ static int do_colo_transaction(MigrationState *s, 
QEMUFile *control)
 goto out;
 }
 
-/* TODO: suspend and save vm state to colo buffer */
+/* open colo buffer for write */
+trans = qemu_bufopen("w", NULL);
+if (!trans) {
+error_report("Open colo buffer for write failed");
+goto out;
+}
+
+/* suspend and save vm state to colo buffer */
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_mutex_unlock_iothread();
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_begin(trans, &s->params);
+qemu_savevm_state_complete(trans);
+
+qemu_fflush(trans);
 
 ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
 if (ret) {
 goto out;
 }
 
-/* TODO: send vmstate to slave */
+/* send vmstate to slave */
+
+/* we send the total size of the vmstate first */
+size = qsb_get_length(qemu_buf_get(trans));
+ret = colo_ctl_put(s->file, size);
+if (ret) {
+goto out;
+}
+
+buf = g_malloc(size);
+qsb_get_buffer(qemu_buf_get(trans), 0, size, &buf);
+qemu_put_buffer(s->file, buf, size);
+g_free(buf);
+ret = qemu_file_get_error(s->file);
+if (ret < 0) {
+goto out;
+}
+qemu_fflush(s->file);
 
 ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
 if (ret) {
 goto out;
 }
 
-/* TODO: Flush network etc. */
+/* Flush network etc. */
+colo_compare_flush();
 
 ret = colo_ctl_get(control, COLO_CHECKPOINT_LOADED);
 if (ret) {
 goto out;
 }
 
-/* TODO: resume master */
+colo_compare_resume();
+ret = 0;
 
 out:
+if (trans)
+qemu_fclose(trans);
+/* resume master */
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+
 return ret;
 }
 
@@ -289,7 +336,6 @@ static void *colo_thread(void *opaque)
 }
 
 /* start a colo checkpoint */
-
 if (do_colo_transaction(s, colo_control)) {
 goto out;
 }
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 03/23] COLO: introduce an api colo_supported() to indicate COLO support

2014-09-23 Thread Yang Hongyang

introduce an api colo_supported() to indicate COLO support, returns
true if colo supported (configured with --enable-colo).

Signed-off-by: Yang Hongyang 
---
 Makefile.objs  |  1 +
 include/migration/migration-colo.h | 18 ++
 migration-colo.c   | 16 
 stubs/Makefile.objs|  1 +
 stubs/migration-colo.c | 16 
 5 files changed, 52 insertions(+)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 migration-colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/Makefile.objs b/Makefile.objs
index 97db978..9654c04 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -49,6 +49,7 @@ common-obj-$(CONFIG_POSIX) += os-posix.o
 common-obj-$(CONFIG_LINUX) += fsdev/
 
 common-obj-y += migration.o migration-tcp.o
+common-obj-$(CONFIG_COLO) += migration-colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o
 common-obj-$(CONFIG_RDMA) += migration-rdma.o
diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
new file mode 100644
index 000..35b384c
--- /dev/null
+++ b/include/migration/migration-colo.h
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (C) 2014 FUJITSU LIMITED
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_COLO_H
+#define QEMU_MIGRATION_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/migration-colo.c b/migration-colo.c
new file mode 100644
index 000..1d3bef8
--- /dev/null
+++ b/migration-colo.c
@@ -0,0 +1,16 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (C) 2014 FUJITSU LIMITED
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+
+bool colo_supported(void)
+{
+return true;
+}
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 5e347d0..9fe6b4c 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -40,3 +40,4 @@ stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
 stub-obj-y += kvm.o
 stub-obj-y += qmp_pc_dimm_device_list.o
+stub-obj-y += migration-colo.o
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
new file mode 100644
index 000..b9ee6a0
--- /dev/null
+++ b/stubs/migration-colo.c
@@ -0,0 +1,16 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (C) 2014 FUJITSU LIMITED
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+
+bool colo_supported(void)
+{
+return false;
+}
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 19/23] COLO nic: init/remove colo nic devices when add/cleanup tap devices

2014-09-23 Thread Yang Hongyang

init/remove colo nic devices when add/cleanup tap devices

Signed-off-by: Yang Hongyang 
Cc: Stefan Hajnoczi 
---
 include/net/colo-nic.h | 18 +
 include/net/net.h  |  3 +++
 net/Makefile.objs  |  1 +
 net/colo-nic.c | 54 ++
 net/tap.c  | 45 +
 5 files changed, 113 insertions(+), 8 deletions(-)
 create mode 100644 include/net/colo-nic.h
 create mode 100644 net/colo-nic.c

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
new file mode 100644
index 000..fe6874b
--- /dev/null
+++ b/include/net/colo-nic.h
@@ -0,0 +1,18 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ *  Copyright (C) 2014 FUJITSU LIMITED
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef COLO_NIC_H
+#define COLO_NIC_H
+
+void colo_add_nic_devices(NetClientState *nc);
+void colo_remove_nic_devices(NetClientState *nc);
+
+#endif
diff --git a/include/net/net.h b/include/net/net.h
index ed594f9..62050c5 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -85,6 +85,9 @@ struct NetClientState {
 char *model;
 char *name;
 char info_str[256];
+char colo_script[1024];
+char colo_nicname[128];
+char ifname[128];
 unsigned receive_disabled : 1;
 NetClientDestructor *destructor;
 unsigned int queue_index;
diff --git a/net/Makefile.objs b/net/Makefile.objs
index ec19cb3..73f4a81 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -13,3 +13,4 @@ common-obj-$(CONFIG_HAIKU) += tap-haiku.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
 common-obj-$(CONFIG_NETMAP) += netmap.o
+common-obj-$(CONFIG_COLO) += colo-nic.o
diff --git a/net/colo-nic.c b/net/colo-nic.c
new file mode 100644
index 000..2f7ca23
--- /dev/null
+++ b/net/colo-nic.c
@@ -0,0 +1,54 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (C) 2014 FUJITSU LIMITED
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+#include "net/net.h"
+#include "net/colo-nic.h"
+
+typedef struct nic_device {
+NetClientState *nc;
+bool (*support_colo)(NetClientState *nc);
+int (*configure)(NetClientState *nc, bool up, bool is_slave);
+QTAILQ_ENTRY(nic_device) next;
+bool is_up;
+} nic_device;
+
+QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
+
+void colo_add_nic_devices(NetClientState *nc)
+{
+struct nic_device *nic = g_malloc0(sizeof(*nic));
+
+/* TODO: init colo function pointers */
+/*
+ * TODO
+ * only support "-netdev tap,colo_scripte..."  options
+ * "-net nic -net tap..." options is not supported
+ */
+nic->nc = nc;
+
+QTAILQ_INSERT_TAIL(&nic_devices, nic, next);
+}
+
+void colo_remove_nic_devices(NetClientState *nc)
+{
+struct nic_device *nic, *next_nic;
+
+if (!nc) {
+return;
+}
+
+QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
+if (nic->nc == nc) {
+/* TODO: teardown colo nic */
+QTAILQ_REMOVE(&nic_devices, nic, next);
+g_free(nic);
+}
+}
+}
diff --git a/net/tap.c b/net/tap.c
index a40f7f0..d1a1dab 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -41,6 +41,7 @@
 #include "qemu/error-report.h"
 
 #include "net/tap.h"
+#include "net/colo-nic.h"
 
 #include "net/vhost_net.h"
 
@@ -284,6 +285,8 @@ static void tap_cleanup(NetClientState *nc)
 
 qemu_purge_queued_packets(nc);
 
+colo_remove_nic_devices(nc);
+
 if (s->down_script[0])
 launch_script(s->down_script, s->down_script_arg, s->fd);
 
@@ -591,10 +594,11 @@ static int net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 const char *model, const char *name,
 const char *ifname, const char *script,
 const char *downscript, const char *vhostfdname,
-int vnet_hdr, int fd)
+int vnet_hdr, int fd, bool setup_colo)
 {
 TAPState *s;
 int vhostfd;
+NetClientState *nc = NULL;
 
 s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
 if (!s) {
@@ -623,6 +627,21 @@ static int net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 }
 }
 
+nc = &(s->nc);
+snprintf(nc->ifname, sizeof(nc->ifname), "%s", ifname);
+if (tap->has_colo_script) {
+snprintf(nc->colo_script, sizeof(nc->colo_script), "%s",
+ tap->colo_script);
+}
+if (tap->has_colo_nicname) {
+snprintf(nc->colo_nicname, sizeof(nc->colo_nicname), "%s",
+

[Qemu-devel] [RFC PATCH v2 11/23] COLO ctl: implement colo checkpoint protocol

2014-09-23 Thread Yang Hongyang

implement colo checkpoint protocol.

Checkpoint synchronzing points.

  Primary Secondary
  NEW @
  Suspend
  SUSPENDED   @
  Suspend&Save state
  SEND@
  Send state  Receive state
  RECEIVED@
  Flush network   Load state
  LOADED  @
  Resume  Resume

  Start Comparing
NOTE:
 1) '@' who sends the message
 2) Every sync-point is synchronized by two sides with only
one handshake(single direction) for low-latency.
If more strict synchronization is required, a opposite direction
sync-point should be added.
 3) Since sync-points are single direction, the remote side may
go forward a lot when this side just receives the sync-point.

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 250 +--
 1 file changed, 244 insertions(+), 6 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index 37003f5..2e478e9 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -24,6 +24,41 @@
  */
 #define CHKPOINT_TIMER 1
 
+enum {
+COLO_READY = 0x46,
+
+/*
+ * Checkpoint synchronzing points.
+ *
+ *  Primary Secondary
+ *  NEW @
+ *  Suspend
+ *  SUSPENDED   @
+ *  Suspend&Save state
+ *  SEND@
+ *  Send state  Receive state
+ *  RECEIVED@
+ *  Flush network   Load state
+ *  LOADED  @
+ *  Resume  Resume
+ *
+ *  Start Comparing
+ * NOTE:
+ * 1) '@' who sends the message
+ * 2) Every sync-point is synchronized by two sides with only
+ *one handshake(single direction) for low-latency.
+ *If more strict synchronization is required, a opposite direction
+ *sync-point should be added.
+ * 3) Since sync-points are single direction, the remote side may
+ *go forward a lot when this side just receives the sync-point.
+ */
+COLO_CHECKPOINT_NEW,
+COLO_CHECKPOINT_SUSPENDED,
+COLO_CHECKPOINT_SEND,
+COLO_CHECKPOINT_RECEIVED,
+COLO_CHECKPOINT_LOADED,
+};
+
 static QEMUBH *colo_bh;
 
 bool colo_supported(void)
@@ -81,28 +116,159 @@ static __attribute__((unused)) int 
colo_compare_resume(void)
 return ioctl(comp_fd, COMP_IOCTRESUME, 1);
 }
 
+/* colo checkpoint control helper */
+static bool colo_is_master(void);
+static bool colo_is_slave(void);
+
+static void ctl_error_handler(void *opaque, int err)
+{
+if (colo_is_slave()) {
+/* TODO: determine whether we need to failover */
+/* FIXME: we will not failover currently, just kill slave */
+error_report("error: colo transmission failed!");
+exit(1);
+} else if (colo_is_master()) {
+/* Master still alive, do not failover */
+error_report("error: colo transmission failed!");
+return;
+} else {
+error_report("COLO: Unexpected error happend!");
+exit(EXIT_FAILURE);
+}
+}
+
+static int colo_ctl_put(QEMUFile *f, uint64_t request)
+{
+int ret = 0;
+
+qemu_put_be64(f, request);
+qemu_fflush(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+ctl_error_handler(f, ret);
+return 1;
+}
+
+return ret;
+}
+
+static int colo_ctl_get_value(QEMUFile *f, uint64_t *value)
+{
+int ret = 0;
+uint64_t temp;
+
+temp = qemu_get_be64(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+ctl_error_handler(f, ret);
+return 1;
+}
+
+*value = temp;
+return 0;
+}
+
+static int colo_ctl_get(QEMUFile *f, uint64_t require)
+{
+int ret;
+uint64_t value;
+
+ret = colo_ctl_get_value(f, &value);
+if (ret) {
+return ret;
+}
+
+if (value != require) {
+error_report("unexpected state! expected: %"PRIu64
+ ", received: %"PRIu64, require, value);
+exit(1);
+}
+
+return ret;
+}
+
 /* save */
 
-static __attribute__((unused)) bool colo_is_master(void)
+static bool colo_is_master(void)
 {
 MigrationState *s = migrate_get_current();
 return (s->state == MIG_STATE_COLO);
 }
 
+static int do_colo_transaction(MigrationState *s, QEMUFile *control)
+{
+int ret;
+
+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
+if (ret) {
+goto out;
+}
+
+ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
+if (ret) {
+goto out;
+}
+
+/* TODO: suspend and save vm state to colo buffer */
+
+ret = colo_ctl_put(s->file, COLO

[Qemu-devel] [RFC PATCH v2 12/23] COLO ctl: add a RunState RUN_STATE_COLO

2014-09-23 Thread Yang Hongyang

Guest will enter this state when paused to save/restore VM state
under colo checkpoint.

Signed-off-by: Yang Hongyang 
---
 qapi-schema.json | 5 -
 vl.c | 8 
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 8d58ef7..7dcfb90 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -145,12 +145,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.2)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-'guest-panicked' ] }
+'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index 09d9851..b0de38e 100644
--- a/vl.c
+++ b/vl.c
@@ -619,6 +619,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 
 { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_INMIGRATE, RUN_STATE_PAUSED },
+{ RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -628,6 +629,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 
 { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
 { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_PAUSED, RUN_STATE_COLO},
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -638,9 +640,12 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
 { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 
+{ RUN_STATE_COLO, RUN_STATE_RUNNING },
+
 { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
 { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
 { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -651,6 +656,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
 { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
 { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+{ RUN_STATE_RUNNING, RUN_STATE_COLO},
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -661,9 +667,11 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
 { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
 { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.9.1

Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge

2014-09-23 Thread Peter Lieven


On 23.09.2014 10:59, Kevin Wolf wrote:

Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:

On 22.09.2014 21:06, Paolo Bonzini wrote:

Il 22/09/2014 11:43, Peter Lieven ha scritto:

This series aims not at touching default behaviour. The default for 
max_transfer_length
is 0 (no limit). max_transfer_length is a limit that MUST be satisfied 
otherwise the request
will fail. And Patch 2 aims at catching this fail earlier in the stack.

Understood.  But the right fix is to avoid that backend limits transpire
into guest ABI, not to catch the limits earlier.  So the right fix would
be to implement request splitting.

Since you proposed to add traces for this would you leave those in?
And since iSCSI is the only user of this at the moment would you
go for implementing this check in the iSCSI block layer?

As for the split logic would you think it is enough to build new qiov's
out of the too big iov without copying the contents? This would work
as long as a single iov inside the qiov is not bigger the max_transfer_length.
Otherwise I would need to allocate temporary buffers and copy around.

You can split single iovs, too. There are functions that make this very
easy, they copy a sub-qiov with a byte granularity offset and length
(qemu_iovec_concat and friends). qcow2 uses the same to split requests
at (fragmented) cluster boundaries.


Might it be as easy as this?

static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
BdrvRequestFlags flags)
{
if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
return -EINVAL;
}

if (bs->bl.max_transfer_length &&
nb_sectors > bs->bl.max_transfer_length) {
int ret = 0;
QEMUIOVector *qiov2 = NULL;
size_t soffset = 0;

trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
bs->bl.max_transfer_length);

qemu_iovec_init(qiov2, qiov->niov);
while (nb_sectors > bs->bl.max_transfer_length && !ret) {
qemu_iovec_reset(qiov2);
qemu_iovec_concat(qiov2, qiov, soffset,
  bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
bs->bl.max_transfer_length << 
BDRV_SECTOR_BITS,
qiov2, flags);
soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
sector_num += bs->bl.max_transfer_length;
nb_sectors -= bs->bl.max_transfer_length;
}
qemu_iovec_destroy(qiov2);
if (ret) {
return ret;
}
}

return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
 nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
}



Kevin



--

Mit freundlichen Grüßen

Peter Lieven

...

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...

[Qemu-devel] [RFC PATCH v2 14/23] COLO ctl: implement colo restore

2014-09-23 Thread Yang Hongyang

implement colo restore

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index d99342a..91634d2 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -426,8 +426,11 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 {
 int fd = qemu_get_fd(f);
 int dev_hotplug = qdev_hotplug;
-QEMUFile *ctl = NULL;
+QEMUFile *ctl = NULL, *fb = NULL;
 int ret;
+uint64_t total_size;
+uint8_t *buf;
+QEMUSizedBuffer *colo_buffer;
 
 if (!restore_use_colo()) {
 return;
@@ -449,7 +452,8 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 goto out;
 }
 
-/* TODO: in COLO mode, slave is runing, so start the vm */
+/* in COLO mode, slave is runing, so start the vm */
+vm_start();
 
 while (true) {
 if (slave_wait_new_checkpoint(f)) {
@@ -458,7 +462,8 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 
 /* start colo checkpoint */
 
-/* TODO: suspend guest */
+/* suspend guest */
+vm_stop_force_state(RUN_STATE_COLO);
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
 if (ret) {
@@ -470,26 +475,55 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 goto out;
 }
 
-/* TODO: read migration data into colo buffer */
+/* read migration data into colo buffer */
+
+/* read the vmstate total size first */
+ret = colo_ctl_get_value(f, &total_size);
+if (ret) {
+goto out;
+}
+buf = g_malloc(total_size);
+qemu_get_buffer(f, buf, total_size);
+colo_buffer = qsb_create(buf, total_size);
+g_free(buf);
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
 if (ret) {
 goto out;
 }
 
-/* TODO: load vm state */
+/* open colo buffer for read */
+fb = qemu_bufopen("r", colo_buffer);
+if (!fb) {
+error_report("can't open colo buffer for read");
+goto out;
+}
+
+/* load vm state */
+if (qemu_loadvm_state(fb) < 0) {
+error_report("COLO: loadvm failed\n");
+goto out;
+}
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
 if (ret) {
 goto out;
 }
 
-/* TODO: resume guest */
+/* resume guest */
+vm_start();
+
+qemu_fclose(fb);
+fb = NULL;
 }
 
 out:
 colo = NULL;
 
+if (fb) {
+qemu_fclose(fb);
+}
+
 if (ctl) {
 qemu_fclose(ctl);
 }
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 17/23] HACK: trigger checkpoint every 500ms

2014-09-23 Thread Yang Hongyang

Because COLO Agent is under development. We add this hack for
test purpose. Trigger checkpoint every 500ms so that we can
test the process of COLO save/restore.
NOTE:
  This is only a hack, and will be removed at last.

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index 71a500e..1da5629 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -23,7 +23,7 @@
  * this is large because COLO checkpoint will mostly depend on
  * COLO compare module.
  */
-#define CHKPOINT_TIMER 1
+#define CHKPOINT_TIMER 500
 
 enum {
 COLO_READY = 0x46,
@@ -79,11 +79,6 @@ static int comp_fd = -1;
 
 static int colo_compare_init(void)
 {
-comp_fd = open(COMPARE_DEV, O_RDONLY);
-if (comp_fd < 0) {
-return -1;
-}
-
 return 0;
 }
 
@@ -104,17 +99,18 @@ static void colo_compare_destroy(void)
  */
 static int colo_compare(void)
 {
-return ioctl(comp_fd, COMP_IOCTWAIT, 250);
+errno = ERESTART;
+return 1;
 }
 
 static int colo_compare_flush(void)
 {
-return ioctl(comp_fd, COMP_IOCTFLUSH, 1);
+return 0;
 }
 
 static int colo_compare_resume(void)
 {
-return ioctl(comp_fd, COMP_IOCTRESUME, 1);
+return 0;
 }
 
 /* colo checkpoint control helper */
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 15/23] COLO save: reuse migration bitmap under colo checkpoint

2014-09-23 Thread Yang Hongyang

reuse migration bitmap under colo checkpoint, only send dirty pages
per-checkpoint.

Signed-off-by: Yang Hongyang 
---
 arch_init.c| 20 +++-
 include/migration/migration-colo.h |  2 ++
 migration-colo.c   |  6 ++
 stubs/migration-colo.c | 10 ++
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c974f3f..00d5793 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -52,6 +52,7 @@
 #include "exec/ram_addr.h"
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
+#include "migration/migration-colo.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -771,6 +772,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 RAMBlock *block;
 int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
+/*
+ * migration has already setup the bitmap, reuse it.
+ */
+if (colo_is_master()) {
+qemu_mutex_lock_ramlist();
+reset_ram_globals();
+goto out_setup;
+}
+
 mig_throttle_on = false;
 dirty_rate_high_cnt = 0;
 bitmap_sync_count = 0;
@@ -830,6 +840,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 migration_bitmap_sync();
 qemu_mutex_unlock_iothread();
 
+out_setup:
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
@@ -939,7 +950,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 
 ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-migration_end();
+
+/*
+ * Since we need to reuse dirty bitmap in colo,
+ * don't cleanup the bitmap.
+ */
+if (!migrate_use_colo() || migration_has_failed(migrate_get_current())) {
+migration_end();
+}
 
 qemu_mutex_unlock_ramlist();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
index 861fa27..9b82b71 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -21,10 +21,12 @@ bool colo_supported(void);
 /* save */
 bool migrate_use_colo(void);
 void colo_init_checkpointer(MigrationState *s);
+bool colo_is_master(void);
 
 /* restore */
 bool restore_use_colo(void);
 void restore_exit_colo(void);
+bool colo_is_slave(void);
 
 void colo_process_incoming_checkpoints(QEMUFile *f);
 
diff --git a/migration-colo.c b/migration-colo.c
index 91634d2..8477fbd 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -118,8 +118,6 @@ static int colo_compare_resume(void)
 }
 
 /* colo checkpoint control helper */
-static bool colo_is_master(void);
-static bool colo_is_slave(void);
 
 static void ctl_error_handler(void *opaque, int err)
 {
@@ -192,7 +190,7 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
 
 /* save */
 
-static bool colo_is_master(void)
+bool colo_is_master(void)
 {
 MigrationState *s = migrate_get_current();
 return (s->state == MIG_STATE_COLO);
@@ -390,7 +388,7 @@ void colo_init_checkpointer(MigrationState *s)
 
 static Coroutine *colo;
 
-static bool colo_is_slave(void)
+bool colo_is_slave(void)
 {
 return colo != NULL;
 }
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 55f0d37..4828316 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -22,3 +22,13 @@ void colo_init_checkpointer(MigrationState *s)
 void colo_process_incoming_checkpoints(QEMUFile *f)
 {
 }
+
+bool colo_is_master(void)
+{
+return false;
+}
+
+bool colo_is_slave(void)
+{
+return false;
+}
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 16/23] COLO ram cache: implement colo ram cache on slave

2014-09-23 Thread Yang Hongyang

The ram cache was initially the same as PVM's memory. At
checkpoint, we cache the dirty memory of PVM into ram cache
(so that ram cache always the same as PVM's memory at every
checkpoint), flush cached memory to SVM after we received
all PVM dirty memory (only needed to flush memory that was
both dirty on PVM and SVM since last checkpoint).

Signed-off-by: Yang Hongyang 
---
 arch_init.c| 154 -
 include/exec/cpu-all.h |   1 +
 include/migration/migration-colo.h |   3 +
 migration-colo.c   |   4 +
 4 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 00d5793..108cca0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1015,6 +1015,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void 
*host)
 return 0;
 }
 
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock 
*block);
 static inline void *host_from_stream_offset(QEMUFile *f,
 ram_addr_t offset,
 int flags)
@@ -1029,7 +1030,12 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 return NULL;
 }
 
-return memory_region_get_ram_ptr(block->mr) + offset;
+if (colo_is_slave()) {
+migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+return memory_region_get_ram_cache_ptr(block->mr, block) + offset;
+} else {
+return memory_region_get_ram_ptr(block->mr) + offset;
+}
 }
 
 len = qemu_get_byte(f);
@@ -1037,8 +1043,15 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 id[len] = 0;
 
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id)))
-return memory_region_get_ram_ptr(block->mr) + offset;
+if (!strncmp(id, block->idstr, sizeof(id))) {
+if (colo_is_slave()) {
+migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+return memory_region_get_ram_cache_ptr(block->mr, block)
+   + offset;
+} else {
+return memory_region_get_ram_ptr(block->mr) + offset;
+}
+}
 }
 
 error_report("Can't find block %s!", id);
@@ -1056,11 +1069,13 @@ void ram_handle_compressed(void *host, uint8_t ch, 
uint64_t size)
 }
 }
 
+static void ram_flush_cache(void);
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 ram_addr_t addr;
 int flags, ret = 0;
 static uint64_t seq_iter;
+bool need_flush = false;
 
 seq_iter++;
 
@@ -1123,6 +1138,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 break;
 }
 
+need_flush = true;
 ch = qemu_get_byte(f);
 ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
 } else if (flags & RAM_SAVE_FLAG_PAGE) {
@@ -1135,6 +1151,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 break;
 }
 
+need_flush = true;
 qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
 } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
 void *host = host_from_stream_offset(f, addr, flags);
@@ -1150,6 +1167,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 ret = -EINVAL;
 break;
 }
+need_flush = true;
 } else if (flags & RAM_SAVE_FLAG_HOOK) {
 ram_control_load_hook(f, flags);
 } else if (flags & RAM_SAVE_FLAG_EOS) {
@@ -1163,11 +1181,141 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 ret = qemu_file_get_error(f);
 }
 
+if (!ret && colo_is_slave() && need_flush) {
+ram_flush_cache();
+}
+
 DPRINTF("Completed load of VM with exit code %d seq iteration "
 "%" PRIu64 "\n", ret, seq_iter);
 return ret;
 }
 
+/*
+ * colo cache: this is for secondary VM, we cache the whole
+ * memory of the secondary VM.
+ */
+void create_and_init_ram_cache(void)
+{
+/*
+ * called after first migration
+ */
+RAMBlock *block;
+int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+block->host_cache = g_malloc(block->length);
+memcpy(block->host_cache, block->host, block->length);
+}
+
+migration_bitmap = bitmap_new(ram_cache_pages);
+migration_dirty_pages = 0;
+memory_global_dirty_log_start();
+}
+
+void release_ram_cache(void)
+{
+RAMBlock *block;
+
+if (migration_bitmap) {
+memory_global_dirty_log_stop();
+g_free(migration_bitmap);
+migration_bitmap = NULL;
+}
+
+QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+g_free(block->host_cache);
+}
+}
+
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)

[Qemu-devel] [RFC PATCH v2 18/23] COLO nic: add command line switch

2014-09-23 Thread Yang Hongyang

add command line switch

Signed-off-by: Yang Hongyang 
Cc: Anthony Liguori 
Cc: Stefan Hajnoczi 
---
 qapi-schema.json |  8 +++-
 qemu-options.hx  | 10 +-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 7dcfb90..e53a571 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2008,6 +2008,10 @@
 #
 # @queues: #optional number of queues to be created for multiqueue capable tap
 #
+# @colo_nicname: #optional the host physical nic for QEMU (Since 2.2)
+#
+# @colo_script: #optional the script file which used by COLO (Since 2.2)
+#
 # Since 1.2
 ##
 { 'type': 'NetdevTapOptions',
@@ -2024,7 +2028,9 @@
 '*vhostfd':'str',
 '*vhostfds':   'str',
 '*vhostforce': 'bool',
-'*queues': 'uint32'} }
+'*queues': 'uint32',
+'*colo_nicname':  'str',
+'*colo_script':   'str'} }
 
 ##
 # @NetdevSocketOptions
diff --git a/qemu-options.hx b/qemu-options.hx
index 365b56c..9f31dcc 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1412,7 +1412,11 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "-net tap[,vlan=n][,name=str],ifname=name\n"
 "connect the host TAP network interface to VLAN 'n'\n"
 #else
-"-net 
tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n"
+"-net 
tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]"
+#ifdef CONFIG_COLO
+"[,colo_nicname=nicname][,colo_script=scriptfile]"
+#endif
+"\n"
 "connect the host TAP network interface to VLAN 'n'\n"
 "use network scripts 'file' (default=" 
DEFAULT_NETWORK_SCRIPT ")\n"
 "to configure it and 'dfile' (default=" 
DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
@@ -1432,6 +1436,10 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "use 'vhostfd=h' to connect to an already opened vhost net 
device\n"
 "use 'vhostfds=x:y:...:z to connect to multiple already 
opened vhost net devices\n"
 "use 'queues=n' to specify the number of queues to be 
created for multiqueue TAP\n"
+#ifdef CONFIG_COLO
+"use 'colo_nicname=nicname' to specify the host physical 
nic for QEMU\n"
+"use 'colo_script=scriptfile' to specify script file when 
colo is enabled\n"
+#endif
 "-net bridge[,vlan=n][,name=str][,br=bridge][,helper=helper]\n"
 "connects a host TAP network interface to a host bridge 
device 'br'\n"
 "(default=" DEFAULT_BRIDGE_INTERFACE ") using the program 
'helper'\n"
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 21/23] COLO nic: implement colo nic device interface configure()

2014-09-23 Thread Yang Hongyang

implement colo nic device interface configure()
add a script to configure nic devices:
${QEMU_SCRIPT_DIR}/network-colo

Script for configuring the network of Master & Slaver.

Usage:
 network-colo (master|slaver) (install|uninstall) vif pif [ifb1 ifb2]

Signed-off-by: Yang Hongyang 
---
 include/net/net.h |   1 +
 net/colo-nic.c| 106 +
 network-colo  | 194 ++
 3 files changed, 301 insertions(+)
 create mode 100755 network-colo

diff --git a/include/net/net.h b/include/net/net.h
index 62050c5..9cc9b5c 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -88,6 +88,7 @@ struct NetClientState {
 char colo_script[1024];
 char colo_nicname[128];
 char ifname[128];
+char ifb[2][128];
 unsigned receive_disabled : 1;
 NetClientDestructor *destructor;
 unsigned int queue_index;
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 7255a48..d661d8b 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -10,6 +10,7 @@
  */
 #include "net/net.h"
 #include "net/colo-nic.h"
+#include "qemu/error-report.h"
 
 typedef struct nic_device {
 NetClientState *nc;
@@ -26,11 +27,116 @@ static bool nic_support_colo(NetClientState *nc)
 return nc && nc->colo_script[0] && nc->colo_nicname[0];
 }
 
+#define STDOUT_BUF_LEN 1024
+static char stdout_buf[STDOUT_BUF_LEN];
+
+static int launch_colo_script(char *argv[])
+{
+int pid, status;
+char *script = argv[0];
+int fds[2];
+
+bzero(stdout_buf, sizeof(stdout_buf));
+
+if (pipe(fds) < 0) {
+return -1;
+}
+/* try to launch network script */
+pid = fork();
+if (pid == 0) {
+close(fds[0]);
+dup2(fds[1], STDOUT_FILENO);
+execv(script, argv);
+_exit(1);
+} else if (pid > 0) {
+FILE *stream;
+int n;
+close(fds[1]);
+stream = fdopen(fds[0], "r");
+n = fread(stdout_buf, 1, STDOUT_BUF_LEN - 1, stream);
+stdout_buf[n] = '\0';
+close(fds[0]);
+
+while (waitpid(pid, &status, 0) != pid) {
+/* loop */
+}
+
+if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
+return 0;
+}
+}
+fprintf(stderr, "%s\n", stdout_buf);
+fprintf(stderr, "%s: could not launch network script\n", script);
+return -1;
+}
+
+static void store_ifbname(NetClientState *nc)
+{
+char *str_b = NULL, *str_e = NULL;
+
+str_b = strstr(stdout_buf, "ifb0=");
+if (str_b) {
+str_e = strstr(str_b, "\n");
+}
+if (str_e) {
+snprintf(nc->ifb[0], str_e - str_b - 5 + 1, "%s", str_b + 5);
+}
+
+str_b = str_e = NULL;
+str_b = strstr(stdout_buf, "ifb1=");
+if (str_b) {
+str_e = strstr(str_b, "\n");
+}
+if (str_e) {
+snprintf(nc->ifb[1], str_e - str_b - 5 + 1, "%s", str_b + 5);
+}
+}
+
+static int nic_configure(NetClientState *nc, bool up, bool is_slave)
+{
+char *argv[8];
+char **parg;
+int ret = -1, i;
+int argc = (!is_slave && !up) ? 7 : 5;
+
+if (!nc) {
+error_report("Can not parse colo_script or colo_nicname");
+return ret;
+}
+
+parg = argv;
+*parg++ = nc->colo_script;
+*parg++ = (char *)(is_slave ? "slaver" : "master");
+*parg++ = (char *)(up ? "install" : "uninstall");
+*parg++ = nc->ifname;
+*parg++ = nc->colo_nicname;
+if (!is_slave && !up) {
+*parg++ = nc->ifb[0];
+*parg++ = nc->ifb[1];
+}
+*parg = NULL;
+
+for (i = 0; i < argc; i++) {
+if (!argv[i][0]) {
+error_report("Can not get colo_script argument");
+return ret;
+}
+}
+
+ret = launch_colo_script(argv);
+if (!is_slave && up && ret == 0) {
+store_ifbname(nc);
+}
+
+return ret;
+}
+
 void colo_add_nic_devices(NetClientState *nc)
 {
 struct nic_device *nic = g_malloc0(sizeof(*nic));
 
 nic->support_colo = nic_support_colo;
+nic->configure = nic_configure;
 
 /*
  * TODO
diff --git a/network-colo b/network-colo
new file mode 100755
index 000..9112888
--- /dev/null
+++ b/network-colo
@@ -0,0 +1,194 @@
+#! /bin/bash
+#
+# ${QEMU_SCRIPT_DIR}/network-colo
+#
+# Script for configuring the network of Master & Slaver.
+#
+# Usage:
+# network-colo (master|slaver) (install|uninstall) vif pif [ifb1 ifb2]
+#
+
+sides=$1
+op=$2
+vif=$3
+pif=$4
+ifb1=$5
+ifb2=$6
+BR=br1
+
+qlen=40960
+module="HA_compare"
+device="HA_compare"
+
+# start_master ifbx
+function start_master() {
+
+# In colo mode, we don't use gso, gro...
+ip link set dev $vif qlen $qlen
+
+# copy and foward input packets to $pif
+tc qdisc add dev $vif root handle 1: prio
+tc filter add dev $vif parent 1: protocol ip prio 10 u32 match u32 0 0 
flowid 1:2 action mirred egress mirror dev

[Qemu-devel] [RFC PATCH v2 20/23] COLO nic: implement colo nic device interface support_colo()

2014-09-23 Thread Yang Hongyang

implement colo nic device interface support_colo()

Signed-off-by: Yang Hongyang 
---
 net/colo-nic.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/colo-nic.c b/net/colo-nic.c
index 2f7ca23..7255a48 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -21,11 +21,17 @@ typedef struct nic_device {
 
 QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
 
+static bool nic_support_colo(NetClientState *nc)
+{
+return nc && nc->colo_script[0] && nc->colo_nicname[0];
+}
+
 void colo_add_nic_devices(NetClientState *nc)
 {
 struct nic_device *nic = g_malloc0(sizeof(*nic));
 
-/* TODO: init colo function pointers */
+nic->support_colo = nic_support_colo;
+
 /*
  * TODO
  * only support "-netdev tap,colo_scripte..."  options
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 22/23] COLO nic: export colo nic APIs

2014-09-23 Thread Yang Hongyang

export colo nic APIs:
colo_configure_nic()
colo_teardown_nic()

Signed-off-by: Yang Hongyang 
---
 include/net/colo-nic.h |  2 ++
 net/colo-nic.c | 63 +-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index fe6874b..66d2dcc 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -14,5 +14,7 @@
 
 void colo_add_nic_devices(NetClientState *nc);
 void colo_remove_nic_devices(NetClientState *nc);
+int colo_configure_nic(bool is_slave);
+void colo_teardown_nic(bool is_slave);
 
 #endif
diff --git a/net/colo-nic.c b/net/colo-nic.c
index d661d8b..5d6d352 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -11,6 +11,7 @@
 #include "net/net.h"
 #include "net/colo-nic.h"
 #include "qemu/error-report.h"
+#include "migration/migration-colo.h"
 
 typedef struct nic_device {
 NetClientState *nc;
@@ -131,6 +132,35 @@ static int nic_configure(NetClientState *nc, bool up, bool 
is_slave)
 return ret;
 }
 
+static int configure_one_nic(NetClientState *nc, bool up, bool is_slave)
+{
+struct nic_device *nic;
+
+if (!nc) {
+return -1;
+}
+
+QTAILQ_FOREACH(nic, &nic_devices, next) {
+if (nic->nc == nc) {
+if (!nic->support_colo || !nic->support_colo(nic->nc)
+|| !nic->configure) {
+return -1;
+}
+if (up == nic->is_up) {
+return 0;
+}
+
+if (nic->configure(nic->nc, up, is_slave) && up) {
+return -1;
+}
+nic->is_up = up;
+return 0;
+}
+}
+
+return -1;
+}
+
 void colo_add_nic_devices(NetClientState *nc)
 {
 struct nic_device *nic = g_malloc0(sizeof(*nic));
@@ -158,9 +188,40 @@ void colo_remove_nic_devices(NetClientState *nc)
 
 QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
 if (nic->nc == nc) {
-/* TODO: teardown colo nic */
+if (colo_is_slave()) {
+configure_one_nic(nc, 0, 1);
+}
+if (colo_is_master()) {
+configure_one_nic(nc, 0, 0);
+}
 QTAILQ_REMOVE(&nic_devices, nic, next);
 g_free(nic);
 }
 }
 }
+
+int colo_configure_nic(bool is_slave)
+{
+struct nic_device *nic;
+
+if (QTAILQ_EMPTY(&nic_devices)) {
+return -1;
+}
+
+QTAILQ_FOREACH(nic, &nic_devices, next) {
+if (configure_one_nic(nic->nc, 1, is_slave)) {
+return -1;
+}
+}
+
+return 0;
+}
+
+void colo_teardown_nic(bool is_slave)
+{
+struct nic_device *nic;
+
+QTAILQ_FOREACH(nic, &nic_devices, next) {
+configure_one_nic(nic->nc, 0, is_slave);
+}
+}
-- 
1.9.1

[Qemu-devel] [RFC PATCH v2 23/23] COLO nic: setup/teardown colo nic devices

2014-09-23 Thread Yang Hongyang

setup/teardown colo nic devices when enter/leave
colo process.

Signed-off-by: Yang Hongyang 
---
 migration-colo.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/migration-colo.c b/migration-colo.c
index 1da5629..13f55d9 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -17,6 +17,7 @@
 #include "migration/migration-colo.h"
 #include 
 #include "qemu/error-report.h"
+#include "net/colo-nic.h"
 
 /*
  * checkpoint timer: unit ms
@@ -121,6 +122,7 @@ static void ctl_error_handler(void *opaque, int err)
 /* TODO: determine whether we need to failover */
 /* FIXME: we will not failover currently, just kill slave */
 error_report("error: colo transmission failed!");
+colo_teardown_nic(true);
 exit(1);
 } else if (colo_is_master()) {
 /* Master still alive, do not failover */
@@ -301,6 +303,7 @@ static void *colo_thread(void *opaque)
 
 qdev_hotplug = 0;
 
+colo_configure_nic(false);
 /*
  * Wait for slave finish loading vm states and enter COLO
  * restore.
@@ -343,6 +346,7 @@ out:
 }
 
 colo_compare_destroy();
+colo_teardown_nic(false);
 
 migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
 
@@ -442,6 +446,7 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 }
 
 create_and_init_ram_cache();
+colo_configure_nic(true);
 
 ret = colo_ctl_put(ctl, COLO_READY);
 if (ret) {
@@ -521,6 +526,7 @@ out:
 }
 
 release_ram_cache();
+colo_teardown_nic(true);
 
 if (ctl) {
 qemu_fclose(ctl);
-- 
1.9.1

[Qemu-devel] [PATCH] vpc: fix beX_to_cpu() and cpu_to_beX() confusion

2014-09-23 Thread Stefan Hajnoczi

The beX_to_cpu() and cpu_to_beX() functions perform the same operation -
they do a byteswap if the host CPU endianness is little-endian or a
nothing otherwise.

The point of two names for the same operation is that it documents which
direction the data is being converted.  This makes it clear whether the
data is suitable for CPU processing or in its external representation.

This patch fixes incorrect beX_to_cpu()/cpu_to_beX() usage.

Signed-off-by: Stefan Hajnoczi 
---
 block/vpc.c | 44 ++--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 4947369..e08144a 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -207,7 +207,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 "incorrect.\n", bs->filename);
 
 /* Write 'checksum' back to footer, or else will leave it with zero. */
-footer->checksum = be32_to_cpu(checksum);
+footer->checksum = cpu_to_be32(checksum);
 
 // The visible size of a image in Virtual PC depends on the geometry
 // rather than on the size stored in the footer (the size in the footer
@@ -472,7 +472,7 @@ static int64_t alloc_block(BlockDriverState* bs, int64_t 
sector_num)
 
 // Write BAT entry to disk
 bat_offset = s->bat_offset + (4 * index);
-bat_value = be32_to_cpu(s->pagetable[index]);
+bat_value = cpu_to_be32(s->pagetable[index]);
 ret = bdrv_pwrite_sync(bs->file, bat_offset, &bat_value, 4);
 if (ret < 0)
 goto fail;
@@ -699,13 +699,13 @@ static int create_dynamic_disk(BlockDriverState *bs, 
uint8_t *buf,
  * Note: The spec is actually wrong here for data_offset, it says
  * 0x, but MS tools expect all 64 bits to be set.
  */
-dyndisk_header->data_offset = be64_to_cpu(0xULL);
-dyndisk_header->table_offset = be64_to_cpu(3 * 512);
-dyndisk_header->version = be32_to_cpu(0x0001);
-dyndisk_header->block_size = be32_to_cpu(block_size);
-dyndisk_header->max_table_entries = be32_to_cpu(num_bat_entries);
+dyndisk_header->data_offset = cpu_to_be64(0xULL);
+dyndisk_header->table_offset = cpu_to_be64(3 * 512);
+dyndisk_header->version = cpu_to_be32(0x0001);
+dyndisk_header->block_size = cpu_to_be32(block_size);
+dyndisk_header->max_table_entries = cpu_to_be32(num_bat_entries);
 
-dyndisk_header->checksum = be32_to_cpu(vpc_checksum(buf, 1024));
+dyndisk_header->checksum = cpu_to_be32(vpc_checksum(buf, 1024));
 
 // Write the header
 offset = 512;
@@ -810,36 +810,36 @@ static int vpc_create(const char *filename, QemuOpts 
*opts, Error **errp)
 memcpy(footer->creator_app, "qemu", 4);
 memcpy(footer->creator_os, "Wi2k", 4);
 
-footer->features = be32_to_cpu(0x02);
-footer->version = be32_to_cpu(0x0001);
+footer->features = cpu_to_be32(0x02);
+footer->version = cpu_to_be32(0x0001);
 if (disk_type == VHD_DYNAMIC) {
-footer->data_offset = be64_to_cpu(HEADER_SIZE);
+footer->data_offset = cpu_to_be64(HEADER_SIZE);
 } else {
-footer->data_offset = be64_to_cpu(0xULL);
+footer->data_offset = cpu_to_be64(0xULL);
 }
-footer->timestamp = be32_to_cpu(time(NULL) - VHD_TIMESTAMP_BASE);
+footer->timestamp = cpu_to_be32(time(NULL) - VHD_TIMESTAMP_BASE);
 
 /* Version of Virtual PC 2007 */
-footer->major = be16_to_cpu(0x0005);
-footer->minor = be16_to_cpu(0x0003);
+footer->major = cpu_to_be16(0x0005);
+footer->minor = cpu_to_be16(0x0003);
 if (disk_type == VHD_DYNAMIC) {
-footer->orig_size = be64_to_cpu(total_sectors * 512);
-footer->size = be64_to_cpu(total_sectors * 512);
+footer->orig_size = cpu_to_be64(total_sectors * 512);
+footer->size = cpu_to_be64(total_sectors * 512);
 } else {
-footer->orig_size = be64_to_cpu(total_size);
-footer->size = be64_to_cpu(total_size);
+footer->orig_size = cpu_to_be64(total_size);
+footer->size = cpu_to_be64(total_size);
 }
-footer->cyls = be16_to_cpu(cyls);
+footer->cyls = cpu_to_be16(cyls);
 footer->heads = heads;
 footer->secs_per_cyl = secs_per_cyl;
 
-footer->type = be32_to_cpu(disk_type);
+footer->type = cpu_to_be32(disk_type);
 
 #if defined(CONFIG_UUID)
 uuid_generate(footer->uuid);
 #endif
 
-footer->checksum = be32_to_cpu(vpc_checksum(buf, HEADER_SIZE));
+footer->checksum = cpu_to_be32(vpc_checksum(buf, HEADER_SIZE));
 
 if (disk_type == VHD_DYNAMIC) {
 ret = create_dynamic_disk(bs, buf, total_sectors);
-- 
1.9.3

Re: [Qemu-devel] [PATCH] migration: catch unknown flag combinations in ram_load

2014-09-23 Thread Paolo Bonzini

Il 02/09/2014 11:17, Peter Lieven ha scritto:
>>>
>> Juan is back, I'll let him pick it through his tree.
> 
> Juan, have you picked this up?

commit db80facefa62dff42bb50c73b0f03eda5f732b49
Author: Peter Lieven 
Date:   Tue Jun 10 11:29:16 2014 +0200

migration: catch unknown flags in ram_load

if a saved vm has unknown flags in the memory data qemu
currently simply ignores this flag and continues which
yields in an unpredictable result.

This patch catches all unknown flags and aborts the
loading of the vm. Additionally error reports are thrown
if the migration aborts abnormally.

Signed-off-by: Peter Lieven 
Signed-off-by: Juan Quintela 

Paolo

Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge

2014-09-23 Thread Kevin Wolf

Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
> On 23.09.2014 10:59, Kevin Wolf wrote:
> >Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
> >>On 22.09.2014 21:06, Paolo Bonzini wrote:
> >>>Il 22/09/2014 11:43, Peter Lieven ha scritto:
> This series aims not at touching default behaviour. The default for 
> max_transfer_length
> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied 
> otherwise the request
> will fail. And Patch 2 aims at catching this fail earlier in the stack.
> >>>Understood.  But the right fix is to avoid that backend limits transpire
> >>>into guest ABI, not to catch the limits earlier.  So the right fix would
> >>>be to implement request splitting.
> >>Since you proposed to add traces for this would you leave those in?
> >>And since iSCSI is the only user of this at the moment would you
> >>go for implementing this check in the iSCSI block layer?
> >>
> >>As for the split logic would you think it is enough to build new qiov's
> >>out of the too big iov without copying the contents? This would work
> >>as long as a single iov inside the qiov is not bigger the 
> >>max_transfer_length.
> >>Otherwise I would need to allocate temporary buffers and copy around.
> >You can split single iovs, too. There are functions that make this very
> >easy, they copy a sub-qiov with a byte granularity offset and length
> >(qemu_iovec_concat and friends). qcow2 uses the same to split requests
> >at (fragmented) cluster boundaries.
> 
> Might it be as easy as this?

This is completely untested, right? :-)

But ignoring bugs, the principle looks right.

> static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
> BdrvRequestFlags flags)
> {
> if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
> return -EINVAL;
> }
> 
> if (bs->bl.max_transfer_length &&
> nb_sectors > bs->bl.max_transfer_length) {
> int ret = 0;
> QEMUIOVector *qiov2 = NULL;

Make it "QEMUIOVector qiov2;" on the stack.

> size_t soffset = 0;
> 
> trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
> bs->bl.max_transfer_length);
> 
> qemu_iovec_init(qiov2, qiov->niov);

And &qiov2 here, then this doesn't crash with a NULL dereference.

> while (nb_sectors > bs->bl.max_transfer_length && !ret) {
> qemu_iovec_reset(qiov2);
> qemu_iovec_concat(qiov2, qiov, soffset,
>   bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
> ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
> bs->bl.max_transfer_length << 
> BDRV_SECTOR_BITS,
> qiov2, flags);
> soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
> sector_num += bs->bl.max_transfer_length;
> nb_sectors -= bs->bl.max_transfer_length;
> }
> qemu_iovec_destroy(qiov2);
> if (ret) {
> return ret;
> }

The error check needs to be immediately after the assignment of ret,
otherwise the next loop iteration can overwrite an error with a success
(and if it didn't, it would still do useless I/O because the request as
a whole would fail anyway).

> }
> 
> return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>  nb_sectors << BDRV_SECTOR_BITS, qiov, flags);

qiov doesn't work here for the splitting case. You need the remaining
part, not the whole original qiov.

Kevin

Re: [Qemu-devel] [PATCH] migration: catch unknown flag combinations in ram_load

2014-09-23 Thread Paolo Bonzini

Il 23/09/2014 11:46, Paolo Bonzini ha scritto:
> Il 02/09/2014 11:17, Peter Lieven ha scritto:

>>> Juan is back, I'll let him pick it through his tree.
>>
>> Juan, have you picked this up?
> 
> commit db80facefa62dff42bb50c73b0f03eda5f732b49
> Author: Peter Lieven 
> Date:   Tue Jun 10 11:29:16 2014 +0200
> 
> migration: catch unknown flags in ram_load
> 
> if a saved vm has unknown flags in the memory data qemu
> currently simply ignores this flag and continues which
> yields in an unpredictable result.
> 
> This patch catches all unknown flags and aborts the
> loading of the vm. Additionally error reports are thrown
> if the migration aborts abnormally.
> 
> Signed-off-by: Peter Lieven 
> Signed-off-by: Juan Quintela 

Never mind, this is a different patch.

Paolo

Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge

2014-09-23 Thread Peter Lieven


On 23.09.2014 11:47, Kevin Wolf wrote:

Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:

On 23.09.2014 10:59, Kevin Wolf wrote:

Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:

On 22.09.2014 21:06, Paolo Bonzini wrote:

Il 22/09/2014 11:43, Peter Lieven ha scritto:

This series aims not at touching default behaviour. The default for 
max_transfer_length
is 0 (no limit). max_transfer_length is a limit that MUST be satisfied 
otherwise the request
will fail. And Patch 2 aims at catching this fail earlier in the stack.

Understood.  But the right fix is to avoid that backend limits transpire
into guest ABI, not to catch the limits earlier.  So the right fix would
be to implement request splitting.

Since you proposed to add traces for this would you leave those in?
And since iSCSI is the only user of this at the moment would you
go for implementing this check in the iSCSI block layer?

As for the split logic would you think it is enough to build new qiov's
out of the too big iov without copying the contents? This would work
as long as a single iov inside the qiov is not bigger the max_transfer_length.
Otherwise I would need to allocate temporary buffers and copy around.

You can split single iovs, too. There are functions that make this very
easy, they copy a sub-qiov with a byte granularity offset and length
(qemu_iovec_concat and friends). qcow2 uses the same to split requests
at (fragmented) cluster boundaries.

Might it be as easy as this?

This is completely untested, right? :-)


Yes :-)
I was just unsure if the general approach is right.



But ignoring bugs, the principle looks right.


static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
{
 if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
 return -EINVAL;
 }

 if (bs->bl.max_transfer_length &&
 nb_sectors > bs->bl.max_transfer_length) {
 int ret = 0;
 QEMUIOVector *qiov2 = NULL;

Make it "QEMUIOVector qiov2;" on the stack.


 size_t soffset = 0;

 trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
bs->bl.max_transfer_length);

 qemu_iovec_init(qiov2, qiov->niov);

And &qiov2 here, then this doesn't crash with a NULL dereference.


 while (nb_sectors > bs->bl.max_transfer_length && !ret) {
 qemu_iovec_reset(qiov2);
 qemu_iovec_concat(qiov2, qiov, soffset,
   bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
 ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
 bs->bl.max_transfer_length << 
BDRV_SECTOR_BITS,
 qiov2, flags);
 soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
 sector_num += bs->bl.max_transfer_length;
 nb_sectors -= bs->bl.max_transfer_length;
 }
 qemu_iovec_destroy(qiov2);
 if (ret) {
 return ret;
 }

The error check needs to be immediately after the assignment of ret,
otherwise the next loop iteration can overwrite an error with a success
(and if it didn't, it would still do useless I/O because the request as
a whole would fail anyway).


There is a && !ret in the loop condition. I wanted to avoid copying the destroy 
part.

BTW, is it !ret or ret < 0 ?




 }

 return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
  nb_sectors << BDRV_SECTOR_BITS, qiov, flags);

qiov doesn't work here for the splitting case. You need the remaining
part, not the whole original qiov.


Right ;-)

Peter

Re: [Qemu-devel] [[PATCH v2] 1/1] vpc.c: Add VHD resize support

2014-09-23 Thread Stefan Hajnoczi

On Mon, Sep 22, 2014 at 01:24:16PM +, Lucian Petrut wrote:
>  >> +if (cpu_to_be32(footer->type) == VHD_DIFFERENCING) {
>  >footer_buf is big-endian so this should be be32_to_cpu()
> 
> My bad, I’ll fix the BE related issues.

The existing block/vpc.c code confuses beX_to_cpu() and cpu_to_beX().  I
have sent a patch to fix that ("[PATCH] vpc: fix beX_to_cpu() and
cpu_to_beX() confusion").

>  >> +error_report("Resizing differencing vhd images is not 
> supported.");
>  >> +return -ENOTSUP;
>  >> +}
>  >> +
>  >> +old_bat_size = (s->max_table_entries * 4 + 511) & ~511;
>  >> +new_total_sectors = offset / BDRV_SECTOR_SIZE;
>  >> +
>  >> +for (index = 0; new_total_sectors > (int64_t)cyls * heads * 
> secs_per_cyl;
>  >> +index++) {
>  >> +if (calculate_geometry(new_total_sectors + index, &cyls, &heads,
>  >> +   &secs_per_cyl)) {
>  >> +return -EFBIG;
>  >> +}
>  >> +}
>  >> +new_total_sectors = (int64_t) cyls * heads * secs_per_cyl;
>  >> +new_num_bat_entries = (new_total_sectors + s->block_size / 512) /
>  >> +  (s->block_size / 512);
>  >This expression doesn't just round up, it always adds an extra block.
>  >Is this intentional?
>  >I expected the numerator for rounding up to be:
>  >  new_total_sectors + s->block_size / 512 - 1
> 
> This was intended, it may just add an extra entry. Also, the number
> of entries is calculated in the same way in the existing code which
> creates dynamic VHDs, so I wanted to keep consistency with that
> as well.

Okay, if existing code already does it like this it makes sense to be
consistent.


pgpyh8Q6AJ6_R.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties

2014-09-23 Thread Markus Armbruster

Paolo Bonzini  writes:

> Il 23/09/2014 05:09, Gonglei (Arei) ha scritto:
>> Hi,
>> 
>>> This doesn't change the fact that ObjectProperty is a generic struct,
>>> and adding alias-specific fields there is wrong.
>>
>> OK, Maybe I should find other ways to attach this purpose and
>> avoid layering violation. Thanks!
>
> Unfortunately I cannot think of any.
>
> We could add a description field to ObjectProperty, and replace
> legacy_name with a description.  The output then would be
>
> virtio-blk.drive=str (drive)
>> 
>> There is a question that the QOM properties are added dynamically.
>> When we call qdev_alias_all_properties() adding alias properties to
>> the source object all qdev properties on the target DeviceState, how do we
>> judge the property's name and set the value of corresponding
>> description field?
>> Such as setting virtio-blk-pci.drive.description = "drive".
>
> You use the legacy_name field of PropertyInfo to set the description of
> a qdev property, and then let object_property_add_alias() copy the
> description.

Gets us part of the way to what I described in my reply.  I'm fine with
that.

Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge

2014-09-23 Thread Kevin Wolf

Am 23.09.2014 um 11:52 hat Peter Lieven geschrieben:
> On 23.09.2014 11:47, Kevin Wolf wrote:
> >Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
> >>On 23.09.2014 10:59, Kevin Wolf wrote:
> >>>Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
> On 22.09.2014 21:06, Paolo Bonzini wrote:
> >Il 22/09/2014 11:43, Peter Lieven ha scritto:
> >>This series aims not at touching default behaviour. The default for 
> >>max_transfer_length
> >>is 0 (no limit). max_transfer_length is a limit that MUST be satisfied 
> >>otherwise the request
> >>will fail. And Patch 2 aims at catching this fail earlier in the stack.
> >Understood.  But the right fix is to avoid that backend limits transpire
> >into guest ABI, not to catch the limits earlier.  So the right fix would
> >be to implement request splitting.
> Since you proposed to add traces for this would you leave those in?
> And since iSCSI is the only user of this at the moment would you
> go for implementing this check in the iSCSI block layer?
> 
> As for the split logic would you think it is enough to build new qiov's
> out of the too big iov without copying the contents? This would work
> as long as a single iov inside the qiov is not bigger the 
> max_transfer_length.
> Otherwise I would need to allocate temporary buffers and copy around.
> >>>You can split single iovs, too. There are functions that make this very
> >>>easy, they copy a sub-qiov with a byte granularity offset and length
> >>>(qemu_iovec_concat and friends). qcow2 uses the same to split requests
> >>>at (fragmented) cluster boundaries.
> >>Might it be as easy as this?
> >This is completely untested, right? :-)
> 
> Yes :-)
> I was just unsure if the general approach is right.

Looks alright to me.

> >But ignoring bugs, the principle looks right.
> >
> >>static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> >> int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
> >> BdrvRequestFlags flags)
> >>{
> >> if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
> >> return -EINVAL;
> >> }
> >>
> >> if (bs->bl.max_transfer_length &&
> >> nb_sectors > bs->bl.max_transfer_length) {
> >> int ret = 0;
> >> QEMUIOVector *qiov2 = NULL;
> >Make it "QEMUIOVector qiov2;" on the stack.
> >
> >> size_t soffset = 0;
> >>
> >> trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
> >>bs->bl.max_transfer_length);
> >>
> >> qemu_iovec_init(qiov2, qiov->niov);
> >And &qiov2 here, then this doesn't crash with a NULL dereference.
> >
> >> while (nb_sectors > bs->bl.max_transfer_length && !ret) {
> >> qemu_iovec_reset(qiov2);
> >> qemu_iovec_concat(qiov2, qiov, soffset,
> >>   bs->bl.max_transfer_length << 
> >> BDRV_SECTOR_BITS);
> >> ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
> >> bs->bl.max_transfer_length << 
> >> BDRV_SECTOR_BITS,
> >> qiov2, flags);
> >> soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
> >> sector_num += bs->bl.max_transfer_length;
> >> nb_sectors -= bs->bl.max_transfer_length;
> >> }
> >> qemu_iovec_destroy(qiov2);
> >> if (ret) {
> >> return ret;
> >> }
> >The error check needs to be immediately after the assignment of ret,
> >otherwise the next loop iteration can overwrite an error with a success
> >(and if it didn't, it would still do useless I/O because the request as
> >a whole would fail anyway).
> 
> There is a && !ret in the loop condition. I wanted to avoid copying the 
> destroy part.

Ah, yes, clever. I missed that. Maybe too clever then. ;-)

> BTW, is it !ret or ret < 0 ?

It only ever returns 0 or negative, so both are equivalent. I
prefer checks for ret < 0, but that's a matter of style rather than
correctness.

Another problem I just noticed is that this is not the only caller of
bdrv_co_do_preadv(), so you're not splitting all requests. The
synchronous bdrv_read/write/pread/pwrite/pwritev functions all don't get
the functionality this way.

Perhaps you should be doing it inside bdrv_co_do_preadv(), before the
call to bdrv_aligned_preadv(). Might even be more correct if it can
happen that the alignment adjustment increases a request too much to fit
in bl.max_transfer_length.

Kevin

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread zhanghailiang


On 2014/9/23 17:01, Igor Mammedov wrote:

On Mon, 22 Sep 2014 14:17:28 +0300
"Michael S. Tsirkin"  wrote:


On Fri, Sep 19, 2014 at 02:37:46PM +0200, Igor Mammedov wrote:

On Tue, 16 Sep 2014 18:39:15 +0800
zhanghailiang  wrote:


If we do not configure numa option, memory hotplug should work as well.
It should not depend on numa option.

Steps to reproduce:
(1) Start VM: qemu-kvm -m 1024,slots=4,maxmem=8G
(2) Hotplug memory
It will fail and reports:
"'DIMM property node has value 0' which exceeds the number of numa nodes: 0"

Signed-off-by: zhanghailiang 
---
  hw/mem/pc-dimm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 5bfc5b7..a800ea7 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -252,7 +252,7 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
  error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
  return;
  }
-if (dimm->node >= nb_numa_nodes) {
+if ((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) {
  error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
 PRIu32 "' which exceeds the number of numa nodes: %d",
 dimm->node, nb_numa_nodes);


Reviewed-By: Igor Mammedov 



I read:

Hmm, I have just tested this, and Yes, it didn't work for Windows guest.
Thanks for your kind reminder.;)


So should I expect v2 which works with windows?

Hotplug wouldn't work with Windows without -numa (it's Windows limitation)
and more importantly pc-dimm shouldn't be limited only to NUMA configs
which this patch fixes.
This patch is fine and should go to stable as well.



Agreed, maybe i should change the title and submit V2.;)


On top of this we could add automatic NUMA node creation when
memory hotplug is enabled if this Windows workaround is acceptable.


.

[Qemu-devel] [PATCH v2] docs: add blkdebug block driver documentation

2014-09-23 Thread Stefan Hajnoczi

The blkdebug block driver is undocumented.  Documenting it is worthwhile
since it offers powerful error injection features that are used by
qemu-iotests test cases.

This document will make it easier for people to learn about and use
blkdebug.

Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Added GPL v2 or later license and Red Hat copyright [Eric]
 * Expanded ini rules file explanation [Paolo]
 * Added note that errno values depend on the host [Eric]

 docs/blkdebug.txt | 155 ++
 1 file changed, 155 insertions(+)
 create mode 100644 docs/blkdebug.txt

diff --git a/docs/blkdebug.txt b/docs/blkdebug.txt
new file mode 100644
index 000..4034c82
--- /dev/null
+++ b/docs/blkdebug.txt
@@ -0,0 +1,155 @@
+Block I/O error injection using blkdebug
+
+Copyright (C) 2014 Red Hat Inc
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+The blkdebug block driver is a rule-based error injection engine.  It can be
+used to exercise error code paths in block drivers including ENOSPC (out of
+space) and EIO.
+
+This document gives an overview of the features available in blkdebug.
+
+Background
+--
+Block drivers have many error code paths that handle I/O errors.  Image formats
+are especially complex since metadata I/O errors during cluster allocation or
+while updating tables happen halfway through request processing and require
+discipline to keep image files consistent.
+
+Error injection allows test cases to trigger I/O errors at specific points.
+This way, all error paths can be tested to make sure they are correct.
+
+Rules
+-
+The blkdebug block driver takes a list of "rules" that tell the error injection
+engine when to fail an I/O request.
+
+Each I/O request is evaluated against the rules.  If a rule matches the request
+then its "action" is executed.
+
+Rules can be placed in a configuration file; the configuration file
+follows the same .ini-like format used by QEMU's -readconfig option, and
+each section of the file represents a rule.
+
+The following configuration file defines a single rule:
+
+  $ cat blkdebug.conf
+  [inject-error]
+  event = "read_aio"
+  errno = "28"
+
+This rule fails all aio read requests with ENOSPC (28).  Note that the errno
+value depends on the host.  On Linux, see
+/usr/include/asm-generic/errno-base.h for errno values.
+
+Invoke QEMU as follows:
+
+  $ qemu-system-x86_64
+-drive 
if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
+-device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
+
+Rules support the following attributes:
+
+  event - which type of operation to match (e.g. read_aio, write_aio,
+ flush_to_os, flush_to_disk).  See the "Events" section for
+  information on events.
+
+  state - (optional) the engine must be in this state number in order for this
+ rule to match.  See the "State transitions" section for information
+  on states.
+
+  errno - the numeric errno value to return when a request matches this rule.
+  The errno values depend on the host since the numeric values are not
+  standarized in the POSIX specification.
+
+  sector - (optional) a sector number that the request must overlap in order to
+   match this rule
+
+  once - (optional, default "off") only execute this action on the first
+ matching request
+
+  immediately - (optional, default "off") return a NULL BlockDriverAIOCB
+   pointer and fail without an errno instead.  This exercises the
+   code path where BlockDriverAIOCB fails and the caller's
+BlockDriverCompletionFunc is not invoked.
+
+Events
+--
+Block drivers provide information about the type of I/O request they are about
+to make so rules can match specific types of requests.  For example, the qcow2
+block driver tells blkdebug when it accesses the L1 table so rules can match
+only L1 table accesses and not other metadata or guest data requests.
+
+The core events are:
+
+  read_aio - guest data read
+
+  write_aio - guest data write
+
+  flush_to_os - write out unwritten block driver state (e.g. cached metadata)
+
+  flush_to_disk - flush the host block device's disk cache
+
+See block/blkdebug.c:event_names[] for the list of available events.  You may
+need to grep block driver source code to understand the meaning of specific
+events.
+
+State transitions
+-
+There are cases where more power is needed to match a particular I/O request in
+a longer sequence of requests.  For example:
+
+  write_aio
+  flush_to_disk
+  write_aio
+
+How do we match the 2nd write_aio but not the first?  This is where state
+transitions come in.
+
+The error injection engine has an integer called the "state" that always starts
+initialized to 1.  Rules can be conditional on the state and they can
+transition to a new stat

Re: [Qemu-devel] [PATCH 0/3] trace: drop orphan events from ./trace-events

2014-09-23 Thread Stefan Hajnoczi

On Mon, Sep 22, 2014 at 05:45:15PM +0200, Markus Armbruster wrote:
> Stefan Hajnoczi  writes:
> 
> > Over time a few unused trace events have been left behind in ./trace-events.
> > Either the code that called them was deleted or the event was never called 
> > in
> > the first place.
> >
> > This is a clear violation of the solider's motto "no man left behind".  It's
> > time to bring these trace events home.
> >
> > On a more serious note, unused trace events cause confusion to users who 
> > want
> > to enable existing instrumentation.  On the SystemTap backend we get errors
> > when attempting to enable them because the static probes that these events 
> > are
> > based on do not exist.
> 
> Reviewed-by: Markus Armbruster 
> 
> While you're at it, could you also clean up the pointers to source files
> that have bit-rotted?

Sure, please post a separate patch with your Signed-off-by and I'll
merge it.

Stefan

> $ scripts/cleanup-trace-events.pl trace-events | diff -u trace-events -
> --- trace-events  2014-09-22 17:40:04.228700732 +0200
> +++ - 2014-09-22 17:41:03.107310697 +0200
> @@ -836,7 +836,7 @@
>  pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
>  pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s 
> pages: %u"
>  
> -# xen-all.c
> +# xen-hvm.c
>  xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, 
> size %#lx"
>  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool 
> log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>  
> @@ -845,7 +845,7 @@
>  xen_remap_bucket(uint64_t index) "index %#"PRIx64
>  xen_map_cache_return(void* ptr) "%p"
>  
> -# hw/xen/xen_platform.c
> +# hw/i386/xen/xen_platform.c
>  xen_platform_log(char *s) "xen platform: %s"
>  
>  # qemu-coroutine.c
> @@ -1075,6 +1075,7 @@
>  vmware_scratch_write(uint32_t index, uint32_t value) "index %d, value 0x%x"
>  vmware_setmode(uint32_t w, uint32_t h, uint32_t bpp) "%dx%d @ %d bpp"
>  
> +# vmstate.c
>  # savevm.c
>  savevm_section_start(const char *id, unsigned int section_id) "%s, 
> section_id %u"
>  savevm_section_end(const char *id, unsigned int section_id) "%s, section_id 
> %u"
> @@ -1241,7 +1242,7 @@
>  virtio_ccw_interpret_ccw(int cssid, int ssid, int schid, int cmd_code) 
> "VIRTIO-CCW: %x.%x.%04x: interpret command %x"
>  virtio_ccw_new_device(int cssid, int ssid, int schid, int devno, const char 
> *devno_mode) "VIRTIO-CCW: add subchannel %x.%x.%04x, devno %04x (%s)"
>  
> -# hw/intc/s390_flic.c
> +# hw/intc/s390_flic_kvm.c
>  flic_create_device(int err) "flic: create device failed %d"
>  flic_no_device_api(int err) "flic: no Device Contral API support %d"
>  flic_reset_failed(int err) "flic: reset failed %d"
> @@ -1254,6 +1255,7 @@
>  migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max 
> %" PRIu64
>  migrate_transferred(uint64_t tranferred, uint64_t time_spent, double 
> bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " 
> bandwidth %g max_size %" PRId64
>  
> +# target-ppc/kvm.c
>  # kvm-all.c
>  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>  kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
> @@ -1282,10 +1284,12 @@
>  object_dynamic_cast_assert(const char *type, const char *target, const char 
> *file, int line, const char *func) "%s->%s (%s:%d:%s)"
>  object_class_dynamic_cast_assert(const char *type, const char *target, const 
> char *file, int line, const char *func) "%s->%s (%s:%d:%s)"
>  
> -# hw/xen/xen_pvdevice.c
> +# hw/i386/xen/xen_pvdevice.c
>  xen_pv_mmio_read(uint64_t addr) "WARNING: read from Xen PV Device MMIO space 
> (address %"PRIx64")"
>  xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space 
> (address %"PRIx64")"
>  
> +# hw/i386/pc.c
> +# hw/acpi/memory_hotplug.c
>  # hw/pci/pci_host.c
>  pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, 
> unsigned val) "%s %02u:%u @0x%x -> 0x%x"
>  pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, 
> unsigned val) "%s %02u:%u @0x%x <- 0x%x"


pgpQtyq2pu3re.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread zhanghailiang


On 2014/9/23 16:58, Tang Chen wrote:


On 09/23/2014 04:40 PM, Igor Mammedov wrote:

..
It's fine to use SRAT for these purposes on baremetal NUMA systems since
due to used chipset constrains it's possible statically allocate ranges
for every possible DIMM socket.
However SRAT(which is optional table BTW) entries are not mandatory
and override-able by ACPI Device's _PXM/_CRS methods replacing needs
for SRAT entries and QEMU uses this fact by supplying these methods.
QEMU adds FAKE SRAT entry only to workaround Windows limitation,
and for nothing else.

I think Linux does not violate ACPI spec and behaves as expected, moreover
it's more correct than Windows since memory hotplug will work on non NUMA
machines as well.

Hence I think this patch is correct and allows memory hotplug in absence
of NUMA configuration. It also would allow to use pc-dimm as replacement
for initial memory for non-NUMA configs (which is on my TODO list)

As for the Windows, QEMU has no idea what OS it would be running,
I see 2 ways to solve issue:
  1. user should know that memory hotplug on Windows requires NUMA machine
 and specify "-numa ..." option for this case.
(I've discussed this with libvirt folks and was promised that
 if user enables memory hotplug, libvirt would provide "-numa" option
 to workaround Windows issue)

  2. QEMU could unconditionally create single NUMA if memory hotplug is
 enabled. (but that should be enable only for 2.2 or late machines
 to avoid migration issues)


I prefer 2. I'll try to send patches for it if Zhang is also OK with it.



Yep, It is a good scheme to create a dummy NUMA unconditionally.
But Igor has said there are migration issues for this scenario, do you know 
what's
it? ;)

Re: [Qemu-devel] [PATCH] vl: Adjust the place of calling mlockall to speedup VM's startup

2014-09-23 Thread zhanghailiang


On 2014/9/23 16:30, Michael S. Tsirkin wrote:

On Tue, Sep 23, 2014 at 03:57:47PM +0800, zhanghailiang wrote:

If we configure mlock=on and memory policy=bind at the same time,
It will consume lots of time for system to treat with memory,
especially when call mbind after mlockall.

Adjust the place of calling mlockall, calling mbind before mlockall
can remarkably reduce the time of VM's startup.

Signed-off-by: zhanghailiang 


The idea makes absolute sense to me:
bind after lock will force data copy of
all pages. bind before lock gives us an
indication where to put data on fault in.

Acked-by: Michael S. Tsirkin 




Thanks for your quick reviewing..

Best Regards,
zhanghailiang


---
Hi,

Actually, for mbind and mlockall, i have made a test about the time consuming
for the different call sequence.

The results is shown below. It is obviously that mlockall called before mbind is
more time-consuming.

Besides, this patch is OK with memory hotplug.

TEST CODE:
 if (mbind_first) {
 printf("mbind --> mlockall\n");
 mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 mlockall(MCL_CURRENT | MCL_FUTURE);
 } else {
 printf("mlockall --> mbind\n");
 mlockall(MCL_CURRENT | MCL_FUTURE);
 mbind(ptr, ram_size/2, MPOL_BIND, &node0mask, 2 ,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 mbind(ptr + ram_size/2, ram_size/2, MPOL_BIND, &node1mask, 2,
   MPOL_MF_STRICT | MPOL_MF_MOVE);
 }

RESULT 1:
#time /home/test_mbind 10240 0
memroy size 10737418240
mlockall --> mbind

real0m11.886s
user0m0.004s
sys 0m11.865s
#time /home/test_mbind 10240 1
memroy size 10737418240
mbind --> mlockall

real0m5.334s
user0m0.000s
sys 0m5.324s

RESULT 2:
#time /home/test_mbind 4096 0
memroy size 4294967296
mlockall --> mbind

real0m5.503s
user0m0.000s
sys 0m5.492s
#time /home/test_mbind 4096 1
memroy size 4294967296
mbind --> mlockall

real0m2.139s
user0m0.000s
sys 0m2.132s

Best Regards,
zhanghailiang
---
  vl.c | 11 +--
  1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/vl.c b/vl.c
index dc792fe..adf4770 100644
--- a/vl.c
+++ b/vl.c
@@ -134,6 +134,7 @@ const char* keyboard_layout = NULL;
  ram_addr_t ram_size;
  const char *mem_path = NULL;
  int mem_prealloc = 0; /* force preallocation of physical target memory */
+int enable_mlock = false;
  int nb_nics;
  NICInfo nd_table[MAX_NICS];
  int autostart;
@@ -1421,12 +1422,8 @@ static void smp_parse(QemuOpts *opts)

  }

-static void configure_realtime(QemuOpts *opts)
+static void realtime_init(void)
  {
-bool enable_mlock;
-
-enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
-
  if (enable_mlock) {
  if (os_mlock() < 0) {
  fprintf(stderr, "qemu: locking memory failed\n");
@@ -3973,7 +3970,7 @@ int main(int argc, char **argv, char **envp)
  if (!opts) {
  exit(1);
  }
-configure_realtime(opts);
+enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
  break;
  case QEMU_OPTION_msg:
  opts = qemu_opts_parse(qemu_find_opts("msg"), optarg, 0);
@@ -4441,6 +4438,8 @@ int main(int argc, char **argv, char **envp)

  machine_class->init(current_machine);

+realtime_init();
+
  audio_init();

  cpu_synchronize_all_post_init();
--
1.7.12.4



.

[Qemu-devel] [PATCH] ohci: Fix compile errors without --enable-trace-backend

2014-09-23 Thread Alexey Kardashevskiy

This adds a stub for ohci_td_pkt() function (which traces packets)
when configured without --enable-trace-backend

Signed-off-by: Alexey Kardashevskiy 
---

It should probably be squashed to
[PATCH] ohci: Convert fprint/DPRINTF/print to traces

Sorry about that...

---
 hw/usb/hcd-ohci.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index 4a4dd02..7ea871d 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -920,6 +920,7 @@ static int ohci_service_iso_td(OHCIState *ohci, struct 
ohci_ed *ed,
 return 1;
 }
 
+#ifdef trace_event_get_state
 static void ohci_td_pkt(const char *msg, const uint8_t *buf, size_t len)
 {
 bool print16 = !!trace_event_get_state(TRACE_USB_OHCI_TD_PKT_SHORT);
@@ -950,6 +951,11 @@ static void ohci_td_pkt(const char *msg, const uint8_t 
*buf, size_t len)
 p += sprintf(p, " %.2x", buf[i]);
 }
 }
+#else
+static void ohci_td_pkt(const char *msg, const uint8_t *buf, size_t len)
+{
+}
+#endif
 
 /* Service a transport descriptor.
Returns nonzero to terminate processing of this endpoint.  */
-- 
2.0.0

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread Tang Chen



On 09/23/2014 06:11 PM, zhanghailiang wrote:

On 2014/9/23 16:58, Tang Chen wrote:


On 09/23/2014 04:40 PM, Igor Mammedov wrote:

..
It's fine to use SRAT for these purposes on baremetal NUMA systems 
since

due to used chipset constrains it's possible statically allocate ranges
for every possible DIMM socket.
However SRAT(which is optional table BTW) entries are not mandatory
and override-able by ACPI Device's _PXM/_CRS methods replacing needs
for SRAT entries and QEMU uses this fact by supplying these methods.
QEMU adds FAKE SRAT entry only to workaround Windows limitation,
and for nothing else.

I think Linux does not violate ACPI spec and behaves as expected, 
moreover
it's more correct than Windows since memory hotplug will work on non 
NUMA

machines as well.

Hence I think this patch is correct and allows memory hotplug in 
absence
of NUMA configuration. It also would allow to use pc-dimm as 
replacement

for initial memory for non-NUMA configs (which is on my TODO list)

As for the Windows, QEMU has no idea what OS it would be running,
I see 2 ways to solve issue:
  1. user should know that memory hotplug on Windows requires NUMA 
machine

 and specify "-numa ..." option for this case.
(I've discussed this with libvirt folks and was promised that
 if user enables memory hotplug, libvirt would provide "-numa" 
option

 to workaround Windows issue)

  2. QEMU could unconditionally create single NUMA if memory hotplug is
 enabled. (but that should be enable only for 2.2 or late machines
 to avoid migration issues)


I prefer 2. I'll try to send patches for it if Zhang is also OK with it.



Yep, It is a good scheme to create a dummy NUMA unconditionally.
But Igor has said there are migration issues for this scenario, do you 
know what's

it? ;)


Not sure.  This one ?

https://www.mail-archive.com/qemu-devel%40nongnu.org/msg249146.html

Thanks. :)



.

Re: [Qemu-devel] [PATCH] po: fix conflict with %.mo rule in rules.mak

2014-09-23 Thread Christian Borntraeger

On 09/22/2014 04:19 PM, Paolo Bonzini wrote:
> po/Makefile includes rules.mak to use the nice quiet-command macro.
> However, this also brings in a %.mo rule that breaks "make build".
> Put our own rule before the include, so that it has precedence.
> 
> Reported-by: Christian Borntraeger 
> Signed-off-by: Paolo Bonzini 
Tested-by: Christian Borntraeger 

> ---
>  po/Makefile | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/po/Makefile b/po/Makefile
> index 1ab241a..b271f79 100644
> --- a/po/Makefile
> +++ b/po/Makefile
> @@ -9,6 +9,9 @@ all:
> 
>  .PHONY:  all build clean install update
> 
> +%.mo: %.po
> + $(call quiet-command, msgfmt -o $@ $<, "  GEN   $@")
> +
>  -include ../config-host.mak
>  include $(SRC_PATH)/rules.mak
> 
> @@ -38,9 +41,6 @@ install: $(OBJS)
>   $(INSTALL) -m644 $$obj 
> $(DESTDIR)$(prefix)/share/locale/$$base/LC_MESSAGES/qemu.mo; \
>   done
> 
> -%.mo: %.po
> - $(call quiet-command, msgfmt -o $@ $<, "  GEN   $@")
> -
>  $(PO_PATH)/messages.po: $(SRC_PATH)/ui/gtk.c
>   $(call quiet-command, ( cd $(SRC_PATH) && \
>xgettext -o - --from-code=UTF-8 --foreign-user \
>

Re: [Qemu-devel] [PATCH] trace: tighten up trace-events regex to fix bad parse

2014-09-23 Thread Stefan Hajnoczi

On Mon, Sep 22, 2014 at 12:35:22PM -0500, Lluís Vilanova wrote:
> Stefan Hajnoczi writes:
> 
> > Use \w for properties and trace event names since they are both drawn
> > from [a-zA-Z0-9_] character sets.
> 
> > The .* for matching properties was too aggressive and caused the
> > following failure with foo(int rc) "(this is a test)":
> 
> >   Traceback (most recent call last):
> > File "scripts/tracetool.py", line 139, in 
> >   main(sys.argv)
> > File "scripts/tracetool.py", line 134, in main
> >   binary=binary, probe_prefix=probe_prefix)
> > File "scripts/tracetool/__init__.py", line 334, in generate
> >   events = _read_events(fevents)
> > File "scripts/tracetool/__init__.py", line 262, in _read_events
> >   res.append(Event.build(line))
> > File "scripts/tracetool/__init__.py", line 225, in build
> >   return Event(name, props, fmt, args, arg_fmts)
> > File "scripts/tracetool/__init__.py", line 185, in __init__
> >   % ", ".join(unknown_props))
> >   ValueError: Unknown properties: foo(int, rc)
> 
> > Cc: Lluís Vilanova 
> > Reported-by: Eric Auger 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  scripts/tracetool/__init__.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> > diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
> > index 36c789d..474f11b 100644
> > --- a/scripts/tracetool/__init__.py
> > +++ b/scripts/tracetool/__init__.py
> > @@ -140,8 +140,8 @@ class Event(object):
> >  The format strings for each argument.
> >  """
>  
> > -_CRE = re.compile("((?P.*)\s+)?"
> > -  "(?P[^(\s]+)"
> > +_CRE = re.compile("((?P\w*)\s+)?"
> > +  "(?P\w+)"
> >"\((?P[^)]*)\)"
> >"\s*"
> >"(?:(?:(?P\".+),)?\s*(?P\".+))?"
> 
> The previous implementation allowed multiple properties. Maybe this should be
> instead (which still allows multiple properties):
> 
> "((?P[\w\s]+)\s+)?"
> "(?P\w+)\s*"
> ...

Ooops, good point!

I will send v2.

Stefan


pgpU9wBmHYAv9.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 0/7] virtio-scsi: Dataplane on single iothread

2014-09-23 Thread Paolo Bonzini

Il 23/09/2014 09:49, Fam Zheng ha scritto:
> The second half of previous series while rebasing on Paolo's scsi-next branch.
> 
> Changes include:
> 
>  - Move dataplane fields from VirtIOSCSICommon to VirtIOSCSI.
>  - Assert s->ctx in virtio_scsi_set_iothread.
>  - No virtio_scsi_aio_acquire, just acquire/release in virtio_scsi_push_event.
>  - Add migration state notifier.
>  - Add bdrv_io_plug / bdrv_io_unplug.
> 
> Thanks,
> Fam
> 
> Fam Zheng (7):
>   virtio-scsi-dataplane: Code to run virtio-scsi on iothread
>   virtio-scsi: Hook up with dataplane
>   virtio-scsi: Add migration state notifier for dataplane code
>   virtio-scsi: Two stages processing of cmd request
>   virtio-scsi: Batched prepare for cmd reqs
>   virtio-scsi: Call bdrv_io_plug/bdrv_io_unplug in cmd request handling
>   virtio-scsi: Process ".iothread" property
> 
>  hw/scsi/Makefile.objs   |   2 +-
>  hw/scsi/virtio-scsi-dataplane.c | 229 
> 
>  hw/scsi/virtio-scsi.c   | 116 +---
>  include/hw/virtio/virtio-scsi.h |  33 +-
>  4 files changed, 362 insertions(+), 18 deletions(-)
>  create mode 100644 hw/scsi/virtio-scsi-dataplane.c
> 

Thanks, applied to scsi-next.

Paolo

[Qemu-devel] [PATCH v2] trace: tighten up trace-events regex to fix bad parse

2014-09-23 Thread Stefan Hajnoczi

Use \w for properties and trace event names since they are both drawn
from [a-zA-Z0-9_] character sets.

The .* for matching properties was too aggressive and caused the
following failure with foo(int rc) "(this is a test)":

  Traceback (most recent call last):
File "scripts/tracetool.py", line 139, in 
  main(sys.argv)
File "scripts/tracetool.py", line 134, in main
  binary=binary, probe_prefix=probe_prefix)
File "scripts/tracetool/__init__.py", line 334, in generate
  events = _read_events(fevents)
File "scripts/tracetool/__init__.py", line 262, in _read_events
  res.append(Event.build(line))
File "scripts/tracetool/__init__.py", line 225, in build
  return Event(name, props, fmt, args, arg_fmts)
File "scripts/tracetool/__init__.py", line 185, in __init__
  % ", ".join(unknown_props))
  ValueError: Unknown properties: foo(int, rc)

Cc: Lluís Vilanova 
Reported-by: Eric Auger 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Fix regex to allow multiple properties [Lluís]

 scripts/tracetool/__init__.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 36c789d..6ee6af7 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -140,8 +140,8 @@ class Event(object):
 The format strings for each argument.
 """
 
-_CRE = re.compile("((?P.*)\s+)?"
-  "(?P[^(\s]+)"
+_CRE = re.compile("((?P[\w\s]+)\s+)?"
+  "(?P\w+)"
   "\((?P[^)]*)\)"
   "\s*"
   "(?:(?:(?P\".+),)?\s*(?P\".+))?"
-- 
1.9.3

Re: [Qemu-devel] [PATCH 3/4] .travis.yml: pre-seed sub-modules for speed

2014-09-23 Thread Alexander Graf



On 15.09.14 18:48, Alex Bennée wrote:
> A significant portion of the build time is spent initialising all the
> sub-modules we use in the source tree. Often this is almost as long as
> the build itself. By pre-seeding the .git/modules tree this will
> hopefully improve things.
> 
> Signed-off-by: Alex Bennée 
> 
> diff --git a/.travis.yml b/.travis.yml
> index f113339..8df02a4 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -37,7 +37,12 @@ env:
>  - TARGETS=unicore32-softmmu,unicore32-linux-user
>  # Group remaining softmmu only targets into one build
>  - 
> TARGETS=lm32-softmmu,moxie-softmmu,tricore-softmmu,xtensa-softmmu,xtensaeb-softmmu
> +git:
> +  # we want to do this ourselves
> +  submodules: false
>  before_install:
> +  - wget http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz
> +  - tar -xvf qemu-submodule-git-seed.tar.xz

wget -O - | tar?

>- git submodule update --init --recursive

Doesn't this overwrite the code you just downloaded?


Alex

>- sudo apt-get update -qq
>- sudo apt-get install -qq ${CORE_PKGS} ${NET_PKGS} ${GUI_PKGS} 
> ${EXTRA_PKGS}
>

Re: [Qemu-devel] [PATCH 0/4] A number of Travis CI tweaks

2014-09-23 Thread Alexander Graf



On 15.09.14 18:48, Alex Bennée wrote:
> Hi,
> 
> While I was in-between kernel builds last week I attempted to improve
> the Travis build a little. Alexander Graf pointed out we were missing
> a number of the linux-user targets. To avoid exploding the matrix too
> much I've grouped builds together where they hopefully benefit from
> sharing some objtect files. The biggest win however was using a
> tarball to pre-seed the sub-module checkouts. I'm not sure if this is
> because hammering our git server slows down or just because it's a lot
> of data but it was adding up to around half the execution time of the
> build.
> 
> Finally I removed "make check" from every build. It still gets run
> once in the matrix but this means the current instability will bite
> less often.
> 
> I failed in the task of adding mingw builds because it's hard on the
> Travis Ubuntu hosts to install the headers/libs for a cross compile.
> If someone can come up with a nice solid script that hand pulls in
> these dependancies then we can add a call to that for the mingw
> builds. Any volenteers?
> 
> If there are no objections/negative reviews I'll push these at the
> Trivial tree later this week. Or is it time we create a specific
> testing tree for these particular type of patches?

I like it :). Except for the small nit on 3/4.

Reviewed-by: Alexander Graf 


Alex

[Qemu-devel] [PATCH v2] vl: Adjust the place of calling mlockall to speedup VM's startup

2014-09-23 Thread zhanghailiang

If we configure mlock=on and memory policy=bind at the same time,
It will consume lots of time for system to treat with memory,
especially when call mbind behind mlockall.

Adjust the place of calling mlockall, calling mbind before mlockall
can remarkably reduce the time of VM's startup.

Acked-by: Michael S. Tsirkin 
Signed-off-by: zhanghailiang 
---
 v2:
- Add Acked-by
- change 'int' to 'bool' (Thanks Hu Tao)
---
 vl.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/vl.c b/vl.c
index dc792fe..35e5de6 100644
--- a/vl.c
+++ b/vl.c
@@ -134,6 +134,7 @@ const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
+bool enable_mlock = false;
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
@@ -1421,12 +1422,8 @@ static void smp_parse(QemuOpts *opts)
 
 }
 
-static void configure_realtime(QemuOpts *opts)
+static void realtime_init(void)
 {
-bool enable_mlock;
-
-enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
-
 if (enable_mlock) {
 if (os_mlock() < 0) {
 fprintf(stderr, "qemu: locking memory failed\n");
@@ -3973,7 +3970,7 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
-configure_realtime(opts);
+enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
 break;
 case QEMU_OPTION_msg:
 opts = qemu_opts_parse(qemu_find_opts("msg"), optarg, 0);
@@ -4441,6 +4438,8 @@ int main(int argc, char **argv, char **envp)
 
 machine_class->init(current_machine);
 
+realtime_init();
+
 audio_init();
 
 cpu_synchronize_all_post_init();
-- 
1.7.12.4

Re: [Qemu-devel] [PATCH] ohci: Fix compile errors without --enable-trace-backend

2014-09-23 Thread Gerd Hoffmann

On Di, 2014-09-23 at 20:19 +1000, Alexey Kardashevskiy wrote:
> This adds a stub for ohci_td_pkt() function (which traces packets)
> when configured without --enable-trace-backend

Ah, cool.  Just noticed that, while doing usb patch queue test builds
for the next pull req.

> It should probably be squashed to
> [PATCH] ohci: Convert fprint/DPRINTF/print to traces

Done.

thanks,
  Gerd

Re: [Qemu-devel] [PATCH v3 04/23] block: Connect BlockBackend and DriveInfo

2014-09-23 Thread Markus Armbruster

Kevin Wolf  writes:

> Am 16.09.2014 um 20:12 hat Markus Armbruster geschrieben:
>> Make the BlockBackend own the DriveInfo.  Change blockdev_init() to
>> return the BlockBackend instead of the DriveInfo.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  block.c   |  2 --
>>  block/block-backend.c | 38 
>>  blockdev.c| 73 
>> ---
>>  include/sysemu/blockdev.h |  4 +++
>>  4 files changed, 79 insertions(+), 38 deletions(-)
>> 
>> diff --git a/block.c b/block.c
>> index 7ccf443..5f7dc45 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -29,7 +29,6 @@
>>  #include "qemu/module.h"
>>  #include "qapi/qmp/qjson.h"
>>  #include "sysemu/sysemu.h"
>> -#include "sysemu/blockdev.h"/* FIXME layering violation */
>>  #include "qemu/notify.h"
>>  #include "block/coroutine.h"
>>  #include "block/qapi.h"
>> @@ -2124,7 +2123,6 @@ static void bdrv_delete(BlockDriverState *bs)
>>  /* remove from list, if necessary */
>>  bdrv_make_anon(bs);
>>  
>> -drive_info_del(drive_get_by_blockdev(bs));
>>  g_free(bs);
>>  }
>>  
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index a12215a..9ee57c7 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -12,11 +12,13 @@
>>  
>>  #include "sysemu/block-backend.h"
>>  #include "block/block_int.h"
>> +#include "sysemu/blockdev.h"
>>  
>>  struct BlockBackend {
>>  char *name;
>>  int refcnt;
>>  BlockDriverState *bs;
>> +DriveInfo *legacy_dinfo;
>>  QTAILQ_ENTRY(BlockBackend) link; /* for blk_backends */
>>  };
>>  
>> @@ -87,6 +89,7 @@ static void blk_delete(BlockBackend *blk)
>>  QTAILQ_REMOVE(&blk_backends, blk, link);
>>  }
>>  g_free(blk->name);
>> +drive_info_del(blk->legacy_dinfo);
>>  g_free(blk);
>>  }
>>  
>> @@ -167,6 +170,41 @@ BlockDriverState *blk_bs(BlockBackend *blk)
>>  }
>>  
>>  /*
>> + * Return @blk's DriveInfo if any, else null.
>> + */
>> +DriveInfo *blk_legacy_dinfo(BlockBackend *blk)
>> +{
>> +return blk->legacy_dinfo;
>> +}
>> +
>> +/*
>> + * Set @blk's DriveInfo to @dinfo, and return it.
>> + * @blk must not have a DriveInfo set already.
>> + * No other BlockBackend may have the same DriveInfo set.
>> + */
>> +DriveInfo *blk_set_legacy_dinfo(BlockBackend *blk, DriveInfo *dinfo)
>> +{
>> +assert(!blk->legacy_dinfo);
>> +return blk->legacy_dinfo = dinfo;
>
> Ugh, I don't like assignments in a return statement much more than
> setters that return something.

It's an assignment.  In C, assignments return a value.

> Fortunately, nobody uses the return value, so this can become a void
> function.
>
>> +}
>> +/*
>> + * Return the BlockBackend with DriveInfo @dinfo.
>> + * It must exist.
>> + */
>> +BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo)
>> +{
>> +BlockBackend *blk;
>> +
>> +QTAILQ_FOREACH(blk, &blk_backends, link) {
>> +if (blk->legacy_dinfo == dinfo) {
>> +return blk;
>> +}
>> +}
>> +assert(0);
>
> I'm surprised that the compiler doesn't complain here. Seems it
> understands that the condition is always false and the libc header sets
> the noreturn attribute for the internal function handling a failure.
> With NDEBUG set, this wouldn't work any more.
>
> I think abort() is better.

Okay.

>> +}
>> +
>> +/*
>>   * Hide @blk.
>>   * @blk must not have been hidden already.
>>   * Make attached BlockDriverState, if any, anonymous.
>> diff --git a/blockdev.c b/blockdev.c
>> index 5da6028..722d083 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -47,8 +47,6 @@
>>  #include "trace.h"
>>  #include "sysemu/arch_init.h"
>>  
>> -static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
>> QTAILQ_HEAD_INITIALIZER(drives);
>> -
>>  static const char *const if_name[IF_COUNT] = {
>>  [IF_NONE] = "none",
>>  [IF_IDE] = "ide",
>> @@ -89,7 +87,8 @@ static const int if_max_devs[IF_COUNT] = {
>>   */
>>  void blockdev_mark_auto_del(BlockDriverState *bs)
>>  {
>> -DriveInfo *dinfo = drive_get_by_blockdev(bs);
>> +BlockBackend *blk = bs->blk;
>> +DriveInfo *dinfo = blk_legacy_dinfo(blk);
>>  
>>  if (dinfo && !dinfo->enable_auto_del) {
>>  return;
>> @@ -105,7 +104,8 @@ void blockdev_mark_auto_del(BlockDriverState *bs)
>>  
>>  void blockdev_auto_del(BlockDriverState *bs)
>>  {
>> -DriveInfo *dinfo = drive_get_by_blockdev(bs);
>> +BlockBackend *blk = bs->blk;
>> +DriveInfo *dinfo = blk_legacy_dinfo(blk);
>>  
>>  if (dinfo && dinfo->auto_del) {
>>  drive_del(dinfo);
>> @@ -153,15 +153,15 @@ QemuOpts *drive_add(BlockInterfaceType type, int 
>> index, const char *file,
>>  
>>  DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit)
>>  {
>> +BlockBackend *blk;
>>  DriveInfo *dinfo;
>>  
>> -/* seek interface, bus and unit */
>> -
>> -QTAILQ_FOREACH(dinfo, &drives, next) {
>> -if (dinfo->type == type &&
>> -dinfo->bus == bus &&
>>

Re: [Qemu-devel] [PATCH v3 00/19] usb: convert device init to realize

2014-09-23 Thread Gonglei (Arei)

Hi, Gred

Would you like to pick up this patch series? Thanks.


Best regards,
-Gonglei

> -Original Message-
> From: Gonglei (Arei)
> Sent: Friday, September 19, 2014 2:48 PM
> To: qemu-devel@nongnu.org
> Cc: kra...@redhat.com; Huangweidong (C); arm...@redhat.com;
> pbonz...@redhat.com; Huangpeng (Peter); Luonengjun; Gonglei (Arei)
> Subject: [PATCH v3 00/19] usb: convert device init to realize
> 
> From: Gonglei 
> 
> DeviceClass->init is the old interface, let's convert usb
> devices to the new realize API. In this way, all the
> implementations now use error_setg instead of
> qerror_report/error_report for reporting error.
> 
> Note:
>  Next, I will post a incremental patch series fixing
>  usb-serial issue. :)
> 
> v3 -> v2:
>  - fix minor style align issues (Gerd)
> 
> v2 -> v1:
>  - fix PATCH 2, using qerror_report_err print error messages
>   when attach fails (Paolo)
>  - using errp instead of qerror_report_err introduced by
>   fix 1 in PATCH 12 (Paolo)
>  - fix missing return in PATCH 14 (Paolo)
>  - add 'Reviewed-by' tag for other patches
> 
> Thanks a lot for reviewing!
> 
> Gonglei (19):
>   usb-storage: fix possible memory leak and missing error message
>   usb-bus: convert USBDeviceClass init to realize
>   usb-net: convert init to realize
>   libusb: convert init to realize
>   libusb: using error_report instead of fprintf
>   usb-hub: convert init to realize
>   dev-storage: convert init to realize
>   dev-storage: usring error_report instead of fprintf/printf
>   dev-uas: convert init to realize
>   dev-uas: using error_report instead of fprintf
>   dev-bluetooth: convert init to realize
>   dev-serial: convert init to realize
>   usb-ccid: convert init to realize
>   dev-hid: convert init to realize
>   dev-wacom: convert init to realize
>   usb-audio: convert init to realize
>   usb-redir: convert init to realize
>   usb-mtp: convert init to realize
>   usb-bus: remove "init" from USBDeviceClass struct
> 
>  hw/usb/bus.c  | 79
> ++-
>  hw/usb/dev-audio.c|  5 ++-
>  hw/usb/dev-bluetooth.c|  6 ++--
>  hw/usb/dev-hid.c  | 27 +++
>  hw/usb/dev-hub.c  |  9 +++--
>  hw/usb/dev-mtp.c  |  5 ++-
>  hw/usb/dev-network.c  |  9 +++--
>  hw/usb/dev-serial.c   | 22 +++-
>  hw/usb/dev-smartcard-reader.c |  5 ++-
>  hw/usb/dev-storage.c  | 42 ---
>  hw/usb/dev-uas.c  | 17 +-
>  hw/usb/dev-wacom.c|  5 ++-
>  hw/usb/host-libusb.c  | 33 +-
>  hw/usb/redirect.c | 21 +++-
>  include/hw/usb.h  | 10 --
>  15 files changed, 150 insertions(+), 145 deletions(-)
> 
> --
> 1.7.12.4
>

Re: [Qemu-devel] [PATCH 0/2] usb-serial: only check speed once at realize time

2014-09-23 Thread Gonglei (Arei)

And this one :)  Thanks!


Best regards,
-Gonglei


> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Friday, September 19, 2014 4:08 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: Huangweidong (C); Luonengjun; Huangpeng (Peter); arm...@redhat.com;
> kra...@redhat.com
> Subject: Re: [PATCH 0/2] usb-serial: only check speed once at realize time
> 
> Il 19/09/2014 09:25, arei.gong...@huawei.com ha scritto:
> > From: Gonglei 
> >
> > This patch series based on
> >  [PATCH v3 00/19] usb: convert device init to realize
> >
> > As Paolo's comments:
> >
> > usb port speed check could be extracted to a separate
> > function usb_check_attach, that is called just once at realize time,
> > even if !s->cs->be_open.
> >
> > Please review, Thanks. :)
> >
> > Gonglei (2):
> >   usb-bus: introduce a wrapper function to check speed
> >   usb-serial: only check speed once at realize time
> >
> >  hw/usb/bus.c| 14 +-
> >  hw/usb/dev-serial.c | 16 +---
> >  include/hw/usb.h|  1 +
> >  3 files changed, 23 insertions(+), 8 deletions(-)
> >
> 
> Reviewed-by: Paolo Bonzini

Re: [Qemu-devel] [PATCH v3 00/19] usb: convert device init to realize

2014-09-23 Thread Gerd Hoffmann

On Di, 2014-09-23 at 10:58 +, Gonglei (Arei) wrote:
> Hi, Gred
> 
> Would you like to pick up this patch series? Thanks.

Just picked it up (and the followup too), test builds running, pull
request will most likely follow later today.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v3 00/19] usb: convert device init to realize

2014-09-23 Thread Gonglei (Arei)

> From: Gerd Hoffmann [mailto:kra...@redhat.com]
> Sent: Tuesday, September 23, 2014 7:02 PM
> Subject: Re: [PATCH v3 00/19] usb: convert device init to realize
> 
> On Di, 2014-09-23 at 10:58 +, Gonglei (Arei) wrote:
> > Hi, Gred
> >
> > Would you like to pick up this patch series? Thanks.
> 
> Just picked it up (and the followup too), test builds running, pull
> request will most likely follow later today.
> 
Great! Thanks for your work. :)

Best regards,
-Gonglei

> cheers,
>   Gerd
>

Re: [Qemu-devel] [PULL 0/9] Trivial patches for 2014-09-22

2014-09-23 Thread Peter Maydell

On 22 September 2014 09:11, Michael Tokarev  wrote:
> Here's a next (small) batch of trivial stuff.  Accumulated for over
> 2 weeks, but still quite small.  Random tiny things here and there.
> Please consider pulling/applying.
>
> Thanks,
>
> /mjt
>
> The following changes since commit 07e2863d0271ac6c05206d8ce9e4f4c39b25d3ea:
>
>   exec.c: fix setting 1-byte-long watchpoints (2014-09-19 17:42:16 +0100)
>
> are available in the git repository at:
>
>   git://git.corpit.ru/qemu.git tags/trivial-patches-2014-09-22
>
> for you to fetch changes up to 7e3d523883202396ae7ff8bafcc796c86e026adc:
>
>   arch_init: Setting QEMU_ARCH enum straight (2014-09-22 12:09:43 +0400)
>
> 
> trivial patches for 2014-09-22
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread Igor Mammedov

On Tue, 23 Sep 2014 18:07:16 +0800
zhanghailiang  wrote:

> On 2014/9/23 17:01, Igor Mammedov wrote:
> > On Mon, 22 Sep 2014 14:17:28 +0300
> > "Michael S. Tsirkin"  wrote:
> >
> >> On Fri, Sep 19, 2014 at 02:37:46PM +0200, Igor Mammedov wrote:
> >>> On Tue, 16 Sep 2014 18:39:15 +0800
> >>> zhanghailiang  wrote:
> >>>
>  If we do not configure numa option, memory hotplug should work as well.
>  It should not depend on numa option.
> 
>  Steps to reproduce:
>  (1) Start VM: qemu-kvm -m 1024,slots=4,maxmem=8G
>  (2) Hotplug memory
>  It will fail and reports:
>  "'DIMM property node has value 0' which exceeds the number of numa 
>  nodes: 0"
> 
>  Signed-off-by: zhanghailiang 
>  ---
>    hw/mem/pc-dimm.c | 2 +-
>    1 file changed, 1 insertion(+), 1 deletion(-)
> 
>  diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
>  index 5bfc5b7..a800ea7 100644
>  --- a/hw/mem/pc-dimm.c
>  +++ b/hw/mem/pc-dimm.c
>  @@ -252,7 +252,7 @@ static void pc_dimm_realize(DeviceState *dev, Error 
>  **errp)
>    error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not 
>  set");
>    return;
>    }
>  -if (dimm->node >= nb_numa_nodes) {
>  +if ((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) {
>    error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has 
>  value %"
>   PRIu32 "' which exceeds the number of numa nodes: 
>  %d",
>   dimm->node, nb_numa_nodes);
> >>>
> >>> Reviewed-By: Igor Mammedov 
> >>
> >>
> >> I read:
> >>> Hmm, I have just tested this, and Yes, it didn't work for Windows guest.
> >>> Thanks for your kind reminder.;)
> >>
> >> So should I expect v2 which works with windows?
> > Hotplug wouldn't work with Windows without -numa (it's Windows limitation)
> > and more importantly pc-dimm shouldn't be limited only to NUMA configs
> > which this patch fixes.
> > This patch is fine and should go to stable as well.
> >
> 
> Agreed, maybe i should change the title and submit V2.;)
sure

> 
> > On top of this we could add automatic NUMA node creation when
> > memory hotplug is enabled if this Windows workaround is acceptable.
> >
> >
> > .
> >
> 
>

[Qemu-devel] Why is virt-resize designed to involve two disks?

2014-09-23 Thread 吴志勇

HI,
As you've known, vhd-util and qemu-img both provide the capacity for resizing 
the original disk, but why is virt-resize designed to involve two disks, not 
only original disk? Is there any concern? Is it possible that only one original 
disk is involved in virt-resize?

-- 
Regards,
 
Zhi Yong Wu

Re: [Qemu-devel] [PATCH 1/2] pc-dimm: No numa option shouldn't break hotplug memory feature

2014-09-23 Thread Igor Mammedov

On Tue, 23 Sep 2014 18:11:35 +0800
zhanghailiang  wrote:

> On 2014/9/23 16:58, Tang Chen wrote:
> >
> > On 09/23/2014 04:40 PM, Igor Mammedov wrote:
> >> ..
> >> It's fine to use SRAT for these purposes on baremetal NUMA systems since
> >> due to used chipset constrains it's possible statically allocate ranges
> >> for every possible DIMM socket.
> >> However SRAT(which is optional table BTW) entries are not mandatory
> >> and override-able by ACPI Device's _PXM/_CRS methods replacing needs
> >> for SRAT entries and QEMU uses this fact by supplying these methods.
> >> QEMU adds FAKE SRAT entry only to workaround Windows limitation,
> >> and for nothing else.
> >>
> >> I think Linux does not violate ACPI spec and behaves as expected, moreover
> >> it's more correct than Windows since memory hotplug will work on non NUMA
> >> machines as well.
> >>
> >> Hence I think this patch is correct and allows memory hotplug in absence
> >> of NUMA configuration. It also would allow to use pc-dimm as replacement
> >> for initial memory for non-NUMA configs (which is on my TODO list)
> >>
> >> As for the Windows, QEMU has no idea what OS it would be running,
> >> I see 2 ways to solve issue:
> >>   1. user should know that memory hotplug on Windows requires NUMA machine
> >>  and specify "-numa ..." option for this case.
> >> (I've discussed this with libvirt folks and was promised that
> >>  if user enables memory hotplug, libvirt would provide "-numa" option
> >>  to workaround Windows issue)
> >>
> >>   2. QEMU could unconditionally create single NUMA if memory hotplug is
> >>  enabled. (but that should be enable only for 2.2 or late machines
> >>  to avoid migration issues)
> >>
> > I prefer 2. I'll try to send patches for it if Zhang is also OK with it.
> >
> 
> Yep, It is a good scheme to create a dummy NUMA unconditionally.
> But Igor has said there are migration issues for this scenario, do you know 
> what's
> it? ;)
> 
'-numa' will add SRAT table to ACPI tables blob, as result it will grow in size,
depending on config options #cpus, #dimms, #pci-bridges it could trigger
issues we've had with prior 2.1 was released.

Re: [Qemu-devel] [PATCH v3 03/23] block: Connect BlockBackend to BlockDriverState

2014-09-23 Thread Kevin Wolf

Am 22.09.2014 um 18:34 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Am 16.09.2014 um 20:12 hat Markus Armbruster geschrieben:
> >> diff --git a/include/block/block_int.h b/include/block/block_int.h
> >> index 8d86a6c..14e0b7c 100644
> >> --- a/include/block/block_int.h
> >> +++ b/include/block/block_int.h
> >> @@ -324,6 +324,8 @@ struct BlockDriverState {
> >>  BlockDriver *drv; /* NULL means no media */
> >>  void *opaque;
> >>  
> >> +BlockBackend *blk;  /* owning backend, if any */
> >> +
> >>  void *dev;  /* attached device model, if any */
> >>  /* TODO change to DeviceState when all users are qdevified */
> >>  const BlockDevOps *dev_ops;
> >
> > Just to make sure that we agree on where we're going: This makes the
> > assumption that a BDS has at most one BB that owns it.
> 
> Yes.
> 
> >Which is not the
> > final state that we want to have, so this will have to go away later.
> 
> I don't know.  Can you explain why you think we're going to want
> multiple BBs?

We already agreed that we'll have multiple parents for a BDS, for
scenarios like having an NBD server on a snapshot or sharing backing
files, potentially also some block jobs.

The question is whether among these multiple parents we want to have a
limitation to one BlockBackend, forbidding e.g. an NBD server on the
active layer. This would be a problem for live storage migration if we
don't want the NBD server to reuse the same BB as the guest device.

More generally, if we can indirectly have multiple BBs on a single
BDS by putting a filter in between, do we have good reasons to forbid
having them attached directly?

> > (Where "later" isn't necessarily part of this series.)
> >
> > For now, the use of the field is limited to callbacks and
> > bdrv_get_device_name(). Callbacks could always only serve a single
> > device, so nothing became worse here.
> 
> In *this* patch, member blk is only read in bdrv_swap(), which asserts
> it's null.  Later on in the series, it gets indeed used as you describe.

Yes, my "now" depends on context and either refers to the patch I'm
commenting on or the end of the series. In most cases when I see
something that I feel is worth having a closer look, the first thing I
do is looking at the fully applied series.

> PATCH 22 puts it to use for BlockDevOps callbacks.  The patch moves the
> callbacks from BDS to BB.  I hope you'll agree that's where they belong.
> 
> Naturally, the *calls* of the callbacks remain where they are, in
> block.c.  They get updated like this:
> 
> -   bdrv_dev_FOO(bs, ARGS)
> +   if (bs->blk) {
> +   blk_dev_FOO(bs->blk ARGS)
> +   }

Yes, as I said, this is fine for now. When we allow multiple BBs, we'll
have to turn it into something like notifier lists, but that can wait.

> PATCH 08 uses it to eliminate BDS member device_name[].
> 
> > I'm not entirely sure about bdrv_get_device_name(), whether it needs to
> > go or to be rewritten to get the name of any BB pointing to it (I
> > suspect for most callers we want to replace it by something that uses
> > node-name by default if there is one and only fall back to BB names if
> > there isn't), but that's not an issue to block this patch.
> 
> I agree users of bdrv_get_device_name() need to be examined, and the
> ones that really want a BDS name should probably be changed to use the
> BDS name (a.k.a. node-name) and fall back to the BB name.
> 
> This series makes this need more visible, by emphasizing the
> distinctness of the two names.
> 
> Aside: which one to fall back to if we have multiple BBs?

My first attempt would be "any", and in cases where this isn't good
enough, you can't use a fallback at all.

> > What I would consider, however, is adding a TODO comment that tells
> > people that this field needs to go and if you need to use it, something
> > is wrong with your design (which happens to be true for the existing
> > design of some code).
> 
> For the device callbacks, we need a way to find the BB.  If multiple BBs
> can sit on top of the same BDS, we need to find the one with a device
> models attached.  Ot even the ones, if we permit that.
> 
> Let's discuss this a bit, and depending on what we learn, add a suitable
> comment.  Possibly on top.

Are you sure that nothing else than device models can be interested in
callbacks? I expect that whatever block layer user we have, they will
always be interested in resizes, for example. Media change might also
not be entirely uninteresting, though in most cases what other users
want is probably a blocker.

Kevin

Re: [Qemu-devel] [PATCH 0/3] Fix confused output for alias properties

2014-09-23 Thread Markus Armbruster

Paolo Bonzini  writes:

> Il 22/09/2014 13:43, Markus Armbruster ha scritto:
>> Paolo Bonzini  writes:
>> 
>>> Il 22/09/2014 10:34, Michael S. Tsirkin ha scritto:
 Basically this patch brings back historical HMP behaviour.
 As far as I could tell, it wasn't changed intentionally.
>>>
>>> It was changed intentionally.  Or rather, the change was known at the
>>> time Stefan made it.
>> 
>> Commit hash?
>
> There are three commits involved.
>
> The first is commit f4eb32b (qmp: show QOM properties in
> device-list-properties, 2014-05-20).  It was done in preparation for the
> change of virtio-blk-pci.drive from qdev property to QOM property, to
> avoid breaking device-list-properties.

Yes.

Unless there were *no* QOM properties then (other than the implicit ones
listed in the commit message), it also made the help complete again,
which I'd regard as regression fix.

> The actual change took some more time to review, so it went in one month
> later.  Commit caffdac (virtio-blk: use aliases instead of duplicate
> qdev properties, 2014-06-18) is what changed virtio-blk-pci.drive from
> qdev property to QOM property.  This changed the type from 'drive' to
> 'str' in device-list-properties.  (Note that this patch fixed an actual
> bug, it was not just for the sake of cleanup).
>
> I cannot find any reference to the change; perhaps it was discussed only
> on IRC.  However, I'm quite certain that I knew about it.

If memory serves, I didn't.

> Now one thing did slip through; commit caffdac actually dropped
> virtio-blk-pci.drive from -device virtio-blk-pci,help.  Either I
> misremembered that the first commit fixed command-line help too, or I
> just assumed that -device foo,help was implemented on top of
> qmp_device_list_properties.  Which wasn't the case,

Happens.

> so the third commit
> (9ef52358, qdev-monitor: include QOM properties in -device FOO, help
> output, 2014-07-09) did the right thing and brought back

You mean commit ef52358.

> virtio-blk-pci.drive into the help message.
>
> Now, the cover latter for that patch finally has a hint that the change
> was known.  http://thread.gmane.org/gmane.comp.emulators.qemu/285819 has
> this text:
>
>We simply need to update -device FOO,help code to use both qdev and
>QOM properties.  Note that types change because a 'drive' qdev type
>is actually a 'str' QOM type.  We're moving more and more to QOM
>properties where the final type for this property would be
>'link' or similar.
>
> It is only in the cover letter and thus not part of any commit message.

I missed it.  Of course I might have missed it even if the commit
message had spelled it out.

The cover letter shows the people involved in the patch were aware of
the help change.  The clause "the final type for this property would be
'link' or similar" can perhaps be read as "yes, the change is a
degradation, but it's a temporary one".  Sounds sensible, except our
path to this "final type" is still unclear, and we have no ETA.  Makes
the degradation rather less temporary.

>> I haven't got this part of the code present in my mind at the moment,
>> but I'm willing to take your assertion of a layering violation for
>> granted.  Is this violation necessary for fixing the regression, or is
>> it just an artifact of this particular fix?
>
> I guess we could always add some ad-hoc mechanism for
> device-list-properties to get to the "drive" string, and smuggle it as a
> QOM feature.  Then, aliases would just forward that mechanism to the
> aliased property, which would just work.
>
> Of course, with every new feature we would most likely have yet another
> unfinished transition.  In the lack of a clear user complaint (or even
> of a clear indication that human users ever used -device foo,help...)
> the tempation to pass the buck is strong.

I use it all the time, both interactively to get help, and
programmatically to answer questions about the code or a configuration.

I think elsewhere in this thread, we found a way to improve help that
doesn't involve ad hoc hackery to object.  It does start yet another
transition, though.

Thanks for your explanation, and your patience.

1 2 3 >

1 - 100 of 232 matches

Mail list logo