[Qemu-devel] [Bug 1527765] Re: sh4: ghc randomly segfaults on qemu-sh4-static

2015-12-25 Thread John Paul Adrian Glaubitz
Interestingly, cmake also seems to crash in a similar way:

- Log: https://buildd.debian.org/status/fetch.php?pkg=apt-cacher-ng&arch=sh4&ver=0.8.8-1&stamp=1450985460
- Log: https://buildd.debian.org/status/fetch.php?pkg=texworks&arch=sh4&ver=0.5~svn1363-6%2Bb1&stamp=1450992669
- Log: https://buildd.debian.org/status/fetch.php?pkg=x265&arch=sh4&ver=1.8-6&stamp=1450995672
- Log: https://buildd.debian.org/status/fetch.php?pkg=libwebsockets&arch=sh4&ver=1.6.0-2&stamp=1450997039

Maybe those are related?

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1527765

Title:
  sh4: ghc randomly segfaults on qemu-sh4-static

Status in QEMU:
  New

Bug description:
  Hello!

  I am currently in the process of bootstrapping ghc for the Debian sh4
  port and ran into a strange problem with qemu-sh4-static which
  randomly segfaults when running ghc to compile a Haskell source:

  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ls
  Main.hi  Main.hs  Setup.hs  ghc-pwd.cabal  ghc.mk
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  Bad interface file: /usr/local/lib/sh4-unknown-linux-gnu-ghc-7.10.3/time/dist-install/build/Data/Time/Format/Parse.hi
  ghc: panic! (the 'impossible' happened)
    (GHC version 7.10.3 for sh4-unknown-linux):
   getSymtabName:unknown known-key unique
  <>

  Please report this as a GHC bug:
  http://www.haskell.org/ghc/reportabug

  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  Linking Main ...
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd#

  As seen above, compiling a Haskell source file often results in a
  segfault, but simply re-running ghc over and over eventually lets
  the compile succeed without a segfault.

  I have created a tarball which contains the sh4 chroot from the
  example above which also includes ghc, gcc and the source code in
  question (in /root/ghc-7.8.4/utils/ghc-pwd). To test, it's probably a
  good idea to replace the qemu-sh4-static binary in /usr/bin with a
  current git snapshot (which I tried but didn't help).

  > http://users.physik.fu-berlin.de/~glaubitz/sid-sh4-sbuild-ghc.tgz

  In case anyone wants to try ghc with their own sh4 chroot, here's my
  version of ghc:

  > https://people.debian.org/~glaubitz/sh4-unknown-linux-gnu-ghc-7.10.3.tar.gz

  Just extract this tarball into the root directory of the sh4 chroot.

  Please note that it might be advisable on sh4 to apply the patches
  from these two bug reports, as otherwise qemu-sh4-static won't work
  properly on amd64 and lacks syscall 186:

  > https://bugs.launchpad.net/ubuntu/+source/qemu-linaro/+bug/1254824
  > https://bugs.launchpad.net/qemu/+bug/1516408

  The above issue is reproducible with the two patches applied and
  without. It's also reproducible with both libc6 2.19 and 2.21 in the
  chroot. Thus, I am currently out of ideas what else to test.

  Cheers,
  Adrian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1527765/+subscriptions



[Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block.c   |   8 ++--
 block/quorum.c| 122 +-
 include/block/block.h |   4 ++
 3 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index a347008..b9e99da 100644
--- a/block.c
+++ b/block.c
@@ -1196,10 +1196,10 @@ static int bdrv_fill_options(QDict **options, const char *filename,
 return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 6793f12..e73418c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -23,6 +23,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
 #include "crypto/hash.h"
+#include "qemu/bitmap.h"
 
 #define HASH_LENGTH 32
 
@@ -80,6 +81,8 @@ typedef struct BDRVQuorumState {
 bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
 * block if Quorum is reached.
 */
+unsigned long *index_bitmap;
+int bsize;
 
 QuorumReadPattern read_pattern;
 } BDRVQuorumState;
@@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 ret = -EINVAL;
 goto exit;
 }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
 error_setg(&local_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
 ret = -EINVAL;
 goto exit;
 }
@@ -926,6 +929,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 /* allocate the children array */
 s->children = g_new0(BdrvChild *, s->num_children);
 opened = g_new0(bool, s->num_children);
+s->index_bitmap = bitmap_new(s->num_children);
 
 for (i = 0; i < s->num_children; i++) {
 char indexstr[32];
@@ -941,6 +945,8 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 
 opened[i] = true;
 }
+bitmap_set(s->index_bitmap, 0, s->num_children);
+s->bsize = s->num_children;
 
 g_free(opened);
 goto exit;
@@ -997,6 +1003,115 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
 }
 }
 
+static int get_new_child_index(BDRVQuorumState *s)
+{
+int index;
+
+index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+if (index < s->bsize) {
+return index;
+}
+
+if ((s->bsize % BITS_PER_LONG) == 0) {
+s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+ s->bsize + 1);
+}
+
+return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+int last_index;
+long new_len;
+
+assert(index < s->bsize);
+
+clear_bit(index, s->index_bitmap);
+if (index < s->bsize - 1) {
+/*
+ * The last bit is always set, and we don't clear
+ * the last bit.
+ */
+return;
+}
+
+last_index = find_last_bit(s->index_bitmap, s->bsize);
+s->bsize = last_index + 1;
+if (BITS_TO_LONGS(last_index + 1) == BITS_TO_LONGS(s->bsize)) {
+return;
+}
+
+new_len = BITS_TO_LONGS(last_index + 1) * sizeof(unsigned long);
+s->index_bitmap = g_realloc(s->index_bitmap, new_len);
+}
+
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+BdrvChild *child;
+char indexstr[32];
+int index, ret;
+
+index = get_new_child_index(s);
+ret = snprintf(indexstr, 32, "children.%d", index);
+if (ret < 0 || ret >= 32) {
+error_setg(errp, "cannot generate child name");
+return;
+}
+
+bdrv_drain(bs);
+
+assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+error_setg(errp, "Too many children");
+return;
+}
+s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
+
+bdrv_ref(child_bs);
+child = bdrv_attach_child(bs, child_bs, indexstr, &child_format);
+s->children[s->num_children++] = child;
+set_bit(index, s->index_bitmap);
+}
+
+static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
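
The index bookkeeping in get_new_child_index() and remove_child_index()
above can be sketched outside QEMU as follows. This is only a minimal
illustration, not the code from the patch: IndexSet and the index_set_*
functions are hypothetical names, plain realloc() stands in for the
QEMU bitmap helpers, and the sketch never shrinks the allocation when
the top index is released.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(nbits) (((nbits) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for index_bitmap/bsize in BDRVQuorumState. */
typedef struct IndexSet {
    unsigned long *bits;   /* one bit per child index, set = in use */
    size_t nbits;          /* number of indexes currently tracked */
} IndexSet;

void index_set_init(IndexSet *s)
{
    s->bits = NULL;
    s->nbits = 0;
}

void index_set_destroy(IndexSet *s)
{
    free(s->bits);
    s->bits = NULL;
    s->nbits = 0;
}

static int index_in_use(const IndexSet *s, size_t i)
{
    return (s->bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* Return the lowest free index and mark it used; -1 if out of memory. */
int index_set_alloc(IndexSet *s)
{
    size_t i;

    for (i = 0; i < s->nbits; i++) {
        if (!index_in_use(s, i)) {
            break;
        }
    }
    if (i == s->nbits) {
        /* No hole found: track one more index, adding a word if needed. */
        if (WORDS_FOR(s->nbits + 1) > WORDS_FOR(s->nbits)) {
            size_t old_words = WORDS_FOR(s->nbits);
            unsigned long *p = realloc(s->bits,
                                       (old_words + 1) * sizeof(*p));
            if (!p) {
                return -1;
            }
            p[old_words] = 0;
            s->bits = p;
        }
        s->nbits++;
    }
    s->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
    return (int)i;
}

/* Free an index; trim the tracked range so its top index stays in use. */
void index_set_release(IndexSet *s, size_t index)
{
    assert(index < s->nbits);
    s->bits[index / BITS_PER_WORD] &= ~(1UL << (index % BITS_PER_WORD));
    while (s->nbits > 0 && !index_in_use(s, s->nbits - 1)) {
        s->nbits--;
    }
}
```

Releasing an index in the middle leaves a hole that the next allocation
reuses, so child names of the form "children.N" stay as small as
possible.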

[Qemu-devel] [PATCH v9 0/3] qapi: child add/delete support

2015-12-25 Thread Changlong Xie
If a quorum child is broken, we can use a mirror job to replace it.
But sometimes the user only needs to remove the broken child and
add it back later once the problem is fixed.

ChangeLog:
v9:
1. Rebase to the newest code
2. Remove redundant code in quorum_add_child() and quorum_del_child()
3. Fix typos in qmp-commands.hx
v8:
1. Rebase to the newest code
2. Address the comments from Eric Blake
v7:
1. Remove the qmp command x-blockdev-change's parameter operation according
   to Kevin's comments.
2. Remove the hmp command.
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
   and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. Drop the nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (3):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child

 block.c   |  58 --
 block/quorum.c| 122 +-
 blockdev.c|  54 
 include/block/block.h |   9 
 include/block/block_int.h |   5 ++
 qapi/block-core.json  |  23 +
 qmp-commands.hx   |  47 ++
 7 files changed, 312 insertions(+), 6 deletions(-)

-- 
1.9.3






[Qemu-devel] [PATCH v9 1/3] Add new block driver interface to add/delete a BDS's child

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 block.c   | 50 +++
 include/block/block.h |  5 +
 include/block/block_int.h |  5 +
 3 files changed, 60 insertions(+)

diff --git a/block.c b/block.c
index 411edbf..a347008 100644
--- a/block.c
+++ b/block.c
@@ -4320,3 +4320,53 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(json);
 }
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+Error **errp)
+{
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
+error_setg(errp, "The node %s doesn't support adding a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+if (!QLIST_EMPTY(&child_bs->parents)) {
+error_setg(errp, "The node %s already has a parent",
+   child_bs->node_name);
+return;
+}
+
+parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
+}
+
+void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+Error **errp)
+{
+BdrvChild *child;
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
+error_setg(errp, "The node %s doesn't support removing a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+QLIST_FOREACH(child, &parent_bs->children, next) {
+if (child->bs == child_bs) {
+break;
+}
+}
+
+if (!child) {
+error_setg(errp, "The node %s is not a child of %s",
+   bdrv_get_device_or_node_name(child_bs),
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index db8e096..863a7c8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -578,4 +578,9 @@ void bdrv_drained_begin(BlockDriverState *bs);
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
+Error **errp);
+void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
+Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 256609d..ebe8b1e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -303,6 +303,11 @@ struct BlockDriver {
  */
 void (*bdrv_drain)(BlockDriverState *bs);
 
+void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+   Error **errp);
+void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
+   Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
1.9.3
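
The dispatch pattern used by bdrv_add_child() above (refuse when the
driver has no callback, refuse when the child already has a parent,
otherwise hand off to the driver) can be sketched in isolation. The
types and names below (Node, Driver, node_add_child, simple_add_child)
are hypothetical stand-ins rather than QEMU APIs, and a plain string
pointer replaces the Error object.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical driver table; add_child is optional and may be NULL. */
typedef struct Driver {
    void (*add_child)(Node *parent, Node *child, const char **err);
} Driver;

struct Node {
    const char *name;
    const Driver *drv;
    Node *parent;          /* NULL while the node is unattached */
};

/* Generic entry point: validate first, then call the driver hook. */
void node_add_child(Node *parent, Node *child, const char **err)
{
    if (!parent->drv || !parent->drv->add_child) {
        *err = "parent does not support adding a child";
        return;
    }
    if (child->parent) {
        *err = "child already has a parent";
        return;
    }
    parent->drv->add_child(parent, child, err);
}

/* Example driver hook: just records the new parent. */
void simple_add_child(Node *parent, Node *child, const char **err)
{
    (void)err;
    child->parent = parent;
}

const Driver hotplug_driver = { .add_child = simple_add_child };
const Driver plain_driver = { .add_child = NULL };
```

Keeping the checks in the generic function means each driver callback
only has to deal with its own bookkeeping.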






[Qemu-devel] [PATCH v9 3/3] qmp: add monitor command to add/remove a child

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 blockdev.c   | 54 
 qapi/block-core.json | 23 ++
 qmp-commands.hx  | 47 +
 3 files changed, 124 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 64dbfeb..4e62fdf 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3836,6 +3836,60 @@ out:
 aio_context_release(aio_context);
 }
 
+static BlockDriverState *bdrv_find_child(BlockDriverState *parent_bs,
+ const char *child_name)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, &parent_bs->children, next) {
+if (strcmp(child->name, child_name) == 0) {
+return child->bs;
+}
+}
+
+return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+   const char *child, bool has_node,
+   const char *node, Error **errp)
+{
+BlockDriverState *parent_bs, *child_bs = NULL, *new_bs = NULL;
+
+parent_bs = bdrv_lookup_bs(parent, parent, errp);
+if (!parent_bs) {
+return;
+}
+
+if (has_child == has_node) {
+if (has_child) {
+error_setg(errp, "The parameters child and node are in conflict");
+} else {
+error_setg(errp, "Either child or node should be specified");
+}
+return;
+}
+
+if (has_child) {
+child_bs = bdrv_find_child(parent_bs, child);
+if (!child_bs) {
+error_setg(errp, "Node '%s' doesn't have child %s",
+   parent, child);
+return;
+}
+bdrv_del_child(parent_bs, child_bs, errp);
+}
+
+if (has_node) {
+new_bs = bdrv_find_node(node);
+if (!new_bs) {
+error_setg(errp, "Node '%s' not found", node);
+return;
+}
+bdrv_add_child(parent_bs, new_bs, errp);
+}
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1a5d9ce..fe63c6d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2408,3 +2408,26 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a block driver state. Currently only
+# the Quorum driver implements this feature to add or remove its child.
+# This is useful to fix a broken quorum child.
+#
+# @parent: the id or name of the node that will be changed.
+#
+# @child: #optional the name of the child that will be deleted.
+#
+# @node: #optional the name of the node that will be added.
+#
+# Note: this command is experimental, and its API is not stable.
+#
+# Since: 2.6
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+ '*child': 'str',
+ '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7b235ee..efee0ca 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4293,6 +4293,53 @@ Example:
 EQMP
 
 {
+.name   = "x-blockdev-change",
+.args_type  = "parent:B,child:B?,node:B?",
+.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+},
+
+SQMP
+x-blockdev-change
+-----------------
+
+Dynamically reconfigure the block driver state graph. It can be used to
+add, remove, insert, or replace a block driver state. Currently only
+the Quorum driver implements this feature to add and remove its child.
+This is useful to fix a broken quorum child.
+
+Arguments:
+- "parent": the id or node name of the node that will be changed (json-string)
+- "child": the child name which will be deleted (json-string, optional)
+- "node": the new node-name which will be added (json-string, optional)
+
+Note: this command is experimental, and not a stable API. It doesn't
+support all kinds of operations, all kinds of children, nor all block
+drivers.
+
+Example:
+
+Add a new node to a quorum
+-> { "execute": "blockdev-add",
+"arguments": { "options": { "driver": "raw",
+"node-name": "new_node",
+"id": "test_new_node",
+"file": { "driver": "file",
+  "filename": "test.raw" } } } }
+<- { "return": {} }
+-> { "execute": "x-blockdev-change",
+"arguments": { "parent": "disk1",
+   "node": "new_node" } }
+<- { "return": {} }
+
+Delete a quorum's node
+-> { "execute": "x-blockdev

[Qemu-devel] [Bug 1529226] [NEW] qemu-i386-user on 32-bit Linux: uncaught target signal 11

2015-12-25 Thread PeteVine
Public bug reported:

Even though the command I'm trying to run (a wrapper script for
qemu-i386-user running rustc, the rust compiler)  produces the expected
compiled output, the build process is interrupted:

qemu: uncaught target signal 11 (Segmentation fault) - core dumped
i686-unknown-linux-gnu/stage0/bin/rustc: line 1:  7474 Segmentation fault  /usr/local/bin/qemu-i386 -cpu qemu32 /home/petevine/stage0/rustc.bin -C target-cpu=pentium2 -L /home/petevine/unpacked/rust-master/i686-unknown-linux-gnu/stage0/lib/rustlib/i686-unknown-linux-gnu/lib/ "$@"
make: *** [i686-unknown-linux-gnu/stage0/lib/rustlib/i686-unknown-linux-gnu/lib/stamp.rustc_back] Error 139

The stamp file is not being created so this could be about forking bash
after finishing the wrapper script.

Qemu was compiled from the latest git source.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1529226

Title:
  qemu-i386-user on 32-bit Linux: uncaught target signal 11

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1529226/+subscriptions



Re: [Qemu-devel] live migration vs device assignment (motivation)

2015-12-25 Thread Michael S. Tsirkin
On Fri, Dec 25, 2015 at 03:03:47PM +0800, Lan Tianyu wrote:
> Merry Christmas.
> Sorry for the late response due to a personal matter.
> 
> On 2015年12月14日 03:30, Alexander Duyck wrote:
> >> > This sounds like we need to add a fake bridge for migration and add a
> >> > driver in the guest for it. It also needs to extend the PCI bus/hotplug
> >> > driver to pause/resume other devices, right?
> >> >
> >> > My concern is still whether we can change PCI bus/hotplug like that
> >> > without a spec change.
> >> >
> >> > IRQ should be general for any devices and we may extend it for
> >> > migration. Device driver also can make decision to support migration
> >> > or not.
> > The device should have no say in the matter.  Either we are going to
> > migrate or we will not.  This is why I have suggested my approach as
> > it allows for the least amount of driver intrusion while providing the
> > maximum number of ways to still perform migration even if the device
> > doesn't support it.
> 
> Even if the device driver doesn't support migration, you still want to
> migrate the VM? That may be risky, and we should at least add a "bad path"
> for the driver.
> 
> > 
> > The solution I have proposed is simple:
> > 
> > 1.  Extend swiotlb to allow for a page dirtying functionality.
> > 
> >  This part is pretty straightforward.  I'll submit a few patches
> > later today as RFC that can provide the minimal functionality needed
> > for this.
> 
> I would very much appreciate that.
> 
> > 
> > 2.  Provide a vendor specific configuration space option on the QEMU
> > implementation of a PCI bridge to act as a bridge between direct
> > assigned devices and the host bridge.
> > 
> >  My thought was to add some vendor specific block that includes a
> > capabilities, status, and control register so you could go through and
> > synchronize things like the DMA page dirtying feature.  The bridge
> > itself could manage the migration capable bit inside QEMU for all
> > devices assigned to it.  So if you added a VF to the bridge it would
> > flag that you can support migration in QEMU, while the bridge would
> > indicate you cannot until the DMA page dirtying control bit is set by
> > the guest.
> > 
> >  We could also go through and optimize the DMA page dirtying after
> > this is added so that we can narrow down the scope of use, and as a
> > result improve the performance for other devices that don't need to
> > support migration.  It would then be a matter of adding an interrupt
> > in the device to handle an event such as the DMA page dirtying status
> > bit being set in the config space status register, while the bit is
> > not set in the control register.  If it doesn't get set then we would
> > have to evict the devices before the warm-up phase of the migration,
> > otherwise we can defer it until the end of the warm-up phase.
> > 
> > 3.  Extend existing shpc driver to support the optional "pause"
> > functionality as called out in section 4.1.2 of the Revision 1.1 PCI
> > hot-plug specification.
> 
> Since your solution already adds a fake PCI bridge, why not notify the
> bridge directly during migration via an IRQ and call the device driver's
> callback in the new bridge driver?
> 
> The new bridge driver can also check whether the device driver
> provides a migration callback and, if so, call it to improve the
> passthrough device's performance during migration.

As long as you keep up this vague talk about performance during
migration, without even bothering with any measurements, this patchset
will keep going nowhere.




There's Alex's patch that tracks memory changes during migration.  It
needs some simple enhancements to be useful in production (e.g. add a
host/guest handshake to both enable tracking in the guest and detect
support in the host); then it can allow starting migration with an
assigned device, by invoking hot-unplug after most of memory has been
migrated.

Please implement this in qemu and measure the speed.
I will not be surprised if destroying/creating a netdev in linux
turns out to take too long, but until someone bothers to
check, it does not make sense to discuss further enhancements.



> > 
> >  Note I call out "extend" here instead of saying to add this.
> > Basically what we should do is provide a means of quiescing the device
> > without unloading the driver.  This is called out as something the OS
> > vendor can optionally implement in the PCI hot-plug specification.  On
> > OSes that wouldn't support this it would just be treated as a standard
> > hot-plug event.   We could add a capability, status, and control bit
> > in the vendor specific configuration block for this as well and if we
> > set the status bit would indicate the host wants to pause instead of
> > remove and the control bit would indicate the guest supports "pause"
> > in the OS.  We then could optionally disable guest migration while the
> > VF is present and pause is not supported.
> > 
> >  To support this we would need 

Re: [Qemu-devel] [PATCH v4 1/4] target-tilegx: Add floating point shared functions

2015-12-25 Thread Chen Gang

On 12/25/15 04:01, Richard Henderson wrote:
> On 12/24/2015 07:38 AM, Chen Gang wrote:
>>
>> OK, thanks. Since fp_status needs to be initialized to 0, I will
>> declare it statically, too (do we need to consider thread safety for it?
>> I guess not).
> 
> While qemu is not currently thread-safe, there's work going on to make that 
> happen.  There is no need to exacerbate the problem.
> 

OK, thanks.

> Also, I think using an on-stack automatic variable, initialized each time, 
> emphasizes the fact that there is no state that is preserved across 
> operations.
> 
> This should really be as simple as
> 
>   float_status fp_status = {
>       .float_rounding_mode = float_round_nearest_even
>   };
> 
> (I realize float_round_nearest_even is *also* zero, but humor me.  At least 
> the other members are either flags or booleans.)
> 

OK, thanks.
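
A minimal sketch of the on-stack approach suggested above, assuming a
simplified, hypothetical status struct (FpStatus) and a toy operation
(status_div) instead of QEMU's real float_status and softfloat calls.
Because the status is re-created on every call, flags raised by one
operation cannot leak into the next, and no shared state is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a softfloat-style status. */
enum { ROUND_NEAREST_EVEN = 0, ROUND_TO_ZERO = 1 };
enum { FLAG_DIVBYZERO = 1 << 0 };

typedef struct FpStatus {
    int rounding_mode;
    int exception_flags;
    bool flush_to_zero;
} FpStatus;

/* Toy operation: records a flag in the status instead of trapping. */
int status_div(int a, int b, FpStatus *st)
{
    if (b == 0) {
        st->exception_flags |= FLAG_DIVBYZERO;
        return 0;
    }
    return a / b;
}

/* Each call starts from a fresh automatic status; nothing is sticky. */
int div_fresh(int a, int b, int *flags_out)
{
    /* Members not named in the initializer are zero-initialized. */
    FpStatus st = { .rounding_mode = ROUND_NEAREST_EVEN };
    int result = status_div(a, b, &st);

    *flags_out = st.exception_flags;
    return result;
}
```

A static status would need explicit resetting between operations and
would be shared between threads; the automatic variable avoids both.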

-- 
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed



[Qemu-devel] [PATCH v13 02/10] Store parent BDS in BdrvChild

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

We need to access the parent BDS to get the root BDS.

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
---
 block.c   | 1 +
 include/block/block_int.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 1589c0d..c9c913e 100644
--- a/block.c
+++ b/block.c
@@ -1204,6 +1204,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
 .bs = child_bs,
+.parent = parent_bs,
 .name   = g_strdup(child_name),
 .role   = child_role,
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ebe8b1e..19c02b6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -361,6 +361,7 @@ extern const BdrvChildRole child_format;
 
 struct BdrvChild {
 BlockDriverState *bs;
+BlockDriverState *parent;
 char *name;
 const BdrvChildRole *role;
 QLIST_ENTRY(BdrvChild) next;
-- 
1.9.3






[Qemu-devel] [PATCH v13 03/10] Backup: clear all bitmap when doing block checkpoint

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Jeff Cody 
---
 block/backup.c   | 14 ++
 blockjob.c   | 11 +++
 include/block/blockjob.h | 12 
 3 files changed, 37 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 705bb77..0a27d01 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -253,11 +253,25 @@ static void backup_abort(BlockJob *job)
 }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "The backup job only supports block checkpoint in"
+   " sync=none mode");
+return;
+}
+
+hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
 .set_speed  = backup_set_speed,
 .iostatus_reset = backup_iostatus_reset,
+.do_checkpoint  = backup_do_checkpoint,
 .commit = backup_commit,
 .abort  = backup_abort,
 };
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..0c8edfe 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -533,3 +533,14 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
 QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
 block_job_txn_ref(txn);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+if (!job->driver->do_checkpoint) {
+error_setg(errp, "The job %s doesn't support block checkpoint",
+   BlockJobType_lookup[job->driver->job_type]);
+return;
+}
+
+job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..abdba7c 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,9 @@ typedef struct BlockJobDriver {
  * never both.
  */
 void (*abort)(BlockJob *job);
+
+/** Optional callback for job types that support checkpoint. */
+void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -443,4 +446,13 @@ void block_job_txn_unref(BlockJobTxn *txn);
  */
 void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
1.9.3






[Qemu-devel] [PATCH v13 01/10] unblock backup operations in backing file

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
---
 block.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index b9e99da..1589c0d 100644
--- a/block.c
+++ b/block.c
@@ -1275,6 +1275,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
 /* Otherwise we won't be able to commit due to check in bdrv_commit */
 bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
 bs->backing_blocker);
+/*
+ * We do backup in 3 ways:
+ * 1. drive backup
+ *The target bs is newly opened, and the source is the top BDS
+ * 2. blockdev backup
+ *Both the source and the target are top BDSes.
+ * 3. internal backup(used for block replication)
+ *Both the source and the target are backing files
+ *
+ * In case 1, and 2, the backing file is neither the source nor
+ * the target.
+ * In case 3, we will block the top BDS, so there is only one block
+ * job for the top BDS and its backing chain.
+ */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+bs->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs->backing_blocker);
 out:
 bdrv_refresh_limits(bs, NULL);
 }
-- 
1.9.3






[Qemu-devel] [PATCH v13 04/10] Allow creating backup jobs when opening BDS

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
1.9.3






[Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints

2015-12-25 Thread Changlong Xie
Block replication is an important feature used for continuous
checkpoints (for example, COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v13

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v12

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Change Log:
V13:
1. Rebase to the newest code
2. Remove redundant macros and semicolon in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest code
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
   opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
   when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
   its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's
   suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver, replication, to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (10):
  unblock backup operations in backing file
  Store parent BDS in BdrvChild
  Backup: clear all bitmap when doing block checkpoint
  Allow creating backup jobs when opening BDS
  docs: block replication's description
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  support replication driver in blockdev-add
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c| 145 
 block/Makefile.objs|   3 +-
 block/backup.c |  14 ++
 block/quorum.c |  78 +++
 block/replication.c| 545 +
 blockjob.c |  11 +
 docs/block-replication.txt | 227 +++
 include/block/block.h  |   9 +
 include/block/block_int.h  |  15 ++
 include/block/blockjob.h   |  12 +
 qapi/block-core.json   |  33 ++-
 11 files changed, 1089 insertions(+), 3 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
1.9.3






[Qemu-devel] [PATCH v13 09/10] support replication driver in blockdev-add

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 610da92..7354c6a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -220,6 +220,7 @@
 #   2.2: 'archipelago' added, 'cow' dropped
 #   2.3: 'host_floppy' deprecated
 #   2.5: 'host_floppy' dropped
+#   2.6: 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1492,6 +1493,7 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @replication: Since 2.6
 #
 # Since: 2.0
 ##
@@ -1499,8 +1501,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
 'http', 'https', 'null-aio', 'null-co', 'parallels',
-'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-'vmdk', 'vpc', 'vvfat' ] }
+'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication',
+'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsBase
@@ -1940,6 +1942,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.6
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
@@ -1976,6 +1991,7 @@
   'quorum': 'BlockdevOptionsQuorum',
   'raw':'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+  'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
   'tftp':   'BlockdevOptionsFile',
-- 
1.9.3






[Qemu-devel] [PATCH v13 07/10] quorum: implement block driver interfaces for block replication

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Alberto Garcia 
---
 block/quorum.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index e73418c..aa8c4dd 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -85,6 +85,8 @@ typedef struct BDRVQuorumState {
 int bsize;
 
 QuorumReadPattern read_pattern;
+
+int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -949,6 +951,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 s->bsize = s->num_children;
 
 g_free(opened);
+s->replication_index = -1;
 goto exit;
 
 close_exit:
@@ -1146,6 +1149,77 @@ static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+int count = 0, i, index;
+Error *local_err = NULL;
+
+/*
+ * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+ * QEMU becoming primary QEMU.
+ */
+if (mode != REPLICATION_MODE_PRIMARY) {
+error_setg(errp, "The replication mode for quorum should be 'primary'");
+return;
+}
+
+if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+error_setg(errp, "Block replication needs read pattern 'fifo'");
+return;
+}
+
+for (i = 0; i < s->num_children; i++) {
+bdrv_start_replication(s->children[i]->bs, mode, &local_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+} else {
+count++;
+index = i;
+}
+}
+
+if (count == 0) {
+error_setg(errp, "No child supports block replication");
+} else if (count > 1) {
+for (i = 0; i < s->num_children; i++) {
+bdrv_stop_replication(s->children[i]->bs, false, NULL);
+}
+error_setg(errp, "Too many children support block replication");
+} else {
+s->replication_index = index;
+}
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_do_checkpoint(s->children[s->replication_index]->bs, errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not running");
+return;
+}
+
+bdrv_stop_replication(s->children[s->replication_index]->bs, failover,
+  errp);
+s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1172,6 +1246,10 @@ static BlockDriver bdrv_quorum = {
 
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+.bdrv_start_replication = quorum_start_replication,
+.bdrv_do_checkpoint = quorum_do_checkpoint,
+.bdrv_stop_replication  = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.9.3
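
Editor's note: quorum_start_replication() above accepts the start request
only when exactly one child supports block replication. A minimal sketch
of that selection rule, detached from QEMU, might look like the
following. The function name and the two result codes are hypothetical
and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch (not QEMU identifiers). */
enum { TOY_NO_CHILD = -1, TOY_TOO_MANY = -2 };

/*
 * started[i] says whether starting replication on child i succeeded.
 * Returns the index of the single child that supports replication,
 * TOY_NO_CHILD if none does, or TOY_TOO_MANY if more than one does.
 */
static int toy_pick_replication_child(const bool *started, size_t n)
{
    int count = 0;
    int index = TOY_NO_CHILD;

    for (size_t i = 0; i < n; i++) {
        if (started[i]) {
            count++;
            index = (int)i;
        }
    }

    if (count == 0) {
        return TOY_NO_CHILD;
    }
    if (count > 1) {
        return TOY_TOO_MANY;
    }
    return index;
}
```

In the patch, the "too many" case additionally stops replication on all
children again before reporting the error; the sketch leaves that
cleanup to the caller.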






[Qemu-devel] [PATCH v13 06/10] Add new block driver interfaces to control block replication

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Cc: Luiz Capitulino 
Cc: Michael Roth 
Reviewed-by: Paolo Bonzini 
---
 block.c   | 43 +++
 include/block/block.h |  5 +
 include/block/block_int.h | 14 ++
 qapi/block-core.json  | 13 +
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index c9c913e..275d8b4 100644
--- a/block.c
+++ b/block.c
@@ -4389,3 +4389,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 
 parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_start_replication) {
+drv->bdrv_start_replication(bs, mode, errp);
+} else if (bs->file) {
+bdrv_start_replication(bs->file->bs, mode, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support starting block"
+   " replication", bs->filename);
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_do_checkpoint) {
+drv->bdrv_do_checkpoint(bs, errp);
+} else if (bs->file) {
+bdrv_do_checkpoint(bs->file->bs, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support block checkpoint",
+   bs->filename);
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_stop_replication) {
+drv->bdrv_stop_replication(bs, failover, errp);
+} else if (bs->file) {
+bdrv_stop_replication(bs->file->bs, failover, errp);
+} else {
+error_setg(errp, "The BDS %s doesn't support stopping block"
+   " replication", bs->filename);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 6c7e54b..5d47cef 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -587,4 +587,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
 void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
 Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 19c02b6..e31f9db 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -308,6 +308,20 @@ struct BlockDriver {
 void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
Error **errp);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shut down, we should drop Disk buffer and stop
+ * block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index fe63c6d..610da92 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1927,6 +1927,19 @@
 '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.6
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
-- 
1.9.3
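
Editor's note: the three bdrv_*() wrappers in this patch share one
dispatch pattern: call the driver callback if it exists, otherwise
recurse into bs->file, otherwise report an error. The sketch below
models only that pattern with hypothetical Toy* types. It returns a bool
where the patch sets an Error through errp.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyNode ToyNode;

typedef struct ToyDriver {
    /* Optional callback; NULL means no replication support here. */
    bool (*start_replication)(ToyNode *node);
} ToyDriver;

struct ToyNode {
    const ToyDriver *drv;   /* may be NULL */
    ToyNode *file;          /* underlying node, may be NULL */
    int started;            /* how often the callback ran on this node */
};

/* Example callback that just records that it was invoked. */
static bool toy_start_cb(ToyNode *node)
{
    node->started++;
    return true;
}

/* Same shape as bdrv_start_replication(): driver first, then bs->file. */
static bool toy_start_replication(ToyNode *node)
{
    if (node->drv && node->drv->start_replication) {
        return node->drv->start_replication(node);
    } else if (node->file) {
        return toy_start_replication(node->file);
    }
    return false;   /* the patch calls error_setg() at this point */
}
```

With a format node that has no callback stacked on a protocol node that
has one, the request ends up at the protocol node; a node with neither a
callback nor a file child fails.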






[Qemu-devel] [PATCH v13 05/10] docs: block replication's description

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 docs/block-replication.txt | 227 +
 1 file changed, 227 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..73abb65
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,227 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
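+         +----------------------+            +------------------------+
+         |Primary Write Requests|            |Secondary Write Requests|
+         +----------------------+            +------------------------+
+                   |                                       |
+                   |                                      (4)
+                   |                                       V
+                   |                              /-------------\
+                   |      Copy and Forward        |             |
+                   |---------(1)----------+       | Disk Buffer |
+                   |                      |       |             |
+                   |                     (3)      \-------------/
+                   |                 speculative      ^
+                   |                write through    (2)
+                   |                      |           |
+                   V                      V           |
+            +--------------+           +----------------+
+            | Primary Disk |           | Secondary Disk |
+            +--------------+           +----------------+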
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content (it could be from either "Secondary Write Requests" or
+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                               virtio-blk
+                  |                                                                   ^
+                3 NBD  ------->  3 NBD                                                |
+                client    ||     server                                           2 filter
+                          ||        ^                                                 ^
+--------.                 ||        |                                                 |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none 6

[Qemu-devel] [PATCH v13 08/10] Implement new driver for block replication

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block/Makefile.objs |   1 +
 block/replication.c | 545 
 2 files changed, 546 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fa05f37..94c1d03 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 000..6a061c9
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,545 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 Intel Corporation
+ * Copyright (c) 2015 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+ReplicationMode mode;
+int replication_state;
+BlockDriverState *active_disk;
+BlockDriverState *hidden_disk;
+BlockDriverState *secondary_disk;
+BlockDriverState *top_bs;
+Error *blocker;
+int orig_hidden_flags;
+int orig_secondary_flags;
+int error;
+} BDRVReplicationState;
+
+enum {
+BLOCK_REPLICATION_NONE, /* block replication is not started */
+BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+BLOCK_REPLICATION_DONE, /* block replication is done(failover) */
+};
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define REPLICATION_MODE"mode"
+static QemuOptsList replication_runtime_opts = {
+.name = "replication",
+.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+.desc = {
+{
+.name = REPLICATION_MODE,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+int flags, Error **errp)
+{
+int ret;
+BDRVReplicationState *s = bs->opaque;
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+const char *mode;
+
+ret = -EINVAL;
+opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+goto fail;
+}
+
+mode = qemu_opt_get(opts, REPLICATION_MODE);
+if (!mode) {
+error_setg(&local_err, "Missing the option mode");
+goto fail;
+}
+
+if (!strcmp(mode, "primary")) {
+s->mode = REPLICATION_MODE_PRIMARY;
+} else if (!strcmp(mode, "secondary")) {
+s->mode = REPLICATION_MODE_SECONDARY;
+} else {
+error_setg(&local_err,
+   "The option mode's value should be primary or secondary");
+goto fail;
+}
+
+ret = 0;
+
+fail:
+qemu_opts_del(opts);
+/* propagate error */
+if (local_err) {
+error_propagate(errp, local_err);
+}
+return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+BDRVReplicationState *s = bs->opaque;
+
+if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+replication_stop(bs, false, NULL);
+}
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs->file->bs);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+switch (s->replication_state) {
+case BLOCK_REPLICATION_NONE:
+return -EIO;
+case BLOCK_REPLICATION_RUNNING:
+return 0;
+case BLOCK_REPLICATION_DONE:
+return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+default:
+abort();
+}
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+if (s->mode == REPLICATION_MODE_SECONDARY) {
+return ret;
+}
+
+if (ret < 0) {
+s->error = ret;
+ret = 0;
+}
+
+return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+ QEMUIOVector *qiov)
+{
+BDRVReplicationState *s = bs->opaque;
+int ret;
+
+if (s->mode == REPLICATION_MODE_PRIMARY) {
+/* We only use it to forward primary write requests */
+return -EIO;
+}
+
+ret = replication_get_io_status(s);
+if (ret < 0) {
+return ret;
+}
+
+/*
+ * After failover, because we don't 

[Qemu-devel] [PATCH v13 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes

2015-12-25 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block.c   | 83 +++
 include/block/block.h |  4 +++
 2 files changed, 87 insertions(+)

diff --git a/block.c b/block.c
index 275d8b4..634cc97 100644
--- a/block.c
+++ b/block.c
@@ -4432,3 +4432,86 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
" replication", bs->filename);
 }
 }
+
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp)
+{
+BlockDriverState *bs = NULL, *temp = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_start_replication(bs, mode, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+}
+
+return;
+
+fail:
+while ((temp = bdrv_next(temp)) && bs != temp) {
+bdrv_stop_replication(temp, false, NULL);
+}
+}
+
+void bdrv_do_checkpoint_all(Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_do_checkpoint(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
+void bdrv_stop_replication_all(bool failover, Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+
+while ((bs = bdrv_next(bs))) {
+if (!QLIST_EMPTY(&bs->parents)) {
+/* It is not top BDS */
+continue;
+}
+
+if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+continue;
+}
+
+bdrv_stop_replication(bs, failover, &local_err);
+if (!errp) {
+/*
+ * The caller doesn't care the result, they just
+ * want to stop all block's replication.
+ */
+continue;
+}
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 5d47cef..9c4de14 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -592,4 +592,8 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
 
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp);
+void bdrv_do_checkpoint_all(Error **errp);
+void bdrv_stop_replication_all(bool failover, Error **errp);
+
 #endif
-- 
1.9.3
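
Editor's note: bdrv_start_replication_all() above walks every BDS, skips
those that are not top-level, read-only, or without a medium, and on the
first failure stops replication on everything visited before the failing
BDS. A minimal model of that start-with-rollback loop follows. ToyDisk
and its fields are hypothetical, and the flags stand in for the real
checks (QLIST_EMPTY(&bs->parents), bdrv_is_read_only(),
bdrv_is_inserted()).

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyDisk {
    bool is_top;      /* has no parents */
    bool writable;    /* not read-only and a medium is inserted */
    bool can_start;   /* whether starting replication would succeed */
    bool running;     /* set once replication has been started */
} ToyDisk;

static bool toy_eligible(const ToyDisk *d)
{
    return d->is_top && d->writable;
}

/* Returns true if every eligible disk was started; otherwise undoes. */
static bool toy_start_all(ToyDisk *disks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_eligible(&disks[i])) {
            continue;
        }
        if (!disks[i].can_start) {
            /* Roll back everything visited before the failing disk. */
            for (size_t j = 0; j < i; j++) {
                disks[j].running = false;
            }
            return false;
        }
        disks[i].running = true;
    }
    return true;
}
```

As in the fail: path of the patch, the rollback covers every earlier
entry, including ones that were skipped and never started.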






[Qemu-devel] [PATCH] Add missing syscall nrs. This updates the QEMU syscall tables to more recent Linux kernels.

2015-12-25 Thread Johan Ouwerkerk
This change covers arm, aarch64, mips. Others to follow?

The change was prompted by QEMU warning about syscall 384 (getrandom()) with
Debian armhf binaries (ARMv7).

Signed-off-by: Johan Ouwerkerk 
---
 linux-user/aarch64/syscall_nr.h | 13 +
 linux-user/arm/syscall_nr.h | 12 
 linux-user/mips/syscall_nr.h| 12 
 3 files changed, 37 insertions(+)

diff --git a/linux-user/aarch64/syscall_nr.h b/linux-user/aarch64/syscall_nr.h
index 743255d..74f4275 100644
--- a/linux-user/aarch64/syscall_nr.h
+++ b/linux-user/aarch64/syscall_nr.h
@@ -262,6 +262,19 @@
 #define TARGET_NR_process_vm_writev 271
 #define TARGET_NR_kcmp 272
 #define TARGET_NR_finit_module 273
+
+#define TARGET_NR_sched_setattr 274
+#define TARGET_NR_sched_getattr 275
+#define TARGET_NR_renameat2 276
+#define TARGET_NR_seccomp 277
+#define TARGET_NR_getrandom 278
+#define TARGET_NR_memfd_create 279
+#define TARGET_NR_bpf 280
+#define TARGET_NR_execveat 281
+#define TARGET_NR_userfaultfd 282
+#define TARGET_NR_membarrier 283
+#define TARGET_NR_mlock2 284
+
 #define TARGET_NR_open 1024
 #define TARGET_NR_link 1025
 #define TARGET_NR_unlink 1026
diff --git a/linux-user/arm/syscall_nr.h b/linux-user/arm/syscall_nr.h
index 53552be..cc9089c 100644
--- a/linux-user/arm/syscall_nr.h
+++ b/linux-user/arm/syscall_nr.h
@@ -384,3 +384,15 @@
 #define TARGET_NR_process_vm_writev(377)
 #define TARGET_NR_kcmp (378)
 #define TARGET_NR_finit_module (379)
+
+#define TARGET_NR_sched_setattr(380)
+#define TARGET_NR_sched_getattr(381)
+#define TARGET_NR_renameat2(382)
+#define TARGET_NR_seccomp  (383)
+#define TARGET_NR_getrandom(384)
+#define TARGET_NR_memfd_create (385)
+#define TARGET_NR_bpf  (386)
+#define TARGET_NR_execveat (387)
+#define TARGET_NR_userfaultfd  (388)
+#define TARGET_NR_membarrier   (389)
+#define TARGET_NR_mlock2   (390)
diff --git a/linux-user/mips/syscall_nr.h b/linux-user/mips/syscall_nr.h
index 2d1a13e..6819f86 100644
--- a/linux-user/mips/syscall_nr.h
+++ b/linux-user/mips/syscall_nr.h
@@ -351,3 +351,15 @@
 #define TARGET_NR_process_vm_writev (TARGET_NR_Linux + 346)
 #define TARGET_NR_kcmp  (TARGET_NR_Linux + 347)
 #define TARGET_NR_finit_module  (TARGET_NR_Linux + 348)
+
+#define TARGET_NR_sched_setattr (TARGET_NR_Linux + 349)
+#define TARGET_NR_sched_getattr (TARGET_NR_Linux + 350)
+#define TARGET_NR_renameat2 (TARGET_NR_Linux + 351)
+#define TARGET_NR_seccomp   (TARGET_NR_Linux + 352)
+#define TARGET_NR_getrandom (TARGET_NR_Linux + 353)
+#define TARGET_NR_memfd_create  (TARGET_NR_Linux + 354)
+#define TARGET_NR_bpf   (TARGET_NR_Linux + 355)
+#define TARGET_NR_execveat  (TARGET_NR_Linux + 356)
+#define TARGET_NR_userfaultfd   (TARGET_NR_Linux + 357)
+#define TARGET_NR_membarrier(TARGET_NR_Linux + 358)
+#define TARGET_NR_mlock2(TARGET_NR_Linux + 359)
-- 
2.6.4
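
Editor's note: the tables above give the same syscall a different number
on each target, which is why an emulated arm binary asking for 384 needs
its own entry. A small lookup sketch using the getrandom numbers from
this patch is shown below. The Toy* names are hypothetical, and the base
value 4000 for TARGET_NR_Linux is an assumption (the o32 MIPS ABI); it
is not visible in the hunks above.

```c
#include <stddef.h>
#include <string.h>

/* Assumed value of TARGET_NR_Linux for o32 MIPS; not shown in the patch. */
#define TOY_MIPS_O32_BASE 4000

typedef struct ToyNr {
    const char *arch;
    int nr;
} ToyNr;

/* getrandom numbers as added by the patch, one per target. */
static const ToyNr toy_getrandom_nr[] = {
    { "aarch64", 278 },
    { "arm",     384 },
    { "mips",    TOY_MIPS_O32_BASE + 353 },
};

/* Returns the target's getrandom number, or -1 for an unknown target. */
static int toy_getrandom_number(const char *arch)
{
    size_t n = sizeof(toy_getrandom_nr) / sizeof(toy_getrandom_nr[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_getrandom_nr[i].arch, arch) == 0) {
            return toy_getrandom_nr[i].nr;
        }
    }
    return -1;
}
```

A target that is missing from the table yields -1 here, which loosely
corresponds to the "unsupported syscall" warning that prompted the patch.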




[Qemu-devel] git send-email didn't arrive?

2015-12-25 Thread Johan Ouwerkerk
Recently I attempted to post a patch to the qemu-devel mailing list
through git send-email (as per the wiki) and while a copy of the
e-mail has hit my inbox, it doesn't seem to have hit the mailing list
judging by checking gmane.

Something seems to have gone wrong. The subject of the mail was:

[PATCH] Add missing syscall nrs. This updates the QEMU syscall tables
to more recent Linux kernels.

I used git format-patch with --stdout -1 -s > syscalls_v1.patch
I used git send-email with --to=qemu-devel@nongnu.org
--cc=qemu-triv...@nongnu.org syscalls_v1.patch
I have git sendemail set up as follows:

from = Johan Ouwerkerk 
smtpserver = smtp.gmail.com
smtpuser = jm.ouwerk...@gmail.com
smtpencryption = tls
suppresscc = self

(FWIW the mail was delivered with SMTP status code of 250, i.e. OK).
Does anything stand out as obviously wrong? And if so what should I
have done/used instead?

Regards,

-Johan Ouwerkerk



Re: [Qemu-devel] git send-email didn't arrive?

2015-12-25 Thread Johan Ouwerkerk
Scratch this? It seems to have arrived just now, finally, having taken
a mere 16 hours and 4 minutes to get there... ?



[Qemu-devel] [Bug 1529187] Re: vfio passtrhough fails at 'No available IOMMU models' on Intel BDW-EP platform

2015-12-25 Thread Alex Williamson
You've somehow managed to not load the vfio_iommu_type1 module.  The
vfio module requests it when loading.  If the module is not available at
that point, for example because the initramfs does not include the full
set of vfio modules, it needs to be loaded manually later.
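
Editor's note: one way to check whether vfio_iommu_type1 is currently
loaded is to look for it in /proc/modules. The helper below is only a
sketch with a hypothetical name. It assumes each line is shorter than
the buffer, and a driver built into the kernel (rather than as a module)
would not be listed there at all.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if `name` is the first space-terminated field of some
 * line in `path`, which is expected to use the /proc/modules layout.
 */
static bool toy_module_loaded(const char *path, const char *name)
{
    char line[512];
    size_t len = strlen(name);
    bool found = false;
    FILE *f = fopen(path, "r");

    if (!f) {
        return false;
    }
    while (fgets(line, sizeof(line), f)) {
        /* Require an exact name match, not just a common prefix. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ') {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling toy_module_loaded("/proc/modules", "vfio_iommu_type1") on the
host would then tell whether the module still has to be loaded by hand.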

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1529187

Title:
  vfio passtrhough fails at 'No available IOMMU models' on Intel BDW-EP
  platform

Status in QEMU:
  New

Bug description:
  Environment:
   
   Host OS (ia32/ia32e/IA64): ia32e
   Guest OS (ia32/ia32e/IA64): ia32e
   Guest OS Type (Linux/Windows): linux
   kvm.git Commit: da3f7ca3
   qemu.git Commit: 38a762fe 
   Host Kernel Version: 4.4.0-rc2
   Hardware: BDW EP (Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, Grantley-EP)

  Bug description:
   --
   When creating a guest with VT-d assignment using the vfio-pci driver, the
   guest cannot be created.
  Warning: 'No available IOMMU models'

  
  Reproduce steps:
   
   1. bind device to vfio-pci driver
   2. qemu-system-x86_64 -enable-kvm -m 512 -smp 2 -device vfio-pci,host=81:00.0 -net none -drive file=rhel7u2.qcow2,if=none,id=virtio-disk0 -device virtio-blk-pci,drive=virtio-disk0

  Current result:
   
   qemu-system-x86_64: -device vfio-pci,host=81:00.0: vfio: No available IOMMU models
   qemu-system-x86_64: -device vfio-pci,host=81:00.0: vfio: failed to setup container for group 41
   qemu-system-x86_64: -device vfio-pci,host=81:00.0: vfio: failed to get group 41
   qemu-system-x86_64: -device vfio-pci,host=81:00.0: Device initialization failed

  Expected result:
   
   guest can be created
  Basic root-causing log:
   --
  

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1529187/+subscriptions



[Qemu-devel] [PATCH v4 2/4] i386/acpi: make floppy controller object dynamic

2015-12-25 Thread Roman Kagan
Instead of statically declaring the floppy controller in DSDT, with its
_STA method depending on some obscure bit in the parent ISA bridge, add
the object dynamically to SSDT via AML API only when the controller is
present.

The _STA method is no longer necessary and is therefore dropped.  So are
the declarations of the fields indicating whether the controller is
enabled.

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Cc: John Snow 
Cc: Kevin Wolf 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - new patch (note that it conflicts with "[PATCH 50/74] pc: acpi: move
   FDC0 device from DSDT to SSDT" from Igor's series)
 - include test data updates to maintain bisectability

 hw/i386/acpi-build.c|  24 
 hw/i386/acpi-dsdt-isa.dsl   |  18 --
 hw/i386/acpi-dsdt.dsl   |   1 -
 hw/i386/q35-acpi-dsdt.dsl   |   7 ++-
 tests/acpi-test-data/pc/DSDT| Bin 3028 -> 2946 bytes
 tests/acpi-test-data/pc/SSDT| Bin 2486 -> 2554 bytes
 tests/acpi-test-data/pc/SSDT.bridge | Bin 4345 -> 4413 bytes
 tests/acpi-test-data/q35/DSDT   | Bin 7666 -> 7578 bytes
 8 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4cc1440..a01e909 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -113,6 +113,7 @@ typedef struct AcpiMiscInfo {
 unsigned dsdt_size;
 uint16_t pvpanic_port;
 uint16_t applesmc_io_base;
+bool has_fdc;
 } AcpiMiscInfo;
 
 typedef struct AcpiBuildPciBusHotplugState {
@@ -236,10 +237,15 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
 
 static void acpi_get_misc_info(AcpiMiscInfo *info)
 {
+ISADevice *fdc;
+
 info->has_hpet = hpet_find();
 info->tpm_version = tpm_get_version();
 info->pvpanic_port = pvpanic_port();
 info->applesmc_io_base = applesmc_port();
+
+fdc = pc_find_fdc0();
+info->has_fdc = !!fdc;
 }
 
 /*
@@ -1099,6 +1105,24 @@ build_ssdt(GArray *table_data, GArray *linker,
 aml_append(scope, aml_name_decl("_S5", pkg));
 aml_append(ssdt, scope);
 
+if (misc->has_fdc) {
+scope = aml_scope("\\_SB.PCI0.ISA");
+dev = aml_device("FDC0");
+
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0700")));
+
+crs = aml_resource_template();
+aml_append(crs, aml_io(AML_DECODE16, 0x03F2, 0x03F2, 0x00, 0x04));
+aml_append(crs, aml_io(AML_DECODE16, 0x03F7, 0x03F7, 0x00, 0x01));
+aml_append(crs, aml_irq_no_flags(6));
+aml_append(crs, aml_dma(AML_COMPATIBILITY, AML_NOTBUSMASTER,
+AML_TRANSFER8, 2));
+aml_append(dev, aml_name_decl("_CRS", crs));
+
+aml_append(scope, dev);
+aml_append(ssdt, scope);
+}
+
 if (misc->applesmc_io_base) {
 scope = aml_scope("\\_SB.PCI0.ISA");
 dev = aml_device("SMC");
diff --git a/hw/i386/acpi-dsdt-isa.dsl b/hw/i386/acpi-dsdt-isa.dsl
index 89caa16..061507d 100644
--- a/hw/i386/acpi-dsdt-isa.dsl
+++ b/hw/i386/acpi-dsdt-isa.dsl
@@ -47,24 +47,6 @@ Scope(\_SB.PCI0.ISA) {
 })
 }
 
-Device(FDC0) {
-Name(_HID, EisaId("PNP0700"))
-Method(_STA, 0, NotSerialized) {
-Store(FDEN, Local0)
-If (LEqual(Local0, 0)) {
-Return (0x00)
-} Else {
-Return (0x0F)
-}
-}
-Name(_CRS, ResourceTemplate() {
-IO(Decode16, 0x03F2, 0x03F2, 0x00, 0x04)
-IO(Decode16, 0x03F7, 0x03F7, 0x00, 0x01)
-IRQNoFlags() { 6 }
-DMA(Compatibility, NotBusMaster, Transfer8) { 2 }
-})
-}
-
 Device(LPT) {
 Name(_HID, EisaId("PNP0400"))
 Method(_STA, 0, NotSerialized) {
diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index 8dba096..aa50990 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -80,7 +80,6 @@ DefinitionBlock (
 , 3,
 CBEN, 1, // COM2
 }
-Name(FDEN, 1)
 }
 }
 
diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl
index 7be7b37..fcb9915 100644
--- a/hw/i386/q35-acpi-dsdt.dsl
+++ b/hw/i386/q35-acpi-dsdt.dsl
@@ -137,16 +137,13 @@ DefinitionBlock (
 COMB,   3,
 
 Offset(0x01),
-LPTD,   2,
-,   2,
-FDCD,   2
+LPTD,   2
 }
 OperationRegion(LPCE, PCI_Config, 0x82, 0x2)
 Field(LPCE, AnyAcc, NoLock, Preserve) {
 CAEN,   1,
 CBEN,   1,
-LPEN,   1,
-FDEN,   1
+LPEN,   1
 }
 }
 }
diff --git a/tests/acpi-test-data/pc/DSDT b/tests/acpi-test-data/pc/DSDT
index 
c658203db94a7e7db7c36fde99a7075a8d75498d..d8ebf12cc0ae9f

[Qemu-devel] [PATCH v4 0/4] i386: expose floppy-related objects in SSDT

2015-12-25 Thread Roman Kagan
Windows on UEFI systems is only capable of detecting the presence and
the type of floppy drives via corresponding ACPI objects.

Those objects are added in the last patch of the series; the three
preceding ones pave the way to it, by making the necessary data
public and by moving the whole floppy drive controller description into
runtime-generated SSDT.

Note that the series conflicts with Igor's patchset for dynamic DSDT, in
particular, with "[PATCH 50/74] pc: acpi: move FDC0 device from DSDT
to SSDT"; I haven't managed to avoid that while trying to address the
maintainers' comments.

Roman Kagan (4):
  i386/pc: expose identifying the floppy controller
  i386/acpi: make floppy controller object dynamic
  expose floppy drive geometry and CMOS type
  i386: populate floppy drive information in SSDT

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Cc: John Snow 
Cc: Kevin Wolf 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - make FDC object fully dynamic in a separate patch
 - split out support patches
 - include test data updates with the respective patches to maintain
   bisectability

changes since v2:
 - explicit endianness for buffer data
 - reorder code to reduce conflicts with dynamic DSDT patchset
 - update test data


 hw/block/fdc.c  |  11 +
 hw/i386/acpi-build.c|  92 
 hw/i386/acpi-dsdt-isa.dsl   |  18 ---
 hw/i386/acpi-dsdt.dsl   |   1 -
 hw/i386/pc.c|  46 ++
 hw/i386/q35-acpi-dsdt.dsl   |   7 +--
 include/hw/block/fdc.h  |   2 +
 include/hw/i386/pc.h|   3 ++
 tests/acpi-test-data/pc/DSDT| Bin 3028 -> 2946 bytes
 tests/acpi-test-data/pc/SSDT| Bin 2486 -> 2635 bytes
 tests/acpi-test-data/pc/SSDT.bridge | Bin 4345 -> 4494 bytes
 tests/acpi-test-data/q35/DSDT   | Bin 7666 -> 7578 bytes
 12 files changed, 137 insertions(+), 43 deletions(-)

-- 
2.5.0




[Qemu-devel] [PATCH v4 3/4] expose floppy drive geometry and CMOS type

2015-12-25 Thread Roman Kagan
Make it possible to query the geometry and the CMOS type of a floppy
drive outside of the respective source files.

This is useful, in particular, when dynamically building ACPI tables: it
makes it possible to populate the corresponding ACPI objects properly and
thus lets BIOS-less systems access the floppy drives.
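
As a rough standalone illustration of the geometry derivation this patch
exposes (not the QEMU code itself): SketchDrive, SketchGeometry,
sketch_get_geometry and the SKETCH_DBL_SIDES value are hypothetical
stand-ins for FDrive, the three output pointers, the new accessor and
FDISK_DBL_SIDES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value, for illustration only. */
#define SKETCH_DBL_SIDES 0x01

/* Hypothetical stand-in for the per-drive state; only the fields the
 * accessor reads are modelled. */
typedef struct SketchDrive {
    uint8_t flags;      /* SKETCH_DBL_SIDES when the medium has two sides */
    uint8_t max_track;  /* number of cylinders */
    uint8_t last_sect;  /* sectors per track */
} SketchDrive;

typedef struct SketchGeometry {
    uint8_t cylinders;
    uint8_t heads;
    uint8_t sectors;
} SketchGeometry;

/* Same derivation as the accessor added by the patch: cylinders and
 * sectors are copied, heads is inferred from the double-sided flag. */
static SketchGeometry sketch_get_geometry(const SketchDrive *drv)
{
    SketchGeometry g;

    assert(drv != NULL);
    g.cylinders = drv->max_track;
    g.heads = (drv->flags & SKETCH_DBL_SIDES) ? 2 : 1;
    g.sectors = drv->last_sect;
    return g;
}
```

A caller that wants "maximum" values rather than counts (as the _FDI
package in patch 4/4 does for cylinders and heads) would subtract one
from the respective fields.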

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Cc: John Snow 
Cc: Kevin Wolf 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - split out into a separate patch to facilitate review

 hw/block/fdc.c | 11 +++
 hw/i386/pc.c   |  2 +-
 include/hw/block/fdc.h |  2 ++
 include/hw/i386/pc.h   |  1 +
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 4292ece..c858c5f 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2408,6 +2408,17 @@ FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i)
 return isa->state.drives[i].drive;
 }
 
+void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
+uint8_t *heads, uint8_t *sectors)
+{
+FDCtrlISABus *isa = ISA_FDC(fdc);
+FDrive *drv = &isa->state.drives[i];
+
+*cylinders = drv->max_track;
+*heads = (drv->flags & FDISK_DBL_SIDES) ? 2 : 1;
+*sectors = drv->last_sect;
+}
+
 static const VMStateDescription vmstate_isa_fdc ={
 .name = "fdc",
 .version_id = 2,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c36b8cf..99fab83 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -199,7 +199,7 @@ static void pic_irq_request(void *opaque, int irq, int level)
 
 #define REG_EQUIPMENT_BYTE  0x14
 
-static int cmos_get_fd_drive_type(FDriveType fd0)
+int cmos_get_fd_drive_type(FDriveType fd0)
 {
 int val;
 
diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index d48b2f8..adaf3dc 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -22,5 +22,7 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
DriveInfo **fds, qemu_irq *fdc_tc);
 
 FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
+void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
+uint8_t *heads, uint8_t *sectors);
 
 #endif
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 819..d044a9a 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -268,6 +268,7 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
 
 ISADevice *pc_find_fdc0(void);
+int cmos_get_fd_drive_type(FDriveType fd0);
 
 /* acpi_piix.c */
 
-- 
2.5.0




[Qemu-devel] [PATCH v4 1/4] i386/pc: expose identifying the floppy controller

2015-12-25 Thread Roman Kagan
Factor out and expose the function to locate the floppy controller in
the system.
It will be useful when dynamically populating the relevant objects in
the ACPI tables.
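
A minimal standalone sketch of the lookup being factored out, assuming
the per-object callback (check_fdc, not shown in this patch) keeps the
first controller found at iobase 0x3f0 and flags any further match.  All
Sketch* names are hypothetical and stand in for the QOM containers and
CheckFdcState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device object in a container. */
typedef struct SketchDev {
    bool is_fdc;
    uint32_t iobase;
} SketchDev;

/* Hypothetical stand-in for one of the fdc_container_path[] containers. */
typedef struct SketchContainer {
    const SketchDev *devs;
    size_t n_devs;
} SketchContainer;

typedef struct SketchFindState {
    const SketchDev *floppy;  /* first FDC found at 0x3f0, or NULL */
    bool multiple;            /* set when more than one candidate exists */
} SketchFindState;

/* Assumed behaviour of the per-object check: ignore non-matching
 * devices, remember the first match, flag any later match. */
static void sketch_check_fdc(const SketchDev *dev, SketchFindState *st)
{
    if (!dev->is_fdc || dev->iobase != 0x3f0) {
        return;
    }
    if (st->floppy) {
        st->multiple = true;
    } else {
        st->floppy = dev;
    }
}

/* Walk every container, as the factored-out function does. */
static SketchFindState sketch_find_fdc0(const SketchContainer *c, size_t n)
{
    SketchFindState st = { NULL, false };
    size_t i, j;

    assert(c != NULL || n == 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < c[i].n_devs; j++) {
            sketch_check_fdc(&c[i].devs[j], &st);
        }
    }
    return st;
}
```

Returning the state by value keeps the helper free of side effects; the
real function reports the "multiple" case via error_report() and returns
only the device pointer.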

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Cc: John Snow 
Cc: Kevin Wolf 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - split out into a separate patch to facilitate review

 hw/i386/pc.c | 44 ++--
 include/hw/i386/pc.h |  2 ++
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 459260b..c36b8cf 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -360,6 +360,31 @@ static const char * const fdc_container_path[] = {
 "/unattached", "/peripheral", "/peripheral-anon"
 };
 
+/*
+ * Locate the FDC at IO address 0x3f0, in order to configure the CMOS registers
+ * and ACPI objects.
+ */
+ISADevice *pc_find_fdc0(void)
+{
+int i;
+Object *container;
+CheckFdcState state = { 0 };
+
+for (i = 0; i < ARRAY_SIZE(fdc_container_path); i++) {
+container = container_get(qdev_get_machine(), fdc_container_path[i]);
+object_child_foreach(container, check_fdc, &state);
+}
+
+if (state.multiple) {
+error_report("warning: multiple floppy disk controllers with "
+ "iobase=0x3f0 have been found;\n"
+ "the one being picked for CMOS setup might not reflect "
+ "your intent");
+}
+
+return state.floppy;
+}
+
 static void pc_cmos_init_late(void *opaque)
 {
 pc_cmos_init_late_arg *arg = opaque;
@@ -368,8 +393,6 @@ static void pc_cmos_init_late(void *opaque)
 int8_t heads, sectors;
 int val;
 int i, trans;
-Object *container;
-CheckFdcState state = { 0 };
 
 val = 0;
 if (ide_get_geometry(arg->idebus[0], 0,
@@ -399,22 +422,7 @@ static void pc_cmos_init_late(void *opaque)
 }
 rtc_set_memory(s, 0x39, val);
 
-/*
- * Locate the FDC at IO address 0x3f0, and configure the CMOS registers
- * accordingly.
- */
-for (i = 0; i < ARRAY_SIZE(fdc_container_path); i++) {
-container = container_get(qdev_get_machine(), fdc_container_path[i]);
-object_child_foreach(container, check_fdc, &state);
-}
-
-if (state.multiple) {
-error_report("warning: multiple floppy disk controllers with "
- "iobase=0x3f0 have been found;\n"
- "the one being picked for CMOS setup might not reflect "
- "your intent");
-}
-pc_cmos_init_floppy(s, state.floppy);
+pc_cmos_init_floppy(s, pc_find_fdc0());
 
 qemu_unregister_reset(pc_cmos_init_late, opaque);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index b0d6283..819 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -267,6 +267,8 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
 
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
 
+ISADevice *pc_find_fdc0(void);
+
 /* acpi_piix.c */
 
 I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
-- 
2.5.0




[Qemu-devel] [PATCH v4 4/4] i386: populate floppy drive information in SSDT

2015-12-25 Thread Roman Kagan
On x86-based systems Linux determines the presence and the type of
floppy drives via a query of a CMOS field.  So does SeaBIOS when
populating the return data for int 0x13 function 0x08.

Windows doesn't; instead, it requests this information from BIOS via int
0x13/0x08 or through ACPI objects _FDE (Floppy Drive Enumerate) and _FDI
(Floppy Drive Information) of the floppy controller object.  On UEFI
systems only ACPI-based detection is supported.

QEMU did not previously provide those objects in its ACPI tables; as a
result, floppy drives were invisible to Windows on UEFI/OVMF.

This patch adds those objects to the floppy controller in SSDT,
populating them with the information from the respective QEMU objects.
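
For illustration, a standalone sketch of how an _FDE buffer can be laid
out.  It assumes the layout I understand from the ACPI specification:
five little-endian dwords, the first four being per-drive presence flags
and the fifth a tape-presence code.  The diff itself confirms only the
five-element buffer with 0x2 in the last slot; the sketch_* names and
the "never present" reading of that value are my assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FDE_DWORDS 5
#define SKETCH_FDE_BYTES  (SKETCH_FDE_DWORDS * 4)
#define SKETCH_FDE_MAX_DRIVES 4
/* Assumed meaning of the fifth dword's value 2: tape never present. */
#define SKETCH_TAPE_NEVER_PRESENT 2u

/* Store a 32-bit value in little-endian byte order, independent of the
 * host's endianness. */
static void sketch_put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Fill out[] with one presence dword per possible drive followed by the
 * tape dword.  Drives beyond n_drives are reported as absent. */
static void sketch_build_fde(const bool *present, size_t n_drives,
                             uint8_t out[SKETCH_FDE_BYTES])
{
    size_t i;

    assert(n_drives <= SKETCH_FDE_MAX_DRIVES);
    for (i = 0; i < SKETCH_FDE_MAX_DRIVES; i++) {
        bool here = i < n_drives && present[i];
        sketch_put_le32(out + 4 * i, here ? 1u : 0u);
    }
    sketch_put_le32(out + 4 * SKETCH_FDE_MAX_DRIVES,
                    SKETCH_TAPE_NEVER_PRESENT);
}
```

Writing the bytes explicitly is one way to get the "explicit endianness
for buffer data" mentioned in the changelog; the patch achieves the same
with cpu_to_le32() on a uint32_t array.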

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Cc: John Snow 
Cc: Kevin Wolf 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - build on top of dynamic FDC0 patch
 - include test data updates to maintain bisectability

changes since v2:
 - explicit endianness for buffer data
 - reorder code to reduce conflicts with dynamic DSDT patchset

 hw/i386/acpi-build.c|  68 
 tests/acpi-test-data/pc/SSDT| Bin 2554 -> 2635 bytes
 tests/acpi-test-data/pc/SSDT.bridge | Bin 4413 -> 4494 bytes
 3 files changed, 68 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a01e909..7b8de59 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -38,6 +38,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/loader.h"
 #include "hw/isa/isa.h"
+#include "hw/block/fdc.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/mem/nvdimm.h"
 #include "sysemu/tpm.h"
@@ -106,6 +107,13 @@ typedef struct AcpiPmInfo {
 uint16_t pcihp_io_len;
 } AcpiPmInfo;
 
+typedef struct AcpiFDInfo {
+uint8_t type;
+uint8_t cylinders;
+uint8_t heads;
+uint8_t sectors;
+} AcpiFDInfo;
+
 typedef struct AcpiMiscInfo {
 bool has_hpet;
 TPMVersion tpm_version;
@@ -114,6 +122,7 @@ typedef struct AcpiMiscInfo {
 uint16_t pvpanic_port;
 uint16_t applesmc_io_base;
 bool has_fdc;
+AcpiFDInfo fdinfo[2];
 } AcpiMiscInfo;
 
 typedef struct AcpiBuildPciBusHotplugState {
@@ -237,6 +246,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
 
 static void acpi_get_misc_info(AcpiMiscInfo *info)
 {
+int i;
 ISADevice *fdc;
 
 info->has_hpet = hpet_find();
@@ -246,6 +256,16 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
 
 fdc = pc_find_fdc0();
 info->has_fdc = !!fdc;
+if (fdc) {
+for (i = 0; i < ARRAY_SIZE(info->fdinfo); i++) {
+AcpiFDInfo *fdinfo = &info->fdinfo[i];
+fdinfo->type = isa_fdc_get_drive_type(fdc, i);
+if (fdinfo->type < FDRIVE_DRV_NONE) {
+isa_fdc_get_drive_geometry(fdc, i, &fdinfo->cylinders,
+   &fdinfo->heads, &fdinfo->sectors);
+}
+}
+}
 }
 
 /*
@@ -935,6 +955,40 @@ static Aml *build_crs(PCIHostState *host,
 return crs;
 }
 
+static Aml *build_fdinfo_aml(int idx, AcpiFDInfo *fdinfo)
+{
+Aml *dev, *fdi;
+
+dev = aml_device("FLP%c", 'A' + idx);
+
+aml_append(dev, aml_name_decl("_ADR", aml_int(idx)));
+
+fdi = aml_package(0x10);
+aml_append(fdi, aml_int(idx));  /* Drive Number */
+aml_append(fdi,
+aml_int(cmos_get_fd_drive_type(fdinfo->type)));  /* Device Type */
+aml_append(fdi,
+aml_int(fdinfo->cylinders - 1));  /* Maximum Cylinder Number */
+aml_append(fdi, aml_int(fdinfo->sectors));  /* Maximum Sector Number */
+aml_append(fdi, aml_int(fdinfo->heads - 1));  /* Maximum Head Number */
+/* SeaBIOS returns the below values for int 0x13 func 0x08 regardless of
+ * the drive type, so shall we */
+aml_append(fdi, aml_int(0xAF));  /* disk_specify_1 */
+aml_append(fdi, aml_int(0x02));  /* disk_specify_2 */
+aml_append(fdi, aml_int(0x25));  /* disk_motor_wait */
+aml_append(fdi, aml_int(0x02));  /* disk_sector_siz */
+aml_append(fdi, aml_int(0x12));  /* disk_eot */
+aml_append(fdi, aml_int(0x1B));  /* disk_rw_gap */
+aml_append(fdi, aml_int(0xFF));  /* disk_dtl */
+aml_append(fdi, aml_int(0x6C));  /* disk_formt_gap */
+aml_append(fdi, aml_int(0xF6));  /* disk_fill */
+aml_append(fdi, aml_int(0x0F));  /* disk_head_sttl */
+aml_append(fdi, aml_int(0x08));  /* disk_motor_strt */
+
+aml_append(dev, aml_name_decl("_FDI", fdi));
+return dev;
+}
+
 static void
 build_ssdt(GArray *table_data, GArray *linker,
AcpiCpuInfo *cpu, AcpiPmInfo *pm, AcpiMiscInfo *misc,
@@ -1106,6 +1160,8 @@ build_ssdt(GArray *table_data, GArray *linker,
 aml_append(ssdt, scope);
 
 if (misc->has_fdc) {
+uint32_t fde_buf[5] = {0, 0, 0, 0, cpu_to_le32(0x2)};
+
 scope = aml_scope("\\_SB.PCI0.ISA");
 dev = aml_device("FDC0");
 
@@ -1119,6 +1

Re: [Qemu-devel] [PATCH v3 2/2] tests: update expected SSDT for floppy changes

2015-12-25 Thread Roman Kagan
On Thu, Dec 24, 2015 at 08:17:45AM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 23, 2015 at 08:51:45PM +0300, Roman Kagan wrote:
> > On Wed, Dec 23, 2015 at 06:47:16PM +0100, Igor Mammedov wrote:
> > > On Wed, 23 Dec 2015 20:20:54 +0300
> > > Roman Kagan  wrote:
> > > > > ... two 1.44M drives with bogus geometry for q35.
> > > > 
> > > > This one is a bug in my patch, indeed: I was tricked by FDRIVE_DRV_NONE
> > > > being non-zero, and forgot to initialize the respective fields in
> > > > acpi_get_misc_info() in case there is no floppy controller at all.
> > > so instead of fake initialization, it's worth making your patch
> > > conditional on the presence of the controller after all.
> > > i.e. add AML only if controller was present.
> > 
> > Indeed :)
> > 
> > Roman.
> 
> Or rather, start series with a patch making FDC conditional,
> then update expected ssdt, then tweak methods within -
> should not change ssdt since we don't create a floppy in
> the test.

Actually we do.  "pc" machine type has it by default regardless of
whether anything is attached to it.

So I ended up with the v4 patch series I just posted, but I'm not sure it
addresses all the concerns people have had about v3.

Thanks,
Roman.



Re: [Qemu-devel] git send-email didn't arrive?

2015-12-25 Thread Peter Maydell
On 25 December 2015 at 14:42, Johan Ouwerkerk  wrote:
> Scratch this? It seems to have arrived just now, finally, having taken
> a mere 16 hours and 4 minutes to get there... ?

The GNU list servers occasionally get a bit overloaded and sit
on mail for a while...

thanks
-- PMM



Re: [Qemu-devel] live migration vs device assignment (motivation)

2015-12-25 Thread Alexander Duyck
On Thu, Dec 24, 2015 at 11:03 PM, Lan Tianyu  wrote:
> Merry Christmas.
> Sorry for the late response due to personal affairs.
>
> On 2015年12月14日 03:30, Alexander Duyck wrote:
>>> > It sounds like we need to add a fake bridge for migration and add a
>>> > driver in the guest for it. It also needs to extend the PCI bus/hotplug
>>> > driver to pause/resume other devices, right?
>>> >
>>> > My concern is still whether we can change PCI bus/hotplug like that
>>> > without a spec change.
>>> >
>>> > IRQ should be general for any devices and we may extend it for
>>> > migration. Device driver also can make decision to support migration
>>> > or not.
>> The device should have no say in the matter.  Either we are going to
>> migrate or we will not.  This is why I have suggested my approach as
>> it allows for the least amount of driver intrusion while providing the
>> maximum number of ways to still perform migration even if the device
>> doesn't support it.
>
> Even if the device driver doesn't support migration, you still want to
> migrate the VM? That may be risky, and we should at least add a "bad path"
> for the driver.

At a minimum we should have support for hot-plug if we are expecting
to support migration.  You would simply have to hot-unplug the device
before you start migration and then return it after.  That is how the
current bonding approach for this works if I am not mistaken.

The advantage we are looking to gain is to avoid removing/disabling
the device for as long as possible.  Ideally we want to keep the
device active through the warm-up period, but if the guest doesn't do
that we should still be able to fall back on the older approaches if
needed.

>>
>> The solution I have proposed is simple:
>>
>> 1.  Extend swiotlb to allow for a page dirtying functionality.
>>
>>  This part is pretty straightforward.  I'll submit a few patches
>> later today as RFC that can provide the minimal functionality needed
>> for this.
>
> I would very much appreciate that.
>
>>
>> 2.  Provide a vendor specific configuration space option on the QEMU
>> implementation of a PCI bridge to act as a bridge between direct
>> assigned devices and the host bridge.
>>
>>  My thought was to add some vendor specific block that includes a
>> capabilities, status, and control register so you could go through and
>> synchronize things like the DMA page dirtying feature.  The bridge
>> itself could manage the migration capable bit inside QEMU for all
>> devices assigned to it.  So if you added a VF to the bridge it would
>> flag that you can support migration in QEMU, while the bridge would
>> indicate you cannot until the DMA page dirtying control bit is set by
>> the guest.
>>
>>  We could also go through and optimize the DMA page dirtying after
>> this is added so that we can narrow down the scope of use, and as a
>> result improve the performance for other devices that don't need to
>> support migration.  It would then be a matter of adding an interrupt
>> in the device to handle an event such as the DMA page dirtying status
>> bit being set in the config space status register, while the bit is
>> not set in the control register.  If it doesn't get set then we would
>> have to evict the devices before the warm-up phase of the migration,
>> otherwise we can defer it until the end of the warm-up phase.
>>
>> 3.  Extend existing shpc driver to support the optional "pause"
>> functionality as called out in section 4.1.2 of the Revision 1.1 PCI
>> hot-plug specification.
>
> Since your solution adds a fake PCI bridge, why not notify the
> bridge directly during migration via IRQ and call the device driver's
> callback in the new bridge driver?
>
> Alternatively, the new bridge driver can also check whether the device
> driver provides a migration callback and, if so, call it to improve the
> passthrough device's performance during migration.

This is basically what I had in mind.  Though I would take things one
step further.  You don't need to add any new call-backs if you make
use of the existing suspend/resume logic.  For a VF this does exactly
what you would need since the VFs don't support wake on LAN so it will
simply clear the bus master enable and put the netdev in a suspended
state until the resume can be called.

The PCI hot-plug specification calls out that the OS can optionally
implement a "pause" mechanism which is meant to be used for high
availability type environments.  What I am proposing is basically
extending the standard SHPC capable PCI bridge so that we can support
the DMA page dirtying for everything hosted on it, add a vendor
specific block to the config space so that the guest can notify the
host that it will do page dirtying, and add a mechanism to indicate
that all hot-plug events during the warm-up phase of the migration are
pause events instead of full removals.
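
To make the handshake concrete, here is a rough sketch of the decision
logic, not an existing interface.  The register layout, the bit
position and every Sketch* name are made up for illustration; only the
policy comes from the proposal: the status bit set with the control bit
clear is the event the guest needs to be told about, and the control bit
decides whether devices are evicted before the warm-up phase or paused
at its end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit used in all three registers of the vendor block. */
#define SKETCH_DMA_DIRTY_BIT 0x0001u

/* Hypothetical vendor-specific block in the bridge's config space. */
typedef struct SketchMigRegs {
    uint16_t capabilities;  /* bridge can track DMA page dirtying */
    uint16_t status;        /* host: page dirtying is requested */
    uint16_t control;       /* guest: page dirtying is enabled */
} SketchMigRegs;

typedef enum SketchEvictPoint {
    SKETCH_EVICT_BEFORE_WARMUP,  /* fall back to a full removal up front */
    SKETCH_PAUSE_AFTER_WARMUP    /* keep the device active during warm-up */
} SketchEvictPoint;

/* The guest should be interrupted while the host has requested page
 * dirtying but the guest has not enabled it yet. */
static bool sketch_guest_irq_pending(const SketchMigRegs *r)
{
    assert(r != NULL);
    return (r->status & SKETCH_DMA_DIRTY_BIT) &&
           !(r->control & SKETCH_DMA_DIRTY_BIT);
}

/* Devices can stay through warm-up only if the bridge supports page
 * dirtying and the guest has switched it on. */
static SketchEvictPoint sketch_evict_point(const SketchMigRegs *r)
{
    assert(r != NULL);
    if ((r->capabilities & SKETCH_DMA_DIRTY_BIT) &&
        (r->control & SKETCH_DMA_DIRTY_BIT)) {
        return SKETCH_PAUSE_AFTER_WARMUP;
    }
    return SKETCH_EVICT_BEFORE_WARMUP;
}
```

A guest that never sets the control bit simply lands on the older
remove-before-migration path, which matches the fallback described
above.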

I've been poking around in the kernel and QEMU code and the part I
have been trying to sort out is how to get QEMU based pci-bridge to
use the SHPC