[Qemu-devel] [Bug 1527765] Re: sh4: ghc randomly segfaults on qemu-sh4-static
Interestingly, cmake also seems to crash in a similar way:

- Log: https://buildd.debian.org/status/fetch.php?pkg=apt-cacher-ng&arch=sh4&ver=0.8.8-1&stamp=1450985460
- Log: https://buildd.debian.org/status/fetch.php?pkg=texworks&arch=sh4&ver=0.5~svn1363-6%2Bb1&stamp=1450992669
- Log: https://buildd.debian.org/status/fetch.php?pkg=x265&arch=sh4&ver=1.8-6&stamp=1450995672
- Log: https://buildd.debian.org/status/fetch.php?pkg=libwebsockets&arch=sh4&ver=1.6.0-2&stamp=1450997039

Maybe those are related?

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1527765

Title:
  sh4: ghc randomly segfaults on qemu-sh4-static

Status in QEMU:
  New

Bug description:
  Hello!

  I am currently in the process of bootstrapping ghc for the Debian sh4 port and ran into a strange problem with qemu-sh4-static which randomly segfaults when running ghc to compile a Haskell source:

  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ls
  Main.hi Main.hs Setup.hs ghc-pwd.cabal ghc.mk
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  Bad interface file: /usr/local/lib/sh4-unknown-linux-gnu-ghc-7.10.3/time/dist-install/build/Data/Time/Format/Parse.hi
  ghc: panic! (the 'impossible' happened)
  (GHC version 7.10.3 for sh4-unknown-linux):
  getSymtabName:unknown known-key unique <>

  Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug

  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd# ghc Main.hs
  /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
  [1 of 1] Compiling Main ( Main.hs, Main.o )
  Linking Main ...
  root@jessie32:~/ghc-7.8.4/utils/ghc-pwd#

  As seen above, compiling a Haskell source code often results in a segfault but simply by retrying to run ghc over and over again, the compile process will eventually succeed and no segfault occurs.

  I have created a tarball which contains the sh4 chroot from the example above which also includes ghc, gcc and the source code in question (in /root/ghc-7.8.4/utils/ghc-pwd). To test, it's probably a good idea to replace the qemu-sh4-static binary in /usr/bin with a current git snapshot (which I tried but didn't help).

  > http://users.physik.fu-berlin.de/~glaubitz/sid-sh4-sbuild-ghc.tgz

  In case anyone wants to try ghc with their own sh4 chroot, here's my version of ghc:

  > https://people.debian.org/~glaubitz/sh4-unknown-linux-gnu-ghc-7.10.3.tar.gz

  Just extract this tarball into the root directory of the sh4 chroot.
  Please note, that it might be advisable on sh4 to apply the patches from these two bug reports as otherwise qemu-sh4-static won't work properly on amd64 and misses syscall 186:

  > https://bugs.launchpad.net/ubuntu/+source/qemu-linaro/+bug/1254824
  > https://bugs.launchpad.net/qemu/+bug/1516408

  The above issue is reproducible with the two patches applied and without. It's also reproducible with both libc6 2.19 and 2.21 in the chroot. Thus, I am currently out of ideas what else to test.

  Cheers,
  Adrian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1527765/+subscriptions
[Qemu-devel] [PATCH v9 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()
From: Wen Congyang

Signed-off-by: Wen Congyang
Signed-off-by: zhanghailiang
Signed-off-by: Gonglei
Signed-off-by: Changlong Xie
---
 block.c               |   8 ++--
 block/quorum.c        | 122 +-
 include/block/block.h |   4 ++
 3 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index a347008..b9e99da 100644
--- a/block.c
+++ b/block.c
@@ -1196,10 +1196,10 @@ static int bdrv_fill_options(QDict **options, const char *filename,
     return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-                                    BlockDriverState *child_bs,
-                                    const char *child_name,
-                                    const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+                             BlockDriverState *child_bs,
+                             const char *child_name,
+                             const BdrvChildRole *child_role)
 {
     BdrvChild *child = g_new(BdrvChild, 1);
     *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 6793f12..e73418c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -23,6 +23,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
 #include "crypto/hash.h"
+#include "qemu/bitmap.h"
 
 #define HASH_LENGTH 32
 
@@ -80,6 +81,8 @@ typedef struct BDRVQuorumState {
     bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
                             * block if Quorum is reached.
                             */
+    unsigned long *index_bitmap;
+    int bsize;
 
     QuorumReadPattern read_pattern;
 } BDRVQuorumState;
@@ -875,9 +878,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -EINVAL;
         goto exit;
     }
-    if (s->num_children < 2) {
+    if (s->num_children < 1) {
         error_setg(&local_err,
-                   "Number of provided children must be greater than 1");
+                   "Number of provided children must be 1 or more");
         ret = -EINVAL;
         goto exit;
     }
@@ -926,6 +929,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     /* allocate the children array */
     s->children = g_new0(BdrvChild *, s->num_children);
     opened = g_new0(bool, s->num_children);
+    s->index_bitmap = bitmap_new(s->num_children);
 
     for (i = 0; i < s->num_children; i++) {
         char indexstr[32];
@@ -941,6 +945,8 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 
         opened[i] = true;
     }
+    bitmap_set(s->index_bitmap, 0, s->num_children);
+    s->bsize = s->num_children;
 
     g_free(opened);
     goto exit;
@@ -997,6 +1003,115 @@ static void quorum_attach_aio_context(BlockDriverState *bs,
     }
 }
 
+static int get_new_child_index(BDRVQuorumState *s)
+{
+    int index;
+
+    index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+    if (index < s->bsize) {
+        return index;
+    }
+
+    if ((s->bsize % BITS_PER_LONG) == 0) {
+        s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+                                             s->bsize + 1);
+    }
+
+    return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+    int last_index;
+    long new_len;
+
+    assert(index < s->bsize);
+
+    clear_bit(index, s->index_bitmap);
+    if (index < s->bsize - 1) {
+        /*
+         * The last bit is always set, and we don't clear
+         * the last bit.
+         */
+        return;
+    }
+
+    last_index = find_last_bit(s->index_bitmap, s->bsize);
+    s->bsize = last_index + 1;
+    if (BITS_TO_LONGS(last_index + 1) == BITS_TO_LONGS(s->bsize)) {
+        return;
+    }
+
+    new_len = BITS_TO_LONGS(last_index + 1) * sizeof(unsigned long);
+    s->index_bitmap = g_realloc(s->index_bitmap, new_len);
+}
+
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                             Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    BdrvChild *child;
+    char indexstr[32];
+    int index, ret;
+
+    index = get_new_child_index(s);
+    ret = snprintf(indexstr, 32, "children.%d", index);
+    if (ret < 0 || ret >= 32) {
+        error_setg(errp, "cannot generate child name");
+        return;
+    }
+
+    bdrv_drain(bs);
+
+    assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+    if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+        error_setg(errp, "Too many children");
+        return;
+    }
+    s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
+
+    bdrv_ref(child_bs);
+    child = bdrv_attach_child(bs, child_bs, indexstr, &child_format);
+    s->children[s->num_children++] = child;
+    set_bit(index, s->index_bitmap);
+}
+
+static void quorum_del_child(BlockDriverState *bs, BlockDriverState *child_bs,
+                             Error **errp)
+{
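
For readers following the index bookkeeping in get_new_child_index() and remove_child_index(), here is a minimal standalone sketch of the same idea: hand out the lowest unused "children.N" index and keep the tracked size trimmed so that its top bit is always set. It does not use QEMU's bitmap helpers. IndexMap, index_map_test(), index_map_alloc() and index_map_free() are hypothetical names that do not appear in the patch. Unlike the patch, the sketch marks the index as used inside the allocator and only shrinks the logical size, never the allocation.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(nbits) (((nbits) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the index fields added to BDRVQuorumState. */
typedef struct IndexMap {
    unsigned long *bits; /* bit i set => name "children.i" is in use */
    size_t nbits;        /* number of tracked bits (the patch's bsize) */
} IndexMap;

static int index_map_test(const IndexMap *m, size_t i)
{
    return (int)((m->bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}

/* Reserve the lowest unused index; grow the map when every bit is taken.
 * Returns the index, or -1 if memory could not be allocated. */
static int index_map_alloc(IndexMap *m)
{
    size_t i;

    for (i = 0; i < m->nbits; i++) {
        if (!index_map_test(m, i)) {
            break;
        }
    }
    if (i == m->nbits) {
        size_t old_words = WORDS_FOR(m->nbits);
        size_t new_words = WORDS_FOR(m->nbits + 1);

        if (new_words > old_words) {
            unsigned long *p = realloc(m->bits, new_words * sizeof(*p));
            if (!p) {
                return -1;
            }
            /* Newly covered words start out with every index free. */
            memset(p + old_words, 0, (new_words - old_words) * sizeof(*p));
            m->bits = p;
        }
        m->nbits++;
    }
    m->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
    return (int)i;
}

/* Release an index, then drop trailing free bits so the top bit stays set. */
static void index_map_free(IndexMap *m, size_t i)
{
    assert(i < m->nbits && index_map_test(m, i));
    m->bits[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
    while (m->nbits > 0 && !index_map_test(m, m->nbits - 1)) {
        m->nbits--;
    }
}
```

Starting from an empty map, three allocations return 0, 1 and 2. Freeing index 1 and allocating again returns 1, so the name of a removed child is the first to be reused.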
[Qemu-devel] [PATCH v9 0/3] qapi: child add/delete support
If quorum's child is broken, we can use mirror job to replace it. But sometimes, the user only needs to remove the broken child, and add it later when the problem is fixed.

ChangeLog:
v9:
1. Rebase to the newest code
2. Remove redundant code in quorum_add_child() and quorum_del_child()
3. Fix typos in qmp-commands.hx
v8:
1. Rebase to the newest code
2. Address the comments from Eric Blake
v7:
1. Remove the qmp command x-blockdev-change's parameter operation according to Kevin's comments.
2. Remove the hmp command.
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. Drop nbd driver's implementation. We can use human-monitor-command to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (3):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child

 block.c                   |  58 --
 block/quorum.c            | 122 +-
 blockdev.c                |  54
 include/block/block.h     |   9
 include/block/block_int.h |   5 ++
 qapi/block-core.json      |  23 +
 qmp-commands.hx           |  47 ++
 7 files changed, 312 insertions(+), 6 deletions(-)

--
1.9.3
[Qemu-devel] [PATCH v9 1/3] Add new block driver interface to add/delete a BDS's child
From: Wen Congyang

In some cases, we want to take a quorum child offline, and take another child online.

Signed-off-by: Wen Congyang
Signed-off-by: zhanghailiang
Signed-off-by: Gonglei
Signed-off-by: Changlong Xie
Reviewed-by: Eric Blake
Reviewed-by: Alberto Garcia
---
 block.c                   | 50 +++
 include/block/block.h     |  5 +
 include/block/block_int.h |  5 +
 3 files changed, 60 insertions(+)

diff --git a/block.c b/block.c
index 411edbf..a347008 100644
--- a/block.c
+++ b/block.c
@@ -4320,3 +4320,53 @@ void bdrv_refresh_filename(BlockDriverState *bs)
         QDECREF(json);
     }
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+                    Error **errp)
+{
+
+    if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
+        error_setg(errp, "The node %s doesn't support adding a child",
+                   bdrv_get_device_or_node_name(parent_bs));
+        return;
+    }
+
+    if (!QLIST_EMPTY(&child_bs->parents)) {
+        error_setg(errp, "The node %s already has a parent",
+                   child_bs->node_name);
+        return;
+    }
+
+    parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
+}
+
+void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+                    Error **errp)
+{
+    BdrvChild *child;
+
+    if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
+        error_setg(errp, "The node %s doesn't support removing a child",
+                   bdrv_get_device_or_node_name(parent_bs));
+        return;
+    }
+
+    QLIST_FOREACH(child, &parent_bs->children, next) {
+        if (child->bs == child_bs) {
+            break;
+        }
+    }
+
+    if (!child) {
+        error_setg(errp, "The node %s is not a child of %s",
+                   bdrv_get_device_or_node_name(child_bs),
+                   bdrv_get_device_or_node_name(parent_bs));
+        return;
+    }
+
+    parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index db8e096..863a7c8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -578,4 +578,9 @@ void bdrv_drained_begin(BlockDriverState *bs);
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
+                    Error **errp);
+void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
+                    Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 256609d..ebe8b1e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -303,6 +303,11 @@ struct BlockDriver {
      */
     void (*bdrv_drain)(BlockDriverState *bs);
 
+    void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+                           Error **errp);
+    void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
+                           Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
--
1.9.3
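
The shape of bdrv_add_child() and bdrv_del_child() above is an optional-callback dispatch: the generic layer checks that the driver provides the hook, validates the relationship between the two nodes, and only then calls into the driver. The following standalone sketch illustrates that pattern under simplifying assumptions. Node, Driver, toy_driver, node_add_child() and node_del_child() are hypothetical names, a single parent pointer replaces QEMU's parent and child lists, and errors are reported through a plain character buffer instead of Error **.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { ERR_LEN = 96 };

typedef struct Node Node;

/* Hypothetical driver table; either hook may be left NULL. */
typedef struct Driver {
    int (*add_child)(Node *parent, Node *child, char *err);
    int (*del_child)(Node *parent, Node *child, char *err);
} Driver;

struct Node {
    const char *name;
    const Driver *drv;   /* NULL if no driver is attached */
    Node *parent;        /* NULL while the node is unattached */
    size_t num_children;
};

/* Generic entry point: refuse unless the driver implements the hook
 * and the child is not attached anywhere yet. */
static int node_add_child(Node *parent, Node *child, char *err)
{
    assert(parent && child && err);
    if (!parent->drv || !parent->drv->add_child) {
        snprintf(err, ERR_LEN, "The node %s doesn't support adding a child",
                 parent->name);
        return -1;
    }
    if (child->parent) {
        snprintf(err, ERR_LEN, "The node %s already has a parent",
                 child->name);
        return -1;
    }
    return parent->drv->add_child(parent, child, err);
}

/* Generic entry point: refuse unless the hook exists and the node
 * really is a child of this parent. */
static int node_del_child(Node *parent, Node *child, char *err)
{
    assert(parent && child && err);
    if (!parent->drv || !parent->drv->del_child) {
        snprintf(err, ERR_LEN, "The node %s doesn't support removing a child",
                 parent->name);
        return -1;
    }
    if (child->parent != parent) {
        snprintf(err, ERR_LEN, "The node %s is not a child of %s",
                 child->name, parent->name);
        return -1;
    }
    return parent->drv->del_child(parent, child, err);
}

/* Toy driver that only does the bookkeeping. */
static int toy_add(Node *parent, Node *child, char *err)
{
    (void)err;
    child->parent = parent;
    parent->num_children++;
    return 0;
}

static int toy_del(Node *parent, Node *child, char *err)
{
    (void)err;
    child->parent = NULL;
    parent->num_children--;
    return 0;
}

static const Driver toy_driver = { .add_child = toy_add, .del_child = toy_del };
```

Keeping the checks in the generic layer means each driver hook can assume its preconditions already hold, which is why the quorum implementation in patch 2/3 does not repeat them.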
[Qemu-devel] [PATCH v9 3/3] qmp: add monitor command to add/remove a child
From: Wen Congyang

The new QMP command name is x-blockdev-change. It's just for adding/removing quorum's child now, and doesn't support all kinds of children, all kinds of operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang
Signed-off-by: zhanghailiang
Signed-off-by: Gonglei
Signed-off-by: Changlong Xie
---
 blockdev.c           | 54
 qapi/block-core.json | 23 ++
 qmp-commands.hx      | 47 +
 3 files changed, 124 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 64dbfeb..4e62fdf 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3836,6 +3836,60 @@ out:
     aio_context_release(aio_context);
 }
 
+static BlockDriverState *bdrv_find_child(BlockDriverState *parent_bs,
+                                         const char *child_name)
+{
+    BdrvChild *child;
+
+    QLIST_FOREACH(child, &parent_bs->children, next) {
+        if (strcmp(child->name, child_name) == 0) {
+            return child->bs;
+        }
+    }
+
+    return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+                           const char *child, bool has_node,
+                           const char *node, Error **errp)
+{
+    BlockDriverState *parent_bs, *child_bs = NULL, *new_bs = NULL;
+
+    parent_bs = bdrv_lookup_bs(parent, parent, errp);
+    if (!parent_bs) {
+        return;
+    }
+
+    if (has_child == has_node) {
+        if (has_child) {
+            error_setg(errp, "The paramter child and node is conflict");
+        } else {
+            error_setg(errp, "Either child or node should be specified");
+        }
+        return;
+    }
+
+    if (has_child) {
+        child_bs = bdrv_find_child(parent_bs, child);
+        if (!child_bs) {
+            error_setg(errp, "Node '%s' doesn't have child %s",
+                       parent, child);
+            return;
+        }
+        bdrv_del_child(parent_bs, child_bs, errp);
+    }
+
+    if (has_node) {
+        new_bs = bdrv_find_node(node);
+        if (!new_bs) {
+            error_setg(errp, "Node '%s' not found", node);
+            return;
+        }
+        bdrv_add_child(parent_bs, new_bs, errp);
+    }
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
     BlockJobInfoList *head = NULL, **p_next = &head;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1a5d9ce..fe63c6d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2408,3 +2408,26 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a block driver state. Currently only
+# the Quorum driver implements this feature to add or remove its child.
+# This is useful to fix a broken quorum child.
+#
+# @parent: the id or name of the node that will be changed.
+#
+# @child: #optional the name of the child that will be deleted.
+#
+# @node: #optional the name of the node will be added.
+#
+# Note: this command is experimental, and its API is not stable.
+#
+# Since: 2.6
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+             '*child': 'str',
+             '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7b235ee..efee0ca 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4293,6 +4293,53 @@ Example:
 EQMP
 
     {
+        .name = "x-blockdev-change",
+        .args_type = "parent:B,child:B?,node:B?",
+        .mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+    },
+
+SQMP
+x-blockdev-change
+-----------------
+
+Dynamically reconfigure the block driver state graph. It can be used to
+add, remove, insert, or replace a block driver state. Currently only
+the Quorum driver implements this feature to add and remove its child.
+This is useful to fix a broken quorum child.
+
+Arguments:
+- "parent": the id or node name of which node will be changed (json-string)
+- "child": the child name which will be deleted (json-string, optional)
+- "node": the new node-name which will be added (json-string, optional)
+
+Note: this command is experimental, and not a stable API. It doesn't
+support all kinds of operations, all kinds of children, nor all block
+drivers.
+
+Example:
+
+Add a new node to a quorum
+-> { "execute": "blockdev-add",
+    "arguments": { "options": { "driver": "raw",
+                                "node-name": "new_node",
+                                "id": "test_new_node",
+                                "file": { "driver": "file",
+                                          "filename": "test.raw" } } } }
+<- { "return": {} }
+-> { "execute": "x-blockdev-change",
+    "arguments": { "parent": "disk1",
+                   "node": "new_node" } }
+<- { "return": {} }
+
+Delete a quorum's node
+-> { "execute": "x-blockdev
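
The argument handling in qmp_x_blockdev_change() boils down to two rules: exactly one of "child" and "node" may be given, and a child to delete is looked up by its edge name (for quorum, a name such as "children.1"), not by node name. A minimal standalone sketch of both rules follows. ChangeOp, classify_change() and find_child() are hypothetical names, and a NULL pointer stands in for the has_child / has_node flags that the QMP marshaller generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum ChangeOp {
    CHANGE_ERR_CONFLICT, /* both "child" and "node" were given */
    CHANGE_ERR_MISSING,  /* neither was given */
    CHANGE_DEL_CHILD,    /* only "child": delete that child */
    CHANGE_ADD_NODE      /* only "node": add that node as a new child */
} ChangeOp;

/* Decide which operation a request asks for; NULL means "not given". */
static ChangeOp classify_change(const char *child, const char *node)
{
    bool has_child = child != NULL;
    bool has_node = node != NULL;

    if (has_child == has_node) {
        return has_child ? CHANGE_ERR_CONFLICT : CHANGE_ERR_MISSING;
    }
    return has_child ? CHANGE_DEL_CHILD : CHANGE_ADD_NODE;
}

/* Look a child up by its edge name; returns its position or -1. */
static int find_child(const char *const *names, size_t n, const char *name)
{
    assert(names != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Classifying the request before touching the graph keeps the two error cases apart, matching the two distinct error messages in the patch.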
[Qemu-devel] [Bug 1529226] [NEW] qemu-i386-user on 32-bit Linux: uncaught target signal 11
Public bug reported:

Even though the command I'm trying to run (a wrapper script for qemu-i386-user running rustc, the rust compiler) produces the expected compiled output, the build process is interrupted:

qemu: uncaught target signal 11 (Segmentation fault) - core dumped
i686-unknown-linux-gnu/stage0/bin/rustc: line 1: 7474 Segmentation fault /usr/local/bin/qemu-i386 -cpu qemu32 /home/petevine/stage0/rustc.bin -C target-cpu=pentium2 -L /home/petevine/unpacked/rust-master/i686-unknown-linux-gnu/stage0/lib/rustlib/i686-unknown-linux-gnu/lib/ "$@"
make: *** [i686-unknown-linux-gnu/stage0/lib/rustlib/i686-unknown-linux-gnu/lib/stamp.rustc_back] Error 139

The stamp file is not being created so this could be about forking bash after finishing the wrapper script.

Qemu was compiled from the latest git source.

** Affects: qemu
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1529226
Re: [Qemu-devel] live migration vs device assignment (motivation)
On Fri, Dec 25, 2015 at 03:03:47PM +0800, Lan Tianyu wrote:
> Merry Christmas.
> Sorry for later response due to personal affair.
>
> On 2015年12月14日 03:30, Alexander Duyck wrote:
> >> > These sounds we need to add a faked bridge for migration and adding a
> >> > driver in the guest for it. It also needs to extend PCI bus/hotplug
> >> > driver to do pause/resume other devices, right?
> >> >
> >> > My concern is still that whether we can change PCI bus/hotplug like that
> >> > without spec change.
> >> >
> >> > IRQ should be general for any devices and we may extend it for
> >> > migration. Device driver also can make decision to support migration
> >> > or not.
> > The device should have no say in the matter. Either we are going to
> > migrate or we will not. This is why I have suggested my approach as
> > it allows for the least amount of driver intrusion while providing the
> > maximum number of ways to still perform migration even if the device
> > doesn't support it.
>
> Even if the device driver doesn't support migration, you still want to
> migrate VM? That maybe risk and we should add the "bad path" for the
> driver at least.
>
> >
> > The solution I have proposed is simple:
> >
> > 1. Extend swiotlb to allow for a page dirtying functionality.
> >
> > This part is pretty straight forward. I'll submit a few patches
> > later today as RFC that can provided the minimal functionality needed
> > for this.
>
> Very appreciate to do that.
>
> >
> > 2. Provide a vendor specific configuration space option on the QEMU
> > implementation of a PCI bridge to act as a bridge between direct
> > assigned devices and the host bridge.
> >
> > My thought was to add some vendor specific block that includes a
> > capabilities, status, and control register so you could go through and
> > synchronize things like the DMA page dirtying feature. The bridge
> > itself could manage the migration capable bit inside QEMU for all
> > devices assigned to it. So if you added a VF to the bridge it would
> > flag that you can support migration in QEMU, while the bridge would
> > indicate you cannot until the DMA page dirtying control bit is set by
> > the guest.
> >
> > We could also go through and optimize the DMA page dirtying after
> > this is added so that we can narrow down the scope of use, and as a
> > result improve the performance for other devices that don't need to
> > support migration. It would then be a matter of adding an interrupt
> > in the device to handle an event such as the DMA page dirtying status
> > bit being set in the config space status register, while the bit is
> > not set in the control register. If it doesn't get set then we would
> > have to evict the devices before the warm-up phase of the migration,
> > otherwise we can defer it until the end of the warm-up phase.
> >
> > 3. Extend existing shpc driver to support the optional "pause"
> > functionality as called out in section 4.1.2 of the Revision 1.1 PCI
> > hot-plug specification.
>
> Since your solution has added a faked PCI bridge. Why not notify the
> bridge directly during migration via irq and call device driver's
> callback in the new bridge driver?
>
> Otherwise, the new bridge driver also can check whether the device
> driver provides migration callback or not and call them to improve the
> passthough device's performance during migration.

As long as you keep up this vague talk about performance during migration, without even bothering with any measurements, this patchset will keep going nowhere.

There's Alex's patch that tracks memory changes during migration. It needs some simple enhancements to be useful in production (e.g. add a host/guest handshake to both enable tracking in guest and to detect the support in host), then it can allow starting migration with an assigned device, by invoking hot-unplug after most of memory have been migrated.

Please implement this in qemu and measure the speed. I will not be surprised if destroying/creating netdev in linux turns out to take too long, but before anyone bothered checking, it does not make sense to discuss further enhancements.

> >
> > Note I call out "extend" here instead of saying to add this.
> > Basically what we should do is provide a means of quiescing the device
> > without unloading the driver. This is called out as something the OS
> > vendor can optionally implement in the PCI hot-plug specification. On
> > OSes that wouldn't support this it would just be treated as a standard
> > hot-plug event. We could add a capability, status, and control bit
> > in the vendor specific configuration block for this as well and if we
> > set the status bit would indicate the host wants to pause instead of
> > remove and the control bit would indicate the guest supports "pause"
> > in the OS. We then could optionally disable guest migration while the
> > VF is present and pause is not supported.
> >
> > To support this we would need
Re: [Qemu-devel] [PATCH v4 1/4] target-tilegx: Add floating point shared functions
On 12/25/15 04:01, Richard Henderson wrote:
> On 12/24/2015 07:38 AM, Chen Gang wrote:
>>
>> OK, thanks. Since fp_status need to be initialized to be 0, so I will
>> declared it statically, too (need we consider about thread safe for it?
>> I guess not).
>
> While qemu is not currently thread-safe, there's work going on to make that
> happen. There is no need to exacerbate the problem.
>

OK, thanks.

> Also, I think using an on-stack automatic variable, initialized each time,
> emphasizes the fact that there is no state that is preserved across
> operations.
>
> This should really be as simple as
>
>   float_status fp_status = {
>       .float_rounding_mode = float_round_nearest_even
>   };
>
> (I realize float_round_nearest_even is *also* zero, but humor me. At least
> the other members are either flags or booleans.)
>

OK, thanks.

--
Chen Gang (陈刚)

Open, share, and attitude like air, water, and life which God blessed
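
As an illustration of the point Richard makes, here is a small standalone sketch of a helper that builds its status block on the stack for every call, so that nothing can leak from one operation to the next and no shared state needs locking. It does not use QEMU's softfloat. FloatStatus, RoundMode, FLAG_INEXACT, halve() and helper_halve_flags() are hypothetical stand-ins, and the "operation" is a toy integer halving chosen only because it can raise an inexact flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for softfloat's float_status. */
typedef enum RoundMode {
    ROUND_NEAREST_EVEN = 0,
    ROUND_DOWN,
    ROUND_UP,
    ROUND_TO_ZERO
} RoundMode;

typedef struct FloatStatus {
    RoundMode rounding_mode;
    unsigned exception_flags;
    bool flush_to_zero;
} FloatStatus;

enum { FLAG_INEXACT = 1 };

/* Toy operation: halve x, recording in *st whether a remainder was lost. */
static int halve(int x, FloatStatus *st)
{
    assert(st != NULL);
    if (x % 2 != 0) {
        st->exception_flags |= FLAG_INEXACT;
    }
    return x / 2;
}

/* Helper in the style suggested above: the status lives on the stack and is
 * initialized on every call, so flags from earlier calls cannot be observed. */
static unsigned helper_halve_flags(int x, int *result)
{
    FloatStatus st = {
        .rounding_mode = ROUND_NEAREST_EVEN /* explicit, although it is 0 */
    };

    *result = halve(x, &st);
    return st.exception_flags;
}
```

Calling the helper with 3 and then with 4 reports the inexact flag for the first call only. With a static status block, the flag from the first call would still be set during the second unless it were cleared by hand.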
[Qemu-devel] [PATCH v13 02/10] Store parent BDS in BdrvChild
From: Wen Congyang

We need to access the parent BDS to get the root BDS.

Signed-off-by: Wen Congyang
Signed-off-by: Changlong Xie
---
 block.c                   | 1 +
 include/block/block_int.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 1589c0d..c9c913e 100644
--- a/block.c
+++ b/block.c
@@ -1204,6 +1204,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
     BdrvChild *child = g_new(BdrvChild, 1);
     *child = (BdrvChild) {
         .bs = child_bs,
+        .parent = parent_bs,
         .name = g_strdup(child_name),
         .role = child_role,
     };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ebe8b1e..19c02b6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -361,6 +361,7 @@ extern const BdrvChildRole child_format;
 
 struct BdrvChild {
     BlockDriverState *bs;
+    BlockDriverState *parent;
     char *name;
     const BdrvChildRole *role;
     QLIST_ENTRY(BdrvChild) next;
--
1.9.3
[Qemu-devel] [PATCH v13 03/10] Backup: clear all bitmap when doing block checkpoint
From: Wen Congyang

Signed-off-by: Wen Congyang
Signed-off-by: zhanghailiang
Signed-off-by: Gonglei
Signed-off-by: Changlong Xie
Reviewed-by: Jeff Cody
---
 block/backup.c           | 14 ++
 blockjob.c               | 11 +++
 include/block/blockjob.h | 12
 3 files changed, 37 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 705bb77..0a27d01 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -253,11 +253,25 @@ static void backup_abort(BlockJob *job)
     }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "The backup job only supports block checkpoint in"
+                   " sync=none mode");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size = sizeof(BackupBlockJob),
     .job_type = BLOCK_JOB_TYPE_BACKUP,
     .set_speed = backup_set_speed,
     .iostatus_reset = backup_iostatus_reset,
+    .do_checkpoint = backup_do_checkpoint,
     .commit = backup_commit,
     .abort = backup_abort,
 };
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..0c8edfe 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -533,3 +533,14 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
     QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
     block_job_txn_ref(txn);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+    if (!job->driver->do_checkpoint) {
+        error_setg(errp, "The job %s doesn't support block checkpoint",
+                   BlockJobType_lookup[job->driver->job_type]);
+        return;
+    }
+
+    job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..abdba7c 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,9 @@ typedef struct BlockJobDriver {
      * never both.
      */
     void (*abort)(BlockJob *job);
+
+    /** Optional callback for job types that support checkpoint. */
+    void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -443,4 +446,13 @@ void block_job_txn_unref(BlockJobTxn *txn);
  */
 void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
--
1.9.3
[Qemu-devel] [PATCH v13 01/10] unblock backup operations in backing file
From: Wen Congyang

Signed-off-by: Wen Congyang
Signed-off-by: Changlong Xie
---
 block.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index b9e99da..1589c0d 100644
--- a/block.c
+++ b/block.c
@@ -1275,6 +1275,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+    /*
+     * We do backup in 3 ways:
+     * 1. drive backup
+     *    The target bs is new opened, and the source is top BDS
+     * 2. blockdev backup
+     *    Both the source and the target are top BDSes.
+     * 3. internal backup(used for block replication)
+     *    Both the source and the target are backing file
+     *
+     * In case 1, and 2, the backing file is neither the source nor
+     * the target.
+     * In case 3, we will block the top BDS, so there is only one block
+     * job for the top BDS and its backing chain.
+     */
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }
--
1.9.3
[Qemu-devel] [PATCH v13 04/10] Allow creating backup jobs when opening BDS
From: Wen Congyang

When opening BDS, we need to create backup jobs for image-fleecing.

Signed-off-by: Wen Congyang
Signed-off-by: zhanghailiang
Signed-off-by: Gonglei
Signed-off-by: Changlong Xie
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Jeff Cody
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs := $(LIBISCSI_LIBS)
--
1.9.3
[Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
Block replication is a very important feature which is used for continuous checkpoints (for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v13

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v12

TODO:
1. Continuous block replication. It will be started after basic functions are accepted.

Change Log:
V13:
1. Rebase to the newest code
2. Remove redundant macros and semicolon in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest code
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target use the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu (use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (10):
  unblock backup operations in backing file
  Store parent BDS in BdrvChild
  Backup: clear all bitmap when doing block checkpoint
  Allow creating backup jobs when opening BDS
  docs: block replication's description
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  support replication driver in blockdev-add
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c                    | 145
 block/Makefile.objs        |   3 +-
 block/backup.c             |  14 ++
 block/quorum.c             |  78 +++
 block/replication.c        | 545 +
 blockjob.c                 |  11 +
 docs/block-replication.txt | 227 +++
 include/block/block.h      |   9 +
 include/block/block_int.h  |  15 ++
 include/block/blockjob.h   |  12 +
 qapi/block-core.json       |  33 ++-
 11 files changed, 1089 insertions(+), 3 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

--
1.9.3
[Qemu-devel] [PATCH v13 09/10] support replication driver in blockdev-add
From: Wen Congyang Signed-off-by: Wen Congyang Signed-off-by: zhanghailiang Signed-off-by: Gonglei Signed-off-by: Changlong Xie Reviewed-by: Eric Blake --- qapi/block-core.json | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 610da92..7354c6a 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -220,6 +220,7 @@ # 2.2: 'archipelago' added, 'cow' dropped # 2.3: 'host_floppy' deprecated # 2.5: 'host_floppy' dropped +# 2.6: 'replication' added # # @backing_file: #optional the name of the backing file (for copy-on-write) # @@ -1492,6 +1493,7 @@ # Drivers that are supported in block device operations. # # @host_device, @host_cdrom: Since 2.1 +# @replication: Since 2.6 # # Since: 2.0 ## @@ -1499,8 +1501,8 @@ 'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop', 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device', 'http', 'https', 'null-aio', 'null-co', 'parallels', -'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx', -'vmdk', 'vpc', 'vvfat' ] } +'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication', +'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] } ## # @BlockdevOptionsBase @@ -1940,6 +1942,19 @@ { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] } ## +# @BlockdevOptionsReplication +# +# Driver specific block device options for replication +# +# @mode: the replication mode +# +# Since: 2.6 +## +{ 'struct': 'BlockdevOptionsReplication', + 'base': 'BlockdevOptionsGenericFormat', + 'data': { 'mode': 'ReplicationMode' } } + +## # @BlockdevOptions # # Options for creating a block device. @@ -1976,6 +1991,7 @@ 'quorum': 'BlockdevOptionsQuorum', 'raw':'BlockdevOptionsGenericFormat', # TODO rbd: Wait for structured options + 'replication':'BlockdevOptionsReplication', # TODO sheepdog: Wait for structured options # TODO ssh: Should take InetSocketAddress for 'host'? 'tftp': 'BlockdevOptionsFile', -- 1.9.3
[Qemu-devel] [PATCH v13 07/10] quorum: implement block driver interfaces for block replication
From: Wen Congyang Signed-off-by: Wen Congyang Signed-off-by: zhanghailiang Signed-off-by: Gonglei Signed-off-by: Changlong Xie Reviewed-by: Alberto Garcia --- block/quorum.c | 78 ++ 1 file changed, 78 insertions(+) diff --git a/block/quorum.c b/block/quorum.c index e73418c..aa8c4dd 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -85,6 +85,8 @@ typedef struct BDRVQuorumState { int bsize; QuorumReadPattern read_pattern; + +int replication_index; /* store which child supports block replication */ } BDRVQuorumState; typedef struct QuorumAIOCB QuorumAIOCB; @@ -949,6 +951,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags, s->bsize = s->num_children; g_free(opened); +s->replication_index = -1; goto exit; close_exit: @@ -1146,6 +1149,77 @@ static void quorum_refresh_filename(BlockDriverState *bs, QDict *options) bs->full_open_options = opts; } +static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode, + Error **errp) +{ +BDRVQuorumState *s = bs->opaque; +int count = 0, i, index; +Error *local_err = NULL; + +/* + * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary + * QEMU becoming primary QEMU. 
+ */ +if (mode != REPLICATION_MODE_PRIMARY) { +error_setg(errp, "The replication mode for quorum should be 'primary'"); +return; +} + +if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) { +error_setg(errp, "Block replication needs read pattern 'fifo'"); +return; +} + +for (i = 0; i < s->num_children; i++) { +bdrv_start_replication(s->children[i]->bs, mode, &local_err); +if (local_err) { +error_free(local_err); +local_err = NULL; +} else { +count++; +index = i; +} +} + +if (count == 0) { +error_setg(errp, "No child supports block replication"); +} else if (count > 1) { +for (i = 0; i < s->num_children; i++) { +bdrv_stop_replication(s->children[i]->bs, false, NULL); +} +error_setg(errp, "Too many children support block replication"); +} else { +s->replication_index = index; +} +} + +static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp) +{ +BDRVQuorumState *s = bs->opaque; + +if (s->replication_index < 0) { +error_setg(errp, "Block replication is not running"); +return; +} + +bdrv_do_checkpoint(s->children[s->replication_index]->bs, errp); +} + +static void quorum_stop_replication(BlockDriverState *bs, bool failover, +Error **errp) +{ +BDRVQuorumState *s = bs->opaque; + +if (s->replication_index < 0) { +error_setg(errp, "Block replication is not running"); +return; +} + +bdrv_stop_replication(s->children[s->replication_index]->bs, failover, + errp); +s->replication_index = -1; +} + static BlockDriver bdrv_quorum = { .format_name= "quorum", .protocol_name = "quorum", @@ -1172,6 +1246,10 @@ static BlockDriver bdrv_quorum = { .is_filter = true, .bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter, + +.bdrv_start_replication = quorum_start_replication, +.bdrv_do_checkpoint = quorum_do_checkpoint, +.bdrv_stop_replication = quorum_stop_replication, }; static void bdrv_quorum_init(void) -- 1.9.3
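The child selection in quorum_start_replication() above amounts to: try to start replication on every child, count the ones that accept, and commit only when exactly one did. A minimal standalone sketch of that rule follows. It is not QEMU code: `Child`, `child_start()`, `child_stop()` and `pick_replication_child()` are hypothetical stand-ins for the quorum children and for bdrv_start_replication()/bdrv_stop_replication().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a quorum child; not a QEMU type. */
typedef struct Child {
    int supports_replication;  /* would starting replication succeed? */
    int running;               /* set once replication has been started */
} Child;

/* Hypothetical stand-in for bdrv_start_replication(): 0 on success. */
static int child_start(Child *c)
{
    if (!c->supports_replication) {
        return -1;
    }
    c->running = 1;
    return 0;
}

/* Hypothetical stand-in for bdrv_stop_replication(). */
static void child_stop(Child *c)
{
    c->running = 0;
}

/*
 * Returns the index of the single child that supports replication,
 * -1 if no child does, or -2 if more than one does (in which case
 * every child is stopped again, as in the patch).
 */
static int pick_replication_child(Child *children, size_t n)
{
    int count = 0, index = -1;
    size_t i;

    assert(children != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (child_start(&children[i]) == 0) {
            count++;
            index = (int)i;
        }
    }

    if (count == 0) {
        return -1;
    }
    if (count > 1) {
        for (i = 0; i < n; i++) {
            child_stop(&children[i]);
        }
        return -2;
    }
    return index;
}
```

Unlike the patch, the sketch initializes `index` up front, so it is never read uninitialized regardless of how the compiler analyses the branches.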
[Qemu-devel] [PATCH v13 06/10] Add new block driver interfaces to control block replication
From: Wen Congyang Signed-off-by: Wen Congyang Signed-off-by: zhanghailiang Signed-off-by: Gonglei Signed-off-by: Changlong Xie Cc: Luiz Capitulino Cc: Michael Roth Reviewed-by: Paolo Bonzini --- block.c | 43 +++ include/block/block.h | 5 + include/block/block_int.h | 14 ++ qapi/block-core.json | 13 + 4 files changed, 75 insertions(+) diff --git a/block.c b/block.c index c9c913e..275d8b4 100644 --- a/block.c +++ b/block.c @@ -4389,3 +4389,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs, parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp); } + +void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode, +Error **errp) +{ +BlockDriver *drv = bs->drv; + +if (drv && drv->bdrv_start_replication) { +drv->bdrv_start_replication(bs, mode, errp); +} else if (bs->file) { +bdrv_start_replication(bs->file->bs, mode, errp); +} else { +error_setg(errp, "The BDS %s doesn't support starting block" + " replication", bs->filename); +} +} + +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp) +{ +BlockDriver *drv = bs->drv; + +if (drv && drv->bdrv_do_checkpoint) { +drv->bdrv_do_checkpoint(bs, errp); +} else if (bs->file) { +bdrv_do_checkpoint(bs->file->bs, errp); +} else { +error_setg(errp, "The BDS %s doesn't support block checkpoint", + bs->filename); +} +} + +void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp) +{ +BlockDriver *drv = bs->drv; + +if (drv && drv->bdrv_stop_replication) { +drv->bdrv_stop_replication(bs, failover, errp); +} else if (bs->file) { +bdrv_stop_replication(bs->file->bs, failover, errp); +} else { +error_setg(errp, "The BDS %s doesn't support stopping block" + " replication", bs->filename); +} +} diff --git a/include/block/block.h b/include/block/block.h index 6c7e54b..5d47cef 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -587,4 +587,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child, void bdrv_del_child(BlockDriverState *parent, 
BlockDriverState *child, Error **errp); +void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode, +Error **errp); +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp); +void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp); + #endif diff --git a/include/block/block_int.h b/include/block/block_int.h index 19c02b6..e31f9db 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -308,6 +308,20 @@ struct BlockDriver { void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child, Error **errp); +void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode, + Error **errp); +/* Drop Disk buffer when doing checkpoint. */ +void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp); +/* + * After failover, we should flush Disk buffer into secondary disk + * and stop block replication. + * + * If the guest is shut down, we should drop Disk buffer and stop + * block replication. + */ +void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover, + Error **errp); + QLIST_ENTRY(BlockDriver) list; }; diff --git a/qapi/block-core.json b/qapi/block-core.json index fe63c6d..610da92 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1927,6 +1927,19 @@ '*read-pattern': 'QuorumReadPattern' } } ## +# @ReplicationMode +# +# An enumeration of replication modes. +# +# @primary: Primary mode, the vm's state will be sent to secondary QEMU. +# +# @secondary: Secondary mode, receive the vm's state from primary QEMU. +# +# Since: 2.6 +## +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] } + +## # @BlockdevOptions # # Options for creating a block device. -- 1.9.3
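All three wrappers in this patch share one dispatch pattern: call the driver's callback if it has one, otherwise recurse into bs->file, otherwise report that the operation is unsupported. A minimal sketch of that pattern, using hypothetical miniature types (`Driver`, `Node`) rather than the real BlockDriver and BlockDriverState, and an int return code in place of the Error ** convention:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical miniature of BlockDriver: one optional callback. */
typedef struct Driver {
    int (*do_checkpoint)(Node *n);
} Driver;

/* Hypothetical miniature of BlockDriverState with its bs->file child. */
struct Node {
    const Driver *drv;
    Node *file;
    int checkpoints;  /* how many checkpoints this node has handled */
};

/* Example callback: just records that a checkpoint reached this node. */
static int count_checkpoint(Node *n)
{
    n->checkpoints++;
    return 0;
}

/*
 * Same three-way dispatch as bdrv_do_checkpoint(): use the driver's
 * callback if present, else recurse into the file child, else fail.
 */
static int node_do_checkpoint(Node *n)
{
    assert(n != NULL);

    if (n->drv && n->drv->do_checkpoint) {
        return n->drv->do_checkpoint(n);
    } else if (n->file) {
        return node_do_checkpoint(n->file);
    }
    return -1;  /* corresponds to "doesn't support block checkpoint" */
}
```

With a format node that has no callback stacked on a node that does, the call on the top node is forwarded down and handled by the lower one; a lone node with neither a callback nor a file child returns the error.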
[Qemu-devel] [PATCH v13 05/10] docs: block replication's description
From: Wen Congyang Signed-off-by: Wen Congyang Signed-off-by: zhanghailiang Signed-off-by: Gonglei Signed-off-by: Changlong Xie --- docs/block-replication.txt | 227 + 1 file changed, 227 insertions(+) create mode 100644 docs/block-replication.txt diff --git a/docs/block-replication.txt b/docs/block-replication.txt new file mode 100644 index 000..73abb65 --- /dev/null +++ b/docs/block-replication.txt @@ -0,0 +1,227 @@ +Block replication + +Copyright Fujitsu, Corp. 2015 +Copyright (c) 2015 Intel Corporation +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD. + +This work is licensed under the terms of the GNU GPL, version 2 or later. +See the COPYING file in the top-level directory. + +Block replication is used for continuous checkpoints. It is designed +for COLO (COarse-grain LOck-stepping) where the Secondary VM is running. +It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario, +where the Secondary VM is not running. + +This document gives an overview of block replication's design. + +== Background == +High availability solutions such as micro checkpoint and COLO will do +consecutive checkpoints. The VM state of Primary VM and Secondary VM is +identical right after a VM checkpoint, but becomes different as the VM +executes till the next checkpoint. To support disk contents checkpoint, +the modified disk contents in the Secondary VM must be buffered, and are +only dropped at next checkpoint time. To reduce the network transportation +effort at the time of checkpoint, the disk modification operations of +Primary disk are asynchronously forwarded to the Secondary node. 
+ +== Workflow == +The following is the image of block replication workflow: + ++--+++ +|Primary Write Requests||Secondary Write Requests| ++--+++ + | | + | (4) + | V + | /-\ + | Copy and Forward| | + |-(1)--+ | Disk Buffer | + | | | | + | (3) \-/ + | speculative ^ + |write through(2) + | | | + V V | + +--+ ++ + | Primary Disk | | Secondary Disk | + +--+ ++ + +1) Primary write requests will be copied and forwarded to Secondary + QEMU. +2) Before Primary write requests are written to Secondary disk, the + original sector content will be read from Secondary disk and + buffered in the Disk buffer, but it will not overwrite the existing + sector content (it could be from either "Secondary Write Requests" or + previous COW of "Primary Write Requests") in the Disk buffer. +3) Primary write requests will be written to Secondary disk. +4) Secondary write requests will be buffered in the Disk buffer and it + will overwrite the existing sector content in the buffer. + +== Architecture == +We are going to implement block replication from many basic +blocks that are already in QEMU. + + virtio-blk || + ^||.-- + |||| Secondary +1 Quorum ||'-- + / \ || +/\|| + Primary2 filter + disk ^ virtio-blk + | ^ +3 NBD ---> 3 NBD | +client|| server 2 filter + ||^ ^ +. ||| | +Primary | || Secondary disk <- hidden-disk 5 <- active-disk 4 +' ||| backing^ backing + ||| | + ||| | + ||'-' + || drive-backup sync=none 6
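The four workflow steps described in the document above can be modelled in a few lines. The following is a toy sketch only, assuming one int per sector and an in-memory buffer; `Secondary`, `forwarded_primary_write()`, `secondary_write()`, `secondary_read()` and `checkpoint()` are hypothetical names, not part of the patch series.

```c
#include <assert.h>
#include <string.h>

#define NSECT 8

/* Toy model of the Secondary side: one int per sector; not QEMU code. */
typedef struct Secondary {
    int disk[NSECT];      /* Secondary disk */
    int buf[NSECT];       /* Disk buffer contents */
    int buffered[NSECT];  /* 1 if buf[i] holds valid data */
} Secondary;

/*
 * Steps 2 and 3: before a forwarded Primary write reaches the
 * Secondary disk, save the old sector in the Disk buffer unless the
 * buffer already holds that sector, then write to the disk.
 */
static void forwarded_primary_write(Secondary *s, int sect, int val)
{
    assert(sect >= 0 && sect < NSECT);
    if (!s->buffered[sect]) {
        s->buf[sect] = s->disk[sect];
        s->buffered[sect] = 1;
    }
    s->disk[sect] = val;
}

/* Step 4: Secondary writes go to the Disk buffer and overwrite it. */
static void secondary_write(Secondary *s, int sect, int val)
{
    assert(sect >= 0 && sect < NSECT);
    s->buf[sect] = val;
    s->buffered[sect] = 1;
}

/* What the Secondary VM reads: buffered content wins over the disk. */
static int secondary_read(const Secondary *s, int sect)
{
    assert(sect >= 0 && sect < NSECT);
    return s->buffered[sect] ? s->buf[sect] : s->disk[sect];
}

/* At checkpoint time the Disk buffer is dropped. */
static void checkpoint(Secondary *s)
{
    memset(s->buffered, 0, sizeof(s->buffered));
}
```

In this model the Secondary VM keeps seeing its own view of a sector between checkpoints, while the Secondary disk silently tracks the Primary; after checkpoint() both views coincide again.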
[Qemu-devel] [PATCH v13 08/10] Implement new driver for block replication
From: Wen Congyang Signed-off-by: Wen Congyang Signed-off-by: zhanghailiang Signed-off-by: Gonglei Signed-off-by: Changlong Xie --- block/Makefile.objs | 1 + block/replication.c | 545 2 files changed, 546 insertions(+) create mode 100644 block/replication.c diff --git a/block/Makefile.objs b/block/Makefile.objs index fa05f37..94c1d03 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o block-obj-y += accounting.o block-obj-y += write-threshold.o block-obj-y += backup.o +block-obj-y += replication.o common-obj-y += stream.o common-obj-y += commit.o diff --git a/block/replication.c b/block/replication.c new file mode 100644 index 000..6a061c9 --- /dev/null +++ b/block/replication.c @@ -0,0 +1,545 @@ +/* + * Replication Block filter + * + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD. + * Copyright (c) 2015 Intel Corporation + * Copyright (c) 2015 FUJITSU LIMITED + * + * Author: + * Wen Congyang + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu-common.h" +#include "block/block_int.h" +#include "block/blockjob.h" +#include "block/nbd.h" + +typedef struct BDRVReplicationState { +ReplicationMode mode; +int replication_state; +BlockDriverState *active_disk; +BlockDriverState *hidden_disk; +BlockDriverState *secondary_disk; +BlockDriverState *top_bs; +Error *blocker; +int orig_hidden_flags; +int orig_secondary_flags; +int error; +} BDRVReplicationState; + +enum { +BLOCK_REPLICATION_NONE, /* block replication is not started */ +BLOCK_REPLICATION_RUNNING, /* block replication is running */ +BLOCK_REPLICATION_DONE, /* block replication is done(failover) */ +}; + +static void replication_stop(BlockDriverState *bs, bool failover, Error **errp); + +#define REPLICATION_MODE"mode" +static QemuOptsList replication_runtime_opts = { +.name = "replication", +.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head), +.desc = { +{ +.name = REPLICATION_MODE, +.type = QEMU_OPT_STRING, +}, +{ /* end of list */ } +}, +}; + +static int replication_open(BlockDriverState *bs, QDict *options, +int flags, Error **errp) +{ +int ret; +BDRVReplicationState *s = bs->opaque; +Error *local_err = NULL; +QemuOpts *opts = NULL; +const char *mode; + +ret = -EINVAL; +opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort); +qemu_opts_absorb_qdict(opts, options, &local_err); +if (local_err) { +goto fail; +} + +mode = qemu_opt_get(opts, REPLICATION_MODE); +if (!mode) { +error_setg(&local_err, "Missing the option mode"); +goto fail; +} + +if (!strcmp(mode, "primary")) { +s->mode = REPLICATION_MODE_PRIMARY; +} else if (!strcmp(mode, "secondary")) { +s->mode = REPLICATION_MODE_SECONDARY; +} else { +error_setg(&local_err, + "The option mode's value should be primary or secondary"); +goto fail; +} + +ret = 0; + +fail: +qemu_opts_del(opts); +/* propagate error */ +if (local_err) { +error_propagate(errp, local_err); +} +return ret; +} + +static void replication_close(BlockDriverState *bs) +{ 
+BDRVReplicationState *s = bs->opaque; + +if (s->replication_state == BLOCK_REPLICATION_RUNNING) { +replication_stop(bs, false, NULL); +} +} + +static int64_t replication_getlength(BlockDriverState *bs) +{ +return bdrv_getlength(bs->file->bs); +} + +static int replication_get_io_status(BDRVReplicationState *s) +{ +switch (s->replication_state) { +case BLOCK_REPLICATION_NONE: +return -EIO; +case BLOCK_REPLICATION_RUNNING: +return 0; +case BLOCK_REPLICATION_DONE: +return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1; +default: +abort(); +} +} + +static int replication_return_value(BDRVReplicationState *s, int ret) +{ +if (s->mode == REPLICATION_MODE_SECONDARY) { +return ret; +} + +if (ret < 0) { +s->error = ret; +ret = 0; +} + +return ret; +} + +static coroutine_fn int replication_co_readv(BlockDriverState *bs, + int64_t sector_num, + int remaining_sectors, + QEMUIOVector *qiov) +{ +BDRVReplicationState *s = bs->opaque; +int ret; + +if (s->mode == REPLICATION_MODE_PRIMARY) { +/* We only use it to forward primary write requests */ +return -EIO; +} + +ret = replication_get_io_status(s); +if (ret < 0) { +return ret; +} + +/* + * After failover, because we don't
[Qemu-devel] [PATCH v13 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes
From: Wen Congyang Signed-off-by: Wen Congyang Signed-off-by: zhanghailiang Signed-off-by: Gonglei Signed-off-by: Changlong Xie --- block.c | 83 +++ include/block/block.h | 4 +++ 2 files changed, 87 insertions(+) diff --git a/block.c b/block.c index 275d8b4..634cc97 100644 --- a/block.c +++ b/block.c @@ -4432,3 +4432,86 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp) " replication", bs->filename); } } + +void bdrv_start_replication_all(ReplicationMode mode, Error **errp) +{ +BlockDriverState *bs = NULL, *temp = NULL; +Error *local_err = NULL; + +while ((bs = bdrv_next(bs))) { +if (!QLIST_EMPTY(&bs->parents)) { +/* It is not top BDS */ +continue; +} + +if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) { +continue; +} + +bdrv_start_replication(bs, mode, &local_err); +if (local_err) { +error_propagate(errp, local_err); +goto fail; +} +} + +return; + +fail: +while ((temp = bdrv_next(temp)) && bs != temp) { +bdrv_stop_replication(temp, false, NULL); +} +} + +void bdrv_do_checkpoint_all(Error **errp) +{ +BlockDriverState *bs = NULL; +Error *local_err = NULL; + +while ((bs = bdrv_next(bs))) { +if (!QLIST_EMPTY(&bs->parents)) { +/* It is not top BDS */ +continue; +} + +if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) { +continue; +} + +bdrv_do_checkpoint(bs, &local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +} +} + +void bdrv_stop_replication_all(bool failover, Error **errp) +{ +BlockDriverState *bs = NULL; +Error *local_err = NULL; + +while ((bs = bdrv_next(bs))) { +if (!QLIST_EMPTY(&bs->parents)) { +/* It is not top BDS */ +continue; +} + +if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) { +continue; +} + +bdrv_stop_replication(bs, failover, &local_err); +if (!errp) { +/* + * The caller doesn't care the result, they just + * want to stop all block's replication. 
+ */ +continue; +} +if (local_err) { +error_propagate(errp, local_err); +return; +} +} +} diff --git a/include/block/block.h b/include/block/block.h index 5d47cef..9c4de14 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -592,4 +592,8 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode, void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp); void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp); +void bdrv_start_replication_all(ReplicationMode mode, Error **errp); +void bdrv_do_checkpoint_all(Error **errp); +void bdrv_stop_replication_all(bool failover, Error **errp); + #endif -- 1.9.3
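bdrv_start_replication_all() above follows a start-all-or-roll-back pattern: if any device fails to start, the devices visited before it are stopped again. A minimal sketch of that pattern over a plain array, with hypothetical `Dev`, `dev_start()`, `dev_stop()` and `start_all()` standing in for the BDS list walk and the bdrv_* calls:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a top-level block device. */
typedef struct Dev {
    int can_start;  /* would starting replication succeed? */
    int running;    /* replication currently running? */
} Dev;

/* Hypothetical stand-in for bdrv_start_replication(): 0 on success. */
static int dev_start(Dev *d)
{
    if (!d->can_start) {
        return -1;
    }
    d->running = 1;
    return 0;
}

/* Hypothetical stand-in for bdrv_stop_replication(). */
static void dev_stop(Dev *d)
{
    d->running = 0;
}

/*
 * Start replication on every device. On the first failure, stop the
 * devices that were visited before the failing one and return -1.
 */
static int start_all(Dev *devs, size_t n)
{
    size_t i, j;

    assert(devs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (dev_start(&devs[i]) < 0) {
            for (j = 0; j < i; j++) {
                dev_stop(&devs[j]);
            }
            return -1;
        }
    }
    return 0;
}
```

The sketch omits the filtering the patch performs (non-top, read-only and not-inserted BDSes are skipped when starting); note that the patch's rollback loop does not repeat that filtering and calls bdrv_stop_replication() on every BDS ahead of the failing one.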
[Qemu-devel] [PATCH] Add missing syscall nrs. This updates the QEMU syscall tables to more recent Linux kernels.
This change covers arm, aarch64, mips. Others to follow? The change was prompted by QEMU warning about syscall 384 (getrandom()) with Debian armhf binaries (ARMv7). Signed-off-by: Johan Ouwerkerk --- linux-user/aarch64/syscall_nr.h | 13 + linux-user/arm/syscall_nr.h | 12 linux-user/mips/syscall_nr.h| 12 3 files changed, 37 insertions(+) diff --git a/linux-user/aarch64/syscall_nr.h b/linux-user/aarch64/syscall_nr.h index 743255d..74f4275 100644 --- a/linux-user/aarch64/syscall_nr.h +++ b/linux-user/aarch64/syscall_nr.h @@ -262,6 +262,19 @@ #define TARGET_NR_process_vm_writev 271 #define TARGET_NR_kcmp 272 #define TARGET_NR_finit_module 273 + +#define TARGET_NR_sched_setattr 274 +#define TARGET_NR_sched_getattr 275 +#define TARGET_NR_renameat2 276 +#define TARGET_NR_seccomp 277 +#define TARGET_NR_getrandom 278 +#define TARGET_NR_memfd_create 279 +#define TARGET_NR_bpf 280 +#define TARGET_NR_execveat 281 +#define TARGET_NR_userfaultfd 282 +#define TARGET_NR_membarrier 283 +#define TARGET_NR_mlock2 284 + #define TARGET_NR_open 1024 #define TARGET_NR_link 1025 #define TARGET_NR_unlink 1026 diff --git a/linux-user/arm/syscall_nr.h b/linux-user/arm/syscall_nr.h index 53552be..cc9089c 100644 --- a/linux-user/arm/syscall_nr.h +++ b/linux-user/arm/syscall_nr.h @@ -384,3 +384,15 @@ #define TARGET_NR_process_vm_writev(377) #define TARGET_NR_kcmp (378) #define TARGET_NR_finit_module (379) + +#define TARGET_NR_sched_setattr(380) +#define TARGET_NR_sched_getattr(381) +#define TARGET_NR_renameat2(382) +#define TARGET_NR_seccomp (383) +#define TARGET_NR_getrandom(384) +#define TARGET_NR_memfd_create (385) +#define TARGET_NR_bpf (386) +#define TARGET_NR_execveat (387) +#define TARGET_NR_userfaultfd (388) +#define TARGET_NR_membarrier (389) +#define TARGET_NR_mlock2 (390) diff --git a/linux-user/mips/syscall_nr.h b/linux-user/mips/syscall_nr.h index 2d1a13e..6819f86 100644 --- a/linux-user/mips/syscall_nr.h +++ b/linux-user/mips/syscall_nr.h @@ -351,3 +351,15 @@ #define
TARGET_NR_process_vm_writev (TARGET_NR_Linux + 346) #define TARGET_NR_kcmp (TARGET_NR_Linux + 347) #define TARGET_NR_finit_module (TARGET_NR_Linux + 348) + +#define TARGET_NR_sched_setattr (TARGET_NR_Linux + 349) +#define TARGET_NR_sched_getattr (TARGET_NR_Linux + 350) +#define TARGET_NR_renameat2 (TARGET_NR_Linux + 351) +#define TARGET_NR_seccomp (TARGET_NR_Linux + 352) +#define TARGET_NR_getrandom (TARGET_NR_Linux + 353) +#define TARGET_NR_memfd_create (TARGET_NR_Linux + 354) +#define TARGET_NR_bpf (TARGET_NR_Linux + 355) +#define TARGET_NR_execveat (TARGET_NR_Linux + 356) +#define TARGET_NR_userfaultfd (TARGET_NR_Linux + 357) +#define TARGET_NR_membarrier(TARGET_NR_Linux + 358) +#define TARGET_NR_mlock2(TARGET_NR_Linux + 359) -- 2.6.4
[Qemu-devel] git send-email didn't arrive?
Recently I attempted to post a patch to the qemu-devel mailing list through git send-email (as per the wiki) and while a copy of the e-mail has hit my inbox, it doesn't seem to have hit the mailing list judging by checking gmane. Something seems to have gone wrong. The subject of the mail was: [PATCH] Add missing syscall nrs. This updates the QEMU syscall tables to more recent Linux kernels. I used git format-patch with --stdout -1 -s > syscalls_v1.patch I used git send-email with --to=qemu-devel@nongnu.org --cc=qemu-triv...@nongnu.org syscalls_v1.patch I have git sendemail set up as follows: from = Johan Ouwerkerk smtpserver = smtp.gmail.com smtpuser = jm.ouwerk...@gmail.com smtpencryption = tls suppresscc = self (FWIW the mail was delivered with SMTP status code of 250, i.e. OK). Does anything stand out as obviously wrong? And if so what should I have done/used instead? Regards, -Johan Ouwerkerk
Re: [Qemu-devel] git send-email didn't arrive?
Scratch this? It seems to have arrived just now, finally, having taken a mere 16 hours and 4 minutes to get there... ?
[Qemu-devel] [Bug 1529187] Re: vfio passtrhough fails at 'No available IOMMU models' on Intel BDW-EP platform
You've somehow managed to not load the vfio_iommu_type1 module. The vfio module requests it when loading; if the module is not available at that point, such as when vfio is loaded from an initramfs that does not include the full set of vfio modules, it will need to be loaded manually later.

--
https://bugs.launchpad.net/bugs/1529187

Title: vfio passtrhough fails at 'No available IOMMU models' on Intel BDW-EP platform

Status in QEMU: New

Bug description:

Environment:
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): linux
kvm.git Commit: da3f7ca3
qemu.git Commit: 38a762fe
Host Kernel Version: 4.4.0-rc2
Hardware: BDW EP (Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, Grantley-EP)

Bug description:
When creating a guest with VT-d assignment using the vfio-pci driver, the guest cannot be created. Warning: 'No available IOMMU models'

Reproduce steps:
1. bind device to vfio-pci driver
2. qemu-system-x86_64 -enable-kvm -m 512 -smp 2 -device vfio-pci,host=81:00.0 -net none -drive file=rhel7u2.qcow2,if=none,id=virtio-disk0 -device virtio-blk-pci,drive=virtio-disk0

Current result:
qemu-system-x86_64: -device vfio-pci,host=81:00.0: vfio: No available IOMMU models
qemu-system-x86_64: -device vfio-pci,host=81:00.0: vfio: failed to setup container for group 41
qemu-system-x86_64: -device vfio-pci,host=81:00.0: vfio: failed to get group 41
qemu-system-x86_64: -device vfio-pci,host=81:00.0: Device initialization failed

Expected result:
guest can be created

Basic root-causing log:
[Qemu-devel] [PATCH v4 2/4] i386/acpi: make floppy controller object dynamic
Instead of statically declaring the floppy controller in DSDT, with its _STA method depending on some obscure bit in the parent ISA bridge, add the object dynamically to SSDT via AML API only when the controller is present. The _STA method is no longer necessary and is therefore dropped. So are the declarations of the fields indicating whether the controller is enabled. Signed-off-by: Roman Kagan Cc: "Michael S. Tsirkin" Cc: Eduardo Habkost Cc: Igor Mammedov Cc: John Snow Cc: Kevin Wolf Cc: Paolo Bonzini Cc: Richard Henderson Cc: qemu-bl...@nongnu.org Cc: qemu-sta...@nongnu.org --- changes since v3: - new patch (note that it conflicts with "[PATCH 50/74] pc: acpi: move FDC0 device from DSDT to SSDT" from Igor's series) - include test data updates to maintain bisectability hw/i386/acpi-build.c| 24 hw/i386/acpi-dsdt-isa.dsl | 18 -- hw/i386/acpi-dsdt.dsl | 1 - hw/i386/q35-acpi-dsdt.dsl | 7 ++- tests/acpi-test-data/pc/DSDT| Bin 3028 -> 2946 bytes tests/acpi-test-data/pc/SSDT| Bin 2486 -> 2554 bytes tests/acpi-test-data/pc/SSDT.bridge | Bin 4345 -> 4413 bytes tests/acpi-test-data/q35/DSDT | Bin 7666 -> 7578 bytes 8 files changed, 26 insertions(+), 24 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 4cc1440..a01e909 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -113,6 +113,7 @@ typedef struct AcpiMiscInfo { unsigned dsdt_size; uint16_t pvpanic_port; uint16_t applesmc_io_base; +bool has_fdc; } AcpiMiscInfo; typedef struct AcpiBuildPciBusHotplugState { @@ -236,10 +237,15 @@ static void acpi_get_pm_info(AcpiPmInfo *pm) static void acpi_get_misc_info(AcpiMiscInfo *info) { +ISADevice *fdc; + info->has_hpet = hpet_find(); info->tpm_version = tpm_get_version(); info->pvpanic_port = pvpanic_port(); info->applesmc_io_base = applesmc_port(); + +fdc = pc_find_fdc0(); +info->has_fdc = !!fdc; } /* @@ -1099,6 +1105,24 @@ build_ssdt(GArray *table_data, GArray *linker, aml_append(scope, aml_name_decl("_S5", pkg)); aml_append(ssdt, scope); +if
(misc->has_fdc) { +scope = aml_scope("\\_SB.PCI0.ISA"); +dev = aml_device("FDC0"); + +aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0700"))); + +crs = aml_resource_template(); +aml_append(crs, aml_io(AML_DECODE16, 0x03F2, 0x03F2, 0x00, 0x04)); +aml_append(crs, aml_io(AML_DECODE16, 0x03F7, 0x03F7, 0x00, 0x01)); +aml_append(crs, aml_irq_no_flags(6)); +aml_append(crs, aml_dma(AML_COMPATIBILITY, AML_NOTBUSMASTER, +AML_TRANSFER8, 2)); +aml_append(dev, aml_name_decl("_CRS", crs)); + +aml_append(scope, dev); +aml_append(ssdt, scope); +} + if (misc->applesmc_io_base) { scope = aml_scope("\\_SB.PCI0.ISA"); dev = aml_device("SMC"); diff --git a/hw/i386/acpi-dsdt-isa.dsl b/hw/i386/acpi-dsdt-isa.dsl index 89caa16..061507d 100644 --- a/hw/i386/acpi-dsdt-isa.dsl +++ b/hw/i386/acpi-dsdt-isa.dsl @@ -47,24 +47,6 @@ Scope(\_SB.PCI0.ISA) { }) } -Device(FDC0) { -Name(_HID, EisaId("PNP0700")) -Method(_STA, 0, NotSerialized) { -Store(FDEN, Local0) -If (LEqual(Local0, 0)) { -Return (0x00) -} Else { -Return (0x0F) -} -} -Name(_CRS, ResourceTemplate() { -IO(Decode16, 0x03F2, 0x03F2, 0x00, 0x04) -IO(Decode16, 0x03F7, 0x03F7, 0x00, 0x01) -IRQNoFlags() { 6 } -DMA(Compatibility, NotBusMaster, Transfer8) { 2 } -}) -} - Device(LPT) { Name(_HID, EisaId("PNP0400")) Method(_STA, 0, NotSerialized) { diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 8dba096..aa50990 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -80,7 +80,6 @@ DefinitionBlock ( , 3, CBEN, 1, // COM2 } -Name(FDEN, 1) } } diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl index 7be7b37..fcb9915 100644 --- a/hw/i386/q35-acpi-dsdt.dsl +++ b/hw/i386/q35-acpi-dsdt.dsl @@ -137,16 +137,13 @@ DefinitionBlock ( COMB, 3, Offset(0x01), -LPTD, 2, -, 2, -FDCD, 2 +LPTD, 2 } OperationRegion(LPCE, PCI_Config, 0x82, 0x2) Field(LPCE, AnyAcc, NoLock, Preserve) { CAEN, 1, CBEN, 1, -LPEN, 1, -FDEN, 1 +LPEN, 1 } } } diff --git a/tests/acpi-test-data/pc/DSDT b/tests/acpi-test-data/pc/DSDT index 
c658203db94a7e7db7c36fde99a7075a8d75498d..d8ebf12cc0ae9f
[Qemu-devel] [PATCH v4 0/4] i386: expose floppy-related objects in SSDT
Windows on UEFI systems is only capable of detecting the presence and the type of floppy drives via corresponding ACPI objects. Those objects are added in the last patch of the series; the three preceding ones pave the way to it, by making the necessary data public and by moving the whole floppy drive controller description into runtime-generated SSDT. Note that the series conflicts with Igor's patchset for dynamic DSDT, in particular, with "[PATCH 50/74] pc: acpi: move FDC0 device from DSDT to SSDT"; I haven't managed to avoid that while trying to meet maintainer's comments. Roman Kagan (4): i386/pc: expose identifying the floppy controller i386/acpi: make floppy controller object dynamic expose floppy drive geometry and CMOS type i386: populate floppy drive information in SSDT Signed-off-by: Roman Kagan Cc: "Michael S. Tsirkin" Cc: Eduardo Habkost Cc: Igor Mammedov Cc: John Snow Cc: Kevin Wolf Cc: Paolo Bonzini Cc: Richard Henderson Cc: qemu-bl...@nongnu.org Cc: qemu-sta...@nongnu.org --- changes since v3: - make FDC object fully dynamic in a separate patch - split out support patches - include test data updates with the respective patches to maintain bisectability changes since v2: - explicit endianness for buffer data - reorder code to reduce conflicts with dynamic DSDT patchset - update test data hw/block/fdc.c | 11 + hw/i386/acpi-build.c| 92 hw/i386/acpi-dsdt-isa.dsl | 18 --- hw/i386/acpi-dsdt.dsl | 1 - hw/i386/pc.c| 46 ++ hw/i386/q35-acpi-dsdt.dsl | 7 +-- include/hw/block/fdc.h | 2 + include/hw/i386/pc.h| 3 ++ tests/acpi-test-data/pc/DSDT| Bin 3028 -> 2946 bytes tests/acpi-test-data/pc/SSDT| Bin 2486 -> 2635 bytes tests/acpi-test-data/pc/SSDT.bridge | Bin 4345 -> 4494 bytes tests/acpi-test-data/q35/DSDT | Bin 7666 -> 7578 bytes 12 files changed, 137 insertions(+), 43 deletions(-) -- 2.5.0
[Qemu-devel] [PATCH v4 3/4] expose floppy drive geometry and CMOS type
Make it possible to query the geometry and the CMOS type of a floppy drive outside of the respective source files. This will be useful, in particular, when dynamically building ACPI tables: it will allow properly populating the corresponding ACPI objects and thus enable BIOS-less systems to access the floppy drives.

Signed-off-by: Roman Kagan
Cc: "Michael S. Tsirkin"
Cc: Eduardo Habkost
Cc: Igor Mammedov
Cc: John Snow
Cc: Kevin Wolf
Cc: Paolo Bonzini
Cc: Richard Henderson
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - split out into a separate patch to facilitate review

 hw/block/fdc.c         | 11 +++
 hw/i386/pc.c           |  2 +-
 include/hw/block/fdc.h |  2 ++
 include/hw/i386/pc.h   |  1 +
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 4292ece..c858c5f 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2408,6 +2408,17 @@ FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i)
     return isa->state.drives[i].drive;
 }

+void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
+                                uint8_t *heads, uint8_t *sectors)
+{
+    FDCtrlISABus *isa = ISA_FDC(fdc);
+    FDrive *drv = &isa->state.drives[i];
+
+    *cylinders = drv->max_track;
+    *heads = (drv->flags & FDISK_DBL_SIDES) ? 2 : 1;
+    *sectors = drv->last_sect;
+}
+
 static const VMStateDescription vmstate_isa_fdc ={
     .name = "fdc",
     .version_id = 2,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c36b8cf..99fab83 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -199,7 +199,7 @@ static void pic_irq_request(void *opaque, int irq, int level)

 #define REG_EQUIPMENT_BYTE 0x14

-static int cmos_get_fd_drive_type(FDriveType fd0)
+int cmos_get_fd_drive_type(FDriveType fd0)
 {
     int val;

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index d48b2f8..adaf3dc 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -22,5 +22,7 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
                        DriveInfo **fds, qemu_irq *fdc_tc);

 FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
+void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
+                                uint8_t *heads, uint8_t *sectors);

 #endif
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 819..d044a9a 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -268,6 +268,7 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);

 ISADevice *pc_find_fdc0(void);
+int cmos_get_fd_drive_type(FDriveType fd0);

 /* acpi_piix.c */

--
2.5.0
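For readers without the QEMU tree at hand, the geometry derivation in the new accessor can be sketched in isolation. The type DemoFDrive, the flag DEMO_FDISK_DBL_SIDES and the function demo_get_drive_geometry() below are hypothetical stand-ins for QEMU's private FDrive state; only the three assignments are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's FDISK_DBL_SIDES flag; value is illustrative. */
#define DEMO_FDISK_DBL_SIDES 0x01

/* Hypothetical stand-in for the fields of FDrive that the patch reads. */
typedef struct DemoFDrive {
    uint8_t flags;      /* DEMO_FDISK_DBL_SIDES set for two-sided media */
    uint8_t max_track;  /* number of tracks (cylinders) */
    uint8_t last_sect;  /* number of sectors per track */
} DemoFDrive;

/*
 * Same shape as isa_fdc_get_drive_geometry() in the patch: the three
 * results are returned through out-pointers.
 */
void demo_get_drive_geometry(const DemoFDrive *drv, uint8_t *cylinders,
                             uint8_t *heads, uint8_t *sectors)
{
    assert(drv && cylinders && heads && sectors);
    *cylinders = drv->max_track;
    *heads = (drv->flags & DEMO_FDISK_DBL_SIDES) ? 2 : 1;
    *sectors = drv->last_sect;
}
```

With a drive described as 80 tracks, double-sided, 18 sectors per track, the sketch yields cylinders = 80, heads = 2, sectors = 18.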
[Qemu-devel] [PATCH v4 1/4] i386/pc: expose identifying the floppy controller
Factor out and expose the function to locate the floppy controller in the system. It will be useful when dynamically populating the relevant objects in the ACPI tables.

Signed-off-by: Roman Kagan
Cc: "Michael S. Tsirkin"
Cc: Eduardo Habkost
Cc: Igor Mammedov
Cc: John Snow
Cc: Kevin Wolf
Cc: Paolo Bonzini
Cc: Richard Henderson
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - split out into a separate patch to facilitate review

 hw/i386/pc.c         | 44 ++--
 include/hw/i386/pc.h |  2 ++
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 459260b..c36b8cf 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -360,6 +360,31 @@ static const char * const fdc_container_path[] = {
     "/unattached", "/peripheral", "/peripheral-anon"
 };

+/*
+ * Locate the FDC at IO address 0x3f0, in order to configure the CMOS registers
+ * and ACPI objects.
+ */
+ISADevice *pc_find_fdc0(void)
+{
+    int i;
+    Object *container;
+    CheckFdcState state = { 0 };
+
+    for (i = 0; i < ARRAY_SIZE(fdc_container_path); i++) {
+        container = container_get(qdev_get_machine(), fdc_container_path[i]);
+        object_child_foreach(container, check_fdc, &state);
+    }
+
+    if (state.multiple) {
+        error_report("warning: multiple floppy disk controllers with "
+                     "iobase=0x3f0 have been found;\n"
+                     "the one being picked for CMOS setup might not reflect "
+                     "your intent");
+    }
+
+    return state.floppy;
+}
+
 static void pc_cmos_init_late(void *opaque)
 {
     pc_cmos_init_late_arg *arg = opaque;
@@ -368,8 +393,6 @@ static void pc_cmos_init_late(void *opaque)
     int8_t heads, sectors;
     int val;
     int i, trans;
-    Object *container;
-    CheckFdcState state = { 0 };

     val = 0;
     if (ide_get_geometry(arg->idebus[0], 0,
@@ -399,22 +422,7 @@ static void pc_cmos_init_late(void *opaque)
     }
     rtc_set_memory(s, 0x39, val);

-    /*
-     * Locate the FDC at IO address 0x3f0, and configure the CMOS registers
-     * accordingly.
-     */
-    for (i = 0; i < ARRAY_SIZE(fdc_container_path); i++) {
-        container = container_get(qdev_get_machine(), fdc_container_path[i]);
-        object_child_foreach(container, check_fdc, &state);
-    }
-
-    if (state.multiple) {
-        error_report("warning: multiple floppy disk controllers with "
-                     "iobase=0x3f0 have been found;\n"
-                     "the one being picked for CMOS setup might not reflect "
-                     "your intent");
-    }
-    pc_cmos_init_floppy(s, state.floppy);
+    pc_cmos_init_floppy(s, pc_find_fdc0());

     qemu_unregister_reset(pc_cmos_init_late, opaque);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index b0d6283..819 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -267,6 +267,8 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);

 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);

+ISADevice *pc_find_fdc0(void);
+
 /* acpi_piix.c */

 I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
--
2.5.0
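The lookup factored out above follows a "first match wins, remember whether there were more" pattern. A minimal sketch of that pattern outside QEMU might look as follows; DemoDevice, DemoFindState, demo_check_fdc() and demo_find_fdc0() are hypothetical names, a flat array stands in for the QOM containers walked by object_child_foreach(), and the behaviour of the per-device callback is an assumption, since check_fdc() itself is not shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record; QEMU walks QOM objects instead. */
typedef struct DemoDevice {
    uint32_t iobase;
} DemoDevice;

/* Hypothetical counterpart of the patch's CheckFdcState. */
typedef struct DemoFindState {
    const DemoDevice *floppy;   /* first controller found at 0x3f0 */
    bool multiple;              /* set when a second match is seen */
} DemoFindState;

/* Assumed behaviour of the per-device callback: keep the first match. */
static void demo_check_fdc(const DemoDevice *dev, DemoFindState *state)
{
    if (dev->iobase != 0x3f0) {
        return;
    }
    if (state->floppy) {
        state->multiple = true;
    } else {
        state->floppy = dev;
    }
}

/*
 * Returns the first controller at iobase 0x3f0, or NULL if there is none.
 * *multiple (if non-NULL) reports whether more than one was seen, which is
 * the condition the patch turns into a warning.
 */
const DemoDevice *demo_find_fdc0(const DemoDevice *devs, size_t n,
                                 bool *multiple)
{
    DemoFindState state = { NULL, false };
    size_t i;

    assert(devs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        demo_check_fdc(&devs[i], &state);
    }
    if (multiple) {
        *multiple = state.multiple;
    }
    return state.floppy;
}
```

Keeping the search in one function means both callers named in the patch comment (CMOS setup and ACPI table generation) see the same controller, including in the ambiguous case.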
[Qemu-devel] [PATCH v4 4/4] i386: populate floppy drive information in SSDT
On x86-based systems Linux determines the presence and the type of floppy drives via a query of a CMOS field. So does SeaBIOS when populating the return data for int 0x13 function 0x08.

Windows doesn't; instead, it requests this information from BIOS via int 0x13/0x08 or through ACPI objects _FDE (Floppy Drive Enumerate) and _FDI (Floppy Drive Information) of the floppy controller object. On UEFI systems only ACPI-based detection is supported.

QEMU did not previously provide those objects in its ACPI tables; as a result floppy drives were invisible to Windows on UEFI/OVMF.

This patch adds those objects to the floppy controller in SSDT, populating them with the information from the respective QEMU objects.

Signed-off-by: Roman Kagan
Cc: "Michael S. Tsirkin"
Cc: Eduardo Habkost
Cc: Igor Mammedov
Cc: John Snow
Cc: Kevin Wolf
Cc: Paolo Bonzini
Cc: Richard Henderson
Cc: qemu-bl...@nongnu.org
Cc: qemu-sta...@nongnu.org
---
changes since v3:
 - build on top of dynamic FDC0 patch
 - include test data updates to maintain bisectability

changes since v2:
 - explicit endianness for buffer data
 - reorder code to reduce conflicts with dynamic DSDT patchset

 hw/i386/acpi-build.c                | 68
 tests/acpi-test-data/pc/SSDT        | Bin 2554 -> 2635 bytes
 tests/acpi-test-data/pc/SSDT.bridge | Bin 4413 -> 4494 bytes
 3 files changed, 68 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a01e909..7b8de59 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -38,6 +38,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/loader.h"
 #include "hw/isa/isa.h"
+#include "hw/block/fdc.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/mem/nvdimm.h"
 #include "sysemu/tpm.h"
@@ -106,6 +107,13 @@ typedef struct AcpiPmInfo {
     uint16_t pcihp_io_len;
 } AcpiPmInfo;

+typedef struct AcpiFDInfo {
+    uint8_t type;
+    uint8_t cylinders;
+    uint8_t heads;
+    uint8_t sectors;
+} AcpiFDInfo;
+
 typedef struct AcpiMiscInfo {
     bool has_hpet;
     TPMVersion tpm_version;
@@ -114,6 +122,7 @@ typedef struct AcpiMiscInfo {
     uint16_t pvpanic_port;
     uint16_t applesmc_io_base;
     bool has_fdc;
+    AcpiFDInfo fdinfo[2];
 } AcpiMiscInfo;

 typedef struct AcpiBuildPciBusHotplugState {
@@ -237,6 +246,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)

 static void acpi_get_misc_info(AcpiMiscInfo *info)
 {
+    int i;
     ISADevice *fdc;

     info->has_hpet = hpet_find();
@@ -246,6 +256,16 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)

     fdc = pc_find_fdc0();
     info->has_fdc = !!fdc;
+    if (fdc) {
+        for (i = 0; i < ARRAY_SIZE(info->fdinfo); i++) {
+            AcpiFDInfo *fdinfo = &info->fdinfo[i];
+            fdinfo->type = isa_fdc_get_drive_type(fdc, i);
+            if (fdinfo->type < FDRIVE_DRV_NONE) {
+                isa_fdc_get_drive_geometry(fdc, i, &fdinfo->cylinders,
+                                           &fdinfo->heads, &fdinfo->sectors);
+            }
+        }
+    }
 }

 /*
@@ -935,6 +955,40 @@ static Aml *build_crs(PCIHostState *host,
     return crs;
 }

+static Aml *build_fdinfo_aml(int idx, AcpiFDInfo *fdinfo)
+{
+    Aml *dev, *fdi;
+
+    dev = aml_device("FLP%c", 'A' + idx);
+
+    aml_append(dev, aml_name_decl("_ADR", aml_int(idx)));
+
+    fdi = aml_package(0x10);
+    aml_append(fdi, aml_int(idx)); /* Drive Number */
+    aml_append(fdi,
+        aml_int(cmos_get_fd_drive_type(fdinfo->type))); /* Device Type */
+    aml_append(fdi,
+        aml_int(fdinfo->cylinders - 1)); /* Maximum Cylinder Number */
+    aml_append(fdi, aml_int(fdinfo->sectors)); /* Maximum Sector Number */
+    aml_append(fdi, aml_int(fdinfo->heads - 1)); /* Maximum Head Number */
+    /* SeaBIOS returns the below values for int 0x13 func 0x08 regardless of
+     * the drive type, so shall we */
+    aml_append(fdi, aml_int(0xAF)); /* disk_specify_1 */
+    aml_append(fdi, aml_int(0x02)); /* disk_specify_2 */
+    aml_append(fdi, aml_int(0x25)); /* disk_motor_wait */
+    aml_append(fdi, aml_int(0x02)); /* disk_sector_siz */
+    aml_append(fdi, aml_int(0x12)); /* disk_eot */
+    aml_append(fdi, aml_int(0x1B)); /* disk_rw_gap */
+    aml_append(fdi, aml_int(0xFF)); /* disk_dtl */
+    aml_append(fdi, aml_int(0x6C)); /* disk_formt_gap */
+    aml_append(fdi, aml_int(0xF6)); /* disk_fill */
+    aml_append(fdi, aml_int(0x0F)); /* disk_head_sttl */
+    aml_append(fdi, aml_int(0x08)); /* disk_motor_strt */
+
+    aml_append(dev, aml_name_decl("_FDI", fdi));
+    return dev;
+}
+
 static void
 build_ssdt(GArray *table_data, GArray *linker,
            AcpiCpuInfo *cpu, AcpiPmInfo *pm, AcpiMiscInfo *misc,
@@ -1106,6 +1160,8 @@ build_ssdt(GArray *table_data, GArray *linker,
     aml_append(ssdt, scope);

     if (misc->has_fdc) {
+        uint32_t fde_buf[5] = {0, 0, 0, 0, cpu_to_le32(0x2)};
+
         scope = aml_scope("\\_SB.PCI0.ISA");
         dev = aml_device("FDC0");

@@ -1119,6 +1
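As an aside on the _FDI package built by build_fdinfo_aml() in this patch, the sixteen values and their order can be illustrated without the AML builder. In the sketch below, DemoFDInfo and demo_fill_fdi() are hypothetical names, a plain integer array stands in for the AML package, and the CMOS device type is passed in by the caller rather than derived with cmos_get_fd_drive_type(). The eleven fixed values are copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the patch's AcpiFDInfo. */
typedef struct DemoFDInfo {
    uint8_t type;
    uint8_t cylinders;
    uint8_t heads;
    uint8_t sectors;
} DemoFDInfo;

enum { DEMO_FDI_LEN = 0x10 };   /* same length as aml_package(0x10) */

/*
 * Fill the sixteen _FDI values in the order the patch appends them.
 * cmos_type is supplied by the caller in this sketch.
 */
void demo_fill_fdi(uint32_t fdi[DEMO_FDI_LEN], int idx, uint8_t cmos_type,
                   const DemoFDInfo *info)
{
    /* Fixed tail, copied from the patch (SeaBIOS int 0x13/0x08 values). */
    static const uint8_t tail[11] = {
        0xAF, 0x02, 0x25, 0x02, 0x12, 0x1B, 0xFF, 0x6C, 0xF6, 0x0F, 0x08
    };
    int i;

    assert(info && info->cylinders > 0 && info->heads > 0);
    fdi[0] = (uint32_t)idx;          /* Drive Number */
    fdi[1] = cmos_type;              /* Device Type */
    fdi[2] = info->cylinders - 1u;   /* Maximum Cylinder Number */
    fdi[3] = info->sectors;          /* Maximum Sector Number */
    fdi[4] = info->heads - 1u;       /* Maximum Head Number */
    for (i = 0; i < 11; i++) {
        fdi[5 + i] = tail[i];
    }
}
```

Cylinders and heads are reported as maximum numbers (count minus one) because they are numbered from zero, whereas sector numbers start at one, so the sector count is already the maximum sector number.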
Re: [Qemu-devel] [PATCH v3 2/2] tests: update expected SSDT for floppy changes
On Thu, Dec 24, 2015 at 08:17:45AM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 23, 2015 at 08:51:45PM +0300, Roman Kagan wrote:
> > On Wed, Dec 23, 2015 at 06:47:16PM +0100, Igor Mammedov wrote:
> > > On Wed, 23 Dec 2015 20:20:54 +0300
> > > Roman Kagan wrote:
> > > > > ... two 1.44M drives with bogus geometry for q35.
> > > >
> > > > This one is a bug in my patch, indeed: I was tricked by FDRIVE_DRV_NONE
> > > > being non-zero, and forgot to initialize the respective fields in
> > > > acpi_get_misc_info() in case there is no floppy controller at all.
> > > so instead of fake initialization, it's worth to make your patch
> > > conditional on presence of controller after all.
> > > i.e. add AML only if controller was present.
> >
> > Indeed :)
> >
> > Roman.
>
> Or rather, start series with a patch making FDC conditional,
> then update expected ssdt, then tweak methods within -
> should not change ssdt since we don't create a floppy in
> the test.

Actually we do. "pc" machine type has it by default regardless of whether anything is attached to it.

So I ended up with the patch series v4 I just posted but I'm not sure it addresses all the concerns people have had about v3.

Thanks,
Roman.
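The pitfall discussed in the quoted part of this thread (a zero-filled info structure reading as two present drives because the "no drive" value is non-zero) can be reproduced in a few lines. This is only a sketch: the Demo* names are hypothetical, and the enum values are an assumption about QEMU's FDriveType at the time; the thread itself confirms only that FDRIVE_DRV_NONE is non-zero and that the bogus drives showed up as 1.44M.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: "none" is NOT zero, so zero-filled memory means 1.44M. */
typedef enum DemoDriveType {
    DEMO_DRV_144  = 0x00,
    DEMO_DRV_288  = 0x01,
    DEMO_DRV_120  = 0x02,
    DEMO_DRV_NONE = 0x03
} DemoDriveType;

typedef struct DemoFDInfo {
    uint8_t type;
    uint8_t cylinders;
    uint8_t heads;
    uint8_t sectors;
} DemoFDInfo;

typedef struct DemoMiscInfo {
    bool has_fdc;
    DemoFDInfo fdinfo[2];
} DemoMiscInfo;

/* Counts slots whose type looks like a real drive. */
static int demo_count_present(const DemoMiscInfo *info)
{
    int i, n = 0;

    for (i = 0; i < 2; i++) {
        if (info->fdinfo[i].type < DEMO_DRV_NONE) {
            n++;
        }
    }
    return n;
}

/* Trusts fdinfo[] even when no controller exists (the behaviour reported as a bug). */
int demo_drives_unguarded(const DemoMiscInfo *info)
{
    assert(info);
    return demo_count_present(info);
}

/* Describes drives only when a controller is present (the direction v4 takes). */
int demo_drives_guarded(const DemoMiscInfo *info)
{
    assert(info);
    return info->has_fdc ? demo_count_present(info) : 0;
}
```

With a zero-initialised DemoMiscInfo (no controller found, fields never filled in), the unguarded variant reports two drives while the guarded variant reports none.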
Re: [Qemu-devel] git send-email didn't arrive?
On 25 December 2015 at 14:42, Johan Ouwerkerk wrote:
> Scratch this? It seems to have arrived just now, finally, having taken
> a mere 16 hours and 4 minutes to get there... ?

The GNU list servers occasionally get a bit overloaded and sit on mail for a while...

thanks
-- PMM
Re: [Qemu-devel] live migration vs device assignment (motivation)
On Thu, Dec 24, 2015 at 11:03 PM, Lan Tianyu wrote:
> Merry Christmas.
> Sorry for later response due to personal affair.
>
> On 2015-12-14 03:30, Alexander Duyck wrote:
>>> > These sounds we need to add a faked bridge for migration and adding a
>>> > driver in the guest for it. It also needs to extend PCI bus/hotplug
>>> > driver to do pause/resume other devices, right?
>>> >
>>> > My concern is still that whether we can change PCI bus/hotplug like that
>>> > without spec change.
>>> >
>>> > IRQ should be general for any devices and we may extend it for
>>> > migration. Device driver also can make decision to support migration
>>> > or not.
>> The device should have no say in the matter. Either we are going to
>> migrate or we will not. This is why I have suggested my approach as
>> it allows for the least amount of driver intrusion while providing the
>> maximum number of ways to still perform migration even if the device
>> doesn't support it.
>
> Even if the device driver doesn't support migration, you still want to
> migrate VM? That maybe risk and we should add the "bad path" for the
> driver at least.

At a minimum we should have support for hot-plug if we are expecting to support migration. You would simply have to hot-plug the device before you start migration and then return it after. That is how the current bonding approach for this works if I am not mistaken.

The advantage we are looking to gain is to avoid removing/disabling the device for as long as possible. Ideally we want to keep the device active through the warm-up period, but if the guest doesn't do that we should still be able to fall back on the older approaches if needed.

>>
>> The solution I have proposed is simple:
>>
>> 1. Extend swiotlb to allow for a page dirtying functionality.
>>
>> This part is pretty straight forward. I'll submit a few patches
>> later today as RFC that can provided the minimal functionality needed
>> for this.
>
> Very appreciate to do that.
>
>>
>> 2. Provide a vendor specific configuration space option on the QEMU
>> implementation of a PCI bridge to act as a bridge between direct
>> assigned devices and the host bridge.
>>
>> My thought was to add some vendor specific block that includes a
>> capabilities, status, and control register so you could go through and
>> synchronize things like the DMA page dirtying feature. The bridge
>> itself could manage the migration capable bit inside QEMU for all
>> devices assigned to it. So if you added a VF to the bridge it would
>> flag that you can support migration in QEMU, while the bridge would
>> indicate you cannot until the DMA page dirtying control bit is set by
>> the guest.
>>
>> We could also go through and optimize the DMA page dirtying after
>> this is added so that we can narrow down the scope of use, and as a
>> result improve the performance for other devices that don't need to
>> support migration. It would then be a matter of adding an interrupt
>> in the device to handle an event such as the DMA page dirtying status
>> bit being set in the config space status register, while the bit is
>> not set in the control register. If it doesn't get set then we would
>> have to evict the devices before the warm-up phase of the migration,
>> otherwise we can defer it until the end of the warm-up phase.
>>
>> 3. Extend existing shpc driver to support the optional "pause"
>> functionality as called out in section 4.1.2 of the Revision 1.1 PCI
>> hot-plug specification.
>
> Since your solution has added a faked PCI bridge. Why not notify the
> bridge directly during migration via irq and call device driver's
> callback in the new bridge driver?
>
> Otherwise, the new bridge driver also can check whether the device
> driver provides migration callback or not and call them to improve the
> passthough device's performance during migration.

This is basically what I had in mind. Though I would take things one step further. You don't need to add any new call-backs if you make use of the existing suspend/resume logic. For a VF this does exactly what you would need since the VFs don't support wake on LAN so it will simply clear the bus master enable and put the netdev in a suspended state until the resume can be called.

The PCI hot-plug specification calls out that the OS can optionally implement a "pause" mechanism which is meant to be used for high availability type environments. What I am proposing is basically extending the standard SHPC capable PCI bridge so that we can support the DMA page dirtying for everything hosted on it, add a vendor specific block to the config space so that the guest can notify the host that it will do page dirtying, and add a mechanism to indicate that all hot-plug events during the warm-up phase of the migration are pause events instead of full removals.

I've been poking around in the kernel and QEMU code and the part I have been trying to sort out is how to get QEMU based pci-bridge to use the SHPC