[PATCH v2] Fixed a QEMU hang when guest poweroff in COLO mode
From: "Rao, Lei" When the PVM guest poweroff, the COLO thread may wait a semaphore in colo_process_checkpoint().So, we should wake up the COLO thread before migration shutdown. Signed-off-by: Lei Rao --- include/migration/colo.h | 1 + migration/colo.c | 20 migration/migration.c| 6 ++ 3 files changed, 27 insertions(+) diff --git a/include/migration/colo.h b/include/migration/colo.h index 768e1f0..5fbe1a6 100644 --- a/include/migration/colo.h +++ b/include/migration/colo.h @@ -37,4 +37,5 @@ COLOMode get_colo_mode(void); void colo_do_failover(void); void colo_checkpoint_notify(void *opaque); +void colo_shutdown(void); #endif diff --git a/migration/colo.c b/migration/colo.c index 2415325..0d3d98f 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -820,6 +820,26 @@ static void colo_wait_handle_message(MigrationIncomingState *mis, } } +void colo_shutdown(void) +{ +MigrationIncomingState *mis = NULL; +MigrationState *s = NULL; + +switch (get_colo_mode()) { +case COLO_MODE_PRIMARY: +s = migrate_get_current(); +qemu_event_set(&s->colo_checkpoint_event); +qemu_sem_post(&s->colo_exit_sem); +break; +case COLO_MODE_SECONDARY: +mis = migration_incoming_get_current(); +qemu_sem_post(&mis->colo_incoming_sem); +break; +default: +break; +} +} + void *colo_process_incoming_thread(void *opaque) { MigrationIncomingState *mis = opaque; diff --git a/migration/migration.c b/migration/migration.c index abaf6f9..c0ab86e 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -226,6 +226,12 @@ void migration_cancel(const Error *error) void migration_shutdown(void) { /* + * When the QEMU main thread exit, the COLO thread + * may wait a semaphore. So, we should wakeup the + * COLO thread before migration shutdown. + */ +colo_shutdown(); +/* * Cancel the current migration - that will (eventually) * stop the migration using this structure */ -- 1.8.3.1
[PATCH] docs/COLO-FT.txt: Drop deprecated 'props' from object-add in COLO docs
From: "Rao, Lei" With the removal of deprecated 'props' from object-add in the commit of "50243407457a9fb0ed17b9a9ba9fc9aee09495b1", we also should update COLO's documents. Signed-off-by: Lei Rao --- docs/COLO-FT.txt | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index 8d6d53a..fd5ffcc 100644 --- a/docs/COLO-FT.txt +++ b/docs/COLO-FT.txt @@ -289,11 +289,11 @@ Wait until disk is synced, then: {'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=,file.export=parent0,node-name=replication0'}} {'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } {'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } {'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } {'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } } @@ -318,11 +318,11 @@ Wait until disk is synced, then: {'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=,file.export=parent0,node-name=replication0'}} {'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 
'filter-redirector', 'id': 'redire0', 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } {'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } {'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } {'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } } -- 1.8.3.1
RE: [PATCH] docs/COLO-FT.txt: Drop deprecated 'props' from object-add in COLO docs
OK,all files you mentioned will be modified in the next patch. Thanks, Lei. -Original Message- From: Markus Armbruster Sent: Thursday, November 18, 2021 2:50 PM To: Rao, Lei Cc: Zhang, Chen ; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; qemu-triv...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH] docs/COLO-FT.txt: Drop deprecated 'props' from object-add in COLO docs "Rao, Lei" writes: > From: "Rao, Lei" > > With the removal of deprecated 'props' from object-add in the commit > of "50243407457a9fb0ed17b9a9ba9fc9aee09495b1", we also should update > COLO's documents. We should've done this right when we deprecated it. Not your fault, and better late than never. Recommend to quote commits like in commit 5024340745 "qapi/qom: Drop deprecated 'props' from object-add" (v6.0.0) I have [alias] whatis = "!f() { git --no-pager show -s --pretty=\"tformat:%h \"%s\" (`git dc $1 | sed 's/~.*//;s/%/%%/g'`, %cd)\" --date=short $1; }; f" in my ~/.gitconfig, so I can run $ git whatis 50243407457a9fb0ed17b9a9ba9fc9aee09495b1 5024340745 qapi/qom: Drop deprecated 'props' from object-add (v6.0.0-rc0, 2021-03-19) > Signed-off-by: Lei Rao Reviewed-by: Markus Armbruster Same issue in docs/system/authz.rst, docs/throttle.txt, and docs/tools/qemu-nbd.rst. Care to fix it there as well? Also... > --- > docs/COLO-FT.txt | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index > 8d6d53a..fd5ffcc 100644 > --- a/docs/COLO-FT.txt > +++ b/docs/COLO-FT.txt > @@ -289,11 +289,11 @@ Wait until disk is synced, then: > {'execute': 'human-monitor-command', 'arguments':{ 'command-line': > 'drive_add -n buddy > driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,fi > le.port=,file.export=parent0,node-name=replication0'}} > {'execute': 'x-blockdev-change', 'arguments':{ 'parent': > 'colo-disk0', 'node': 'replication0' } } > ... I'd like us to use double quotes in examples, not single quotes. QMP supports single quotes as an extension over JSON. Best to avoid it. Separate patch, of course. Same in docs/block-replication.txt. [...]
[PATCH 1/2] docs: Drop deprecated 'props' from object-add
From: "Rao, Lei" In commit 5024340745 "qapi/qom: Drop deprecated 'props' from object-add" (v6.0.0), we also should update documents. Signed-off-by: Lei Rao --- docs/system/authz.rst | 26 ++ docs/throttle.txt | 8 +++- docs/tools/qemu-nbd.rst | 2 +- 3 files changed, 14 insertions(+), 22 deletions(-) diff --git a/docs/system/authz.rst b/docs/system/authz.rst index 942af39..55b7315 100644 --- a/docs/system/authz.rst +++ b/docs/system/authz.rst @@ -77,9 +77,7 @@ To create an instance of this driver via QMP: "arguments": { "qom-type": "authz-simple", "id": "authz0", - "props": { - "identity": "fred" - } + "identity": "fred" } } @@ -110,15 +108,13 @@ To create an instance of this class via QMP: "arguments": { "qom-type": "authz-list", "id": "authz0", - "props": { - "rules": [ -{ "match": "fred", "policy": "allow", "format": "exact" }, -{ "match": "bob", "policy": "allow", "format": "exact" }, -{ "match": "danb", "policy": "deny", "format": "exact" }, -{ "match": "dan*", "policy": "allow", "format": "glob" } - ], - "policy": "deny" - } + "rules": [ + { "match": "fred", "policy": "allow", "format": "exact" }, + { "match": "bob", "policy": "allow", "format": "exact" }, + { "match": "danb", "policy": "deny", "format": "exact" }, + { "match": "dan*", "policy": "allow", "format": "glob" } + ], + "policy": "deny" } } @@ -143,10 +139,8 @@ To create an instance of this class via QMP: "arguments": { "qom-type": "authz-list-file", "id": "authz0", - "props": { - "filename": "/etc/qemu/myvm-vnc.acl", - "refresh": true - } + "filename": "/etc/qemu/myvm-vnc.acl", + "refresh": true } } diff --git a/docs/throttle.txt b/docs/throttle.txt index b5b78b7..0a0453a 100644 --- a/docs/throttle.txt +++ b/docs/throttle.txt @@ -273,11 +273,9 @@ A group can be created using the object-add QMP function: "arguments": { "qom-type": "throttle-group", "id": "group0", - "props": { - "limits" : { - "iops-total": 1000 - "bps-write": 2097152 - } + "limits" : { + "iops-total": 1000, + "bps-write": 2097152 } } } diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst index 56e54cd..726cd189 100644 --- a/docs/tools/qemu-nbd.rst +++ b/docs/tools/qemu-nbd.rst @@ -31,7 +31,7 @@ driver options if ``--image-opts`` is specified. *dev* is an NBD device. -.. option:: --object type,id=ID,...props... +.. option:: --object type,id=ID,... Define a new instance of the *type* object class identified by *ID*. See the :manpage:`qemu(1)` manual page for full details of the properties -- 1.8.3.1
[PATCH 2/2] docs: Use double quotes instead of single quotes for COLO
From: "Rao, Lei" Signed-off-by: Lei Rao --- docs/COLO-FT.txt | 106 ++--- docs/block-replication.txt | 52 +++--- 2 files changed, 79 insertions(+), 79 deletions(-) diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index fd5ffcc..8ec653f 100644 --- a/docs/COLO-FT.txt +++ b/docs/COLO-FT.txt @@ -209,9 +209,9 @@ children.0=childs0 \ 3. On Secondary VM's QEMU monitor, issue command -{'execute':'qmp_capabilities'} -{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '0.0.0.0', 'port': ''} } } } -{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', 'writable': true } } +{"execute":"qmp_capabilities"} +{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": ""} } } } +{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } } Note: a. The qmp command nbd-server-start and nbd-server-add must be run @@ -222,11 +222,11 @@ Note: will be merged into the parent disk on failover. 4. On Primary VM's QEMU monitor, issue command: -{'execute':'qmp_capabilities'} -{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=,file.export=parent0,node-name=replication0'}} -{'execute': 'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'replication0' } } -{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } -{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } } +{"execute":"qmp_capabilities"} +{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=,file.export=parent0,node-name=replication0"}} +{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } } +{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } } +{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } } Note: a. There should be only one NBD Client for each primary disk. @@ -249,59 +249,59 @@ if you want to resume the replication, follow "Secondary resume replication" == Primary Failover == The Secondary died, resume on the Primary -{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'child': 'children.1'} } -{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } } -{'execute': 'object-del', 'arguments':{ 'id': 'm0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } } -{'execute': 'x-colo-lost-heartbeat' } +{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} } +{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } } +{"execute": "object-del", "arguments":{ "id": "comp0" } } +{"execute": "object-del", "arguments":{ "id": "iothread1" } } +{"execute": "object-del", "arguments":{ "id": "m0" } } +{"execu
[PATCH v2 1/2] docs: Drop deprecated 'props' from object-add
From: "Rao, Lei" In commit 5024340745 "qapi/qom: Drop deprecated 'props' from object-add" (v6.0.0), we also should update documents. Signed-off-by: Lei Rao --- docs/COLO-FT.txt| 16 docs/system/authz.rst | 26 ++ docs/throttle.txt | 8 +++- docs/tools/qemu-nbd.rst | 2 +- 4 files changed, 22 insertions(+), 30 deletions(-) diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index 8d6d53a..fd5ffcc 100644 --- a/docs/COLO-FT.txt +++ b/docs/COLO-FT.txt @@ -289,11 +289,11 @@ Wait until disk is synced, then: {'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=,file.export=parent0,node-name=replication0'}} {'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } {'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } } +{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } {'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } {'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } } @@ -318,11 +318,11 @@ Wait until disk is synced, then: {'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=,file.export=parent0,node-name=replication0'}} {'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } } -{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', '
[PATCH v2 2/2] docs: Use double quotes instead of single quotes for COLO
From: "Rao, Lei" Signed-off-by: Lei Rao --- docs/COLO-FT.txt | 106 ++--- docs/block-replication.txt | 52 +++--- 2 files changed, 79 insertions(+), 79 deletions(-) diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index fd5ffcc..8ec653f 100644 --- a/docs/COLO-FT.txt +++ b/docs/COLO-FT.txt @@ -209,9 +209,9 @@ children.0=childs0 \ 3. On Secondary VM's QEMU monitor, issue command -{'execute':'qmp_capabilities'} -{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '0.0.0.0', 'port': ''} } } } -{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', 'writable': true } } +{"execute":"qmp_capabilities"} +{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "0.0.0.0", "port": ""} } } } +{"execute": "nbd-server-add", "arguments": {"device": "parent0", "writable": true } } Note: a. The qmp command nbd-server-start and nbd-server-add must be run @@ -222,11 +222,11 @@ Note: will be merged into the parent disk on failover. 4. On Primary VM's QEMU monitor, issue command: -{'execute':'qmp_capabilities'} -{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=,file.export=parent0,node-name=replication0'}} -{'execute': 'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'replication0' } } -{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } } -{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } } +{"execute":"qmp_capabilities"} +{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=,file.export=parent0,node-name=replication0"}} +{"execute": "x-blockdev-change", "arguments":{"parent": "colo-disk0", "node": "replication0" } } +{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [ {"capability": "x-colo", "state": true } ] } } +{"execute": "migrate", "arguments": {"uri": "tcp:127.0.0.2:9998" } } Note: a. There should be only one NBD Client for each primary disk. @@ -249,59 +249,59 @@ if you want to resume the replication, follow "Secondary resume replication" == Primary Failover == The Secondary died, resume on the Primary -{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'child': 'children.1'} } -{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } } -{'execute': 'object-del', 'arguments':{ 'id': 'm0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } } -{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } } -{'execute': 'x-colo-lost-heartbeat' } +{"execute": "x-blockdev-change", "arguments":{ "parent": "colo-disk0", "child": "children.1"} } +{"execute": "human-monitor-command", "arguments":{ "command-line": "drive_del replication0" } } +{"execute": "object-del", "arguments":{ "id": "comp0" } } +{"execute": "object-del", "arguments":{ "id": "iothread1" } } +{"execute": "object-del", "arguments":{ "id": "m0" } } +{"execu
[PATCH] block/nbd.c: Fixed IO request coroutine not being wakeup when kill NBD server.
rite_quorum_entry (opaque=0x7f2fddaf8c50) at ../block/quorum.c:699
16 0x55575c2ee4db in coroutine_trampoline (i0=1578543808, i1=21847) at ../util/coroutine-ucontext.c:173
17 0x7f2ff9855660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91

When we do failover in COLO mode, QEMU will hang while it is waiting for
the in-flight I/O. From the call trace, we can see that the I/O request
coroutine waiting for send_mutex has yielded in nbd_co_send_request().
When the NBD server is killed, it will never be woken up. So, it is
necessary to wake up the coroutine in nbd_channel_error().

Signed-off-by: Rao Lei
---
 block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nbd.c b/block/nbd.c
index 5853d85d60..cf9dda537c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -167,6 +167,7 @@ static void nbd_channel_error(BDRVNBDState *s, int ret)
         s->state = NBD_CLIENT_QUIT;
     }
 
+    qemu_co_queue_restart_all(&s->free_sema);
     nbd_recv_coroutines_wake(s, true);
 }
--
2.32.0
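The general rule behind the one-line fix: an error path must wake every waiter parked on the failed resource, otherwise a request sleeps forever. Here is a minimal sketch of that rule with a plain pthread condition variable standing in for the coroutine queue; struct channel and its fields are invented for the example and are not QEMU's BDRVNBDState.

#include <pthread.h>
#include <stdbool.h>
#include <errno.h>

struct channel {
    pthread_mutex_t lock;
    pthread_cond_t  can_send;
    bool            sender_busy;
    bool            broken;
};

/* Returns 0 on success, -EIO if the channel failed while we waited. */
int channel_send_request(struct channel *ch)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->sender_busy && !ch->broken) {
        pthread_cond_wait(&ch->can_send, &ch->lock);
    }
    if (ch->broken) {
        pthread_mutex_unlock(&ch->lock);
        return -EIO;                    /* woken by channel_error(), bail out */
    }
    ch->sender_busy = true;
    pthread_mutex_unlock(&ch->lock);
    /* ... write the request on the wire ... */
    pthread_mutex_lock(&ch->lock);
    ch->sender_busy = false;
    pthread_cond_signal(&ch->can_send);
    pthread_mutex_unlock(&ch->lock);
    return 0;
}

/* Error path: mark the channel dead and wake *all* waiters. */
void channel_error(struct channel *ch)
{
    pthread_mutex_lock(&ch->lock);
    ch->broken = true;
    pthread_cond_broadcast(&ch->can_send);
    pthread_mutex_unlock(&ch->lock);
}

int main(void)
{
    static struct channel ch = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .can_send = PTHREAD_COND_INITIALIZER,
    };
    channel_error(&ch);                          /* simulate the server dying */
    return channel_send_request(&ch) == -EIO ? 0 : 1;  /* returns, no hang */
}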
Re: [PATCH] block/nbd.c: Fixed IO request coroutine not being wakeup when kill NBD server.
On 3/3/2022 5:25 PM, Vladimir Sementsov-Ogievskiy wrote: 03.03.2022 05:21, Rao Lei wrote: During the stress test, the IO request coroutine has a probability that it can't be awakened when the NBD server is killed. The GDB statck is as follows: (gdb) bt 0 0x7f2ff990cbf6 in __ppoll (fds=0x55575de85000, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 1 0x55575c302e7c in qemu_poll_ns (fds=0x55575de85000, nfds=1, timeout=59603140) at ../util/qemu-timer.c:348 2 0x55575c2d3c34 in fdmon_poll_wait (ctx=0x55575dc480f0, ready_list=0x7ffd9dd1dae0, timeout=59603140) at ../util/fdmon-poll.c:80 3 0x55575c2d350d in aio_poll (ctx=0x55575dc480f0, blocking=true) at ../util/aio-posix.c:655 4 0x55575c16eabd in bdrv_do_drained_begin (bs=0x55575dee7fe0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at ../block/io.c:474 5 0x55575c16eba6 in bdrv_drained_begin (bs=0x55575dee7fe0) at ../block/io.c:480 6 0x55575c1aff33 in quorum_del_child (bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block/quorum.c:1130 7 0x55575c14239b in bdrv_del_child (parent_bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block.c:7705 8 0x55575c12da28 in qmp_x_blockdev_change (parent=0x55575df404c0 "colo-disk0", has_child=true, child=0x55575de867f0 "children.1", has_node=false, node=0x0, errp=0x7ffd9dd1dd08) at ../blockdev.c:3676 9 0x55575c258435 in qmp_marshal_x_blockdev_change (args=0x7f2fec008190, ret=0x7f2ff7b0bd98, errp=0x7f2ff7b0bd90) at qapi/qapi-commands-block-core.c:1675 10 0x55575c2c6201 in do_qmp_dispatch_bh (opaque=0x7f2ff7b0be30) at ../qapi/qmp-dispatch.c:129 11 0x55575c2ebb1c in aio_bh_call (bh=0x55575dc429c0) at ../util/async.c:141 12 0x55575c2ebc2a in aio_bh_poll (ctx=0x55575dc480f0) at ../util/async.c:169 13 0x55575c2d2d96 in aio_dispatch (ctx=0x55575dc480f0) at ../util/aio-posix.c:415 14 0x55575c2ec07f in aio_ctx_dispatch (source=0x55575dc480f0, callback=0x0, user_data=0x0) at ../util/async.c:311 15 0x7f2ff9e7cfbd in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0 16 0x55575c2fd581 in glib_pollfds_poll () at ../util/main-loop.c:232 17 0x55575c2fd5ff in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255 18 0x55575c2fd710 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 19 0x55575bfa7588 in qemu_main_loop () at ../softmmu/runstate.c:726 20 0x55575bbee57a in main (argc=60, argv=0x7ffd9dd1e0e8, envp=0x7ffd9dd1e2d0) at ../softmmu/main.c:50 (gdb) qemu coroutine 0x55575e16aac0 0 0x55575c2ee7dc in qemu_coroutine_switch (from_=0x55575e16aac0, to_=0x7f2ff830fba0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302 1 0x55575c2fe2a9 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:195 2 0x55575c2fe93c in qemu_co_queue_wait_impl (queue=0x55575dc46170, lock=0x7f2b32ad9850) at ../util/qemu-coroutine-lock.c:56 3 0x55575c17ddfb in nbd_co_send_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, qiov=0x55575dfc15d8) at ../block/nbd.c:478 4 0x55575c17f931 in nbd_co_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, write_qiov=0x55575dfc15d8) at ../block/nbd.c:1182 5 0x55575c17fe14 in nbd_client_co_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/nbd.c:1284 6 0x55575c170d25 in bdrv_driver_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1264 7 0x55575c1733b4 in bdrv_aligned_pwritev (child=0x55575dff6890, req=0x7f2b32ad9ad0, offset=403487858688, bytes=4538368, align=1, 
qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 8 0x55575c173c67 in bdrv_co_pwritev_part (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 9 0x55575c17391b in bdrv_co_pwritev (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233 10 0x55575c1ee506 in replication_co_writev (bs=0x55575e9824f0, sector_num=788062224, remaining_sectors=8864, qiov=0x55575dfc15d8, flags=0) at ../block/replication.c:270 11 0x55575c170eed in bdrv_driver_pwritev (bs=0x55575e9824f0, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1297 12 0x55575c1733b4 in bdrv_aligned_pwritev (child=0x55575dcea690, req=0x7f2b32ad9e00, offset=403487858688, bytes=4538368, align=512, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 13 0x55575c173c67 in bdrv_co_pwritev_part (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 14 0x55575c17391b in bdrv_co_pwritev (child=0x55575dcea690, offset=40348785
[PATCH v2 2/7] Fixed qemu crash when guest power off in COLO mode
From: "Rao, Lei" This patch fixes the following: qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running' Aborted (core dumped) The gdb bt as following: 0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 1 0x7faa3d613859 in __GI_abort () at abort.c:79 2 0x55c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at vl.c:723 3 0x55c5a1f8cae4 in vm_prepare_start () at /home/workspace/colo-qemu/cpus.c:2206 4 0x55c5a1f8cb1b in vm_start () at /home/workspace/colo-qemu/cpus.c:2213 5 0x55c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) at migration/migration.c:3376 6 0x55c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at migration/migration.c:3527 7 0x55c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at util/qemu-thread-posix.c:519 8 0x7faa3d7e9609 in start_thread (arg=) at pthread_create.c:477 9 0x7faa3d710293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Signed-off-by: Lei Rao --- migration/migration.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/migration/migration.c b/migration/migration.c index 9172686b89..e5b38a492f 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3616,7 +3616,9 @@ static void migration_iteration_finish(MigrationState *s) case MIGRATION_STATUS_CANCELLED: case MIGRATION_STATUS_CANCELLING: if (s->vm_was_running) { -vm_start(); +if (!runstate_check(RUN_STATE_SHUTDOWN)) { +vm_start(); +} } else { if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { runstate_set(RUN_STATE_POSTMIGRATE); -- 2.30.2
[PATCH v2 1/7] Some minor optimizations for COLO
From: "Rao, Lei" Signed-off-by: Lei Rao Reviewed-by: Dr. David Alan Gilbert --- migration/colo.c | 2 +- net/colo-compare.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 79fa1f6619..616dc00af7 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -152,7 +152,7 @@ static void primary_vm_do_failover(void) * kick COLO thread which might wait at * qemu_sem_wait(&s->colo_checkpoint_sem). */ -colo_checkpoint_notify(migrate_get_current()); +colo_checkpoint_notify(s); /* * Wake up COLO thread which may blocked in recv() or send(), diff --git a/net/colo-compare.c b/net/colo-compare.c index b100e7b51f..4a64a5d386 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -170,7 +170,7 @@ static bool packet_matches_str(const char *str, return false; } -return !memcmp(str, buf, strlen(str)); +return !memcmp(str, buf, packet_len); } static void notify_remote_frame(CompareState *s) -- 2.30.2
[PATCH v2 0/7] Fixed some bugs and optimized some codes for COLO
Changes since v1:
  -- Moved the s->rp_state.from_dst_file = NULL to after qemu_close().

This series fixes some bugs that caused a QEMU crash and a segmentation
fault, optimizes the fill_connection_key() function, and removes some
unnecessary code to improve COLO.

Rao, Lei (7):
  Some minor optimizations for COLO
  Fixed qemu crash when guest power off in COLO mode
  Fixed SVM hang when do failover before PVM crash
  colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff
  Removed the qemu_fclose() in colo_process_incoming_thread
  Changed the last-mode to none of first start COLO
  Optimized the function of fill_connection_key.

 migration/colo.c      | 16 +---
 migration/migration.c |  6 +-
 net/colo-compare.c    |  4 ++--
 net/colo.c            | 31 ---
 net/colo.h            |  6 +++---
 net/filter-rewriter.c | 10 +-
 6 files changed, 28 insertions(+), 45 deletions(-)

--
2.30.2
[PATCH v2 4/7] colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff
From: "Rao, Lei" The GDB statck is as follows: Program terminated with signal SIGSEGV, Segmentation fault. 0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 if (type->class->interfaces && [Current thread is 1 (Thread 0x7f756e97eb00 (LWP 1811577))] (gdb) bt 0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 1 0x55c8f2c3dd14 in object_dynamic_cast (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:763 2 0x55c8f2c3ddce in object_dynamic_cast_assert (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel", file=0x55c8f2f73780 "migration/qemu-file-channel.c", line=117, func=0x55c8f2f73800 <__func__.18724> "channel_shutdown") at qom/object.c:786 3 0x55c8f2bbc6ac in channel_shutdown (opaque=0x55c8f543ac00, rd=true, wr=true, errp=0x0) at migration/qemu-file-channel.c:117 4 0x55c8f2bba56e in qemu_file_shutdown (f=0x7f7558070f50) at migration/qemu-file.c:67 5 0x55c8f2ba5373 in migrate_fd_cancel (s=0x55c8f4ccf3f0) at migration/migration.c:1699 6 0x55c8f2ba1992 in migration_shutdown () at migration/migration.c:187 7 0x55c8f29a5b77 in main (argc=69, argv=0x7fff3e9e8c08, envp=0x7fff3e9e8e38) at vl.c:4512 The root cause is that we still want to shutdown the from_dst_file in migrate_fd_cancel() after qemu_close in colo_process_checkpoint(). So, we should set the s->rp_state.from_dst_file = NULL after qemu_close(). Signed-off-by: Lei Rao --- migration/colo.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration/colo.c b/migration/colo.c index 616dc00af7..907241ab5c 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -640,6 +640,7 @@ out: */ if (s->rp_state.from_dst_file) { qemu_fclose(s->rp_state.from_dst_file); +s->rp_state.from_dst_file = NULL; } } -- 2.30.2
[PATCH v2 5/7] Removed the qemu_fclose() in colo_process_incoming_thread
From: "Rao, Lei" After the live migration, the related fd will be cleanup in migration_incoming_state_destroy(). So, the qemu_close() in colo_process_incoming_thread is not necessary. Signed-off-by: Lei Rao --- migration/colo.c | 5 - 1 file changed, 5 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 907241ab5c..71fc82a040 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -919,11 +919,6 @@ out: /* Hope this not to be too long to loop here */ qemu_sem_wait(&mis->colo_incoming_sem); qemu_sem_destroy(&mis->colo_incoming_sem); -/* Must be called after failover BH is completed */ -if (mis->to_src_file) { -qemu_fclose(mis->to_src_file); -mis->to_src_file = NULL; -} rcu_unregister_thread(); return NULL; -- 2.30.2
[PATCH v2 3/7] Fixed SVM hang when do failover before PVM crash
From: "Rao, Lei" This patch fixed as follows: Thread 1 (Thread 0x7f34ee738d80 (LWP 11212)): #0 __pthread_clockjoin_ex (threadid=139847152957184, thread_return=0x7f30b1febf30, clockid=, abstime=, block=) at pthread_join_common.c:145 #1 0x563401998e36 in qemu_thread_join (thread=0x563402d66610) at util/qemu-thread-posix.c:587 #2 0x5634017a79fa in process_incoming_migration_co (opaque=0x0) at migration/migration.c:502 #3 0x5634019b59c9 in coroutine_trampoline (i0=63395504, i1=22068) at util/coroutine-ucontext.c:115 #4 0x7f34ef860660 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6 #5 0x7f30b21ee730 in ?? () #6 0x in ?? () Thread 13 (Thread 0x7f30b3dff700 (LWP 11747)): #0 __lll_lock_wait (futex=futex@entry=0x56340218ffa0 , private=0) at lowlevellock.c:52 #1 0x7f34efa000a3 in _GI__pthread_mutex_lock (mutex=0x56340218ffa0 ) at ../nptl/pthread_mutex_lock.c:80 #2 0x563401997f99 in qemu_mutex_lock_impl (mutex=0x56340218ffa0 , file=0x563401b7a80e "migration/colo.c", line=806) at util/qemu-thread-posix.c:78 #3 0x563401407144 in qemu_mutex_lock_iothread_impl (file=0x563401b7a80e "migration/colo.c", line=806) at /home/workspace/colo-qemu/cpus.c:1899 #4 0x5634017ba8e8 in colo_process_incoming_thread (opaque=0x563402d664c0) at migration/colo.c:806 #5 0x563401998b72 in qemu_thread_start (args=0x5634039f8370) at util/qemu-thread-posix.c:519 #6 0x7f34ef9fd609 in start_thread (arg=) at pthread_create.c:477 #7 0x7f34ef924293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 The QEMU main thread is holding the lock: (gdb) p qemu_global_mutex $1 = {lock = {_data = {lock = 2, __count = 0, __owner = 11212, __nusers = 9, __kind = 0, __spins = 0, __elision = 0, __list = {_prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\314+\000\000\t", '\000' , __align = 2}, file = 0x563401c07e4b "util/main-loop.c", line = 240, initialized = true} >From the call trace, we can see it is a deadlock bug. and the QEMU main thread >holds the global mutex to wait until the COLO thread ends. and the colo thread wants to acquire the global mutex, which will cause a deadlock. So, we should release the qemu_global_mutex before waiting colo thread ends. Signed-off-by: Lei Rao Reviewed-by: Li Zhijian --- migration/migration.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index e5b38a492f..811097c423 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -587,8 +587,10 @@ static void process_incoming_migration_co(void *opaque) mis->have_colo_incoming_thread = true; qemu_coroutine_yield(); +qemu_mutex_unlock_iothread(); /* Wait checkpoint incoming thread exit before free resource */ qemu_thread_join(&mis->colo_incoming_thread); +qemu_mutex_lock_iothread(); /* We hold the global iothread lock, so it is safe here */ colo_release_ram_cache(); } -- 2.30.2
[PATCH v2 6/7] Changed the last-mode to none of first start COLO
From: "Rao, Lei" When we first stated the COLO, the last-mode is as follows: { "execute": "query-colo-status" } {"return": {"last-mode": "primary", "mode": "primary", "reason": "none"}} The last-mode is unreasonable. After the patch, will be changed to the following: { "execute": "query-colo-status" } {"return": {"last-mode": "none", "mode": "primary", "reason": "none"}} Signed-off-by: Lei Rao --- migration/colo.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 71fc82a040..e3b1f136f4 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -205,7 +205,7 @@ void colo_do_failover(void) vm_stop_force_state(RUN_STATE_COLO); } -switch (get_colo_mode()) { +switch (last_colo_mode = get_colo_mode()) { case COLO_MODE_PRIMARY: primary_vm_do_failover(); break; @@ -530,8 +530,7 @@ static void colo_process_checkpoint(MigrationState *s) Error *local_err = NULL; int ret; -last_colo_mode = get_colo_mode(); -if (last_colo_mode != COLO_MODE_PRIMARY) { +if (get_colo_mode() != COLO_MODE_PRIMARY) { error_report("COLO mode must be COLO_MODE_PRIMARY"); return; } @@ -830,8 +829,7 @@ void *colo_process_incoming_thread(void *opaque) migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO); -last_colo_mode = get_colo_mode(); -if (last_colo_mode != COLO_MODE_SECONDARY) { +if (get_colo_mode() != COLO_MODE_SECONDARY) { error_report("COLO mode must be COLO_MODE_SECONDARY"); return NULL; } -- 2.30.2
[PATCH v2 7/7] Optimized the function of fill_connection_key.
From: "Rao, Lei" Remove some unnecessary code to improve the performance of the filter-rewriter module. Signed-off-by: Lei Rao Reviewed-by: Zhang Chen --- net/colo-compare.c| 2 +- net/colo.c| 31 --- net/colo.h| 6 +++--- net/filter-rewriter.c | 10 +- 4 files changed, 17 insertions(+), 32 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 4a64a5d386..6a1354dcb1 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -264,7 +264,7 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con) pkt = NULL; return -1; } -fill_connection_key(pkt, &key); +fill_connection_key(pkt, &key, 0); conn = connection_get(s->connection_track_table, &key, diff --git a/net/colo.c b/net/colo.c index 3a3e6e89a0..5e7232ce47 100644 --- a/net/colo.c +++ b/net/colo.c @@ -83,19 +83,26 @@ int parse_packet_early(Packet *pkt) return 0; } -void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, Packet *pkt) +void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, + Packet *pkt, int reverse) { +if (reverse) { +key->src = pkt->ip->ip_dst; +key->dst = pkt->ip->ip_src; +key->src_port = ntohs(tmp_ports & 0x); +key->dst_port = ntohs(tmp_ports >> 16); +} else { key->src = pkt->ip->ip_src; key->dst = pkt->ip->ip_dst; key->src_port = ntohs(tmp_ports >> 16); key->dst_port = ntohs(tmp_ports & 0x); +} } -void fill_connection_key(Packet *pkt, ConnectionKey *key) +void fill_connection_key(Packet *pkt, ConnectionKey *key, int reverse) { -uint32_t tmp_ports; +uint32_t tmp_ports = 0; -memset(key, 0, sizeof(*key)); key->ip_proto = pkt->ip->ip_p; switch (key->ip_proto) { @@ -106,29 +113,15 @@ void fill_connection_key(Packet *pkt, ConnectionKey *key) case IPPROTO_SCTP: case IPPROTO_UDPLITE: tmp_ports = *(uint32_t *)(pkt->transport_header); -extract_ip_and_port(tmp_ports, key, pkt); break; case IPPROTO_AH: tmp_ports = *(uint32_t *)(pkt->transport_header + 4); -extract_ip_and_port(tmp_ports, key, pkt); break; default: break; } -} - -void reverse_connection_key(ConnectionKey *key) -{ -struct in_addr tmp_ip; -uint16_t tmp_port; - -tmp_ip = key->src; -key->src = key->dst; -key->dst = tmp_ip; -tmp_port = key->src_port; -key->src_port = key->dst_port; -key->dst_port = tmp_port; +extract_ip_and_port(tmp_ports, key, pkt, reverse); } Connection *connection_new(ConnectionKey *key) diff --git a/net/colo.h b/net/colo.h index d91cd245c4..5f4d502df3 100644 --- a/net/colo.h +++ b/net/colo.h @@ -89,9 +89,9 @@ typedef struct Connection { uint32_t connection_key_hash(const void *opaque); int connection_key_equal(const void *opaque1, const void *opaque2); int parse_packet_early(Packet *pkt); -void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, Packet *pkt); -void fill_connection_key(Packet *pkt, ConnectionKey *key); -void reverse_connection_key(ConnectionKey *key); +void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, + Packet *pkt, int reverse); +void fill_connection_key(Packet *pkt, ConnectionKey *key, int reverse); Connection *connection_new(ConnectionKey *key); void connection_destroy(void *opaque); Connection *connection_get(GHashTable *connection_track_table, diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index cb3a96cde1..bf05023dc3 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -279,15 +279,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, */ if (pkt && is_tcp_packet(pkt)) { -fill_connection_key(pkt, &key); - -if (sender == nf->netdev) { -/* - * We need make tcp TX and RX packet - * into one connection. 
- */ -reverse_connection_key(&key); -} +fill_connection_key(pkt, &key, sender == nf->netdev); /* After failover we needn't change new TCP packet */ if (s->failover_mode && -- 2.30.2
RE: [PATCH v2 7/7] Optimized the function of fill_connection_key.
Will be changed and sent separately. Thanks, Lei -Original Message- From: Juan Quintela Sent: Wednesday, November 3, 2021 12:23 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; lukasstra...@web.de; dgilb...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH v2 7/7] Optimized the function of fill_connection_key. "Rao, Lei" wrote: > From: "Rao, Lei" > > Remove some unnecessary code to improve the performance of the > filter-rewriter module. > > Signed-off-by: Lei Rao > Reviewed-by: Zhang Chen As Chen has already reviewed it: Reviewed-by: Juan Quintela But I think that you should change in a following patch: s/int reverse/bool reverse/ Later, Juan.
[PATCH v3] Optimized the function of fill_connection_key.
From: "Rao, Lei" Remove some unnecessary code to improve the performance of the filter-rewriter module. Signed-off-by: Lei Rao Reviewed-by: Zhang Chen Reviewed-by: Juan Quintela --- net/colo-compare.c| 2 +- net/colo.c| 31 --- net/colo.h| 6 +++--- net/filter-rewriter.c | 10 +- 4 files changed, 17 insertions(+), 32 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 4a64a5d..b8876d7 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -264,7 +264,7 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con) pkt = NULL; return -1; } -fill_connection_key(pkt, &key); +fill_connection_key(pkt, &key, false); conn = connection_get(s->connection_track_table, &key, diff --git a/net/colo.c b/net/colo.c index 3a3e6e8..1f8162f 100644 --- a/net/colo.c +++ b/net/colo.c @@ -83,19 +83,26 @@ int parse_packet_early(Packet *pkt) return 0; } -void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, Packet *pkt) +void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, + Packet *pkt, bool reverse) { +if (reverse) { +key->src = pkt->ip->ip_dst; +key->dst = pkt->ip->ip_src; +key->src_port = ntohs(tmp_ports & 0x); +key->dst_port = ntohs(tmp_ports >> 16); +} else { key->src = pkt->ip->ip_src; key->dst = pkt->ip->ip_dst; key->src_port = ntohs(tmp_ports >> 16); key->dst_port = ntohs(tmp_ports & 0x); +} } -void fill_connection_key(Packet *pkt, ConnectionKey *key) +void fill_connection_key(Packet *pkt, ConnectionKey *key, bool reverse) { -uint32_t tmp_ports; +uint32_t tmp_ports = 0; -memset(key, 0, sizeof(*key)); key->ip_proto = pkt->ip->ip_p; switch (key->ip_proto) { @@ -106,29 +113,15 @@ void fill_connection_key(Packet *pkt, ConnectionKey *key) case IPPROTO_SCTP: case IPPROTO_UDPLITE: tmp_ports = *(uint32_t *)(pkt->transport_header); -extract_ip_and_port(tmp_ports, key, pkt); break; case IPPROTO_AH: tmp_ports = *(uint32_t *)(pkt->transport_header + 4); -extract_ip_and_port(tmp_ports, key, pkt); break; default: break; } -} - -void reverse_connection_key(ConnectionKey *key) -{ -struct in_addr tmp_ip; -uint16_t tmp_port; - -tmp_ip = key->src; -key->src = key->dst; -key->dst = tmp_ip; -tmp_port = key->src_port; -key->src_port = key->dst_port; -key->dst_port = tmp_port; +extract_ip_and_port(tmp_ports, key, pkt, reverse); } Connection *connection_new(ConnectionKey *key) diff --git a/net/colo.h b/net/colo.h index d91cd24..8b3e8d5 100644 --- a/net/colo.h +++ b/net/colo.h @@ -89,9 +89,9 @@ typedef struct Connection { uint32_t connection_key_hash(const void *opaque); int connection_key_equal(const void *opaque1, const void *opaque2); int parse_packet_early(Packet *pkt); -void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, Packet *pkt); -void fill_connection_key(Packet *pkt, ConnectionKey *key); -void reverse_connection_key(ConnectionKey *key); +void extract_ip_and_port(uint32_t tmp_ports, ConnectionKey *key, + Packet *pkt, bool reverse); +void fill_connection_key(Packet *pkt, ConnectionKey *key, bool reverse); Connection *connection_new(ConnectionKey *key); void connection_destroy(void *opaque); Connection *connection_get(GHashTable *connection_track_table, diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index cb3a96c..bf05023 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -279,15 +279,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, */ if (pkt && is_tcp_packet(pkt)) { -fill_connection_key(pkt, &key); - -if (sender == nf->netdev) { -/* - * We need make tcp TX and RX packet - * into one connection. 
- */ -reverse_connection_key(&key); -} +fill_connection_key(pkt, &key, sender == nf->netdev); /* After failover we needn't change new TCP packet */ if (s->failover_mode && -- 1.8.3.1
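The interface change above is easy to see in isolation: instead of building a key and then swapping its fields with a second helper, build it directly in the requested direction. A simplified sketch follows; struct flow and struct conn_key are invented types, not the net/colo.c structures.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct conn_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

static void fill_conn_key(const struct flow *pkt, struct conn_key *key,
                          bool reverse)
{
    if (reverse) {                   /* RX side: swap so TX and RX of the
                                      * same TCP connection share one key */
        key->src_ip   = pkt->dst_ip;
        key->dst_ip   = pkt->src_ip;
        key->src_port = pkt->dst_port;
        key->dst_port = pkt->src_port;
    } else {
        key->src_ip   = pkt->src_ip;
        key->dst_ip   = pkt->dst_ip;
        key->src_port = pkt->src_port;
        key->dst_port = pkt->dst_port;
    }
}

int main(void)
{
    struct flow pkt = { 0x0a000001, 0x0a000002, 12345, 80 };
    struct conn_key k;

    fill_conn_key(&pkt, &k, true);   /* e.g. sender == nf->netdev */
    printf("%u -> %u\n", k.src_port, k.dst_port);
    return 0;
}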
[PATCH v7 0/2] Optimized some codes and fixed PVM hang when enabling auto-converge
From: "Rao, Lei" Changes since v1-v6: --Reset the state of the auto-converge counters at every checkpoint instead of directly disabling. --Remove cpu_throttle_stop from mig_throttle_counter_reset. The series of patches include: Reduced the PVM stop time during checkpoint. Fixed the PVM hang when enabling auto-converge feature for COLO. Rao, Lei (2): Reset the auto-converge counter at every checkpoint. Reduce the PVM stop time during Checkpoint migration/colo.c | 4 migration/ram.c | 57 +--- migration/ram.h | 1 + 3 files changed, 59 insertions(+), 3 deletions(-) -- 1.8.3.1
[PATCH v7 1/2] Reset the auto-converge counter at every checkpoint.
From: "Rao, Lei" if we don't reset the auto-converge counter, it will continue to run with COLO running, and eventually the system will hang due to the CPU throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE. Signed-off-by: Lei Rao Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Lukas Straub Tested-by: Lukas Straub --- migration/colo.c | 4 migration/ram.c | 9 + migration/ram.h | 1 + 3 files changed, 14 insertions(+) diff --git a/migration/colo.c b/migration/colo.c index e3b1f13..2415325 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -459,6 +459,10 @@ static int colo_do_checkpoint_transaction(MigrationState *s, if (ret < 0) { goto out; } + +if (migrate_auto_converge()) { +mig_throttle_counter_reset(); +} /* * Only save VM's live state, which not including device state. * TODO: We may need a timeout mechanism to prevent COLO process diff --git a/migration/ram.c b/migration/ram.c index 847af46..d5f98e6 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -641,6 +641,15 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period, } } +void mig_throttle_counter_reset(void) +{ +RAMState *rs = ram_state; + +rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); +rs->num_dirty_pages_period = 0; +rs->bytes_xfer_prev = ram_counters.transferred; +} + /** * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache * diff --git a/migration/ram.h b/migration/ram.h index dda1988..c515396 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -50,6 +50,7 @@ bool ramblock_is_ignored(RAMBlock *block); int xbzrle_cache_resize(uint64_t new_size, Error **errp); uint64_t ram_bytes_remaining(void); uint64_t ram_bytes_total(void); +void mig_throttle_counter_reset(void); uint64_t ram_pagesize_summary(void); int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len); -- 1.8.3.1
[PATCH v7 2/2] Reduce the PVM stop time during Checkpoint
From: "Rao, Lei" When flushing memory from ram cache to ram during every checkpoint on secondary VM, we can copy continuous chunks of memory instead of 4096 bytes per time to reduce the time of VM stop during checkpoint. Signed-off-by: Lei Rao Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Lukas Straub Tested-by: Lukas Straub --- migration/ram.c | 48 +--- 1 file changed, 45 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d5f98e6..863035d 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -845,6 +845,41 @@ migration_clear_memory_region_dirty_bitmap_range(RAMBlock *rb, } } +/* + * colo_bitmap_find_diry:find contiguous dirty pages from start + * + * Returns the page offset within memory region of the start of the contiguout + * dirty page + * + * @rs: current RAM state + * @rb: RAMBlock where to search for dirty pages + * @start: page where we start the search + * @num: the number of contiguous dirty pages + */ +static inline +unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, + unsigned long start, unsigned long *num) +{ +unsigned long size = rb->used_length >> TARGET_PAGE_BITS; +unsigned long *bitmap = rb->bmap; +unsigned long first, next; + +*num = 0; + +if (ramblock_is_ignored(rb)) { +return size; +} + +first = find_next_bit(bitmap, size, start); +if (first >= size) { +return first; +} +next = find_next_zero_bit(bitmap, size, first + 1); +assert(next >= first); +*num = next - first; +return first; +} + static inline bool migration_bitmap_clear_dirty(RAMState *rs, RAMBlock *rb, unsigned long page) @@ -3895,19 +3930,26 @@ void colo_flush_ram_cache(void) block = QLIST_FIRST_RCU(&ram_list.blocks); while (block) { -offset = migration_bitmap_find_dirty(ram_state, block, offset); +unsigned long num = 0; +offset = colo_bitmap_find_dirty(ram_state, block, offset, &num); if (!offset_in_ramblock(block, ((ram_addr_t)offset) << TARGET_PAGE_BITS)) { offset = 0; +num = 0; block = QLIST_NEXT_RCU(block, next); } else { -migration_bitmap_clear_dirty(ram_state, block, offset); +unsigned long i = 0; + +for (i = 0; i < num; i++) { +migration_bitmap_clear_dirty(ram_state, block, offset + i); +} dst_host = block->host + (((ram_addr_t)offset) << TARGET_PAGE_BITS); src_host = block->colo_cache + (((ram_addr_t)offset) << TARGET_PAGE_BITS); -memcpy(dst_host, src_host, TARGET_PAGE_SIZE); +memcpy(dst_host, src_host, TARGET_PAGE_SIZE * num); +offset += num; } } } -- 1.8.3.1
[PATCH 2/2] migration/ram.c: Remove the qemu_mutex_lock in colo_flush_ram_cache.
From: "Rao, Lei" The code to acquire bitmap_mutex is added in the commit of "63268c4970a5f126cc9af75f3ccb8057abef5ec0". There is no need to acquire bitmap_mutex in colo_flush_ram_cache(). This is because the colo_flush_ram_cache only be called on the COLO secondary VM, which is the destination side. On the COLO secondary VM, only the COLO thread will touch the bitmap of ram cache. Signed-off-by: Lei Rao --- migration/ram.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 863035d..2c688f5 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3918,7 +3918,6 @@ void colo_flush_ram_cache(void) unsigned long offset = 0; memory_global_dirty_log_sync(); -qemu_mutex_lock(&ram_state->bitmap_mutex); WITH_RCU_READ_LOCK_GUARD() { RAMBLOCK_FOREACH_NOT_IGNORED(block) { ramblock_sync_dirty_bitmap(ram_state, block); @@ -3954,7 +3953,6 @@ void colo_flush_ram_cache(void) } } trace_colo_flush_ram_cache_end(); -qemu_mutex_unlock(&ram_state->bitmap_mutex); } /** -- 1.8.3.1
[PATCH 1/2] Fixed a QEMU hang when guest poweroff in COLO mode
From: "Rao, Lei" When the PVM guest poweroff, the COLO thread may wait a semaphore in colo_process_checkpoint().So, we should wake up the COLO thread before migration shutdown. Signed-off-by: Lei Rao --- include/migration/colo.h | 1 + migration/colo.c | 14 ++ migration/migration.c| 10 ++ 3 files changed, 25 insertions(+) diff --git a/include/migration/colo.h b/include/migration/colo.h index 768e1f0..525b45a 100644 --- a/include/migration/colo.h +++ b/include/migration/colo.h @@ -37,4 +37,5 @@ COLOMode get_colo_mode(void); void colo_do_failover(void); void colo_checkpoint_notify(void *opaque); +void colo_shutdown(COLOMode mode); #endif diff --git a/migration/colo.c b/migration/colo.c index 2415325..385c1d7 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -820,6 +820,20 @@ static void colo_wait_handle_message(MigrationIncomingState *mis, } } +void colo_shutdown(COLOMode mode) +{ +if (mode == COLO_MODE_PRIMARY) { +MigrationState *s = migrate_get_current(); + +qemu_event_set(&s->colo_checkpoint_event); +qemu_sem_post(&s->colo_exit_sem); +} else { +MigrationIncomingState *mis = migration_incoming_get_current(); + +qemu_sem_post(&mis->colo_incoming_sem); +} +} + void *colo_process_incoming_thread(void *opaque) { MigrationIncomingState *mis = opaque; diff --git a/migration/migration.c b/migration/migration.c index abaf6f9..9df6328 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -225,6 +225,16 @@ void migration_cancel(const Error *error) void migration_shutdown(void) { +COLOMode mode = get_colo_mode(); + +/* + * When the QEMU main thread exit, the COLO thread + * may wait a semaphore. So, we should wakeup the + * COLO thread before migration shutdown. + */ +if (mode != COLO_MODE_NONE) { +colo_shutdown(mode); + } /* * Cancel the current migration - that will (eventually) * stop the migration using this structure -- 1.8.3.1
RE: [PATCH 1/2] Fixed a QEMU hang when guest poweroff in COLO mode
OK, will be changed in V2. Thanks, Lei -Original Message- From: Juan Quintela Sent: Wednesday, November 10, 2021 5:55 PM To: Rao, Lei Cc: Zhang, Chen ; zhang.zhanghaili...@huawei.com; dgilb...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH 1/2] Fixed a QEMU hang when guest poweroff in COLO mode "Rao, Lei" wrote: > From: "Rao, Lei" > > When the PVM guest poweroff, the COLO thread may wait a semaphore in > colo_process_checkpoint().So, we should wake up the COLO thread before > migration shutdown. > > Signed-off-by: Lei Rao A couple of notes. > --- > include/migration/colo.h | 1 + > migration/colo.c | 14 ++ > migration/migration.c| 10 ++ > 3 files changed, 25 insertions(+) > > diff --git a/include/migration/colo.h b/include/migration/colo.h index > 768e1f0..525b45a 100644 > --- a/include/migration/colo.h > +++ b/include/migration/colo.h > @@ -37,4 +37,5 @@ COLOMode get_colo_mode(void); void > colo_do_failover(void); > > void colo_checkpoint_notify(void *opaque); > +void colo_shutdown(COLOMode mode); > #endif > diff --git a/migration/colo.c b/migration/colo.c index > 2415325..385c1d7 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -820,6 +820,20 @@ static void > colo_wait_handle_message(MigrationIncomingState *mis, > } > } > > +void colo_shutdown(COLOMode mode) > +{ > +if (mode == COLO_MODE_PRIMARY) { > +MigrationState *s = migrate_get_current(); > + > +qemu_event_set(&s->colo_checkpoint_event); > +qemu_sem_post(&s->colo_exit_sem); > +} else { > +MigrationIncomingState *mis = > + migration_incoming_get_current(); > + > +qemu_sem_post(&mis->colo_incoming_sem); > +} > +} > + > void *colo_process_incoming_thread(void *opaque) { > MigrationIncomingState *mis = opaque; diff --git > a/migration/migration.c b/migration/migration.c index abaf6f9..9df6328 > 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -225,6 +225,16 @@ void migration_cancel(const Error *error) > > void migration_shutdown(void) > { > +COLOMode mode = get_colo_mode(); > + > +/* > + * When the QEMU main thread exit, the COLO thread > + * may wait a semaphore. So, we should wakeup the > + * COLO thread before migration shutdown. > + */ > +if (mode != COLO_MODE_NONE) { > +colo_shutdown(mode); > + } don't put the code on the main path. Could you just put all of this inside a colo_shutdown() call? Thanks, Juan. > /* > * Cancel the current migration - that will (eventually) > * stop the migration using this structure
[PATCH v2] ui/vnc.c: Fixed a deadlock bug.
The GDB stack is as follows:
(gdb) bt
0 __lll_lock_wait (futex=futex@entry=0x56211df20360, private=0) at lowlevellock.c:52
1 0x7f263caf20a3 in __GI___pthread_mutex_lock (mutex=0x56211df20360) at ../nptl/pthread_mutex_lock.c:80
2 0x56211a757364 in qemu_mutex_lock_impl (mutex=0x56211df20360, file=0x56211a804857 "../ui/vnc-jobs.h", line=60) at ../util/qemu-thread-posix.c:80
3 0x56211a0ef8c7 in vnc_lock_output (vs=0x56211df14200) at ../ui/vnc-jobs.h:60
4 0x56211a0efcb7 in vnc_clipboard_send (vs=0x56211df14200, count=1, dwords=0x7ffdf1701338) at ../ui/vnc-clipboard.c:138
5 0x56211a0f0129 in vnc_clipboard_notify (notifier=0x56211df244c8, data=0x56211dd1bbf0) at ../ui/vnc-clipboard.c:209
6 0x56211a75dde8 in notifier_list_notify (list=0x56211afa17d0 , data=0x56211dd1bbf0) at ../util/notify.c:39
7 0x56211a0bf0e6 in qemu_clipboard_update (info=0x56211dd1bbf0) at ../ui/clipboard.c:50
8 0x56211a0bf05d in qemu_clipboard_peer_release (peer=0x56211df244c0, selection=QEMU_CLIPBOARD_SELECTION_CLIPBOARD) at ../ui/clipboard.c:41
9 0x56211a0bef9b in qemu_clipboard_peer_unregister (peer=0x56211df244c0) at ../ui/clipboard.c:19
10 0x56211a0d45f3 in vnc_disconnect_finish (vs=0x56211df14200) at ../ui/vnc.c:1358
11 0x56211a0d4c9d in vnc_client_read (vs=0x56211df14200) at ../ui/vnc.c:1611
12 0x56211a0d4df8 in vnc_client_io (ioc=0x56211ce70690, condition=G_IO_IN, opaque=0x56211df14200) at ../ui/vnc.c:1649
13 0x56211a5b976c in qio_channel_fd_source_dispatch (source=0x56211ce50a00, callback=0x56211a0d4d71 , user_data=0x56211df14200) at ../io/channel-watch.c:84
14 0x7f263ccede8e in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
15 0x56211a77d4a1 in glib_pollfds_poll () at ../util/main-loop.c:232
16 0x56211a77d51f in os_host_main_loop_wait (timeout=958545) at ../util/main-loop.c:255
17 0x56211a77d630 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
18 0x56211a45bc8e in qemu_main_loop () at ../softmmu/runstate.c:726
19 0x56211a0b45fa in main (argc=69, argv=0x7ffdf1701778, envp=0x7ffdf17019a8) at ../softmmu/main.c:50

From the call trace, we can see it is a deadlock bug. vnc_disconnect_finish()
acquires the output_mutex, but the output_mutex is then acquired again in
vnc_clipboard_send(). Locking this non-recursive mutex twice causes a
deadlock. So, move qemu_clipboard_peer_unregister() after
vnc_unlock_output().

Fixes: 0bf41cab93e ("ui/vnc: clipboard support")
Signed-off-by: Lei Rao
Reviewed-by: Marc-André Lureau
---
 ui/vnc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 1ed1c7efc6..3ccd33dedc 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1354,12 +1354,12 @@ void vnc_disconnect_finish(VncState *vs)
         /* last client gone */
         vnc_update_server_surface(vs->vd);
     }
+    vnc_unlock_output(vs);
+
     if (vs->cbpeer.notifier.notify) {
         qemu_clipboard_peer_unregister(&vs->cbpeer);
     }
 
-    vnc_unlock_output(vs);
-
     qemu_mutex_destroy(&vs->output_mutex);
     if (vs->bh != NULL) {
         qemu_bh_delete(vs->bh);
--
2.32.0
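The class of bug fixed here, taking a non-recursive mutex twice on the same thread, is easiest to see in a tiny standalone example. Plain pthreads, with invented function names rather than the VNC code itself:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t output_mutex = PTHREAD_MUTEX_INITIALIZER;

static void send_clipboard_release(void)
{
    pthread_mutex_lock(&output_mutex);   /* same lock, taken again */
    puts("clipboard release sent");
    pthread_mutex_unlock(&output_mutex);
}

static void disconnect_finish(void)
{
    pthread_mutex_lock(&output_mutex);
    puts("flush pending output");
    pthread_mutex_unlock(&output_mutex); /* finish the locked section first ... */

    send_clipboard_release();            /* ... then call the re-locking helper;
                                          * no self-deadlock */
}

int main(void)
{
    disconnect_finish();
    return 0;
}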
Re: [PATCH] block/nbd.c: Fixed IO request coroutine not being wakeup when kill NBD server.
On 3/3/2022 10:05 PM, Rao, Lei wrote: On 3/3/2022 5:25 PM, Vladimir Sementsov-Ogievskiy wrote: 03.03.2022 05:21, Rao Lei wrote: During the stress test, the IO request coroutine has a probability that it can't be awakened when the NBD server is killed. The GDB statck is as follows: (gdb) bt 0 0x7f2ff990cbf6 in __ppoll (fds=0x55575de85000, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 1 0x55575c302e7c in qemu_poll_ns (fds=0x55575de85000, nfds=1, timeout=59603140) at ../util/qemu-timer.c:348 2 0x55575c2d3c34 in fdmon_poll_wait (ctx=0x55575dc480f0, ready_list=0x7ffd9dd1dae0, timeout=59603140) at ../util/fdmon-poll.c:80 3 0x55575c2d350d in aio_poll (ctx=0x55575dc480f0, blocking=true) at ../util/aio-posix.c:655 4 0x55575c16eabd in bdrv_do_drained_begin (bs=0x55575dee7fe0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at ../block/io.c:474 5 0x55575c16eba6 in bdrv_drained_begin (bs=0x55575dee7fe0) at ../block/io.c:480 6 0x55575c1aff33 in quorum_del_child (bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block/quorum.c:1130 7 0x55575c14239b in bdrv_del_child (parent_bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block.c:7705 8 0x55575c12da28 in qmp_x_blockdev_change (parent=0x55575df404c0 "colo-disk0", has_child=true, child=0x55575de867f0 "children.1", has_node=false, node=0x0, errp=0x7ffd9dd1dd08) at ../blockdev.c:3676 9 0x55575c258435 in qmp_marshal_x_blockdev_change (args=0x7f2fec008190, ret=0x7f2ff7b0bd98, errp=0x7f2ff7b0bd90) at qapi/qapi-commands-block-core.c:1675 10 0x55575c2c6201 in do_qmp_dispatch_bh (opaque=0x7f2ff7b0be30) at ../qapi/qmp-dispatch.c:129 11 0x55575c2ebb1c in aio_bh_call (bh=0x55575dc429c0) at ../util/async.c:141 12 0x55575c2ebc2a in aio_bh_poll (ctx=0x55575dc480f0) at ../util/async.c:169 13 0x55575c2d2d96 in aio_dispatch (ctx=0x55575dc480f0) at ../util/aio-posix.c:415 14 0x55575c2ec07f in aio_ctx_dispatch (source=0x55575dc480f0, callback=0x0, user_data=0x0) at ../util/async.c:311 15 0x7f2ff9e7cfbd in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0 16 0x55575c2fd581 in glib_pollfds_poll () at ../util/main-loop.c:232 17 0x55575c2fd5ff in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255 18 0x55575c2fd710 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 19 0x55575bfa7588 in qemu_main_loop () at ../softmmu/runstate.c:726 20 0x55575bbee57a in main (argc=60, argv=0x7ffd9dd1e0e8, envp=0x7ffd9dd1e2d0) at ../softmmu/main.c:50 (gdb) qemu coroutine 0x55575e16aac0 0 0x55575c2ee7dc in qemu_coroutine_switch (from_=0x55575e16aac0, to_=0x7f2ff830fba0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302 1 0x55575c2fe2a9 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:195 2 0x55575c2fe93c in qemu_co_queue_wait_impl (queue=0x55575dc46170, lock=0x7f2b32ad9850) at ../util/qemu-coroutine-lock.c:56 3 0x55575c17ddfb in nbd_co_send_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, qiov=0x55575dfc15d8) at ../block/nbd.c:478 4 0x55575c17f931 in nbd_co_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, write_qiov=0x55575dfc15d8) at ../block/nbd.c:1182 5 0x55575c17fe14 in nbd_client_co_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/nbd.c:1284 6 0x55575c170d25 in bdrv_driver_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1264 7 0x55575c1733b4 in bdrv_aligned_pwritev (child=0x55575dff6890, req=0x7f2b32ad9ad0, 
offset=403487858688, bytes=4538368, align=1, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 8 0x55575c173c67 in bdrv_co_pwritev_part (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 9 0x55575c17391b in bdrv_co_pwritev (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233 10 0x55575c1ee506 in replication_co_writev (bs=0x55575e9824f0, sector_num=788062224, remaining_sectors=8864, qiov=0x55575dfc15d8, flags=0) at ../block/replication.c:270 11 0x55575c170eed in bdrv_driver_pwritev (bs=0x55575e9824f0, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1297 12 0x55575c1733b4 in bdrv_aligned_pwritev (child=0x55575dcea690, req=0x7f2b32ad9e00, offset=403487858688, bytes=4538368, align=512, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 13 0x55575c173c67 in bdrv_co_pwritev_part (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 14 0x55575c17391b in bdrv_co_pwri
[PATCH v2] block/nbd.c: Fixed IO request coroutine not being wakeup when kill NBD server
During the IO stress test, the IO request coroutine has a probability that is can't be awakened when the NBD server is killed. The GDB stack is as follows: (gdb) bt 0 0x7f2ff990cbf6 in __ppoll (fds=0x55575de85000, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 1 0x55575c302e7c in qemu_poll_ns (fds=0x55575de85000, nfds=1, timeout=59603140) at ../util/qemu-timer.c:348 2 0x55575c2d3c34 in fdmon_poll_wait (ctx=0x55575dc480f0, ready_list=0x7ffd9dd1dae0, timeout=59603140) at ../util/fdmon-poll.c:80 3 0x55575c2d350d in aio_poll (ctx=0x55575dc480f0, blocking=true) at ../util/aio-posix.c:655 4 0x55575c16eabd in bdrv_do_drained_begin(bs=0x55575dee7fe0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true)at ../block/io.c:474 5 0x55575c16eba6 in bdrv_drained_begin (bs=0x55575dee7fe0) at ../block/io.c:480 6 0x55575c1aff33 in quorum_del_child (bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block/quorum.c:1130 7 0x55575c14239b in bdrv_del_child (parent_bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block.c:7705 8 0x55575c12da28 in qmp_x_blockdev_change(parent=0x55575df404c0 "colo-disk0", has_child=true, child=0x55575de867f0 "children.1", has_node=false, no de=0x0, errp=0x7ffd9dd1dd08) at ../blockdev.c:3676 9 0x55575c258435 in qmp_marshal_x_blockdev_change (args=0x7f2fec008190, ret=0x7f2ff7b0bd98, errp=0x7f2ff7b0bd90) at qapi/qapi-commands-block-core.c :1675 10 0x55575c2c6201 in do_qmp_dispatch_bh (opaque=0x7f2ff7b0be30) at ../qapi/qmp-dispatch.c:129 11 0x55575c2ebb1c in aio_bh_call (bh=0x55575dc429c0) at ../util/async.c:141 12 0x55575c2ebc2a in aio_bh_poll (ctx=0x55575dc480f0) at ../util/async.c:169 13 0x55575c2d2d96 in aio_dispatch (ctx=0x55575dc480f0) at ../util/aio-posix.c:415 14 0x55575c2ec07f in aio_ctx_dispatch (source=0x55575dc480f0, callback=0x0, user_data=0x0) at ../util/async.c:311 15 0x7f2ff9e7cfbd in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0 16 0x55575c2fd581 in glib_pollfds_poll () at ../util/main-loop.c:232 17 0x55575c2fd5ff in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255 18 0x55575c2fd710 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 19 0x55575bfa7588 in qemu_main_loop () at ../softmmu/runstate.c:726 20 0x55575bbee57a in main (argc=60, argv=0x7ffd9dd1e0e8, envp=0x7ffd9dd1e2d0) at ../softmmu/main.c:50 (gdb) qemu coroutine 0x55575e16aac0 0 0x55575c2ee7dc in qemu_coroutine_switch (from_=0x55575e16aac0, to_=0x7f2ff830fba0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302 1 0x55575c2fe2a9 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:195 2 0x55575c2fe93c in qemu_co_queue_wait_impl (queue=0x55575dc46170, lock=0x7f2b32ad9850) at ../util/qemu-coroutine-lock.c:56 3 0x55575c17ddfb in nbd_co_send_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, qiov=0x55575dfc15d8) at ../block/nbd.c:478 4 0x55575c17f931 in nbd_co_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, write_qiov=0x55575dfc15d8) at ../block/nbd.c:1182 5 0x55575c17fe14 in nbd_client_co_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/nbd.c:1284 6 0x55575c170d25 in bdrv_driver_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1264 7 0x55575c1733b4 in bdrv_aligned_pwritev (child=0x55575dff6890, req=0x7f2b32ad9ad0, offset=403487858688, bytes=4538368, align=1, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 8 0x55575c173c67 in 
bdrv_co_pwritev_part (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 9 0x55575c17391b in bdrv_co_pwritev (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233 10 0x55575c1ee506 in replication_co_writev (bs=0x55575e9824f0, sector_num=788062224, remaining_sectors=8864, qiov=0x55575dfc15d8, flags=0) at ../block/replication.c:270 11 0x55575c170eed in bdrv_driver_pwritev (bs=0x55575e9824f0, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1297 12 0x55575c1733b4 in bdrv_aligned_pwritev (child=0x55575dcea690, req=0x7f2b32ad9e00, offset=403487858688, bytes=4538368, align=512, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 13 0x55575c173c67 in bdrv_co_pwritev_part (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 14 0x55575c17391b in bdrv_co_pwritev (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233 15 0x55575c1aeffa in write_quorum_entry (opaque=0x7f2fddaf8
[PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO. The patch fixes as follow: #0 0x7fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 #1 0x5563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=59697630) at util/qemu-timer.c:348 #2 0x5563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669 #3 0x5563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432 #4 0x5563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438 #5 0x5563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063 #6 0x5563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568 #7 0x5563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494 #8 0x5563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538 #9 0x5563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) at qapi/qmp-dispatch.c:132 #10 0x5563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175 #11 0x5563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145 #12 0x5563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234 #13 0x5563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89 #14 0x5563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117 #15 0x5563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459 #16 0x5563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260 #17 0x7fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #18 0x5563d6978cda in glib_pollfds_poll () at util/main-loop.c:219 #19 0x5563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242 #20 0x5563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518 #21 0x5563d658f5ed in main_loop () at vl.c:1814 #22 0x5563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, envp=0x7fff3cf5c2b8) at vl.c:450 From the call trace, we can see that the QEMU main thread is waiting for the in_flight return to zero. But the in_filght never reaches 0. After analysis, the root cause is that the coroutine of NBD was not awakened. Although this bug was found in the COLO test, I think this is a universal problem in the QIO module. This issue also affects other modules depending on QIO such as NBD. We dump the following data: $2 = { in_flight = 2, state = NBD_CLIENT_QUIT, connect_status = 0, connect_err = 0x0, wait_in_flight = false, requests = {{ coroutine = 0x0, offset = 273077071872, receiving = false, }, { coroutine = 0x7f1164571df0, offset = 498792529920, receiving = false, }, { coroutine = 0x0, offset = 500663590912, receiving = false, }, { ... 
} }, reply = { simple = { magic = 1732535960, error = 0, handle = 0 }, structured = { magic = 1732535960, flags = 0, type = 0, handle = 0, length = 0 }, { magic = 1732535960, _skip = 0, handle = 0 } }, bs = 0x7f11640e2140, reconnect_delay = 0, saddr = 0x7f11640e1f80, export = 0x7f11640dc560 "parent0", } From the data, we can see the coroutine of NBD does not exit normally when killing the NBD server(we kill the Secondary VM in the COLO stress test). Then it will not execute in_flight--. So, the QEMU main thread will hang here. Further analysis, I found the coroutine of NBD will yield in nbd_send_request() or qio_channel_write_all() in nbd_co_send_request. By the way, the yield is due to the kernel return EAGAIN(under the stress test). However, NBD didn't know it would yield here. So, the nbd_recv_coroutines_wake won't wake it up because req->receiving is false. if the NBD server is terminated abnormally at the same time. The G_IO_OUT will be invalid,
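To make the "receiving is false" point concrete: the existing wake-up helper in block/nbd.c only resumes requests that are parked on the receive side. Roughly (a simplified paraphrase of the in-tree helper, not a verbatim quote of the exact tree this patch targets):

static void nbd_recv_coroutines_wake_all(BDRVNBDState *s)
{
    int i;

    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
        NBDClientRequest *req = &s->requests[i];

        /* Only coroutines that yielded while *receiving* a reply are woken.
         * The stuck request in the dump above yielded on the send side
         * (waiting for G_IO_OUT after EAGAIN), so its receiving flag is
         * false and it is skipped here, which matches the hang described. */
        if (req->coroutine && req->receiving) {
            req->receiving = false;
            aio_co_wake(req->coroutine);
        }
    }
}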
RE: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
-Original Message- From: Daniel P. Berrangé Sent: Wednesday, December 1, 2021 5:11 PM To: Rao, Lei Cc: Zhang, Chen ; ebl...@redhat.com; vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com; qemu-bl...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote: > We found that the QIO channel coroutine could not be awakened in some > corner cases during our stress test for COLO. > The patch fixes as follow: > #0 0x7fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, > timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 > #1 0x5563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, > timeout=59697630) at util/qemu-timer.c:348 > #2 0x5563d697ac44 in aio_poll (ctx=0x5563d755dfa0, > blocking=true) at util/aio-posix.c:669 > #3 0x5563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, > recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at > block/io.c:432 > #4 0x5563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at > block/io.c:438 > #5 0x5563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, > child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063 > #6 0x5563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, > child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568 > #7 0x5563d658499e in qmp_x_blockdev_change > (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 > "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494 > #8 0x5563d67e8b4e in qmp_marshal_x_blockdev_change > (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at > qapi/qapi-commands-block-core.c:1538 > #9 0x5563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 > , request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) > at qapi/qmp-dispatch.c:132 > #10 0x5563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 > , request=0x7fad64009d80, allow_oob=true) at > qapi/qmp-dispatch.c:175 > #11 0x5563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, > req=0x7fad64009d80) at monitor/qmp.c:145 > #12 0x5563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at > monitor/qmp.c:234 > #13 0x5563d69754c2 in aio_bh_call (bh=0x5563d7473230) at > util/async.c:89 > #14 0x5563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at > util/async.c:117 > #15 0x5563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at > util/aio-posix.c:459 > #16 0x5563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, > callback=0x0, user_data=0x0) at util/async.c:260 > #17 0x7fad730e1fbd in g_main_context_dispatch () from > /lib/x86_64-linux-gnu/libglib-2.0.so.0 > #18 0x5563d6978cda in glib_pollfds_poll () at util/main-loop.c:219 > #19 0x5563d6978d58 in os_host_main_loop_wait (timeout=977650) at > util/main-loop.c:242 > #20 0x5563d6978e69 in main_loop_wait (nonblocking=0) at > util/main-loop.c:518 > #21 0x5563d658f5ed in main_loop () at vl.c:1814 > #22 0x5563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, > envp=0x7fff3cf5c2b8) at vl.c:450 > > From the call trace, we can see that the QEMU main thread is waiting for > the in_flight return to zero. But the in_filght never reaches 0. > After analysis, the root cause is that the coroutine of NBD was not > awakened. Although this bug was found in the COLO test, I think this is a > universal problem in the QIO module. This issue also affects other > modules depending on QIO such as NBD. 
We dump the following data: > $2 = { > in_flight = 2, > state = NBD_CLIENT_QUIT, > connect_status = 0, > connect_err = 0x0, > wait_in_flight = false, > requests = {{ > coroutine = 0x0, > offset = 273077071872, > receiving = false, > }, { > coroutine = 0x7f1164571df0, > offset = 498792529920, > receiving = false, > }, { > coroutine = 0x0, > offset = 500663590912, > receiving = false, > }, { > ... > } }, > reply = { > simple = { > magic = 1732535960, > error = 0, > handle = 0 > }, > structured = { > magic = 1732535960, > flags = 0, > type = 0, > handle = 0, > length = 0 > }, >
RE: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
-Original Message- From: Vladimir Sementsov-Ogievskiy Sent: Wednesday, December 1, 2021 10:27 PM To: Rao, Lei ; Daniel P. Berrangé Cc: Zhang, Chen ; ebl...@redhat.com; kw...@redhat.com; hre...@redhat.com; qemu-bl...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case 01.12.2021 12:48, Rao, Lei wrote: > > > -Original Message- > From: Daniel P. Berrangé > Sent: Wednesday, December 1, 2021 5:11 PM > To: Rao, Lei > Cc: Zhang, Chen ; ebl...@redhat.com; > vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com; > qemu-bl...@nongnu.org; qemu-devel@nongnu.org > Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure > QIO exits cleanly in some corner case > > On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote: >> We found that the QIO channel coroutine could not be awakened in some >> corner cases during our stress test for COLO. >> The patch fixes as follow: >> #0 0x7fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, >> timeout=, sigmask=0x0) at >> ../sysdeps/unix/sysv/linux/ppoll.c:44 >> #1 0x5563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, >> timeout=59697630) at util/qemu-timer.c:348 >> #2 0x5563d697ac44 in aio_poll (ctx=0x5563d755dfa0, >> blocking=true) at util/aio-posix.c:669 >> #3 0x5563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, >> recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at >> block/io.c:432 >> #4 0x5563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at >> block/io.c:438 >> #5 0x5563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, >> child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063 >> #6 0x5563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, >> child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568 >> #7 0x5563d658499e in qmp_x_blockdev_change >> (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 >> "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at >> blockdev.c:4494 >> #8 0x5563d67e8b4e in qmp_marshal_x_blockdev_change >> (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at >> qapi/qapi-commands-block-core.c:1538 >> #9 0x5563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 >> , request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) >> at qapi/qmp-dispatch.c:132 >> #10 0x5563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 >> , request=0x7fad64009d80, allow_oob=true) at >> qapi/qmp-dispatch.c:175 >> #11 0x5563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, >> req=0x7fad64009d80) at monitor/qmp.c:145 >> #12 0x5563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at >> monitor/qmp.c:234 >> #13 0x5563d69754c2 in aio_bh_call (bh=0x5563d7473230) at >> util/async.c:89 >> #14 0x5563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at >> util/async.c:117 >> #15 0x5563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at >> util/aio-posix.c:459 >> #16 0x5563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, >> callback=0x0, user_data=0x0) at util/async.c:260 >> #17 0x7fad730e1fbd in g_main_context_dispatch () from >> /lib/x86_64-linux-gnu/libglib-2.0.so.0 >> #18 0x5563d6978cda in glib_pollfds_poll () at >> util/main-loop.c:219 >> #19 0x5563d6978d58 in os_host_main_loop_wait (timeout=977650) >> at util/main-loop.c:242 >> #20 0x5563d6978e69 in main_loop_wait (nonblocking=0) at >> util/main-loop.c:518 >> #21 0x5563d658f5ed in main_loop () at vl.c:1814 >> #22 0x5563d6596bb7 in main (argc=61, >> argv=0x7fff3cf5c0c8, >> envp=0x7fff3cf5c2b8) at vl.c:450 >> >> From the 
call trace, we can see that the QEMU main thread is waiting >> for the in_flight return to zero. But the in_filght never reaches 0. >> After analysis, the root cause is that the coroutine of NBD was not >> awakened. Although this bug was found in the COLO test, I think this is a >> universal problem in the QIO module. This issue also affects other >> modules depending on QIO such as NBD. We dump the following data: >> $2 = { >>in_flight = 2, >>state = NBD_CLIENT_QUIT, >>connect_status = 0, >>connect_err = 0x0, >&g
Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Sorry, resending with correct indentation and quoting. On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote: 01.12.2021 12:48, Rao, Lei wrote: -Original Message- From: Daniel P. Berrangé Sent: Wednesday, December 1, 2021 5:11 PM To: Rao, Lei Cc: Zhang, Chen ; ebl...@redhat.com; vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com; qemu-bl...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote: We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO. The patch fixes as follow: #0 0x7fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 #1 0x5563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=59697630) at util/qemu-timer.c:348 #2 0x5563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669 #3 0x5563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432 #4 0x5563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438 #5 0x5563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063 #6 0x5563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568 #7 0x5563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494 #8 0x5563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538 #9 0x5563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) at qapi/qmp-dispatch.c:132 #10 0x5563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175 #11 0x5563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145 #12 0x5563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234 #13 0x5563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89 #14 0x5563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117 #15 0x5563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459 #16 0x5563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260 #17 0x7fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #18 0x5563d6978cda in glib_pollfds_poll () at util/main-loop.c:219 #19 0x5563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242 #20 0x5563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518 #21 0x5563d658f5ed in main_loop () at vl.c:1814 #22 0x5563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, envp=0x7fff3cf5c2b8) at vl.c:450 From the call trace, we can see that the QEMU main thread is waiting for the in_flight return to zero. But the in_filght never reaches 0. After analysis, the root cause is that the coroutine of NBD was not awakened. Although this bug was found in the COLO test, I think this is a universal problem in the QIO module. This issue also affects other modules depending on QIO such as NBD. 
We dump the following data: $2 = { in_flight = 2, state = NBD_CLIENT_QUIT, connect_status = 0, connect_err = 0x0, wait_in_flight = false, requests = {{ coroutine = 0x0, offset = 273077071872, receiving = false, }, { coroutine = 0x7f1164571df0, offset = 498792529920, receiving = false, }, { coroutine = 0x0, offset = 500663590912, receiving = false, }, { ... } }, reply = { simple = { magic = 1732535960, error = 0, handle = 0 }, structured = { magic = 1732535960, flags = 0, type = 0, handle = 0, length = 0 }, { magic = 1732535960, _skip = 0, handle = 0 } }, bs = 0x7f11640e2140, reconnect_delay = 0, saddr = 0x7f11640e1f80, export = 0x7f11640dc560 "parent0", }
Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
On 12/2/2021 5:54 PM, Vladimir Sementsov-Ogievskiy wrote: 02.12.2021 11:53, Daniel P. Berrangé wrote: On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote: Sorry, resending with correct indentation and quoting. On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote: 01.12.2021 12:48, Rao, Lei wrote: -Original Message- From: Daniel P. Berrangé Sent: Wednesday, December 1, 2021 5:11 PM To: Rao, Lei Cc: Zhang, Chen ; ebl...@redhat.com; vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com; qemu-bl...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote: We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO. The patch fixes as follow: #0 0x7fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 #1 0x5563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=59697630) at util/qemu-timer.c:348 #2 0x5563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669 #3 0x5563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432 #4 0x5563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438 #5 0x5563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063 #6 0x5563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568 #7 0x5563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494 #8 0x5563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538 #9 0x5563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) at qapi/qmp-dispatch.c:132 #10 0x5563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175 #11 0x5563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145 #12 0x5563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234 #13 0x5563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89 #14 0x5563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117 #15 0x5563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459 #16 0x5563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260 #17 0x7fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #18 0x5563d6978cda in glib_pollfds_poll () at util/main-loop.c:219 #19 0x5563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242 #20 0x5563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518 #21 0x5563d658f5ed in main_loop () at vl.c:1814 #22 0x5563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, envp=0x7fff3cf5c2b8) at vl.c:450 From the call trace, we can see that the QEMU main thread is waiting for the in_flight return to zero. But the in_filght never reaches 0. After analysis, the root cause is that the coroutine of NBD was not awakened. 
Although this bug was found in the COLO test, I think this is a universal problem in the QIO module. This issue also affects other modules depending on QIO such as NBD. We dump the following data: $2 = { in_flight = 2, state = NBD_CLIENT_QUIT, connect_status = 0, connect_err = 0x0, wait_in_flight = false, requests = {{ coroutine = 0x0, offset = 273077071872, receiving = false, }, { coroutine = 0x7f1164571df0, offset = 498792529920, receiving = false, }, { coroutine = 0x0, offset = 500663590912, receiving = false, }, { ... } }, reply = { simple = { magic = 1732535960, error = 0, handle = 0 }, structured = { magic = 1732535960, flags = 0, type = 0, handle = 0, length = 0 }, {
[PATCH] COLO: Move some trace code behind qemu_mutex_unlock_iothread()
Signed-off-by: Lei Rao --- migration/colo.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 2415325262..3ccacb29c8 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -683,8 +683,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, qemu_mutex_lock_iothread(); vm_stop_force_state(RUN_STATE_COLO); -trace_colo_vm_state_change("run", "stop"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("run", "stop"); /* FIXME: This is unnecessary for periodic checkpoint mode */ colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY, @@ -786,8 +786,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, vmstate_loading = false; vm_start(); -trace_colo_vm_state_change("stop", "run"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("stop", "run"); if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) { return; @@ -870,8 +870,8 @@ void *colo_process_incoming_thread(void *opaque) abort(); #endif vm_start(); -trace_colo_vm_state_change("stop", "run"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("stop", "run"); colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY, &local_err); -- 2.32.0
Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
On 12/3/2021 9:26 AM, Rao, Lei wrote: On 12/2/2021 5:54 PM, Vladimir Sementsov-Ogievskiy wrote: 02.12.2021 11:53, Daniel P. Berrangé wrote: On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote: Sorry, resending with correct indentation and quoting. On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote: 01.12.2021 12:48, Rao, Lei wrote: -Original Message- From: Daniel P. Berrangé Sent: Wednesday, December 1, 2021 5:11 PM To: Rao, Lei Cc: Zhang, Chen ; ebl...@redhat.com; vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com; qemu-bl...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote: We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO. The patch fixes as follow: #0 0x7fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 #1 0x5563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=59697630) at util/qemu-timer.c:348 #2 0x5563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669 #3 0x5563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432 #4 0x5563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438 #5 0x5563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063 #6 0x5563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568 #7 0x5563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494 #8 0x5563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538 #9 0x5563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) at qapi/qmp-dispatch.c:132 #10 0x5563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175 #11 0x5563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145 #12 0x5563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234 #13 0x5563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89 #14 0x5563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117 #15 0x5563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459 #16 0x5563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260 #17 0x7fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #18 0x5563d6978cda in glib_pollfds_poll () at util/main-loop.c:219 #19 0x5563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242 #20 0x5563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518 #21 0x5563d658f5ed in main_loop () at vl.c:1814 #22 0x5563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, envp=0x7fff3cf5c2b8) at vl.c:450 Signed-off-by: Lei Rao Signed-off-by: Zhang Chen --- block/nbd.c | 5 + include/io/channel.h | 19 +++ io/channel.c | 22 ++ 3 files changed, 46 insertions(+) diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57 100644 --- a/block/nbd.c +++ 
b/block/nbd.c @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs) assert(!s->in_flight); if (s->ioc) { + qio_channel_set_force_quit(s->ioc, true); + qio_channel_coroutines_wake(s->ioc); qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); Calling shutdown here should already be causing the qio_chanel_readv_all to wakeup and break out of its poll() loop. We shouldn't need to add a second way to break out of the poll(). Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine, It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in nbd_co_do_establish_connection(); I think call shutdown will have the same result. So,
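To visualize the lifetime problem described here (why forcibly waking the yielded coroutine can crash), a schematic sketch; this is not QEMU code, and every name in it is invented for illustration only:

#include <stdlib.h>

typedef struct Channel {
    int read_watch;              /* stands in for the ioc state touched on resume */
} Channel;

typedef struct Client {
    Channel *ioc;                /* shared between teardown and the coroutine */
} Client;

/* Teardown frees the channel and clears the shared pointer ... */
static void teardown(Client *c)
{
    free(c->ioc);
    c->ioc = NULL;
}

/* ... but the coroutine captured the old pointer before it yielded. */
static void coroutine_resumed(Channel *saved_ioc)
{
    saved_ioc->read_watch = 0;   /* use-after-free if teardown() already ran */
}

int main(void)
{
    Client c = { .ioc = calloc(1, sizeof(Channel)) };
    Channel *saved = c.ioc;      /* what the yielded coroutine still holds */

    teardown(&c);
    coroutine_resumed(saved);    /* undefined behaviour: object already freed */
    return 0;
}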
Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
On 12/13/2021 4:38 PM, Vladimir Sementsov-Ogievskiy wrote: 13.12.2021 11:02, Rao, Lei wrote: On 12/3/2021 9:26 AM, Rao, Lei wrote: On 12/2/2021 5:54 PM, Vladimir Sementsov-Ogievskiy wrote: 02.12.2021 11:53, Daniel P. Berrangé wrote: On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote: Sorry, resending with correct indentation and quoting. On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote: 01.12.2021 12:48, Rao, Lei wrote: -Original Message- From: Daniel P. Berrangé Sent: Wednesday, December 1, 2021 5:11 PM To: Rao, Lei Cc: Zhang, Chen ; ebl...@redhat.com; vsement...@virtuozzo.com; kw...@redhat.com; hre...@redhat.com; qemu-bl...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote: We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO. The patch fixes as follow: #0 0x7fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 #1 0x5563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=59697630) at util/qemu-timer.c:348 #2 0x5563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669 #3 0x5563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432 #4 0x5563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438 #5 0x5563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063 #6 0x5563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568 #7 0x5563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494 #8 0x5563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538 #9 0x5563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98) at qapi/qmp-dispatch.c:132 #10 0x5563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 , request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175 #11 0x5563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145 #12 0x5563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234 #13 0x5563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89 #14 0x5563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117 #15 0x5563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459 #16 0x5563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260 #17 0x7fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #18 0x5563d6978cda in glib_pollfds_poll () at util/main-loop.c:219 #19 0x5563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242 #20 0x5563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518 #21 0x5563d658f5ed in main_loop () at vl.c:1814 #22 0x5563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, envp=0x7fff3cf5c2b8) at vl.c:450 Signed-off-by: Lei Rao Signed-off-by: Zhang Chen --- block/nbd.c | 5 + include/io/channel.h | 19 +++ io/channel.c | 22 ++ 3 files changed, 46 insertions(+) diff 
--git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs) assert(!s->in_flight); if (s->ioc) { + qio_channel_set_force_quit(s->ioc, true); + qio_channel_coroutines_wake(s->ioc); qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); Calling shutdown here should already be causing the qio_chanel_readv_all to wakeup and break out of its poll() loop. We shouldn't need to add a second way to break out of the poll(). Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine, It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as i
Re: [PATCH] COLO: Move some trace code behind qemu_mutex_unlock_iothread()
On 12/13/2021 7:45 PM, Dr. David Alan Gilbert wrote: * Rao, Lei (lei@intel.com) wrote: Signed-off-by: Lei Rao You don't say why you want to move it - it's just a trace, what's the advantage? I think it's not necessary to put trace code in the critical section. So, moving it behind qemu_mutex_unlock_iothread() may reduce the lock time. I will send the V2 and add some commit messages to say why. Thanks, Lei Dave --- migration/colo.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 2415325262..3ccacb29c8 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -683,8 +683,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, qemu_mutex_lock_iothread(); vm_stop_force_state(RUN_STATE_COLO); -trace_colo_vm_state_change("run", "stop"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("run", "stop"); /* FIXME: This is unnecessary for periodic checkpoint mode */ colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY, @@ -786,8 +786,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, vmstate_loading = false; vm_start(); -trace_colo_vm_state_change("stop", "run"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("stop", "run"); if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) { return; @@ -870,8 +870,8 @@ void *colo_process_incoming_thread(void *opaque) abort(); #endif vm_start(); -trace_colo_vm_state_change("stop", "run"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("stop", "run"); colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY, &local_err); -- 2.32.0
[PATCH v2] COLO: Move some trace code behind qemu_mutex_unlock_iothread()
There is no need to put some trace code in the critical section. So, moving it behind qemu_mutex_unlock_iothread() can reduce the lock time. Signed-off-by: Lei Rao --- migration/colo.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 2415325262..3ccacb29c8 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -683,8 +683,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, qemu_mutex_lock_iothread(); vm_stop_force_state(RUN_STATE_COLO); -trace_colo_vm_state_change("run", "stop"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("run", "stop"); /* FIXME: This is unnecessary for periodic checkpoint mode */ colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY, @@ -786,8 +786,8 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, vmstate_loading = false; vm_start(); -trace_colo_vm_state_change("stop", "run"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("stop", "run"); if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) { return; @@ -870,8 +870,8 @@ void *colo_process_incoming_thread(void *opaque) abort(); #endif vm_start(); -trace_colo_vm_state_change("stop", "run"); qemu_mutex_unlock_iothread(); +trace_colo_vm_state_change("stop", "run"); colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY, &local_err); -- 2.32.0
[PATCH] net/filter: Optimize filter_send to coroutine
This patch is to improve the logic of QEMU main thread sleep code in qemu_chr_write_buffer() where it can be blocked and can't run other coroutines during COLO IO stress test. Our approach is to put filter_send() in a coroutine. In this way, filter_send() will call qemu_coroutine_yield() in qemu_co_sleep_ns(), so that it can be scheduled out and QEMU main thread has opportunity to run other tasks. Signed-off-by: Lei Rao Signed-off-by: Zhang Chen --- net/filter-mirror.c | 67 - 1 file changed, 54 insertions(+), 13 deletions(-) diff --git a/net/filter-mirror.c b/net/filter-mirror.c index f20240cc9f..1e9f8b6216 100644 --- a/net/filter-mirror.c +++ b/net/filter-mirror.c @@ -20,6 +20,7 @@ #include "chardev/char-fe.h" #include "qemu/iov.h" #include "qemu/sockets.h" +#include "block/aio-wait.h" #define TYPE_FILTER_MIRROR "filter-mirror" typedef struct MirrorState MirrorState; @@ -42,20 +43,21 @@ struct MirrorState { bool vnet_hdr; }; -static int filter_send(MirrorState *s, - const struct iovec *iov, - int iovcnt) +typedef struct FilterSendCo { +MirrorState *s; +char *buf; +ssize_t size; +bool done; +int ret; +} FilterSendCo; + +static int _filter_send(MirrorState *s, + char *buf, + ssize_t size) { NetFilterState *nf = NETFILTER(s); int ret = 0; -ssize_t size = 0; uint32_t len = 0; -char *buf; - -size = iov_size(iov, iovcnt); -if (!size) { -return 0; -} len = htonl(size); ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len)); @@ -80,10 +82,7 @@ static int filter_send(MirrorState *s, } } -buf = g_malloc(size); -iov_to_buf(iov, iovcnt, 0, buf, size); ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size); -g_free(buf); if (ret != size) { goto err; } @@ -94,6 +93,48 @@ err: return ret < 0 ? ret : -EIO; } +static void coroutine_fn filter_send_co(void *opaque) +{ +FilterSendCo *data = opaque; + +data->ret = _filter_send(data->s, data->buf, data->size); +data->done = true; +g_free(data->buf); +aio_wait_kick(); +} + +static int filter_send(MirrorState *s, + const struct iovec *iov, + int iovcnt) +{ +ssize_t size = iov_size(iov, iovcnt); +char *buf = NULL; + +if (!size) { +return 0; +} + +buf = g_malloc(size); +iov_to_buf(iov, iovcnt, 0, buf, size); + +FilterSendCo data = { +.s = s, +.size = size, +.buf = buf, +.ret = 0, +}; + +Coroutine *co = qemu_coroutine_create(filter_send_co, &data); +qemu_coroutine_enter(co); + +while (!data.done) { +aio_poll(qemu_get_aio_context(), true); +} + +return data.ret; + +} + static void redirector_to_filter(NetFilterState *nf, const uint8_t *buf, int len) -- 2.32.0
Re: [PATCH] net/filter: Optimize filter_send to coroutine
On 12/24/2021 6:07 PM, lizhij...@fujitsu.com wrote: On 24/12/2021 10:37, Rao, Lei wrote: This patch is to improve the logic of QEMU main thread sleep code in qemu_chr_write_buffer() where it can be blocked and can't run other coroutines during COLO IO stress test. Our approach is to put filter_send() in a coroutine. In this way, filter_send() will call qemu_coroutine_yield() in qemu_co_sleep_ns(), so that it can be scheduled out and QEMU main thread has opportunity to run other tasks. Signed-off-by: Lei Rao Signed-off-by: Zhang Chen --- net/filter-mirror.c | 67 - 1 file changed, 54 insertions(+), 13 deletions(-) diff --git a/net/filter-mirror.c b/net/filter-mirror.c index f20240cc9f..1e9f8b6216 100644 --- a/net/filter-mirror.c +++ b/net/filter-mirror.c @@ -20,6 +20,7 @@ #include "chardev/char-fe.h" #include "qemu/iov.h" #include "qemu/sockets.h" +#include "block/aio-wait.h" #define TYPE_FILTER_MIRROR "filter-mirror" typedef struct MirrorState MirrorState; @@ -42,20 +43,21 @@ struct MirrorState { bool vnet_hdr; }; -static int filter_send(MirrorState *s, - const struct iovec *iov, - int iovcnt) +typedef struct FilterSendCo { +MirrorState *s; +char *buf; +ssize_t size; +bool done; +int ret; +} FilterSendCo; + +static int _filter_send(MirrorState *s, + char *buf, + ssize_t size) { NetFilterState *nf = NETFILTER(s); int ret = 0; -ssize_t size = 0; uint32_t len = 0; -char *buf; - -size = iov_size(iov, iovcnt); -if (!size) { -return 0; -} len = htonl(size); ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len)); @@ -80,10 +82,7 @@ static int filter_send(MirrorState *s, } } -buf = g_malloc(size); -iov_to_buf(iov, iovcnt, 0, buf, size); ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size); -g_free(buf); if (ret != size) { goto err; } @@ -94,6 +93,48 @@ err: return ret < 0 ? ret : -EIO; } +static void coroutine_fn filter_send_co(void *opaque) +{ +FilterSendCo *data = opaque; + +data->ret = _filter_send(data->s, data->buf, data->size); +data->done = true; +g_free(data->buf); +aio_wait_kick(); +} + +static int filter_send(MirrorState *s, + const struct iovec *iov, + int iovcnt) +{ +ssize_t size = iov_size(iov, iovcnt); +char *buf = NULL; + +if (!size) { +return 0; +} + +buf = g_malloc(size); +iov_to_buf(iov, iovcnt, 0, buf, size); + +FilterSendCo data = { +.s = s, +.size = size, +.buf = buf, +.ret = 0, +}; + +Coroutine *co = qemu_coroutine_create(filter_send_co, &data); BTW, does qemu/old gcc complaint such coding style ? int a; a = foo() int b = a; There are a lot of codes of this style in QEMU. It is written that we need at least GCC v7.4 to compile QEMU in the configure file. So, I think it is no problem. +qemu_coroutine_enter(co); + +while (!data.done) { +aio_poll(qemu_get_aio_context(), true); +} + +return data.ret; + redundant newline will be changed in V2. Thanks, Lei Otherwise, Reviewed-by: Li Zhijian +} + static void redirector_to_filter(NetFilterState *nf, const uint8_t *buf, int len)
[PATCH v2] net/filter: Optimize filter_send to coroutine
This patch is to improve the logic of QEMU main thread sleep code in qemu_chr_write_buffer() where it can be blocked and can't run other coroutines during COLO IO stress test. Our approach is to put filter_send() in a coroutine. In this way, filter_send() will call qemu_coroutine_yield() in qemu_co_sleep_ns(), so that it can be scheduled out and QEMU main thread has opportunity to run other tasks. Signed-off-by: Lei Rao Signed-off-by: Zhang Chen Reviewed-by: Li Zhijian --- net/filter-mirror.c | 66 - 1 file changed, 53 insertions(+), 13 deletions(-) diff --git a/net/filter-mirror.c b/net/filter-mirror.c index f20240cc9f..34a63b5dbb 100644 --- a/net/filter-mirror.c +++ b/net/filter-mirror.c @@ -20,6 +20,7 @@ #include "chardev/char-fe.h" #include "qemu/iov.h" #include "qemu/sockets.h" +#include "block/aio-wait.h" #define TYPE_FILTER_MIRROR "filter-mirror" typedef struct MirrorState MirrorState; @@ -42,20 +43,21 @@ struct MirrorState { bool vnet_hdr; }; -static int filter_send(MirrorState *s, - const struct iovec *iov, - int iovcnt) +typedef struct FilterSendCo { +MirrorState *s; +char *buf; +ssize_t size; +bool done; +int ret; +} FilterSendCo; + +static int _filter_send(MirrorState *s, + char *buf, + ssize_t size) { NetFilterState *nf = NETFILTER(s); int ret = 0; -ssize_t size = 0; uint32_t len = 0; -char *buf; - -size = iov_size(iov, iovcnt); -if (!size) { -return 0; -} len = htonl(size); ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len)); @@ -80,10 +82,7 @@ static int filter_send(MirrorState *s, } } -buf = g_malloc(size); -iov_to_buf(iov, iovcnt, 0, buf, size); ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size); -g_free(buf); if (ret != size) { goto err; } @@ -94,6 +93,47 @@ err: return ret < 0 ? ret : -EIO; } +static void coroutine_fn filter_send_co(void *opaque) +{ +FilterSendCo *data = opaque; + +data->ret = _filter_send(data->s, data->buf, data->size); +data->done = true; +g_free(data->buf); +aio_wait_kick(); +} + +static int filter_send(MirrorState *s, + const struct iovec *iov, + int iovcnt) +{ +ssize_t size = iov_size(iov, iovcnt); +char *buf = NULL; + +if (!size) { +return 0; +} + +buf = g_malloc(size); +iov_to_buf(iov, iovcnt, 0, buf, size); + +FilterSendCo data = { +.s = s, +.size = size, +.buf = buf, +.ret = 0, +}; + +Coroutine *co = qemu_coroutine_create(filter_send_co, &data); +qemu_coroutine_enter(co); + +while (!data.done) { +aio_poll(qemu_get_aio_context(), true); +} + +return data.ret; +} + static void redirector_to_filter(NetFilterState *nf, const uint8_t *buf, int len) -- 2.32.0
[PATCH] migration/colo.c: Add missed return in error handling
When doing failover and checkpoint, some returns are missed in error handling. Let's add it. Signed-off-by: Lei Rao --- migration/colo.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 5f7071b3cd..014d3cba01 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -94,12 +94,15 @@ static void secondary_vm_do_failover(void) if (local_err) { error_report_err(local_err); local_err = NULL; +return; } /* Notify all filters of all NIC to do checkpoint */ colo_notify_filters_event(COLO_EVENT_FAILOVER, &local_err); if (local_err) { error_report_err(local_err); +local_err = NULL; +return; } if (!autostart) { @@ -178,6 +181,7 @@ static void primary_vm_do_failover(void) if (local_err) { error_report_err(local_err); local_err = NULL; +return; } /* Notify COLO thread that failover work is finished */ @@ -507,12 +511,11 @@ static int colo_do_checkpoint_transaction(MigrationState *s, goto out; } -ret = 0; - qemu_mutex_lock_iothread(); vm_start(); qemu_mutex_unlock_iothread(); trace_colo_vm_state_change("stop", "run"); +return 0; out: if (local_err) { -- 2.32.0
[RFC PATCH] ui/vnc.c: Fixed a deadlock bug.
The GDB stack is as follows: (gdb) bt 0 __lll_lock_wait (futex=futex@entry=0x56211df20360, private=0) at lowlevellock.c:52 1 0x7f263caf20a3 in __GI___pthread_mutex_lock (mutex=0x56211df20360) at ../nptl/pthread_mutex_lock.c:80 2 0x56211a757364 in qemu_mutex_lock_impl (mutex=0x56211df20360, file=0x56211a804857 "../ui/vnc-jobs.h", line=60) at ../util/qemu-thread-posix.c:80 3 0x56211a0ef8c7 in vnc_lock_output (vs=0x56211df14200) at ../ui/vnc-jobs.h:60 4 0x56211a0efcb7 in vnc_clipboard_send (vs=0x56211df14200, count=1, dwords=0x7ffdf1701338) at ../ui/vnc-clipboard.c:138 5 0x56211a0f0129 in vnc_clipboard_notify (notifier=0x56211df244c8, data=0x56211dd1bbf0) at ../ui/vnc-clipboard.c:209 6 0x56211a75dde8 in notifier_list_notify (list=0x56211afa17d0 , data=0x56211dd1bbf0) at ../util/notify.c:39 7 0x56211a0bf0e6 in qemu_clipboard_update (info=0x56211dd1bbf0) at ../ui/clipboard.c:50 8 0x56211a0bf05d in qemu_clipboard_peer_release (peer=0x56211df244c0, selection=QEMU_CLIPBOARD_SELECTION_CLIPBOARD) at ../ui/clipboard.c:41 9 0x56211a0bef9b in qemu_clipboard_peer_unregister (peer=0x56211df244c0) at ../ui/clipboard.c:19 10 0x56211a0d45f3 in vnc_disconnect_finish (vs=0x56211df14200) at ../ui/vnc.c:1358 11 0x56211a0d4c9d in vnc_client_read (vs=0x56211df14200) at ../ui/vnc.c:1611 12 0x56211a0d4df8 in vnc_client_io (ioc=0x56211ce70690, condition=G_IO_IN, opaque=0x56211df14200) at ../ui/vnc.c:1649 13 0x56211a5b976c in qio_channel_fd_source_dispatch (source=0x56211ce50a00, callback=0x56211a0d4d71 , user_data=0x56211df14200) at ../io/channel-watch.c:84 14 0x7f263ccede8e in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0 15 0x56211a77d4a1 in glib_pollfds_poll () at ../util/main-loop.c:232 16 0x56211a77d51f in os_host_main_loop_wait (timeout=958545) at ../util/main-loop.c:255 17 0x56211a77d630 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 18 0x56211a45bc8e in qemu_main_loop () at ../softmmu/runstate.c:726 19 0x56211a0b45fa in main (argc=69, argv=0x7ffdf1701778, envp=0x7ffdf17019a8) at ../softmmu/main.c:50 From the call trace, we can see it is a deadlock bug. vnc_disconnect_finish() acquires the output_mutex, and the output_mutex is then acquired again in vnc_clipboard_send(). Locking a non-recursive mutex twice causes a deadlock. So, I moved qemu_clipboard_peer_unregister() behind vnc_unlock_output(). Signed-off-by: Lei Rao --- ui/vnc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ui/vnc.c b/ui/vnc.c index 1ed1c7efc6..3ccd33dedc 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -1354,12 +1354,12 @@ void vnc_disconnect_finish(VncState *vs) /* last client gone */ vnc_update_server_surface(vs->vd); } +vnc_unlock_output(vs); + if (vs->cbpeer.notifier.notify) { qemu_clipboard_peer_unregister(&vs->cbpeer); } -vnc_unlock_output(vs); - qemu_mutex_destroy(&vs->output_mutex); if (vs->bh != NULL) { qemu_bh_delete(vs->bh); -- 2.32.0
RE: [PATCH v3 07/10] Reset the auto-converge counter at every checkpoint.
After testing, I think you are right. Will remove the cpu_throttle_stop() in V4. Thanks, Lei. -Original Message- From: Dr. David Alan Gilbert Sent: Thursday, March 25, 2021 12:40 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; quint...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de; qemu-devel@nongnu.org Subject: Re: [PATCH v3 07/10] Reset the auto-converge counter at every checkpoint. * leirao (lei@intel.com) wrote: > From: "Rao, Lei" > > if we don't reset the auto-converge counter, it will continue to run > with COLO running, and eventually the system will hang due to the CPU > throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE. > > Signed-off-by: Lei Rao > --- > migration/colo.c | 4 > migration/ram.c | 10 ++ > migration/ram.h | 1 + > 3 files changed, 15 insertions(+) > > diff --git a/migration/colo.c b/migration/colo.c index > 1aaf316..723ffb8 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -459,6 +459,10 @@ static int colo_do_checkpoint_transaction(MigrationState > *s, > if (ret < 0) { > goto out; > } > + > +if (migrate_auto_converge()) { > +mig_throttle_counter_reset(); > +} > /* > * Only save VM's live state, which not including device state. > * TODO: We may need a timeout mechanism to prevent COLO process > diff --git a/migration/ram.c b/migration/ram.c index 72143da..e795a8d > 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -652,6 +652,16 @@ static void mig_throttle_guest_down(uint64_t > bytes_dirty_period, > } > } > > +void mig_throttle_counter_reset(void) { > +RAMState *rs = ram_state; > + > +rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > +rs->num_dirty_pages_period = 0; > +rs->bytes_xfer_prev = ram_counters.transferred; > +cpu_throttle_stop(); I think this is right, so: Reviewed-by: Dr. David Alan Gilbert however, do you really need the cpu_throttle_stop? Shouldn't the previous iteration have called that in migration_iteration_finish() ? Dave > +} > + > /** > * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache > * > diff --git a/migration/ram.h b/migration/ram.h index 6378bb3..3f78175 > 100644 > --- a/migration/ram.h > +++ b/migration/ram.h > @@ -50,6 +50,7 @@ bool ramblock_is_ignored(RAMBlock *block); int > xbzrle_cache_resize(uint64_t new_size, Error **errp); uint64_t > ram_bytes_remaining(void); uint64_t ram_bytes_total(void); > +void mig_throttle_counter_reset(void); > > uint64_t ram_pagesize_summary(void); > int ram_save_queue_pages(const char *rbname, ram_addr_t start, > ram_addr_t len); > -- > 1.8.3.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
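For reference, with the cpu_throttle_stop() call dropped as agreed above, the helper would presumably shrink to the following (a sketch of the expected v4 shape based on this exchange, not the posted v4 patch):

void mig_throttle_counter_reset(void)
{
    RAMState *rs = ram_state;

    /* Restart the dirty-rate accounting window for the new checkpoint cycle. */
    rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
    rs->num_dirty_pages_period = 0;
    rs->bytes_xfer_prev = ram_counters.transferred;
    /* cpu_throttle_stop() removed: per the review comment, the previous
     * migration iteration already stops the throttle in
     * migration_iteration_finish(). */
}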
RE: [PATCH v4 09/10] Add the function of colo_bitmap_clear_dirty.
-Original Message- From: Dr. David Alan Gilbert Sent: Friday, March 26, 2021 2:08 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; quint...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de; qemu-devel@nongnu.org Subject: Re: [PATCH v4 09/10] Add the function of colo_bitmap_clear_diry. * leirao (lei@intel.com) wrote: > From: "Rao, Lei" > > When we use continuous dirty memory copy for flushing ram cache on > secondary VM, we can also clean up the bitmap of contiguous dirty page > memory. This also can reduce the VM stop time during checkpoint. > > Signed-off-by: Lei Rao > --- > migration/ram.c | 29 + > 1 file changed, 25 insertions(+), 4 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c index a258466..ae1e659 > 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -855,6 +855,30 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, > RAMBlock *rb, > return first; > } > > +/** > + * colo_bitmap_clear_dirty:when we flush ram cache to ram, we will > +use > + * continuous memory copy, so we can also clean up the bitmap of > +contiguous > + * dirty memory. > + */ > +static inline bool colo_bitmap_clear_dirty(RAMState *rs, > + RAMBlock *rb, > + unsigned long start, > + unsigned long num) { > +bool ret; > +unsigned long i = 0; > + > +qemu_mutex_lock(&rs->bitmap_mutex); Please use QEMU_LOCK_GUARD(&rs->bitmap_mutex); Will be changed in V5. Thanks. > +for (i = 0; i < num; i++) { > +ret = test_and_clear_bit(start + i, rb->bmap); > +if (ret) { > +rs->migration_dirty_pages--; > +} > +} > +qemu_mutex_unlock(&rs->bitmap_mutex); > +return ret; This implementation is missing the clear_bmap code that migration_bitmap_clear_dirty has. I think that's necessary now. Are we sure there's any benefit in this? Dave There is such a note about clear_bmap in struct RAMBlock: "On destination side, this should always be NULL, and the variable `clear_bmap_shift' is meaningless." This means that clear_bmap is always NULL on secondary VM. And for the behavior of flush ram cache to ram, we will always only happen on secondary VM. So, I think the clear_bmap code is unnecessary for COLO. As for the benefits, When the number of dirty pages from flush ram cache to ram is too much. it will reduce the number of locks acquired. Lei > +} > + > static inline bool migration_bitmap_clear_dirty(RAMState *rs, > RAMBlock *rb, > unsigned long page) > @@ -3700,7 +3724,6 @@ void colo_flush_ram_cache(void) > void *src_host; > unsigned long offset = 0; > unsigned long num = 0; > -unsigned long i = 0; > > memory_global_dirty_log_sync(); > WITH_RCU_READ_LOCK_GUARD() { > @@ -3722,9 +3745,7 @@ void colo_flush_ram_cache(void) > num = 0; > block = QLIST_NEXT_RCU(block, next); > } else { > -for (i = 0; i < num; i++) { > -migration_bitmap_clear_dirty(ram_state, block, offset + > i); > -} > +colo_bitmap_clear_dirty(ram_state, block, offset, > + num); > dst_host = block->host > + (((ram_addr_t)offset) << TARGET_PAGE_BITS); > src_host = block->colo_cache > -- > 1.8.3.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
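For context, QEMU_LOCK_GUARD() takes the mutex for the remainder of the enclosing scope and releases it on every exit path, so the explicit lock/unlock pair goes away. A sketch of the requested shape, which is essentially what the v6 patch quoted further down ends up doing:

static inline bool colo_bitmap_clear_dirty(RAMState *rs,
                                           RAMBlock *rb,
                                           unsigned long start,
                                           unsigned long num)
{
    bool ret = false;
    unsigned long i;

    QEMU_LOCK_GUARD(&rs->bitmap_mutex);    /* dropped when the scope ends */
    for (i = 0; i < num; i++) {
        ret = test_and_clear_bit(start + i, rb->bmap);
        if (ret) {
            rs->migration_dirty_pages--;
        }
    }
    return ret;
}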
RE: [PATCH v4 09/10] Add the function of colo_bitmap_clear_dirty.
The performance data has been added to the commit message in V6. Thanks, Lei. -Original Message- From: Dr. David Alan Gilbert Sent: Monday, March 29, 2021 7:32 PM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; quint...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de; qemu-devel@nongnu.org Subject: Re: [PATCH v4 09/10] Add the function of colo_bitmap_clear_diry. * Rao, Lei (lei@intel.com) wrote: > > -Original Message- > From: Dr. David Alan Gilbert > Sent: Friday, March 26, 2021 2:08 AM > To: Rao, Lei > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; > jasow...@redhat.com; quint...@redhat.com; pbonz...@redhat.com; > lukasstra...@web.de; qemu-devel@nongnu.org > Subject: Re: [PATCH v4 09/10] Add the function of colo_bitmap_clear_diry. > > * leirao (lei....@intel.com) wrote: > > From: "Rao, Lei" > > > > When we use continuous dirty memory copy for flushing ram cache on > > secondary VM, we can also clean up the bitmap of contiguous dirty > > page memory. This also can reduce the VM stop time during checkpoint. > > > > Signed-off-by: Lei Rao > > --- > > migration/ram.c | 29 + > > 1 file changed, 25 insertions(+), 4 deletions(-) > > > > diff --git a/migration/ram.c b/migration/ram.c index > > a258466..ae1e659 > > 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -855,6 +855,30 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, > > RAMBlock *rb, > > return first; > > } > > > > +/** > > + * colo_bitmap_clear_dirty:when we flush ram cache to ram, we will > > +use > > + * continuous memory copy, so we can also clean up the bitmap of > > +contiguous > > + * dirty memory. > > + */ > > +static inline bool colo_bitmap_clear_dirty(RAMState *rs, > > + RAMBlock *rb, > > + unsigned long start, > > + unsigned long num) { > > +bool ret; > > +unsigned long i = 0; > > + > > +qemu_mutex_lock(&rs->bitmap_mutex); > > Please use QEMU_LOCK_GUARD(&rs->bitmap_mutex); > > Will be changed in V5. Thanks. > > > +for (i = 0; i < num; i++) { > > +ret = test_and_clear_bit(start + i, rb->bmap); > > +if (ret) { > > +rs->migration_dirty_pages--; > > +} > > +} > > +qemu_mutex_unlock(&rs->bitmap_mutex); > > +return ret; > > This implementation is missing the clear_bmap code that > migration_bitmap_clear_dirty has. > I think that's necessary now. > > Are we sure there's any benefit in this? > > Dave > > There is such a note about clear_bmap in struct RAMBlock: > "On destination side, this should always be NULL, and the variable > `clear_bmap_shift' is meaningless." > This means that clear_bmap is always NULL on secondary VM. And for the > behavior of flush ram cache to ram, we will always only happen on secondary > VM. > So, I think the clear_bmap code is unnecessary for COLO. Ah yes; can you add a comment there to note this is on the secondary to make that clear. > As for the benefits, When the number of dirty pages from flush ram cache to > ram is too much. it will reduce the number of locks acquired. It might be good to measure the benefit. 
Dave > Lei > > > +} > > + > > static inline bool migration_bitmap_clear_dirty(RAMState *rs, > > RAMBlock *rb, > > unsigned long page) > > @@ -3700,7 +3724,6 @@ void colo_flush_ram_cache(void) > > void *src_host; > > unsigned long offset = 0; > > unsigned long num = 0; > > -unsigned long i = 0; > > > > memory_global_dirty_log_sync(); > > WITH_RCU_READ_LOCK_GUARD() { > > @@ -3722,9 +3745,7 @@ void colo_flush_ram_cache(void) > > num = 0; > > block = QLIST_NEXT_RCU(block, next); > > } else { > > -for (i = 0; i < num; i++) { > > -migration_bitmap_clear_dirty(ram_state, block, offset > > + i); > > -} > > +colo_bitmap_clear_dirty(ram_state, block, offset, > > + num); > > dst_host = block->host > > + (((ram_addr_t)offset) << TARGET_PAGE_BITS); > > src_host = block->colo_cache > > -- > > 1.8.3.1 > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
RE: [PATCH v2 05/10] Optimize the function of packet_new
How about redefine a function named packet_new_nocopy? In comments, we can tell the caller don't release the buffer and the packet_destroy will release it. Thanks, Lei. -Original Message- From: lizhij...@fujitsu.com Sent: Friday, March 12, 2021 2:53 PM To: Rao, Lei ; Zhang, Chen ; jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de Cc: qemu-devel@nongnu.org Subject: Re: [PATCH v2 05/10] Optimize the function of packet_new On 3/12/21 1:02 PM, leirao wrote: > From: "Rao, Lei" > > if we put the data copy outside the packet_new(), then for the > filter-rewrite module, there will be one less memory copy in the > processing of each network packet. > > Signed-off-by: Lei Rao > --- > net/colo-compare.c| 7 +-- > net/colo.c| 4 ++-- > net/colo.h| 2 +- > net/filter-rewriter.c | 1 - > 4 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/net/colo-compare.c b/net/colo-compare.c index > 9e18baa..8bdf5a8 100644 > --- a/net/colo-compare.c > +++ b/net/colo-compare.c > @@ -247,14 +247,17 @@ static int packet_enqueue(CompareState *s, int mode, > Connection **con) > ConnectionKey key; > Packet *pkt = NULL; > Connection *conn; > +char *data = NULL; > int ret; > > if (mode == PRIMARY_IN) { > -pkt = packet_new(s->pri_rs.buf, > +data = g_memdup(s->pri_rs.buf, s->pri_rs.packet_len); > +pkt = packet_new(data, >s->pri_rs.packet_len, >s->pri_rs.vnet_hdr_len); > } else { > -pkt = packet_new(s->sec_rs.buf, > +data = g_memdup(s->sec_rs.buf, s->sec_rs.packet_len); > +pkt = packet_new(data, >s->sec_rs.packet_len, >s->sec_rs.vnet_hdr_len); > } > diff --git a/net/colo.c b/net/colo.c > index ef00609..08fb37e 100644 > --- a/net/colo.c > +++ b/net/colo.c > @@ -155,11 +155,11 @@ void connection_destroy(void *opaque) > g_slice_free(Connection, conn); > } > > -Packet *packet_new(const void *data, int size, int vnet_hdr_len) > +Packet *packet_new(void *data, int size, int vnet_hdr_len) > { > Packet *pkt = g_slice_new(Packet); > > -pkt->data = g_memdup(data, size); > +pkt->data = data; if so, should packet_destroy() free() data which may be not alloc by itself Thanks Zhijian
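A sketch of the proposed split (the wrapper shape is an assumption; only the intent, taking ownership of the caller's buffer instead of duplicating it, comes from the mail above):

/*
 * packet_new_nocopy: the packet takes ownership of @data; the caller must
 * not free it, packet_destroy() will.  Remaining field setup elided.
 */
Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len)
{
    Packet *pkt = g_slice_new0(Packet);

    pkt->data = data;
    pkt->size = size;
    pkt->vnet_hdr_len = vnet_hdr_len;
    return pkt;
}

/* Copying variant kept for callers whose buffer must remain valid. */
Packet *packet_new(const void *data, int size, int vnet_hdr_len)
{
    return packet_new_nocopy(g_memdup(data, size), size, vnet_hdr_len);
}

With that split, filter-rewriter can hand its buffer straight to packet_new_nocopy() and save one copy per packet, which is the optimization this patch is after.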
RE: [PATCH v2 08/10] Reduce the PVM stop time during Checkpoint
I think it's enough to know the start and the number of dirty pages in the bitmap. Because the size of consecutive dirty pages can be calculated. By the way, no problems were found in our automated testing. Thanks, Lei. -Original Message- From: lizhij...@fujitsu.com Sent: Friday, March 12, 2021 3:54 PM To: Rao, Lei ; Zhang, Chen ; jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de Cc: qemu-devel@nongnu.org Subject: Re: [PATCH v2 08/10] Reduce the PVM stop time during Checkpoint On 3/12/21 1:03 PM, leirao wrote: > From: "Rao, Lei" > > When flushing memory from ram cache to ram during every checkpoint on > secondary VM, we can copy continuous chunks of memory instead of > 4096 bytes per time to reduce the time of VM stop during checkpoint. > > Signed-off-by: Lei Rao > --- > migration/ram.c | 44 +--- > 1 file changed, 41 insertions(+), 3 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c index e795a8d..b269637 > 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -823,6 +823,39 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, > RAMBlock *rb, > return next; > } > > +/* > + * colo_bitmap_find_diry:find contiguous dirty pages from start > + * > + * Returns the page offset within memory region of the start of the > +contiguout > + * dirty page > + * > + * @rs: current RAM state > + * @rb: RAMBlock where to search for dirty pages > + * @start: page where we start the search > + * @num: the number of contiguous dirty pages */ static inline > +unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, > + unsigned long start, unsigned > +long *num) { > +unsigned long size = rb->used_length >> TARGET_PAGE_BITS; > +unsigned long *bitmap = rb->bmap; > +unsigned long first, next; > + > +if (ramblock_is_ignored(rb)) { > +return size; > +} > + > +first = find_next_bit(bitmap, size, start); > +if (first >= size) { > +return first; > +} > +next = find_next_zero_bit(bitmap, size, first + 1); > +assert(next >= first); > +*num = next - first; > +return first; The idea is outstanding i wonder it should return (next - 1) ? Thanks Zhijian > +} > + > static inline bool migration_bitmap_clear_dirty(RAMState *rs, > RAMBlock *rb, > unsigned long page) > @@ -3669,6 +3702,8 @@ void colo_flush_ram_cache(void) > void *dst_host; > void *src_host; > unsigned long offset = 0; > +unsigned long num = 0; > +unsigned long i = 0; > > memory_global_dirty_log_sync(); > WITH_RCU_READ_LOCK_GUARD() { > @@ -3682,19 +3717,22 @@ void colo_flush_ram_cache(void) > block = QLIST_FIRST_RCU(&ram_list.blocks); > > while (block) { > -offset = migration_bitmap_find_dirty(ram_state, block, offset); > +offset = colo_bitmap_find_dirty(ram_state, block, offset, > + &num); > > if (((ram_addr_t)offset) << TARGET_PAGE_BITS > >= block->used_length) { > offset = 0; > +num = 0; > block = QLIST_NEXT_RCU(block, next); > } else { > -migration_bitmap_clear_dirty(ram_state, block, offset); > +for (i = 0; i < num; i++) { > +migration_bitmap_clear_dirty(ram_state, block, offset + > i); > +} > dst_host = block->host >+ (((ram_addr_t)offset) << TARGET_PAGE_BITS); > src_host = block->colo_cache >+ (((ram_addr_t)offset) << TARGET_PAGE_BITS); > -memcpy(dst_host, src_host, TARGET_PAGE_SIZE); > +memcpy(dst_host, src_host, TARGET_PAGE_SIZE * num); > } > } > }
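A small worked example of the contract described above, assuming a block whose bitmap has exactly pages 8..12 dirty (hypothetical caller, mirroring colo_flush_ram_cache() in the quoted patch):

    unsigned long num = 0;
    unsigned long first = colo_bitmap_find_dirty(ram_state, block, 0, &num);
    /*
     * first == 8, num == 5: one memcpy of num * TARGET_PAGE_SIZE bytes
     * starting at page 8 replaces five single-page copies, and the next
     * search can simply resume from first + num (page 13).
     */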
RE: [PATCH v2 05/10] Optimize the function of packet_new
Oh, I understand what you mean, and will change it in V3. Thanks, Lei. -Original Message- From: lizhij...@fujitsu.com Sent: Friday, March 12, 2021 6:23 PM To: Rao, Lei ; Zhang, Chen ; jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de Cc: qemu-devel@nongnu.org Subject: Re: [PATCH v2 05/10] Optimize the function of packet_new > +offset = colo_bitmap_find_dirty(ram_state, block, offset, > + &num); IIUC, this return value would pass to the next round as start index, so you should skip the already checked one. Thanks On 3/12/21 5:56 PM, Rao, Lei wrote: > How about redefine a function named packet_new_nocopy? > In comments, we can tell the caller don't release the buffer and the > packet_destroy will release it. > > Thanks, > Lei. > -Original Message- > From:lizhij...@fujitsu.com > Sent: Friday, March 12, 2021 2:53 PM > To: Rao, Lei; Zhang, > Chen;jasow...@redhat.com;quint...@redhat.com;dgi > lb...@redhat.com;pbonz...@redhat.com;lukasstra...@web.de > Cc:qemu-devel@nongnu.org > Subject: Re: [PATCH v2 05/10] Optimize the function of packet_new > >
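In code terms, the point about the return value is that the caller should resume the search past the run it just handled, presumably something like this in colo_flush_ram_cache() (assumed shape, not a quoted hunk):

    offset = colo_bitmap_find_dirty(ram_state, block, offset, &num);
    if (((ram_addr_t)offset) << TARGET_PAGE_BITS >= block->used_length) {
        offset = 0;
        num = 0;
        block = QLIST_NEXT_RCU(block, next);
    } else {
        /* ... clear the bits and copy num contiguous pages ... */
        offset += num;    /* skip the pages already checked, per the review */
    }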
RE: [PATCH v6 09/10] Add the function of colo_bitmap_clear_dirty
Hi, Dave The performance data has added to the commit messages. Do you have any other suggestions? Thanks Lei. -Original Message- From: Rao, Lei Sent: Friday, April 9, 2021 11:21 AM To: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de Cc: qemu-devel@nongnu.org; Rao, Lei Subject: [PATCH v6 09/10] Add the function of colo_bitmap_clear_dirty From: "Rao, Lei" When we use continuous dirty memory copy for flushing ram cache on secondary VM, we can also clean up the bitmap of contiguous dirty page memory. This also can reduce the VM stop time during checkpoint. The performance test for COLO as follow: Server configuraton: CPU :Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz MEM :251G(type:DDR4 Speed:2666 MT/s) SSD :Intel 730 and DC S35x0/3610/3700 Series SSDs dirty pages:3189376 migration_bitmap_clear_dirty time consuming(ns):105194000 dirty pages:3189784 migration_bitmap_clear_dirty time consuming(ns):105297000 dirty pages:3190501 migration_bitmap_clear_dirty time consuming(ns):10541 dirty pages:3188734 migration_bitmap_clear_dirty time consuming(ns):105138000 dirty pages:3189464 migration_bitmap_clear_dirty time consuming(ns):111736000 dirty pages:3188558 migration_bitmap_clear_dirty time consuming(ns):105079000 dirty pages:3239489 migration_bitmap_clear_dirty time consuming(ns):106761000 dirty pages:3190240 colo_bitmap_clear_dirty time consuming(ns):8369000 dirty pages:3189293 colo_bitmap_clear_dirty time consuming(ns):8388000 dirty pages:3189171 colo_bitmap_clear_dirty time consuming(ns):8641000 dirty pages:3189099 colo_bitmap_clear_dirty time consuming(ns):828 dirty pages:3189974 colo_bitmap_clear_dirty time consuming(ns):8352000 dirty pages:3189471 colo_bitmap_clear_dirty time consuming(ns):8348000 dirty pages:3189681 colo_bitmap_clear_dirty time consuming(ns):8426000 it can be seen from the data that colo_bitmap_clear_dirty is more efficient. Signed-off-by: Lei Rao Reviewed-by: Lukas Straub Tested-by: Lukas Straub --- migration/ram.c | 36 +++- 1 file changed, 31 insertions(+), 5 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 8661d82..11275cd 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -857,6 +857,36 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, return first; } +/** + * colo_bitmap_clear_dirty:when we flush ram cache to ram, we will use + * continuous memory copy, so we can also clean up the bitmap of +contiguous + * dirty memory. + */ +static inline bool colo_bitmap_clear_dirty(RAMState *rs, + RAMBlock *rb, + unsigned long start, + unsigned long num) { +bool ret; +unsigned long i = 0; + +/* + * Since flush ram cache to ram can only happen on Secondary VM. + * and the clear bitmap always is NULL on destination side. + * Therefore, there is unnecessary to judge whether the + * clear_bitmap needs clear. 
+ */ +QEMU_LOCK_GUARD(&rs->bitmap_mutex); +for (i = 0; i < num; i++) { +ret = test_and_clear_bit(start + i, rb->bmap); +if (ret) { +rs->migration_dirty_pages--; +} +} + +return ret; +} + static inline bool migration_bitmap_clear_dirty(RAMState *rs, RAMBlock *rb, unsigned long page) @@ -3774,11 +3804,7 @@ void colo_flush_ram_cache(void) num = 0; block = QLIST_NEXT_RCU(block, next); } else { -unsigned long i = 0; - -for (i = 0; i < num; i++) { -migration_bitmap_clear_dirty(ram_state, block, offset + i); -} +colo_bitmap_clear_dirty(ram_state, block, offset, num); dst_host = block->host + (((ram_addr_t)offset) << TARGET_PAGE_BITS); src_host = block->colo_cache -- 1.8.3.1
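The per-flush timings above can be reproduced with a small, throwaway piece of instrumentation inside colo_flush_ram_cache() on the secondary; a hypothetical sketch (the accumulator and report are illustrative, not part of the series):

    static int64_t clear_ns;    /* accumulated over one flush */
    int64_t t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
    colo_bitmap_clear_dirty(ram_state, block, offset, num);
    clear_ns += qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - t0;
    /* report clear_ns together with the dirty-page count after the loop,
     * then reset it for the next checkpoint */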
RE: [PATCH v6 09/10] Add the function of colo_bitmap_clear_dirty
Hi, Dave I think this set of patches is beneficial to upstream. Please check these performance data. If you have any other ideas, please let me know. Thanks Lei. -Original Message- From: Rao, Lei Sent: Friday, April 16, 2021 3:57 PM To: dgilb...@redhat.com Cc: qemu-devel@nongnu.org; Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; quint...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de Subject: RE: [PATCH v6 09/10] Add the function of colo_bitmap_clear_dirty Hi, Dave The performance data has added to the commit messages. Do you have any other suggestions? Thanks Lei. -Original Message- From: Rao, Lei Sent: Friday, April 9, 2021 11:21 AM To: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com; pbonz...@redhat.com; lukasstra...@web.de Cc: qemu-devel@nongnu.org; Rao, Lei Subject: [PATCH v6 09/10] Add the function of colo_bitmap_clear_dirty From: "Rao, Lei" When we use continuous dirty memory copy for flushing ram cache on secondary VM, we can also clean up the bitmap of contiguous dirty page memory. This also can reduce the VM stop time during checkpoint. The performance test for COLO as follow: Server configuraton: CPU :Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz MEM :251G(type:DDR4 Speed:2666 MT/s) SSD :Intel 730 and DC S35x0/3610/3700 Series SSDs dirty pages:3189376 migration_bitmap_clear_dirty time consuming(ns):105194000 dirty pages:3189784 migration_bitmap_clear_dirty time consuming(ns):105297000 dirty pages:3190501 migration_bitmap_clear_dirty time consuming(ns):10541 dirty pages:3188734 migration_bitmap_clear_dirty time consuming(ns):105138000 dirty pages:3189464 migration_bitmap_clear_dirty time consuming(ns):111736000 dirty pages:3188558 migration_bitmap_clear_dirty time consuming(ns):105079000 dirty pages:3239489 migration_bitmap_clear_dirty time consuming(ns):106761000 dirty pages:3190240 colo_bitmap_clear_dirty time consuming(ns):8369000 dirty pages:3189293 colo_bitmap_clear_dirty time consuming(ns):8388000 dirty pages:3189171 colo_bitmap_clear_dirty time consuming(ns):8641000 dirty pages:3189099 colo_bitmap_clear_dirty time consuming(ns):828 dirty pages:3189974 colo_bitmap_clear_dirty time consuming(ns):8352000 dirty pages:3189471 colo_bitmap_clear_dirty time consuming(ns):8348000 dirty pages:3189681 colo_bitmap_clear_dirty time consuming(ns):8426000 it can be seen from the data that colo_bitmap_clear_dirty is more efficient. Signed-off-by: Lei Rao Reviewed-by: Lukas Straub Tested-by: Lukas Straub --- migration/ram.c | 36 +++- 1 file changed, 31 insertions(+), 5 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 8661d82..11275cd 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -857,6 +857,36 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, return first; } +/** + * colo_bitmap_clear_dirty:when we flush ram cache to ram, we will use + * continuous memory copy, so we can also clean up the bitmap of +contiguous + * dirty memory. + */ +static inline bool colo_bitmap_clear_dirty(RAMState *rs, + RAMBlock *rb, + unsigned long start, + unsigned long num) { +bool ret; +unsigned long i = 0; + +/* + * Since flush ram cache to ram can only happen on Secondary VM. + * and the clear bitmap always is NULL on destination side. + * Therefore, there is unnecessary to judge whether the + * clear_bitmap needs clear. 
+ */ +QEMU_LOCK_GUARD(&rs->bitmap_mutex); +for (i = 0; i < num; i++) { +ret = test_and_clear_bit(start + i, rb->bmap); +if (ret) { +rs->migration_dirty_pages--; +} +} + +return ret; +} + static inline bool migration_bitmap_clear_dirty(RAMState *rs, RAMBlock *rb, unsigned long page) @@ -3774,11 +3804,7 @@ void colo_flush_ram_cache(void) num = 0; block = QLIST_NEXT_RCU(block, next); } else { -unsigned long i = 0; - -for (i = 0; i < num; i++) { -migration_bitmap_clear_dirty(ram_state, block, offset + i); -} +colo_bitmap_clear_dirty(ram_state, block, offset, num); dst_host = block->host + (((ram_addr_t)offset) << TARGET_PAGE_BITS); src_host = block->colo_cache -- 1.8.3.1
RE: [PATCH 07/10] Disable auto-converge before entering COLO mode.
Sorry for the late reply due to CNY. Auto-converge ensure that live migration can be completed smoothly when there are too many dirty pages. COLO may encounter the same situation when rebuild a new secondary VM. So, I think it is necessary to enable COLO and auto-converge at the same time. Thanks, Lei. -Original Message- From: Lukas Straub Sent: Sunday, February 14, 2021 6:52 PM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH 07/10] Disable auto-coverge before entering COLO mode. On Wed, 13 Jan 2021 10:46:32 +0800 leirao wrote: > From: "Rao, Lei" > > If we don't disable the feature of auto-converge for live migration > before entering COLO mode, it will continue to run with COLO running, > and eventually the system will hang due to the CPU throttle reaching > DEFAULT_MIGRATE_MAX_CPU_THROTTLE. > > Signed-off-by: Lei Rao > --- > migration/migration.c | 16 +++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c index > 31417ce..6ab37e5 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error > **errp) > qapi_free_MigrationCapabilityStatusList(cap); > } > > +static void colo_auto_converge_enabled(bool value, Error **errp) { > +MigrationCapabilityStatusList *cap = NULL; > + > +if (migrate_colo_enabled() && migrate_auto_converge()) { > +QAPI_LIST_PREPEND(cap, > + migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE, > + value)); > +qmp_migrate_set_capabilities(cap, errp); > +qapi_free_MigrationCapabilityStatusList(cap); > +} > +cpu_throttle_stop(); > +} > + I think it's better to error out in migration_prepare or migrate_caps_check if both colo and auto-converge is enabled. > static void migrate_set_block_incremental(MigrationState *s, bool > value) { > s->parameters.block_incremental = value; @@ -3401,7 +3415,7 @@ > static MigIterateState migration_iteration_run(MigrationState *s) > static void migration_iteration_finish(MigrationState *s) { > /* If we enabled cpu throttling for auto-converge, turn it off. */ > -cpu_throttle_stop(); > +colo_auto_converge_enabled(false, &error_abort); > > qemu_mutex_lock_iothread(); > switch (s->state) { --
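For completeness, the alternative Lukas suggests in the quoted mail would be roughly the following check in migrate_caps_check() (a sketch of the option that was not taken; the series instead keeps auto-converge and resets its counter at every checkpoint, see the v3 07/10 thread above):

    /* Hypothetical hunk: refuse the capability combination up front. */
    if (cap_list[MIGRATION_CAPABILITY_X_COLO] &&
        cap_list[MIGRATION_CAPABILITY_AUTO_CONVERGE]) {
        error_setg(errp, "auto-converge cannot be enabled together with x-colo");
        return false;
    }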
RE: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint
If user executes the shutdown normally and QEMU crashes, I think this is unacceptable. Since we can avoid this situation, why not do it? Thanks, Lei. -Original Message- From: Lukas Straub Sent: Sunday, February 14, 2021 7:46 PM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint On Fri, 29 Jan 2021 02:57:57 + "Rao, Lei" wrote: > The state will be set RUN_STATE_COLO in colo_do_checkpoint_transaction(). If > the guest executes power off or shutdown at this time and the QEMU main > thread will call vm_shutdown(), it will set the state to RUN_STATE_SHUTDOWN. > The state switch from RUN_STATE_COLO to RUN_STATE_SHUTDOWN is not defined in > runstate_transitions_def. this will cause QEMU crash. Although this is small > probability, it may still happen. This patch fixes the 'colo' -> 'shutdown' transition. AFAIK then colo_do_checkpoint_transaction will call vm_start() again, which does 'shutdown' -> 'running' and (rightfully) crashes. So I think it is better to crash here too. > By the way. Do you have any comments about other patches? > Thanks, > Lei. > > -Original Message- > From: Lukas Straub > Sent: Thursday, January 28, 2021 2:24 AM > To: Rao, Lei > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; > jasow...@redhat.com; zhang.zhanghaili...@huawei.com; > quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org > Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown > during checkpoint > > On Thu, 21 Jan 2021 01:48:31 + > "Rao, Lei" wrote: > > > The Primary VM can be shut down when it is in COLO state, which may trigger > > this bug. > > Do you have a backtrace for this bug? > > > About 'shutdown' -> 'colo' -> 'running', I think you are right, I did have > > the problems you said. For 'shutdown'->'colo', The fixed > > patch(5647051f432b7c9b57525470b0a79a31339062d2) have been merged. > > Recently, I found another bug as follows in the test. 
> > qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running' > > Aborted (core dumped) > > The gdb bt as following: > > #0 __GI_raise (sig=sig@entry=6) at > > ../sysdeps/unix/sysv/linux/raise.c:50 > > #1 0x7faa3d613859 in __GI_abort () at abort.c:79 > > #2 0x55c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at > > vl.c:723 > > #3 0x55c5a1f8cae4 in vm_prepare_start () at > > /home/workspace/colo-qemu/cpus.c:2206 > > #4 0x55c5a1f8cb1b in vm_start () at > > /home/workspace/colo-qemu/cpus.c:2213 > > #5 0x55c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) > > at migration/migration.c:3376 > > #6 0x55c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at > > migration/migration.c:3527 > > #7 0x55c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at > > util/qemu-thread-posix.c:519 > > #8 0x7faa3d7e9609 in start_thread (arg=) at > > pthread_create.c:477 > > #9 0x7faa3d710293 in clone () at > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > > > For the bug, I made the following changes: > > @@ -3379,7 +3379,9 @@ static void > > migration_iteration_finish(MigrationState *s) > > case MIGRATION_STATUS_CANCELLED: > > case MIGRATION_STATUS_CANCELLING: > > if (s->vm_was_running) { > > -vm_start(); > > +if (!runstate_check(RUN_STATE_SHUTDOWN)) { > > +vm_start(); > > +} > > } else { > > if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { > > runstate_set(RUN_STATE_POSTMIGRATE); > > > > I will send the patch to community after more test. > > > > Thanks, > > Lei. > > > > -Original Message- > > From: Lukas Straub > > Sent: Thursday, January 21, 2021 3:13 AM > > To: Rao, Lei > > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; > > jasow...@redhat.com; zhang.zhanghaili...@huawei.com; > > quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org > > Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown > > during checkpoint > > > > On Wed, 13 Jan 2021 10:46:27 +0800 > > leirao wrote: > > > > > From: &qu
RE: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint
The state will be set RUN_STATE_COLO in colo_do_checkpoint_transaction(). If the guest executes power off or shutdown at this time and the QEMU main thread will call vm_shutdown(), it will set the state to RUN_STATE_SHUTDOWN. The state switch from RUN_STATE_COLO to RUN_STATE_SHUTDOWN is not defined in runstate_transitions_def. this will cause QEMU crash. Although this is small probability, it may still happen. By the way. Do you have any comments about other patches? Thanks, Lei. -Original Message- From: Lukas Straub Sent: Thursday, January 28, 2021 2:24 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint On Thu, 21 Jan 2021 01:48:31 + "Rao, Lei" wrote: > The Primary VM can be shut down when it is in COLO state, which may trigger > this bug. Do you have a backtrace for this bug? > About 'shutdown' -> 'colo' -> 'running', I think you are right, I did have > the problems you said. For 'shutdown'->'colo', The fixed > patch(5647051f432b7c9b57525470b0a79a31339062d2) have been merged. > Recently, I found another bug as follows in the test. > qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running' > Aborted (core dumped) > The gdb bt as following: > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > #1 0x7faa3d613859 in __GI_abort () at abort.c:79 > #2 0x55c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at > vl.c:723 > #3 0x55c5a1f8cae4 in vm_prepare_start () at > /home/workspace/colo-qemu/cpus.c:2206 > #4 0x55c5a1f8cb1b in vm_start () at > /home/workspace/colo-qemu/cpus.c:2213 > #5 0x55c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) > at migration/migration.c:3376 > #6 0x55c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at > migration/migration.c:3527 > #7 0x55c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at > util/qemu-thread-posix.c:519 > #8 0x7faa3d7e9609 in start_thread (arg=) at > pthread_create.c:477 > #9 0x7faa3d710293 in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > For the bug, I made the following changes: > @@ -3379,7 +3379,9 @@ static void > migration_iteration_finish(MigrationState *s) > case MIGRATION_STATUS_CANCELLED: > case MIGRATION_STATUS_CANCELLING: > if (s->vm_was_running) { > -vm_start(); > +if (!runstate_check(RUN_STATE_SHUTDOWN)) { > +vm_start(); > +} > } else { > if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { > runstate_set(RUN_STATE_POSTMIGRATE); > > I will send the patch to community after more test. > > Thanks, > Lei. > > -Original Message- > From: Lukas Straub > Sent: Thursday, January 21, 2021 3:13 AM > To: Rao, Lei > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; > jasow...@redhat.com; zhang.zhanghaili...@huawei.com; > quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org > Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown > during checkpoint > > On Wed, 13 Jan 2021 10:46:27 +0800 > leirao wrote: > > > From: "Rao, Lei" > > > > This patch fixes the following: > > qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown' > > Aborted (core dumped) > > > > Signed-off-by: Lei Rao > > I wonder how that is possible, since the VM is stopped during 'colo' state. > > Unrelated to this patch, I think this area needs some work since the > following unintended runstate transition is possible: > 'shutdown' -> 'colo' -> 'running'. 
> > > --- > > softmmu/runstate.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/softmmu/runstate.c b/softmmu/runstate.c index > > 636aab0..455ad0d 100644 > > --- a/softmmu/runstate.c > > +++ b/softmmu/runstate.c > > @@ -125,6 +125,7 @@ static const RunStateTransition > > runstate_transitions_def[] = { > > { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH }, > > > > { RUN_STATE_COLO, RUN_STATE_RUNNING }, > > +{ RUN_STATE_COLO, RUN_STATE_SHUTDOWN}, > > > > { RUN_STATE_RUNNING, RUN_STATE_DEBUG }, > > { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR }, > > > --
RE: [PATCH 07/10] Disable auto-converge before entering COLO mode.
I think there is a difference between doing checkpoints in COLO and live migration. The feature of auto-converge is to ensure the success of live migration even though the dirty page generation speed is faster than data transfer. but for COLO, we will force the VM to stop when something is doing a checkpoint. This will ensure the success of doing a checkpoint and this has nothing to do with auto-converge. Thanks, Lei. -Original Message- From: Dr. David Alan Gilbert Sent: Wednesday, January 13, 2021 7:32 PM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH 07/10] Disable auto-coverge before entering COLO mode. * leirao (lei@intel.com) wrote: > From: "Rao, Lei" > > If we don't disable the feature of auto-converge for live migration > before entering COLO mode, it will continue to run with COLO running, > and eventually the system will hang due to the CPU throttle reaching > DEFAULT_MIGRATE_MAX_CPU_THROTTLE. > > Signed-off-by: Lei Rao I don't think that's the right answer, because it would seem reasonable to use auto-converge to ensure that a COLO snapshot succeeded by limiting guest CPU time. Is the right fix here to reset the state of the auto-converge counters at the start of each colo snapshot? Dave > --- > migration/migration.c | 16 +++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c index > 31417ce..6ab37e5 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error > **errp) > qapi_free_MigrationCapabilityStatusList(cap); > } > > +static void colo_auto_converge_enabled(bool value, Error **errp) { > +MigrationCapabilityStatusList *cap = NULL; > + > +if (migrate_colo_enabled() && migrate_auto_converge()) { > +QAPI_LIST_PREPEND(cap, > + migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE, > + value)); > +qmp_migrate_set_capabilities(cap, errp); > +qapi_free_MigrationCapabilityStatusList(cap); > +} > +cpu_throttle_stop(); > +} > + > static void migrate_set_block_incremental(MigrationState *s, bool > value) { > s->parameters.block_incremental = value; @@ -3401,7 +3415,7 @@ > static MigIterateState migration_iteration_run(MigrationState *s) > static void migration_iteration_finish(MigrationState *s) { > /* If we enabled cpu throttling for auto-converge, turn it off. */ > -cpu_throttle_stop(); > +colo_auto_converge_enabled(false, &error_abort); > > qemu_mutex_lock_iothread(); > switch (s->state) { > -- > 1.8.3.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
RE: [PATCH 03/10] Optimize the function of filter_send
OK, you are right, I will change it in V2. Thanks, Lei. -Original Message- From: Lukas Straub Sent: Thursday, January 21, 2021 3:21 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH 03/10] Optimize the function of filter_send On Wed, 13 Jan 2021 10:46:28 +0800 leirao wrote: > From: "Rao, Lei" > > The iov_size has been calculated in filter_send(). we can directly > return the size.In this way, this is no need to repeat calculations in > filter_redirector_receive_iov(); > > Signed-off-by: Lei Rao > --- > net/filter-mirror.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/net/filter-mirror.c b/net/filter-mirror.c index > f8e6500..7fa2eb3 100644 > --- a/net/filter-mirror.c > +++ b/net/filter-mirror.c > @@ -88,7 +88,7 @@ static int filter_send(MirrorState *s, > goto err; > } > > -return 0; > +return size; > > err: > return ret < 0 ? ret : -EIO; > @@ -159,7 +159,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState > *nf, > int ret; > > ret = filter_send(s, iov, iovcnt); > -if (ret) { > +if (ret <= 0) { > error_report("filter mirror send failed(%s)", strerror(-ret)); > } 0 is a valid return value if the data to send has size = 0. > @@ -182,10 +182,10 @@ static ssize_t > filter_redirector_receive_iov(NetFilterState *nf, > > if (qemu_chr_fe_backend_connected(&s->chr_out)) { > ret = filter_send(s, iov, iovcnt); > -if (ret) { > +if (ret <= 0) { > error_report("filter redirector send failed(%s)", > strerror(-ret)); > } dito > -return iov_size(iov, iovcnt); > +return ret; > } else { > return 0; > } --
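Taking the comment into account, the V2 shape would presumably treat only negative values as errors, since a zero-length send is legitimate; for the mirror filter, something like:

static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
                                         NetClientState *sender,
                                         unsigned flags,
                                         const struct iovec *iov,
                                         int iovcnt, NetPacketSent *sent_cb)
{
    MirrorState *s = FILTER_MIRROR(nf);
    int ret;

    ret = filter_send(s, iov, iovcnt);
    if (ret < 0) {
        error_report("filter mirror send failed(%s)", strerror(-ret));
    }

    /* Never interrupt the normal packet path because of a mirror error. */
    return 0;
}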
RE: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint
The Primary VM can be shut down when it is in COLO state, which may trigger this bug. About 'shutdown' -> 'colo' -> 'running', I think you are right, I did have the problems you said. For 'shutdown'->'colo', The fixed patch(5647051f432b7c9b57525470b0a79a31339062d2) have been merged. Recently, I found another bug as follows in the test. qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running' Aborted (core dumped) The gdb bt as following: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x7faa3d613859 in __GI_abort () at abort.c:79 #2 0x55c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at vl.c:723 #3 0x55c5a1f8cae4 in vm_prepare_start () at /home/workspace/colo-qemu/cpus.c:2206 #4 0x55c5a1f8cb1b in vm_start () at /home/workspace/colo-qemu/cpus.c:2213 #5 0x55c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) at migration/migration.c:3376 #6 0x55c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at migration/migration.c:3527 #7 0x55c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at util/qemu-thread-posix.c:519 #8 0x7faa3d7e9609 in start_thread (arg=) at pthread_create.c:477 #9 0x7faa3d710293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 For the bug, I made the following changes: @@ -3379,7 +3379,9 @@ static void migration_iteration_finish(MigrationState *s) case MIGRATION_STATUS_CANCELLED: case MIGRATION_STATUS_CANCELLING: if (s->vm_was_running) { -vm_start(); +if (!runstate_check(RUN_STATE_SHUTDOWN)) { +vm_start(); +} } else { if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { runstate_set(RUN_STATE_POSTMIGRATE); I will send the patch to community after more test. Thanks, Lei. -Original Message- From: Lukas Straub Sent: Thursday, January 21, 2021 3:13 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; qemu-devel@nongnu.org Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint On Wed, 13 Jan 2021 10:46:27 +0800 leirao wrote: > From: "Rao, Lei" > > This patch fixes the following: > qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown' > Aborted (core dumped) > > Signed-off-by: Lei Rao I wonder how that is possible, since the VM is stopped during 'colo' state. Unrelated to this patch, I think this area needs some work since the following unintended runstate transition is possible: 'shutdown' -> 'colo' -> 'running'. > --- > softmmu/runstate.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/softmmu/runstate.c b/softmmu/runstate.c index > 636aab0..455ad0d 100644 > --- a/softmmu/runstate.c > +++ b/softmmu/runstate.c > @@ -125,6 +125,7 @@ static const RunStateTransition > runstate_transitions_def[] = { > { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH }, > > { RUN_STATE_COLO, RUN_STATE_RUNNING }, > +{ RUN_STATE_COLO, RUN_STATE_SHUTDOWN}, > > { RUN_STATE_RUNNING, RUN_STATE_DEBUG }, > { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR }, --
RE: [PATCH 4/7] colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff
It's been a long time since this bug, I will reproduce it to get the GDB stack, but it may take some time. Thanks, Lei. -Original Message- From: Lukas Straub Sent: Sunday, July 4, 2021 4:36 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; like.xu.li...@gmail.com; qemu-devel@nongnu.org; Like Xu Subject: Re: [PATCH 4/7] colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff On Thu, 17 Jun 2021 10:47:12 +0800 Lei Rao wrote: > From: "Rao, Lei" > > When a PVM completed its SVM failover steps and begins to run in the > simplex mode, QEMU would encounter a 'Segmentation fault' if the guest > poweroff with the following calltrace: > > Program received signal SIGSEGV, Segmentation fault. > > This is because primary_vm_do_failover() would call > "qemu_file_shutdown (s->rp_state.from_dst_file);" and later the > migration_shutdown() would do it again. So, we should set the > s->rp_state.from_dst_file to NULL. Hello, Please provide a backtrace to such bugfixes. It helps the reviewers to better understand the bug and the fix and it helps yourself to check if it is actually correct. I suggest you to enable coredumps in your test (or even production) system, so for every crash you definitely have a backtrace. > Signed-off-by: Like Xu > Signed-off-by: Lei Rao > --- > migration/colo.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/migration/colo.c b/migration/colo.c index > 616dc00..c25e488 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -156,14 +156,15 @@ static void primary_vm_do_failover(void) > > /* > * Wake up COLO thread which may blocked in recv() or send(), > - * The s->rp_state.from_dst_file and s->to_dst_file may use the > - * same fd, but we still shutdown the fd for twice, it is harmless. > + * The s->to_dst_file may use the same fd, but we still shutdown > + * the fd for twice, it is harmless. > */ This change to the comment is incorrect. Shutting down a file multiple times is safe and not a bug in it self. If it leads to a crash anyway, that points to a bug in the qemu_file_shutdown() function or similar. > if (s->to_dst_file) { > qemu_file_shutdown(s->to_dst_file); > } > if (s->rp_state.from_dst_file) { > qemu_file_shutdown(s->rp_state.from_dst_file); > +s->rp_state.from_dst_file = NULL; > } This is incorrect, as the file won't be closed after this and is leaked. And indeed, when applying the patch to master, qemu crashes when triggering failover on the primary, due to the leak: qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. > old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, --
RE: [PATCH 4/7] colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff
I can reproduce this bug successfully and the GDB stack is as follows: Program terminated with signal SIGSEGV, Segmentation fault. #0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 832 if (type->class->interfaces && [Current thread is 1 (Thread 0x7f756e97eb00 (LWP 1811577))] (gdb) bt #0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832 #1 0x55c8f2c3dd14 in object_dynamic_cast (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:763 #2 0x55c8f2c3ddce in object_dynamic_cast_assert (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel", file=0x55c8f2f73780 "migration/qemu-file-channel.c", line=117, func=0x55c8f2f73800 <__func__.18724> "channel_shutdown") at qom/object.c:786 #3 0x55c8f2bbc6ac in channel_shutdown (opaque=0x55c8f543ac00, rd=true, wr=true, errp=0x0) at migration/qemu-file-channel.c:117 #4 0x55c8f2bba56e in qemu_file_shutdown (f=0x7f7558070f50) at migration/qemu-file.c:67 #5 0x55c8f2ba5373 in migrate_fd_cancel (s=0x55c8f4ccf3f0) at migration/migration.c:1699 #6 0x55c8f2ba1992 in migration_shutdown () at migration/migration.c:187 #7 0x55c8f29a5b77 in main (argc=69, argv=0x7fff3e9e8c08, envp=0x7fff3e9e8e38) at vl.c:4512 >From the call trace we can see that the reason for core dump is due to >QIOChannel *ioc = QIO_CHANNEL(opaque) in the channel_shutdown(); After analysis, I think what you said is right, Shutting down a file multiple times is safe and not a bug in it self. The reason for the segmentation fault Is that after doing failover, the COLO thread will qemu_fclose(s->rp_state.from_dst_file) in colo_process_checkpoint(); if we power off the guest at the same time. This will cause qemu_file_shutdown after qemu_close(from_dst_file). As a result, the qemu will crash. So, I think the better way is as follows: /* * Must be called after failover BH is completed, * Or the failover BH may shutdown the wrong fd that * re-used by other threads after we release here. */ if (s->rp_state.from_dst_file) { qemu_fclose(s->rp_state.from_dst_file); + s->rp_state.from_dst_file = NULL; } After qemu_close(s->rp_state.from_dst_file), we set the from_dst_file = NULL; Ways to reproduce bugs: You can kill SVM, after executing the failover QMP command, immediately execute the power off command in the guest, which will have a high probability to reproduce this bug. Thanks, Lei. -Original Message- From: Rao, Lei Sent: Monday, July 5, 2021 10:54 AM To: Lukas Straub Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; like.xu.li...@gmail.com; qemu-devel@nongnu.org; Like Xu Subject: RE: [PATCH 4/7] colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff It's been a long time since this bug, I will reproduce it to get the GDB stack, but it may take some time. Thanks, Lei. 
-Original Message- From: Lukas Straub Sent: Sunday, July 4, 2021 4:36 AM To: Rao, Lei Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com; jasow...@redhat.com; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com; like.xu.li...@gmail.com; qemu-devel@nongnu.org; Like Xu Subject: Re: [PATCH 4/7] colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff On Thu, 17 Jun 2021 10:47:12 +0800 Lei Rao wrote: > From: "Rao, Lei" > > When a PVM completed its SVM failover steps and begins to run in the > simplex mode, QEMU would encounter a 'Segmentation fault' if the guest > poweroff with the following calltrace: > > Program received signal SIGSEGV, Segmentation fault. > > This is because primary_vm_do_failover() would call > "qemu_file_shutdown (s->rp_state.from_dst_file);" and later the > migration_shutdown() would do it again. So, we should set the > s->rp_state.from_dst_file to NULL. Hello, Please provide a backtrace to such bugfixes. It helps the reviewers to better understand the bug and the fix and it helps yourself to check if it is actually correct. I suggest you to enable coredumps in your test (or even production) system, so for every crash you definitely have a backtrace. > Signed-off-by: Like Xu > Signed-off-by: Lei Rao > --- > migration/colo.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/migration/colo.c b/migration/colo.c index > 616dc00..c25e488 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -156,14 +156,15 @@ static void primary_vm_do_failover(void) > > /* > * Wake up COLO thread which may blocked in recv() or send(), >
RE: [PATCH v1 1/1] COLO: only flush dirty ram pages from colo cache
Hi, Derek and Chen ram_bulk_stage is false by default before Hailiang's patch. For COLO, it does not seem to be used, so I think there is no need to reset it to true. Thanks, Lei. From: Derek Su Sent: Tuesday, September 22, 2020 11:48 AM To: Zhang, Chen Cc: qemu-devel ; Rao, Lei ; zhang.zhanghaili...@huawei.com; quint...@redhat.com; dgilb...@redhat.com Subject: Re: [PATCH v1 1/1] COLO: only flush dirty ram pages from colo cache Hi, Chen Sure. BTW, I just went through Lei's patch. ram_bulk_stage() might need to reset to true after stopping COLO service as my patch. How about your opinion? Thanks. Best regards, Derek Zhang, Chen <mailto:chen.zh...@intel.com> 於 2020年9月22日 週二 上午11:41寫道: Hi Derek and Lei, It looks same with Lei’ patch: [PATCH 2/3] Reduce the time of checkpoint for COLO Can you discuss to merge it into one patch? Thanks Zhang Chen From: Derek Su <mailto:dere...@qnap.com> Sent: Tuesday, September 22, 2020 11:31 AM To: qemu-devel <mailto:qemu-devel@nongnu.org> Cc: mailto:zhang.zhanghaili...@huawei.com; mailto:quint...@redhat.com; mailto:dgilb...@redhat.com; Zhang, Chen <mailto:chen.zh...@intel.com> Subject: Re: [PATCH v1 1/1] COLO: only flush dirty ram pages from colo cache Hello, all Ping... Regards, Derek Su Derek Su <mailto:dere...@qnap.com> 於 2020年9月10日 週四 下午6:47寫道: In secondary side, the colo_flush_ram_cache() calls migration_bitmap_find_dirty() to finding the dirty pages and flush them to host. But ram_state's ram_bulk_stage flag is always enabled in secondary side, it leads to the whole ram pages copy instead of only dirty pages. Here, the ram_bulk_stage in secondary side is disabled in the preparation of COLO incoming process to avoid the whole dirty ram pages flush. Signed-off-by: Derek Su <mailto:dere...@qnap.com> --- migration/colo.c | 6 +- migration/ram.c | 10 ++ migration/ram.h | 3 +++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/migration/colo.c b/migration/colo.c index ea7d1e9d4e..6e644db306 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -844,6 +844,8 @@ void *colo_process_incoming_thread(void *opaque) return NULL; } + colo_disable_ram_bulk_stage(); + failover_init_state(); mis->to_src_file = qemu_file_get_return_path(mis->from_src_file); @@ -873,7 +875,7 @@ void *colo_process_incoming_thread(void *opaque) goto out; } #else - abort(); + abort(); #endif vm_start(); trace_colo_vm_state_change("stop", "run"); @@ -924,6 +926,8 @@ out: qemu_fclose(fb); } + colo_enable_ram_bulk_stage(); + /* Hope this not to be too long to loop here */ qemu_sem_wait(&mis->colo_incoming_sem); qemu_sem_destroy(&mis->colo_incoming_sem); diff --git a/migration/ram.c b/migration/ram.c index 76d4fee5d5..65e9b12058 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3357,6 +3357,16 @@ static bool postcopy_is_running(void) return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END; } +void colo_enable_ram_bulk_stage(void) +{ + ram_state->ram_bulk_stage = true; +} + +void colo_disable_ram_bulk_stage(void) +{ + ram_state->ram_bulk_stage = false; +} + /* * Flush content of RAM cache into SVM's memory. * Only flush the pages that be dirtied by PVM or SVM or both. diff --git a/migration/ram.h b/migration/ram.h index 2eeaacfa13..c1c0ebbe0f 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -69,4 +69,7 @@ void colo_flush_ram_cache(void); void colo_release_ram_cache(void); void colo_incoming_start_dirty_log(void); +void colo_enable_ram_bulk_stage(void); +void colo_disable_ram_bulk_stage(void); + #endif -- 2.25.1
RE: [PATCH 2/3] Reduce the time of checkpoint for COLO
Got it. It looks more reasonable. Will be changed in V2. Thanks, Lei. -Original Message- From: Li Zhijian Sent: Tuesday, September 22, 2020 2:58 PM To: Rao, Lei ; Zhang, Chen ; jasow...@redhat.com; quint...@redhat.com; dgilb...@redhat.com; pbonz...@redhat.com Cc: qemu-devel@nongnu.org Subject: Re: [PATCH 2/3] Reduce the time of checkpoint for COLO On 9/19/20 11:10 AM, leirao wrote: > we should set ram_bulk_stage to false after ram_state_init, otherwise > the bitmap will be unused in migration_bitmap_find_dirty. > all pages in ram cache will be flushed to the ram of secondary guest > for each checkpoint. > > Signed-off-by: leirao > --- > migration/ram.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/migration/ram.c b/migration/ram.c index 76d4fee..6a2b6c1 > 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -3019,6 +3019,17 @@ static void > decompress_data_with_multi_threads(QEMUFile *f, > } > > /* > + * we must set ram_bulk_stage to fasle, otherwise in > + * migation_bitmap_find_dirty the bitmap will be unused and > + * all the pages in ram cache wil be flushed to the ram of > + * secondary VM. > + */ > +static void colo_set_ram_state(RAMState *rsp) this function name is too general, how about colo_init_ram_state(ram_state) { ram_state_init(&ram_state); ram_state->ram_bulk_stage = false; } Thanks Zhijian > +{ > +rsp->ram_bulk_stage = false; > +} > + > +/* >* colo cache: this is for secondary VM, we cache the whole >* memory of the secondary VM, it is need to hold the global lock >* to call this helper. > @@ -3062,6 +3073,7 @@ int colo_init_ram_cache(void) > } > > ram_state_init(&ram_state); > +colo_set_ram_state(ram_state); > return 0; > } >
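Putting the naming suggestion together, the V2 hunk would presumably end up looking like this (assumed shape):

static void colo_init_ram_state(void)
{
    ram_state_init(&ram_state);
    /*
     * ram_bulk_stage must be false, otherwise migration_bitmap_find_dirty()
     * ignores the bitmap and every checkpoint flushes the whole ram cache
     * to the secondary VM instead of only the dirty pages.
     */
    ram_state->ram_bulk_stage = false;
}

with colo_init_ram_cache() calling colo_init_ram_state() in place of the bare ram_state_init(&ram_state).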