On 12/19/2015 06:02 PM, Markus Armbruster wrote: > Copying qemu-block because this seems related to generalising block jobs > to background jobs. > > zhanghailiang <zhang.zhanghaili...@huawei.com> writes: > >> If some errors happen during VM's COLO FT stage, it's important to notify >> the users >> of this event. Together with 'colo_lost_heartbeat', users can intervene in >> COLO's >> failover work immediately. >> If users don't want to get involved in COLO's failover verdict, >> it is still necessary to notify users that we exited COLO mode. >> >> Cc: Markus Armbruster <arm...@redhat.com> >> Cc: Michael Roth <mdr...@linux.vnet.ibm.com> >> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> >> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com> >> --- >> v11: >> - Fix several typos found by Eric >> >> Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com> >> --- >> docs/qmp-events.txt | 17 +++++++++++++++++ >> migration/colo.c | 11 +++++++++++ >> qapi-schema.json | 16 ++++++++++++++++ >> qapi/event.json | 17 +++++++++++++++++ >> 4 files changed, 61 insertions(+) >> >> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt >> index d2f1ce4..19f68fc 100644 >> --- a/docs/qmp-events.txt >> +++ b/docs/qmp-events.txt >> @@ -184,6 +184,23 @@ Example: >> Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR >> event. >> >> +COLO_EXIT >> +--------- >> + >> +Emitted when VM finishes COLO mode due to some errors happening or >> +at the request of users. > > How would the event's recipient distinguish between "due to error" and > "at the user's request"? > >> + >> +Data: >> + >> + - "mode": COLO mode, primary or secondary side (json-string) >> + - "reason": the exit reason, internal error or external request. >> (json-string) >> + - "error": error message (json-string, operation) >> + >> +Example: >> + >> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, >> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } } >> + > > Pardon my ignorance again... Does "VM finishes COLO mode" means have > some kind of COLO background job, and it just finished for whatever > reason? > > If yes, this COLO job could be an instance of the general background job > concept we're trying to grow from the existing block job concept. > > I'm not asking you to rebase your work onto the background job > infrastructure, not least for the simple reason that it doesn't exist, > yet. But I think it would be fruitful to compare your COLO job > management QMP interface with the one we have for block jobs. Not only > may that avoid unnecessary inconsistency, it could also help shape the > general background job interface.
COLO is not a block job. If live migration is a background jon, COLO is also a backgroud job. > > Quick overview of the block job QMP interface: > > * Commands to create a job: block-commit, block-stream, drive-mirror, > drive-backup. > > * Get information on jobs: query-block-jobs > > * Pause a job: block-job-pause > > * Resume a job: block-job-resume > > * Cancel a job: block-job-cancel > > * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED > > * Block job error event: BLOCK_JOB_ERROR > > * Block job synchronous completion: event BLOCK_JOB_READY and command > block-job-complete What is background job infrastructure? Do you mean implement all the above interfaces for each background job? Thanks Wen Congyang > >> DEVICE_DELETED >> -------------- >> >> diff --git a/migration/colo.c b/migration/colo.c >> index d1dd4e1..d06c14f 100644 >> --- a/migration/colo.c >> +++ b/migration/colo.c >> @@ -18,6 +18,7 @@ >> #include "qemu/error-report.h" >> #include "qemu/sockets.h" >> #include "migration/failover.h" >> +#include "qapi-event.h" >> >> /* colo buffer */ >> #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) >> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s) >> out: >> if (ret < 0) { >> error_report("%s: %s", __func__, strerror(-ret)); >> + qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR, >> + true, strerror(-ret), NULL); >> + } else { >> + qapi_event_send_colo_exit(COLO_MODE_PRIMARY, >> COLO_EXIT_REASON_REQUEST, >> + false, NULL, NULL); >> } >> >> qsb_free(buffer); >> @@ -516,6 +522,11 @@ out: >> if (ret < 0) { >> error_report("colo incoming thread will exit, detect error: %s", >> strerror(-ret)); >> + qapi_event_send_colo_exit(COLO_MODE_SECONDARY, >> COLO_EXIT_REASON_ERROR, >> + true, strerror(-ret), NULL); >> + } else { >> + qapi_event_send_colo_exit(COLO_MODE_SECONDARY, >> COLO_EXIT_REASON_REQUEST, >> + false, NULL, NULL); >> } >> >> if (fb) { >> diff --git a/qapi-schema.json b/qapi-schema.json >> index feb7d53..f6ecb88 100644 >> --- a/qapi-schema.json >> +++ b/qapi-schema.json >> @@ -778,6 +778,22 @@ >> 'data': [ 'unknown', 'primary', 'secondary'] } >> >> ## >> +# @COLOExitReason >> +# >> +# The reason for a COLO exit >> +# >> +# @unknown: unknown reason > > How can @unknown happen? > >> +# >> +# @request: COLO exit is due to an external request >> +# >> +# @error: COLO exit is due to an internal error >> +# >> +# Since: 2.6 >> +## >> +{ 'enum': 'COLOExitReason', >> + 'data': [ 'unknown', 'request', 'error'] } >> + >> +## >> # @x-colo-lost-heartbeat >> # >> # Tell qemu that heartbeat is lost, request it to do takeover procedures. >> diff --git a/qapi/event.json b/qapi/event.json >> index f0cef01..f63d456 100644 >> --- a/qapi/event.json >> +++ b/qapi/event.json >> @@ -255,6 +255,23 @@ >> 'data': {'status': 'MigrationStatus'}} >> >> ## >> +# @COLO_EXIT >> +# >> +# Emitted when VM finishes COLO mode due to some errors happening or >> +# at the request of users. >> +# >> +# @mode: which COLO mode the VM was in when it exited. > > Can we get 'unknown' here? > >> +# >> +# @reason: describes the reason for the COLO exit. > > Can we get 'unknown' here? > >> +# >> +# @error: #optional, error message. Only present on error happening. >> +# >> +# Since: 2.6 >> +## >> +{ 'event': 'COLO_EXIT', >> + 'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' >> } } >> + >> +## >> # @ACPI_DEVICE_OST >> # >> # Emitted when guest executes ACPI _OST method. > > > > . >