Paolo Bonzini <pbonz...@redhat.com> writes:

> On 01/07/2014 19:08, Eric Blake wrote:
>> On 06/27/2014 11:24 AM, Markus Armbruster wrote:
>>> Commit bcada37 dropped the (up to now undocumented) members type, len,
>>> offset, speed, breaking tests/qemu-iotests/040 and 041.
>>>
>>> Restore and document them. This fixes 040, and partially fixes 041.
>>>
>>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>>> Tested-By: Benoit Canet <ben...@irqsave.net>
>>> ---
>>> blockjob.c | 6 +++++-
>>> qapi/block-core.json | 15 ++++++++++++++-
>>> 2 files changed, 19 insertions(+), 2 deletions(-)
>>
>> Nothing wrong with this commit, but a design issue that I've recently
>> run into:
>>
>> what happens if management misses the BLOCK_JOB_COMPLETED event? How is
>> it supposed to learn whether the job succeeded or failed?
>> 'query-blockjobs' no longer reports the job (because it is completed),
>> so all information about the job is lost. Normally, we've tried hard to
>> make sure that all information learned from an event can also be polled

Yes. Every time we neglect that, we find out it's a design bug later.

We should review all events for pollability, and add "how to poll"
information to their documentation. Then enforce presence of "how to
poll" information in review.

>> (the ideal is use of events to minimize cpu overhead, but to rely on the
>> poll in situations where events may have been lost such as on a libvirtd
>> restart).
>>
>> Should we enhance job failure to be sticky, in that it not only causes
>> an event, but also remains around so that it can be reported in the next
>> 'query-blockjobs'?
>
> I think this fixes itself automatically if you use
> rerror=stop/werror=stop on block jobs. At least that was part of the
> design, whether the implementation gets it right I cannot say without
> looking at the code more carefully.

What if an underlying device doesn't support [rw]error=stop? Not all
do...
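
To make the "sticky" idea concrete, here is a toy model in Python. It is
only a sketch of the semantics being discussed, not QEMU code: the names
JobRegistry, complete(), query() and dismiss() are made up for
illustration, and nothing here reflects how blockjob.c is structured.
The point is that a concluded job stays visible to polling until the
client explicitly dismisses it, so a lost event loses no information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    device: str
    done: bool = False
    error: Optional[str] = None  # None once done means success


@dataclass
class JobRegistry:
    """Hypothetical model: concluded jobs remain queryable until dismissed."""
    jobs: Dict[str, Job] = field(default_factory=dict)
    events: List[dict] = field(default_factory=list)

    def start(self, device: str) -> None:
        self.jobs[device] = Job(device)

    def complete(self, device: str, error: Optional[str] = None) -> None:
        job = self.jobs[device]
        job.done = True
        job.error = error
        # Emit the event, but do NOT drop the job: this is the sticky part.
        self.events.append(
            {"event": "BLOCK_JOB_COMPLETED", "device": device, "error": error}
        )

    def query(self) -> List[dict]:
        # Polling sees running and concluded jobs alike.
        return [
            {"device": j.device, "done": j.done, "error": j.error}
            for j in self.jobs.values()
        ]

    def dismiss(self, device: str) -> None:
        # The client acknowledges the result; only now is the job forgotten.
        del self.jobs[device]


reg = JobRegistry()
reg.start("drive0")
reg.complete("drive0", error="Input/output error")
reg.events.clear()            # simulate management missing the event
result_after_lost_event = reg.query()
reg.dismiss("drive0")
result_after_dismiss = reg.query()
```

With this model a restarted management process would poll first, pick up
the failure of drive0, and dismiss the job afterwards. The cost is that
somebody has to dismiss, or concluded jobs accumulate.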