On 27.07.23 18:27, Hanna Czenczek wrote:
On 25.07.23 19:40, Vladimir Sementsov-Ogievskiy wrote:
From: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>

Actually block job is not completed without this final flush. It's
rather unexpected to have broken target when job was successfully
completed long ago and now we fail to flush or process just
crashed/killed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru>
---
  block/backup.c               |  7 +++++--
  block/commit.c               |  2 +-
  block/mirror.c               |  4 ++++
  block/stream.c               |  7 ++++++-
  blockjob.c                   | 18 ++++++++++++++++++
  include/block/blockjob_int.h | 11 +++++++++++
  6 files changed, 45 insertions(+), 4 deletions(-)

Yes, that’s a good change.

[...]

diff --git a/block/stream.c b/block/stream.c
index e522bbdec5..f7e8b35e94 100644
--- a/block/stream.c
+++ b/block/stream.c

[...]

@@ -207,6 +207,11 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
          }
      }
+    ret = block_job_final_target_flush(&s->common, s->target_bs);
+    if (error == 0) {
+        error = ret;
+    }

In all other jobs, this function is invoked only if the job was successful, but 
here it’s called unconditionally.  I don’t mind one way or the other, but I 
think it should be consistent.  (Mainly just because inconsistency makes me 
wonder whether there’s an undocumented reason for it.)

The inconsistency is preexisting: stream job do return error even when there were 
ignored errors. I'm not sure do we really need this logic for stream job, but 
decided to flush anyway. It looks consistent in context of stream_run: it 
continues copy operations even when error is already < 0. So, let's do flush 
too.


+
      /* Do not remove the backing file if an error was there but ignored. */
      return error;
  }
diff --git a/blockjob.c b/blockjob.c
index 25fe8e625d..313e586b0d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -611,3 +611,21 @@ AioContext *block_job_get_aio_context(BlockJob *job)
      GLOBAL_STATE_CODE();
      return job->job.aio_context;
  }
+
+int coroutine_fn
+block_job_final_target_flush(BlockJob *job, BlockDriverState *target_bs)
+{
+    int ret;
+

Should we mark this as IO_CODE()?

I'm not sure, but seems that yes.


+    WITH_GRAPH_RDLOCK_GUARD() {
+        ret = bdrv_co_flush(target_bs);
+    }
+
+    if (ret < 0 && !block_job_is_internal(job)) {
+        qapi_event_send_block_job_error(job->job.id,
+                                        IO_OPERATION_TYPE_WRITE,
+                                        BLOCK_ERROR_ACTION_REPORT);
+    }

Would it make sense to rely on block_job_error_action() instead?  If so, should 
we use the on-target-error setting?

Hmm. To be honest we should. And handle the returned action appropriately. I'll 
resend, that's a good caught.


I have no informed opinion on this, but I think using block_job_error_action() 
does come to mind, so if we consciously decide against it, that’s probably 
worth a comment, too.

Hanna

+
+    return ret;
+}


--
Best regards,
Vladimir


Reply via email to