On 04/25/2017 12:55 PM, Peter Lieven wrote:
Am 24.04.2017 um 22:13 schrieb Anton Nefedov:
On 24/04/2017 21:16, Peter Lieven wrote:
Am 24.04.2017 um 18:27 schrieb Anton Nefedov
<anton.nefe...@virtuozzo.com>:
On 04/21/2017 03:37 PM, Peter Lieven wrote:
Am 21.04.2017 um 14:19 schrieb Anton Nefedov:
On 04/21/2017 01:44 PM, Peter Lieven wrote:
Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
On error path (like i/o error in one of the coroutines), it's
required to
- wait for coroutines completion before cleaning the common
structures
- reenter dependent coroutines so they ever finish
Introduced in 2d9187bc65.
Signed-off-by: Anton Nefedov <anton.nefe...@virtuozzo.com>
---
[..]
And what if we error out in the read path? Wouldn't be something
like this easier?
diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState
*s)
main_loop_wait(false);
}
+ /* on error path we need to enter all coroutines that are still
+ * running before cleaning up common structures */
+ if (s->ret) {
+ for (i = 0; i < s->num_coroutines; i++) {
+ if (s->co[i]) {
+ qemu_coroutine_enter(s->co[i]);
+ }
+ }
+ }
+
if (s->compressed && !s->ret) {
/* signal EOF to align */
ret = blk_pwrite_compressed(s->target, 0, NULL, 0);
Peter
seemed a bit too daring to me to re-enter every coroutine
potentially including the ones that yielded waiting for I/O
completion.
If that's ok - that is for sure easier :)
I think we should enter every coroutine that is still running and
have it terminate correctly. It was my mistake that I have not
done this in the original patch. Can you check if the above fixes
your test cases that triggered the bug?
hi, sorry I'm late with the answer
this segfaults in bdrv_close(). Looks like it tries to finish some
i/o which coroutine we have already entered and terminated?
(gdb) run
Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O
[..]
Peter
/Anton
it seems that this is a bit tricky, can you share how your test case
works?
thanks,
peter
how I tested today was basically
diff --git a/qemu-img.c b/qemu-img.c
index 4425aaa..3d2d506 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1788,6 +1788,10 @@ static void coroutine_fn
convert_co_do_copy(void *opaque)
if (status == BLK_DATA) {
ret = convert_co_read(s, sector_num, n, buf);
+ const uint64_t fsector = 10*1024*1024/512;
+ if (sector_num <= fsector && fsector < sector_num+n) {
+ ret = -EIO;
+ }
if (ret < 0) {
error_report("error while reading sector %" PRId64
": %s", sector_num, strerror(-ret));
I ended up with sth similar to your approach. Can you check this?
Thanks,
Peter
diff --git a/qemu-img.c b/qemu-img.c
index b94fc11..ed9ce9e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1761,13 +1761,13 @@ static void coroutine_fn convert_co_do_copy(void
*opaque)
qemu_co_mutex_lock(&s->lock);
if (s->ret != -EINPROGRESS || s->sector_num >= s->total_sectors) {
qemu_co_mutex_unlock(&s->lock);
- goto out;
+ break;
}
n = convert_iteration_sectors(s, s->sector_num);
if (n < 0) {
qemu_co_mutex_unlock(&s->lock);
s->ret = n;
- goto out;
+ break;
}
/* save current sector and allocation status to local variables */
sector_num = s->sector_num;
@@ -1792,7 +1792,6 @@ static void coroutine_fn convert_co_do_copy(void
*opaque)
error_report("error while reading sector %" PRId64
": %s", sector_num, strerror(-ret));
s->ret = ret;
- goto out;
}
} else if (!s->min_sparse && status == BLK_ZERO) {
status = BLK_DATA;
@@ -1801,22 +1800,20 @@ static void coroutine_fn convert_co_do_copy(void
*opaque)
if (s->wr_in_order) {
/* keep writes in order */
- while (s->wr_offs != sector_num) {
- if (s->ret != -EINPROGRESS) {
- goto out;
- }
+ while (s->wr_offs != sector_num && s->ret == -EINPROGRESS) {
s->wait_sector_num[index] = sector_num;
qemu_coroutine_yield();
}
s->wait_sector_num[index] = -1;
}
- ret = convert_co_write(s, sector_num, n, buf, status);
- if (ret < 0) {
- error_report("error while writing sector %" PRId64
- ": %s", sector_num, strerror(-ret));
- s->ret = ret;
- goto out;
+ if (s->ret == -EINPROGRESS) {
+ ret = convert_co_write(s, sector_num, n, buf, status);
+ if (ret < 0) {
+ error_report("error while writing sector %" PRId64
+ ": %s", sector_num, strerror(-ret));
+ s->ret = ret;
+ }
}
if (s->wr_in_order) {
@@ -1837,7 +1834,6 @@ static void coroutine_fn convert_co_do_copy(void
*opaque)
}
}
-out:
qemu_vfree(buf);
s->co[index] = NULL;
s->running_coroutines--;
@@ -1899,7 +1895,7 @@ static int convert_do_copy(ImgConvertState *s)
qemu_coroutine_enter(s->co[i]);
}
- while (s->ret == -EINPROGRESS) {
+ while (s->running_coroutines) {
main_loop_wait(false);
}
This one seems to work; I've been running for several hours qemu-img
convert from nbd which spontaneously fails in ~ every 2nd convert; and
yet qemu-img keeps terminating smoothly
/Anton