On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti <mtosa...@redhat.com> wrote:
> On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
>> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
>> +                                void *buf, int max_sectors, int *n)
>> +{
>> +    BlockDriverState *bs = s->common.bs;
>> +    int ret;
>> +
>> +    trace_stream_one_iteration(s, sector_num, max_sectors);
>> +
>> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>
> bdrv_is_allocated is still synchronous? If so, there should be at least
> a plan to make it asynchronous.
Yes, that's a good discussion to have. My thoughts are that
bdrv_is_allocated() should be executed in coroutine context. The
semantics are a little tricky because of parallel requests:

1. If a write request is in progress when we call bdrv_is_allocated(),
   we might get back "unallocated" even though clusters are just being
   allocated.

2. If a TRIM request is in progress when we call bdrv_is_allocated(),
   we might get back "allocated" even though clusters are just being
   deallocated.

To be reliable, the caller needs to be aware of parallel requests, and
I think it's correct to defer this problem to the caller. In the case
of image streaming we're not TRIM-safe; I haven't really thought about
that yet. But we are safe against parallel write requests, because
there is serialization to prevent copy-on-read requests from racing
with write requests (see the sketch at the end of this mail).

>> +    if (!ret) {
>> +        ret = stream_populate(bs, sector_num, *n, buf);
>> +    }
>> +    return ret;
>> +}
>> +
>> +static void coroutine_fn stream_run(void *opaque)
>> +{
>> +    StreamBlockJob *s = opaque;
>> +    BlockDriverState *bs = s->common.bs;
>> +    int64_t sector_num, end;
>> +    int ret = 0;
>> +    int n;
>> +    void *buf;
>> +
>> +    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
>> +    s->common.len = bdrv_getlength(bs);
>> +    bdrv_get_geometry(bs, (uint64_t *)&end);
>> +
>> +    bdrv_set_zero_detection(bs, true);
>> +    bdrv_set_copy_on_read(bs, true);
>
> Should distinguish between stream initiated and user initiated setting
> of zero detection and cor (so that unsetting below does not clear
> user settings).

For zero detection I agree. For copy-on-read it is questionable, since
once streaming is complete it no longer makes sense to have
copy-on-read enabled. I will address this in the next revision and
think more about the copy-on-read case.

>> +
>> +    for (sector_num = 0; sector_num < end; sector_num += n) {
>> +        if (block_job_is_cancelled(&s->common)) {
>> +            break;
>> +        }
>
> If cancellation is seen here in the last loop iteration,
> bdrv_change_backing_file below should not be executed.

I documented this case in the QMP API. I'm not sure it's possible to
guarantee that the operation isn't just completing as you cancel it:
any blocking point between completion of the last iteration and
completion of the whole operation is vulnerable to missing the cancel.
It's easier to explicitly say that the operation might just have
completed when you cancelled it than to try to protect the completion
path. Do you think it's a problem to have these loose semantics?

>> +
>> +        /* TODO rate-limit */
>> +        /* Note that even when no rate limit is applied we need to yield with
>> +         * no pending I/O here so that qemu_aio_flush() is able to return.
>> +         */
>> +        co_sleep_ns(rt_clock, 0);
>
> How do you plan to implement rate limit?

It was implemented in the QED-specific image streaming series:

http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476

That implementation works fine and is small, but I'd like to reuse the
migration speed limit if possible. That way we don't have 3 different
rate-limiting implementations in QEMU :).
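Here is a rough sketch of what reusing a migration-style speed limit
could look like. Note this is illustrative only: the RateLimit struct
and ratelimit_calculate_delay() below are names I made up for this
mail, not existing QEMU APIs.

#define SLICE_TIME 100000000ULL /* 100 ms worth of nanoseconds */

typedef struct {
    int64_t next_slice_time;   /* when the current accounting slice ends */
    uint64_t slice_quota;      /* bytes allowed per slice */
    uint64_t dispatched;       /* bytes accounted in the current slice */
} RateLimit;

/* Returns the number of nanoseconds to sleep, or 0 if the caller may
 * continue immediately.
 */
static int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
{
    int64_t now = qemu_get_clock_ns(rt_clock);

    if (limit->next_slice_time < now) {
        /* A new slice has started, reset the byte counter */
        limit->next_slice_time = now + SLICE_TIME;
        limit->dispatched = 0;
    }

    limit->dispatched += n;
    if (limit->dispatched > limit->slice_quota) {
        /* Over quota, sleep until the next slice begins */
        return limit->next_slice_time - now;
    }
    return 0;
}

The loop body in stream_run() would then replace the unconditional
co_sleep_ns(rt_clock, 0) with:

    delay_ns = ratelimit_calculate_delay(&s->limit, n * BDRV_SECTOR_SIZE);
    co_sleep_ns(rt_clock, delay_ns);

When no limit is set, delay_ns stays 0, so we still get the yield that
qemu_aio_flush() depends on.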
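And here is the copy-on-read serialization sketch I promised above.
This is simplified and from memory, so the struct layout and names are
approximate rather than the exact code:

/* Every in-flight request is entered into a list on the
 * BlockDriverState. A copy-on-read request (and likewise a write)
 * first waits until no in-flight request overlaps its sector range,
 * so a CoR read and a guest write to the same clusters never run in
 * parallel.
 */
typedef struct BdrvTrackedRequest {
    int64_t sector_num;
    int nb_sectors;
    bool is_write;
    CoQueue wait_queue;         /* coroutines blocked on this request */
    QLIST_ENTRY(BdrvTrackedRequest) list;
} BdrvTrackedRequest;

static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
        int64_t sector_num, int nb_sectors)
{
    BdrvTrackedRequest *req;
    bool retry;

    do {
        retry = false;
        QLIST_FOREACH(req, &bs->tracked_requests, list) {
            if (req->sector_num < sector_num + nb_sectors &&
                sector_num < req->sector_num + req->nb_sectors) {
                /* Wait for the conflicting request to complete, then
                 * rescan because the list may have changed meanwhile.
                 */
                qemu_co_queue_wait(&req->wait_queue);
                retry = true;
                break;
            }
        }
    } while (retry);
}

The waiting request has not added itself to the list yet at this
point, so it cannot deadlock on itself.

Stefan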