On Wed, Nov 02, 2011 at 03:43:49PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 1, 2011 at 6:06 PM, Marcelo Tosatti <mtosa...@redhat.com> wrote:
> > On Thu, Oct 27, 2011 at 04:22:50PM +0100, Stefan Hajnoczi wrote:
> >> +static int stream_one_iteration(StreamBlockJob *s, int64_t sector_num,
> >> +                                void *buf, int max_sectors, int *n)
> >> +{
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int ret;
> >> +
> >> +    trace_stream_one_iteration(s, sector_num, max_sectors);
> >> +
> >> +    ret = bdrv_is_allocated(bs, sector_num, max_sectors, n);
> >> +    if (ret < 0) {
> >> +        return ret;
> >> +    }
> >
> > bdrv_is_allocated is still synchronous? If so, there should be at least
> > a plan to make it asynchronous.
>
> Yes, that's a good discussion to have. My thoughts are that
> bdrv_is_allocated() should be executed in coroutine context. The
> semantics are a little tricky because of parallel requests:
>
> 1. If a write request is in progress when we do bdrv_is_allocated() we
> might get back "unallocated" even though clusters are just being
> allocated.
> 2. If a TRIM request is in progress when we do bdrv_is_allocated() we
> might get back "allocated" even though clusters are just being
> deallocated.
>
> In order to be reliable the caller needs to be aware of parallel
> requests. I think it's correct to defer this problem to the caller.
>
> In the case of image streaming we're not TRIM-safe, I haven't really
> thought about it yet. But we are safe against parallel write requests
> because there is serialization to prevent copy-on-read requests from
> racing with write requests.
>
> >> +    if (!ret) {
> >> +        ret = stream_populate(bs, sector_num, *n, buf);
> >> +    }
> >> +    return ret;
> >> +}
> >> +
> >> +static void coroutine_fn stream_run(void *opaque)
> >> +{
> >> +    StreamBlockJob *s = opaque;
> >> +    BlockDriverState *bs = s->common.bs;
> >> +    int64_t sector_num, end;
> >> +    int ret = 0;
> >> +    int n;
> >> +    void *buf;
> >> +
> >> +    buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);
> >> +    s->common.len = bdrv_getlength(bs);
> >> +    bdrv_get_geometry(bs, (uint64_t *)&end);
> >> +
> >> +    bdrv_set_zero_detection(bs, true);
> >> +    bdrv_set_copy_on_read(bs, true);
> >
> > Should distinguish between stream initiated and user initiated setting
> > of zero detection and cor (so that unsetting below does not clear
> > user settings).
>
> For zero detection I agree.
>
> For copy-on-read it is questionable since once streaming is complete
> it does not make sense to have copy-on-read enabled.
>
> I will address this in the next revision and think more about the
> copy-on-read case.
>
> >> +
> >> +    for (sector_num = 0; sector_num < end; sector_num += n) {
> >> +        if (block_job_is_cancelled(&s->common)) {
> >> +            break;
> >> +        }
> >
> > If cancellation is seen here in the last loop iteration,
> > bdrv_change_backing_file below should not be executed.
>
> I documented this case in the QMP API. I'm not sure if it's possible
> to guarantee that the operation isn't just completing as you cancel
> it. Any blocking point between completion of the last iteration and
> completing the operation is vulnerable to missing the cancel. It's
> easier to explicitly say the operation might just have completed when
> you canceled, rather than trying to protect the completion path. Do
> you think it's a problem to have these loose semantics that I
> described?

No, that is ok. I'm referring to bdrv_change_backing_file() being
executed without the entire image having been streamed:
"if (sector_num == end && ret == 0)" covers both the case where all
sectors have been streamed and the case where all sectors except those
of the last loop iteration have been streamed (because the
job-cancelled break was taken).

> >> +
> >> +        /* TODO rate-limit */
> >> +        /* Note that even when no rate limit is applied we need to yield with
> >> +         * no pending I/O here so that qemu_aio_flush() is able to return.
> >> +         */
> >> +        co_sleep_ns(rt_clock, 0);
> >
> > How do you plan to implement rate limit?
>
> It was implemented in the QED-specific image streaming series:
>
> http://repo.or.cz/w/qemu/stefanha.git/commitdiff/22f2c09d2fcfe5e49ac4604fd23e4744f549a476
>
> That implementation works fine and is small but I'd like to reuse the
> migration speed limit, if possible. That way we don't have 3
> different rate-limiting implementations in QEMU :).

One possibility would be to create a "virtual" block device for
streaming, sitting on top of the real block device. Then enforce block
I/O limits on the virtual block device, while the guest continues to
access the real block device.
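
Wherever the limit ends up living, its core can be as simple as
slice-based byte accounting. For illustration only, below is a rough,
self-contained sketch of that idea; StreamRateLimit and
ratelimit_calculate_delay are names invented for this sketch, not an
existing QEMU API, and a real block job would feed the returned delay
to co_sleep_ns() instead of the simulated clock used in main():

/*
 * Sketch of a shared, slice-based rate limit (illustrative only; these
 * names are made up and do not refer to actual QEMU interfaces).
 */
#include <stdint.h>
#include <stdio.h>

#define SLICE_NS 100000000LL    /* account in 100 ms slices */

typedef struct {
    int64_t slice_start_ns;     /* when the current slice began */
    int64_t dispatched;         /* bytes dispatched in the current slice */
    int64_t quota;              /* bytes allowed per slice */
} StreamRateLimit;

/* Return 0 if 'n' bytes may be dispatched now, otherwise the number of
 * nanoseconds to wait before trying again (the end of the slice). */
static int64_t ratelimit_calculate_delay(StreamRateLimit *limit,
                                         int64_t now_ns, int64_t n)
{
    if (now_ns - limit->slice_start_ns >= SLICE_NS) {
        /* A new slice has started: reset the byte accounting. */
        limit->slice_start_ns = now_ns;
        limit->dispatched = 0;
    }
    if (limit->dispatched + n <= limit->quota) {
        limit->dispatched += n;
        return 0;
    }
    return limit->slice_start_ns + SLICE_NS - now_ns;
}

int main(void)
{
    /* Example: 32 MB/s overall limit => 3.2 MB per 100 ms slice. */
    StreamRateLimit limit = { .quota = 32 * 1024 * 1024 / 10 };
    int64_t now = 0;
    int i;

    for (i = 0; i < 8; i++) {
        int64_t delay;

        /* Simulate the passage of time instead of sleeping; a block
         * job would sleep for 'delay' here and then retry. */
        while ((delay = ratelimit_calculate_delay(&limit, now, 1024 * 1024))) {
            now += delay;
        }
        printf("1 MB request %d dispatched at t=%lld ns\n", i,
               (long long)now);
    }
    return 0;
}

Accounting in fixed slices keeps the bookkeeping trivial while still
smoothing bursts down to roughly the configured rate, which is why
both existing speed limits can get away with very little code.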