On 09/11/2015 17:04, Kevin Wolf wrote: > Am 06.11.2015 um 11:22 hat Fam Zheng geschrieben: >> The "pnum < nb_sectors" condition in deciding whether to actually copy >> data is unnecessarily strict, and the qiov initialization is >> unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard >> branches. >> >> Reorganize mirror_iteration flow so that we: >> >> 1) Find the contiguous zero/discarded sectors with >> bdrv_get_block_status_above() before deciding what to do. We query >> s->buf_size sized blocks at a time. >> >> 2) If the sectors in question are zeroed/discarded and aligned to >> target cluster, issue zero write or discard accordingly. It's done >> in mirror_do_zero_or_discard, where we don't add buffer to qiov. >> >> 3) Otherwise, do the same loop as before in mirror_do_read. >> >> Signed-off-by: Fam Zheng <f...@redhat.com> > > I'm not sure where in the patch to comment on this, so I'll just do it > here right in the beginning. > > I'm concerned that we need to be more careful about races in this patch, > in particular regarding the bitmaps. I think the conditions for the two > bitmaps are: > > * Dirty bitmap: We must clear the bit after finding the next piece of > data to be mirrored, but before we yield after getting information > that we use for the decision which kind of operation we need. > > In other words, we need to clear the dirty bitmap bit before calling > bdrv_get_block_status_above(), because that's both the function that > retrieves information about the next chunk and also a function that > can yield. > > If after this point the data is written to, we need to mirror it > again.
With Fam's patch, that's not trivial for two reasons: 1) bdrv_get_block_status_above() can return a smaller amount than what is asked. 2) the "read and write" case can handle s->granularity sectors per iteration (many of them can be coalesced, but still that's how the iteration works). The simplest solution is to perform the query with s->granularity size rather than s->buf_size. Paolo > * In-flight bitmap: We need to make sure that we never mirror the same > data twice at the same time as older data could overwrite newer data > then. > > Strictly speaking, it looks to me as if this meant we can delay > setting the bit until before we issue an AIO operation. It might be > more obviously correct to set it at the same time as the dirty bitmap > is cleared.