On Mon, Jan 04, 2016 at 01:50:40PM +0800, Wen Congyang wrote:
> On 12/23/2015 05:47 PM, Stefan Hajnoczi wrote:
> > On Wed, Dec 02, 2015 at 01:37:25PM +0800, Wen Congyang wrote:
> >> +    /*
> >> +     * Only write to active disk if the sectors have
> >> +     * already been allocated in active disk/hidden disk.
> >> +     */
> >> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> >> +    while (remaining_sectors > 0) {
> >> +        ret = bdrv_is_allocated_above(top, base, sector_num,
> >> +                                      remaining_sectors, &n);
> >
> > There is a race condition here since multiple I/O requests can be in
> > flight at the same time.  If two requests touch the same cluster
> > between top->base then the result of these checks could be unreliable.
>
> I don't think so. When we come here, primary qemu is gone, and failover is
> done. We only write to the active disk if the sectors have already been
> allocated in the active disk/hidden disk before failover. So if two
> requests touch the same cluster, it is OK, because the function
> bdrv_is_allocated_above()'s return value is not changed.
You are right.  I didn't realize that there will be no allocating writes
to the active disk.  There is no race condition.

> >> +        if (ret < 0) {
> >> +            return ret;
> >> +        }
> >> +
> >> +        qemu_iovec_reset(&hd_qiov);
> >> +        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * 512);
> >> +
> >> +        target = ret ? top : base;
> >> +        ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
> >> +        if (ret < 0) {
> >> +            return ret;
> >> +        }
> >> +
> >> +        remaining_sectors -= n;
> >> +        sector_num += n;
> >> +        bytes_done += n * BDRV_SECTOR_SIZE;
> >> +    }
> >
> > I think this can be replaced with an active commit block job that copies
> > data down from the hidden/active disk to the secondary disk.  It is okay
> > to keep writing to the secondary disk while the block job is running and
> > then switch over to the secondary disk once it completes.
>
> Yes, active commit is another choice. IIRC, I didn't use it because the
> mirror job had some problems. They are fixed now (see
> bdrv_drained_begin()/bdrv_drained_end() in the mirror job).
> We will use the mirror job in the next version.

I see, thanks.

> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static coroutine_fn int replication_co_discard(BlockDriverState *bs,
> >> +                                               int64_t sector_num,
> >> +                                               int nb_sectors)
> >> +{
> >> +    BDRVReplicationState *s = bs->opaque;
> >> +    int ret;
> >> +
> >> +    ret = replication_get_io_status(s);
> >> +    if (ret < 0) {
> >> +        return ret;
> >> +    }
> >> +
> >> +    if (ret == 1) {
> >> +        /* It is secondary qemu and we are after failover */
> >> +        ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors);
> >
> > What if the clusters are still allocated in the hidden/active disk?
>
> What does discard do? Drop the data that is allocated in the disk?
> If so, I think I misunderstood. I will fix it in the next version.

If I understand correctly the chain is:

  (base) secondary_disk <- hidden disk <- active disk (top)

Clusters in the hidden disk or active disk will remain if you discard in
secondary_disk.
I think you need to consider how discard works with a backing chain (I
have forgotten the details but you can check block.c and block/qcow2.c
to find out).

Stefan