2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
> On 21 Jan 2011, at 14:59, Yoshiaki Tamura wrote:
>
>> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>>> On 21 Jan 2011, at 13:36, Yoshiaki Tamura wrote:
>>>
>>>> 2011/1/21 Kevin Wolf <kw...@redhat.com>:
>>>>> On 21.01.2011 13:15, Yoshiaki Tamura wrote:
>>>>>> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>> On 20 Jan 2011, at 17:18, Yoshiaki Tamura
>>>>>>> <tamura.yoshi...@lab.ntt.co.jp> wrote:
>>>>>>>
>>>>>>>> 2011/1/20 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>>>> On 20 Jan 2011, at 03:06, Yoshiaki Tamura wrote:
>>>>>>>>>
>>>>>>>>>> 2011/1/19 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>>>>>>>>>>> value of bdrv_write and aborts migration when it fails. However, if the
>>>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>>>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>>>>>>>>>>>
>>>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last block.
>>>>>>>>>>> ---
>>>>>>>>>>>  block-migration.c |   16 +++++++++++++++-
>>>>>>>>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/block-migration.c b/block-migration.c
>>>>>>>>>>> index 1475325..eeb9c62 100644
>>>>>>>>>>> --- a/block-migration.c
>>>>>>>>>>> +++ b/block-migration.c
>>>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>      int64_t addr;
>>>>>>>>>>>      BlockDriverState *bs;
>>>>>>>>>>>      uint8_t *buf;
>>>>>>>>>>> +    int64_t total_sectors;
>>>>>>>>>>> +    int nr_sectors;
>>>>>>>>>>>
>>>>>>>>>>>      do {
>>>>>>>>>>>          addr = qemu_get_be64(f);
>>>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>                  return -EINVAL;
>>>>>>>>>>>              }
>>>>>>>>>>>
>>>>>>>>>>> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>>>>>>>>>> +            if (total_sectors <= 0) {
>>>>>>>>>>> +                fprintf(stderr, "Error getting length of block device %s\n", device_name);
>>>>>>>>>>> +                return -EINVAL;
>>>>>>>>>>> +            }
>>>>>>>>>>> +
>>>>>>>>>>> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>>>>>>>>>> +                nr_sectors = total_sectors - addr;
>>>>>>>>>>> +            } else {
>>>>>>>>>>> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>>>>>>>>> +            }
>>>>>>>>>>> +
>>>>>>>>>>>              buf = qemu_malloc(BLOCK_SIZE);
>>>>>>>>>>>
>>>>>>>>>>>              qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>>>>>>>>> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>>>>>>>>>>> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>>>>>>>>>>>
>>>>>>>>>>>              qemu_free(buf);
>>>>>>>>>>>              if (ret < 0) {
>>>>>>>>>>> --
>>>>>>>>>>> 1.7.3.5
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Pierre,
>>>>>>>>>>
>>>>>>>>>> I don't think the fix above is correct. If you have a file which
>>>>>>>>>> isn't aligned with BLOCK_SIZE, you won't get an error with the
>>>>>>>>>> patch.
>>>>>>>>>> However, the receiver doesn't know how many sectors
>>>>>>>>>> the sender wants written, so the guest may fail after
>>>>>>>>>> migration because some data may not be written. IIUC, although
>>>>>>>>>> changing the bytestream should be avoided as much as possible, we
>>>>>>>>>> should save/load total_sectors to check that an appropriate file is
>>>>>>>>>> allocated on the receiver side.
>>>>>>>>>
>>>>>>>>> Isn't the guest supposed to be started using a file with the correct
>>>>>>>>> size?
>>>>>>>>
>>>>>>>> I personally don't like that; it demands too much of the user.
>>>>>>>> Can't we expand the image on the fly? We can just abort if expanding
>>>>>>>> fails anyway.
>>>>>>>
>>>>>>> At first I thought your expansion idea was best, but now I think there
>>>>>>> are valid scenarios where it fails.
>>>>>>>
>>>>>>> Imagine both sides are not using a file but a disk partition as
>>>>>>> storage. If the partition size is not rounded to 1 MB, the last write
>>>>>>> will fail with the current code, and there is no way we can expand the
>>>>>>> partition.
>>>>>>>
>>>>>>
>>>>>> Right. But in the case of a partition, doesn't the check in the patch
>>>>>> below return an error? Does bdrv_getlength return the size correctly?
>>>>>
>>>>> I'm pretty sure that it does. We would have problems in other places if
>>>>> it didn't (e.g. we're checking if I/O requests are within the disk size).
>>>>
>>>> Sorry for the noise. I just learned it's returning the value of lseek
>>>> in the case of raw-posix.
>>>
>>>
>>> And it does an ioctl call on platforms other than Linux.
>>
>> Thanks. Just a quick question regarding total_sectors.
>> BlockDriverState seems to contain total_sectors. Can we avoid
>> calling bdrv_getlength() if bs->total_sectors is already there?
>
> From a comment in bdrv_getlength():
>
> Fixed size devices use the total_sectors value for speed instead of
> issuing a length query (like lseek) on each call.
> Also, legacy block
> drivers don't provide a bdrv_getlength function and must use
> total_sectors.
>
> So using bdrv_getlength will protect against devices being resized during
> migration, but as far as I can see, the sender side doesn't support it: the
> value of total_sectors is cached for the whole block migration.

Even if the sender supports it, as long as total_sectors isn't sent to the
receiver, can we follow the resize on the receiver?

Yoshi

>
> --
> Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France
> http://perso.univ-rennes1.fr/pierre.riteau/
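
For reference, the last-chunk calculation from the quoted patch can be looked at in isolation. The following is only a minimal standalone sketch, not QEMU code: the helper chunk_sectors and the macros SECTOR_BITS and SECTORS_PER_CHUNK are hypothetical stand-ins, and the values assume 512-byte sectors and the 1 MB BLOCK_SIZE mentioned in the commit message. Unlike the patch, it also rejects a chunk address that lies outside the device, which is an added assumption about what the receiver would want.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 512-byte sectors and a 1 MB chunk, as in the thread. */
#define SECTOR_BITS        9
#define SECTORS_PER_CHUNK  2048   /* (1 << 20) >> SECTOR_BITS */

/*
 * Hypothetical helper: how many sectors to write for the chunk that
 * starts at sector 'addr' on a device of 'total_sectors' sectors.
 * Returns -1 if the device length is invalid or 'addr' is out of range.
 */
int chunk_sectors(int64_t total_sectors, int64_t addr)
{
    if (total_sectors <= 0 || addr < 0 || addr >= total_sectors) {
        return -1;
    }

    int64_t remaining = total_sectors - addr;

    /* Only the final chunk of an unaligned device is shorter than 1 MB. */
    return remaining < SECTORS_PER_CHUNK ? (int)remaining : SECTORS_PER_CHUNK;
}

/* Self-check for a device of 1 MB + 512 bytes, i.e. 2049 sectors. */
void chunk_sectors_selfcheck(void)
{
    assert(chunk_sectors(2049, 0) == SECTORS_PER_CHUNK);
    assert(chunk_sectors(2049, 2048) == 1);
    assert(chunk_sectors(2049, 2049) == -1);
}
```

With a device of 2049 sectors, the first chunk is written in full and the second chunk shrinks to a single sector, which is the case that previously made the last bdrv_write fail. The sketch says nothing about whether the receiver's device has the same size as the sender's, which is the open question in this thread.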