On 21 Jan 2011, at 15:30, Yoshiaki Tamura wrote:
> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>> On 21 Jan 2011, at 15:21, Yoshiaki Tamura wrote:
>>
>>> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>> On 21 Jan 2011, at 14:59, Yoshiaki Tamura wrote:
>>>>
>>>>> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>> On 21 Jan 2011, at 13:36, Yoshiaki Tamura wrote:
>>>>>>
>>>>>>> 2011/1/21 Kevin Wolf <kw...@redhat.com>:
>>>>>>>> On 21.01.2011 at 13:15, Yoshiaki Tamura wrote:
>>>>>>>>> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>>>>> On 20 Jan 2011, at 17:18, Yoshiaki Tamura
>>>>>>>>>> <tamura.yoshi...@lab.ntt.co.jp> wrote:
>>>>>>>>>>
>>>>>>>>>>> 2011/1/20 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>>>>>>> On 20 Jan 2011, at 03:06, Yoshiaki Tamura wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> 2011/1/19 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>>>>>>>>>>>>>> value of bdrv_write and aborts migration when it fails. However, if the
>>>>>>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>>>>>>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last block.
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  block-migration.c |   16 +++++++++++++++-
>>>>>>>>>>>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/block-migration.c b/block-migration.c
>>>>>>>>>>>>>> index 1475325..eeb9c62 100644
>>>>>>>>>>>>>> --- a/block-migration.c
>>>>>>>>>>>>>> +++ b/block-migration.c
>>>>>>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>>>      int64_t addr;
>>>>>>>>>>>>>>      BlockDriverState *bs;
>>>>>>>>>>>>>>      uint8_t *buf;
>>>>>>>>>>>>>> +    int64_t total_sectors;
>>>>>>>>>>>>>> +    int nr_sectors;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      do {
>>>>>>>>>>>>>>          addr = qemu_get_be64(f);
>>>>>>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>>>                  return -EINVAL;
>>>>>>>>>>>>>>              }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>>>>>>>>>>>>> +            if (total_sectors <= 0) {
>>>>>>>>>>>>>> +                fprintf(stderr, "Error getting length of block device %s\n", device_name);
>>>>>>>>>>>>>> +                return -EINVAL;
>>>>>>>>>>>>>> +            }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>>>>>>>>>>>>> +                nr_sectors = total_sectors - addr;
>>>>>>>>>>>>>> +            } else {
>>>>>>>>>>>>>> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>>>>>>>>>>>> +            }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>              buf = qemu_malloc(BLOCK_SIZE);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>              qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>>>>>>>>>>>> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>>>>>>>>>>>>>> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>              qemu_free(buf);
>>>>>>>>>>>>>>              if (ret < 0) {
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 1.7.3.5
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think the fix above is correct.
>>>>>>>>>>>>> If you have a file which isn't aligned with BLOCK_SIZE, you won't
>>>>>>>>>>>>> get an error with the patch. However, the receiver doesn't know how
>>>>>>>>>>>>> many sectors the sender wants written, so the guest may fail after
>>>>>>>>>>>>> migration because some data may not be written. IIUC, although
>>>>>>>>>>>>> changing the bytestream should be avoided as much as possible, we
>>>>>>>>>>>>> should save/load total_sectors to check that an appropriate file is
>>>>>>>>>>>>> allocated on the receiver side.
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't the guest supposed to be started using a file with the
>>>>>>>>>>>> correct size?
>>>>>>>>>>>
>>>>>>>>>>> I personally don't like that; it demands too much of the user.
>>>>>>>>>>> Can't we expand the image on the fly? We can just abort if expanding
>>>>>>>>>>> fails anyway.
>>>>>>>>>>
>>>>>>>>>> At first I thought your expansion idea was best, but now I think
>>>>>>>>>> there are valid scenarios where it fails.
>>>>>>>>>>
>>>>>>>>>> Imagine both sides are not using a file but a disk partition as
>>>>>>>>>> storage. If the partition size is not rounded to 1 MB, the last
>>>>>>>>>> write will fail with the current code, and there is no way we can
>>>>>>>>>> expand the partition.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Right. But in the case of a partition, doesn't the check in the patch
>>>>>>>>> below return an error? Does bdrv_getlength return the size correctly?
>>>>>>>>
>>>>>>>> I'm pretty sure that it does. We would have problems in other places if
>>>>>>>> it didn't (e.g. we're checking if I/O requests are within the disk
>>>>>>>> size).
>>>>>>>
>>>>>>> Sorry for the noise. I just learned it's returning the value of lseek
>>>>>>> in the case of raw-posix.
>>>>>>
>>>>>>
>>>>>> And it does an ioctl call on platforms other than Linux.
>>>>>
>>>>> Thanks. Just a quick question regarding total_sectors.
>>>>> BlockDriverState seems to contain total_sectors.
>>>>> Can we avoid calling bdrv_getlength() if bs->total_sectors is
>>>>> already there?
>>>>
>>>> From a comment in bdrv_getlength():
>>>>
>>>> Fixed size devices use the total_sectors value for speed instead of
>>>> issuing a length query (like lseek) on each call. Also, legacy block
>>>> drivers don't provide a bdrv_getlength function and must use
>>>> total_sectors.
>>>>
>>>> So using bdrv_getlength will protect against devices being resized during
>>>> migration, but as far as I can see, the sender side doesn't support it:
>>>> the value of total_sectors is cached for the whole block migration.
>>>
>>> Even if the sender supports it, as long as total_sectors isn't
>>> sent to the receiver, can we follow the resize on the receiver?
>>
>>
>> I was referring to the complex, and probably unrealistic, scenario where a
>> user allocates a file of the correct size on the receiving side, starts
>> block migration, and during migration grows the size of the disk on both the
>> sender and receiver side.
>
> I thought supporting resize during block migration would be a good
> feature because Kemari is live migrating again and again :)

Then bdrv_getlength would need to be called in the sender loop as well.
But there's one thing I don't know: how does the guest cope with online disk size changes? AFAIK Linux detects the size of the disk at boot.

--
Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France
http://perso.univ-rennes1.fr/pierre.riteau/
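
For readers following the thread, the clamping of the last write discussed above can be sketched in isolation. This is only an illustration, not the QEMU code: chunk_sectors, SECTOR_BITS and SECTORS_PER_CHUNK are hypothetical names, and the values assume 512-byte sectors together with the 1 MB block size mentioned in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 512-byte sectors and a 1 MB migration block. */
#define SECTOR_BITS       9
#define SECTORS_PER_CHUNK 2048   /* 1 MB / 512 bytes */

/*
 * Number of sectors to write for the chunk that starts at sector `addr`
 * on a device of `total_sectors` sectors. Full chunks get
 * SECTORS_PER_CHUNK; the last chunk is clamped to what is left.
 * Returns -1 if the device length is invalid or `addr` is out of range.
 */
int chunk_sectors(int64_t total_sectors, int64_t addr)
{
    if (total_sectors <= 0 || addr < 0 || addr >= total_sectors) {
        return -1;
    }

    int64_t remaining = total_sectors - addr;
    int n = remaining < SECTORS_PER_CHUNK ? (int)remaining
                                          : SECTORS_PER_CHUNK;

    /* Postcondition: never zero, never more than one chunk. */
    assert(n > 0 && n <= SECTORS_PER_CHUNK);
    return n;
}
```

For a device of 6244 sectors (three full 1 MB chunks plus 100 sectors), chunk_sectors(6244, 6144) gives 100, so the final write covers only the sectors that exist rather than running past the end of the device. Unlike the quoted patch, the sketch also rejects an addr at or beyond the end of the device, which is an extra check added here for illustration.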