Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB

Yoshiaki Tamura Fri, 21 Jan 2011 06:22:15 -0800

2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
> On 21 Jan 2011, at 14:59, Yoshiaki Tamura wrote:
>
>> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>>> On 21 Jan 2011, at 13:36, Yoshiaki Tamura wrote:
>>>
>>>> 2011/1/21 Kevin Wolf <kw...@redhat.com>:
>>>>> On 21.01.2011 13:15, Yoshiaki Tamura wrote:
>>>>>> 2011/1/21 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>> On 20 Jan 2011, at 17:18, Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp> wrote:
>>>>>>>
>>>>>>>> 2011/1/20 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>>>> On 20 Jan 2011, at 03:06, Yoshiaki Tamura wrote:
>>>>>>>>>
>>>>>>>>>> 2011/1/19 Pierre Riteau <pierre.rit...@irisa.fr>:
>>>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>>>>>>>>>>> value of bdrv_write and aborts migration when it fails. However, if the
>>>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>>>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>>>>>>>>>>>
>>>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last block.
>>>>>>>>>>> ---
>>>>>>>>>>>  block-migration.c |   16 +++++++++++++++-
>>>>>>>>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/block-migration.c b/block-migration.c
>>>>>>>>>>> index 1475325..eeb9c62 100644
>>>>>>>>>>> --- a/block-migration.c
>>>>>>>>>>> +++ b/block-migration.c
>>>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>     int64_t addr;
>>>>>>>>>>>     BlockDriverState *bs;
>>>>>>>>>>>     uint8_t *buf;
>>>>>>>>>>> +    int64_t total_sectors;
>>>>>>>>>>> +    int nr_sectors;
>>>>>>>>>>>
>>>>>>>>>>>     do {
>>>>>>>>>>>         addr = qemu_get_be64(f);
>>>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>                 return -EINVAL;
>>>>>>>>>>>             }
>>>>>>>>>>>
>>>>>>>>>>> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>>>>>>>>>> +            if (total_sectors <= 0) {
>>>>>>>>>>> +                fprintf(stderr, "Error getting length of block device %s\n", device_name);
>>>>>>>>>>> +                return -EINVAL;
>>>>>>>>>>> +            }
>>>>>>>>>>> +
>>>>>>>>>>> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>>>>>>>>>> +                nr_sectors = total_sectors - addr;
>>>>>>>>>>> +            } else {
>>>>>>>>>>> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>>>>>>>>> +            }
>>>>>>>>>>> +
>>>>>>>>>>>             buf = qemu_malloc(BLOCK_SIZE);
>>>>>>>>>>>
>>>>>>>>>>>             qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>>>>>>>>> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>>>>>>>>>>> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>>>>>>>>>>>
>>>>>>>>>>>             qemu_free(buf);
>>>>>>>>>>>             if (ret < 0) {
>>>>>>>>>>> --
>>>>>>>>>>> 1.7.3.5
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Pierre,
>>>>>>>>>>
>>>>>>>>>> I don't think the fix above is correct.  If you have a file which
>>>>>>>>>> isn't aligned with BLOCK_SIZE, you won't get an error with the
>>>>>>>>>> patch.  However, the receiver doesn't know how many sectors the
>>>>>>>>>> sender wants written, so the guest may fail after migration
>>>>>>>>>> because some data may not be written.  IIUC, although changing
>>>>>>>>>> the bytestream should be avoided as much as possible, we should
>>>>>>>>>> save/load total_sectors to check that an appropriately sized file
>>>>>>>>>> is allocated on the receiver side.
>>>>>>>>>
>>>>>>>>> Isn't the guest supposed to be started using a file with the correct size?
>>>>>>>>
>>>>>>>> I personally don't like that; it demands too much of the user.
>>>>>>>> Can't we expand the image on the fly?  We can just abort if expanding
>>>>>>>> fails anyway.
>>>>>>>
>>>>>>> At first I thought your expansion idea was best, but now I think
>>>>>>> there are valid scenarios where it fails.
>>>>>>>
>>>>>>> Imagine both sides are not using a file but a disk partition as
>>>>>>> storage. If the partition size is not rounded to 1 MB, the last
>>>>>>> write will fail with the current code, and there is no way we can
>>>>>>> expand the partition.
>>>>>>>
>>>>>>
>>>>>> Right.  But in the case of a partition, doesn't the check in the patch
>>>>>> below return an error?  Does bdrv_getlength return the size correctly?
>>>>>
>>>>> I'm pretty sure that it does. We would have problems in other places if
>>>>> it didn't (e.g. we're checking if I/O requests are within the disk size).
>>>>
>>>> Sorry for the noise.  I just learned it returns the value of lseek
>>>> in the case of raw-posix.
>>>
>>>
>>> And it does an ioctl call on platforms other than Linux.
>>
>> Thanks.  Just a quick question regarding total_sectors.
>> BlockDriverState seems to contain total_sectors.  Can we avoid
>> calling bdrv_getlength() if bs->total_sectors is already there?
>
> From a comment in bdrv_getlength():
>
> Fixed size devices use the total_sectors value for speed instead of
> issuing a length query (like lseek) on each call.  Also, legacy block
> drivers don't provide a bdrv_getlength function and must use
> total_sectors.
>
> So using bdrv_getlength will protect against devices being resized during
> migration, but as far as I can see, the sender side doesn't support it: the
> value of total_sectors is cached for the whole block migration.


Even if the sender supports it, as long as total_sectors isn't
sent to the receiver, can the receiver follow the resize?
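
For reference, the clamping that the patch quoted above performs on the
receiver can be sketched in isolation.  This is only a minimal standalone
sketch, not the code in block-migration.c: the helper name
chunk_nr_sectors is hypothetical, and the constant values (512-byte
sectors, 1 MB BLOCK_SIZE) are assumptions restated here so that the
snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, restated so the sketch is self-contained. */
#define BDRV_SECTOR_BITS 9                      /* 512-byte sectors */
#define BLOCK_SIZE (1 << 20)                    /* 1 MB migration chunk */
#define BDRV_SECTORS_PER_DIRTY_CHUNK (BLOCK_SIZE >> BDRV_SECTOR_BITS)

/*
 * Hypothetical helper: how many sectors to write for the chunk that
 * starts at sector 'addr' on a device of 'total_sectors' sectors.
 * Returns -1 if the device length is invalid or 'addr' is past the end.
 */
static int chunk_nr_sectors(int64_t total_sectors, int64_t addr)
{
    int64_t remaining;

    /* A negative sector address would be a caller bug. */
    assert(addr >= 0);

    if (total_sectors <= 0 || addr >= total_sectors) {
        return -1;
    }

    remaining = total_sectors - addr;
    if (remaining < BDRV_SECTORS_PER_DIRTY_CHUNK) {
        /* Last, partial chunk: write only what fits on the device. */
        return (int)remaining;
    }
    return BDRV_SECTORS_PER_DIRTY_CHUNK;
}
```

With these assumed values, a device of 3072 sectors (1.5 MB) gets 1024
sectors for the chunk starting at sector 2048 instead of a full 2048.
As in the patch, the receiver would still read a full BLOCK_SIZE from
the stream; only the write is shortened.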

Yoshi

>
> --
> Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France
> http://perso.univ-rennes1.fr/pierre.riteau/
>
>
>
