>>My recommendation would be to add that bdrv_invalidate() implementation, 
>>then we can be sure for raw, and get the rest fixed as well. 

There is a bug tracker issue about bdrv_invalidate(), closed 2 years ago:

http://tracker.ceph.com/issues/2467

Can we reopen it?


----- Mail original ----- 

De: "Kevin Wolf" <kw...@redhat.com> 
À: "Josh Durgin" <josh.dur...@inktank.com> 
Cc: "Alexandre DERUMIER" <aderum...@odiso.com>, ceph-users@lists.ceph.com, 
"qemu-devel" <qemu-de...@nongnu.org> 
Envoyé: Mardi 22 Avril 2014 11:08:08 
Objet: Re: [Qemu-devel] [ceph-users] qemu + rbd block driver with 
cache=writeback, is live migration safe ? 

Am 19.04.2014 um 00:33 hat Josh Durgin geschrieben: 
> On 04/18/2014 10:47 AM, Alexandre DERUMIER wrote: 
> >Thanks Kevin for the full explanation! 
> > 
> >>>cache.writeback=on,cache.direct=off,cache.no-flush=off 
> > 
> >I didn't know about the cache options split, thanks. 
> > 
> > 
> >>>rbd does, to my knowledge, not use the kernel page cache, so we're safe 
> >>>from that part. It does however honour the cache.direct flag when it 
> >>>decides whether to use its own cache. rbd doesn't implement 
> >>>bdrv_invalidate_cache() in order to clear that cache when migration 
> >>>completes. 
> > 
> >Maybe some ceph devs could comment about this ? 
> 
> That's correct, librbd uses its own in-memory cache instead of 
> the kernel page cache, and it honors flush requests. Furthermore, 
> librbd keeps its own metadata synchronized among different 
> clients via the ceph cluster (this is information like image 
> size, which rbd snapshots exist, and rbd parent image). 
> 
> So as I understand it live migration with raw format images on 
> rbd is safe even with cache.writeback=true and cache.direct=false 
> (i.e. cache=writeback) because: 
> 
> 1) rbd metadata is synchronized internally 
> 
> 2) the source vm has any rbd caches flushed by vm_stop() before 
> the destination starts 
> 
> 3) rbd does not read anything into its cache before the 
> destination starts 
> 
> 4) raw format images have no extra metadata that needs invalidation 
> 
> If librbd populated its cache when the disk was opened, the rbd driver 
> would need to implement bdrv_invalidate(), but since it does not, it's 
> unnecessary. 
> 
> Is this correct Kevin? 

I'm not sure about 3). The rbd block driver itself may not be reading 
anything into its cache during bdrv_open (though, what about things like 
the image size?), but qemu doesn't guarantee that it doesn't read 
anything from the image before migration completes. 

I think you may indeed be lucky with raw images, though I wouldn't bet 
money on it; but if the cache isn't kept internally coherent by librbd, 
then without a bdrv_invalidate() implementation you're almost certainly 
unsafe with non-raw image formats. 

My recommendation would be to add that bdrv_invalidate() implementation, 
then we can be sure for raw, and get the rest fixed as well. 

Kevin 

> >>>No, such a QMP command doesn't exist, though it would be possible to 
> >>>implement (for toggling cache.direct, that is; cache.writeback is guest 
> >>>visible and can therefore only be toggled by the guest) 
> > 
> >yes, that's what I have in mind, toggling cache.direct=on before migration, 
> >then disable it after the migration. 
> > 
> > 
> > 
> >----- Mail original ----- 
> > 
> >De: "Kevin Wolf" <kw...@redhat.com> 
> >À: "Alexandre DERUMIER" <aderum...@odiso.com> 
> >Cc: "qemu-devel" <qemu-de...@nongnu.org>, ceph-users@lists.ceph.com 
> >Envoyé: Mardi 15 Avril 2014 11:36:22 
> >Objet: Re: [Qemu-devel] qemu + rbd block driver with cache=writeback, is 
> >live migration safe ? 
> > 
> >Am 12.04.2014 um 17:01 hat Alexandre DERUMIER geschrieben: 
> >>Hello, 
> >> 
> >>I know that qemu live migration with disks using cache=writeback is not 
> >>safe with storage like nfs, iscsi... 
> >> 
> >>Is it also true with rbd ? 
> > 
> >First of all, in order to avoid misunderstandings, let's be clear that 
> >there are three dimensions for the cache configuration of qemu block 
> >devices. In current versions, they are separately configurable and 
> >cache=writeback really expands to: 
> > 
> >cache.writeback=on,cache.direct=off,cache.no-flush=off 
> > 
> >The problematic part of this for live migration is generally not 
> >cache.writeback being enabled, but cache.direct being disabled. 
> > 
> >The reason for that is that the destination host will open the image 
> >file immediately, because it needs things like the image size to 
> >correctly initialise the emulated disk devices. Now during the migration 
> >the source keeps working on the image, so if qemu read some metadata on 
> >the destination host, that metadata may be stale by the time that the 
> >migration actually completes. 
> > 
> >In order to solve this problem, qemu calls bdrv_invalidate_cache(), 
> >which throws away everything that is cached in qemu so that it is reread 
> >from the image. However, this is ineffective if there are other caches 
> >having stale data, such as the kernel page cache. cache.direct bypasses 
> >the kernel page cache, so this is why it's important in many cases. 
> > 
> >rbd does, to my knowledge, not use the kernel page cache, so we're safe 
> >from that part. It does however honour the cache.direct flag when it 
> >decides whether to use its own cache. rbd doesn't implement 
> >bdrv_invalidate_cache() in order to clear that cache when migration 
> >completes. 
> > 
> >So the answer to your original question is that it's probably _not_ safe 
> >to use live migration with rbd and cache.direct=off. 
> > 
> >>If yes, it is possible to disable manually writeback online with qmp ? 
> > 
> >No, such a QMP command doesn't exist, though it would be possible to 
> >implement (for toggling cache.direct, that is; cache.writeback is guest 
> >visible and can therefore only be toggled by the guest). 
> > 
> >Kevin 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com