>>My recommendation would be to add that bdrv_invalidate() implementation,
>>then we can be sure for raw, and get the rest fixed as well.

There is a tracker issue about bdrv_invalidate(), closed 2 years ago:
http://tracker.ceph.com/issues/2467

Can we reopen it?

----- Original message -----
From: "Kevin Wolf" <kw...@redhat.com>
To: "Josh Durgin" <josh.dur...@inktank.com>
Cc: "Alexandre DERUMIER" <aderum...@odiso.com>, ceph-users@lists.ceph.com, "qemu-devel" <qemu-de...@nongnu.org>
Sent: Tuesday, 22 April 2014 11:08:08
Subject: Re: [Qemu-devel] [ceph-users] qemu + rbd block driver with cache=writeback, is live migration safe ?

On 19.04.2014 at 00:33, Josh Durgin wrote:
> On 04/18/2014 10:47 AM, Alexandre DERUMIER wrote:
> >Thanks Kevin for the full explanation!
> >
> >>>cache.writeback=on,cache.direct=off,cache.no-flush=off
> >
> >I didn't know about the cache options split, thanks.
> >
> >
> >>>rbd does, to my knowledge, not use the kernel page cache, so we're safe
> >>>from that part. It does however honour the cache.direct flag when it
> >>>decides whether to use its own cache. rbd doesn't implement
> >>>bdrv_invalidate_cache() in order to clear that cache when migration
> >>>completes.
> >
> >Maybe some ceph devs could comment on this?
>
> That's correct, librbd uses its own in-memory cache instead of
> the kernel page cache, and it honors flush requests. Furthermore,
> librbd keeps its own metadata synchronized among different
> clients via the ceph cluster (this is information like image
> size, which rbd snapshots exist, and rbd parent image).
>
> So as I understand it live migration with raw format images on
> rbd is safe even with cache.writeback=true and cache.direct=false
> (i.e. cache=writeback) because:
>
> 1) rbd metadata is synchronized internally
>
> 2) the source vm has any rbd caches flushed by vm_stop() before
>    the destination starts
>
> 3) rbd does not read anything into its cache before the
>    destination starts
>
> 4) raw format images have no extra metadata that needs invalidation
>
> If librbd populated its cache when the disk was opened, the rbd driver
> would need to implement bdrv_invalidate(), but since it does not, it's
> unnecessary.
>
> Is this correct Kevin?

I'm not sure about 3). The rbd block driver itself may not be reading
anything into its cache during bdrv_open (though, what about things like
the image size?), but qemu doesn't guarantee that it doesn't read
anything from the image before migration completes.

I think you may indeed be lucky for raw images, even though I wouldn't
bet money on it, but if your cache isn't internally kept coherent by
librbd, without a bdrv_invalidate() implementation you're almost
certainly unsafe with non-raw image formats.

My recommendation would be to add that bdrv_invalidate() implementation,
then we can be sure for raw, and get the rest fixed as well.

Kevin

> >>>No, such a QMP command doesn't exist, though it would be possible to
> >>>implement (for toggling cache.direct, that is; cache.writeback is guest
> >>>visible and can therefore only be toggled by the guest)
> >
> >Yes, that's what I have in mind: toggling cache.direct=on before migration,
> >then disabling it after the migration.
> >
> >
> >
> >----- Original message -----
> >
> >From: "Kevin Wolf" <kw...@redhat.com>
> >To: "Alexandre DERUMIER" <aderum...@odiso.com>
> >Cc: "qemu-devel" <qemu-de...@nongnu.org>, ceph-users@lists.ceph.com
> >Sent: Tuesday, 15 April 2014 11:36:22
> >Subject: Re: [Qemu-devel] qemu + rbd block driver with cache=writeback, is
> >live migration safe ?
> >
> >On 12.04.2014 at 17:01, Alexandre DERUMIER wrote:
> >>Hello,
> >>
> >>I know that qemu live migration of disks with cache=writeback is not
> >>safe with storage like nfs, iscsi...
> >>
> >>Is it also true with rbd?
> >
> >First of all, in order to avoid misunderstandings, let's be clear that
> >there are three dimensions for the cache configuration of qemu block
> >devices. In current versions, they are separately configurable and
> >cache=writeback really expands to:
> >
> >cache.writeback=on,cache.direct=off,cache.no-flush=off
> >
> >The problematic part of this for live migration is generally not
> >cache.writeback being enabled, but cache.direct being disabled.
> >
> >The reason for that is that the destination host will open the image
> >file immediately, because it needs things like the image size to
> >correctly initialise the emulated disk devices. Now during the migration
> >the source keeps working on the image, so if qemu read some metadata on
> >the destination host, that metadata may be stale by the time that the
> >migration actually completes.
> >
> >In order to solve this problem, qemu calls bdrv_invalidate_cache(),
> >which throws away everything that is cached in qemu so that it is reread
> >from the image. However, this is ineffective if there are other caches
> >having stale data, such as the kernel page cache. cache.direct bypasses
> >the kernel page cache, so this is why it's important in many cases.
> >
> >rbd does, to my knowledge, not use the kernel page cache, so we're safe
> >from that part. It does however honour the cache.direct flag when it
> >decides whether to use its own cache. rbd doesn't implement
> >bdrv_invalidate_cache() in order to clear that cache when migration
> >completes.
> >
> >So the answer to your original question is that it's probably _not_ safe
> >to use live migration with rbd and cache.direct=off.
> >
> >>If so, is it possible to disable writeback manually online with qmp?
> >
> >No, such a QMP command doesn't exist, though it would be possible to
> >implement (for toggling cache.direct, that is; cache.writeback is guest
> >visible and can therefore only be toggled by the guest).
> >
> >Kevin
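
For readers following the thread, the failure mode Kevin describes can be sketched with a toy model. This is only an illustration under stated assumptions: `SharedImage`, `CachingDriver`, `invalidate_cache()` and `migrate()` are hypothetical names invented here, not QEMU or librbd APIs, and the model assumes the destination's private cache is not kept coherent by the storage layer (whether librbd does so is exactly the open question in the thread).

```python
# Toy model only: none of these names correspond to real QEMU or librbd APIs.


class SharedImage:
    """Stands in for the image on shared storage (e.g. an rbd image)."""

    def __init__(self):
        self.blocks = {}

    def read(self, n):
        return self.blocks.get(n, b"\0")

    def write(self, n, data):
        self.blocks[n] = data


class CachingDriver:
    """Hypothetical block driver with its own write-back cache,
    roughly the cache.direct=off situation discussed above."""

    def __init__(self, image):
        self.image = image
        self.cache = {}     # block number -> cached data
        self.dirty = set()  # block numbers not yet written to the image

    def read(self, n):
        # Serve from the private cache if present, else fill it from the image.
        if n not in self.cache:
            self.cache[n] = self.image.read(n)
        return self.cache[n]

    def write(self, n, data):
        # Write-back: data stays in the cache until flush() is called.
        self.cache[n] = data
        self.dirty.add(n)

    def flush(self):
        # Push all dirty blocks to the shared image.
        for n in sorted(self.dirty):
            self.image.write(n, self.cache[n])
        self.dirty.clear()

    def invalidate_cache(self):
        # Drop everything cached so later reads go back to the image.
        # In this model it is only called on the destination, which has
        # no dirty data at that point.
        self.cache.clear()
        self.dirty.clear()


def migrate(invalidate):
    """Return what the destination reads from block 0 after migration."""
    image = SharedImage()
    image.write(0, b"old")

    source = CachingDriver(image)
    dest = CachingDriver(image)

    dest.read(0)             # destination opens the image early and caches block 0
    source.write(0, b"new")  # source keeps running during migration
    source.flush()           # source caches are flushed when the VM stops

    if invalidate:
        dest.invalidate_cache()  # what a driver-level invalidate hook would do
    return dest.read(0)


if __name__ == "__main__":
    print("without invalidation:", migrate(invalidate=False))
    print("with invalidation:   ", migrate(invalidate=True))
```

In this model the flush on the source is not enough by itself: the destination still returns the block it cached before the source's last write, unless its own cache is dropped when migration completes.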