Daniel P. Berrangé <berra...@redhat.com> writes: > On Wed, Oct 25, 2023 at 11:32:00AM -0300, Fabiano Rosas wrote: >> Daniel P. Berrangé <berra...@redhat.com> writes: >> >> > On Tue, Oct 24, 2023 at 04:32:10PM -0300, Fabiano Rosas wrote: >> >> Markus Armbruster <arm...@redhat.com> writes: >> >> >> >> > Fabiano Rosas <faro...@suse.de> writes: >> >> > >> >> >> Add the direct-io migration parameter that tells the migration code to >> >> >> use O_DIRECT when opening the migration stream file whenever possible. >> >> >> >> >> >> This is currently only used for the secondary channels of fixed-ram >> >> >> migration, which can guarantee that writes are page aligned. >> >> >> >> >> >> However the parameter could be made to affect other types of >> >> >> file-based migrations in the future. >> >> >> >> >> >> Signed-off-by: Fabiano Rosas <faro...@suse.de> >> >> > >> >> > When would you want to enable @direct-io, and when would you want to >> >> > leave it disabled? >> >> >> >> That depends on a performance analysis. You'd generally leave it >> >> disabled unless there's some indication that the operating system is >> >> having trouble draining the page cache. >> > >> > That's not the usage model I would suggest. >> > >> >> Hehe I took a shot at answering but I really wanted to say "ask Daniel". >> >> > The biggest value of the page cache comes when it holds data that >> > will be repeatedly accessed. >> > >> > When you are saving/restoring a guest to file, that data is used >> > once only (assuming there's a large gap between save & restore). >> > By using the page cache to save a big guest we essentially purge >> > the page cache of most of its existing data that is likely to be >> > reaccessed, to fill it up with data never to be reaccessed. >> > >> > I usually describe save/restore operations as trashing the page >> > cache. >> > >> > IMHO, mgmt apps should request O_DIRECT always unless they expect >> > the save/restore operation to run in quick succession, or if they >> > know that the host has oodles of free RAM such that existing data >> > in the page cache won't be trashed, or >> >> Thanks, I'll try to incorporate this to some kind of doc in the next >> version. >> >> > if the host FS does not support O_DIRECT of course. >> >> Should we try to probe for this and inform the user? > > qemu_open_internall will already do a nice error message. If it gets > EINVAL when using O_DIRECT, it'll retry without O_DIRECT and if that > works, it'll reoprt "filesystem does not support O_DIRECT" > > Having said that I see a problem with /dev/fdset handling, because > we're only validating O_ACCMODE and that excludes O_DIRECT. > > If the mgmt apps passes an FD with O_DIRECT already set, then it > won't work for VMstate saving which is unaligned. > > If the mgmt app passes an FD without O_DIRECT set, then we are > not setting O_DIRECT for the multifd RAM threads.
Worse, the fds get dup'ed so even without O_DIRECT, we we enable it for the secondary channels the main channel will break on unaligned writes. For now I can only think of requiring two fds. One for the main channel and a second one for the rest of the channels. And validating the fd flags to make sure O_DIRECT is only allowed to be set in the second fd.