On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote: > Peter Xu <pet...@redhat.com> writes: > > > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote: > >> When the migration using the "file:" URI was implemented, I don't > >> think any of us noticed that if you pass in a file name with the > >> format "/dev/fdset/N", this allows a file descriptor to be passed in > >> to QEMU and that behaves just like the "fd:" URI. So the "file:" > >> support has been added without regard for the fdset part and we got > >> some things wrong. > >> > >> The first issue is that we should not truncate the migration file if > >> we're allowing an fd + offset. We need to leave the file contents > >> untouched. > > > > I'm wondering whether we can use fallocate() instead on the ranges so that > > we always don't open() with O_TRUNC. Before that.. could you remind me > > why do we need to truncate in the first place? I definitely missed > > something else here too. > > AFAIK, just to avoid any issues if the file is pre-existing. I don't see > the difference between O_TRUNC and fallocate in this case.
Then, shall we avoid truncations at all, leaving all the feasibility to user (also errors prone to make)? > > > > >> > >> The second issue is that there's an expectation that QEMU removes the > >> fd after the migration has finished. That's what the "fd:" code > >> does. Otherwise a second migration on the same VM could attempt to > >> provide an fdset with the same name and QEMU would reject it. > > > > Let me check what we do when with "fd:" and when migration completes or > > cancels. > > > > IIUC it's qio_channel_file_close() that does the final cleanup work on > > e.g. to_dst_file, right? Then there's qemu_close(), and it has: > > > > /* Close fd that was dup'd from an fdset */ > > fdset_id = monitor_fdset_dup_fd_find(fd); > > if (fdset_id != -1) { > > int ret; > > > > ret = close(fd); > > if (ret == 0) { > > monitor_fdset_dup_fd_remove(fd); > > } > > > > return ret; > > } > > > > Shouldn't this done the work already? > > That removes the mon_fdset_fd_dup->fd, we want to remove the > mon_fdset_fd->fd. What I read so far is when we are removing the dup-fds, we'll do one more thing: monitor_fdset_dup_fd_find_remove(): if (QLIST_EMPTY(&mon_fdset->dup_fds)) { monitor_fdset_cleanup(mon_fdset); } It means if we removed all the dup-fds correctly, we should also remove the whole fdset, which includes the ->fds, IIUC. > > > > > Off topic: I think this code is over complicated too, maybe I missed > > something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we > > simply walk the list and remove stuff.. I attach a patch at the end that I > > tried to clean that up, just in case there's early comments. But we can > > ignore that so we don't get side-tracked, and focus on the direct-io > > issues. > > Well, I'm not confident touching this code. This is more than a decade > old, I have no idea what the original motivations were. The possible > interactions with the user via command-line (-add-fd), QMP (add-fd) and > the monitor lifetime make me confused. Not to mention the fdset part > being plumbed into the guts of a widely used qemu_open_internal() that > very misleadingly presents itself as just a wrapper for open(). If to make QEMU long live, we'll probably need to touch it at some point.. or at least discuss about it and figure things out. We pay tech debts like this when there's no good comment / docs to refer in this case, then the earlier, perhaps also the better.. to try taking the stab, imho. Definitely not a request to clean everything up. :) Let's see whether others can chim in with better knowledge of the history. > > > > > Thanks, > > > > ======= > > > > From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001 > > From: Peter Xu <pet...@redhat.com> > > Date: Fri, 3 May 2024 11:27:20 -0400 > > Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add() > > MIME-Version: 1.0 > > Content-Type: text/plain; charset=UTF-8 > > Content-Transfer-Encoding: 8bit > > > > This function is not needed, one remove function should already work. > > Clean it up. > > > > Here the code doesn't really care about whether we need to keep that dupfd > > around if close() failed: when that happens something got very wrong, > > keeping the dup_fd around the fdsets may not help that situation so far. > > > > Cc: Dr. David Alan Gilbert <d...@treblig.org> > > Cc: Markus Armbruster <arm...@redhat.com> > > Cc: Philippe Mathieu-Daudé <phi...@linaro.org> > > Cc: Paolo Bonzini <pbonz...@redhat.com> > > Cc: Daniel P. Berrangé <berra...@redhat.com> > > Signed-off-by: Peter Xu <pet...@redhat.com> > > --- > > include/monitor/monitor.h | 1 - > > monitor/fds.c | 27 +++++---------------------- > > stubs/fdset.c | 5 ----- > > util/osdep.c | 15 +-------------- > > 4 files changed, 6 insertions(+), 42 deletions(-) > > > > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h > > index 965f5d5450..fd9b3f538c 100644 > > --- a/include/monitor/monitor.h > > +++ b/include/monitor/monitor.h > > @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool > > has_fdset_id, int64_t fdset_id, > > const char *opaque, Error **errp); > > int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags); > > void monitor_fdset_dup_fd_remove(int dup_fd); > > -int64_t monitor_fdset_dup_fd_find(int dup_fd); > > > > void monitor_register_hmp(const char *name, bool info, > > void (*cmd)(Monitor *mon, const QDict *qdict)); > > diff --git a/monitor/fds.c b/monitor/fds.c > > index d86c2c674c..d5aecfb70e 100644 > > --- a/monitor/fds.c > > +++ b/monitor/fds.c > > @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int > > flags) > > #endif > > } > > > > -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove) > > +void monitor_fdset_dup_fd_remove(int dup_fd) > > { > > MonFdset *mon_fdset; > > MonFdsetFd *mon_fdset_fd_dup; > > @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int > > dup_fd, bool remove) > > QLIST_FOREACH(mon_fdset, &mon_fdsets, next) { > > QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) { > > if (mon_fdset_fd_dup->fd == dup_fd) { > > - if (remove) { > > - QLIST_REMOVE(mon_fdset_fd_dup, next); > > - g_free(mon_fdset_fd_dup); > > - if (QLIST_EMPTY(&mon_fdset->dup_fds)) { > > - monitor_fdset_cleanup(mon_fdset); > > - } > > - return -1; > > - } else { > > - return mon_fdset->id; > > + QLIST_REMOVE(mon_fdset_fd_dup, next); > > + g_free(mon_fdset_fd_dup); > > + if (QLIST_EMPTY(&mon_fdset->dup_fds)) { > > + monitor_fdset_cleanup(mon_fdset); > > } > > } > > } > > } > > - > > - return -1; > > -} > > - > > -int64_t monitor_fdset_dup_fd_find(int dup_fd) > > -{ > > - return monitor_fdset_dup_fd_find_remove(dup_fd, false); > > -} > > - > > -void monitor_fdset_dup_fd_remove(int dup_fd) > > -{ > > - monitor_fdset_dup_fd_find_remove(dup_fd, true); > > } > > > > int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp) > > diff --git a/stubs/fdset.c b/stubs/fdset.c > > index d7c39a28ac..389e368a29 100644 > > --- a/stubs/fdset.c > > +++ b/stubs/fdset.c > > @@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags) > > return -1; > > } > > > > -int64_t monitor_fdset_dup_fd_find(int dup_fd) > > -{ > > - return -1; > > -} > > - > > void monitor_fdset_dup_fd_remove(int dupfd) > > { > > } > > diff --git a/util/osdep.c b/util/osdep.c > > index e996c4744a..2d9749d060 100644 > > --- a/util/osdep.c > > +++ b/util/osdep.c > > @@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...) > > > > int qemu_close(int fd) > > { > > - int64_t fdset_id; > > - > > /* Close fd that was dup'd from an fdset */ > > - fdset_id = monitor_fdset_dup_fd_find(fd); > > - if (fdset_id != -1) { > > - int ret; > > - > > - ret = close(fd); > > - if (ret == 0) { > > - monitor_fdset_dup_fd_remove(fd); > > - } > > - > > - return ret; > > - } > > - > > + monitor_fdset_dup_fd_remove(fd); > > return close(fd); > > } > > > > -- > > 2.44.0 > -- Peter Xu