On Mon, May 28, 2012 at 08:12:12AM +0900, MATSUDA, Daiki wrote:
> I researched the source of the bug.
>
> qemu-ga calls guest_fsfreeze_build_mount_list from
> qmp_guest_fsfreeze_thaw in qga/commands-posix.c. And it tries to read
> /etc/mtab (= MOUNTED) to get mounted filesystems. But when they are
> frozen, getmntent(fp) is not finished in the situation /etc/mtab in
> frozen filesystem.
>
> I suggest to read the list from not frozen filesystem file or on-memory
> data, e.g. /proc/mounts.

Yikes, this is a scary bug. Thanks for catching this.

I suspect the getmntent() call is causing an access time update to
/etc/mtab, which unfortunately will block while the filesystem is in a
frozen state. RHEL 6 and newer kernels use relatime by default, so the
issue isn't as prevalent there (though an atime update can still occur
if /etc/mtab's atime hasn't been updated since its last mtime update,
or is older than the kernel's 24 hour limit. Unlikely, since
guest-fsfreeze-freeze causes an update and /etc/mtab modifications
would block while frozen, so you'd need to wait that long before
guest-fsfreeze-thaw would trigger it, but still plausible). If you can
reproduce it on RHEL 6 using the "strictatime" mount option for /etc's
filesystem, I think that should confirm it.

Prior to commit 9e8aded432884477bcd4fa1c7e849a196412bcc4, we stored the
mount list created by guest-fsfreeze-freeze, but that behavior was
changed so that qemu-ga could thaw the system regardless of whether or
not it was the same instance that did the freeze. That's why you're
only seeing it in the 1.1 RCs.

Your suggested fix seems reasonable, but I'm having a hard time figuring
out what the differences are between /etc/mtab and /proc/mounts, and
whether they can be safely ignored for our purposes. One issue seems to
be that /proc/mounts doesn't distinguish --bind mounts from real ones,
but that at least is handled gracefully with the newer code (qemu-ga
might freeze the filesystem multiple times, but the thaw implementation
will thaw as many times as it needs to in order to unfreeze it).

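For reference, reading the mount table from /proc/mounts boils down to
something like the rough sketch below. This is not the qemu-ga code:
list_mount_dirs and MOUNT_DIR_MAX are made-up names for illustration,
and skipping entries whose device name doesn't start with '/' is only
my assumption about which entries matter for freezing. The point is
that /proc/mounts is generated by the kernel rather than stored on a
disk filesystem, so reading it should not touch anything frozen.

```c
#include <mntent.h>
#include <stdio.h>

/* Hypothetical limit on the length of a stored mount point path. */
#define MOUNT_DIR_MAX 256

/*
 * Hypothetical helper (not part of qemu-ga): read the mount table at
 * 'table' (e.g. "/proc/mounts") and copy up to 'max_dirs' mount points
 * into 'dirs'. Returns the number of entries stored, or -1 if the
 * table could not be opened.
 */
int list_mount_dirs(const char *table, char dirs[][MOUNT_DIR_MAX],
                    int max_dirs)
{
    struct mntent *ment;
    FILE *fp;
    int n = 0;

    fp = setmntent(table, "r");
    if (!fp) {
        return -1;
    }

    while (n < max_dirs && (ment = getmntent(fp)) != NULL) {
        /*
         * Assumption for this sketch: a device name that is not a path
         * indicates a pseudo filesystem (proc, sysfs, ...), so skip it.
         */
        if (ment->mnt_fsname[0] != '/') {
            continue;
        }
        /* Copy the mount point, truncating if it exceeds the buffer. */
        snprintf(dirs[n], MOUNT_DIR_MAX, "%s", ment->mnt_dir);
        n++;
    }

    endmntent(fp);
    return n;
}
```

Calling list_mount_dirs("/proc/mounts", dirs, 64) with a suitably sized
dirs array would then give the candidate mount points. Note that bind
mounts show up as ordinary entries here, so the same filesystem can
appear more than once.
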
If all seems well I'll send a patch this evening (or feel free to resend
yours with a Signed-off-by for due credit).

>
> Regards
> MATSUDA Daiki
>
> --- qga/commands-posix.c.orig 2012-05-28 08:10:47.842332018 +0900
> +++ qga/commands-posix.c 2012-05-28 08:11:01.598340937 +0900
> @@ -347,7 +347,7 @@ static int guest_fsfreeze_build_mount_li
>  {
>      struct mntent *ment;
>      GuestFsfreezeMount *mount;
> -    char const *mtab = MOUNTED;
> +    char const *mtab = "/proc/mounts";
>      FILE *fp;
>
>      fp = setmntent(mtab, "r");
>
> > I encountered the serious bug on QEMU Guest Agent.
> >
> > environment
> > Guest OS : RHEL 5.8 / 5.7 (i686)
> > Guest Agent Version : qemu-1.1.0rc2 and rc3
> >
> > I am trying to take snapshot via virsh snapshot-create-as command. And
> > to freeze guest's filesystem and take snapshot is succeed. But after
> > sending the thaw command to Guest, time error occurs on libvirt qemu
> > agent because of not catch Guest's answer.
> > In addition, its situation is worst because the Guest Filesystem is kept
> > as frozen.
> >
> > The problem does not occur on RHEL 6.2 Guest OS and in about qemu-1.0 it
> > does not occur.
> >
> > Regards
> > MATSUDA Daiki