Re: [Qemu-devel] Question about nonblocking stderr and lost logs

Sam Bobroff Sun, 13 Dec 2015 21:22:42 -0800

On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > Hi QEMU programmers,
> > > > 
> > > > While doing some experimental work on QEMU that has involved adding a 
> > > > lot of
> > > > new log messages (using qemu_log_mask()), I've discovered that under 
> > > > some
> > > > conditions a lot of my log messages go missing.
> > > > 
> > > > I've tracked the issue down to qemu_logfile being left at the default 
> > > > (stderr)
> > > > (so when not using -D) and according to strace what is happening is 
> > > > that the
> > > > debug messages are being passed to write() but write() is returning 
> > > > EWOULDBLOCK
> > > > and the messags are dropped.
> > > > 
> > > > This seems to be happening because stderr is being set non-blocking 
> > > > (which is a
> > > > bit odd to me), and it seems like NONBLOCK is being set as 
> > > > qmp_chardev_add() adds a
> > > > device for stdout (yes stdout, not stderr; perhaps file descriptors 
> > > > have been
> > > > dup'd by that point?).
> > > > 
> > > > Is this by design to prevent a slow console from blocking QEMU? If not, 
> > > > should
> > > > I delve further and try to prevent non-blocking being set on stderr?
> > > > 
> > > > (Unfortunately I don't have a replication for this using an unmodified 
> > > > QEMU but
> > > > I suspect I could find one if necessary.)
> > > 
> > > This sounds like a bug.  stderr should be blocking.
> > > 
> > > But it's specific to your QEMU build or your command-line.  Here are my
> > > results on an x86 host:
> > > 
> > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > >                      -drive if=virtio,cache=none,file=test.img,format=raw
> > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > pos:      0
> > > flags:    0100002
> > > mnt_id:   22
> > > 
> > > The flags are O_RDWR | O_LARGEFILE.
> > > 
> > > O_NONBLOCK would have been 04000 (octal).
> > > 
> > > Please retry with qemu.git/master and post your QEMU command-line.
> > > 
> > > Stefan
> > 
> > Hi Stefan,
> > 
> > Thanks for checking this! I think I've worked out what's happening and it's 
> > a
> > bit complicated ;-)
> > 
> > It appears that the behaviour is not directly related to the command line or
> > build but rather the way the shell sets up the environment.
> > 
> > Here's a test very similar to yours using origin/master QEMU on my Debian
> > stable machine, with bash as my shell:
> > 
> > $ qemu-system-x86_64 -S -vga none -nographic
> > ( In another virtual console: )
> > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > pos:    0
> > flags:  0104002
> > mnt_id: 19
> > 
> > So I seem to have O_NONBLOCK set on stderr.
> > 
> > However, if I perform that test by backgrounding QEMU and using the same
> > console:
> > $ qemu-system-x86_64 -S -vga none -nographic &
> > [1] 4970
> > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > pos:    0
> > flags:  0100002
> > mnt_id: 19
> > 
> > Hmm!
> > 
> > After a bit of reading, it seems like this is caused by the fact that bash,
> > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single 
> > file
> > descriptor. I found this surprising!
> > 
> > A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
> > seems to corroborate it:
> > 
> > #include <stdio.h>
> > #include <fcntl.h>
> > #include <assert.h>
> > 
> > int main() {
> >         int f1, f2;
> > 
> >         f1 = fcntl(1, F_GETFL, 0);
> >         f2 = fcntl(2, F_GETFL, 0);
> >         assert(!(f1 & O_NONBLOCK));
> >         assert(!(f2 & O_NONBLOCK));
> >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> >                 fprintf(stderr, "File descriptors are duplicates.\n");
> >         else
> >                 fprintf(stderr, "File descriptors are separate.\n");
> >         return 0;
> > }
> > 
> > gcc -Wall foo.c
> > $ ./a.out
> > File descriptors are duplicates.
> > $ ./a.out > /dev/null
> > File descriptors are separate.
> > 
> > The nonblocking flag is always (?) being set on stdin and stdout during
> > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> > 
> > So this is probably not going to occur often, and certainly not when running
> > under libvirt or other tools. But if it is actually what is happening, I'm 
> > not
> > sure what to do about it. Any ideas?
> 
> This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> if it actually creates separate file status flags.


Calling open() always creates a new "file description" (which contains file
status flags), so it's certainly a way to approach this.

I've testing doing this if we're going to use stderr as the qemu_logfile:

        dup2(open("/proc/self/fd/2", O_WRONLY), 2)

... in do_qemu_set_log() and it fixes the issue, however...

* Doing the dup2() causes it to fix any other fprintf()s to stderr, but
  I feel bad for changing the fd that's inside the stderr "FILE *" "underneath
  it". Would it be better to do the following...
        qemu_logfile = fopen("/proc/self/fd/2", "a")
  ... even though that will leave other fprintf()s unfixed. (I think that is
  probably OK.)
* We could also unconditionally do the dup2(), maybe early in main().
* I'm not sure which platforms other than modern Linux support
  /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
  nothing.)
* We could print a warning if stderr is nonblocking and we can't work around
  it. This seems like a good idea even if don't fix it for Linux.
* Alternatively, we could take this approach to stdin and stdout as they are
  added to the char driver and set nonblocking. This somehow feels more
  invasive, although I'm not sure why.
* We could also handle it by converting the qemu_log_mask calls from fprintf()
  to sprintf() + write(), as we'd then be able to handle EAGAIN correctly. This
  doesn't feel any cleaner to me.

Cheers,
Sam.

Re: [Qemu-devel] Question about nonblocking stderr and lost logs

Reply via email to