On Sun, Jul 01, 2012 at 09:06:20PM +1000, Alexey Kardashevskiy wrote: > QEMU from qemu.org/master (not qemu-kvm), linux host 3.4, ppc64. > > Shortly the problem is in the host kernel: closing a file in one thread does > not interrupt select() waiting on the same file description in another thread. > > Longer story is: > I'll use VFIO as an example as I hit this when I was debugging it but VFIO > itself has nothing to do with the issue. > > The bug looks like: I start the guest with MSI-capable PCI card passed via > VFIO. The guest does dhclient from .bashrc and this dhclient does not receive > anything till I press any key. > I did not see it for a while as I always start the guest and then typed > "dhclient" manually in the guest console. > > What happens: > VFIO initializes INTx eventfd via event_notifier_init() (returns fd=XX) and > qemu_set_fd_handler() before the guest starts. Then QEMU starts the guest and > enters a loop is os_host_main_loop_wait() which stays in select() until > something happens. > > >From the host kernel prospective, the XX fd was allocated, struct file* (P1) > >with eventfd specific private_data allocated and initialized. select() added > >a file refcounter (called get_file() in __pollwait()) and started waiting on > >file* P1 but not on fd's number XX (what is mmm weird but ok). > > The guest starts and tries to initialize MSI for the PCI card passed through. > The guest does PCI config write and this write creates a second thread in > QEMU.
Why does this create a thread btw? > In this thread QEMU-VFIO closes fd XX which makes the host kernel release > fd=XX, release the corresponding file* P1 (fput() in filp_close()) but this > does not free this P1 as there is select() waiting on it. Doesn't qemu remove an fd handler before closing the fd? If not that's the bug right there. > eventfd_release() could wake it up but it is called when its refcounter > becomes 0 and this won't happen till select() interrupts. > > Doing MSI init stuff, QEMU-VFIO calls the same event_notifier_init() (returns > recycled fd=XX what is correct but confuses) and qemu_set_fd_handler() which > adds a handler but select() does not pick it up. > The eventfd() (called by event_notifier_init()) allocates new struct file *P2 > in the kernel, QEMU-VFIO passes this fd=XX to the kernel's VFIO. When MSI > interrupt comes to the host kernel, the VFIO interrupt handler calls > eventfd_signal() on the correct file* P2. However do_select() in the kernel > does not interrupt to deliver eventfd event as it is still waiting on P1 - > nobody interrupted select(). It can only interrupt on stdin events (like > typing keys) and INTx interrupt (which does not happen as MSI is enabled). > > So we need to sync eventfd()/close() calls in one thread with select() in > another. Below is the patch which simply sends SIGUSR2 to the main thread > (the sample patch is below). Another solution could be adding new IO handler > to wake select() up. Or to do something with the kernel but I am not sure > what - implementing file_operations::flush for eventfd to do wakeup did not > help and I did not dig any further. > > > Does it make sense? What do I miss? What would be the right solution? > > > --- > iohandler.c | 1 + > main-loop.c | 17 +++++++++++++++++ > 2 files changed, 18 insertions(+) > > diff --git a/iohandler.c b/iohandler.c > index 3c74de6..54f4c48 100644 > --- a/iohandler.c > +++ b/iohandler.c > @@ -77,6 +77,7 @@ int qemu_set_fd_handler2(int fd, > ioh->fd_write = fd_write; > ioh->opaque = opaque; > ioh->deleted = 0; > + kill(getpid(), SIGUSR2); > } > return 0; > } > diff --git a/main-loop.c b/main-loop.c > index eb3b6e6..f65e048 100644 > --- a/main-loop.c > +++ b/main-loop.c > @@ -199,10 +199,27 @@ static int qemu_signal_init(void) > } > #endif > > +static void sigusr2_print(int signal) > +{ > +} > + > +static void sigusr2_init(void) > +{ > + struct sigaction action; > + > + memset(&action, 0, sizeof(action)); > + sigfillset(&action.sa_mask); > + action.sa_handler = sigusr2_print; > + action.sa_flags = 0; > + sigaction(SIGUSR2, &action, NULL); > +} > + > int main_loop_init(void) > { > int ret; > > + sigusr2_init(); > + > qemu_mutex_lock_iothread(); > ret = qemu_signal_init(); > if (ret) { > -- > 1.7.10