On Wed, Aug 15, 2012 at 10:00:27AM +0200, Kevin Wolf wrote:
> Am 15.08.2012 07:21, schrieb Bharata B Rao:
> > On Tue, Aug 14, 2012 at 10:29:26AM +0200, Kevin Wolf wrote:
> >>>>> +static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
> >>>>> +{
> >>>>> +    GlusterAIOCB *acb = (GlusterAIOCB *)arg;
> >>>>> +    BDRVGlusterState *s = acb->common.bs->opaque;
> >>>>> +
> >>>>> +    acb->ret = ret;
> >>>>> +    if (qemu_gluster_send_pipe(s, acb) < 0) {
> >>>>> +        /*
> >>>>> +         * Gluster AIO callback thread failed to notify the waiting
> >>>>> +         * QEMU thread about IO completion. Nothing much can be done
> >>>>> +         * here but to abruptly abort.
> >>>>> +         *
> >>>>> +         * FIXME: Check if the read side of the fd handler can somehow
> >>>>> +         * be notified of this failure paving the way for a graceful exit.
> >>>>> +         */
> >>>>> +        error_report("Gluster failed to notify QEMU about IO completion");
> >>>>> +        abort();
> >>>>
> >>>> In the extreme case you may choose to make this disk inaccessible
> >>>> (something like bs->drv = NULL), but abort() kills the whole VM and
> >>>> should only be called when there is a bug.
> >>>
> >>> There have been concerns raised about this earlier too. I settled for this
> >>> since I couldn't see a better way out and I could see the precedent
> >>> for this in posix-aio-compat.c.
> >>>
> >>> So I could just do the necessary cleanup, set bs->drv to NULL and return
> >>> from here? But how do I wake up the QEMU thread that is waiting on the
> >>> read side of the pipe? W/o that, the QEMU thread that waits on the read
> >>> side of the pipe is still hung.
> >>
> >> There is no other thread. But you're right, you should probably
> >> unregister the aio_fd_handler and any other pending callbacks.
> >
> > As I clarified in the other mail, this (gluster_finish_aiocb) is called
> > from gluster thread context and hence the QEMU thread that raised the
> > original read/write request is still blocked on qemu_aio_wait().
> >
> > I tried the following cleanup instead of abrupt abort:
> >
> > close(read_fd); /* This will wake up the QEMU thread blocked on
> >                    select(read_fd...) */
> > close(write_fd);
> > qemu_aio_set_fd_handler(read_fd, NULL, NULL, NULL, NULL);
> > qemu_aio_release(acb);
> > s->qemu_aio_count--;
> > bs->drv = NULL;
> >
> > I tested this by manually injecting faults into qemu_gluster_send_pipe().
> > With the above cleanup, the guest kernel crashes with IO errors.
>
> What does "crash" really mean? IO errors certainly shouldn't cause a
> kernel to crash?
Since an IO failed, it resulted in root file system corruption, which
subsequently led to a panic.

[    1.529042] dracut: Switching root
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
qemu-system-x86_64: Gluster failed to notify QEMU about IO completion
[    1.584130] end_request: I/O error, dev vda, sector 13615224
[    1.585119] end_request: I/O error, dev vda, sector 13615344
[    1.585119] end_request: I/O error, dev vda, sector 13615352
[    1.585119] end_request: I/O error, dev vda, sector 13615360
[    1.593188] end_request: I/O error, dev vda, sector 1030144
[    1.594169] Buffer I/O error on device vda3, logical block 0
[    1.594169] lost page write due to I/O error on vda3
[    1.594169] EXT4-fs error (device vda3): __ext4_get_inode_loc:3539: inode #392441: block 1573135: comm systemd: unable to read itable block
[...]
[    1.620064] EXT4-fs error (device vda3): __ext4_get_inode_loc:3539: inode #392441: block 1573135: comm systemd: unable to read itable block
/usr/lib/systemd/systemd: error while loading shared libraries: libselinux.so.1: cannot open shared object file: Input/output error
[    1.626193] Kernel panic - not syncing: Attempted to kill init!
[    1.627789] Pid: 1, comm: systemd Not tainted 3.3.4-5.fc17.x86_64 #1
[    1.630063] Call Trace:
[    1.631120]  [<ffffffff815e21eb>] panic+0xba/0x1c6
[    1.632477]  [<ffffffff8105aff1>] do_exit+0x8b1/0x8c0
[    1.633851]  [<ffffffff8105b34f>] do_group_exit+0x3f/0xa0
[    1.635258]  [<ffffffff8105b3c7>] sys_exit_group+0x17/0x20
[    1.636619]  [<ffffffff815f38e9>] system_call_fastpath+0x16/0x1b

> > Is there anything else that I need to do or do differently to retain the
> > VM running w/o disk access?
> >
> > I thought of completing the aio callback by doing
> > acb->common.cb(acb->common.opaque, -EIO);
> > but that would do a coroutine enter from gluster thread, which I don't
> > think should be done.
>
> You would have to take the global qemu mutex at least. I agree it's not
> a good thing to do.

So is it really worth doing all this to handle this unlikely error? The
chances of this error happening are quite remote, I believe.

Regards,
Bharata.
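
P.S. For reference, here is roughly what the cleanup I tried would look
like pulled out into its own function. This is only a sketch, not the
posted patch: the function name is made up, and it assumes the fds[]
array with GLUSTER_FD_READ/GLUSTER_FD_WRITE indices and the
qemu_aio_count field from BDRVGlusterState, plus the same
qemu_aio_set_fd_handler()/qemu_aio_release() calls shown above.

/*
 * Sketch only: instead of abort(), tear down the completion pipe and
 * disable the drive when qemu_gluster_send_pipe() fails.  Runs in
 * gluster callback thread context.
 */
static void qemu_gluster_fail_notify(BlockDriverState *bs, GlusterAIOCB *acb)
{
    BDRVGlusterState *s = bs->opaque;

    /* Closing the read end wakes up the QEMU thread blocked in
     * select() under qemu_aio_wait(). */
    close(s->fds[GLUSTER_FD_READ]);
    close(s->fds[GLUSTER_FD_WRITE]);

    /* Stop watching the (now closed) read end of the pipe. */
    qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ], NULL, NULL, NULL, NULL);

    /* Drop the request that can no longer be completed and adjust the
     * in-flight count so waiters don't hang on it. */
    qemu_aio_release(acb);
    s->qemu_aio_count--;

    /* Make the disk inaccessible rather than killing the whole VM. */
    bs->drv = NULL;
}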