On Sat, Aug 4, 2012 at 2:44 AM, Bharata B Rao <bhar...@linux.vnet.ibm.com> wrote:
> On Fri, Aug 03, 2012 at 03:57:20PM +0000, Blue Swirl wrote:
>> >> > +static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void
>> >> > *arg)
>> >> > +{
>> >> > +    GlusterAIOCB *acb = (GlusterAIOCB *)arg;
>> >> > +    BDRVGlusterState *s = acb->common.bs->opaque;
>> >> > +
>> >> > +    acb->ret = ret;
>> >> > +    if (qemu_gluster_send_pipe(s, acb) < 0) {
>> >> > +        error_report("Could not complete read/write/flush from
>> >> > gluster");
>> >> > +        abort();
>> >>
>> >> Aborting is a bit drastic, it would be nice to save and exit gracefully.
>> >
>> > I am not sure if there is an easy way to recover sanely and exit from this
>> > kind of error.
>> >
>> > Here the non-QEMU thread (gluster thread) failed to notify the QEMU thread
>> > on the read side of the pipe about the IO completion. So essentially
>> > bdrv_read or bdrv_write will never complete if this error happens.
>> >
>> > Do you have any suggestions on how to exit gracefully here?
>>
>> Ignore but set the callback return to -EIO, see for example curl.c:249.
>
> I see the precedent for how I am handling this in
> posix-aio-compat.c:posix_aio_notify_event().
>
> So instead of aborting, I could do acb->common.cb(acb->common.opaque, -EIO)
> as you suggest, but that would not help because the thread at the read side
> of the pipe is still waiting and the user will see the read/write failure as a hang.

Probably the other side needs to be informed somehow. Maybe it's enough for
1.2 to just use abort() and add a FIXME comment.

>
> [root@bharata qemu]# gdb ./x86_64-softmmu/qemu-system-x86_64
> Starting program: ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm
> --nographic -m 1024 -smp 4 -drive
> file=gluster://bharata/test/F16,if=virtio,cache=none
> [New Thread 0x7ffff4c7f700 (LWP 6537)]
> [New Thread 0x7ffff447e700 (LWP 6538)]
> [New Thread 0x7ffff3420700 (LWP 6539)]
> [New Thread 0x7ffff1407700 (LWP 6540)]
> qemu-system-x86_64: -drive
> file=gluster://bharata/test/F16,if=virtio,cache=none: Could not complete
> read/write/flush from gluster
> ^C
> Program received signal SIGINT, Interrupt.
> 0x00007ffff60e9403 in select () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00007ffff60e9403 in select () from /lib64/libc.so.6
> #1 0x00005555555baee3 in qemu_aio_wait () at aio.c:158
> #2 0x00005555555cf57b in bdrv_rw_co (bs=0x5555564cfa50, sector_num=0, buf=
> 0x7fffffffb640 "\353c\220", nb_sectors=4, is_write=false) at block.c:1623
> #3 0x00005555555cf5e1 in bdrv_read (bs=0x5555564cfa50, sector_num=0, buf=
> 0x7fffffffb640 "\353c\220", nb_sectors=4) at block.c:1633
> #4 0x00005555555cf9d0 in bdrv_pread (bs=0x5555564cfa50, offset=0,
> buf=0x7fffffffb640,
> count1=2048) at block.c:1720
> #5 0x00005555555cc8d4 in find_image_format (filename=
> 0x5555564cc290 "gluster://bharata/test/F16", pdrv=0x7fffffffbe60) at
> block.c:529
> #6 0x00005555555cd303 in bdrv_open (bs=0x5555564cef20, filename=
> 0x5555564cc290 "gluster://bharata/test/F16", flags=98, drv=0x0) at
> block.c:800
> #7 0x0000555555609f69 in drive_init (opts=0x5555564cf900, default_to_scsi=0)
> at blockdev.c:608
> #8 0x0000555555711b6c in drive_init_func (opts=0x5555564cc1e0,
> opaque=0x555555c357a0)
> at vl.c:775
> #9 0x000055555574ceda in qemu_opts_foreach (list=0x555555c319e0, func=
> 0x555555711b31 <drive_init_func>, opaque=0x555555c357a0,
> abort_on_failure=1)
> at qemu-option.c:1094
> #10 0x0000555555719d78 in main (argc=9, argv=0x7fffffffe468,
> envp=0x7fffffffe4b8)
> at vl.c:3430
>
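The thread describes this completion path: a non-QEMU (gluster) thread hands a finished request to the QEMU thread by writing to a pipe, and the QEMU thread blocks in select() until that write arrives. Below is a minimal sketch of that handoff under stated assumptions. It is not the patch's qemu_gluster_send_pipe(). `Completion`, `write_full`, `read_full`, `notify_completion` and `collect_completion` are hypothetical names introduced only for illustration. The sketch retries on EINTR and on short reads and writes. A hard error such as EBADF is returned to the caller rather than hidden, because that failure is exactly the case where the reader would otherwise wait forever.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-request record (a stand-in, not QEMU's GlusterAIOCB). */
typedef struct Completion {
    int id;
    ssize_t ret;
} Completion;

/* Write all len bytes to fd, retrying on EINTR and on short writes.
 * Returns 0 on success or -errno on a hard failure (EBADF, EPIPE, ...). */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;          /* interrupted by a signal: retry */
            }
            return -errno;         /* hard error: caller must decide */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes from fd, retrying on EINTR and on short reads.
 * Returns 0 on success, -EIO on premature EOF, or -errno. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;           /* write end closed before a full record */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Completion side (in the real driver this would run in the library's
 * thread): record the result, then pass the request pointer through
 * the pipe. */
int notify_completion(int write_fd, Completion *c, ssize_t ret)
{
    assert(c != NULL);
    c->ret = ret;
    return write_full(write_fd, &c, sizeof(c));
}

/* Event-loop side: fetch one completed request pointer from the pipe. */
int collect_completion(int read_fd, Completion **out)
{
    return read_full(read_fd, out, sizeof(*out));
}
```

In a threaded program, notify_completion() would be called from the library's callback thread. collect_completion() would be called from the read handler registered on the pipe's read end. A pointer-sized write is far below PIPE_BUF, so POSIX guarantees that it reaches the pipe atomically. Retrying handles only transient interruptions. If write_full() reports a hard error, the reader is still not woken, which is why the thread weighs abort() against some other way of informing the waiting side.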