On Thu, Aug 22, 2013 at 01:15:59PM +0200, Paolo Bonzini wrote:
> On 22/08/2013 12:28, Bharata B Rao wrote:
> > On Thu, Aug 22, 2013 at 12:00:48PM +0200, Paolo Bonzini wrote:
> >> On 22/08/2013 11:55, Bharata B Rao wrote:
> >>> This was the first approach I had. I used to abort when writes to pipe
> >>> fail. But there were concerns raised about handling the failures
> >>> gracefully and hence we ended up doing all that error handling of
> >>> completing the aio with -EIO, closing the pipe and making the disk
> >>> inaccessible.
> >>>
> >>>>> Under what circumstances could it happen?
> >>> Not very sure, I haven't seen that happening. I had to manually inject
> >>> faults to test this error path and verify the graceful recovery.
> >>
> >> Looking at write(2), it looks like it is impossible
> >>
> >> EAGAIN or EWOULDBLOCK
> >>     can't happen, blocking file descriptor
> >>
> >> EBADF, EPIPE
> >>     shouldn't happen since the device is drained before
> >>     calling qemu_gluster_close.
> >>
> >> EDESTADDRREQ, EDQUOT, EFBIG, EIO, ENOSPC
> >>     cannot happen for pipes
> >>
> >> EFAULT
> >>     abort would be fine
> >
> > In the case where we have separate system and data disks and if error
> > (EFAULT) happens for the data disk, don't we want to keep the VM up by
> > gracefully disabling IO to the data disk?
>
> EFAULT means the buffer address is invalid, I/O error would be EIO, but...
>
> > I remember this was one of the motivations to
> > handle this failure.
>
> ... this write is on the pipe, not on a disk.

Right. Failure to complete the write on the pipe means that the IO done to
the disk didn't complete, and hence to the VM it is essentially a disk IO
failure. That's the reason we return -EIO and make the disk inaccessible
when this failure happens.

My question was whether it is OK to abort the VM when IO to one of the
disks fails. But if you think it is not worth handling such errors, then
maybe we can drop this elaborate and race-prone error recovery and just
abort.

Regards,
Bharata.
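
For reference, the two options discussed in this thread can be sketched roughly as below. This is only an illustration and not the actual gluster block driver code: pipe_write_full(), notify_or_abort() and notify_or_eio() are hypothetical names, and the sketch assumes a blocking pipe file descriptor as described in the quoted write(2) analysis.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer to a blocking pipe fd,
 * retrying on EINTR and on short writes.
 * Returns 0 on success or a negative errno value on failure.
 */
static int pipe_write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before any data was written */
            }
            return -errno;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Option A: treat any failure of the pipe write as fatal and abort. */
static void notify_or_abort(int fd, const void *msg, size_t len)
{
    if (pipe_write_full(fd, msg, len) < 0) {
        abort();
    }
}

/*
 * Option B: report the failure as -EIO so that the caller can complete
 * the request with an error and stop using the disk.
 */
static int notify_or_eio(int fd, const void *msg, size_t len)
{
    return pipe_write_full(fd, msg, len) < 0 ? -EIO : 0;
}
```

Option A needs no recovery path at all, while option B leaves the closing of the pipe and the disabling of the disk to the caller, which is where the race-prone part lives.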