On Tue, 19 Nov 2019 18:59:11 +0100 Halil Pasic <pa...@linux.ibm.com> wrote:
> On Tue, 19 Nov 2019 13:02:20 +0100
> Cornelia Huck <coh...@redhat.com> wrote:
>
> > On Tue, 19 Nov 2019 12:23:40 +0100
> > Halil Pasic <pa...@linux.ibm.com> wrote:
> >
> > > On Mon, 18 Nov 2019 19:13:34 +0100
> > > Cornelia Huck <coh...@redhat.com> wrote:
> > >
> > > > > EIO is returned by vfio-ccw mediated device when the backing
> > > > > host subchannel is not operational anymore. So return cc=3
> > > > > back to the guest, rather than returning a unit check.
> > > > > This way the guest can take appropriate action such as
> > > > > issue an 'stsch'.
> > > >
> > > > Hnm, I'm trying to recall whether that was actually a conscious choice,
> > > > but I can't quite remember... the change does make sense at a glance,
> > > > however.
> > >
> > > Is EIO returned if and only if the host subchannel/device is not
> > > operational any more, or are there cases as well?
> >
> > Ok, I walked through the kernel code, and it seems -EIO can happen
>
> Thanks Connie for having a look.
>
> > - when we try to do I/O while in the NOT_OPER or STANDBY states... cc 3
> >   makes sense in those cases
>
> I do understand NOT_OPER, but I'm not sure about STANDBY.
>
> Here is what the PoP says about cc 3 for SSCH.
> """
> Condition code 3 is set, and no other action is
> taken, when the subchannel is not operational for
> START SUBCHANNEL. A subchannel is not operational
> for START SUBCHANNEL if the subchannel is
> not provided in the channel subsystem, has no valid
> device number associated with it, or is not enabled.
> """
>
> Are we guaranteed to reflect one of these conditions back?
>
> Under what circumstances do we expect that our request will
> find the device in STANDBY?

IIRC, the subchannel is not enabled when the device is in STANDBY?
Anyway, it seems the check here is more like a safety measure, in case
we messed up.

> > - when the cp is not initialized when trying to fetch the orb... which
> >   is an internal vfio-ccw kernel module error
> >
>
> So the answer seems to be, no EIO is also used for something else than
> 'device not operational' in a sense of the s390 IO architecture (cc=3
> and stuff).
>
> AFAIR the idea was that EIO means something is broken, and we decided
> to reflect that as an unit check (because the broader device -- the
> actual device + our pass-through code == device for the guest) is broken.
> So I think it was a conscious choice.

Hm, if you put it like that... maybe leaving it as -EIO makes more
sense.

The main question is: What happens if userspace triggers I/O to be
started and we find the device to have become not operational? Can we
even switch the state to NOT_OPER before we try the ssch (which will
fail with cc 3)? If not, it's probably safe to leave the -EIO in place.
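
To make the two options concrete, here is a minimal sketch in C of the
mapping being discussed: -EIO is reflected to the guest either as a unit
check (the current behaviour described above) or as cc 3 (what the patch
proposes). This is not the actual QEMU or kernel code. The names
guest_outcome, map_ssch_result and eio_means_not_oper are made up for
illustration, and the handling of other error codes is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical guest-visible outcomes; not QEMU's real definitions. */
enum guest_outcome {
    GUEST_CC0_STARTED,   /* request accepted */
    GUEST_CC3_NOT_OPER,  /* cc 3: subchannel not operational */
    GUEST_UNIT_CHECK     /* error reported to the guest as a unit check */
};

/*
 * Map the return value of the (hypothetical) I/O request to what the
 * guest sees. eio_means_not_oper selects between the two policies
 * debated in this thread.
 */
enum guest_outcome map_ssch_result(int ret, bool eio_means_not_oper)
{
    switch (ret) {
    case 0:
        return GUEST_CC0_STARTED;
    case -EIO:
        /* Proposed patch: cc 3. Existing behaviour: unit check. */
        return eio_means_not_oper ? GUEST_CC3_NOT_OPER : GUEST_UNIT_CHECK;
    default:
        /* Assumption for this sketch only: other errors are unit checks. */
        return GUEST_UNIT_CHECK;
    }
}
```

With eio_means_not_oper set to false, an internal vfio-ccw error such as
an uninitialized cp stays visible as a unit check; with it set to true,
the same error would be indistinguishable from a subchannel that is
genuinely not operational, which is the concern raised above.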