On 09/07/2017 10:02 AM, Dong Jia Shi wrote:
> * Cornelia Huck <coh...@redhat.com> [2017-09-06 13:25:38 +0200]:
>
>> On Wed, 6 Sep 2017 16:27:20 +0800
>> Dong Jia Shi <bjsdj...@linux.vnet.ibm.com> wrote:
>>
>>> * Halil Pasic <pa...@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
>>>
>>>> On 09/05/2017 05:46 PM, Cornelia Huck wrote:
>>>>> On Tue, 5 Sep 2017 17:24:19 +0200
>>>>> Halil Pasic <pa...@linux.vnet.ibm.com> wrote:
>>>>>
>>>>>> My problem with a program check (indicated by SCSW word 2 bit 10) is
>>>>>> that, in my reading of the architecture, the semantic behind it is: The
>>>>>> channel subsystem (not the cu or device) has detected that the
>>>>>> channel program (previously submitted as an ORB) is erroneous. Which
>>>>>> programs are erroneous is specified by the architecture. What we have
>>>>>> here does not qualify.
>>>>>>
>>>>>> My idea was to rather blame the virtual hardware (device) and put no
>>>>>> blame on the program nor the channel subsystem. This could be done
>>>>>> using device status (unit check with command reject, maybe unit
>>>>>> exception) or interface check. My train of thought was, the problem is
>>>>>> not consistent across a device type, so it has to be device specific.
>>>>>
>>>>> Unit exception might be a better way to express what is happening here.
>>>>> At least, it moves us away from cc 1 and not towards cc 3 :)
>>>>>
>>>>
>>>> I will do a follow up patch pursuing device exception.
>>>>
>>>>>>
>>>>>> Of course blaming the device could mislead the person encountering the
>>>>>> problem, and make him believe it's a non-virtual hardware problem.
>>>>>>
>>>>>> About the misleading, I think the best we can do is log out a message
>>>>>> indicating what really happened.
>>>>>
>>>>> Just document it in the code? If it doesn't happen with Linux as a
>>>>> guest, it is highly unlikely to be seen in the wild.
>>>>>
>>>>
>>>> Well we have two problems here:
>>>> 1) Unit exception can be already defined by the device type for the
>>>> command (reference:
>>>> http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
>>>> I think this one is what you mean. And I agree that's best handled
>>>> with comment in code.
>>> Using unit check, with bit 3 byte 0 of the sense data set to 1, to
>>> indicate an 'Equipment check', sounds a bit more proper than unit
>>> exception.
>>
>> I don't agree: Equipment check sounds a lot more dire (and seems to
>> imply a malfunction). I like unit exception better.
> Got the point. Fair enough!
>

I do see some benefit in doing unit check over unit exception. I just kept
quiet to see the discussion unfold.

As already said, unit exception seems to be something reserved for the
device type to define in a more or less arbitrary but unambiguous way. I
agreed to use it because I trust Connie's assessment that it is not really
used by devices in the wild (obviously nothing changed here).

If we consider the semantics of unit check with command reject, it's a
surprisingly good match: basically, the device detected a programming error
(one which cannot be detected by the channel subsystem because it is
device (type) specific). For reference see:
http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.7.2.1?DT=19920904110920

IMHO that's almost exactly what we have here: the channel program is good
from the perspective of the channel subsystem, but the device can't deal
with it. So we would not lie that the device is at fault (which was
Connie's concern initially), and we would not lie about having a generally
invalid channel program (which was my concern).

So how about a unit check with a command reject? (The only problem I see is
on the device vs. device type plane, but that ain't better for unit
exception.)

Halil
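
For illustration only, here is a minimal sketch of how the three options discussed in this thread differ in the status bits they would present. The `struct io_result` record, the 32-byte sense buffer, and the `report_*` helpers are hypothetical names invented for this example; they are not QEMU's data structures or API. The bit values follow the positions cited above (program check is SCSW word 2 bit 10, equipment check is bit 3 of sense byte 0). Other ending status such as channel end and device end is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Device status: SCSW word 2, bits 0-7 (bit 0 is the most significant). */
#define DSTAT_UNIT_CHECK        0x02  /* bit 6 */
#define DSTAT_UNIT_EXCEPTION    0x01  /* bit 7 */

/* Subchannel status: SCSW word 2, bits 8-15. */
#define CSTAT_PROGRAM_CHECK     0x20  /* word 2, bit 10 */

/* Sense byte 0. */
#define SENSE0_COMMAND_REJECT   0x80  /* bit 0 */
#define SENSE0_EQUIPMENT_CHECK  0x10  /* bit 3 */

/* Hypothetical, simplified result record (assumption, not a real layout). */
struct io_result {
    uint8_t dstat;      /* device status */
    uint8_t cstat;      /* subchannel status */
    uint8_t sense[32];  /* sense data; the size is an assumption */
};

/* The device rejects a program the channel subsystem considers valid. */
static void report_unit_check_cmd_reject(struct io_result *res)
{
    assert(res != NULL);
    memset(res, 0, sizeof(*res));
    res->dstat = DSTAT_UNIT_CHECK;
    res->sense[0] = SENSE0_COMMAND_REJECT;
}

/* Device-type-defined condition; no sense data is involved. */
static void report_unit_exception(struct io_result *res)
{
    assert(res != NULL);
    memset(res, 0, sizeof(*res));
    res->dstat = DSTAT_UNIT_EXCEPTION;
}

/* The channel subsystem itself declares the channel program erroneous. */
static void report_program_check(struct io_result *res)
{
    assert(res != NULL);
    memset(res, 0, sizeof(*res));
    res->cstat = CSTAT_PROGRAM_CHECK;
}
```

With unit check plus command reject, the subchannel status stays clear, so nothing claims the channel program is architecturally invalid, while the sense data tells the guest that the device could not deal with the command.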