On Thu, 7 Sep 2017 13:01:34 +0200 Halil Pasic <pa...@linux.vnet.ibm.com> wrote:
> On 09/07/2017 10:02 AM, Dong Jia Shi wrote:
> > * Cornelia Huck <coh...@redhat.com> [2017-09-06 13:25:38 +0200]:
> >
> >> On Wed, 6 Sep 2017 16:27:20 +0800
> >> Dong Jia Shi <bjsdj...@linux.vnet.ibm.com> wrote:
> >>
> >>> * Halil Pasic <pa...@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
> >>>
> >>>> On 09/05/2017 05:46 PM, Cornelia Huck wrote:
> >>>>> On Tue, 5 Sep 2017 17:24:19 +0200
> >>>>> Halil Pasic <pa...@linux.vnet.ibm.com> wrote:
> >>>>>
> >>>>>> My problem with a program check (indicated by SCSW word 2 bit 10) is
> >>>>>> that, in my reading of the architecture, the semantics behind it are:
> >>>>>> the channel subsystem (not the CU or device) has detected that the
> >>>>>> channel program (previously submitted as an ORB) is erroneous. Which
> >>>>>> programs are erroneous is specified by the architecture. What we have
> >>>>>> here does not qualify.
> >>>>>>
> >>>>>> My idea was rather to blame the virtual hardware (device) and put no
> >>>>>> blame on the program or on the channel subsystem. This could be done
> >>>>>> using device status (unit check with command reject, maybe unit
> >>>>>> exception) or interface check. My train of thought was: the problem
> >>>>>> is not consistent across a device type, so it has to be device
> >>>>>> specific.
> >>>>>
> >>>>> Unit exception might be a better way to express what is happening here.
> >>>>> At least, it moves us away from cc 1 and not towards cc 3 :)
> >>>>>
> >>>>
> >>>> I will do a follow-up patch pursuing device exception.
> >>>>
> >>>>>>
> >>>>>> Of course, blaming the device could mislead the person encountering
> >>>>>> the problem and make him believe it's a non-virtual hardware problem.
> >>>>>>
> >>>>>> About the misleading part, I think the best we can do is log a
> >>>>>> message indicating what really happened.
> >>>>>
> >>>>> Just document it in the code?
> >>>>> If it doesn't happen with Linux as a guest, it is highly unlikely
> >>>>> to be seen in the wild.
> >>>>>
> >>>>
> >>>> Well, we have two problems here:
> >>>> 1) Unit exception can already be defined by the device type for the
> >>>> command (reference:
> >>>> http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
> >>>> I think this one is what you mean. And I agree that's best handled
> >>>> with a comment in the code.
> >>> Using unit check, with bit 3 of byte 0 of the sense data set to 1 to
> >>> indicate an 'Equipment check', sounds a bit more proper than unit
> >>> exception.
> >>
> >> I don't agree: Equipment check sounds a lot more dire (and seems to
> >> imply a malfunction). I like unit exception better.
> > Got the point. Fair enough!
> >
>
> I do see some benefit in doing unit check over unit exception. I just
> kept quiet to see the discussion unfold. As already said, unit exception
> seems to be something reserved for the device type to define in a more
> or less arbitrary but unambiguous way. I agreed to use it because I
> trust Connie's assessment about it not really being used by the devices
> in the wild (obviously nothing changed here).
>
> If we consider the semantics of unit check with command reject, it's
> a surprisingly good match: basically, the device detected a programming
> error (which cannot be detected by the channel subsystem because it is
> device (type) specific). For reference see:
> http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.7.2.1?DT=19920904110920
>
> IMHO that's almost exactly what we have here: the channel program
> is good from the perspective of the channel subsystem, but the device
> can't deal with it. So we would not falsely claim that the device is at
> fault (Connie's initial concern), and we would not falsely claim a
> generally invalid channel program (my concern).
>
> So how about a unit check with a command reject?
> (The only problem I see is on the device vs. device type plane, but
> that ain't better for unit exception.)

I don't know, it feels a bit weird if I look at the cases where I saw
command reject in the wild before, even if it seems to agree with the
architecture... but that's just a gut feeling.
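For reference, a minimal sketch (hypothetical, not QEMU code) of which status bits each option discussed above would set for a channel program the virtual device cannot handle. It assumes the architected layouts with bit 0 as the most significant bit: device status in SCSW word 2 bits 0-7, subchannel status in bits 8-15 (so program check, word 2 bit 10, is bit 2 of that byte), and the basic sense byte 0 with command reject at bit 0 and equipment check at bit 3. Only the distinguishing bits are modeled; channel end/device end, status-control bits and device-specific sense bytes are left out. The names msb_bit, scsw_word2 and OPTIONS are made up for illustration.

```python
"""Hypothetical sketch (not QEMU code): status bits for each option in the thread.

Bits are numbered IBM style, bit 0 being the most significant bit.
Only the bits that distinguish the options are modeled.
"""


def msb_bit(n: int, width: int) -> int:
    """Mask for bit n of a width-bit field, counting bit 0 as the MSB."""
    return 1 << (width - 1 - n)


# Device status byte (SCSW word 2, bits 0-7)
DSTAT_UNIT_CHECK = msb_bit(6, 8)      # 0x02
DSTAT_UNIT_EXCEPTION = msb_bit(7, 8)  # 0x01

# Subchannel status byte (SCSW word 2, bits 8-15): word bit 10 is byte bit 2
CSTAT_PROGRAM_CHECK = msb_bit(2, 8)   # 0x20

# Basic sense byte 0
SNS0_COMMAND_REJECT = msb_bit(0, 8)   # 0x80
SNS0_EQUIPMENT_CHECK = msb_bit(3, 8)  # 0x10


def scsw_word2(dstat: int, cstat: int, count: int = 0) -> int:
    """Pack device status, subchannel status and residual count into word 2."""
    return (dstat << 24) | (cstat << 16) | (count & 0xFFFF)


# option -> (device status, subchannel status, sense byte 0, what gets blamed)
OPTIONS = {
    "program check": (0, CSTAT_PROGRAM_CHECK, 0,
                      "erroneous channel program (channel subsystem)"),
    "unit exception": (DSTAT_UNIT_EXCEPTION, 0, 0,
                       "device-type-defined condition"),
    "unit check + command reject": (DSTAT_UNIT_CHECK, 0, SNS0_COMMAND_REJECT,
                                    "programming error detected by the device"),
    "unit check + equipment check": (DSTAT_UNIT_CHECK, 0, SNS0_EQUIPMENT_CHECK,
                                     "device malfunction"),
}

if __name__ == "__main__":
    for name, (dstat, cstat, sense0, blame) in OPTIONS.items():
        word2 = scsw_word2(dstat, cstat)
        print(f"{name:30} word2=0x{word2:08x} sense0=0x{sense0:02x} ({blame})")
```

Running it prints one line per option; the program check case shows word 2 as 0x00200000, i.e. bit 10 of the word, while both unit check variants share the same SCSW bits and differ only in sense byte 0.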