On 09/13/2017 12:08 PM, Cornelia Huck wrote:
> On Thu, 7 Sep 2017 13:01:34 +0200
> Halil Pasic <pa...@linux.vnet.ibm.com> wrote:
>
>> On 09/07/2017 10:02 AM, Dong Jia Shi wrote:
>>> * Cornelia Huck <coh...@redhat.com> [2017-09-06 13:25:38 +0200]:
>>>
>>>> On Wed, 6 Sep 2017 16:27:20 +0800
>>>> Dong Jia Shi <bjsdj...@linux.vnet.ibm.com> wrote:
>>>>
>>>>> * Halil Pasic <pa...@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
>>>>>
>>>>>>
>>>>>> On 09/05/2017 05:46 PM, Cornelia Huck wrote:
>>>>>>> On Tue, 5 Sep 2017 17:24:19 +0200
>>>>>>> Halil Pasic <pa...@linux.vnet.ibm.com> wrote:
>>>>>>>
>>>>>>>> My problem with a program check (indicated by SCSW word 2 bit 10) is
>>>>>>>> that, in my reading of the architecture, the semantic behind it is: The
>>>>>>>> channel subsystem (not the cu or device) has detected that the
>>>>>>>> channel program (previously submitted as an ORB) is erroneous. Which
>>>>>>>> programs are erroneous is specified by the architecture. What we have
>>>>>>>> here does not qualify.
>>>>>>>>
>>>>>>>> My idea was to rather blame the virtual hardware (device) and put no
>>>>>>>> blame on the program nor the channel subsystem. This could be done
>>>>>>>> using device status (unit check with command reject, maybe unit
>>>>>>>> exception) or interface check. My train of thought was, the problem
>>>>>>>> is not consistent across a device type, so it has to be device
>>>>>>>> specific.
>>>>>>>
>>>>>>> Unit exception might be a better way to express what is happening here.
>>>>>>> At least, it moves us away from cc 1 and not towards cc 3 :)
>>>>>>>
>>>>>>
>>>>>> I will do a follow up patch pursuing device exception.
>>>>>>
>>>>>>>>
>>>>>>>> Of course blaming the device could mislead the person encountering the
>>>>>>>> problem, and make him believe it's a non-virtual hardware problem.
>>>>>>>>
>>>>>>>> About the misleading, I think the best we can do is log out a message
>>>>>>>> indicating what really happened.
>>>>>>>
>>>>>>> Just document it in the code? If it doesn't happen with Linux as a
>>>>>>> guest, it is highly unlikely to be seen in the wild.
>>>>>>>
>>>>>>
>>>>>> Well we have two problems here:
>>>>>> 1) Unit exception can be already defined by the device type for the
>>>>>> command (reference:
>>>>>> http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
>>>>>> I think this one is what you mean. And I agree that's best handled
>>>>>> with a comment in code.
>>>>> Using unit check, with bit 3 byte 0 of the sense data set to 1, to
>>>>> indicate an 'Equipment check', sounds a bit more proper than unit
>>>>> exception.
>>>>
>>>> I don't agree: Equipment check sounds a lot more dire (and seems to
>>>> imply a malfunction). I like unit exception better.
>>> Got the point. Fair enough!
>>>
>>
>> I do see some benefit in doing unit check over unit exception. Just
>> kept quiet to see the discussion unfold. As already said, unit exception
>> seems to be something reserved for the device type to define in a more
>> or less arbitrary but unambiguous way. I agreed to use this, because
>> I trust Connie's assessment about not really being used by the
>> devices in the wild (obviously nothing changed here).
>>
>> If we consider the semantic of unit check with command reject, it's
>> a surprisingly good match: basically device detected a programming
>> error (which can not be detected by the channel-subsystem because it
>> is device (type) specific). For reference see:
>> http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.7.2.1?DT=19920904110920
>>
>> IMHO that's almost exactly what we have here: the channel-program
>> is good from the perspective of the channel subsystem, but the device
>> can't deal with it. So we would not lie that the device is at fault
>> (was Connie's concern initially) but we would not lie about having
>> a generally invalid channel program (was my concern).
>>
>> So how about a unit check with a command reject? (The only problem
>> I see is on the device vs device type plane -- but that ain't better
>> for unit exception.)
>
> I don't know, it feels a bit weird if I look at the cases where I saw
> command reject in the wild before, even if it seems to agree with the
> architecture... but just a gut feeling.
>
Then let's settle for unit exception for now.

I will let this topic (series) rest for a couple of days in favor of
things like virtio-crypto spec review, maybe IDA, and some other stuff.
But I definitely intend to pick this series up again.

Halil
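
For reference, here is a minimal Python sketch of the three device-status
options weighed in this thread. The class and function names are hypothetical
(they are not QEMU identifiers), and the bit values assume the architecture's
usual bit numbering, where bit 0 is the most significant bit of a byte. Treat
it as an illustration of the trade-off, not as the eventual patch.

```python
from enum import IntFlag


class DevStatus(IntFlag):
    """Device-status byte of the SCSW (hypothetical names, bit 0 = MSB)."""
    ATTENTION = 0x80
    STATUS_MODIFIER = 0x40
    CONTROL_UNIT_END = 0x20
    BUSY = 0x10
    CHANNEL_END = 0x08
    DEVICE_END = 0x04
    UNIT_CHECK = 0x02
    UNIT_EXCEPTION = 0x01


class Sense0(IntFlag):
    """Sense byte 0; only the two bits mentioned in the thread are listed."""
    COMMAND_REJECT = 0x80   # bit 0
    EQUIPMENT_CHECK = 0x10  # bit 3, "bit 3 byte 0 of the sense data"


def describe(dev_status, sense0=Sense0(0)):
    """Summarise what a given status combination tells the guest.

    The wording of the returned strings is an assumption made for this
    sketch; it paraphrases the arguments made in the discussion.
    """
    if dev_status & DevStatus.UNIT_CHECK:
        if sense0 & Sense0.COMMAND_REJECT:
            # Program is valid for the channel subsystem, but the device
            # cannot deal with it.
            return "device rejected the command"
        if sense0 & Sense0.EQUIPMENT_CHECK:
            # Sounds more dire: implies a malfunction.
            return "device reports a malfunction"
        return "device reports an unusual condition"
    if dev_status & DevStatus.UNIT_EXCEPTION:
        # Meaning is left to the device type to define.
        return "device-type-specific condition"
    return "no error status"


if __name__ == "__main__":
    print(describe(DevStatus.UNIT_EXCEPTION))
    print(describe(DevStatus.UNIT_CHECK, Sense0.COMMAND_REJECT))
    print(describe(DevStatus.UNIT_CHECK, Sense0.EQUIPMENT_CHECK))
```

In these terms, the option settled on above corresponds to setting
`DevStatus.UNIT_EXCEPTION` in the device status, with no sense data involved,
whereas either unit-check variant would also need sense byte 0 filled in.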