On 12/11/20 16:04, Hannes Reinecke wrote:
On 11/12/20 10:52 AM, Paolo Bonzini wrote:
Well, ironically I'm currently debugging a customer escalation which
touches exactly this area. It revolves more around the SG_IO handling;
Technically this patch is for *non* passthrough, but yeah it's a similar
case.
qemu ignores the host_status setting completely, leading to data
corruption if the host has to abort commands. (Not that the host
_does_ abort commands, as qemu SG_IO sets an infinite timeout. But
that's another story).
And part of the patchset is to enable passing of the host_status code
back to the drivers. In particular virtio_scsi has a 'response' code
which matches pretty closely to the linux SCSI DID_XXX codes.
Yeah, most of the time that's just because it's what can go wrong in
SCSI. Sometimes it's because I had no clue when writing the virtio-scsi
spec and just copied blindly from Linux. For example
VIRTIO_SCSI_S_NEXUS_FAILURE probably should have never existed, since
DID_NEXUS_FAILURE really should have been DID_RESERVATION_CONFLICT.
As an aside, I hate Linux host_status. It's never clear when looking at
the code if the statuses have been mapped back to BLK_STS codes or not,
so you don't know if you already have gotten rid of DID_TARGET_FAILURE,
DID_NEXUS_FAILURE, DID_ALLOC_FAILURE and DID_MEDIUM_ERROR (which are
just a weird way to store the SCSI status or sense key for future use, and
not really "host statuses", and would really be a nexus failure only in
the rare case of path-specific reservations).
So my idea is to pass the host_status directly down to the drivers,
allowing virtio-scsi to do a mapping between DID_XX and virtio response
codes.
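
For illustration only, such a mapping could look roughly like the sketch
below. The DID_* and VIRTIO_SCSI_S_* values are reproduced from the Linux
headers and the virtio spec so that the snippet is self-contained; the
function name host_status_to_virtio_response() and the choice of which
DID_* code lands on which response are assumptions, not what the patchset
actually does.

```c
#include <assert.h>

/* Linux host_status codes (subset), copied here to keep the sketch standalone. */
enum {
    DID_OK                  = 0x00,
    DID_NO_CONNECT          = 0x01,
    DID_BUS_BUSY            = 0x02,
    DID_TIME_OUT            = 0x03,
    DID_BAD_TARGET          = 0x04,
    DID_ABORT               = 0x05,
    DID_RESET               = 0x08,
    DID_TRANSPORT_DISRUPTED = 0x0e,
    DID_TRANSPORT_FAILFAST  = 0x0f,
    DID_TARGET_FAILURE      = 0x10,
    DID_NEXUS_FAILURE       = 0x11,
    DID_ALLOC_FAILURE       = 0x12,
};

/* virtio-scsi command response codes. */
enum {
    VIRTIO_SCSI_S_OK                = 0,
    VIRTIO_SCSI_S_OVERRUN           = 1,
    VIRTIO_SCSI_S_ABORTED           = 2,
    VIRTIO_SCSI_S_BAD_TARGET        = 3,
    VIRTIO_SCSI_S_RESET             = 4,
    VIRTIO_SCSI_S_BUSY              = 5,
    VIRTIO_SCSI_S_TRANSPORT_FAILURE = 6,
    VIRTIO_SCSI_S_TARGET_FAILURE    = 7,
    VIRTIO_SCSI_S_NEXUS_FAILURE     = 8,
    VIRTIO_SCSI_S_FAILURE           = 9,
};

/* Hypothetical helper: translate a host_status into a virtio response. */
static int host_status_to_virtio_response(int host_status)
{
    switch (host_status) {
    case DID_OK:
        return VIRTIO_SCSI_S_OK;
    case DID_BAD_TARGET:
        return VIRTIO_SCSI_S_BAD_TARGET;
    case DID_BUS_BUSY:
        return VIRTIO_SCSI_S_BUSY;
    case DID_TIME_OUT:
    case DID_ABORT:
        return VIRTIO_SCSI_S_ABORTED;
    case DID_RESET:
        return VIRTIO_SCSI_S_RESET;
    case DID_NO_CONNECT:
    case DID_TRANSPORT_DISRUPTED:
    case DID_TRANSPORT_FAILFAST:
        return VIRTIO_SCSI_S_TRANSPORT_FAILURE;
    case DID_TARGET_FAILURE:
        return VIRTIO_SCSI_S_TARGET_FAILURE;
    case DID_NEXUS_FAILURE:
        return VIRTIO_SCSI_S_NEXUS_FAILURE;
    default:
        /* Anything without an obvious counterpart becomes a generic failure. */
        return VIRTIO_SCSI_S_FAILURE;
    }
}
```

The default branch matters here: a host_status that is silently dropped
is exactly the case that leads to the data corruption described above.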
But yeah, this is a good idea. But since I hate host_status, please
define your own enum instead of DID_*. Of course you can use the same
values as DID_* and assert with QEMU_BUILD_BUG_ON that they are the
same, but I don't want people to read the code and have to think of
DID_ALLOC_FAILURE and the like.
Paolo
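
A minimal sketch of the "own enum plus build-time check" suggestion,
under some loudly stated assumptions: the HOST_STATUS_* names are made up
for this example, the DID_* values are copied from the Linux headers so
the snippet compiles on its own, and BUILD_BUG_ON_SKETCH is a local
stand-in (built on C11 _Static_assert) for QEMU_BUILD_BUG_ON, which
likewise fails the build when its condition is true.

```c
#include <assert.h>

/* Linux host_status values (subset), reproduced for a standalone build. */
#define DID_OK         0x00
#define DID_NO_CONNECT 0x01
#define DID_BUS_BUSY   0x02
#define DID_TIME_OUT   0x03
#define DID_BAD_TARGET 0x04
#define DID_ABORT      0x05
#define DID_ERROR      0x07
#define DID_RESET      0x08

/* Hypothetical device-model enum: same numbers, no DID_* in sight. */
enum HostStatus {
    HOST_STATUS_OK         = 0x00,
    HOST_STATUS_NO_CONNECT = 0x01,
    HOST_STATUS_BUSY       = 0x02,
    HOST_STATUS_TIME_OUT   = 0x03,
    HOST_STATUS_BAD_TARGET = 0x04,
    HOST_STATUS_ABORTED    = 0x05,
    HOST_STATUS_ERROR      = 0x07,
    HOST_STATUS_RESET      = 0x08,
};

/* Stand-in for QEMU_BUILD_BUG_ON: compilation fails if cond is true. */
#define BUILD_BUG_ON_SKETCH(cond) _Static_assert(!(cond), "not expecting: " #cond)

/* Prove at build time that the two sets of values stay in sync. */
BUILD_BUG_ON_SKETCH(HOST_STATUS_OK         != DID_OK);
BUILD_BUG_ON_SKETCH(HOST_STATUS_NO_CONNECT != DID_NO_CONNECT);
BUILD_BUG_ON_SKETCH(HOST_STATUS_BUSY       != DID_BUS_BUSY);
BUILD_BUG_ON_SKETCH(HOST_STATUS_TIME_OUT   != DID_TIME_OUT);
BUILD_BUG_ON_SKETCH(HOST_STATUS_BAD_TARGET != DID_BAD_TARGET);
BUILD_BUG_ON_SKETCH(HOST_STATUS_ABORTED    != DID_ABORT);
BUILD_BUG_ON_SKETCH(HOST_STATUS_ERROR      != DID_ERROR);
BUILD_BUG_ON_SKETCH(HOST_STATUS_RESET      != DID_RESET);
```

With the checks in place, a value read from the SG_IO host_status field
could be used as the enum directly, and only the statuses that make sense
as real host statuses need a name at all.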