Re: [RFC 0/4] POC: Generating realistic block errors

Kevin Wolf Tue, 26 Nov 2019 11:42:47 -0800

On 26.11.2019 at 19:19, Tony Asleson wrote:
> On 11/21/19 4:30 AM, Stefan Hajnoczi wrote:
> > blkdebug can inject EIO when a specific LBA is accessed.  Is that
> > enough for what you want to do?  Then you can reuse and maybe extend
> > blkdebug.
> 
> Not exactly.  For SCSI, I would like to be able to return different
> types of device errors on reads, e.g. 03/1101, 03/1600, and on writes.  The
> SCSI sense data needs to include the first block in error for the
> transfer.  It would be good to also have the ability to include things
> like SCSI check conditions with recoverable errors too.
> 
> I've been experimenting with blkdebug, to learn more and to see how it
> would need to be extended.  One thing that I was trying to understand is
> how an EIO from blkdebug gets translated into a bus/device specific
> error.  At the moment I'm not sure.  I've been trying to figure out the
> layering.  I think that blkdebug sits between the device specific model
> and the underlying block representation on disk.  Thus it injects error
> return values when accessing the underlying data, but that could be
> incorrect.  If it is correct I should see some code that translates the
> EIO to something transport/device specific.


Devices call into the generic block layer through the functions that
start with blk_ (blk_aio_pwritev() and blk_aio_preadv() are probably
the most interesting ones).

The callback path in scsi-disk is not that easy to follow, but in the
end, error returns should result in scsi_handle_rw_error() being called
where error codes are translated into SCSI sense codes.
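
To make the translation step concrete, here is a small Python model of an
errno-to-sense lookup. It is only a sketch and not the QEMU code: the
table entries, the Sense tuple and errno_to_sense() are assumptions chosen
for illustration, and the real mapping should be checked against
scsi_handle_rw_error() in hw/scsi/scsi-disk.c.

```python
import errno
from typing import NamedTuple


class Sense(NamedTuple):
    """SCSI sense triple: sense key, additional sense code, qualifier."""
    key: int
    asc: int
    ascq: int


# Sense key numbers as defined by the SCSI standard.
HARDWARE_ERROR = 0x04
ILLEGAL_REQUEST = 0x05
DATA_PROTECT = 0x07
ABORTED_COMMAND = 0x0B

# Assumed table for illustration only; verify against the QEMU source.
ERRNO_TO_SENSE = {
    errno.ENOMEM: Sense(HARDWARE_ERROR, 0x44, 0x00),
    errno.EINVAL: Sense(ILLEGAL_REQUEST, 0x24, 0x00),
    errno.ENOSPC: Sense(DATA_PROTECT, 0x27, 0x07),
}

# Assumed catch-all for every errno without its own entry (EIO included).
DEFAULT_SENSE = Sense(ABORTED_COMMAND, 0x00, 0x06)


def errno_to_sense(err: int) -> Sense:
    """Map a block layer return value (usually -errno) to a sense triple."""
    return ERRNO_TO_SENSE.get(abs(err), DEFAULT_SENSE)
```

If that assumption holds, an injected EIO ends up in the catch-all entry
rather than in a medium error such as 03/1101, so returning those would
need more information than a bare errno carries.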

> Although I don't understand how returning an ENOSPC from read_aio in
> blkdebug would get translated for a SCSI disk as it doesn't make sense
> to me (one of the examples in the documentation).  Actually I don't
> know how getting ENOSPC on a read could happen?

That scenario doesn't make a lot of sense to me either, but blkdebug can
just inject any error code, even nonsensical ones.
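
For experiments it can help to generate the blkdebug rule file from a
script. The helper below is a hypothetical sketch: inject_error_rule() is
not part of QEMU, and the section and option names ([inject-error], event,
errno, sector) are written as I recall them from docs/devel/blkdebug.txt,
so check them against your QEMU version.

```python
import errno
from typing import Optional


def inject_error_rule(event: str, err: int,
                      sector: Optional[int] = None) -> str:
    """Render one blkdebug [inject-error] section as config text."""
    lines = [
        "[inject-error]",
        f'event = "{event}"',
        f'errno = "{err}"',
    ]
    if sector is not None:
        # Restrict the rule to requests that touch this sector.
        lines.append(f'sector = "{sector}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: fail reads that touch sector 2048 with EIO.
    print(inject_error_rule("read_aio", errno.EIO, sector=2048), end="")
```

The output would be saved to a file (say blkdebug.conf, a made-up name)
and referenced with something like
-drive file=blkdebug:blkdebug.conf:test.img.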

> During my blkdebug experimentation, I've been using lsi53c895a  with
> scsi-disk and thus far I've not been able to generate a read error back
> to the guest kernel.  I've managed to abort qemu with an assert and hang
> qemu without being able to get an error back to the guest kernel.  I
> wrote up one of them: https://bugs.launchpad.net/qemu/+bug/1853898 .
> Specifying a specific sector hasn't worked for me yet.  I'm still trying
> to figure out how to enable tracing/debugging etc. to see what I'm doing
> incorrectly.

Note that depending on the rerror/werror options, QEMU may not deliver
errors to the guest, but stop VMs instead. If the monitor is still
responsive, it's likely that you just got a stopped VM rather than a
hanging QEMU.

The default is that the VM is stopped for ENOSPC and other errors are
delivered to the guest.
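
The policy decision can be modelled in a few lines. This is a simplified
Python sketch, not QEMU code: error_action() is a hypothetical name, and
it assumes the documented option values (report, ignore, stop, enospc)
and the documented defaults werror=enospc and rerror=report.

```python
import errno

VALID_POLICIES = ("report", "ignore", "stop", "enospc")


def error_action(is_read: bool, err: int,
                 rerror: str = "report", werror: str = "enospc") -> str:
    """Return 'report', 'ignore' or 'stop' for a failed request.

    'report' delivers the error to the guest, 'stop' pauses the VM,
    'ignore' pretends the request succeeded.
    """
    policy = rerror if is_read else werror
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown error policy: {policy!r}")
    if policy == "enospc":
        # Stop only when the host ran out of space; report the rest.
        return "stop" if abs(err) == errno.ENOSPC else "report"
    return policy
```

With these assumed defaults, error_action(False, -errno.ENOSPC) gives
"stop", while a read error is reported to the guest unless rerror=stop is
set explicitly.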

Kevin
