On 21-02-11 13:24:22, Keith Busch wrote:
> On Thu, Feb 11, 2021 at 12:38:48PM +0900, Minwoo Im wrote:
> > On 21-02-11 12:00:11, Keith Busch wrote:
> > > But I would prefer to see advanced retry tied to real errors that can be
> > > retried, like if we got an EBUSY or EAGAIN errno or something like that.
> >
> > I have seen a thread [1] about ACRE.  Forgive me if I misunderstood this
> > thread or missed something after this thread.  It looks like the CRD field
> > in the CQE can be set for any NVMe error state, which means it *may* depend
> > on the device status.
>
> Right! Setting CRD values is at the controller's discretion for any
> error status as long as the host enables ACRE.
>
> > And this patch just introduced an internal temporary error state of
> > the controller by returning Command Interrupted status.
>
> It's just purely synthetic, though. I was hoping something more natural
> could trigger the status. That might not provide the deterministic
> scenario you're looking for, though.

That makes sense.  If some status can be triggered more naturally, that
would be much better.

> I'm not completely against using QEMU as a development/test vehicle for
> corner cases like this, but we are introducing a whole lot of knobs
> recently, and you practically need to be a QEMU developer to even find
> them. We probably should step up the documentation in the wiki along
> with these types of features.

Oh, that's really good advice.  I really appreciate it.

> > I think, at this stage, we can go with some errors in the middle of the
> > AIO (nvme_aio_err()) for advanced retry.  Shouldn't AIO errors be
> > retryable, and supposed to be retried?
>
> Sure, we can assume that receiving an error in the AIO callback means
> the lower layers exhausted available recovery mechanisms.

Okay, please let me find a way to trigger this kind of error more
naturally.  I think this HMP command should be the last resort, to be
used only if there is really nothing else we can do.
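
For reference, here is a minimal sketch of how I understand the CRD
encoding discussed above.  It is standalone C, not QEMU or Linux code:
every sketch_* / SKETCH_* name is made up for illustration, and the bit
positions are my reading of the NVMe 1.4 completion queue entry status
field (with the phase bit already dropped).  The controller side only
sets CRD when the host has enabled ACRE, and the host side turns the CRD
index into a delay using the CRDT1..3 values, which are in units of
100 ms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status field layout with the phase bit dropped (assumed from NVMe 1.4). */
#define SKETCH_CRD_SHIFT 11
#define SKETCH_CRD_MASK  (0x3u << SKETCH_CRD_SHIFT) /* CRD, bits 12:11 */
#define SKETCH_DNR       0x4000u                    /* Do Not Retry, bit 14 */

/* Generic command status "Command Interrupted" (21h). */
#define SKETCH_SC_CMD_INTERRUPTED 0x21u

/*
 * Controller side: build a retryable status (DNR clear) for status code
 * 'sc'.  The CRD index (0..3) is only encoded when ACRE is enabled;
 * otherwise CRD stays 0.
 */
static uint16_t sketch_make_status(uint8_t sc, unsigned crd, bool acre_enabled)
{
    uint16_t status = sc;

    if (acre_enabled) {
        status |= (uint16_t)((crd << SKETCH_CRD_SHIFT) & SKETCH_CRD_MASK);
    }
    return status;
}

/* Host side: a command may be retried only if DNR is clear. */
static bool sketch_retryable(uint16_t status)
{
    return !(status & SKETCH_DNR);
}

/*
 * Host side: map the CRD index to a delay in milliseconds.
 * crdt[0..2] hold CRDT1..CRDT3 in units of 100 ms.  CRD == 0 means
 * no delay was requested.
 */
static unsigned sketch_retry_delay_ms(uint16_t status, const uint16_t crdt[3])
{
    unsigned crd = (status & SKETCH_CRD_MASK) >> SKETCH_CRD_SHIFT;

    if (crd == 0) {
        return 0;
    }
    return crdt[crd - 1] * 100u;
}
```

With this, an AIO error path could in principle complete a command with
something like sketch_make_status(SKETCH_SC_CMD_INTERRUPTED, 1, acre),
but which status code is appropriate there is exactly the open question
in this thread.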