On Thu, Feb 11, 2021 at 12:38:48PM +0900, Minwoo Im wrote:
> On 21-02-11 12:00:11, Keith Busch wrote:
> > But I would prefer to see advanced retry tied to real errors that can be
> > retried, like if we got an EBUSY or EAGAIN errno or something like that.
>
> I have seen a thread [1] about ACRE. Forgive me if I misunderstood this
> thread or missed something after it. It looks like the CRD field in
> the CQE can be set for any NVMe error state, which means it *may* depend on
> the device status.

Right! Setting CRD values is at the controller's discretion for any error
status as long as the host enables ACRE.

> And this patch just introduced an internal temporary error state of
> the controller by returning Command Interrupted status.

It's purely synthetic, though. I was hoping something more natural could
trigger the status, although that might not provide the deterministic
scenario you're looking for.

I'm not completely against using QEMU as a development/test vehicle for
corner cases like this, but we have been introducing a whole lot of knobs
recently, and you practically need to be a QEMU developer to even find them.
We should probably step up the documentation in the wiki along with these
types of features.

> I think, at this stage, we can go with some errors in the middle of the
> AIO (nvme_aio_err()) for advanced retry. Shouldn't AIO errors be
> retryable and supposed to be retried?

Sure, we can assume that receiving an error in the AIO callback means the
lower layers exhausted available recovery mechanisms.
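
To make the "real errors" idea concrete, here is a rough sketch of how an errno from an AIO completion could be turned into a status with a CRD value, and how a host would read the delay back. This is not the QEMU code and not what nvme_aio_err() does today. The helper names and the choice of which errnos count as retryable are assumptions for illustration only. The bit positions follow the status field layout with the phase bit dropped (CRD in bits 12:11, DNR in bit 14), and the CRDT values are in units of 100 ms.

```c
#include <errno.h>
#include <stdint.h>

/* Status field with the phase bit already dropped. */
#define SC_INTERNAL_ERROR      0x0006  /* generic: Internal Error */
#define SC_COMMAND_INTERRUPTED 0x0021  /* generic: Command Interrupted */
#define STATUS_CRD_SHIFT       11      /* CRD occupies bits 12:11 */
#define STATUS_CRD_MASK        (0x3u << STATUS_CRD_SHIFT)
#define STATUS_DNR             0x4000u /* Do Not Retry */

/*
 * Hypothetical helper: map a negative errno from an AIO completion to a
 * status. Treating only EBUSY/EAGAIN as retryable is an assumption.
 * 'crd' selects which CRDT entry (1..3) the host should use; 0 means
 * retry without a delay.
 */
static uint16_t aio_errno_to_status(int ret, unsigned crd)
{
    switch (-ret) {
    case EBUSY:
    case EAGAIN:
        /* Transient: ask the host to retry after the CRD delay. */
        return (uint16_t)(SC_COMMAND_INTERRUPTED |
                          ((crd & 0x3u) << STATUS_CRD_SHIFT));
    default:
        /* Anything else: fail the command and tell the host not to retry. */
        return (uint16_t)(SC_INTERNAL_ERROR | STATUS_DNR);
    }
}

/* Host side: a command may be retried only if DNR is clear. */
static int status_is_retryable(uint16_t status)
{
    return !(status & STATUS_DNR);
}

/*
 * Host side: delay in milliseconds before retrying. crdt[] holds the
 * controller's CRDT1..CRDT3 values, each in units of 100 ms.
 */
static unsigned retry_delay_ms(uint16_t status, const uint16_t crdt[3])
{
    unsigned crd = (status & STATUS_CRD_MASK) >> STATUS_CRD_SHIFT;

    return crd ? crdt[crd - 1] * 100u : 0u;
}
```

With CRDT1 set to 5, a completion built by aio_errno_to_status(-EAGAIN, 1) would ask the host to wait 500 ms before retrying, while an -EIO completion would carry DNR and not be retried at all.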