A VM in a cloud environment may use a virtual disk as its backend storage, and there are usually filesystems on the virtual block device. When the backend storage is temporarily down, any I/O issued to the virtual block device fails with an error. For example, an I/O error seen by an ext4 filesystem can make the filesystem read-only. However, cloud backend storage can often recover quickly. For example, an IP-SAN may go down due to a network failure and come back online soon after the network recovers. The error in the filesystem, on the other hand, may not be recoverable without a device reattach or a system restart. So an I/O rehandle mechanism is needed to implement self-healing.
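As a rough illustration of what rehandling means here, the following is a minimal standalone sketch and not the code in this series. FakeBackend, fake_submit() and submit_with_rehandle() are hypothetical names made up for the example, and a simple retry budget stands in for a time-based limit. The request is resubmitted while the backend reports -EIO, and the error is only passed on once the budget is exhausted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a backend that fails with -EIO until it recovers. */
typedef struct FakeBackend {
    int failures_left;   /* submissions that will still fail with -EIO */
    int submissions;     /* how many times the request was submitted */
} FakeBackend;

/* Submit one request: returns 0 on success, -EIO while the backend is down. */
static int fake_submit(FakeBackend *be)
{
    be->submissions++;
    if (be->failures_left > 0) {
        be->failures_left--;
        return -EIO;
    }
    return 0;
}

/*
 * Resubmit the request while it fails with -EIO, at most max_retries times.
 * The caller only sees the error if the backend has not recovered by then.
 * Assumption: a retry count replaces the timer-driven timeout a real
 * implementation would use.
 */
static int submit_with_rehandle(FakeBackend *be, int max_retries)
{
    assert(be != NULL && max_retries >= 0);

    int ret = fake_submit(be);
    int retries = 0;

    while (ret == -EIO && retries < max_retries) {
        retries++;
        ret = fake_submit(be);
    }
    return ret;
}
```

With a backend that recovers after three failures and a budget of five retries, the caller sees success and never observes the intermediate errors. With a budget smaller than the outage, -EIO is eventually returned as before.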

This patch series proposes a feature called I/O hang. It can rehandle AIOs that fail with EIO without sending the error back to the guest. From the guest's point of view, it is just as if an I/O is hanging and has not returned. With this feature enabled, the guest can resume running smoothly once I/O has recovered.

Ying Fang (7):
  block-backend: introduce I/O rehandle info
  block-backend: rehandle block aios when EIO
  block-backend: add I/O hang timeout
  block-backend: add I/O hang drain when disbale
  virtio-blk: disable I/O hang when resetting
  qemu-option: add I/O hang timeout option
  qapi: add I/O hang and I/O hang timeout qapi event

 block/block-backend.c          | 285 +++++++++++++++++++++++++++++++++
 blockdev.c                     |  11 ++
 hw/block/virtio-blk.c          |   8 +
 include/sysemu/block-backend.h |   5 +
 qapi/block-core.json           |  26 +++
 5 files changed, 335 insertions(+)

-- 
2.23.0