Hello Kevin,

[Sorry for the previous email; it seems I clicked "send" in the wrong window.]
On Tue, 2025-04-29 at 18:50 +0200, Kevin Wolf wrote:
> Multipath cannot directly provide failover for ioctls in the kernel
> because it doesn't know what each ioctl means and which result could
> indicate a path error. Userspace generally knows what the ioctl it
> issued means and if it might be a path error, but neither does it know
> which path the ioctl took nor does it necessarily have the privileges to
> fail a path using the control device.

Thanks for this effort. I have some remarks about your approach.

The most important one is that the DM_MPATH_PROBE_PATHS_CMD ioctl appears
to be a dangerous command. It sends I/O to possibly broken paths and waits
for it to complete. In the common error case of a device not responding,
this might cause the code to hang for a long time in the kernel ioctl code
path, in uninterruptible sleep (note that these probing commands will be
queued after other, possibly hanging, I/O). In the past we have put a lot
of effort into other code paths in multipath-tools and elsewhere to avoid
this kind of hang to the extent possible. It seems to me that your set
re-introduces this risk.

Apart from that, two minor observations: in patch 2/2 you don't check
whether the map is in a queueing state, and I don't quite understand why
you only check paths in the map's active path group without attempting a
PG failover in the case where all paths in the current PG fail.

I am wondering whether the DM_MPATH_PROBE_PATHS_CMD ioctl is necessary at
all. Rather than triggering explicit path probing, you could have dm-mpath
"handle" I/O errors by failing the active path for "path errors". That
would be similar to my patch set from 2021 [1], but it would avoid the
extra I/O and thus the additional risk of hanging in the kernel. Contrary
to my set, you wouldn't attempt retries in the kernel, but leave that part
to QEMU instead, as in the current set. That would avoid Christoph's main
criticism that "Failing over SG_IO does not make sense" [2].

Doing the failover in QEMU has the general disadvantage that QEMU has no
notion of the number of available and/or healthy paths; it can thus only
guess a reasonable number of retries for any given I/O (see the rough
sketch below the references). But that's unavoidable, given that the idea
of doing kernel-level failover on SG_IO has been rejected.

Please let me know your thoughts.

Best Regards
Martin

[1] https://lore.kernel.org/all/20210628151558.2289-1-mwi...@suse.com/
[2] https://lore.kernel.org/all/20210701075629.ga25...@lst.de/
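
P.S. To illustrate what I mean by "guessing the number of retries", below
is a minimal, untested sketch of a userspace retry loop around SG_IO. It
assumes the dm-mpath behaviour proposed above (the kernel fails the active
path on a path error, so re-issuing the ioctl picks another path if one is
left). The helper is_path_error(), the constant MAX_SGIO_RETRIES, and the
selection of host-byte codes are illustrative assumptions of mine, not
part of any existing API.

/*
 * Sketch only: bounded userspace retries of a single SG_IO command.
 * MAX_SGIO_RETRIES is an arbitrary guess; userspace has no way to know
 * how many paths the map actually has.
 */
#include <errno.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define MAX_SGIO_RETRIES 5

/*
 * Host-byte codes that typically indicate a transport/path problem.
 * The values mirror the kernel's DID_* definitions.
 */
static int is_path_error(int host_status)
{
    switch (host_status) {
    case 0x01: /* DID_NO_CONNECT */
    case 0x02: /* DID_BUS_BUSY */
    case 0x03: /* DID_TIME_OUT */
    case 0x0e: /* DID_TRANSPORT_DISRUPTED */
    case 0x0f: /* DID_TRANSPORT_FAILFAST */
        return 1;
    default:
        return 0;
    }
}

/* Issue one SCSI command via SG_IO, re-issuing it on presumed path errors. */
static int sgio_with_retries(int fd, struct sg_io_hdr *hdr)
{
    for (int attempt = 0; attempt <= MAX_SGIO_RETRIES; attempt++) {
        if (ioctl(fd, SG_IO, hdr) < 0) {
            return -errno;   /* the ioctl itself failed */
        }
        if (!is_path_error(hdr->host_status)) {
            /*
             * Success, or a non-path error that the caller has to
             * interpret from the SCSI status / sense data.
             */
            return 0;
        }
        /*
         * Presumed path error: with the proposed dm-mpath behaviour the
         * failing path has already been failed in the kernel, so simply
         * re-issuing the ioctl should select another path.
         */
    }
    return -EIO;             /* retries exhausted */
}

This is only meant to show where the guesswork sits: the loop bound has to
be picked blindly, because only the kernel knows how many usable paths
remain.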