Hello Mikulas,

On Mon, 2025-05-12 at 15:46 +0200, Mikulas Patocka wrote:
> Hi
> 
> On Thu, 8 May 2025, Martin Wilck wrote:
> 
> > Hello Kevin,
> > 
> > [I'm sorry for the previous email. It seems that I clicked "send"
> > in the wrong window].
> > 
> > On Tue, 2025-04-29 at 18:50 +0200, Kevin Wolf wrote:
> > > Multipath cannot directly provide failover for ioctls in the
> > > kernel because it doesn't know what each ioctl means and which
> > > result could indicate a path error. Userspace generally knows
> > > what the ioctl it issued means and if it might be a path error,
> > > but neither does it know which path the ioctl took nor does it
> > > necessarily have the privileges to fail a path using the control
> > > device.
> > 
> > Thanks for this effort.
> > 
> > I have some remarks about your approach. The most important one is
> > that the DM_MPATH_PROBE_PATHS_CMD ioctl appears to be a dangerous
> > command. It sends IO to possibly broken paths and waits for it to
> > complete. In the common error case of a device not responding,
> > this might cause the code to hang for a long time in the kernel
> > ioctl code path, in uninterruptible sleep (note that these probing
> > commands will be queued
> 
> Normal reads and writes would also hang in an uninterruptible sleep
> if a path stops responding.

Right. That's why multipathd uses TEST UNIT READY if supported by the
storage, and either aio or separate I/O threads to avoid blocking the
main thread, and generally goes to great lengths to make sure that
misbehaving storage hardware can't freeze the daemon (see the P.S. at
the end of this mail for a sketch of such a check).

The way I read Kevin's code, you'd queue up additional IO in the same
context that had submitted the original failing IO. I realize that
qemu uses asynchronous IO, but AFAIK the details depend on the qemu
options for the individual VM, and it isn't clear to me whether or
not DM_MPATH_PROBE_PATHS_CMD can bring the entire VM to a halt.

> > [...]
> > 
> > I am wondering whether the DM_MPATH_PROBE_PATHS_CMD ioctl is
> > necessary at all. Rather than triggering explicit path probing,
> > you could have dm-mpath "handle" IO errors by failing the active
> > path for "path errors". That would be similar to my patch set from
> > 2021 [1], but it would avoid the extra IO and thus the additional
> > risk of hanging in the kernel.
> 
> What exactly do you mean? You say "you could have dm-mpath 'handle'
> IO errors by failing the active path for 'path errors'".

What I meant was that dm-mpath could call fail_path() in case of an
error, using a similar mechanism to the block IO code path.
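Just to illustrate what I mean, something along these lines (a sketch
only, not a real patch - the hook name and signature are invented;
fail_path() and blk_path_error() are the existing helpers that
multipath_end_io() already uses for regular request-based I/O in
drivers/md/dm-mpath.c):

/* Hypothetical completion hook for SG_IO/ioctl requests in dm-mpath. */
static void multipath_sg_io_done(struct pgpath *pgpath, blk_status_t error)
{
	/*
	 * Same classification as multipath_end_io(): only errors that
	 * blk_path_error() regards as path errors take the path down;
	 * target-side errors are passed through unchanged.
	 */
	if (error && blk_path_error(error)) {
		if (pgpath)
			fail_path(pgpath);
	}
	/*
	 * No retry in the kernel: the ioctl result still goes back to
	 * userspace, which decides whether and where to resubmit.
	 */
}

That would keep the error classification in the kernel, where the
path that was used is actually known, while leaving retries entirely
to userspace and without adding any new I/O on possibly broken paths.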
> Christoph doesn't want dm-mpath to look at the error code - so
> dm-mpath doesn't know if a path error occurred or not.

My impression from the previous discussion was that Christoph mainly
objected to SG_IO requests being retried in the kernel [1], not so
much to the inspection of the error code by device mapper. I may have
misunderstood this, of course. Adding Christoph to the loop so that
he can clarify what he meant.

> qemu could look at the error code, but qemu doesn't know which path
> the SG_IO went through - so it can't instruct dm-mpath to fail that
> path.
> 
> One of the possibilities that we discussed was to add a path-id to
> SG_IO, so that dm-mpath would mark which path the SG_IO went through
> and qemu could instruct dm-mpath to fail that path. But we rejected
> it as being more complicated than the current approach.

If we discuss extending SG_IO itself, it might be easier to have it
store the blk_status_t somewhere in the sg_io_hdr. More about that
idea in my reply to Kevin.

Regards,
Martin

[1] https://lore.kernel.org/all/20210701075629.ga25...@lst.de/

> > Contrary to my set, you wouldn't attempt retries in the kernel,
> > but leave this part to qemu instead, like in the current set. That
> > would avoid Christoph's main criticism that "Failing over SG_IO
> > does not make sense" [2].
> > 
> > Doing the failover in qemu has the general disadvantage that qemu
> > has no notion about the number of available and/or healthy paths;
> > it can thus only guess the reasonable number of retries for any
> > given I/O. But that's unavoidable, given that the idea to do
> > kernel-level failover on SG_IO is rejected.
> > 
> > Please let me know your thoughts.
> > 
> > Best Regards
> > Martin
> > 
> > [1]
> > https://lore.kernel.org/all/20210628151558.2289-1-mwi...@suse.com/
> > [2] https://lore.kernel.org/all/20210701075629.ga25...@lst.de/
> 
> Mikulas
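P.S.: For illustration, this is roughly what such a bounded TEST UNIT
READY check looks like (heavily simplified sketch with a made-up
function name; the real code lives in libmultipath/checkers/tur.c and
additionally runs the check in a separate thread, so that even a
completely wedged path can only ever hang that thread, never
multipathd's main loop):

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

static int tur_path_usable(int fd)
{
	unsigned char cdb[6] = { 0x00, 0, 0, 0, 0, 0 };	/* TEST UNIT READY */
	unsigned char sense[32];
	struct sg_io_hdr io;

	memset(&io, 0, sizeof(io));
	io.interface_id = 'S';
	io.dxfer_direction = SG_DXFER_NONE;	/* no data transfer */
	io.cmd_len = sizeof(cdb);
	io.cmdp = cdb;
	io.mx_sb_len = sizeof(sense);
	io.sbp = sense;
	io.timeout = 1000;	/* ms; short, so a dead path fails quickly */

	if (ioctl(fd, SG_IO, &io) < 0)
		return 0;	/* submission failed - treat the path as down */

	/*
	 * Simplified: any SCSI, host or driver error marks the path
	 * down; the real checker also interprets the sense data.
	 */
	return io.status == 0 && io.host_status == 0 &&
	       io.driver_status == 0;
}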