On Thu, Nov 27, 2025 at 12:31:12PM +0100, Martin Wilck wrote: > On Mon, 2025-11-24 at 15:11 -0500, Benjamin Marzinski wrote: > > On Fri, Nov 21, 2025 at 02:35:23PM -0800, Brian Bunker wrote: > > > > > > > > I added the purge thread because I didn't want to starve the > > > checker thread at a large disconnect volume scale. I noticed > > > that the number of devices if I purged inline with the check that > > > it didn't scale linearly after a point and seemed to be > > > significantly > > > starving the checker thread. Doing the purge in another thread > > > seemed to relieve that pressure. > > > > That might be because the check thread has safeguards to try to avoid > > starving the other threads. Since a lot of multipathd's work is gated > > by > > the vecs lock, there's only so much parallelism that can happen with > > multiple threads. If deleting the devices is taking a long time, it > > might > > be better for this to get interrupted, so that other threads can run. > > > > Since purging the paths is fairly low priority, we could continue to > > run > > it in its own thread, but instead of running through all the devices > > at > > once, purgeloop could lock the vecs->lock, handle one device, and > > then > > unlock and loop. This means it would need to search through the list > > from the start each time it regrabbed the lock, since the list could > > have changed while it wasn't holding it. When purgeloop makes it all > > the > > way through the list without finding any paths that are due for a > > delete > > attempt, it can sleep or block waiting for more paths. > > > > This would mean that paths would need to remember if they had already > > been handled in a cycle. That could be done by purgeloop keeping > > track > > of which cycle it was on, and the paths storing the cycle number > > whenever they were checked. > > As I wrote in my other post, wish that this thread wouldn't hold the > vecs lock at all. multipathd could simply use a pipe to write dev_t's > of to-be-purged paths to the purge thread (or process :-) ).
This seems like a good idea. Obviously, writing to sysfs while the vecs lock is being held means that multipath will continue to have that scsi device open, so there's no chance of the device name getting reused if the device is removed from the system. But this other thread/process could simply open the scsi device, double check that it's still disconnected, write to sysfs, and then close the device. That would offer the same guarantee, without the need to interact with anything else. > Martin
