On Fri, 2016-11-04 at 12:17 -0600, Bart Van Assche wrote:
> On 11/04/2016 07:47 AM, James Bottomley wrote:
> > You know after
> >
> > if (device_remove_file_self(dev, attr))
> >
> > returns true that s_active is held and also that KERNFS_SUICIDAL is
> > set on the node, so the non-self remove paths can simply check for
> > this flag and return without descending into __kernfs_remove(),
> > which would mean they never take s_active. That means we never get
> > the inversion.
>
> Hello James,
>
> The lock inversion is not triggered by the non-self remove paths but
> by the self-remove path.
I think we should agree first what the problem is. The inversion
occurs between the sysfs delete path and the device node delete caused
by a remove host. When both are happening the inversion is that when
if (device_remove_file_self(dev, attr))
scsi_remove_device(to_scsi_device(dev));
Happens, after the if, the s_active lock is held then
scsi_remove_device goes on to take the scan_mutex.
Conversely in scsi_remove_host, the mutex is taken first then
scsi_forget_host iterates removing the devices, but sysfs file removal
eventually takes s_active in kernfs_drain, which is called from
kernfs_remove via kernfs_remove_by_name_ns, hence the inversion.
This is therefore a conflict between the self and non-self remove
paths.
> Anyway, can you have a look at the two attached
> patches?
Well, they're still overly complex, but perhaps due to the
misunderstanding above? If you look through that trigger case, you'll
see that the fix is simply to check KERNFS_SUICIDAL in
kernfs_remove_by_name_ns and not descend into __kernfs_remove() if it's
set. I think kernfs_mutex mediates this, but probably only if it's
moved lower down in kernfs_drain.
James
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html